From 0caa366d755785fc2a3c6241eaf846e824baf02e Mon Sep 17 00:00:00 2001
From: Thibaut Courouble
Date: Sun, 17 Nov 2013 01:10:56 -0800
Subject: [PATCH] Updated Scraper Reference (markdown)

---
 Scraper-Reference.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Scraper-Reference.md b/Scraper-Reference.md
index d08f7e52..d9622e3a 100644
--- a/Scraper-Reference.md
+++ b/Scraper-Reference.md
@@ -57,7 +57,7 @@ Configuration is done via class attributes and divided into three main categorie
 * `version` [String] **(required)**
   The version of the software at the time the scraper was last run. This is only informational and doesn't affect the scraper's behavior.
 
-* `base_url` [String] **(required)**
+* `base_url` [String] **(required in `UrlScraper`)**
   The documents' location. Only URLs _inside_ the `base_url` will be scraped. "inside" more or less means "starting with" except that `/docs` is outside `/doc` (but `/doc/` is inside).
   Unless `root_path` is set, the root/initial URL is equal to `base_url`.
 
@@ -100,6 +100,7 @@ Default `html_filters`:
 * [`NormalizeUrlsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/normalize_urls.rb) — replaces all URLs with their fully qualified counterpart
 * [`InternalUrlsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/internal_urls.rb) — detects internal URLs (the ones to scrape) and replaces them with their unqualified, relative counterpart
 * [`NormalizePathsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/normalize_paths.rb) — makes the internal paths consistent (e.g. always end with `.html`)
+* [`CleanLocalUrlsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/clean_local_urls.rb) — remove links, iframes and images pointing to localhost (`FileScraper` only)
 
 Default `text_filters`:
 
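The "inside" rule for `base_url` in the first hunk can be illustrated with a short Ruby sketch. This is a minimal sketch under stated assumptions, not DevDocs' own implementation: the helper name `inside_base_url?` is hypothetical, it assumes a URL must share the scheme, host and port of `base_url`, and it assumes "starting with" is only checked on `/` path-segment boundaries, which is what makes `/docs` fall outside `/doc` while `/doc/` stays inside.

```ruby
require "uri"

# Hypothetical helper (not DevDocs' actual code): returns true when +url+
# lies "inside" +base_url+ under the rule described in the scraper reference.
def inside_base_url?(url, base_url)
  url  = URI.parse(url)
  base = URI.parse(base_url)

  # Assumption: a different scheme, host or port is never inside.
  return false unless [url.scheme, url.host, url.port] == [base.scheme, base.host, base.port]

  base_path = base.path
  path = url.path

  # The base URL itself counts as inside.
  return true if path == base_path

  # Otherwise require a prefix match that ends on a "/" boundary, so that
  # "/docs" is not inside "/doc", while "/doc/" and "/doc/guide.html" are.
  prefix = base_path.end_with?("/") ? base_path : "#{base_path}/"
  path.start_with?(prefix)
end

p inside_base_url?("https://example.com/docs", "https://example.com/doc")  # prints false
p inside_base_url?("https://example.com/doc/", "https://example.com/doc")  # prints true
```

Comparing on segment boundaries rather than with a plain `start_with?` on the whole string is the design choice that captures the `/docs` vs `/doc` exception; query strings and fragments are ignored because only `URI#path` is compared.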