Updated Scraper Reference (markdown)

pull/964/head
Thibaut Courouble 11 years ago
parent 3c26efb663
commit 0caa366d75

@@ -57,7 +57,7 @@ Configuration is done via class attributes and divided into three main categorie
 * `version` [String] **(required)**
 The version of the software at the time the scraper was last run. This is only informational and doesn't affect the scraper's behavior.
-* `base_url` [String] **(required)**
+* `base_url` [String] **(required in `UrlScraper`)**
 The documents' location. Only URLs _inside_ the `base_url` will be scraped. "inside" more or less means "starting with" except that `/docs` is outside `/doc` (but `/doc/` is inside).
 Unless `root_path` is set, the root/initial URL is equal to `base_url`.
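The "inside" rule for `base_url` described in this hunk can be illustrated with a minimal Ruby sketch. The helper name `inside_base_url?` and the `example.com` URLs are hypothetical, and this is not the scraper's actual implementation. It only assumes that "inside" means the same scheme, host and port, plus a path that either equals the base path or continues it after a `/`.

```ruby
require 'uri'

# Hypothetical helper (not part of DevDocs): returns true when `url` is
# "inside" `base_url`, i.e. same origin and a path that equals the base
# path or extends it past a "/" boundary.
def inside_base_url?(url, base_url)
  target = URI.parse(url)
  base = URI.parse(base_url)

  same_origin = target.scheme == base.scheme &&
                target.host == base.host &&
                target.port == base.port
  return false unless same_origin

  # Drop one trailing slash so "/doc" and "/doc/" behave the same as a base.
  base_path = base.path.chomp('/')
  target.path == base_path || target.path.start_with?("#{base_path}/")
end

base = 'http://example.com/doc'
puts inside_base_url?('http://example.com/doc/', base)     # true
puts inside_base_url?('http://example.com/doc/page', base) # true
puts inside_base_url?('http://example.com/docs', base)     # false
```

Comparing against the base path followed by `/`, rather than using a plain prefix check, is what keeps `/docs` outside `/doc` while `/doc/` stays inside.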
@@ -100,6 +100,7 @@ Default `html_filters`:
 * [`NormalizeUrlsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/normalize_urls.rb) — replaces all URLs with their fully qualified counterpart
 * [`InternalUrlsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/internal_urls.rb) — detects internal URLs (the ones to scrape) and replaces them with their unqualified, relative counterpart
 * [`NormalizePathsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/normalize_paths.rb) — makes the internal paths consistent (e.g. always end with `.html`)
+* [`CleanLocalUrlsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/clean_local_urls.rb) — remove links, iframes and images pointing to localhost (`FileScraper` only)
 Default `text_filters`:
