Add external_urls filter

This filter traverses all <a> tags and replaces
their URLs with URLs pointing to paths of existing
devdocs documentation.
pull/1495/head
Enoc 4 years ago
parent e9d7849412
commit 38e2b107a2

@@ -84,6 +84,7 @@ The `call` method must return either `doc` or `html`, depending on the type of f
* [`AttributionFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/attribution.rb) — appends the license info and link to the original document
* [`TitleFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/title.rb) — prepends the document with a title (disabled by default)
* [`EntriesFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/entries.rb) — abstract filter for extracting the page's metadata
* [`ExternalUrlsFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/external_urls.rb) — replaces external URLs with relative URLs pointing to existing devdocs documentation
## Custom filters

@@ -115,6 +115,7 @@ Additionally:
* [`TitleFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/title.rb) is a core HTML filter, disabled by default, which prepends the document with a title (`<h1>`).
* [`EntriesFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/entries.rb) is an abstract HTML filter that each scraper must implement and that is responsible for extracting the page's metadata.
* [`ExternalUrlsFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/external_urls.rb) is an HTML filter that replaces external URLs found in `<a>` tags with URLs pointing to existing devdocs documentation.
### Filter options
@@ -185,6 +186,10 @@ More information about how filters work is available on the [Filter Reference](.
_Note: this filter is disabled by default._
* [`ExternalUrlsFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/external_urls.rb)
  - `:external_urls` [Hash or Proc] If it is a Hash, it maps URL hosts to devdocs documentation names (e.g. `'underscorejs.org' => 'underscore'`), and every URL found in an `<a>` tag whose host matches a key is replaced with a relative URL pointing to that documentation. If it is a Proc, it is called with a URL (string) as argument and should return a relative URL pointing to existing devdocs documentation. See [`backbone.rb`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/scrapers/backbone.rb) for an example.
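For the Proc form, the filter in this commit calls the Proc with each `href` that does not contain `..` and assigns the return value to that `href`, so links that should stay unchanged must be returned as-is. The following is a minimal sketch of such a Proc; the `EXTERNAL_URLS` name, the Underscore.js host check, and the `../underscore/` target are illustrative assumptions, not part of this commit. Unlike the Hash form, the Proc only receives the URL and not the page's `path_to_root`, so the `../` prefix here assumes the page sits one level below the root.

```ruby
require 'uri'

# Hypothetical Proc for the `:external_urls` option. The filter assigns its
# return value to the link's href, so unrelated links are returned unchanged.
EXTERNAL_URLS = lambda do |href|
  begin
    url = URI(href)
  rescue URI::InvalidURIError
    return href # leave hrefs that URI cannot parse untouched
  end
  return href unless url.host == 'underscorejs.org'

  # Assumed layout: Underscore docs live at '../underscore/' relative to the
  # current page; the original anchor, if any, is kept.
  url.fragment ? "../underscore/##{url.fragment}" : '../underscore/'
end

p EXTERNAL_URLS.call('https://underscorejs.org/#each') # prints "../underscore/#each"
p EXTERNAL_URLS.call('https://example.com/x')          # prints "https://example.com/x"
```

In a scraper it would hypothetically be assigned the same way `backbone.rb` assigns its Hash, i.e. `options[:external_urls] = EXTERNAL_URLS`.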
## Keeping scrapers up-to-date
In order to keep scrapers up-to-date, the `get_latest_version(opts)` method should be overridden. If `self.release` is defined, this method should return the latest version of the documentation. If `self.release` is not defined, it should return the Epoch time when the documentation was last modified. If the documentation will never change, simply return `1.0.0`. The result of this method is periodically reported in a "Documentation versions report" issue, which helps maintainers keep track of outdated documentation.

@@ -96,5 +96,15 @@ module Docs
      path = path.gsub %r{\+}, '_plus_'
      path
    end

    def path_to_root
      if subpath == ''
        return '../'
      else
        previous_dirs = subpath.scan(/\//)
        return '../' * previous_dirs.length
      end
    end
  end
end
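The `path_to_root` helper added in this hunk turns the current page's subpath into a `../` prefix. The standalone copy below takes the subpath as an argument (in the filter it comes from the `subpath` helper) so its output can be checked for a few hypothetical subpaths. As written, a non-empty subpath without any `/` yields an empty prefix.

```ruby
# Standalone copy of `path_to_root`, with `subpath` passed in explicitly.
def path_to_root(subpath)
  if subpath == ''
    '../'
  else
    # one '../' per '/' found in the subpath
    '../' * subpath.scan(%r{/}).length
  end
end

p path_to_root('')               # prints "../"
p path_to_root('api/model.html') # prints "../"
p path_to_root('a/b/c.html')     # prints "../../"
p path_to_root('index.html')     # prints ""
```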

@@ -41,7 +41,7 @@ module Docs
     self.html_filters = FilterStack.new
     self.text_filters = FilterStack.new

-    html_filters.push 'apply_base_url', 'container', 'clean_html', 'normalize_urls', 'internal_urls', 'normalize_paths', 'parse_cf_email'
+    html_filters.push 'apply_base_url', 'container', 'clean_html', 'normalize_urls', 'internal_urls', 'normalize_paths', 'parse_cf_email', 'external_urls'
     text_filters.push 'images' # ensure the images filter runs after all html filters
     text_filters.push 'inner_html', 'clean_text', 'attribution'

@@ -0,0 +1,38 @@
# frozen_string_literal: true

module Docs
  class ExternalUrlsFilter < Filter
    def call
      if context[:external_urls]
        root = path_to_root
        css('a').each do |node|
          next unless anchorUrl = node['href']
          # avoid links already converted to internal links
          next if anchorUrl.match?(/\.\./)
          if context[:external_urls].is_a?(Proc)
            node['href'] = context[:external_urls].call(anchorUrl)
            next
          end
          url = URI(anchorUrl)
          context[:external_urls].each do |host, name|
            if url.host.to_s.match?(host)
              node['href'] = root + name + url.path.to_s + '#' + url.fragment.to_s
            end
          end
        end
      end
      doc
    end
  end
end
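Applied to a single link, the Hash branch of `call` amounts to the string manipulation below. This sketch drops Nokogiri and takes `root` as a parameter standing in for `path_to_root`; `rewrite_href` is a hypothetical name, and the mapping is the one `backbone.rb` uses in this commit. Like the filter, it does not rescue `URI::InvalidURIError` for malformed hrefs.

```ruby
require 'uri'

# Hypothetical helper mirroring the Hash branch of ExternalUrlsFilter#call.
def rewrite_href(href, external_urls, root)
  # links already converted to internal links are skipped
  return href if href.match?(/\.\./)

  url = URI(href)
  result = href
  external_urls.each do |host, name|
    # String#match? converts `host` to a Regexp, so '.' matches any character
    if url.host.to_s.match?(host)
      result = root + name + url.path.to_s + '#' + url.fragment.to_s
    end
  end
  result
end

mapping = { 'underscorejs.org' => 'underscore' }
p rewrite_href('https://underscorejs.org/#each', mapping, '../')
# prints "../underscore/#each"
p rewrite_href('https://underscorejs.org/docs/underscore.html', mapping, '../')
# prints "../underscore/docs/underscore.html#"
```

As the second call shows, a URL without a fragment still gains a trailing `#`, because `url.fragment.to_s` is an empty string.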

@@ -21,6 +21,10 @@ module Docs
      Licensed under the MIT License.
    HTML

    options[:external_urls] = {
      'underscorejs.org' => 'underscore'
    }

    def get_latest_version(opts)
      doc = fetch_doc('https://backbonejs.org/', opts)
      doc.at_css('.version').content[1...-1]
