Uses boilerpipe to extract raw text content from a page.
ExtractBoilerpipeText removes boilerplate text (e.g. a copyright statement) from an HTML string.
an HTML string possibly containing boilerplate text
text with boilerplate removed or Nil if the text is empty.
Removes boilerplate.
an HTML string possibly containing boilerplate text
filtered text or Nil if the text is empty.
Extract raw text content from an HTML page, minus "boilerplate" content (using boilerpipe).