io.archivesunleashed.app
Extracts plain text from a web archive using DataFrames and Spark SQL.
The DataFrame obtained from RecordLoader.
A Dataset[Row] with the schema (crawl date, domain, url, text).
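
To make the shape of the result concrete, here is a minimal sketch that builds a Dataset[Row] with the four columns described above, using only plain Spark SQL. It is not the toolkit's implementation. The object name `PlainTextSketch`, the `extract` helper, the input column names (`crawl_date`, `url`, `content`), the sample row, and the regex-based tag stripping are all assumptions made for illustration. In real use, the input DataFrame would come from RecordLoader instead of an in-memory sequence.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
import org.apache.spark.sql.functions.{col, expr, regexp_replace, trim}

object PlainTextSketch {
  // Hypothetical stand-in for the extractor, NOT the toolkit's own code.
  // Assumes the input has string columns: crawl_date, url, content.
  def extract(pages: DataFrame): Dataset[Row] =
    pages.select(
      col("crawl_date"),
      // Spark SQL's built-in parse_url pulls the host out of the URL.
      expr("parse_url(url, 'HOST')").as("domain"),
      col("url"),
      // Crude tag removal with a regex; an assumption for illustration only.
      trim(regexp_replace(col("content"), "<[^>]*>", " ")).as("text")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("PlainTextSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample record standing in for a loaded web archive page.
    val pages = Seq(
      ("20200101", "https://example.org/index.html", "<p>Hello</p>")
    ).toDF("crawl_date", "url", "content")

    val result = extract(pages)
    result.printSchema() // columns: crawl_date, domain, url, text
    result.show(false)

    spark.stop()
  }
}
```

Because the result is an ordinary Dataset[Row], it can be filtered, joined, or written out with the usual Spark SQL operations.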