Extract domain frequencies from a web archive using DataFrames and Spark SQL.
DataFrame obtained from RecordLoader
Dataset[Row], where the schema is (domain, count)
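The DataFrame variant can be sketched roughly as follows. This is a minimal illustration, not the toolkit's actual implementation: it assumes the input DataFrame has a string column named "url" (a hypothetical schema), derives the domain with Spark SQL's built-in parse_url function, and strips a leading "www." prefix. The object and method names are made up for the example.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
import org.apache.spark.sql.functions.{col, desc, expr, regexp_replace}

object DomainFrequencyDfSketch {
  // Hypothetical helper: assumes `records` has a string column named "url".
  def domainFrequency(records: DataFrame): Dataset[Row] =
    records
      // parse_url is a Spark SQL built-in; 'HOST' selects the host part of the URL.
      .select(expr("parse_url(url, 'HOST')").as("host"))
      // Drop rows whose URL had no parsable host.
      .filter(col("host").isNotNull)
      // Treat "www.example.com" and "example.com" as the same domain.
      .select(regexp_replace(col("host"), "^www\\.", "").as("domain"))
      .groupBy("domain")
      .count()                 // adds a Long column named "count"
      .orderBy(desc("count"))  // most frequent domains first

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("domain-frequency-df-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Stand-in data; in practice the DataFrame would come from RecordLoader.
    val records = Seq(
      "http://www.example.com/a",
      "http://example.com/b",
      "https://archive.org/"
    ).toDF("url")

    domainFrequency(records).show()
    spark.stop()
  }
}
```

The result has the schema (domain, count) described above, sorted by descending count.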
Extract domain frequencies from a web archive using RDDs.
RDD[ArchiveRecord] obtained from RecordLoader
RDD[(String, Int)], which is (domain, count)
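The RDD variant follows the usual map and reduceByKey counting pattern. The sketch below is an illustration under stated assumptions, not the toolkit's code: to avoid guessing at the ArchiveRecord API, it takes an RDD[String] of URLs instead, and the hostOf helper and object name are hypothetical.

```scala
import java.net.URI
import scala.util.Try
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object DomainFrequencyRddSketch {
  // Hypothetical helper: host of a URL without a leading "www.",
  // or None if the URL is malformed or has no host.
  def hostOf(url: String): Option[String] =
    Try(Option(new URI(url).getHost)).toOption.flatten.map(_.stripPrefix("www."))

  // Assumes an RDD of URL strings rather than RDD[ArchiveRecord].
  def domainFrequency(urls: RDD[String]): RDD[(String, Int)] =
    urls
      .flatMap(u => hostOf(u))           // keep only URLs with a parsable host
      .map(domain => (domain, 1))        // one count per occurrence
      .reduceByKey(_ + _)                // sum counts per domain
      .sortBy(_._2, ascending = false)   // most frequent domains first

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("domain-frequency-rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Stand-in data; in practice the records would come from RecordLoader.
    val urls = sc.parallelize(Seq(
      "http://www.example.com/a",
      "http://example.com/b",
      "https://archive.org/"
    ))

    domainFrequency(urls).collect().foreach(println)
    sc.stop()
  }
}
```

The output pairs match the (domain, count) shape described above.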