Extract domain graph from web archive using DataFrame and Spark SQL.
DataFrame obtained from RecordLoader
Dataset[Row], where the schema is (CrawlDate, SrcDomain, DestDomain, count)
Extract domain graph from web archive using RDD.
RDD[ArchiveRecord] obtained from RecordLoader
RDD[((String, String, String), Int)], where each element is ((CrawlDate, SourceDomain, DestinationDomain), Frequency)
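
The shape of the result described above can be illustrated with a minimal sketch that uses plain Scala collections in place of Spark. It is not the toolkit's implementation: the object `DomainGraphSketch`, the helpers `domain` and `domainGraph`, and the input shape (crawl date, source URL, destination URL) are all assumptions made for illustration, and the "www." stripping is a simplification.

```scala
import java.net.URI
import scala.util.Try

// Hypothetical stand-in for the Spark job: plain collections, no RecordLoader.
object DomainGraphSketch {
  // Assumed input shape: (crawlDate, sourceUrl, destinationUrl).
  type Link = (String, String, String)

  // Host of a URL with a leading "www." removed; empty string if the URL
  // cannot be parsed or has no host.
  def domain(url: String): String =
    Try(Option(new URI(url).getHost).getOrElse("")).getOrElse("")
      .stripPrefix("www.")

  // Maps each link to a (CrawlDate, SourceDomain, DestinationDomain) edge,
  // drops edges with a missing domain, and counts how often each edge occurs.
  def domainGraph(links: Seq[Link]): Map[(String, String, String), Int] =
    links
      .map { case (date, src, dest) => (date, domain(src), domain(dest)) }
      .filter { case (_, src, dest) => src.nonEmpty && dest.nonEmpty }
      .groupBy(identity)
      .map { case (edge, occurrences) => edge -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val links = Seq(
      ("20200101", "http://www.example.com/a", "http://example.org/x"),
      ("20200101", "http://example.com/b", "http://example.org/y"),
      ("20200102", "http://example.com/c", "not a url")
    )
    domainGraph(links).foreach(println)
  }
}
```

In this sample the first two links collapse into a single edge, ("20200101", "example.com", "example.org"), with a frequency of 2, and the third link is dropped because its destination has no parsable host. On an RDD the same counting step would typically be a `map` to `(edge, 1)` followed by `reduceByKey(_ + _)`.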