Extract the domain graph from a web archive using DataFrames and Spark SQL.
DataFrame obtained from RecordLoader
Dataset[Row], where the schema is (CrawlDate, SrcDomain, DestDomain, count)
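
A minimal sketch of how a result with the schema (CrawlDate, SrcDomain, DestDomain, count) could be produced with Spark SQL. It assumes the links have already been extracted into a three-column DataFrame. The object name `DomainGraphDfSketch`, the function `domainGraph`, the view name `links`, and the sample rows are all hypothetical; this is not the toolkit's own implementation.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DomainGraphDfSketch {
  // Hypothetical helper: expects columns CrawlDate, SrcDomain, DestDomain
  // and returns one row per distinct triple with its frequency.
  def domainGraph(spark: SparkSession, links: DataFrame): DataFrame = {
    links.createOrReplaceTempView("links") // view name is an assumption
    spark.sql(
      """SELECT CrawlDate, SrcDomain, DestDomain, COUNT(*) AS `count`
        |FROM links
        |GROUP BY CrawlDate, SrcDomain, DestDomain""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("domain-graph-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample links, for illustration only.
    val links = Seq(
      ("20200101", "a.org", "b.org"),
      ("20200101", "a.org", "b.org"),
      ("20200101", "a.org", "c.org")
    ).toDF("CrawlDate", "SrcDomain", "DestDomain")

    domainGraph(spark, links).show()
    spark.stop()
  }
}
```

Running this requires Spark SQL on the classpath. The `count` column is a 64-bit integer because it comes from `COUNT(*)`.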
Extract the domain graph from a web archive using MapReduce.
RDD[ArchiveRecord] obtained from RecordLoader
RDD[((String, String, String), Int)], which holds ((CrawlDate, SourceDomain, DestinationDomain), Frequency)
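
The map-and-reduce aggregation behind a ((CrawlDate, SourceDomain, DestinationDomain), Frequency) result can be sketched without Spark. In the example below, plain Scala collections stand in for the RDD so that it runs on its own. The names `DomainGraphMapReduceSketch`, `Link`, and `domainGraph`, and the sample data, are hypothetical. On an actual RDD of such triples the same idea would typically be written as `rdd.map(k => (k, 1)).reduceByKey(_ + _)`.

```scala
object DomainGraphMapReduceSketch {
  // Hypothetical stand-in for one extracted link:
  // (crawlDate, sourceDomain, destinationDomain).
  type Link = (String, String, String)

  // Map step: pair every link with 1.
  // Reduce step: group the pairs by link and sum the ones.
  def domainGraph(links: Seq[Link]): Map[Link, Int] =
    links
      .map(link => (link, 1))
      .groupBy(_._1)
      .map { case (key, pairs) => key -> pairs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    // Made-up sample links, for illustration only.
    val links = Seq(
      ("20200101", "a.org", "b.org"),
      ("20200101", "a.org", "b.org"),
      ("20200101", "a.org", "c.org")
    )
    domainGraph(links).foreach(println)
  }
}
```

The key is the whole triple, so the same source and destination pair crawled on different dates is counted separately.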