Constructs a Scallop option reader from a list of command-line argument strings.
Main application that parses command-line arguments and invokes the appropriate extractor.
Classifies records using NER and stores results as JSON.
Performs Named Entity Recognition (NER) on a WARC or ARC file.
Extracts a site link structure using Spark's GraphX utility.
Extracts image details given raw bytes.
Extracts the most popular images from a DataFrame.
Extracts the most popular images from an RDD.
UDF for exporting an RDD representing a collection of links to a GEXF file.
UDF for exporting an RDD representing a collection of links to a GEXF file.
UDF for exporting an RDD representing a collection of links to a GraphML file.
UDF for exporting a GraphX object representing a collection of links to a GraphML file.
Performs Named Entity Recognition (NER) on a WARC or ARC file.
Named Entity Recognition applies rules formed in a Named Entity Classifier to identify locations, people or other objects from data.