Constructs a Scallop option reader from a list of command-line argument strings.
Main application that parses command-line arguments and invokes the appropriate extractor.
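A minimal plain-Scala sketch of the parse-then-dispatch flow described by the two entries above. It does not use Scallop. The `--extractor`, `--input`, and `--output` flags, the `LinkGraph` extractor name, and the `CommandLineSketch` object are all hypothetical and exist only for illustration.

```scala
// Hypothetical stand-in for a Scallop-based option reader: parses
// "--key value" pairs into a Map. This is not the toolkit's actual CLI.
object CommandLineSketch {
  def parseArgs(args: List[String]): Map[String, String] = args match {
    case Nil => Map.empty
    case flag :: value :: rest if flag.startsWith("--") && !value.startsWith("--") =>
      parseArgs(rest) + (flag.drop(2) -> value)
    case other =>
      throw new IllegalArgumentException(s"Unexpected arguments: ${other.mkString(" ")}")
  }

  // Pick an extractor by its (hypothetical) name; each branch only returns
  // a description here instead of launching a Spark job.
  def dispatch(opts: Map[String, String]): String =
    opts.get("extractor") match {
      case Some("LinkGraph") =>
        val input  = opts.getOrElse("input", "?")
        val output = opts.getOrElse("output", "?")
        s"LinkGraph: $input -> $output"
      case Some(name) => s"unknown extractor: $name"
      case None       => "no extractor given"
    }
}
```

Separating parsing from dispatch keeps the option reader testable on a plain `List[String]` without starting Spark.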
Classifies records using NER and stores results as JSON.
Performs Named Entity Recognition (NER) on a WARC or ARC file.
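The classify-and-store step can be pictured as turning each record's entities into one JSON line. This sketch assumes a record has already been reduced to a timestamp, a URL, and a map from entity type (e.g. `PERSON`, `LOCATION`) to the strings found. The field names, the layout, and the `NerJsonSketch` object are hypothetical, not the toolkit's actual output format, and the hand-rolled escaping covers only quotes, backslashes, and control characters.

```scala
object NerJsonSketch {
  // Escape a string for inclusion in a JSON string literal.
  def escape(s: String): String = s.flatMap {
    case '"'  => "\\\""
    case '\\' => "\\\\"
    case '\n' => "\\n"
    case '\r' => "\\r"
    case '\t' => "\\t"
    case c if c < ' ' => "\\u%04x".format(c.toInt)
    case c    => c.toString
  }

  def quote(s: String): String = "\"" + escape(s) + "\""

  // One JSON object per record; entity types are sorted for stable output.
  def toJson(timestamp: String, url: String, entities: Map[String, Seq[String]]): String = {
    val ents = entities.toSeq.sortBy(_._1).map { case (kind, values) =>
      quote(kind) + ":[" + values.map(quote).mkString(",") + "]"
    }.mkString(",")
    s"""{"timestamp":${quote(timestamp)},"url":${quote(url)},"named_entities":{$ents}}"""
  }
}
```

Writing one object per line (JSON Lines) lets downstream tools stream the results record by record.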
Extracts a site link structure using Spark's GraphX utility.
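A site link structure is commonly summarized at the domain level: each (source URL, target URL) link is reduced to a (source host, target host) pair, and the pairs are counted. The sketch below does this over an in-memory `Seq` using `java.net.URI`; in the toolkit the same reduction would run over a Spark RDD. Stripping a leading `www.` is an assumed normalization, and `SiteLinkSketch` is a hypothetical name.

```scala
import java.net.URI
import scala.util.Try

object SiteLinkSketch {
  // Host of a URL, lowercased and without a leading "www."; None if the
  // URL cannot be parsed or has no host (e.g. "mailto:" links).
  def host(url: String): Option[String] =
    Try(new URI(url).getHost).toOption.flatMap(Option(_))
      .map(_.toLowerCase.stripPrefix("www."))

  // Count links between domains, dropping links whose URLs lack a host.
  def domainLinks(links: Seq[(String, String)]): Map[(String, String), Int] =
    links.flatMap { case (src, dst) =>
      for (s <- host(src); d <- host(dst)) yield (s, d)
    }.groupBy(identity).map { case (pair, occurrences) => pair -> occurrences.size }
}
```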
Extracts the most popular images from an RDD.
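Here "most popular" is read as "most frequently occurring". A minimal sketch, assuming each image has already been reduced to a (content hash, URL) pair so that identical bytes served from different URLs are counted together. The toolkit's own extractor works on an RDD and may apply further filters (such as minimum dimensions) that are not modelled here; `PopularImagesSketch` is a hypothetical name.

```scala
object PopularImagesSketch {
  // Returns up to `limit` (hash, first-seen URL, count) triples, most
  // frequent first. Ties are broken by hash so the result is deterministic.
  def popular(images: Seq[(String, String)], limit: Int): Seq[(String, String, Int)] =
    images.groupBy(_._1).toSeq
      .map { case (hash, group) => (hash, group.head._2, group.size) }
      .sortBy { case (hash, _, count) => (-count, hash) }
      .take(limit)
}
```

Grouping by content hash rather than URL is the design choice that makes copies of the same image on different sites count toward one total.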
UDF for exporting an RDD representing a collection of links to a GEXF file.
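For reference, a GEXF file is XML with a `<nodes>` list and an `<edges>` list inside a `<graph>` element. This plain-Scala sketch writes GEXF 1.2 for weighted, directed links. It assumes the input is a list of ((source, target), count) pairs, assigns node IDs in order of first appearance, and escapes only the five XML special characters. It is not the toolkit's writer, which takes an RDD; `GexfSketch` is a hypothetical name.

```scala
object GexfSketch {
  def xmlEscape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
      .replace("\"", "&quot;").replace("'", "&apos;")

  def toGexf(links: Seq[((String, String), Int)]): String = {
    // Assign each distinct node label an integer id, in order of first appearance.
    val labels = links.flatMap { case ((s, t), _) => Seq(s, t) }.distinct
    val id = labels.zipWithIndex.toMap
    val nodes = labels.map(l => s"""      <node id="${id(l)}" label="${xmlEscape(l)}"/>""")
    val edges = links.zipWithIndex.map { case (((s, t), w), i) =>
      s"""      <edge id="$i" source="${id(s)}" target="${id(t)}" weight="$w"/>"""
    }
    (Seq(
      """<?xml version="1.0" encoding="UTF-8"?>""",
      """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">""",
      """  <graph mode="static" defaultedgetype="directed">""",
      "    <nodes>") ++ nodes ++ Seq("    </nodes>", "    <edges>") ++ edges ++
      Seq("    </edges>", "  </graph>", "</gexf>")).mkString("\n")
  }
}
```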
UDF for exporting an RDD representing a collection of links to a GraphML file.
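GraphML differs from GEXF mainly in declaring attributes up front with `<key>` elements and attaching values through `<data>` children. A minimal sketch under the same assumptions as a weighted, directed link list of ((source, target), count) pairs; the `label`/`weight` key names and the `GraphMLSketch` object are hypothetical choices, not the toolkit's output schema.

```scala
object GraphMLSketch {
  def xmlEscape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
      .replace("\"", "&quot;").replace("'", "&apos;")

  def toGraphML(links: Seq[((String, String), Int)]): String = {
    // Node ids "n0", "n1", ... in order of first appearance.
    val labels = links.flatMap { case ((s, t), _) => Seq(s, t) }.distinct
    val id = labels.zipWithIndex.map { case (l, i) => l -> s"n$i" }.toMap
    val header = Seq(
      """<?xml version="1.0" encoding="UTF-8"?>""",
      """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">""",
      """  <key id="label" for="node" attr.name="label" attr.type="string"/>""",
      """  <key id="weight" for="edge" attr.name="weight" attr.type="int"/>""",
      """  <graph id="G" edgedefault="directed">""")
    val nodes = labels.map(l => s"""    <node id="${id(l)}"><data key="label">${xmlEscape(l)}</data></node>""")
    val edges = links.map { case ((s, t), w) =>
      s"""    <edge source="${id(s)}" target="${id(t)}"><data key="weight">$w</data></edge>"""
    }
    (header ++ nodes ++ edges ++ Seq("  </graph>", "</graphml>")).mkString("\n")
  }
}
```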
UDF for exporting a GraphX object representing a collection of links to a GraphML file.
Extracts a network graph using Spark's GraphX utility.
Extracts a network graph using Spark's GraphX utility.
(Since version 0.16.1) Use ExtractGraphX instead.
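GraphX identifies vertices by `Long` IDs (`VertexId`), so URLs must be mapped to numbers before an edge list can be built. A minimal plain-Scala sketch of that preparation step, assuming dense IDs assigned by sorted URL; the toolkit may derive IDs differently (for example by hashing), and the degree computation is only an illustration of the per-vertex metrics a graph pass can produce. `GraphPrepSketch` and its methods are hypothetical.

```scala
object GraphPrepSketch {
  // Assign dense Long ids (GraphX's VertexId is a Long) to every URL,
  // sorted so the assignment is deterministic.
  def vertexIds(links: Seq[(String, String)]): Map[String, Long] =
    links.flatMap { case (s, d) => Seq(s, d) }.distinct.sorted
      .zipWithIndex.map { case (url, i) => url -> i.toLong }.toMap

  // Edge list as (srcId, dstId) pairs, the shape GraphX's Edge(srcId, dstId, attr) takes.
  def edges(links: Seq[(String, String)], ids: Map[String, Long]): Seq[(Long, Long)] =
    links.map { case (s, d) => (ids(s), ids(d)) }

  // (in-degree, out-degree) per vertex id; vertices with degree 0 are absent.
  def degrees(es: Seq[(Long, Long)]): (Map[Long, Int], Map[Long, Int]) = {
    val out = es.groupBy(_._1).map { case (v, g) => v -> g.size }
    val in  = es.groupBy(_._2).map { case (v, g) => v -> g.size }
    (in, out)
  }
}
```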
Performs Named Entity Recognition (NER) on a WARC or ARC file.
Named Entity Recognition applies the rules encoded in a Named Entity Classifier to identify people, locations, or other entities in the data.
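To make the idea concrete, here is a deliberately simplified stand-in: a dictionary (gazetteer) lookup that tags known phrases by greedy longest match. This is not how the toolkit's classifier works; trained classifiers typically score words in context with a statistical model rather than consulting a fixed list. The gazetteer entries, the `recognize` function, and the `GazetteerNerSketch` object are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

object GazetteerNerSketch {
  // Hypothetical gazetteer: lowercase phrase -> entity type.
  val gazetteer: Map[String, String] = Map(
    "paris"        -> "LOCATION",
    "new york"     -> "LOCATION",
    "ada lovelace" -> "PERSON",
    "unesco"       -> "ORGANIZATION")

  private val maxWords = gazetteer.keys.map(_.split(" ").length).max

  // Greedy longest match over whitespace tokens (punctuation stripped);
  // returns (entity type, matched text) in order of appearance.
  def recognize(text: String): Seq[(String, String)] = {
    val tokens = text.split("\\s+").map(_.replaceAll("""[^\p{L}\p{N}]""", "")).filter(_.nonEmpty)
    val found = ArrayBuffer.empty[(String, String)]
    var i = 0
    while (i < tokens.length) {
      // Try the longest candidate phrase starting at token i first.
      val hit = (math.min(maxWords, tokens.length - i) to 1 by -1).iterator
        .map(n => tokens.slice(i, i + n).mkString(" "))
        .find(p => gazetteer.contains(p.toLowerCase))
      hit match {
        case Some(phrase) =>
          found.append((gazetteer(phrase.toLowerCase), phrase))
          i += phrase.split(" ").length
        case None => i += 1
      }
    }
    found.toSeq
  }
}
```

Longest-match-first is what lets "New York" win over a shorter entry that might match "New" alone.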