Constructs a Scallop option reader from a list of command-line argument strings.
Main application that parses command-line arguments and invokes the appropriate extractor.
Classifies records using NER and stores results as JSON.
Performs Named Entity Recognition (NER) on a WARC or ARC file.
Extracts image details given raw bytes.
Extracts the most popular images from a DataFrame.
Extracts the most popular images from an RDD.
Performs Named Entity Recognition (NER) on a WARC or ARC file.
Named Entity Recognition (NER) applies rules formed in a Named Entity Classifier to identify locations, people, or other objects in data.
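
The "most popular images" entries above do not say how popularity is measured. A minimal sketch of one plausible reading follows: identical images are grouped by an MD5 digest of their raw bytes, and the most frequent groups are returned. It uses plain Scala collections rather than an RDD or DataFrame, and `ImageRecord`, `PopularImagesSketch`, `md5Hex`, and `mostPopular` are hypothetical names chosen for illustration, not the library's API.

```scala
import java.security.MessageDigest

// Hypothetical record type (an assumption): a URL plus the raw image bytes.
final case class ImageRecord(url: String, bytes: Array[Byte])

object PopularImagesSketch {
  // Hex-encoded MD5 digest, used here only to group byte-identical images.
  def md5Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("MD5")
      .digest(bytes)
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Returns (first URL seen, occurrence count) for the `limit` most frequent images.
  // Ties are broken by URL so the ordering is deterministic.
  def mostPopular(records: Seq[ImageRecord], limit: Int): Seq[(String, Int)] =
    records
      .groupBy(r => md5Hex(r.bytes))
      .values
      .map(group => (group.head.url, group.size))
      .toSeq
      .sortBy { case (url, count) => (-count, url) }
      .take(limit)
}
```

With an RDD or DataFrame, the same grouping would typically become a distributed count by key followed by a sort. Any filtering by image dimensions is left out of the sketch.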