Classifies records using NER and stores results as JSON.
Performs Named Entity Recognition (NER) on a WARC or ARC file.
Extracts a network graph using Spark's GraphX utility.
Extracts the most popular images from an RDD.
UDF for exporting an RDD representing a collection of links to a GEXF file.
UDF for exporting an RDD representing a collection of links to a GraphML file.
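As a rough illustration of what exporting a collection of links to GraphML involves, here is a minimal sketch in plain Python using only the standard library. It is not the toolkit's UDF and does not use Spark. The function name `links_to_graphml`, the `(source, target, weight)` tuple shape, and the `weight` edge attribute are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def links_to_graphml(links):
    """Build a GraphML string from (source, target, weight) tuples.

    The tuple layout is an assumption for this sketch, not the toolkit's
    actual record format.
    """
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare an edge attribute that carries the link weight.
    ET.SubElement(root, "key", {
        "id": "weight",
        "for": "edge",
        "attr.name": "weight",
        "attr.type": "double",
    })
    graph = ET.SubElement(root, "graph", {"edgedefault": "directed"})

    # Emit each distinct endpoint once, in first-seen order.
    seen = set()
    for src, dst, _ in links:
        for name in (src, dst):
            if name not in seen:
                seen.add(name)
                ET.SubElement(graph, "node", {"id": name})

    # Emit one edge per link, with its weight as a <data> child.
    for src, dst, weight in links:
        edge = ET.SubElement(graph, "edge", {"source": src, "target": dst})
        data = ET.SubElement(edge, "data", {"key": "weight"})
        data.text = str(weight)

    return ET.tostring(root, encoding="unicode")
```

A GEXF export would follow the same pattern (collect distinct nodes, then write edges) with a different set of element names.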
Performs Named Entity Recognition (NER) on a WARC or ARC file.
Named Entity Recognition applies rules formed in a Named Entity Classifier to identify locations, people, or other entities in the data.
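To make the idea concrete, the sketch below tags entities in a string and serializes the result as JSON. It deliberately swaps the trained classifier for a simple dictionary lookup (a gazetteer), so it only illustrates the input and output shape, not how a real Named Entity Classifier derives its rules. The `GAZETTEER` contents, the function names, and the JSON layout are hypothetical and chosen for this example.

```python
import json
import re

# Hypothetical gazetteer standing in for a trained classifier's rules.
GAZETTEER = {
    "PERSON": ["Ada Lovelace", "Alan Turing"],
    "LOCATION": ["London", "Toronto"],
}


def extract_entities(text, gazetteer=GAZETTEER):
    """Return {entity_type: [matches in order of appearance]} for one text."""
    found = {}
    for label, names in gazetteer.items():
        # Try longer names first so they win over shorter overlapping ones.
        ordered = sorted(names, key=len, reverse=True)
        pattern = "|".join(re.escape(name) for name in ordered)
        matches = re.findall(r"\b(?:%s)\b" % pattern, text)
        if matches:
            found[label] = matches
    return found


def to_json(text):
    """Serialize the entities found in text as a JSON object string."""
    return json.dumps(extract_entities(text), sort_keys=True)
```

For example, `to_json("Alan Turing was born in London.")` yields an object with a `LOCATION` list and a `PERSON` list, while text containing no known names yields an empty object.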