package io.archivesunleashed.app

Type Members

  1. class CmdAppConf extends ScallopConf

    Constructs a Scallop option reader from a list of command-line argument strings.

  2. class CommandLineApp extends AnyRef

    Main application that parses command-line arguments and invokes the appropriate extractor.

  3. class NERCombinedJson extends Serializable

    Classifies records using NER and stores results as JSON.
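
The "parse arguments, then invoke the matching extractor" role described for `CommandLineApp` can be illustrated with a minimal sketch in plain Scala. It does not use Scallop or the toolkit's real extractor objects. The `--extractor`, `--input` and `--output` flags, the registry keys, and the `Extractor` type are hypothetical stand-ins chosen only for illustration.

```scala
// Hypothetical sketch of argument-driven dispatch; flags and names are assumptions,
// not the toolkit's actual command-line interface.
object DispatchSketch {
  // An "extractor" here is just a function from (input, output) paths to a status message.
  type Extractor = (String, String) => String

  // Hypothetical registry mapping extractor names to stub implementations.
  val extractors: Map[String, Extractor] = Map(
    "DomainFrequency" -> ((input: String, output: String) => s"domain frequency: $input -> $output"),
    "PlainText" -> ((input: String, output: String) => s"plain text: $input -> $output")
  )

  // Turn "--key value" pairs into a map; a stand-in for a real option parser such as Scallop.
  def parseArgs(args: Seq[String]): Map[String, String] =
    args.grouped(2).collect {
      case Seq(k, v) if k.startsWith("--") => k.stripPrefix("--") -> v
    }.toMap

  // Look up the requested extractor and run it, or report the first missing piece.
  def run(args: Seq[String]): Either[String, String] = {
    val opts = parseArgs(args)
    for {
      name   <- opts.get("extractor").toRight("missing --extractor")
      input  <- opts.get("input").toRight("missing --input")
      output <- opts.get("output").toRight("missing --output")
      ex     <- extractors.get(name).toRight(s"unknown extractor: $name")
    } yield ex(input, output)
  }
}
```

Keeping the name-to-function mapping in a single `Map` makes adding an extractor a one-line change. The real application may organize its dispatch differently.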

Value Members

  1. object CommandLineAppRunner

  2. object DomainFrequencyExtractor

  3. object DomainGraphExtractor

  4. object ExtractEntities

    Performs Named Entity Recognition (NER) on a WARC or ARC file.

    Named Entity Recognition applies the rules encoded in a Named Entity Classifier to identify locations, people, or other entities in the data.

  5. object ExtractImageDetailsDF

    Extracts image details given raw bytes.

  6. object ExtractPopularImagesDF

    Extracts the most popular images from a DataFrame.

  7. object ExtractPopularImagesRDD

    Extracts the most popular images from an RDD.

  8. object ImageGraphExtractor

  9. object PlainTextExtractor

  10. object WebPagesExtractor

  11. object WriteGEXF

    UDF for exporting an RDD or DataFrame representing a collection of links to a GEXF file.

  12. object WriteGraphML

    UDF for exporting an RDD or DataFrame representing a collection of links to a GraphML file.
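
The NER description under `ExtractEntities`, together with the JSON output attributed to `NERCombinedJson`, can be illustrated with a deliberately simplified, dictionary-based tagger. This is not the classifier the toolkit uses. The gazetteer contents, the entity labels and the JSON field names (`url`, `entities`) are assumptions made only for this sketch.

```scala
// Toy gazetteer-based NER: looks each token up in a fixed dictionary.
// A real Named Entity Classifier applies learned rules instead of a lookup table.
object NerSketch {
  // Hypothetical gazetteer mapping lowercase tokens to entity types.
  val gazetteer: Map[String, String] = Map(
    "toronto" -> "LOCATION",
    "ottawa"  -> "LOCATION",
    "alice"   -> "PERSON",
    "unesco"  -> "ORGANIZATION"
  )

  // Tag tokens and group the matched surface forms by entity type.
  def classify(text: String): Map[String, Seq[String]] =
    text.split("\\W+").toSeq
      .filter(_.nonEmpty)
      .flatMap(tok => gazetteer.get(tok.toLowerCase).map(label => label -> tok))
      .groupBy(_._1)
      .map { case (label, pairs) => label -> pairs.map(_._2) }

  // Minimal JSON string quoting for backslashes and double quotes (enough for this toy).
  private def q(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Serialize one record's results as a JSON object, with entity types sorted by name.
  def toJson(url: String, entities: Map[String, Seq[String]]): String = {
    val body = entities.toSeq.sortBy(_._1).map { case (label, toks) =>
      q(label) + ":[" + toks.map(q).mkString(",") + "]"
    }.mkString(",")
    s"""{"url":${q(url)},"entities":{$body}}"""
  }
}
```

For example, `NerSketch.classify("Alice visited Toronto and Ottawa.")` groups `Alice` under `PERSON` and `Toronto` and `Ottawa` under `LOCATION`.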
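
`ExtractImageDetailsDF` is described as extracting image details from raw bytes. The following minimal sketch shows one way to do that with JDK APIs only (`javax.imageio.ImageIO` and `java.security.MessageDigest`). It assumes that "details" means width, height and an MD5 digest. The `ImageDetails` fields are hypothetical, and the toolkit may report different or additional fields.

```scala
import java.io.ByteArrayInputStream
import java.security.MessageDigest
import javax.imageio.ImageIO

// Hypothetical record of image details; the real fields may differ.
final case class ImageDetails(width: Int, height: Int, md5: String)

object ImageDetailsSketch {
  // Hex-encoded MD5 of the raw bytes, handy for spotting identical images.
  def md5Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("MD5").digest(bytes).map("%02x".format(_)).mkString

  // Decode the bytes with ImageIO; ImageIO.read returns null when no
  // registered reader recognizes the format, which becomes None here.
  def extract(bytes: Array[Byte]): Option[ImageDetails] =
    Option(ImageIO.read(new ByteArrayInputStream(bytes))).map { img =>
      ImageDetails(img.getWidth, img.getHeight, md5Hex(bytes))
    }
}
```

Returning an `Option` lets callers silently skip records whose bytes are truncated or not a supported image format, which is common in web archives.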
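
`ExtractPopularImagesDF` and `ExtractPopularImagesRDD` both extract the "most popular" images. The following sketch assumes that popularity means how often byte-identical content (the same MD5 hash) recurs across the collection. The `(url, md5)` input shape and the tie-breaking by URL are also assumptions. It uses plain Scala collections so it runs without Spark, whereas the toolkit's versions operate on DataFrames and RDDs.

```scala
// Hypothetical input: one (url, md5) pair per image occurrence in the archive.
object PopularImagesSketch {
  // Count occurrences per content hash and keep the n most frequent,
  // reporting the first-seen URL as the representative for each hash.
  def topImages(images: Seq[(String, String)], n: Int): Seq[(String, Int)] =
    images
      .groupBy { case (_, md5) => md5 }
      .toSeq
      .map { case (_, occurrences) => (occurrences.head._1, occurrences.size) }
      .sortBy { case (url, count) => (-count, url) } // highest count first, then by URL
      .take(n)
}
```

Grouping by hash rather than by URL counts the same image served from different addresses as one image. A Spark version would express the same group, count and order steps on an RDD or DataFrame.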
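
`WriteGEXF` and `WriteGraphML` export a collection of links to a graph file. The following minimal sketch covers only the serialization step. It assumes that links arrive as hypothetical `(source, target, count)` triples already collected to the driver, and it returns strings instead of writing files. The element layout follows the public GEXF 1.2 and GraphML formats, but this is not the toolkit's implementation.

```scala
// Hypothetical input: (source, target, count) triples describing links.
object GraphExportSketch {
  type Link = (String, String, Int)

  // Escape characters that are special in XML text and attribute values.
  private def esc(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;")

  // Assign an integer id to every distinct node, in first-seen order.
  private def nodeIds(links: Seq[Link]): Seq[(String, Int)] =
    links.flatMap { case (s, t, _) => Seq(s, t) }.distinct.zipWithIndex

  // GEXF 1.2: labelled nodes and directed edges carrying a weight attribute.
  def toGexf(links: Seq[Link]): String = {
    val nodes = nodeIds(links)
    val id = nodes.toMap
    val nodeLines = nodes.map { case (label, i) =>
      s"""      <node id="$i" label="${esc(label)}"/>"""
    }
    val edgeLines = links.zipWithIndex.map { case ((s, t, w), i) =>
      s"""      <edge id="$i" source="${id(s)}" target="${id(t)}" weight="$w"/>"""
    }
    (Seq(
      """<?xml version="1.0" encoding="UTF-8"?>""",
      """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">""",
      """  <graph mode="static" defaultedgetype="directed">""",
      "    <nodes>"
    ) ++ nodeLines ++ Seq("    </nodes>", "    <edges>") ++ edgeLines ++
      Seq("    </edges>", "  </graph>", "</gexf>")).mkString("\n")
  }

  // GraphML: node labels and edge weights are declared as <key> attributes.
  def toGraphML(links: Seq[Link]): String = {
    val nodes = nodeIds(links)
    val id = nodes.toMap
    val nodeLines = nodes.map { case (label, i) =>
      s"""    <node id="n$i"><data key="label">${esc(label)}</data></node>"""
    }
    val edgeLines = links.map { case (s, t, w) =>
      s"""    <edge source="n${id(s)}" target="n${id(t)}"><data key="weight">$w</data></edge>"""
    }
    (Seq(
      """<?xml version="1.0" encoding="UTF-8"?>""",
      """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">""",
      """  <key id="label" for="node" attr.name="label" attr.type="string"/>""",
      """  <key id="weight" for="edge" attr.name="weight" attr.type="int"/>""",
      """  <graph id="G" edgedefault="directed">"""
    ) ++ nodeLines ++ edgeLines ++ Seq("  </graph>", "</graphml>")).mkString("\n")
  }
}
```

Both formats share the same node-numbering step. Only the XML vocabulary differs, so a single link collection can be written to either format.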
