io.archivesunleashed.app

PresentationProgramInformationExtractor

object PresentationProgramInformationExtractor

Linear Supertypes
AnyRef, Any

Value Members

  1. def apply(d: DataFrame): Dataset[Row]

    Extract information about presentation program files from a web archive using the DataFrame API and Spark SQL.

    d

    DataFrame obtained from RecordLoader

    returns

    Dataset[Row], where the schema is (crawl_date, url, mime_type_web_server, mime_type_tika, language, content)
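A minimal usage sketch of `apply`, intended for a `spark-shell` session started with the Archives Unleashed Toolkit jar on the classpath, so that `sc` is already defined. The input path is a placeholder, and the `presentationProgramFiles()` loader call is an assumption about the toolkit version in use; older releases may name this method differently. The columns actually returned can also vary by release, so the sketch prints the schema rather than relying on the list above.

```scala
import io.archivesunleashed._
import io.archivesunleashed.app.PresentationProgramInformationExtractor
import org.apache.spark.sql.{DataFrame, Dataset, Row}

// Assumption: "/path/to/warcs" is a placeholder for a directory of WARC/ARC files.
// Assumption: presentationProgramFiles() is the DataFrame loader for presentation
// files in the installed toolkit version.
val records: DataFrame = RecordLoader
  .loadArchives("/path/to/warcs", sc)
  .presentationProgramFiles()

// apply(d: DataFrame): Dataset[Row], as documented above.
val result: Dataset[Row] = PresentationProgramInformationExtractor(records)

// Inspect the schema and a few rows instead of assuming column names.
result.printSchema()
result.show(10, truncate = false)
```

Because the result is an ordinary `Dataset[Row]`, it can be filtered, joined, or written out with the standard Spark SQL API.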