
io.archivesunleashed

object RecordLoader

Loads records from either WARCs or ARCs.

Linear Supertypes
AnyRef, Any

Value Members

  1. def getFiles(dir: Path, fs: FileSystem): String

    Gets all non-empty archive files.

    dir

    the path to the directory containing archive files

    fs

    the filesystem that contains dir

    returns

    a String listing the paths of all non-empty archive files.
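
    A local-filesystem sketch of the same idea, for illustration only. It is not the toolkit's implementation. It uses java.io.File instead of the Path and FileSystem types in the signature above, treats every regular file in dir as an archive file, and joins the surviving paths with commas. All three choices are assumptions. The object name NonEmptyArchives and the method nonEmptyArchiveFiles are hypothetical.

    ```scala
    import java.io.File

    object NonEmptyArchives {
      // Hypothetical local analogue of getFiles (not the toolkit's code):
      // list the entries directly under `dir`, keep regular files whose size is
      // greater than zero, and join their paths into one comma-separated String.
      def nonEmptyArchiveFiles(dir: File): String = {
        // listFiles() returns null when `dir` is not a readable directory.
        val entries = Option(dir.listFiles()).getOrElse(Array.empty[File])
        entries
          .filter(f => f.isFile && f.length() > 0) // drop subdirectories and empty files
          .map(_.getPath)
          .sorted // deterministic order; an assumption, not documented behaviour
          .mkString(",")
      }
    }
    ```

    For a directory holding a 3-byte a.warc.gz and an empty b.arc.gz, this sketch returns only the path of a.warc.gz. Sorting is an extra choice made here so that the output is deterministic.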

  2. def loadArchives(path: String, sc: SparkContext): RDD[ArchiveRecord]

    Creates an ArchiveRecord RDD from one or more WARC or ARC files.

    path

    the path to the WARC or ARC file(s)

    sc

    the Apache Spark context

    returns

    an RDD of ArchiveRecords, ready for further transformations such as map.
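
    A minimal usage sketch. It assumes that the toolkit is on the classpath and that a SparkContext named sc already exists, as in spark-shell. The object CountArchiveRecords, its method countRecords, and the path in the comment are hypothetical placeholders. Only loadArchives and the standard RDD count() action are used.

    ```scala
    import io.archivesunleashed.RecordLoader
    import org.apache.spark.SparkContext

    object CountArchiveRecords {
      // Hypothetical helper: load the records from the WARC/ARC file(s) at `path`
      // and count them. count() is a Spark action, so it forces the records to be read.
      def countRecords(path: String, sc: SparkContext): Long =
        RecordLoader.loadArchives(path, sc).count()
    }

    // Example call (placeholder path):
    // val n = CountArchiveRecords.countRecords("/path/to/warcs/", sc)
    ```

    Spark RDDs are evaluated lazily, so the records are typically not read until an action such as count() runs.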