Creates an Archive Record RDD from a WARC or ARC file.
the path to the WARC or ARC file(s)
the Apache Spark context
an RDD of ArchiveRecords for mapping.
Creates an Archive Record RDD from tweets.
the path to the Tweets file
the Apache Spark context
an RDD of JValue (JSON objects) for mapping.
Loads records from WARCs, ARCs, or Twitter API data (JSON).
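
A minimal usage sketch of the two loaders described above. The object name `RecordLoader`, the method names `loadArchives` and `loadTweets`, the import path, and both file paths are assumptions that do not appear in this text and may differ between versions of the toolkit. Running it requires Spark and the toolkit on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Assumption: the loader object is called RecordLoader and lives in this
// package; adjust the import to match the toolkit version in use.
import io.archivesunleashed.RecordLoader

object LoadRecordsSketch {
  def main(args: Array[String]): Unit = {
    // Local Spark context for experimentation only.
    val conf = new SparkConf()
      .setAppName("load-records-sketch")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    try {
      // Hypothetical path: an RDD of ArchiveRecords from WARC or ARC files.
      val archives = RecordLoader.loadArchives("/path/to/archives/*.warc.gz", sc)
      println(s"archive records: ${archives.count()}")

      // Hypothetical path: an RDD of JValue (JSON objects) from a tweets file.
      val tweets = RecordLoader.loadTweets("/path/to/tweets.json", sc)
      println(s"tweets: ${tweets.count()}")
    } finally {
      sc.stop() // Release Spark resources even if loading fails.
    }
  }
}
```

Both calls take the same two arguments documented above, a path and the Spark context, and differ only in the element type of the RDD they return.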