Scallop option reader constructed with class CmdAppConf
Prepare for invoking Data Frame implementation of extractors.
Any
Choose either Data Frame implementation or RDD implementation of extractors depending on the option specified in command line arguments.
Any
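The dispatch described above can be sketched as follows. This is a minimal illustration, not the application's actual code: `Options`, `useDataFrame`, `HandlerSketch`, and the three handler names are hypothetical stand-ins, and the real option reader is the Scallop-based CmdAppConf, whose fields are not shown in this documentation.

```scala
// Hypothetical stand-in for the parsed command-line options. The real
// application reads its options through CmdAppConf (a Scallop reader).
final case class Options(useDataFrame: Boolean)

object HandlerSketch {
  // Placeholder handlers: the real ones prepare and invoke Spark extractors.
  def dfHandler(opts: Options): Any = "DataFrame path"
  def rddHandler(opts: Options): Any = "RDD path"

  // Pick one implementation based on the option given on the command line.
  def handler(opts: Options): Any =
    if (opts.useDataFrame) dfHandler(opts) else rddHandler(opts)
}
```

Keeping the choice in one small function means the two preparation routines never need to know about each other.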
Prepare for invoking RDD implementation of extractors.
Any
Generic routine for saving RDD obtained from Map Reduce operation of extractors.
Type parameter for the elements of the RDD. Not used.
RDD obtained by applying RDD extractors to original RDD
Unit
Generic routine for saving Dataset obtained from querying Data Frames to file. Files may be merged according to options specified in 'partition' setting.
generic dataset obtained from querying Data Frame
Unit
Set Spark context to be used.
either a brand new or existing Spark context
Verify the validity of command line arguments regarding input and output files.
For verification to pass, every input file must exist and no output file may already exist. Throws an exception if either condition is not met.
Unit
IllegalArgumentException
thrown if an input file does not exist or an output file already exists
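The input/output check can be sketched as below. This is an illustration under stated assumptions: `VerifySketch` and `verifyPaths` are hypothetical names, and the sketch checks the local filesystem with `java.nio.file`, whereas the actual application may resolve its paths differently (for example through a Hadoop filesystem).

```scala
import java.nio.file.{Files, Paths}

object VerifySketch {
  // Hypothetical helper: throws IllegalArgumentException when an input
  // path is missing or an output path already exists.
  def verifyPaths(inputs: Seq[String], outputs: Seq[String]): Unit = {
    inputs.foreach { p =>
      if (!Files.exists(Paths.get(p)))
        throw new IllegalArgumentException(s"Input file does not exist: $p")
    }
    outputs.foreach { p =>
      if (Files.exists(Paths.get(p)))
        throw new IllegalArgumentException(s"Output file already exists: $p")
    }
  }
}
```

Running this check before any Spark job starts lets the application fail early instead of discovering a bad path after the extraction work is done.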
Main application that parses command line arguments and invokes the appropriate extractor.