Package

io.archivesunleashed

matchbox

package matchbox

Package object that supplies implicits providing common UDF-related functionality.

Linear Supertypes
AnyRef, Any

Type Members

  1. class ImageDetails extends AnyRef

    Information about an image, e.g. width and height.

  2. implicit class WWWLink extends AnyRef

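The package description mentions implicits, and `WWWLink` is declared as an implicit class. This page does not document its members, so the following is only a minimal sketch of how an implicit class adds a method to `String`. The names `ImplicitSketch`, `WwwPrefixOps`, and `withoutWwwPrefix` are hypothetical and are not part of matchbox.

```scala
object ImplicitSketch {
  // Hypothetical enrichment for illustration only; not the WWWLink API.
  implicit class WwwPrefixOps(s: String) {
    /** Drops one leading "www." if present; otherwise returns the string unchanged. */
    def withoutWwwPrefix: String = s.stripPrefix("www.")
  }
}
```

With `import ImplicitSketch._` in scope, the method can be called directly on any `String`, for example `"www.example.com".withoutWwwPrefix`.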

Value Members

  1. object ComputeImageSize

    Image sizing utilities.

  2. object ComputeMD5RDD

    Compute MD5 checksum.

  3. object ComputeSHA1RDD

    Compute SHA1 checksum.

  4. object DetectLanguageRDD

    Detects language using Apache Tika.

  5. object DetectMimeTypeTika

    Detect MIME type using Apache Tika.

  6. object ExtractBoilerpipeTextRDD

    Extract raw text content from an HTML page, minus "boilerplate" content (using boilerpipe).

  7. object ExtractDateRDD

    Gets different parts of a dateString.

  8. object ExtractDomainRDD

    Extracts the host domain name from a full URL string.

  9. object ExtractImageDetails

    Extracts image details given raw bytes.

  10. object ExtractImageLinksRDD

    Extracts image links from a webpage given the HTML content (using Jsoup).

  11. object ExtractLinksRDD

    Extracts links from a webpage given the HTML content (using Jsoup).

  12. object ExtractTextFromPDFs

    Extracts text from PDFs using Apache Tika.

  13. object GetExtensionMimeRDD

    Get file extension using MIME type, then URL extension.

  14. object NERClassifier

    Reads in a text string, and returns entities identified by the configured Stanford NER classifier.

  15. object RemoveHTMLRDD

    Removes HTML markup with Jsoup.

  16. object RemoveHTTPHeaderRDD

    Remove HTTP headers.

  17. object TupleFormatter

    Tuple formatter utility.
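
Several of the value members above describe operations that can be illustrated with the JDK alone. The sketch below shows roughly what checksum computation, host extraction, and HTTP header removal involve. It does not reproduce the matchbox implementations: `MatchboxSketches`, `hexDigest`, `hostOf`, and `stripHttpHeader` are hypothetical names, and treating a string that starts with `HTTP/` as an HTTP response is an assumption of this sketch.

```scala
import java.net.URI
import java.security.MessageDigest
import scala.util.Try

object MatchboxSketches {
  /** Hex-encoded digest of `bytes`; `algorithm` is a JCA name such as "MD5" or "SHA-1". */
  def hexDigest(algorithm: String, bytes: Array[Byte]): String =
    MessageDigest.getInstance(algorithm).digest(bytes).map(b => "%02x".format(b)).mkString

  /** Host part of a URL, or "" when the string cannot be parsed or has no host. */
  def hostOf(url: String): String =
    Try(Option(new URI(url).getHost)).toOption.flatten.getOrElse("")

  /** Drops everything up to and including the first blank line,
    * but only when the content looks like an HTTP response (assumption). */
  def stripHttpHeader(content: String): String = {
    val headerEnd = "\r\n\r\n"
    val idx = content.indexOf(headerEnd)
    if (content.startsWith("HTTP/") && idx >= 0) content.substring(idx + headerEnd.length)
    else content
  }
}
```

For instance, `MatchboxSketches.hostOf("http://www.example.com/a/b.html")` yields `"www.example.com"`, and `MatchboxSketches.hexDigest("MD5", bytes)` returns a 32-character lowercase hex string. Consult the individual member pages for the actual signatures and behaviour of `ComputeMD5RDD`, `ComputeSHA1RDD`, `ExtractDomainRDD`, and `RemoveHTTPHeaderRDD`.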
