Package

io.archivesunleashed

matchbox

package matchbox

Package object that supplies implicits for common UDF-related functionality.

Linear Supertypes
AnyRef, Any

Type Members

  1. class ImageDetails extends AnyRef

    Information about an image, e.g. its width and height.

  2. implicit class WWWLink extends AnyRef

Value Members

  1. object ComputeImageSize

    Image sizing utilities.

  2. object ComputeMD5

    Compute MD5 checksum.

  3. object DetectLanguage

    Detects language using Apache Tika.

  4. object DetectMimeTypeTika

    Detect MIME type using Apache Tika.

  5. object ExtractBoilerpipeText

    Extract raw text content from an HTML page, minus "boilerplate" content (using boilerpipe).

  6. object ExtractDate

    Gets different parts of a dateString.

  7. object ExtractDomain

    Extracts the host domain name from a full URL string.

  8. object ExtractImageDetails

    Extracts image details given raw bytes.

  9. object ExtractImageLinks

    Extracts image links from a webpage given the HTML content (using Jsoup).

  10. object ExtractLinks

    Extracts links from a webpage given the HTML content (using Jsoup).

  11. object ExtractTextFromPDFs

    Extracts text from PDFs using Apache Tika.

  12. object ExtractUrls

    Extracts URLs found in a string of text.

    returns

    a list of URLs found in the string.

  13. object GetExtensionMime

    Get file extension using MIME type, then URL extension.

  14. object NERClassifier

    Reads in a text string, and returns entities identified by the configured Stanford NER classifier.

  15. object RemoveHTML

    Removes HTML markup with Jsoup.

  16. object RemoveHttpHeader

    Remove HTTP headers.

  17. object TupleFormatter

    Tuple formatter utility.
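
The descriptions above do not show signatures, so the following is only a rough sketch of what three of the simpler utilities (an MD5 checksum, host extraction from a URL, and URL extraction from text) might look like using the JDK and Scala standard library alone. The object `MatchboxSketch` and the methods `md5Hex`, `hostOf`, and `urlsIn` are hypothetical names chosen for illustration. They are not the matchbox API, and the real objects may differ in behavior and edge-case handling.

```scala
import java.net.URI
import java.security.MessageDigest
import scala.util.Try

// Hypothetical stand-ins for illustration only; not the matchbox implementations.
object MatchboxSketch {

  /** Hex-encoded MD5 checksum of a byte array. */
  def md5Hex(bytes: Array[Byte]): String =
    MessageDigest
      .getInstance("MD5")
      .digest(bytes)
      .map(b => f"${b & 0xff}%02x") // mask to an unsigned value, two hex digits per byte
      .mkString

  /** Host part of a URL, or an empty string when the input cannot be parsed
    * or has no host component (assumed fallback, chosen for this sketch).
    */
  def hostOf(url: String): String =
    Try(Option(new URI(url).getHost)).toOption.flatten.getOrElse("")

  // Deliberately simple pattern: http(s) scheme followed by characters that
  // are not whitespace, angle brackets, or quotes.
  private val UrlPattern = """https?://[^\s<>"']+""".r

  /** All http(s) URLs found in free text, in order of appearance. */
  def urlsIn(text: String): List[String] =
    UrlPattern.findAllIn(text).toList
}
```

Under these assumptions, `MatchboxSketch.hostOf("http://www.example.com/a/b.html")` yields `www.example.com`, and `MatchboxSketch.urlsIn("See http://example.com/a and https://archive.org today")` yields the two URLs in order. Returning an empty string for unparseable input keeps the function total, which is convenient when it is applied to every record of a large collection.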
