Image sizing utilities.
Compute MD5 checksum.
Detects language using Apache Tika.
Detect MIME type using Apache Tika.
Extract Twitter mentions (e.g. "@username") from a string.
Extract raw text content from an HTML page, minus "boilerplate" content (using boilerpipe).
Gets different parts of a dateString.
Extracts the host domain name from a full url string.
Extract hashtags from tweets; returns a list of #hashtags contained in the string.
Extracts image links from a webpage given the HTML content (using Jsoup).
Extracts links from a webpage given the HTML content (using Jsoup).
Extracts text from PDFs using Apache Tika.
Extracts URLs found in a string of text; returns a list of URLs found in the string.
Reads in a text string, and returns entities identified by the configured Stanford NER classifier.
Removes HTML markup with Jsoup.
Remove HTTP headers.
Tuple formatter utility.
Package object which supplies implicits providing common UDF-related functionalities.
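
As a rough illustration of the string-oriented utilities listed above (MD5 checksum, hashtag, mention, URL, and domain extraction), here is a minimal sketch that uses only the Java and Scala standard libraries. The object name `UdfSketch`, the method names, and the regular expressions are all hypothetical and are not taken from the package itself; the real utilities may use different patterns and signatures.

```scala
import java.net.URI
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import scala.util.Try

// Hypothetical stand-in for a few of the utilities described above.
object UdfSketch {
  // Hex-encoded MD5 digest of the UTF-8 bytes of a string.
  def md5(s: String): String =
    MessageDigest.getInstance("MD5")
      .digest(s.getBytes(StandardCharsets.UTF_8))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Assumed patterns for illustration only; deliberately simple.
  private val Hashtag = """#\w+""".r
  private val Mention = """@\w+""".r
  private val Url     = """https?://[^\s"'<>]+""".r

  // Each extractor returns every match in order of appearance.
  def hashtags(s: String): List[String] = Hashtag.findAllIn(s).toList
  def mentions(s: String): List[String] = Mention.findAllIn(s).toList
  def urls(s: String): List[String]     = Url.findAllIn(s).toList

  // Host part of a URL, or an empty string when the input cannot be parsed.
  def domain(url: String): String =
    Try(Option(new URI(url).getHost).getOrElse("")).getOrElse("")
}
```

Note that the simple `@\w+` pattern would also match the tail of an email address, so a production version would likely need a stricter rule. Returning an empty string from `domain` on malformed input is one possible choice; returning an `Option[String]` would be an equally reasonable design.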