io.archivesunleashed.spark.matchbox
UDF for extracting links from a webpage given the HTML content (using Jsoup).
Parameters, in order:
the source (src) link.
the HTML content from which links are to be extracted.
an optional base URI.
Returns a sequence of (source, target, anchor text) tuples.
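
As a rough illustration of the behaviour described above, the following is a minimal sketch of Jsoup-based link extraction. It is not the toolkit's actual implementation: the object name LinkSketch, the method name extractLinks, the parameter names html and base, and the choice to skip links that cannot be resolved to an absolute URL are all assumptions made for this example.

```scala
import org.jsoup.Jsoup
import scala.collection.mutable.ListBuffer

// Hypothetical helper for illustration only; names and details are assumptions.
object LinkSketch {
  def extractLinks(src: String, html: String, base: String = ""): Seq[(String, String, String)] = {
    if (html.isEmpty) {
      Seq.empty
    } else {
      val out = ListBuffer.empty[(String, String, String)]
      // The base URI lets Jsoup resolve relative hrefs to absolute URLs.
      val doc = Jsoup.parse(html, base)
      // Select every anchor element that carries an href attribute.
      val it = doc.select("a[href]").iterator()
      while (it.hasNext) {
        val link = it.next()
        // "abs:href" yields the absolute URL, or "" if it cannot be resolved.
        val target = link.attr("abs:href")
        if (target.nonEmpty) {
          out += ((src, target, link.text()))
        }
      }
      out.toList
    }
  }
}
```

Under these assumptions, calling LinkSketch.extractLinks with a source URL, a snippet of HTML, and a base URI yields one (source, target, anchor text) tuple per resolvable link. Relative links are dropped when no base URI is supplied.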