public final class WarcRecordUtils extends Object implements org.archive.format.warc.WARCConstants
Utilities for working with WARCRecords (from archive.org APIs).

Fields inherited from interface org.archive.format.warc.WARCConstants:
COLON_SPACE, COMPRESSED_WARC_FILE_EXTENSION, CONTENT_DESCRIPTION, CONTENT_LENGTH, CONTENT_TYPE, DEFAULT_ENCODING, DEFAULT_MAX_WARC_FILE_SIZE, DOT_COMPRESSED_WARC_FILE_EXTENSION, DOT_WARC_FILE_EXTENSION, FTP_CONTROL_CONVERSATION_MIMETYPE, HEADER_FIELD_SEPARATOR, HEADER_KEY_BLOCK_DIGEST, HEADER_KEY_CONCURRENT_TO, HEADER_KEY_DATE, HEADER_KEY_ETAG, HEADER_KEY_FILENAME, HEADER_KEY_ID, HEADER_KEY_IP, HEADER_KEY_LAST_MODIFIED, HEADER_KEY_PAYLOAD_DIGEST, HEADER_KEY_PROFILE, HEADER_KEY_REFERS_TO, HEADER_KEY_REFERS_TO_DATE, HEADER_KEY_REFERS_TO_FILE_OFFSET, HEADER_KEY_REFERS_TO_FILENAME, HEADER_KEY_REFERS_TO_TARGET_URI, HEADER_KEY_TRUNCATED, HEADER_KEY_TYPE, HEADER_KEY_URI, HEADER_LINE_ENCODING, HTTP_REQUEST_MIMETYPE, HTTP_RESPONSE_MIMETYPE, MAX_LINE_LENGTH, MAX_WARC_HEADER_LINE_LENGTH, NAMED_FIELD_CHECKSUM_LABEL, NAMED_FIELD_DESCRIPTION, NAMED_FIELD_FILEDESC, NAMED_FIELD_IP_LABEL, NAMED_FIELD_RELATED_LABEL, NAMED_FIELD_TRUNCATED, NAMED_FIELD_TRUNCATED_VALUE_HEAD, NAMED_FIELD_TRUNCATED_VALUE_LENGTH, NAMED_FIELD_TRUNCATED_VALUE_TIME, NAMED_FIELD_TRUNCATED_VALUE_UNSPECIFIED, NAMED_FIELD_WARCFILENAME, PLACEHOLDER_RECORD_LENGTH_STRING, PROFILE_REVISIT_IDENTICAL_DIGEST, PROFILE_REVISIT_NOT_MODIFIED, TRUNCATED_VALUE_UNSPECIFIED, TYPE, WARC_FIELDS_TYPE, WARC_FILE_EXTENSION, WARC_HEADER_ENCODING, WARC_ID, WARC_MAGIC, WARC_VERSION, WSP

Fields inherited from the superinterfaces of WARCConstants:
ABSOLUTE_OFFSET_KEY, CDX, CDX_FILE, CDX_LINE_BUFFER_SIZE, CRLF, DATE_FIELD_KEY, DEFAULT_DIGEST_METHOD, DOT_COMPRESSED_FILE_EXTENSION, DUMP, GZIP_DUMP, HEADER, INVALID_SUFFIX, LENGTH_FIELD_KEY, MIMETYPE_FIELD_KEY, NOHEAD, OCCUPIED_SUFFIX, ORIGIN_FIELD_KEY, READER_IDENTIFIER_FIELD_KEY, RECORD_IDENTIFIER_FIELD_KEY, SINGLE_SPACE, TYPE_FIELD_KEY, URL_FIELD_KEY, VERSION_FIELD_KEY
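
The class moves WARC records between WARCRecord objects and raw bytes. As a rough illustration of what those raw bytes look like, here is a minimal, JDK-only sketch that assembles a WARC/1.0 response record by hand: a version line, header fields, a blank line, the HTTP message as the record block, and a two-CRLF trailer. WarcResponseBytesSketch and build are hypothetical names that are not part of this library; the sketch spells header names as literal strings instead of using inherited constants such as HEADER_KEY_TYPE, and it is not how toBytes is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper, not part of aut or webarchive-commons: assembles the
 * raw bytes of a WARC/1.0 "response" record around an HTTP message.
 */
final class WarcResponseBytesSketch {
  private static final String CRLF = "\r\n";

  private WarcResponseBytesSketch() {}

  static byte[] build(String targetUri, String warcDate, String recordId, byte[] httpMessage) {
    // WARC header: version line, then "Name: value" fields, each ending in CRLF.
    String header = "WARC/1.0" + CRLF
        + "WARC-Type: response" + CRLF
        + "WARC-Target-URI: " + targetUri + CRLF
        + "WARC-Date: " + warcDate + CRLF
        + "WARC-Record-ID: " + recordId + CRLF
        + "Content-Type: application/http; msgtype=response" + CRLF
        // Content-Length counts the bytes of the record block only.
        + "Content-Length: " + httpMessage.length + CRLF
        // An empty line separates the header from the block.
        + CRLF;
    byte[] head = header.getBytes(StandardCharsets.UTF_8);
    byte[] trailer = (CRLF + CRLF).getBytes(StandardCharsets.US_ASCII);

    ByteArrayOutputStream out =
        new ByteArrayOutputStream(head.length + httpMessage.length + trailer.length);
    out.write(head, 0, head.length);
    out.write(httpMessage, 0, httpMessage.length); // the block: the captured HTTP response
    out.write(trailer, 0, trailer.length);          // each record ends with two CRLFs
    return out.toByteArray();
  }
}
```

Content-Length is computed from the HTTP message alone because, in a WARC record, it measures the block that follows the header, not the whole record.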
Modifier and Type | Method and Description |
---|---|
static org.archive.io.warc.WARCRecord | fromBytes(byte[] bytes): Converts raw bytes into a WARCRecord. |
static byte[] | getBodyContent(org.archive.io.warc.WARCRecord record): Extracts the contents of the body from a WARCRecord (excluding HTTP headers). |
static byte[] | getContent(org.archive.io.warc.WARCRecord record): Extracts the raw contents from a WARCRecord (including HTTP headers). |
static String | getWarcResponseMimeType(byte[] contents): Extracts the MIME type of a WARC response record. |
static byte[] | toBytes(org.archive.io.warc.WARCRecord record): Converts a WARC record into raw bytes. |
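
The summary distinguishes three views of a response record: getContent returns the HTTP message including its headers, getBodyContent drops those headers, and getWarcResponseMimeType reports the response's MIME type. The JDK-only sketch below shows one way to derive the latter two from an HTTP message such as the one getContent is described as returning, assuming a plain HTTP/1.x message whose headers end at the first CRLF CRLF and whose body is not chunked. HttpBlockSketch and its methods are hypothetical names; this is not the library's implementation, and the normalization choices (dropping parameters such as charset, lowercasing the type, returning an empty body when no header terminator is found) are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical sketch, not aut's code: splits an HTTP message into headers and body. */
final class HttpBlockSketch {

  private HttpBlockSketch() {}

  /** Index just past the first CRLF CRLF, or -1 if the header section never ends. */
  static int bodyOffset(byte[] content) {
    for (int i = 0; i + 3 < content.length; i++) {
      if (content[i] == '\r' && content[i + 1] == '\n'
          && content[i + 2] == '\r' && content[i + 3] == '\n') {
        return i + 4;
      }
    }
    return -1;
  }

  /** Body bytes with the status line and headers stripped; empty if no header terminator. */
  static byte[] body(byte[] content) {
    int off = bodyOffset(content);
    return off < 0 ? new byte[0] : Arrays.copyOfRange(content, off, content.length);
  }

  /** MIME type from the HTTP Content-Type header, without parameters; null if absent. */
  static String mimeType(byte[] content) {
    int end = bodyOffset(content);
    // Decode only the header section; ISO-8859-1 maps every byte to one char.
    String headers = new String(content, 0, end < 0 ? content.length : end,
        StandardCharsets.ISO_8859_1);
    for (String line : headers.split("\r\n")) {
      int colon = line.indexOf(':');
      if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Type")) {
        String value = line.substring(colon + 1).trim();
        int semi = value.indexOf(';'); // strip parameters such as "; charset=UTF-8"
        String type = semi < 0 ? value : value.substring(0, semi);
        return type.trim().toLowerCase(Locale.ROOT); // MIME types are case-insensitive
      }
    }
    return null;
  }
}
```

Only the HTTP header section is searched, so a Content-Type field in the enclosing WARC header (application/http; msgtype=response) cannot be mistaken for the response's own type, provided the input starts at the HTTP status line.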
public static org.archive.io.warc.WARCRecord fromBytes(byte[] bytes) throws IOException
Converts raw bytes into a WARCRecord.
Parameters:
bytes - raw bytes
Returns:
the parsed WARCRecord
Throws:
IOException - if an I/O error occurs while reading the bytes

public static byte[] toBytes(org.archive.io.warc.WARCRecord record) throws IOException
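Converts a WARC record into raw bytes.
Parameters:
record - the WARC response record to convert
Returns:
the raw bytes of the record
Throws:
IOException - if an I/O error occurs while reading the record

public static String getWarcResponseMimeType(byte[] contents)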
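Extracts the MIME type of a WARC response record.
Parameters:
contents - raw contents of the WARC response record
Returns:
the MIME type of the response

public static byte[] getContent(org.archive.io.warc.WARCRecord record) throws IOException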
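Extracts the raw contents from a WARCRecord (including HTTP headers).
Parameters:
record - the WARCRecord
Returns:
the raw contents of the record, including HTTP headers
Throws:
IOException - if an I/O error occurs while reading the record

public static byte[] getBodyContent(org.archive.io.warc.WARCRecord record) throws IOException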
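Extracts the contents of the body from a WARCRecord. Excludes HTTP headers.
Parameters:
record - the WARCRecord
Returns:
the body contents, without HTTP headers
Throws:
IOException - if an I/O error occurs while reading the record

Copyright © 2018 The Archives Unleashed Project. All rights reserved.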