HTML-hul Module

1 Introduction

The HTML-hul module recognizes and validates the HTML (Hypertext Markup Language) format [HTML].

The module is invoked with the following command-line option:

jhove ... -m HTML-hul ...
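When JHOVE is driven from a script, the option above can be assembled programmatically. The following is a minimal sketch, assuming that a jhove launcher is on the PATH and that the file to examine is passed as the trailing argument. The helper name build_jhove_command is hypothetical, and the elided options ("...") are left to the caller.

```python
from typing import List, Sequence


def build_jhove_command(path: str, extra_options: Sequence[str] = ()) -> List[str]:
    """Build an argument list that selects the HTML-hul module.

    Assumptions: `jhove` is on PATH and takes the target file as its
    last argument. `extra_options` stands in for the elided "..." options.
    """
    return ["jhove", *extra_options, "-m", "HTML-hul", path]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_jhove_command("index.html")))
```

The resulting list can be handed to subprocess.run without shell quoting concerns.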

The HTML-hul module also recognizes XHTML 1.0 (strict, transitional and frameset) and XHTML 1.1, relying on the XML-hul module to do so. If the XML-hul module is not available, only limited information is provided for XHTML documents.

This module can be configured with the following parameter:

  • withTextMD=true requests that a textMD block be included in the text technical properties of the output.

2 Coverage

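The HTML-hul module recognizes and validates the following public profiles:

  • HTML 3.2 [HTML 3.2]
  • HTML 4.0 [HTML 4.0]
  • HTML 4.01 [HTML 4.01]
  • XHTML 1.0 (strict, transitional and frameset) [XHTML 1.0]
  • XHTML 1.1 [XHTML 1.1]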

3 Well-Formedness

For the HTML profiles, JHOVE uses the criteria for HTML well-formedness defined by [HTML 3.2, HTML 4.0, HTML 4.01]; for the XHTML profiles, JHOVE uses the criteria defined by [XML]. Specifically, a well-formed HTML document must have no syntactic errors and must contain at least one of the tags HTML, HEAD, BODY or TITLE.
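The tag-presence half of this criterion can be illustrated with a short sketch. It uses Python's standard html.parser rather than JHOVE's own parser, so it is an approximation only: it does not detect syntactic errors, and the names TagCollector and has_structural_tag are hypothetical.

```python
from html.parser import HTMLParser

# Tags named by the well-formedness criterion above.
REQUIRED_TAGS = {"html", "head", "body", "title"}


class TagCollector(HTMLParser):
    """Collect the names of all start tags seen in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling this method.
        self.seen.add(tag)


def has_structural_tag(markup: str) -> bool:
    """Return True if at least one of HTML, HEAD, BODY or TITLE occurs.

    Only tag presence is checked; syntactic errors are not detected.
    """
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return bool(collector.seen & REQUIRED_TAGS)
```

A fragment such as `<p>hi</p>` fails this check, while any document containing a TITLE element passes it.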

4 Validity

For the HTML profiles, JHOVE uses the criteria for HTML validity defined by [HTML 3.2, HTML 4.0, HTML 4.01]; for the XHTML profiles, JHOVE uses the criteria defined by [XHTML 1.0, XHTML 1.1].

5 Representation Information

The MIME type is reported as: text/html [RFC 2854]

In addition to the standard JHOVE representation information, the following HTML-specific properties are reported:

  • Property “XMLMetadata” of type PROPERTY and arity LIST (for XHTML only; see the documentation of the XML-hul module for the contents of this property).
  • Property “HTMLMetadata” of type PROPERTY and arity LIST
    • Property “PrimaryLanguage” of type STRING
    • Property “OtherLanguages” of type STRING and arity SET
    • Property “Title” of type STRING
    • Property “MetaTags” of type PROPERTY and arity LIST
      • Property “Name” of type STRING
      • Property “Httpequiv” of type STRING
      • Property “Content” of type STRING
    • Property “Frames” of type PROPERTY and arity LIST
      • Property “Name” of type STRING
      • Property “Title” of type STRING
      • Property “Longdesc” of type STRING
      • Property “Src” of type STRING
    • Property “Links” of type STRING and arity LIST
    • Property “Scripts” of type STRING and arity LIST
    • Property “Images” of type PROPERTY and arity LIST
      • Property “Alt” of type STRING
      • Property “Longdesc” of type STRING
      • Property “Src” of type STRING
      • Property “Height” of type STRING
      • Property “Width” of type STRING
    • Property “Citations” of type STRING and arity LIST
    • Property “DefinedTerms” of type STRING and arity LIST
    • Property “Abbreviations” of type PROPERTY and arity LIST
      • Property “Text” of type STRING
      • Property “Title” of type STRING
    • Property “Entities” of type STRING and arity LIST
    • Property “UnicodeEntityBlocks” of type STRING and arity LIST
    • If withTextMD=true is set, Property “TextMDMetadata” of type TextMDMetadata and arity SCALAR
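
To make the shape of the “HTMLMetadata” property more concrete, the sketch below gathers three of its sub-properties (“Title”, “MetaTags” and “Images”) from a document. It is not JHOVE's implementation: it relies on Python's standard html.parser, the names HtmlMetadataSketch and extract_metadata are hypothetical, and the mapping from HTML attributes (name, http-equiv, content, alt, longdesc, src, height, width) to property names is an assumption based on the names listed above.

```python
from html.parser import HTMLParser


class HtmlMetadataSketch(HTMLParser):
    """Gather a few values shaped like the HTMLMetadata properties."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata = {"Title": None, "MetaTags": [], "Images": []}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute names arrive lower-cased
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Assumed mapping: one MetaTags entry per META element.
            self.metadata["MetaTags"].append({
                "Name": a.get("name"),
                "Httpequiv": a.get("http-equiv"),
                "Content": a.get("content"),
            })
        elif tag == "img":
            # Height and Width stay strings, matching type STRING above.
            self.metadata["Images"].append({
                "Alt": a.get("alt"),
                "Longdesc": a.get("longdesc"),
                "Src": a.get("src"),
                "Height": a.get("height"),
                "Width": a.get("width"),
            })

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.metadata["Title"] = "".join(self._title_parts).strip()


def extract_metadata(markup: str) -> dict:
    """Parse markup and return the collected property-like values."""
    parser = HtmlMetadataSketch()
    parser.feed(markup)
    parser.close()
    return parser.metadata
```

Attributes that are absent from an element are reported as None here; how JHOVE reports missing values is not covered by this sketch.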

6 Additional Module Properties

  • Nominal file extension: .html, .htm
  • Macintosh OS file type: TEXT