DOMTreeBuilder
in package
implements EventHandler

Create an HTML5 DOM tree from events.

This class attempts to build a DOM from the events emitted by a parser. It also attempts, without guarantee, to up-convert older HTML documents to HTML5 by applying HTML5's rules; it does not change the architecture of the document itself.

Many of the error-correction and quirks features suggested in the specification are implemented here, but not all of them. Because no graphical user agent is assumed, no presentation-specific logic is performed during tree building.

FIXME: The present tree builder does not exactly follow the state machine rules for insertion modes as outlined in the HTML5 spec. The processor needs to be rewritten to accommodate this. See, for example, the Go language HTML5 parser.
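
To illustrate the event-driven construction described above, here is a minimal sketch. It is not the library's implementation: the class name MiniTreeBuilder and its simplified method bodies are assumptions made for this example. Only the method names startTag(), text(), endTag() and document() mirror the event methods listed on this page. It relies solely on PHP's bundled DOM extension and performs no error correction.

```php
<?php
// Hypothetical, simplified builder: NOT the real DOMTreeBuilder.
// It shows how parser events can drive construction of a DOM tree.
class MiniTreeBuilder
{
    private DOMDocument $doc;
    private DOMNode $current;

    public function __construct()
    {
        $this->doc = new DOMDocument('1.0', 'UTF-8');
        // The document itself is the first insertion point.
        $this->current = $this->doc;
    }

    public function startTag(string $name, array $attributes = [], bool $selfClosing = false): void
    {
        $element = $this->doc->createElement($name);
        foreach ($attributes as $attrName => $value) {
            $element->setAttribute($attrName, $value);
        }
        $this->current->appendChild($element);
        // A self-closing element never becomes the insertion point.
        if (!$selfClosing) {
            $this->current = $element;
        }
    }

    public function text(string $data): void
    {
        $this->current->appendChild($this->doc->createTextNode($data));
    }

    public function endTag(string $name): void
    {
        // Pop only when the end tag matches the currently open element.
        if ($this->current instanceof DOMElement && $this->current->tagName === $name) {
            $this->current = $this->current->parentNode;
        }
    }

    public function document(): DOMDocument
    {
        return $this->doc;
    }
}

// Feed a short event stream, as a tokenizer would.
$builder = new MiniTreeBuilder();
$builder->startTag('html');
$builder->startTag('body');
$builder->startTag('p', ['class' => 'intro']);
$builder->text('Hello');
$builder->endTag('p');
$builder->startTag('br', [], true);
$builder->endTag('body');
$builder->endTag('html');

echo $builder->document()->saveXML($builder->document()->documentElement), "\n";
```

The real builder additionally handles namespaces, quirks, and auto-closing, which this sketch deliberately leaves out.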

Table of Contents

Interfaces

EventHandler
Standard events for HTML5.

Constants

IM_AFTER_AFTER_BODY  = 20
IM_AFTER_AFTER_FRAMESET  = 21
IM_AFTER_BODY  = 17
IM_AFTER_FRAMESET  = 19
IM_AFTER_HEAD  = 5
IM_BEFORE_HEAD  = 2
IM_BEFORE_HTML  = 1
IM_IN_BODY  = 6
IM_IN_CAPTION  = 10
IM_IN_CELL  = 14
IM_IN_COLUMN_GROUP  = 11
IM_IN_FRAMESET  = 18
IM_IN_HEAD  = 3
IM_IN_HEAD_NOSCRIPT  = 4
IM_IN_MATHML  = 23
IM_IN_ROW  = 13
IM_IN_SELECT  = 15
IM_IN_SELECT_IN_TABLE  = 16
IM_IN_SVG  = 22
IM_IN_TABLE  = 8
IM_IN_TABLE_BODY  = 12
IM_IN_TABLE_TEXT  = 9
IM_INITIAL  = 0
Defined in section 8.2.5 of the HTML5 specification.
IM_TEXT  = 7
NAMESPACE_HTML  = 'http://www.w3.org/1999/xhtml'
Defined in http://www.w3.org/TR/html51/infrastructure.html#html-namespace-0.
NAMESPACE_MATHML  = 'http://www.w3.org/1998/Math/MathML'
NAMESPACE_SVG  = 'http://www.w3.org/2000/svg'
NAMESPACE_XLINK  = 'http://www.w3.org/1999/xlink'
NAMESPACE_XML  = 'http://www.w3.org/XML/1998/namespace'
NAMESPACE_XMLNS  = 'http://www.w3.org/2000/xmlns/'
OPT_DISABLE_HTML_NS  = 'disable_html_ns'
OPT_IMPLICIT_NS  = 'implicit_namespaces'
OPT_TARGET_DOC  = 'target_document'

Properties

$current  : mixed
$doc  : mixed
$errors  : mixed
$frag  : mixed
$implicitNamespaces  : array<string|int, mixed>
Holds the namespaces that are always available (those that do not require an XMLNS declaration).
$insertMode  : mixed
$nsRoots  : array<string|int, mixed>
Holds the HTML5 element names that cause a namespace switch.
$nsStack  : array<string|int, mixed>
Holds a stack of currently active namespaces.
$onlyInline  : string|null
Tracks whether we are in an element that allows only inline child nodes.
$options  : mixed
$processor  : mixed
$pushes  : array<string|int, mixed>
Holds the number of namespaces declared by a node.
$quirks  : mixed
Quirks mode is enabled by default.
$rules  : mixed
$stack  : mixed

Methods

__construct()  : mixed
cdata()  : mixed
A CDATA section.
comment()  : mixed
A comment section (unparsed character data).
doctype()  : mixed
A doctype declaration.
document()  : mixed
Get the document.
endTag()  : mixed
An end-tag.
eof()  : mixed
Indicates that the document has been entirely processed.
fragment()  : DOMDocumentFragment
Get the DOM fragment for the body.
getErrors()  : mixed
parseError()  : mixed
Emitted when the parser encounters an error condition.
processingInstruction()  : mixed
This is a holdover from the XML spec.
setInstructionProcessor()  : mixed
Provide an instruction processor.
startTag()  : int
Process the start tag.
text()  : mixed
A unit of parsed character data.
autoclose()  : bool
Automatically climb the tree and close the closest node with the matching $tag.
isAncestor()  : bool
Checks if the given tagname is an ancestor of the present candidate.
isParent()  : bool
Returns true if the immediate parent element is of the given tagname.
normalizeTagName()  : string
Apply normalization rules to a tag name.
quirksTreeResolver()  : mixed

Constants

IM_AFTER_AFTER_FRAMESET

public mixed IM_AFTER_AFTER_FRAMESET = 21

NAMESPACE_HTML

Defined in http://www.w3.org/TR/html51/infrastructure.html#html-namespace-0.

public mixed NAMESPACE_HTML = 'http://www.w3.org/1999/xhtml'

NAMESPACE_MATHML

public mixed NAMESPACE_MATHML = 'http://www.w3.org/1998/Math/MathML'

NAMESPACE_SVG

public mixed NAMESPACE_SVG = 'http://www.w3.org/2000/svg'

NAMESPACE_XLINK

public mixed NAMESPACE_XLINK = 'http://www.w3.org/1999/xlink'

NAMESPACE_XML

public mixed NAMESPACE_XML = 'http://www.w3.org/XML/1998/namespace'

NAMESPACE_XMLNS

public mixed NAMESPACE_XMLNS = 'http://www.w3.org/2000/xmlns/'

OPT_DISABLE_HTML_NS

public mixed OPT_DISABLE_HTML_NS = 'disable_html_ns'

OPT_IMPLICIT_NS

public mixed OPT_IMPLICIT_NS = 'implicit_namespaces'

Properties

$implicitNamespaces

Holds the namespaces that are always available (those that do not require an XMLNS declaration).

protected array<string|int, mixed> $implicitNamespaces = array('xml' => self::NAMESPACE_XML, 'xmlns' => self::NAMESPACE_XMLNS, 'xlink' => self::NAMESPACE_XLINK)
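
As a hedged illustration of how such a prefix map can be used, the sketch below resolves the prefix of a qualified attribute name against the three default entries shown above. The constant IMPLICIT_NAMESPACES and the function resolveAttributeNamespace() are hypothetical names invented for this example; they are not part of the class.

```php
<?php
// Hypothetical constant mirroring the documented default of $implicitNamespaces.
const IMPLICIT_NAMESPACES = [
    'xml'   => 'http://www.w3.org/XML/1998/namespace',
    'xmlns' => 'http://www.w3.org/2000/xmlns/',
    'xlink' => 'http://www.w3.org/1999/xlink',
];

// Hypothetical helper: returns the namespace URI for a prefixed name,
// or null when the name has no prefix or the prefix is unknown.
function resolveAttributeNamespace(string $qualifiedName, array $namespaces = IMPLICIT_NAMESPACES): ?string
{
    $pos = strpos($qualifiedName, ':');
    if ($pos === false) {
        return null;
    }
    $prefix = substr($qualifiedName, 0, $pos);

    return $namespaces[$prefix] ?? null;
}

// Usage: attach xlink:href to an SVG element without any xmlns:xlink declaration.
$doc = new DOMDocument();
$use = $doc->createElementNS('http://www.w3.org/2000/svg', 'use');
$doc->appendChild($use);

$ns = resolveAttributeNamespace('xlink:href');
if ($ns !== null) {
    $use->setAttributeNS($ns, 'xlink:href', '#icon');
}
```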

$nsRoots

Holds the HTML5 element names that cause a namespace switch.

protected array<string|int, mixed> $nsRoots = array('html' => self::NAMESPACE_HTML, 'svg' => self::NAMESPACE_SVG, 'math' => self::NAMESPACE_MATHML)

$nsStack

Holds a stack of currently active namespaces.

protected array<string|int, mixed> $nsStack = array()

$onlyInline

Tracks whether we are in an element that allows only inline child nodes.

protected string|null $onlyInline

$pushes

Holds the number of namespaces declared by a node.

protected array<string|int, mixed> $pushes = array()
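
The interplay of $nsRoots, $nsStack and $pushes can be pictured with the following sketch. It is a simplified model under stated assumptions, not the class's actual bookkeeping: NamespaceTracker, its methods, and the NS_* constants are hypothetical, and the stack here holds plain namespace URIs with the most recent one last.

```php
<?php
const NS_HTML = 'http://www.w3.org/1999/xhtml';
const NS_SVG = 'http://www.w3.org/2000/svg';
const NS_MATHML = 'http://www.w3.org/1998/Math/MathML';

// Hypothetical model of namespace tracking while elements open and close.
final class NamespaceTracker
{
    /** Element names that switch the namespace (mirrors the documented $nsRoots default). */
    private array $nsRoots = ['html' => NS_HTML, 'svg' => NS_SVG, 'math' => NS_MATHML];

    /** Stack of active namespaces; the last entry is the current one. */
    private array $nsStack = [NS_HTML];

    /** One entry per open element: how many namespaces that element pushed. */
    private array $pushes = [];

    public function open(string $tagName): string
    {
        $pushed = 0;
        if (isset($this->nsRoots[$tagName])) {
            $this->nsStack[] = $this->nsRoots[$tagName];
            $pushed = 1;
        }
        $this->pushes[] = $pushed;

        return $this->current();
    }

    public function close(): string
    {
        // Undo exactly as many pushes as the closing element made.
        $pushed = array_pop($this->pushes) ?? 0;
        for ($i = 0; $i < $pushed; $i++) {
            array_pop($this->nsStack);
        }

        return $this->current();
    }

    public function current(): string
    {
        return $this->nsStack[count($this->nsStack) - 1];
    }
}

$tracker = new NamespaceTracker();
$tracker->open('body');          // stays in the HTML namespace
$tracker->open('svg');           // switches to the SVG namespace
$tracker->open('circle');        // inherits SVG
$tracker->close();               // closes circle, still SVG
echo $tracker->close(), "\n";    // closes svg, back to HTML
```

Recording the push count per element is what lets the closing step restore the previous namespace without inspecting the element again.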

$quirks

Quirks mode is enabled by default.

protected mixed $quirks = true

Any document that is missing the doctype will be considered to be in quirks mode.
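
The consequence of this default can be sketched as follows: if the flag starts as true and only a doctype event changes it, a document that never emits a doctype stays in quirks mode. QuirksTracker and inQuirksMode() are hypothetical names, and the doctype() signature is shortened for the example.

```php
<?php
// Hypothetical sketch: quirks mode stays on unless a doctype event turns it off.
final class QuirksTracker
{
    private bool $quirks = true;

    // Simplified stand-in for the doctype event; only the quirks flag is kept.
    public function doctype(string $name, bool $quirks = false): void
    {
        $this->quirks = $quirks;
    }

    public function inQuirksMode(): bool
    {
        return $this->quirks;
    }
}

$withoutDoctype = new QuirksTracker();   // no doctype event is ever received

$withDoctype = new QuirksTracker();
$withDoctype->doctype('html');           // a doctype arrives with $quirks = false

var_dump($withoutDoctype->inQuirksMode(), $withDoctype->inQuirksMode());
```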

Methods

__construct()

public __construct([mixed $isFragment = false ][, array<string|int, mixed> $options = array() ]) : mixed
Parameters
$isFragment : mixed = false
$options : array<string|int, mixed> = array()

cdata()

A CDATA section.

public cdata(mixed $data) : mixed
Parameters
$data : mixed

The unparsed character data.

comment()

A comment section (unparsed character data).

public comment(mixed $cdata) : mixed
Parameters
$cdata : mixed

doctype()

A doctype declaration.

public doctype(mixed $name[, mixed $idType = 0 ][, mixed $id = null ][, mixed $quirks = false ]) : mixed
Parameters
$name : mixed

The name of the root element.

$idType : mixed = 0

One of DOCTYPE_NONE, DOCTYPE_PUBLIC, or DOCTYPE_SYSTEM.

$id : mixed = null

The identifier. For DOCTYPE_PUBLIC, this is the public ID. If DOCTYPE_SYSTEM, then this is a system ID.

$quirks : mixed = false

Indicates whether the builder should enter quirks mode.
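
One way to picture how $idType and $id relate is the sketch below, which maps them onto the public and system identifiers of a DOM document type node. The numeric values of the three DOCTYPE_* constants and the function buildDocumentWithDoctype() are assumptions for illustration; this page names the constants but does not list their values. DOMImplementation::createDocumentType() and createDocument() are part of PHP's DOM extension.

```php
<?php
// Assumed values: this page names the constants but not their values.
const DOCTYPE_NONE = 0;
const DOCTYPE_PUBLIC = 1;
const DOCTYPE_SYSTEM = 2;

// Hypothetical helper: create an empty document carrying the given doctype.
function buildDocumentWithDoctype(string $name, int $idType = DOCTYPE_NONE, ?string $id = null): DOMDocument
{
    // The single $id is interpreted according to $idType.
    $publicId = $idType === DOCTYPE_PUBLIC ? (string) $id : '';
    $systemId = $idType === DOCTYPE_SYSTEM ? (string) $id : '';

    $impl = new DOMImplementation();
    $doctype = $impl->createDocumentType($name, $publicId, $systemId);

    // An empty qualified name yields a document without a root element.
    return $impl->createDocument(null, '', $doctype);
}

$html5 = buildDocumentWithDoctype('html');
$html4 = buildDocumentWithDoctype('html', DOCTYPE_PUBLIC, '-//W3C//DTD HTML 4.01//EN');

echo $html5->saveXML($html5->doctype), "\n";
```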

endTag()

An end-tag.

public endTag(mixed $name) : mixed
Parameters
$name : mixed

eof()

Indicates that the document has been entirely processed.

public eof() : mixed

parseError()

Emitted when the parser encounters an error condition.

public parseError(mixed $msg[, mixed $line = 0 ][, mixed $col = 0 ]) : mixed
Parameters
$msg : mixed
$line : mixed = 0
$col : mixed = 0

processingInstruction()

This is a holdover from the XML spec.

public processingInstruction(mixed $name[, mixed $data = null ]) : mixed

User agents do not receive processing instructions (PIs), but server-side code does.

Parameters
$name : mixed

The name of the processor (e.g. 'php').

$data : mixed = null

The unparsed data.

setInstructionProcessor()

Provide an instruction processor.

public setInstructionProcessor(InstructionProcessor $proc) : mixed

This is used for handling processing instructions as they are inserted. If omitted, PIs are inserted directly into the DOM tree.

Parameters
$proc : InstructionProcessor
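
The two behaviors described above (delegate to a processor, or insert the PI directly) can be sketched like this. The interface SketchInstructionProcessor, its process() signature, the class UppercaseEcho and the function handleInstruction() are all hypothetical; this page does not document the InstructionProcessor interface, so do not read the sketch as its actual contract.

```php
<?php
// Hypothetical stand-in for an instruction processor interface.
interface SketchInstructionProcessor
{
    public function process(DOMElement $current, string $name, string $data): DOMElement;
}

// Example processor: replaces the instruction with upper-cased text.
final class UppercaseEcho implements SketchInstructionProcessor
{
    public function process(DOMElement $current, string $name, string $data): DOMElement
    {
        $current->appendChild($current->ownerDocument->createTextNode(strtoupper($data)));

        return $current;
    }
}

// Hypothetical dispatch: use the processor if one is set, otherwise keep the PI in the tree.
function handleInstruction(
    DOMElement $current,
    string $name,
    string $data,
    ?SketchInstructionProcessor $processor = null
): DOMElement {
    if ($processor !== null) {
        return $processor->process($current, $name, $data);
    }
    $current->appendChild($current->ownerDocument->createProcessingInstruction($name, $data));

    return $current;
}

$doc = new DOMDocument();
$root = $doc->createElement('div');
$doc->appendChild($root);

handleInstruction($root, 'php', 'echo 1;');                      // no processor: PI node is inserted
handleInstruction($root, 'shout', 'hello', new UppercaseEcho()); // processor decides what to insert

echo $doc->saveXML($root), "\n";
```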

startTag()

Process the start tag.

public startTag(string $name[, array<string|int, mixed> $attributes = array() ][, bool $selfClosing = false ]) : int
Parameters
$name : string
$attributes : array<string|int, mixed> = array()
$selfClosing : bool = false
Tags
todo
  • XMLNS namespace handling (we need to parse, even if it's not valid)
  • XLink, MathML and SVG namespace handling
  • Omission rules: 8.1.2.4 Optional tags
Return values
int

text()

A unit of parsed character data.

public text(mixed $data) : mixed

Entities in this text are already decoded.

Parameters
$data : mixed

autoclose()

Automatically climb the tree and close the closest node with the matching $tag.

protected autoclose(string $tagName) : bool
Parameters
$tagName : string
Return values
bool
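
The climbing behavior in the summary above can be sketched as a free function over DOM nodes. This is an illustration under assumptions, not the method's source: the reference parameter $current stands in for the builder's current node, and on a match the insertion point moves to the parent of the closed element.

```php
<?php
// Hypothetical sketch: climb from $current to the closest element named $tagName.
// On success, $current becomes that element's parent and true is returned.
function autoclose(DOMNode &$current, string $tagName): bool
{
    $working = $current;
    while ($working instanceof DOMElement) {
        if ($working->tagName === $tagName) {
            $current = $working->parentNode;

            return true;
        }
        $working = $working->parentNode;
    }

    // Reached the document (or a detached node) without a match.
    return false;
}

// Build <ul><li><em/></li></ul> and pretend the builder is inside <em>.
$doc = new DOMDocument();
$ul = $doc->createElement('ul');
$li = $doc->createElement('li');
$em = $doc->createElement('em');
$doc->appendChild($ul);
$ul->appendChild($li);
$li->appendChild($em);

$current = $em;
$closed = autoclose($current, 'li');    // climbs em -> li, then moves to li's parent
$missed = autoclose($current, 'table'); // no open element with that name
```

When no match is found the current node is left untouched, so a stray end tag does not disturb the tree.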

isAncestor()

Checks if the given tagname is an ancestor of the present candidate.

protected isAncestor(string $tagName) : bool

If $this->current or anything above $this->current matches the given tag name, this returns true.

Parameters
$tagName : string
Return values
bool

isParent()

Returns true if the immediate parent element is of the given tagname.

protected isParent(string $tagName) : bool
Parameters
$tagName : string
Return values
bool
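
The difference between the two checks above can be sketched with plain DOM nodes. Both functions are hypothetical free-standing versions. Following the description of isAncestor(), the starting node itself counts as a match. Reading isParent() as a test on the parent of the node passed in is an assumption of this example.

```php
<?php
// Hypothetical sketch: true if $current or any element above it is named $tagName.
function isAncestor(DOMNode $current, string $tagName): bool
{
    for ($node = $current; $node instanceof DOMElement; $node = $node->parentNode) {
        if ($node->tagName === $tagName) {
            return true;
        }
    }

    return false;
}

// Hypothetical sketch: true only if the immediate parent element is named $tagName.
function isParent(DOMNode $current, string $tagName): bool
{
    $parent = $current->parentNode;

    return $parent instanceof DOMElement && $parent->tagName === $tagName;
}

// Build <table><tr><td/></tr></table>.
$doc = new DOMDocument();
$table = $doc->createElement('table');
$tr = $doc->createElement('tr');
$td = $doc->createElement('td');
$doc->appendChild($table);
$table->appendChild($tr);
$tr->appendChild($td);

var_dump(isAncestor($td, 'table')); // table is two levels up
var_dump(isParent($td, 'table'));   // the immediate parent is tr, not table
```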

normalizeTagName()

Apply normalization rules to a tag name.

protected normalizeTagName(string $tagName) : string

See sections 2.9 and 8.1.2 of the HTML5 specification.

Parameters
$tagName : string
Return values
string

The normalized tag name.

quirksTreeResolver()

protected quirksTreeResolver(mixed $name) : mixed
Parameters
$name : mixed
