DOMTreeBuilder
implements EventHandler
Create an HTML5 DOM tree from events.
Builds a DOM tree from the events emitted by a parser. The builder attempts (but does not guarantee) to up-convert older HTML documents to HTML5 by applying HTML5's rules; it does not change the architecture of the document itself.
Many of the error-correction and quirks features suggested in the specification are implemented here, but not all of them. Because no graphical user agent is assumed, no presentation-specific logic is performed during tree building.
FIXME: The present tree builder does not exactly follow the state-machine rules for insertion modes outlined in the HTML5 spec. The processor needs to be rewritten to accommodate this. See, for example, the Go language HTML5 parser.
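The event-driven design can be illustrated with a toy model: the parser calls one handler method per event, and the handler keeps a stack of open elements plus a "current" node that new children are appended to. The sketch below is written in Python with the standard-library xml.dom.minidom module rather than PHP. The class MiniTreeBuilder and its snake_case methods are hypothetical stand-ins for the EventHandler callbacks, and the model deliberately ignores insertion modes, namespaces, implied tags and error recovery.

```python
from xml.dom.minidom import Document


class MiniTreeBuilder:
    """Toy event handler that builds a DOM from start-tag, end-tag and text events.

    Hypothetical and heavily simplified: no insertion modes, namespaces,
    implied tags or error recovery.
    """

    def __init__(self):
        self.doc = Document()
        self.current = self.doc  # node that new children are appended to
        self.stack = []          # open elements, innermost last

    def start_tag(self, name, attributes=None, self_closing=False):
        element = self.doc.createElement(name)
        for key, value in (attributes or {}).items():
            element.setAttribute(key, value)
        self.current.appendChild(element)
        if not self_closing:
            self.stack.append(element)
            self.current = element

    def end_tag(self, name):
        # Close the nearest open element with this name, plus anything inside it.
        # An end tag with no matching open element is ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].tagName == name:
                del self.stack[i:]
                self.current = self.stack[-1] if self.stack else self.doc
                return

    def text(self, data):
        # minidom refuses text directly under the Document, so call this inside an element.
        self.current.appendChild(self.doc.createTextNode(data))

    def document(self):
        return self.doc


if __name__ == "__main__":
    builder = MiniTreeBuilder()
    builder.start_tag("html")
    builder.start_tag("body")
    builder.start_tag("p", {"class": "x"})
    builder.text("Hi")
    builder.end_tag("body")  # also closes the still-open <p>
    builder.end_tag("html")
    print(builder.document().documentElement.toxml())
    # <html><body><p class="x">Hi</p></body></html>
```

Keeping both a stack and a current pointer lets an end tag close several elements at once, which is the simplest form of the error correction the description mentions.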
Table of Contents
Interfaces
- EventHandler
  - Standard events for HTML5.
Constants
- IM_AFTER_AFTER_BODY = 20
- IM_AFTER_AFTER_FRAMESET = 21
- IM_AFTER_BODY = 17
- IM_AFTER_FRAMESET = 19
- IM_AFTER_HEAD = 5
- IM_BEFORE_HEAD = 2
- IM_BEFORE_HTML = 1
- IM_IN_BODY = 6
- IM_IN_CAPTION = 10
- IM_IN_CELL = 14
- IM_IN_COLUMN_GROUP = 11
- IM_IN_FRAMESET = 18
- IM_IN_HEAD = 3
- IM_IN_HEAD_NOSCRIPT = 4
- IM_IN_MATHML = 23
- IM_IN_ROW = 13
- IM_IN_SELECT = 15
- IM_IN_SELECT_IN_TABLE = 16
- IM_IN_SVG = 22
- IM_IN_TABLE = 8
- IM_IN_TABLE_BODY = 12
- IM_IN_TABLE_TEXT = 9
- IM_INITIAL = 0
  - Defined in 8.2.5.
- IM_TEXT = 7
- NAMESPACE_HTML = 'http://www.w3.org/1999/xhtml'
  - Defined in http://www.w3.org/TR/html51/infrastructure.html#html-namespace-0.
- NAMESPACE_MATHML = 'http://www.w3.org/1998/Math/MathML'
- NAMESPACE_SVG = 'http://www.w3.org/2000/svg'
- NAMESPACE_XLINK = 'http://www.w3.org/1999/xlink'
- NAMESPACE_XML = 'http://www.w3.org/XML/1998/namespace'
- NAMESPACE_XMLNS = 'http://www.w3.org/2000/xmlns/'
- OPT_DISABLE_HTML_NS = 'disable_html_ns'
- OPT_IMPLICIT_NS = 'implicit_namespaces'
- OPT_TARGET_DOC = 'target_document'
Properties
- $current : mixed
- $doc : mixed
- $errors : mixed
- $frag : mixed
- $implicitNamespaces : array<string|int, mixed>
  - Holds the always-available namespaces (those that do not require an XMLNS declaration).
- $insertMode : mixed
- $nsRoots : array<string|int, mixed>
  - Holds the HTML5 element names that cause a namespace switch.
- $nsStack : array<string|int, mixed>
  - Holds a stack of currently active namespaces.
- $onlyInline : string|null
  - Tracks whether we are in an element that allows only inline child nodes.
- $options : mixed
- $processor : mixed
- $pushes : array<string|int, mixed>
  - Holds the number of namespaces declared by a node.
- $quirks : mixed
  - Quirks mode is enabled by default.
- $rules : mixed
- $stack : mixed
Methods
- __construct() : mixed
- cdata() : mixed
  - A CDATA section.
- comment() : mixed
  - A comment section (unparsed character data).
- doctype() : mixed
  - A doctype declaration.
- document() : mixed
  - Get the document.
- endTag() : mixed
  - An end tag.
- eof() : mixed
  - Indicates that the document has been entirely processed.
- fragment() : DOMDocumentFragment
  - Get the DOM fragment for the body.
- getErrors() : mixed
- parseError() : mixed
  - Emitted when the parser encounters an error condition.
- processingInstruction() : mixed
  - This is a holdover from the XML spec.
- setInstructionProcessor() : mixed
  - Provide an instruction processor.
- startTag() : int
  - Process the start tag.
- text() : mixed
  - A unit of parsed character data.
- autoclose() : bool
  - Automatically climb the tree and close the closest node with the matching $tagName.
- isAncestor() : bool
  - Checks whether the given tag name is an ancestor of the present candidate.
- isParent() : bool
  - Returns true if the immediate parent element has the given tag name.
- normalizeTagName() : string
  - Apply normalization rules to a tag name.
- quirksTreeResolver() : mixed
Constants
IM_AFTER_AFTER_BODY
public mixed IM_AFTER_AFTER_BODY = 20
IM_AFTER_AFTER_FRAMESET
public mixed IM_AFTER_AFTER_FRAMESET = 21
IM_AFTER_BODY
public mixed IM_AFTER_BODY = 17
IM_AFTER_FRAMESET
public mixed IM_AFTER_FRAMESET = 19
IM_AFTER_HEAD
public mixed IM_AFTER_HEAD = 5
IM_BEFORE_HEAD
public mixed IM_BEFORE_HEAD = 2
IM_BEFORE_HTML
public mixed IM_BEFORE_HTML = 1
IM_IN_BODY
public mixed IM_IN_BODY = 6
IM_IN_CAPTION
public mixed IM_IN_CAPTION = 10
IM_IN_CELL
public mixed IM_IN_CELL = 14
IM_IN_COLUMN_GROUP
public mixed IM_IN_COLUMN_GROUP = 11
IM_IN_FRAMESET
public mixed IM_IN_FRAMESET = 18
IM_IN_HEAD
public mixed IM_IN_HEAD = 3
IM_IN_HEAD_NOSCRIPT
public mixed IM_IN_HEAD_NOSCRIPT = 4
IM_IN_MATHML
public mixed IM_IN_MATHML = 23
IM_IN_ROW
public mixed IM_IN_ROW = 13
IM_IN_SELECT
public mixed IM_IN_SELECT = 15
IM_IN_SELECT_IN_TABLE
public mixed IM_IN_SELECT_IN_TABLE = 16
IM_IN_SVG
public mixed IM_IN_SVG = 22
IM_IN_TABLE
public mixed IM_IN_TABLE = 8
IM_IN_TABLE_BODY
public mixed IM_IN_TABLE_BODY = 12
IM_IN_TABLE_TEXT
public mixed IM_IN_TABLE_TEXT = 9
IM_INITIAL
Defined in 8.2.5.
public mixed IM_INITIAL = 0
IM_TEXT
public mixed IM_TEXT = 7
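The IM_* constants number the insertion modes that drive tree construction. The Python sketch below mirrors those values in an IntEnum and walks a simplified, well-formed document skeleton through the mode switches that the HTML5 tree-construction rules prescribe for it. It is only an illustration: the transition table covers just seven tokens and leaves the mode unchanged for anything else (the specification does not), and, as the FIXME in the class description notes, DOMTreeBuilder does not follow these rules exactly. IN_SVG and IN_MATHML appear to be specific to this builder; the specification handles SVG and MathML through its rules for foreign content rather than through named insertion modes.

```python
from enum import IntEnum


class InsertionMode(IntEnum):
    # Values mirror the DOMTreeBuilder IM_* constants.
    INITIAL = 0
    BEFORE_HTML = 1
    BEFORE_HEAD = 2
    IN_HEAD = 3
    IN_HEAD_NOSCRIPT = 4
    AFTER_HEAD = 5
    IN_BODY = 6
    TEXT = 7
    IN_TABLE = 8
    IN_TABLE_TEXT = 9
    IN_CAPTION = 10
    IN_COLUMN_GROUP = 11
    IN_TABLE_BODY = 12
    IN_ROW = 13
    IN_CELL = 14
    IN_SELECT = 15
    IN_SELECT_IN_TABLE = 16
    AFTER_BODY = 17
    IN_FRAMESET = 18
    AFTER_FRAMESET = 19
    AFTER_AFTER_BODY = 20
    AFTER_AFTER_FRAMESET = 21
    IN_SVG = 22
    IN_MATHML = 23


# Simplified transitions for a minimal document; tokens are (kind, tag name).
TRANSITIONS = {
    (InsertionMode.INITIAL, ("doctype", "html")): InsertionMode.BEFORE_HTML,
    (InsertionMode.BEFORE_HTML, ("start", "html")): InsertionMode.BEFORE_HEAD,
    (InsertionMode.BEFORE_HEAD, ("start", "head")): InsertionMode.IN_HEAD,
    (InsertionMode.IN_HEAD, ("end", "head")): InsertionMode.AFTER_HEAD,
    (InsertionMode.AFTER_HEAD, ("start", "body")): InsertionMode.IN_BODY,
    (InsertionMode.IN_BODY, ("end", "body")): InsertionMode.AFTER_BODY,
    (InsertionMode.AFTER_BODY, ("end", "html")): InsertionMode.AFTER_AFTER_BODY,
}


def run(tokens):
    """Return the list of modes visited, starting from INITIAL."""
    mode = InsertionMode.INITIAL
    trace = [mode]
    for token in tokens:
        # Unlisted tokens leave the mode unchanged in this sketch.
        mode = TRANSITIONS.get((mode, token), mode)
        trace.append(mode)
    return trace
```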
NAMESPACE_HTML
Defined in http://www.w3.org/TR/html51/infrastructure.html#html-namespace-0.
public mixed NAMESPACE_HTML = 'http://www.w3.org/1999/xhtml'
NAMESPACE_MATHML
public mixed NAMESPACE_MATHML = 'http://www.w3.org/1998/Math/MathML'
NAMESPACE_SVG
public mixed NAMESPACE_SVG = 'http://www.w3.org/2000/svg'
NAMESPACE_XLINK
public mixed NAMESPACE_XLINK = 'http://www.w3.org/1999/xlink'
NAMESPACE_XML
public mixed NAMESPACE_XML = 'http://www.w3.org/XML/1998/namespace'
NAMESPACE_XMLNS
public mixed NAMESPACE_XMLNS = 'http://www.w3.org/2000/xmlns/'
OPT_DISABLE_HTML_NS
public mixed OPT_DISABLE_HTML_NS = 'disable_html_ns'
OPT_IMPLICIT_NS
public mixed OPT_IMPLICIT_NS = 'implicit_namespaces'
OPT_TARGET_DOC
public mixed OPT_TARGET_DOC = 'target_document'
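The xml, xmlns and xlink prefixes are the defaults held in $implicitNamespaces, so attributes such as xlink:href can be resolved without an xmlns declaration. The Python sketch below shows one way such a lookup could work. The merging of extra prefixes supplied under the 'implicit_namespaces' option (OPT_IMPLICIT_NS) is an assumption about how the option is used, and implicit_namespaces() and resolve_attribute() are hypothetical helpers, not members of the class.

```python
NAMESPACE_XML = "http://www.w3.org/XML/1998/namespace"
NAMESPACE_XMLNS = "http://www.w3.org/2000/xmlns/"
NAMESPACE_XLINK = "http://www.w3.org/1999/xlink"
OPT_IMPLICIT_NS = "implicit_namespaces"

# Mirrors the default value of the $implicitNamespaces property.
DEFAULT_IMPLICIT = {"xml": NAMESPACE_XML, "xmlns": NAMESPACE_XMLNS, "xlink": NAMESPACE_XLINK}


def implicit_namespaces(options):
    """Assumption: prefixes passed under OPT_IMPLICIT_NS extend or override the defaults."""
    merged = dict(DEFAULT_IMPLICIT)
    merged.update(options.get(OPT_IMPLICIT_NS, {}))
    return merged


def resolve_attribute(qname, namespaces):
    """Split 'prefix:local'; return (namespace URI, local name), or (None, qname) if unresolved."""
    prefix, sep, local = qname.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix], local
    # Unprefixed names and unknown prefixes stay in no namespace, name unchanged.
    return None, qname
```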
Properties
$current
protected mixed $current
$doc
protected mixed $doc
$errors
protected mixed $errors = array()
$frag
protected mixed $frag
$implicitNamespaces
Holds the always-available namespaces (those that do not require an XMLNS declaration).
protected array<string|int, mixed> $implicitNamespaces = array('xml' => self::NAMESPACE_XML, 'xmlns' => self::NAMESPACE_XMLNS, 'xlink' => self::NAMESPACE_XLINK)
$insertMode
protected mixed $insertMode = 0
$nsRoots
Holds the HTML5 element names that cause a namespace switch.
protected array<string|int, mixed> $nsRoots = array('html' => self::NAMESPACE_HTML, 'svg' => self::NAMESPACE_SVG, 'math' => self::NAMESPACE_MATHML)
$nsStack
Holds a stack of currently active namespaces.
protected array<string|int, mixed> $nsStack = array()
$onlyInline
Tracks whether we are in an element that allows only inline child nodes.
protected string|null $onlyInline
$options
protected mixed $options = array()
$processor
protected mixed $processor
$pushes
Holds the number of namespaces declared by a node.
protected array<string|int, mixed> $pushes = array()
$quirks
Quirks mode is enabled by default.
protected mixed $quirks = true
Any document that is missing a DOCTYPE is considered to be in quirks mode.
$rules
protected mixed $rules
$stack
protected mixed $stack = array()
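$nsRoots, $nsStack and $pushes together suggest a scoped namespace stack: entering svg, math or html (or an element with xmlns attributes) pushes a new scope, and closing that element pops exactly the scopes it pushed. The Python sketch below models that idea. NamespaceTracker, its methods, the choice of HTML as the starting default, and the one-scope-per-element bookkeeping are all assumptions made for illustration, not the class's actual code.

```python
NAMESPACE_HTML = "http://www.w3.org/1999/xhtml"
NAMESPACE_SVG = "http://www.w3.org/2000/svg"
NAMESPACE_MATHML = "http://www.w3.org/1998/Math/MathML"

# Mirrors the $nsRoots property: elements that switch the default namespace.
NS_ROOTS = {"html": NAMESPACE_HTML, "svg": NAMESPACE_SVG, "math": NAMESPACE_MATHML}


class NamespaceTracker:
    """Hypothetical model of $nsStack and $pushes."""

    def __init__(self):
        self.ns_stack = [{"": NAMESPACE_HTML}]  # scopes; "" is the default namespace
        self.pushes = []  # one entry per open element: number of scopes it pushed

    def start(self, name, attributes=None):
        """Open an element; return the default namespace it ends up in."""
        scope = {}
        if name in NS_ROOTS:
            scope[""] = NS_ROOTS[name]
        for attr, value in (attributes or {}).items():
            if attr == "xmlns":
                scope[""] = value
            elif attr.startswith("xmlns:"):
                scope[attr[len("xmlns:"):]] = value
        if scope:
            self.ns_stack.append(scope)
        self.pushes.append(1 if scope else 0)
        return self.lookup("")

    def end(self):
        # Pop only the scopes pushed by the element being closed.
        for _ in range(self.pushes.pop()):
            self.ns_stack.pop()

    def lookup(self, prefix):
        # The innermost scope that defines the prefix wins.
        for scope in reversed(self.ns_stack):
            if prefix in scope:
                return scope[prefix]
        return None
```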
Methods
__construct()
public
__construct([mixed $isFragment = false ][, array<string|int, mixed> $options = array() ]) : mixed
Parameters
- $isFragment : mixed = false
- $options : array<string|int, mixed> = array()
cdata()
A CDATA section.
public
cdata(mixed $data) : mixed
Parameters
- $data : mixed
  The unparsed character data.
comment()
A comment section (unparsed character data).
public
comment(mixed $cdata) : mixed
Parameters
- $cdata : mixed
doctype()
A doctype declaration.
public
doctype(mixed $name[, mixed $idType = 0 ][, mixed $id = null ][, mixed $quirks = false ]) : mixed
Parameters
- $name : mixed
  The name of the root element.
- $idType : mixed = 0
  One of DOCTYPE_NONE, DOCTYPE_PUBLIC, or DOCTYPE_SYSTEM.
- $id : mixed = null
  The identifier. For DOCTYPE_PUBLIC, this is the public ID; for DOCTYPE_SYSTEM, it is the system ID.
- $quirks : mixed = false
  Indicates whether the builder should enter quirks mode.
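The interaction between the $quirks property (true by default, so a document without a DOCTYPE stays in quirks mode) and the $quirks argument of doctype() can be modeled as follows. This Python sketch assumes that a doctype() call simply replaces the default with the value it receives; QuirksModel is hypothetical, and the numeric DOCTYPE_* values are placeholders for the constants named above.

```python
# Hypothetical placeholder values for the doctype ID-type constants.
DOCTYPE_NONE, DOCTYPE_PUBLIC, DOCTYPE_SYSTEM = 0, 1, 2


class QuirksModel:
    def __init__(self):
        # No DOCTYPE seen yet, so the document is treated as quirks mode.
        self.quirks = True
        self.doctype_info = None

    def doctype(self, name, id_type=DOCTYPE_NONE, id_=None, quirks=False):
        # Assumption: the quirks verdict passed in replaces the default.
        self.doctype_info = (name, id_type, id_)
        self.quirks = quirks
```

With this model, `<!DOCTYPE html>` (reported with quirks=False) switches quirks mode off, while a document that never reports a DOCTYPE keeps the default.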
document()
Get the document.
public
document() : mixed
endTag()
An end tag.
public
endTag(mixed $name) : mixed
Parameters
- $name : mixed
eof()
Indicates that the document has been entirely processed.
public
eof() : mixed
fragment()
Get the DOM fragment for the body.
public
fragment() : DOMDocumentFragment
This returns a DOMDocumentFragment rather than a single node because a fragment may have zero or more DOMNodes at its root.
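A document fragment is a lightweight container whose root can hold any number of sibling nodes, including none, which a single element could not represent. The sketch below shows this with Python's xml.dom.minidom rather than PHP's DOMDocumentFragment; build_fragment() is a hypothetical helper.

```python
from xml.dom.minidom import Document


def build_fragment(doc, tag_names):
    """Return a fragment whose root holds one element per tag name (possibly none)."""
    frag = doc.createDocumentFragment()
    for name in tag_names:
        frag.appendChild(doc.createElement(name))
    return frag


doc = Document()
print(len(build_fragment(doc, []).childNodes))  # 0
print(len(build_fragment(doc, ["p", "ul", "p"]).childNodes))  # 3
```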
Return values
DOMDocumentFragment
getErrors()
public
getErrors() : mixed
parseError()
Emitted when the parser encounters an error condition.
public
parseError(mixed $msg[, mixed $line = 0 ][, mixed $col = 0 ]) : mixed
Parameters
- $msg : mixed
- $line : mixed = 0
- $col : mixed = 0
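Since $errors starts as an empty array and getErrors() exposes it, parse errors are presumably accumulated rather than thrown, so parsing can continue past malformed input. The Python sketch below models that pattern; the "Line N, Col M: message" format and the class and method names are assumptions, not the class's documented output.

```python
class ErrorCollector:
    def __init__(self):
        # Mirrors the $errors property, which starts out empty.
        self.errors = []

    def parse_error(self, msg, line=0, col=0):
        # Record the error and keep going; the message format is hypothetical.
        self.errors.append(f"Line {line}, Col {col}: {msg}")

    def get_errors(self):
        # Return a copy so callers cannot mutate the internal list.
        return list(self.errors)
```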
processingInstruction()
This is a holdover from the XML spec.
public
processingInstruction(mixed $name[, mixed $data = null ]) : mixed
User agents do not receive processing instructions (PIs), but server-side code may.
Parameters
- $name : mixed
  The name of the processor (e.g. 'php').
- $data : mixed = null
  The unparsed data.
setInstructionProcessor()
Provide an instruction processor.
public
setInstructionProcessor(InstructionProcessor $proc) : mixed
This is used to handle processing instructions as they are inserted. If no processor is provided, PIs are inserted directly into the DOM tree.
Parameters
- $proc : InstructionProcessor
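The two paths described for processing instructions (delegate to a registered processor, otherwise insert the PI node into the tree) can be sketched as below. This is Python with xml.dom.minidom, and the processor is a plain callable taking (current element, name, data). That signature, and the rule that a returned element becomes the new insertion point, are assumptions standing in for the InstructionProcessor interface.

```python
from typing import Callable, Optional
from xml.dom.minidom import Document, Element

# Hypothetical processor signature: (current element, PI name, PI data) -> element or None.
Processor = Callable[[Element, str, Optional[str]], Optional[Element]]


class PIBuilder:
    def __init__(self):
        self.doc = Document()
        self.root = self.doc.createElement("html")
        self.doc.appendChild(self.root)
        self.current = self.root
        self.processor: Optional[Processor] = None

    def set_instruction_processor(self, proc: Processor) -> None:
        self.processor = proc

    def processing_instruction(self, name: str, data: Optional[str] = None) -> None:
        if self.processor is not None:
            # Delegate; a returned element becomes the new insertion point.
            result = self.processor(self.current, name, data)
            if result is not None:
                self.current = result
            return
        # No processor registered: insert the PI into the tree as-is.
        self.current.appendChild(self.doc.createProcessingInstruction(name, data or ""))
```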
startTag()
Process the start tag.
public
startTag(string $name[, array<string|int, mixed> $attributes = array() ][, bool $selfClosing = false ]) : int
Parameters
- $name : string
- $attributes : array<string|int, mixed> = array()
- $selfClosing : bool = false
Return values
int
text()
A unit of parsed character data.
public
text(mixed $data) : mixed
Entities in this text are already decoded.
Parameters
- $data : mixed
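Because entities are already decoded when text() is called, the handler should store the data verbatim. In this Python sketch, a hypothetical emit_text() function stands in for the tokenizer and decodes character references with the standard html.unescape() before calling the handler. Decoding a second time in the handler would be a bug: the source text "&amp;lt;" must end up as the literal characters "&lt;", not "<".

```python
import html

received = []


def text(data):
    # Handler side: data arrives already decoded, so store it verbatim.
    received.append(data)


def emit_text(raw):
    # Hypothetical tokenizer side: decode character references exactly once.
    text(html.unescape(raw))


emit_text("Fish &amp; chips &lt;3")
emit_text("&amp;lt;")
print(received)  # ['Fish & chips <3', '&lt;']
```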
autoclose()
Automatically climb the tree and close the closest node with the matching $tagName.
protected
autoclose(string $tagName) : bool
Parameters
- $tagName : string
Return values
bool
isAncestor()
Checks whether the given tag name is an ancestor of the present candidate.
protected
isAncestor(string $tagName) : bool
If $this->current or anything above $this->current matches the given tag name, this returns true.
Parameters
- $tagName : string
Return values
bool
isParent()
Returns true if the immediate parent element has the given tag name.
protected
isParent(string $tagName) : bool
Parameters
- $tagName : string
Return values
bool
normalizeTagName()
Apply normalization rules to a tag name.
protected
normalizeTagName(string $tagName) : string
See sections 2.9 and 8.1.2.
Parameters
- $tagName : string
Return values
string
The normalized tag name.
quirksTreeResolver()
protected
quirksTreeResolver(mixed $name) : mixed
Parameters
- $name : mixed
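The documented behavior of the protected helpers autoclose() and isAncestor() amounts to walking up a chain of parent pointers from the current node. The Python sketch below models that walk. Node, TreeHelpers and the snake_case method names are hypothetical, and treating "closing" a node as making its parent the current node is an assumption about what autoclose() does.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


class TreeHelpers:
    def __init__(self, current):
        self.current = current

    def is_ancestor(self, tag_name):
        # True if the current node or anything above it has this tag name.
        node = self.current
        while node is not None:
            if node.name == tag_name:
                return True
            node = node.parent
        return False

    def autoclose(self, tag_name):
        # Climb to the closest node with this tag name and continue from its parent.
        node = self.current
        while node is not None:
            if node.name == tag_name:
                self.current = node.parent
                return True
            node = node.parent
        # No match: leave the current node unchanged.
        return False
```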