Documentation

Scanner

The scanner scans over a given data input to react appropriately to characters.

Table of Contents

Constants

CHARS_ALNUM  = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890'
CHARS_ALPHA  = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
CHARS_HEX  = 'abcdefABCDEF01234567890'

Properties

$errors  : mixed
Parse errors.
$char  : mixed
The current byte position (an integer) within $data.
$data  : mixed
The string data we're parsing.
$EOF  : mixed
Length of $data; when $char === $EOF, we are at the end-of-file.

Methods

__construct()  : mixed
Create a new Scanner.
charsUntil()  : mixed
Read chars until something in the mask is encountered.
charsWhile()  : int
Read chars as long as the mask matches.
columnOffset()  : int
Returns the current column of the current line that the tokenizer is at.
consume()  : mixed
Silently consume N chars.
current()  : string
Get the current character.
currentLine()  : int
Returns the current line that is being consumed.
getAsciiAlpha()  : string
Get the next group of characters that are ASCII Alpha characters.
getAsciiAlphaNum()  : string
Get the next group of characters that are ASCII Alpha characters and numbers.
getHex()  : string
Get the next group of characters that are hex characters.
getNumeric()  : string
Get the next group of numbers.
next()  : string
Get the next character.
peek()  : string
Take a peek at the next character in the data.
position()  : int
Get the current position.
remainingChars()  : int
Get all characters until EOF.
sequenceMatches()  : bool
Check if upcoming chars match the given sequence.
unconsume()  : mixed
Unconsume some of the data.
whitespace()  : int
Consume whitespace.
doCharsUntil()  : mixed
Read to a particular match (or until $max bytes are consumed).
doCharsWhile()  : string
Returns the string so long as $bytes matches.
replaceLinefeeds()  : string
Replace linefeed characters according to the spec.

Constants

CHARS_ALNUM

public mixed CHARS_ALNUM = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890'

CHARS_ALPHA

public mixed CHARS_ALPHA = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

CHARS_HEX

public mixed CHARS_HEX = 'abcdefABCDEF01234567890'

Properties

$errors

Parse errors.

public mixed $errors = array()

$char

The current byte position (an integer) within $data.

private mixed $char

$data

The string data we're parsing.

private mixed $data

$EOF

Length of $data; when $char === $EOF, we are at the end-of-file.

private mixed $EOF

Methods

__construct()

Create a new Scanner.

public __construct(string $data[, string $encoding = 'UTF-8' ]) : mixed
Parameters
$data : string

Data to parse.

$encoding : string = 'UTF-8'

The encoding to use for the data.

Tags
throws
Exception

If the given data cannot be encoded to UTF-8.

charsUntil()

Read chars until something in the mask is encountered.

public charsUntil(string $mask) : mixed
Parameters
$mask : string
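
This page does not show how the mask is applied. The described behaviour (read until any byte of the mask appears) resembles PHP's built-in strcspn(), so a minimal standalone sketch might look like the following. The helper name readUntil() and its by-reference position argument are assumptions for illustration and are not part of Scanner.

```php
<?php
// Hypothetical helper, not part of Scanner: returns the run of bytes
// starting at $pos that contains none of the bytes in $mask, and
// advances $pos past that run.
function readUntil(string $data, int &$pos, string $mask): string
{
    // strcspn() counts the leading bytes that are NOT present in $mask.
    $len = strcspn($data, $mask, $pos);
    $chunk = substr($data, $pos, $len);
    $pos += $len;

    return $chunk;
}

$pos = 0;
$text = readUntil('hello<b>', $pos, '<&');
// $text is 'hello' and $pos is 5, which is the offset of '<'.
```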

charsWhile()

Read chars as long as the mask matches.

public charsWhile(string $mask) : int
Parameters
$mask : string
Return values
int

columnOffset()

Returns the current column of the current line that the tokenizer is at.

public columnOffset() : int

Newlines are column 0. The first char after a newline is column 1.

Return values
int

The column number.
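
The column rule above can be illustrated with a small standalone function. This is only a sketch: it counts bytes rather than multi-byte characters, and it assumes that the first character of the input is also column 1, which this page does not state. The name columnAt() is hypothetical.

```php
<?php
// Hypothetical helper, not part of Scanner: column of byte offset $pos.
function columnAt(string $data, int $pos): int
{
    // A newline itself is reported as column 0.
    if ($pos < strlen($data) && $data[$pos] === "\n") {
        return 0;
    }

    // Locate the last newline that precedes $pos.
    $lastNewline = strrpos(substr($data, 0, $pos), "\n");

    // Without a preceding newline we are on the first line (assumed to
    // start at column 1); otherwise count bytes since that newline.
    return $lastNewline === false ? $pos + 1 : $pos - $lastNewline;
}

$onNewline = columnAt("ab\ncd", 2); // 0
$afterNewline = columnAt("ab\ncd", 3); // 1
```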

consume()

Silently consume N chars.

public consume([int $count = 1 ]) : mixed
Parameters
$count : int = 1

current()

Get the current character.

public current() : string

Note, this does not advance the pointer.

Return values
string

The current character.

currentLine()

Returns the current line that is being consumed.

public currentLine() : int
Return values
int

The current line number.
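
One plausible way to derive a line number from a byte position is to count the newlines that precede it. The sketch below assumes 1-based line numbers, which this page does not confirm, and lineAt() is a hypothetical name.

```php
<?php
// Hypothetical helper, not part of Scanner: line number of byte offset
// $pos, assuming the first line is line 1.
function lineAt(string $data, int $pos): int
{
    // Every newline before $pos starts a new line.
    return substr_count(substr($data, 0, $pos), "\n") + 1;
}

$firstLine = lineAt("a\nb\nc", 0); // 1
$thirdLine = lineAt("a\nb\nc", 4); // 3
```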

getAsciiAlpha()

Get the next group of characters that are ASCII Alpha characters.

public getAsciiAlpha() : string

Note: this also advances the data pointer past the returned characters.

Return values
string

The next group of ASCII alpha characters.

getAsciiAlphaNum()

Get the next group of characters that are ASCII Alpha characters and numbers.

public getAsciiAlphaNum() : string

Note: this also advances the data pointer past the returned characters.

Return values
string

The next group of ASCII alpha characters and numbers.

getHex()

Get the next group of characters that are hex characters.

public getHex() : string

Note: this also advances the data pointer past the returned characters.

Return values
string

The next group that is hex characters.

getNumeric()

Get the next group of numbers.

public getNumeric() : string

Note: this also advances the data pointer past the returned characters.

Return values
string

The next group of numbers.

next()

Get the next character.

public next() : string

Note: This advances the pointer.

Return values
string

The next character.

peek()

Take a peek at the next character in the data.

public peek() : string
Return values
string

The next character.
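
The difference between current(), next() and peek() is easiest to see side by side. The class below is a toy cursor written for this illustration, not the Scanner implementation; in particular, returning an empty string past the end of the data is an assumption.

```php
<?php
// Toy cursor for illustration only; MiniCursor is a hypothetical class.
final class MiniCursor
{
    private string $data;
    private int $pos = 0;

    public function __construct(string $data)
    {
        $this->data = $data;
    }

    // Returns the byte under the pointer without moving it.
    public function current(): string
    {
        return $this->pos < strlen($this->data) ? $this->data[$this->pos] : '';
    }

    // Advances the pointer, then returns the byte now under it.
    public function next(): string
    {
        ++$this->pos;

        return $this->current();
    }

    // Looks one byte ahead without moving the pointer.
    public function peek(): string
    {
        $ahead = $this->pos + 1;

        return $ahead < strlen($this->data) ? $this->data[$ahead] : '';
    }

    public function position(): int
    {
        return $this->pos;
    }
}

$cursor = new MiniCursor('abc');
$first = $cursor->current(); // 'a', position stays 0
$ahead = $cursor->peek();    // 'b', position stays 0
$second = $cursor->next();   // 'b', position is now 1
```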

position()

Get the current position.

public position() : int
Return values
int

The current integer byte position.

remainingChars()

Get all characters until EOF.

public remainingChars() : int

This consumes characters until the EOF.

Return values
int

The number of characters remaining.

sequenceMatches()

Check if upcoming chars match the given sequence.

public sequenceMatches(string $sequence[, bool $caseSensitive = true ]) : bool

This will read the stream for the $sequence. If it's found, this will return true. If not, return false. Since this unconsumes any chars it reads, the caller will still need to read the next sequence, even if this returns true.

Example: $this->scanner->sequenceMatches('</script>') will see if the input stream is at the start of a '</script>' string.

Parameters
$sequence : string
$caseSensitive : bool = true
Return values
bool
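
A non-consuming sequence check of this kind can be sketched with plain string functions. The function matchesAt() below is hypothetical and takes the position by value, so the caller's position is untouched, mirroring the "unconsumes any chars it reads" behaviour described above.

```php
<?php
// Hypothetical helper, not part of Scanner: does $sequence start at
// byte offset $pos of $data?
function matchesAt(string $data, int $pos, string $sequence, bool $caseSensitive = true): bool
{
    // Take as many bytes as the sequence is long, starting at $pos.
    $candidate = (string) substr($data, $pos, strlen($sequence));

    return $caseSensitive
        ? $candidate === $sequence
        : strcasecmp($candidate, $sequence) === 0;
}

$input = 'x</SCRIPT>y';
$strict = matchesAt($input, 1, '</script>');       // false, case differs
$loose = matchesAt($input, 1, '</script>', false); // true
```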

unconsume()

Unconsume some of the data.

public unconsume([int $howMany = 1 ]) : mixed

This moves the data pointer backwards.

Parameters
$howMany : int = 1

The number of characters to move the pointer back.

whitespace()

Consume whitespace.

public whitespace() : int

Whitespace in HTML5 is: formfeed, tab, newline, space.

Return values
int

The length of the matched whitespace.

doCharsUntil()

Read to a particular match (or until $max bytes are consumed).

private doCharsUntil(string $bytes[, int $max = null ]) : mixed

This operates on byte sequences, not characters.

Matches as far as possible until we reach a certain set of bytes and returns the matched substring.

Parameters
$bytes : string

Bytes to match.

$max : int = null

Maximum number of bytes to scan.

Return values
mixed

Index or false if no match is found. You should use strong equality when checking the result, since index could be 0.
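
The advice about strong equality applies to any PHP function that returns either an index or false. The sketch below uses the built-in strpos() as a stand-in, because doCharsUntil() is private and cannot be called from outside; describeMatch() is a hypothetical helper.

```php
<?php
// Hypothetical helper: distinguishes "no match" from "match at index 0".
function describeMatch($result): string
{
    // Only the strict comparison tells false apart from the integer 0.
    return $result === false ? 'no match' : 'match at ' . $result;
}

$atStart = strpos('<p>', '<'); // int 0
$missing = strpos('abc', '<'); // false

$startText = describeMatch($atStart);   // 'match at 0'
$missingText = describeMatch($missing); // 'no match'

// A loose check treats both results the same, which is the pitfall:
$looseStart = ($atStart == false);   // true, although a match exists
$looseMissing = ($missing == false); // true
```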

doCharsWhile()

Returns the string so long as $bytes matches.

private doCharsWhile(string $bytes[, int $max = null ]) : string

Matches as far as possible with a certain set of bytes and returns the matched substring.

Parameters
$bytes : string

A mask of bytes to match. If ANY byte in this mask matches the current char, the pointer advances and the char is part of the substring.

$max : int = null

The max number of chars to read.

Return values
string
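
Mask matching with an optional limit, as described here, maps closely onto PHP's built-in strspn(). The following is a sketch under that assumption, not the actual private method; takeWhile() and its by-reference position argument are hypothetical.

```php
<?php
// Hypothetical stand-in for the private method: returns the run of bytes
// at $pos that all occur in $mask, reading at most $max bytes.
function takeWhile(string $data, int &$pos, string $mask, ?int $max = null): string
{
    // strspn() counts the leading bytes that ARE present in $mask.
    $len = $max === null
        ? strspn($data, $mask, $pos)
        : strspn($data, $mask, $pos, $max);
    $chunk = substr($data, $pos, $len);
    $pos += $len;

    return $chunk;
}

$hexMask = 'abcdefABCDEF01234567890'; // same characters as CHARS_HEX

$pos = 3; // just past '&#x'
$hex = takeWhile('&#x1F600;', $pos, $hexMask);
// $hex is '1F600' and $pos is 8, which is the offset of ';'.

$cappedPos = 3;
$capped = takeWhile('&#x1F600;', $cappedPos, $hexMask, 2);
// $capped is '1F' and $cappedPos is 5.
```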

replaceLinefeeds()

Replace linefeed characters according to the spec.

private replaceLinefeeds(mixed $data) : string
Parameters
$data : mixed
Return values
string
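
This page only says "according to the spec". Assuming that this refers to the HTML5 rule of normalizing CR LF pairs and lone CR characters to LF, the core of such a replacement could be sketched as follows. The function normalizeLinefeeds() is hypothetical, and the real method may do more than this.

```php
<?php
// Hypothetical helper, not the private Scanner method.
function normalizeLinefeeds(string $data): string
{
    // str_replace() applies the searches in order, so CR LF pairs are
    // collapsed first and only the remaining lone CRs are replaced after.
    return str_replace(["\r\n", "\r"], "\n", $data);
}

$normalized = normalizeLinefeeds("a\r\nb\rc\n");
// $normalized is "a\nb\nc\n".
```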

        