Qt 4.8
A hand-written tokenizer which tokenizes XQuery 1.0 & XPath 2.0, and delivers tokens to the Bison generated parser.
#include <qxquerytokenizer_p.h>
Public Types | |
enum | State { AfterAxisSeparator, AposAttributeContent, Axis, Default, ElementContent, EndTag, ItemType, KindTest, KindTestForPI, NamespaceDecl, NamespaceKeyword, OccurrenceIndicator, Operator, Pragma, PragmaContent, ProcessingInstructionContent, ProcessingInstructionName, QuotAttributeContent, StartTag, VarName, XMLComment, XMLSpaceDecl, XQueryVersion } |
Public Types inherited from QPatternist::Tokenizer | |
typedef QExplicitlySharedDataPointer< Tokenizer > | Ptr |
Public Types inherited from QPatternist::TokenSource | |
typedef QExplicitlySharedDataPointer< TokenSource > | Ptr |
typedef QQueue< Ptr > | Queue |
typedef yytokentype | TokenType |
Public Functions | |
virtual int | commenceScanOnly () |
virtual Token | nextToken (YYLTYPE *const sourceLocator) |
virtual void | resumeTokenizationFrom (const int position) |
virtual void | setParserContext (const ParserContext::Ptr &parseInfo) |
XQueryTokenizer (const QString &query, const QUrl &location, const State startingState=Default) | |
Public Functions inherited from QPatternist::Tokenizer | |
const QUrl & | queryURI () const |
Tokenizer (const QUrl &queryU) | |
Public Functions inherited from QPatternist::TokenSource | |
TokenSource () | |
virtual | ~TokenSource () |
Public Functions inherited from QSharedData | |
QSharedData () | |
Constructs a QSharedData object with a reference count of 0. |
QSharedData (const QSharedData &) | |
Constructs a QSharedData object with a reference count of 0. |
Private Types | |
typedef QSet< int > | CharacterSkips |
Private Functions | |
bool | aheadEquals (const char *const chs, const int len, const int offset=1) const |
bool | atEnd () const |
Token | attributeAsRaw (const QChar separator, int &stack, const int startPos, const bool inLiteral, QString &result) |
QChar | charForReference (const QString &reference) |
Tokenizer::TokenType | consumeComment () |
Parses comments: (: comment content :). It recurses to parse nested comments. |
bool | consumeRawWhitespace () |
TokenType | consumeWhitespace () |
const QChar | current () const |
Token | nextToken () |
char | peekAhead (const int length=1) const |
char | peekCurrent () const |
Returns the character at the current position, converted to ASCII. |
int | peekForColonColon () const |
void | popState () |
void | pushState (const State state) |
void | pushState () |
int | scanUntil (const char *const content) |
void | setState (const State s) |
State | state () const |
Token | tokenAndAdvance (const TokenType code, const int advance=1) |
Token | tokenAndChangeState (const TokenType code, const State state, const int advance=1) |
Token | tokenAndChangeState (const TokenType code, const QString &value, const State state) |
QString | tokenizeCharacterReference () |
Token | tokenizeNCName () |
Token | tokenizeNCNameOrQName () |
Token | tokenizeNumberLiteral () |
Token | tokenizeStringLiteral () |
Static Private Functions | |
static Token | error () |
static bool | isDigit (const char ch) |
static bool | isNCNameBody (const QChar ch) |
static bool | isNCNameStart (const QChar ch) |
static bool | isOperatorKeyword (const TokenType) |
static bool | isPhraseKeyword (const TokenType code) |
static bool | isTypeToken (const TokenType t) |
static const TokenMap * | lookupKeyword (const QString &keyword) |
static QString | normalizeEOL (const QString &input, const CharacterSkips &characterSkips) |
Properties | |
QHash< QString, QChar > | m_charRefs |
int | m_columnOffset |
const QString | m_data |
const int | m_length |
int | m_line |
const NamePool::Ptr | m_namePool |
int | m_pos |
bool | m_scanOnly |
State | m_state |
QStack< State > | m_stateStack |
QStack< Token > | m_tokenStack |
Additional Inherited Members | |
Public Variables inherited from QSharedData | |
QAtomicInt | ref |
Static Protected Functions inherited from QPatternist::Tokenizer | |
static QString | tokenToString (const Token &token) |
A hand-written tokenizer which tokenizes XQuery 1.0 & XPath 2.0, and delivers tokens to the Bison generated parser.
Definition at line 76 of file qxquerytokenizer_p.h.
typedef QSet<int> QPatternist::XQueryTokenizer::CharacterSkips [private]
A set of indexes into the QString passed to normalizeEOL(), identifying the characters that should not be normalized.
Definition at line 260 of file qxquerytokenizer_p.h.
enum QPatternist::XQueryTokenizer::State
Tokenizer states. Organized alphabetically.
Definition at line 82 of file qxquerytokenizer_p.h.
QPatternist::XQueryTokenizer::XQueryTokenizer(const QString &query, const QUrl &location, const State startingState = Default)
Definition at line 62 of file qxquerytokenizer.cpp.
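bool QPatternist::XQueryTokenizer::aheadEquals(const char *const chs, const int len, const int offset = 1) const [inline, private]
Returns true if the input, starting offset characters from the current position, matches chs. The length of chs is len.
Definition at line 693 of file qxquerytokenizer.cpp.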
Referenced by nextToken().
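The lookahead check can be illustrated with a minimal sketch over a plain std::string. The free function and its data and pos parameters are hypothetical stand-ins for the tokenizer's private m_data and m_pos; this is not the Qt implementation.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for aheadEquals(): returns true if the len characters
// of chs occur in data, starting offset characters after pos.
bool aheadEquals(const std::string &data, std::size_t pos,
                 const char *chs, std::size_t len, std::size_t offset = 1)
{
    const std::size_t start = pos + offset;
    if (start > data.size() || len > data.size() - start)
        return false;                       // not enough input left to match
    return std::memcmp(data.data() + start, chs, len) == 0;
}
```

With the default offset of 1, the comparison starts at the character after the current one, which suits checks such as "is this '<' followed by '!--'".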
bool QPatternist::XQueryTokenizer::atEnd() const [inline, private]
Definition at line 270 of file qxquerytokenizer_p.h.
Referenced by attributeAsRaw(), consumeComment(), and nextToken().
Token QPatternist::XQueryTokenizer::attributeAsRaw(const QChar separator, int &stack, const int startPos, const bool inLiteral, QString &result) [private]
Instead of recognizing and tokenizing embedded expressions in direct attribute constructors, this function is essentially a mini recursive-descent parser with just enough logic to recognize embedded expressions and their potentially interfering string literals, so that it can scan to the very end of the attribute value and return the whole value as a string.
There are of course syntax errors that this function will not detect, but that is acceptable since the attributes will be parsed once more.
An inelegant solution, but one that gets the job done.
Definition at line 2061 of file qxquerytokenizer.cpp.
Referenced by atEnd(), and nextToken().
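The scanning idea can be sketched as follows: find the quote that really ends an attribute value while skipping enclosed expressions and the string literals inside them. The function name, its signature and the loop-based structure are assumptions made for illustration; the actual function works on QString and takes different parameters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: pos points just after the opening quote of an attribute
// value. Returns the index of the closing quote, or npos if unterminated.
std::size_t endOfAttributeValue(const std::string &s, std::size_t pos, char quote)
{
    int depth = 0;      // nesting level of enclosed { ... } expressions
    char literal = 0;   // quote character of the string literal we are in, or 0
    for (; pos < s.size(); ++pos) {
        const char c = s[pos];
        if (literal) {                      // inside a string literal of an expression
            if (c == literal)
                literal = 0;
        } else if (depth > 0) {             // inside an enclosed expression
            if (c == '"' || c == '\'')
                literal = c;
            else if (c == '{')
                ++depth;
            else if (c == '}')
                --depth;
        } else if (c == '{') {
            if (pos + 1 < s.size() && s[pos + 1] == '{')
                ++pos;                      // "{{" is an escaped brace
            else
                ++depth;
        } else if (c == quote) {
            if (pos + 1 < s.size() && s[pos + 1] == quote)
                ++pos;                      // doubled quote is an escaped quote
            else
                return pos;
        }
    }
    return std::string::npos;
}
```

The sketch ignores XQuery comments inside expressions and does not report syntax errors, in the same spirit as the description: a later, full parse of the attribute is expected to catch those.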
QChar QPatternist::XQueryTokenizer::charForReference(const QString &reference)
Returns the character corresponding to the builtin reference reference. For instance, passing gt will give you '>' in return.
If reference is an invalid character reference, a null QChar is returned.
Definition at line 607 of file qxquerytokenizer.cpp.
Referenced by tokenizeCharacterReference().
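A minimal sketch of such a lookup, using the five predefined entity names of XML and XQuery. The standard-library types are stand-ins for QString, QChar and the m_charRefs hash, and '\0' plays the role of the null QChar; none of this is taken from the Qt source.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for charForReference(): maps a predefined entity name
// to its character, or returns '\0' for an unknown name.
char charForReference(const std::string &reference)
{
    static const std::unordered_map<std::string, char> refs = {
        {"lt", '<'}, {"gt", '>'}, {"amp", '&'}, {"quot", '"'}, {"apos", '\''}
    };
    const auto it = refs.find(reference);
    return it == refs.end() ? '\0' : it->second;
}
```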
int QPatternist::XQueryTokenizer::commenceScanOnly() [virtual]
Switches the Tokenizer to only do scanning, and returns complete strings for attribute value templates as opposed to the tokens for the contained expressions.
The current position in the stream is returned. It can be used to later resume regular tokenization.
Implements QPatternist::Tokenizer.
Definition at line 2229 of file qxquerytokenizer.cpp.
Tokenizer::TokenType QPatternist::XQueryTokenizer::consumeComment() [private]
Parses comments: (: comment content :). It recurses to parse nested comments.
It is assumed that the start token for the comment, "(:", has already been parsed.
Typically, call ignoreWhitespace() instead of calling this function directly.
Definition at line 188 of file qxquerytokenizer.cpp.
Referenced by consumeWhitespace().
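The recursion for nested comments can be sketched like this. The free function, its bool return value and the std::string input are assumptions for illustration; the real member returns a token type and also maintains line and column information.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: pos points just past an opening "(:". On success, pos
// is advanced past the matching ":)" and true is returned. False means the
// input ended inside the comment.
bool consumeComment(const std::string &s, std::size_t &pos)
{
    while (pos + 1 < s.size()) {
        if (s[pos] == ':' && s[pos + 1] == ')') {
            pos += 2;                       // end of this comment
            return true;
        }
        if (s[pos] == '(' && s[pos + 1] == ':') {
            pos += 2;
            if (!consumeComment(s, pos))    // recurse for the nested comment
                return false;
        } else {
            ++pos;
        }
    }
    return false;
}
```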
bool QPatternist::XQueryTokenizer::consumeRawWhitespace() [inline, private]
Consumes only whitespace, in the traditional sense. The function exits if non-whitespace is encountered, such as the start of a comment.
Returns true if the end was reached, otherwise false.
Definition at line 246 of file qxquerytokenizer.cpp.
Referenced by nextToken().
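A minimal sketch of this behaviour, assuming that "whitespace in the traditional sense" means space, tab, carriage return and line feed. The free function and its parameters are hypothetical; the real member also updates m_line and m_columnOffset.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: skips whitespace starting at pos. Returns true if the
// end of the input was reached, false if a non-whitespace character stopped it.
bool consumeRawWhitespace(const std::string &data, std::size_t &pos)
{
    while (pos < data.size()) {
        switch (data[pos]) {
        case ' ':
        case '\t':
        case '\n':
        case '\r':
            ++pos;
            break;
        default:
            return false;   // e.g. the '(' that starts a "(:" comment
        }
    }
    return true;
}
```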
TokenType QPatternist::XQueryTokenizer::consumeWhitespace() [inline, private]
Definition at line 274 of file qxquerytokenizer.cpp.
Referenced by nextToken().
const QChar QPatternist::XQueryTokenizer::current() const [inline, private]
Disregarding encoding conversion, equivalent to calling:
Definition at line 76 of file qxquerytokenizer.cpp.
Referenced by attributeAsRaw(), nextToken(), peekCurrent(), tokenizeNCName(), tokenizeNumberLiteral(), and tokenizeStringLiteral().
static Token QPatternist::XQueryTokenizer::error() [inline, static, private]
Definition at line 325 of file qxquerytokenizer.cpp.
Referenced by nextToken(), tokenizeNCName(), tokenizeNumberLiteral(), and tokenizeStringLiteral().
static bool QPatternist::XQueryTokenizer::isDigit(const char ch) [inline, static, private]
Definition at line 330 of file qxquerytokenizer.cpp.
Referenced by nextToken().
static bool QPatternist::XQueryTokenizer::isNCNameBody(const QChar ch) [inline, static, private]
Definition at line 354 of file qxquerytokenizer.cpp.
Referenced by tokenizeNCName().
static bool QPatternist::XQueryTokenizer::isNCNameStart(const QChar ch) [inline, static, private]
Definition at line 336 of file qxquerytokenizer.cpp.
Referenced by nextToken(), tokenizeNCName(), and tokenizeNumberLiteral().
static bool QPatternist::XQueryTokenizer::isOperatorKeyword(const TokenType) [inline, static, private]
Definition at line 406 of file qxquerytokenizer.cpp.
Referenced by nextToken().
static bool QPatternist::XQueryTokenizer::isPhraseKeyword(const TokenType code) [inline, static, private]
Determines whether code is a keyword that is followed by a second keyword. For instance, declare function.
Definition at line 382 of file qxquerytokenizer.cpp.
Referenced by nextToken().
static bool QPatternist::XQueryTokenizer::isTypeToken(const TokenType t) [static, private]
Definition at line 447 of file qxquerytokenizer.cpp.
Referenced by nextToken().
static const TokenMap * QPatternist::XQueryTokenizer::lookupKeyword(const QString &keyword) [inline, static, private]
Definition at line 712 of file qxquerytokenizer.cpp.
Referenced by nextToken().
Token QPatternist::XQueryTokenizer::nextToken(YYLTYPE *const sourceLocator) [virtual]
Implements QPatternist::TokenSource.
Definition at line 2182 of file qxquerytokenizer.cpp.
Token QPatternist::XQueryTokenizer::nextToken() [private]
Definition at line 745 of file qxquerytokenizer.cpp.
Referenced by atEnd(), and nextToken().
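static QString QPatternist::XQueryTokenizer::normalizeEOL(const QString &input, const CharacterSkips &characterSkips) [static, private]
Returns input, normalized according to XQuery 1.0: An XML Query Language, A.2.3 End-of-Line Handling.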
Definition at line 146 of file qxquerytokenizer.cpp.
Referenced by nextToken(), and tokenizeStringLiteral().
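As an illustration, the XML 1.0 style of end-of-line handling that A.2.3 refers to turns each CR LF pair and each lone CR into a single LF. The sketch below applies that rule to a std::string and treats the skip set as indexes of characters to copy through unchanged, which is one reading of the CharacterSkips description; the types and the exact skip semantics are assumptions, not the Qt code.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical stand-in for normalizeEOL(): CR LF and lone CR become LF,
// except for characters whose index is listed in skips.
std::string normalizeEOL(const std::string &input, const std::set<std::size_t> &skips)
{
    std::string result;
    result.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        const char c = input[i];
        if (c != '\r' || skips.count(i)) {
            result += c;                    // ordinary or protected character
            continue;
        }
        result += '\n';                     // CR LF and lone CR both become LF
        if (i + 1 < input.size() && input[i + 1] == '\n' && !skips.count(i + 1))
            ++i;                            // swallow the LF of a CR LF pair
    }
    return result;
}
```

A skip set is useful when a carriage return was written as a character reference in the query, since such a character is meant literally and should survive normalization.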
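char QPatternist::XQueryTokenizer::peekAhead(const int length = 1) const [inline, private]
Returns the character that is length characters ahead of the current position.
Definition at line 317 of file qxquerytokenizer.cpp.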
Referenced by attributeAsRaw(), consumeComment(), consumeRawWhitespace(), consumeWhitespace(), nextToken(), peekForColonColon(), and tokenizeNCNameOrQName().
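A minimal sketch of this kind of lookahead, using a hypothetical Cursor type over a std::string. Returning '\0' past the end of the input and expressing peekCurrent() as the zero-offset case of peekAhead() are assumptions of the sketch, not statements about the Qt implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical cursor over the query text; data and pos stand in for the
// tokenizer's m_data and m_pos.
struct Cursor {
    std::string data;
    std::size_t pos;

    // Character `length` positions ahead of pos, or '\0' past the end.
    char peekAhead(std::size_t length = 1) const
    {
        return pos + length < data.size() ? data[pos + length] : '\0';
    }

    // Assumed here to be the zero-offset case of peekAhead().
    char peekCurrent() const { return peekAhead(0); }
};
```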
char QPatternist::XQueryTokenizer::peekCurrent() const [inline, private]
Returns the character at the current position, converted to ASCII.
Equivalent to calling:
Definition at line 84 of file qxquerytokenizer.cpp.
Referenced by attributeAsRaw(), consumeComment(), consumeRawWhitespace(), consumeWhitespace(), nextToken(), tokenizeCharacterReference(), and tokenizeNCNameOrQName().
int QPatternist::XQueryTokenizer::peekForColonColon() const [private]
hadWhitespace is always set to a proper value.
Definition at line 89 of file qxquerytokenizer.cpp.
Referenced by nextToken().
void QPatternist::XQueryTokenizer::popState() [inline, private]
Definition at line 737 of file qxquerytokenizer.cpp.
Referenced by nextToken().
void QPatternist::XQueryTokenizer::pushState(const State state) [inline, private]
Definition at line 727 of file qxquerytokenizer.cpp.
void QPatternist::XQueryTokenizer::pushState() [inline, private]
Same as calling:
Definition at line 732 of file qxquerytokenizer.cpp.
Referenced by nextToken().
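The interplay of setState(), pushState() and popState() can be sketched with a small state machine. The struct, the reduced State enum and the reading of the argument-less pushState() as "push the current state" are assumptions for illustration only.

```cpp
#include <stack>

// Reduced, hypothetical version of the tokenizer's State enum.
enum State { Default, StartTag, ElementContent };

struct StateMachine {
    State current = Default;        // stands in for m_state
    std::stack<State> saved;        // stands in for m_stateStack

    void setState(State s) { current = s; }
    void pushState(State s) { saved.push(s); }
    void pushState() { pushState(current); }    // assumed: saves the current state
    void popState()
    {
        if (saved.empty())
            return;                 // nothing to restore
        current = saved.top();
        saved.pop();
    }
};
```

Saving a state before entering a nested construct, such as an enclosed expression inside element content, lets the tokenizer return to the outer lexical state when the construct ends.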
void QPatternist::XQueryTokenizer::resumeTokenizationFrom(const int position) [virtual]
Resumes regular parsing from position. The tokenizer must be in the scan-only state, which the commenceScanOnly() call transitions to.
The tokenizer will return the token POSITION_SET once after this function has been called.
Implements QPatternist::Tokenizer.
Definition at line 2235 of file qxquerytokenizer.cpp.
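The contract between commenceScanOnly() and resumeTokenizationFrom() can be sketched with a toy tokenizer that emits one token per character. The class, its token types other than POSITION_SET, and the pending-flag mechanism are all hypothetical; only the observable behaviour (a returned position, and POSITION_SET delivered once after resuming) follows the descriptions on this page.

```cpp
#include <cstddef>
#include <string>

enum TokenType { POSITION_SET, OTHER, END_OF_FILE };

// Hypothetical toy tokenizer illustrating scan-only mode and resumption.
class ScanOnlySketch {
public:
    explicit ScanOnlySketch(const std::string &query) : m_data(query) {}

    // Enters scan-only mode and reports the current position.
    int commenceScanOnly()
    {
        m_scanOnly = true;
        return static_cast<int>(m_pos);
    }

    // Leaves scan-only mode and rewinds to a previously reported position.
    void resumeTokenizationFrom(int position)
    {
        m_scanOnly = false;
        m_pos = static_cast<std::size_t>(position);
        m_positionPending = true;   // POSITION_SET is delivered exactly once
    }

    TokenType nextToken()
    {
        if (m_positionPending) {
            m_positionPending = false;
            return POSITION_SET;
        }
        if (m_pos >= m_data.size())
            return END_OF_FILE;
        ++m_pos;
        return OTHER;
    }

    bool isScanOnly() const { return m_scanOnly; }

private:
    std::string m_data;
    std::size_t m_pos = 0;
    bool m_scanOnly = false;
    bool m_positionPending = false;
};
```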
int QPatternist::XQueryTokenizer::scanUntil(const char *const content) [private]
Advances m_pos until content is encountered.
Returns the length of the stretch from the starting value of m_pos until content is encountered. content is not included in the length.
Definition at line 593 of file qxquerytokenizer.cpp.
Referenced by nextToken().
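A minimal sketch of this scan over a std::string. The free function, the explicit pos parameter and the -1 result when content never occurs are assumptions; the description above does not say what happens when the end of the input is reached first.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for scanUntil(): advances pos to the first occurrence
// of content and returns how many characters were skipped, or -1 if content
// does not occur (in which case pos is left unchanged).
long scanUntil(const std::string &data, std::size_t &pos, const std::string &content)
{
    const std::size_t hit = data.find(content, pos);
    if (hit == std::string::npos)
        return -1;
    const long length = static_cast<long>(hit - pos);
    pos = hit;                      // content itself is not consumed
    return length;
}
```

For example, when pos sits just after "<!--", scanning until "-->" yields the length of the comment body.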
void QPatternist::XQueryTokenizer::setParserContext(const ParserContext::Ptr &parseInfo) [virtual]
Does nothing.
Implements QPatternist::Tokenizer.
Definition at line 2241 of file qxquerytokenizer.cpp.
void QPatternist::XQueryTokenizer::setState(const State s) [inline, private]
Definition at line 722 of file qxquerytokenizer.cpp.
Referenced by attributeAsRaw(), nextToken(), tokenAndChangeState(), and tokenizeNumberLiteral().
State QPatternist::XQueryTokenizer::state() const [inline, private]
Token QPatternist::XQueryTokenizer::tokenAndAdvance(const TokenType code, const int advance = 1) [inline, private]
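Token QPatternist::XQueryTokenizer::tokenAndChangeState(const TokenType code, const State state, const int advance = 1) [inline, private]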
Definition at line 120 of file qxquerytokenizer.cpp.
Referenced by nextToken().
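Token QPatternist::XQueryTokenizer::tokenAndChangeState(const TokenType code, const QString &value, const State state) [inline, private]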
Definition at line 130 of file qxquerytokenizer.cpp.
QString QPatternist::XQueryTokenizer::tokenizeCharacterReference() [private]
Definition at line 532 of file qxquerytokenizer.cpp.
Referenced by attributeAsRaw(), nextToken(), and tokenizeStringLiteral().
Token QPatternist::XQueryTokenizer::tokenizeNCName() [inline, private]
Definition at line 673 of file qxquerytokenizer.cpp.
Referenced by nextToken(), and tokenizeNCNameOrQName().
Token QPatternist::XQueryTokenizer::tokenizeNCNameOrQName() [inline, private]
Definition at line 469 of file qxquerytokenizer.cpp.
Referenced by nextToken().
Token QPatternist::XQueryTokenizer::tokenizeNumberLiteral() [inline, private]
Definition at line 489 of file qxquerytokenizer.cpp.
Referenced by nextToken().
Token QPatternist::XQueryTokenizer::tokenizeStringLiteral() [inline, private]
Definition at line 623 of file qxquerytokenizer.cpp.
Referenced by nextToken().
QHash<QString, QChar> QPatternist::XQueryTokenizer::m_charRefs
Definition at line 321 of file qxquerytokenizer_p.h.
Referenced by charForReference().
int QPatternist::XQueryTokenizer::m_columnOffset [private]
The offset into m_data at which the current line starts, so m_pos - m_columnOffset gives the current column.
The line number and column number both start at 1.
Definition at line 317 of file qxquerytokenizer_p.h.
Referenced by consumeComment(), consumeRawWhitespace(), consumeWhitespace(), and nextToken().
const QString QPatternist::XQueryTokenizer::m_data [private]
Definition at line 297 of file qxquerytokenizer_p.h.
Referenced by aheadEquals(), current(), nextToken(), peekAhead(), peekForColonColon(), scanUntil(), tokenizeCharacterReference(), tokenizeNCName(), tokenizeNCNameOrQName(), and tokenizeNumberLiteral().
const int QPatternist::XQueryTokenizer::m_length [private]
Definition at line 298 of file qxquerytokenizer_p.h.
Referenced by aheadEquals(), atEnd(), consumeComment(), consumeRawWhitespace(), consumeWhitespace(), current(), nextToken(), peekAhead(), peekForColonColon(), tokenizeNCName(), tokenizeNumberLiteral(), and tokenizeStringLiteral().
int QPatternist::XQueryTokenizer::m_line [private]
The current line number.
The line number and column number both start at 1.
Definition at line 308 of file qxquerytokenizer_p.h.
Referenced by consumeComment(), consumeRawWhitespace(), consumeWhitespace(), and nextToken().
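How a line counter and a line-start offset combine into a 1-based source location can be sketched as below. The Position struct and the locate() function are hypothetical, and the sketch recomputes both values from scratch, whereas a tokenizer would update them incrementally while consuming whitespace and comments.

```cpp
#include <cstddef>
#include <string>

struct Position {
    int line;
    int column;
};

// Hypothetical helper: computes the 1-based line and column of index pos in
// data by tracking where the current line starts.
Position locate(const std::string &data, std::size_t pos)
{
    int line = 1;
    std::size_t columnOffset = 0;   // index at which the current line starts
    for (std::size_t i = 0; i < pos && i < data.size(); ++i) {
        if (data[i] == '\n') {
            ++line;
            columnOffset = i + 1;
        }
    }
    return { line, static_cast<int>(pos - columnOffset) + 1 };
}
```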
const NamePool::Ptr QPatternist::XQueryTokenizer::m_namePool [private]
Definition at line 319 of file qxquerytokenizer_p.h.
int QPatternist::XQueryTokenizer::m_pos [private]
Definition at line 301 of file qxquerytokenizer_p.h.
Referenced by aheadEquals(), atEnd(), attributeAsRaw(), commenceScanOnly(), consumeComment(), consumeRawWhitespace(), consumeWhitespace(), current(), nextToken(), peekAhead(), peekForColonColon(), resumeTokenizationFrom(), scanUntil(), tokenAndAdvance(), tokenAndChangeState(), tokenizeCharacterReference(), tokenizeNCName(), tokenizeNCNameOrQName(), tokenizeNumberLiteral(), and tokenizeStringLiteral().
bool QPatternist::XQueryTokenizer::m_scanOnly [private]
Definition at line 322 of file qxquerytokenizer_p.h.
Referenced by commenceScanOnly(), nextToken(), and resumeTokenizationFrom().
State QPatternist::XQueryTokenizer::m_state [private]
Definition at line 299 of file qxquerytokenizer_p.h.
Referenced by popState(), pushState(), setState(), and state().
QStack<State> QPatternist::XQueryTokenizer::m_stateStack
Definition at line 300 of file qxquerytokenizer_p.h.
Referenced by popState(), and pushState().
QStack<Token> QPatternist::XQueryTokenizer::m_tokenStack
Definition at line 320 of file qxquerytokenizer_p.h.
Referenced by nextToken().