Class SimpleXMLParser
java.lang.Object
com.itextpdf.text.xml.simpleparser.SimpleXMLParser
A simple XML parser. Like the SAX parser, it is an event-based parser, but with much less functionality.
The parser:
- recognizes the encoding used
- recognizes all the elements' start tags and end tags
- lists attributes, where attribute values can be enclosed in single or double quotes
- recognizes the <![CDATA[ ... ]]> construct
- recognizes the standard entities &amp;, &lt;, &gt;, &quot;, and &apos;, as well as numeric entities
- maps line endings \r\n and \r to \n on input, in accordance with the XML Specification, Section 2.11
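The line-ending rule in the last item can be illustrated with a minimal sketch. This is not the iText source: the class name NewlineNormalizerSketch and its normalize method are hypothetical, and the boolean flag is only assumed to play a role similar to the eol field listed below.

```java
/** Hypothetical helper, not part of iText: normalizes line endings as described in XML 1.0, Section 2.11. */
final class NewlineNormalizerSketch {

    /** Maps "\r\n" and a lone "\r" to "\n"; all other characters pass through unchanged. */
    static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean previousWasCr = false; // assumption: similar in spirit to the eol flag
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                out.append('\n');       // a CR always yields exactly one newline
                previousWasCr = true;
            } else if (c == '\n') {
                if (!previousWasCr) {
                    out.append('\n');   // bare LF
                }                       // an LF right after a CR was already emitted
                previousWasCr = false;
            } else {
                out.append(c);
                previousWasCr = false;
            }
        }
        return out.toString();
    }
}
```

Emitting the newline as soon as the CR is seen means the sketch never has to look ahead; it only has to remember whether the previous character was a CR.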
Field Summary
| Modifier and Type | Field | Description |
|---|---|---|
| private static final int | ATTRIBUTE_EQUAL | |
| private static final int | ATTRIBUTE_KEY | |
| private static final int | ATTRIBUTE_VALUE | |
| private String | attributekey | the attribute key. |
| | attributes | current attributes |
| private String | attributevalue | the attribute value. |
| private static final int | CDATA | |
| private int | character | The current character. |
| private int | columns | the column where the current character occurs |
| private final SimpleXMLDocHandlerComment | comment | The handler to which we are going to forward comments. |
| private static final int | COMMENT | |
| private final SimpleXMLDocHandler | doc | The handler to which we are going to forward document content |
| private final StringBuffer | entity | current entity (whatever is encountered between & and ;) |
| private static final int | ENTITY | |
| private boolean | eol | was the last character equivalent to a newline? |
| private static final int | EXAMIN_TAG | |
| private final boolean | html | Are we parsing HTML? |
| private static final int | IN_CLOSETAG | |
| private int | lines | the line we are currently reading |
| private int | nested | Keeps track of the number of tags that are open. |
| private NewLineHandler | newLineHandler | |
| private boolean | nowhite | A boolean indicating if the next character should be taken into account if it's a space character. |
| private static final int | PI | |
| private int | previousCharacter | The previous character. |
| private static final int | QUOTE | |
| private int | quoteCharacter | the quote character that was used to open the quote. |
| private static final int | SINGLE_TAG | |
| | stack | the state stack |
| private int | state | the current state |
| private String | tag | current tagname |
| private static final int | TAG_ENCOUNTERED | |
| private static final int | TAG_EXAMINED | |
| private final StringBuffer | text | current text (whatever is encountered between tags) |
| private static final int | TEXT | |
| private static final int | UNKNOWN | possible states |
Constructor Summary
| Modifier | Constructor | Description |
|---|---|---|
| private | SimpleXMLParser(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, boolean html) | Creates a Simple XML parser object. |
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| private void | doTag() | Sets the name of the tag. |
| static String | escapeXML(String s, boolean onlyASCII) | Deprecated. |
| private void | flush() | Flushes the text that is currently in the buffer. |
| private static String | getDeclaredEncoding(String decl) | |
| private void | go(BufferedReader) | Does the actual parsing. |
| private void | initTag() | Initializes the tag name and attributes. |
| static void | parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, Reader r, boolean html) | Parses the XML document firing the events to the handler. |
| static void | parse(SimpleXMLDocHandler doc, InputStream in) | Parses the XML document firing the events to the handler. |
| static void | parse(SimpleXMLDocHandler doc, Reader r) | |
| private void | processTag(boolean start) | Processes the tag. |
| private int | restoreState() | Gets a state from the stack. |
| private void | saveState(int s) | Adds a state to the stack. |
| private void | throwException | Throws an exception. |
Field Details
UNKNOWN
private static final int UNKNOWN
possible states

TEXT
private static final int TEXT

TAG_ENCOUNTERED
private static final int TAG_ENCOUNTERED

EXAMIN_TAG
private static final int EXAMIN_TAG

TAG_EXAMINED
private static final int TAG_EXAMINED

IN_CLOSETAG
private static final int IN_CLOSETAG

SINGLE_TAG
private static final int SINGLE_TAG

CDATA
private static final int CDATA

COMMENT
private static final int COMMENT

PI
private static final int PI

ENTITY
private static final int ENTITY

QUOTE
private static final int QUOTE

ATTRIBUTE_KEY
private static final int ATTRIBUTE_KEY

ATTRIBUTE_EQUAL
private static final int ATTRIBUTE_EQUAL

ATTRIBUTE_VALUE
private static final int ATTRIBUTE_VALUE
stack
the state stack

character
private int character
The current character.

previousCharacter
private int previousCharacter
The previous character.

lines
private int lines
the line we are currently reading

columns
private int columns
the column where the current character occurs

eol
private boolean eol
was the last character equivalent to a newline?

nowhite
private boolean nowhite
A boolean indicating if the next character should be taken into account if it's a space character. When nowhite is false, the previous character wasn't whitespace.
Since:
2.1.5

state
private int state
the current state

html
private final boolean html
Are we parsing HTML?

text
private final StringBuffer text
current text (whatever is encountered between tags)

entity
private final StringBuffer entity
current entity (whatever is encountered between & and ;)
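How the collected entity name might be turned back into text can be sketched as follows. This is an illustration only, not the iText source: EntityDecoderSketch, decode, and decodeAll are hypothetical names, and leaving unrecognized entities untouched is an assumption rather than documented behavior.

```java
/** Hypothetical helper, not part of iText: resolves entity names collected between '&' and ';'. */
final class EntityDecoderSketch {

    /** Returns the replacement text for an entity name, or null if it is not recognized. */
    static String decode(String name) {
        switch (name) {
            case "amp":  return "&";
            case "lt":   return "<";
            case "gt":   return ">";
            case "quot": return "\"";
            case "apos": return "'";
            default:     break;
        }
        if (name.startsWith("#")) {
            boolean hex = name.startsWith("#x");
            int radix = hex ? 16 : 10;
            String digits = name.substring(hex ? 2 : 1);
            // Reject empty or signed input such as "#+65" before parsing.
            if (digits.isEmpty() || Character.digit(digits.charAt(0), radix) < 0) {
                return null;
            }
            try {
                int codePoint = Integer.parseInt(digits, radix);
                if (Character.isValidCodePoint(codePoint)) {
                    return new String(Character.toChars(codePoint));
                }
            } catch (NumberFormatException e) {
                // not a usable numeric entity; fall through to null
            }
        }
        return null;
    }

    /** Replaces every recognized entity in s; unrecognized ones are copied through as-is. */
    static String decodeAll(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int end = (c == '&') ? s.indexOf(';', i + 1) : -1;
            String replacement = (end > 0) ? decode(s.substring(i + 1, end)) : null;
            if (replacement != null) {
                out.append(replacement);
                i = end + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```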
tag
private String tag
current tagname

attributes
current attributes

doc
private final SimpleXMLDocHandler doc
The handler to which we are going to forward document content

comment
private final SimpleXMLDocHandlerComment comment
The handler to which we are going to forward comments.

nested
private int nested
Keeps track of the number of tags that are open.

quoteCharacter
private int quoteCharacter
the quote character that was used to open the quote.
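Remembering the opening quote matters because a value opened with one kind of quote may contain the other kind. The following is a minimal sketch of that idea, not the parser's actual state machine: AttributeSketch and its parse method are hypothetical, and stopping silently at the first malformed pair is an assumption made to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of iText: reads key="value" or key='value' pairs. */
final class AttributeSketch {

    /** Parses the attribute portion of a start tag; stops at the first malformed pair. */
    static Map<String, String> parse(String s) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int i = 0;
        int n = s.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(s.charAt(i))) i++;
            int keyStart = i;
            while (i < n && s.charAt(i) != '=' && !Character.isWhitespace(s.charAt(i))) i++;
            String key = s.substring(keyStart, i);
            while (i < n && Character.isWhitespace(s.charAt(i))) i++;
            if (key.isEmpty() || i >= n || s.charAt(i) != '=') break;
            i++; // skip '='
            while (i < n && Character.isWhitespace(s.charAt(i))) i++;
            if (i >= n) break;
            char quoteCharacter = s.charAt(i); // the quote that opened the value
            if (quoteCharacter != '"' && quoteCharacter != '\'') break;
            // Only the same quote character closes the value.
            int close = s.indexOf(quoteCharacter, i + 1);
            if (close < 0) break;
            attrs.put(key, s.substring(i + 1, close));
            i = close + 1;
        }
        return attrs;
    }
}
```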
attributekey
private String attributekey
the attribute key.

attributevalue
private String attributevalue
the attribute value.

newLineHandler
private NewLineHandler newLineHandler
Constructor Details
SimpleXMLParser
private SimpleXMLParser(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, boolean html)
Creates a Simple XML parser object. Call go(BufferedReader) immediately after creation.
Method Details
go
Does the actual parsing. Perform this immediately after creating the parser object.
Throws:
IOException

restoreState
private int restoreState()
Gets a state from the stack.
Returns:
the previous state

saveState
private void saveState(int s)
Adds a state to the stack.
Parameters:
s - a state to add to the stack
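The save and restore pair lets the parser step into a nested construct, such as an entity inside text, and resume where it left off. A minimal sketch of that pattern, not the iText source: the class StateStackSketch and its enter and leave helpers are hypothetical, the numeric constant values are assumptions, and so is the fallback to UNKNOWN when the stack is empty.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch, not the iText source: a state machine that remembers where to return. */
final class StateStackSketch {
    // The constant names come from the field list; the numeric values are assumptions.
    static final int UNKNOWN = 0;
    static final int TEXT = 1;
    static final int ENTITY = 10;

    private final Deque<Integer> stack = new ArrayDeque<>();
    int state = UNKNOWN;

    /** Adds a state to the stack. */
    void saveState(int s) {
        stack.push(s);
    }

    /** Gets the most recently saved state; assumed to fall back to UNKNOWN when empty. */
    int restoreState() {
        return stack.isEmpty() ? UNKNOWN : stack.pop();
    }

    /** Enters a nested construct, remembering the state to come back to. */
    void enter(int nestedState) {
        saveState(state);
        state = nestedState;
    }

    /** Leaves the nested construct and resumes the saved state. */
    void leave() {
        state = restoreState();
    }
}
```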
flush
private void flush()
Flushes the text that is currently in the buffer. Depending on the current state, the text can be ignored, added to the document as content, or added as a comment, among other possibilities.

initTag
private void initTag()
Initializes the tag name and attributes.

doTag
private void doTag()
Sets the name of the tag.

processTag
private void processTag(boolean start)
Processes the tag.
Parameters:
start - if true, we are dealing with a tag that has just been opened; if false, we are closing a tag.

throwException
Throws an exception.
Throws:
IOException
parse
public static void parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, Reader r, boolean html) throws IOException
Parses the XML document firing the events to the handler.
Parameters:
doc - the document handler
comment - the comment handler
r - the document. The encoding is already resolved. The reader is not closed
html - true if the input is to be parsed as HTML
Throws:
IOException - on error

parse
public static void parse(SimpleXMLDocHandler doc, InputStream in) throws IOException
Parses the XML document firing the events to the handler.
Parameters:
doc - the document handler
in - the document. The encoding is deduced from the stream. The stream is not closed
Throws:
IOException - on error

getDeclaredEncoding
private static String getDeclaredEncoding(String decl)
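The page gives no description for getDeclaredEncoding. Judging only by its name and by the note that the InputStream overload deduces the encoding from the stream, one plausible reading is that it extracts the encoding value from an XML declaration. The sketch below illustrates that reading and nothing more; DeclaredEncodingSketch and declaredEncoding are hypothetical names, and returning null when no encoding is declared is an assumption.

```java
/** Hypothetical sketch, not the iText source: pulls the encoding out of an XML declaration. */
final class DeclaredEncodingSketch {

    /** Returns the quoted value following "encoding", or null if none is found. */
    static String declaredEncoding(String decl) {
        if (decl == null) {
            return null;
        }
        int idx = decl.indexOf("encoding");
        if (idx < 0) {
            return null;
        }
        int dq = decl.indexOf('"', idx);
        int sq = decl.indexOf('\'', idx);
        if (dq < 0 && sq < 0) {
            return null; // no opening quote after the keyword
        }
        // Use whichever quote appears first after the keyword.
        boolean useDouble = dq >= 0 && (sq < 0 || dq < sq);
        int open = useDouble ? dq : sq;
        char quote = useDouble ? '"' : '\'';
        int close = decl.indexOf(quote, open + 1);
        if (close < 0) {
            return null; // unterminated value
        }
        return decl.substring(open + 1, close);
    }
}
```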
parse
public static void parse(SimpleXMLDocHandler doc, Reader r) throws IOException
Parameters:
doc
r
Throws:
IOException
escapeXML
public static String escapeXML(String s, boolean onlyASCII)
Deprecated. Moved to XMLUtil.escapeXML(String, boolean); left here for the sake of backwards compatibility.
Escapes a string with the appropriate XML codes.
Parameters:
s - the string to be escaped
onlyASCII - if true, codes above 127 will always be escaped with &#nn;
Returns:
the escaped string
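The escaping behavior described above can be sketched as follows. This is an illustration under assumptions, not the iText implementation: EscapeXmlSketch is a hypothetical class, the choice of which five characters to replace follows the standard entities listed in the class description, and any filtering of characters that are invalid in XML is left out.

```java
/** Hypothetical sketch, not the iText source: escapes text for use in XML. */
final class EscapeXmlSketch {

    /** Escapes the five standard characters and, if onlyASCII is true, every code above 127. */
    static String escapeXML(String s, boolean onlyASCII) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int c = s.codePointAt(i);
            i += Character.charCount(c);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (onlyASCII && c > 127) {
                        sb.append("&#").append(c).append(';'); // decimal character reference
                    } else {
                        sb.appendCodePoint(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

Iterating by code point rather than by char keeps characters outside the Basic Multilingual Plane intact as a single numeric reference.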