Class XMLEncodingDetector
- java.lang.Object
  - org.apache.manifoldcf.connectorcommon.fuzzyml.CharacterReceiver
    - org.apache.manifoldcf.connectorcommon.fuzzyml.SingleCharacterReceiver
      - org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState
        - org.apache.manifoldcf.connectorcommon.fuzzyml.XMLParseState
          - org.apache.manifoldcf.connectorcommon.fuzzyml.XMLEncodingDetector
- All Implemented Interfaces:
EncodingDetector
public class XMLEncodingDetector extends XMLParseState implements EncodingDetector
This is the XML encoding detector. It looks for the <?xml ... ?> tag in the document preamble and parses that tag for its "encoding" attribute. It stops as soon as it finds the tag, or as soon as the preamble can no longer appear, whichever comes first.
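The idea described above can be illustrated with a minimal, self-contained sketch. It does not use the fuzzyml parser and is not the ManifoldCF implementation: the class name XmlDeclarationSketch, the detect method, and the regular-expression approach are all assumptions made for illustration. It also assumes the declaration, if present, sits at the very start of the text (optionally after a byte-order-mark character).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch only; not the ManifoldCF implementation. */
final class XmlDeclarationSketch {
  // An <?xml ... ?> declaration at the very start of the text (optional BOM character first).
  private static final Pattern XML_DECL =
      Pattern.compile("^\\uFEFF?<\\?xml\\s([^?]*)\\?>");

  // encoding="..." or encoding='...' inside the declaration body.
  private static final Pattern ENCODING_ATTR =
      Pattern.compile("\\bencoding\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

  /**
   * Returns the encoding named in the declaration, or initialEncoding
   * when there is no declaration or it has no encoding attribute.
   */
  static String detect(String documentStart, String initialEncoding) {
    Matcher decl = XML_DECL.matcher(documentStart);
    if (!decl.find()) {
      return initialEncoding; // no declaration where one would have to be
    }
    Matcher attr = ENCODING_ATTR.matcher(decl.group(1));
    if (!attr.find()) {
      return initialEncoding; // declaration present, but no encoding attribute
    }
    String doubleQuoted = attr.group(2);
    return doubleQuoted != null ? doubleQuoted : attr.group(3);
  }

  public static void main(String[] args) {
    String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>";
    System.out.println(detect(doc, "utf-8"));
  }
}
```

Unlike this sketch, a streaming detector receives characters one at a time and therefore cannot apply a regular expression to a whole prefix; the sketch only shows what is being looked for.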
-
-
Field Summary
protected java.lang.String encoding
-
Fields inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState
accumBuffer, ampBuffer, bTagDepth, currentAttrList, currentAttrName, currentAttrNameBuffer, currentState, currentTagName, currentTagNameBuffer, currentValueBuffer, inAmpersand, mapLookup, TAGPARSESTATE_IN_ATTR_LOOKING_FOR_VALUE, TAGPARSESTATE_IN_ATTR_NAME, TAGPARSESTATE_IN_ATTR_VALUE, TAGPARSESTATE_IN_BANG_TOKEN, TAGPARSESTATE_IN_BRACKET_TOKEN, TAGPARSESTATE_IN_CDATA_BODY, TAGPARSESTATE_IN_COMMENT, TAGPARSESTATE_IN_DOUBLE_QUOTES_ATTR_VALUE, TAGPARSESTATE_IN_END_TAG_NAME, TAGPARSESTATE_IN_QTAG_ATTR_LOOKING_FOR_VALUE, TAGPARSESTATE_IN_QTAG_ATTR_NAME, TAGPARSESTATE_IN_QTAG_ATTR_VALUE, TAGPARSESTATE_IN_QTAG_DOUBLE_QUOTES_ATTR_VALUE, TAGPARSESTATE_IN_QTAG_NAME, TAGPARSESTATE_IN_QTAG_SAW_QUESTION, TAGPARSESTATE_IN_QTAG_SINGLE_QUOTES_ATTR_VALUE, TAGPARSESTATE_IN_QTAG_UNQUOTED_ATTR_VALUE, TAGPARSESTATE_IN_SINGLE_QUOTES_ATTR_VALUE, TAGPARSESTATE_IN_TAG_NAME, TAGPARSESTATE_IN_TAG_SAW_SLASH, TAGPARSESTATE_IN_UNQUOTED_ATTR_VALUE, TAGPARSESTATE_IN_UNQUOTED_ATTR_VALUE_SAW_SLASH, TAGPARSESTATE_NEED_FINAL_BRACKET, TAGPARSESTATE_NORMAL, TAGPARSESTATE_SAWCOMMENTDASH, TAGPARSESTATE_SAWDASH, TAGPARSESTATE_SAWEXCLAMATION, TAGPARSESTATE_SAWLEFTANGLE, TAGPARSESTATE_SAWRIGHTBRACKET, TAGPARSESTATE_SAWSECONDCOMMENTDASH, TAGPARSESTATE_SAWSECONDRIGHTBRACKET
-
Fields inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.SingleCharacterReceiver
charBuffer
-
-
Constructor Summary
XMLEncodingDetector()
Constructor.
-
Method Summary
java.lang.String getEncoding()
Retrieve final encoding determination.
protected boolean noteBTag(java.lang.String tagName)
This method is called for every <! <token> ... > construct, or 'btag'.
protected boolean noteBTagToken(java.lang.String token)
This method gets called for every token inside a btag.
protected boolean noteEndBTag()
This method is called for the end of every btag, or any time there's a naked '>' in the document.
protected boolean noteEndEscaped()
Called for the end of every cdata-like tag.
protected boolean noteEndTag(java.lang.String tagName)
This method gets called for every end tag.
protected boolean noteEscaped(java.lang.String token)
Called for the start of every cdata-like tag, e.g. <![ <token> [ ... ]]>
protected boolean noteEscapedCharacter(char thisChar)
This method gets called for every character that is found within an escape block, e.g. CDATA.
protected boolean noteNormalCharacter(char thisChar)
This method gets called for every character that is not part of a tag etc.
protected boolean noteQTag(java.lang.String tagName, java.util.Map<java.lang.String,java.lang.String> attributes)
Map version of noteQTag method.
protected boolean noteTag(java.lang.String tagName, java.util.Map<java.lang.String,java.lang.String> attributes)
Map version of the noteTag method.
void setEncoding(java.lang.String encoding)
Set initial encoding.
-
Methods inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.XMLParseState
noteQTag, noteTag
-
Methods inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState
acceptNewTag, attributeDecode, dealWithCharacter, dumpValues, isPunctuation, isWhitespace, mapChunk, newBuffer, outputAmpBuffer
-
Methods inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.SingleCharacterReceiver
dealWithCharacters, dealWithRemainder
-
Methods inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.CharacterReceiver
finishUp
-
-
-
-
Method Detail
-
setEncoding
public void setEncoding(java.lang.String encoding)
Set initial encoding.
- Specified by:
setEncoding in interface EncodingDetector
-
getEncoding
public java.lang.String getEncoding()
Retrieve final encoding determination.
- Specified by:
getEncoding in interface EncodingDetector
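Because getEncoding() returns the determination as a plain String, a caller that wants a java.nio.charset.Charset has to cope with names the JVM does not recognize. The following is a minimal sketch of one way to do that, using only the JDK; the class CharsetResolutionSketch and its resolve method are hypothetical names and are not part of ManifoldCF.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

/** Hypothetical helper; not part of ManifoldCF. */
final class CharsetResolutionSketch {
  /**
   * Turns a detected encoding name into a Charset, returning the
   * fallback when the name is missing, malformed, or unsupported.
   */
  static Charset resolve(String detectedName, Charset fallback) {
    if (detectedName == null || detectedName.trim().isEmpty()) {
      return fallback; // nothing was detected
    }
    try {
      return Charset.forName(detectedName.trim());
    } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
      return fallback; // the document declared a name this JVM cannot use
    }
  }

  public static void main(String[] args) {
    System.out.println(resolve("ISO-8859-1", StandardCharsets.UTF_8));
    System.out.println(resolve("no-such-charset-x", StandardCharsets.UTF_8));
  }
}
```

Falling back silently is a design choice of this sketch; a caller that needs to know about a bad declaration could log or rethrow instead.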
-
noteTag
protected boolean noteTag(java.lang.String tagName, java.util.Map<java.lang.String,java.lang.String> attributes) throws ManifoldCFException
Map version of the noteTag method.
- Overrides:
noteTag in class XMLParseState
- Returns:
true to halt further processing.
- Throws:
ManifoldCFException
-
noteEndTag
protected boolean noteEndTag(java.lang.String tagName) throws ManifoldCFException
This method gets called for every end tag. Override this method to intercept tag ends.
- Overrides:
noteEndTag in class TagParseState
- Returns:
true to halt further processing.
- Throws:
ManifoldCFException
-
noteQTag
protected boolean noteQTag(java.lang.String tagName, java.util.Map<java.lang.String,java.lang.String> attributes) throws ManifoldCFException
Map version of noteQTag method.
- Overrides:
noteQTag in class XMLParseState
- Returns:
true to halt further processing.
- Throws:
ManifoldCFException
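The callback shape documented here (a tag name plus an attribute map in, true out to halt) can be sketched in isolation. Everything in the block below is an assumption for illustration: the class QTagCallbackSketch does not extend the real parser, and the sketch assumes the tag name arrives as "xml" without the question mark and that the attribute key is exactly "encoding". It halts on any <? ... ?> tag, on the reasoning that once another such tag has come first, the preamble tag can no longer appear; the actual class may behave differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-alone sketch; does not extend the real XMLParseState. */
final class QTagCallbackSketch {
  private String encoding;

  QTagCallbackSketch(String initialEncoding) {
    this.encoding = initialEncoding; // plays the role of setEncoding(...)
  }

  /** Same shape as the documented callback: returns true to halt further processing. */
  boolean noteQTag(String tagName, Map<String, String> attributes) {
    if ("xml".equals(tagName)) { // assumption: name is passed without the '?'
      String declared = attributes.get("encoding");
      if (declared != null && !declared.isEmpty()) {
        encoding = declared; // the declaration overrides the initial value
      }
    }
    // Either the declaration was just seen, or a different <? ... ?> tag
    // came first; in both cases there is nothing more to learn.
    return true;
  }

  String getEncoding() {
    return encoding;
  }

  public static void main(String[] args) {
    QTagCallbackSketch sketch = new QTagCallbackSketch("utf-8");
    Map<String, String> attrs = new HashMap<>();
    attrs.put("version", "1.0");
    attrs.put("encoding", "UTF-16");
    boolean halt = sketch.noteQTag("xml", attrs);
    System.out.println(halt + " " + sketch.getEncoding());
  }
}
```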
-
noteBTag
protected boolean noteBTag(java.lang.String tagName) throws ManifoldCFException
This method is called for every <! <token> ... > construct, or 'btag'. Override it to intercept these.
- Overrides:
noteBTag in class TagParseState
- Returns:
true to halt further processing.
- Throws:
ManifoldCFException
-
noteEndBTag
protected boolean noteEndBTag() throws ManifoldCFException
This method is called for the end of every btag, or any time there's a naked '>' in the document. Override it if you want to intercept these.
- Overrides:
noteEndBTag in class TagParseState
- Returns:
true to halt further processing.
- Throws:
ManifoldCFException
-
noteEscaped
protected boolean noteEscaped(java.lang.String token) throws ManifoldCFException
Called for the start of every cdata-like tag, e.g. <![ <token> [ ... ]]>
- Overrides:
noteEscaped in class TagParseState
- Parameters:
token - may be empty.
- Returns:
true to halt further processing.
- Throws:
ManifoldCFException
-
noteEndEscaped
protected boolean noteEndEscaped() throws ManifoldCFException
Called for the end of every cdata-like tag.
- Overrides:
noteEndEscaped in class TagParseState
- Returns:
true to halt further processing.
- Throws:
ManifoldCFException
-
noteBTagToken
protected boolean noteBTagToken(java.lang.String token) throws ManifoldCFException
This method gets called for every token inside a btag.
- Overrides:
noteBTagToken in class TagParseState
- Returns:
true to halt further processing.
- Throws:
ManifoldCFException
-
noteNormalCharacter
protected boolean noteNormalCharacter(char thisChar) throws ManifoldCFException
This method gets called for every character that is not part of a tag etc. Override this method to intercept such characters.
- Overrides:
noteNormalCharacter in class TagParseState
- Returns:
true to halt further processing.
- Throws:
ManifoldCFException
-
noteEscapedCharacter
protected boolean noteEscapedCharacter(char thisChar) throws ManifoldCFException
This method gets called for every character that is found within an escape block, e.g. CDATA. Override this method to intercept such characters.
- Overrides:
noteEscapedCharacter in class TagParseState
- Returns:
true to halt further processing.
- Throws:
ManifoldCFException
-
-