Class XMLEncodingDetector
java.lang.Object
  org.apache.manifoldcf.connectorcommon.fuzzyml.CharacterReceiver
    org.apache.manifoldcf.connectorcommon.fuzzyml.SingleCharacterReceiver
      org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState
        org.apache.manifoldcf.connectorcommon.fuzzyml.XMLParseState
          org.apache.manifoldcf.connectorcommon.fuzzyml.XMLEncodingDetector
All Implemented Interfaces:
EncodingDetector
public class XMLEncodingDetector extends XMLParseState implements EncodingDetector
This is the XML encoding detector. It looks for the <?xml ... ?> tag in the document preamble and parses that tag for the "encoding" attribute. It stops as soon as it finds the tag, or once it is past any point where the preamble could still appear, whichever comes first.
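The idea described above can be illustrated with a small standalone sketch. This is not the ManifoldCF implementation, which is a character-by-character state machine. The class name XmlDeclSketch, the method declaredEncoding, and the use of regular expressions are all assumptions made for illustration only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: a regex-based stand-in, not the ManifoldCF state machine. */
final class XmlDeclSketch {
    // An <?xml ... ?> tag at the very start of the text (leading whitespace tolerated).
    private static final Pattern DECL =
        Pattern.compile("^\\s*<\\?xml\\s([^?]*)\\?>");
    // An encoding attribute with a double-quoted or single-quoted value.
    private static final Pattern ENCODING =
        Pattern.compile("\\bencoding\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    /** Returns the declared encoding, or empty if no declaration or attribute is found. */
    static Optional<String> declaredEncoding(String head) {
        Matcher decl = DECL.matcher(head);
        if (!decl.find()) {
            return Optional.empty();          // no preamble tag at the start
        }
        Matcher enc = ENCODING.matcher(decl.group(1));
        if (!enc.find()) {
            return Optional.empty();          // tag present, but no encoding attribute
        }
        String value = enc.group(1) != null ? enc.group(1) : enc.group(2);
        return Optional.of(value);
    }

    public static void main(String[] args) {
        System.out.println(declaredEncoding("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"));
        System.out.println(declaredEncoding("<a/>"));
    }
}
```

A regex keeps the sketch short, but it needs the whole head of the document in memory, whereas a streaming detector can stop after a handful of characters.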
-
Field Summary
Fields
protected java.lang.String encoding
Fields inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState
accumBuffer, ampBuffer, bTagDepth, currentAttrList, currentAttrName, currentAttrNameBuffer, currentState, currentTagName, currentTagNameBuffer, currentValueBuffer, inAmpersand, mapLookup, TAGPARSESTATE_IN_ATTR_LOOKING_FOR_VALUE, TAGPARSESTATE_IN_ATTR_NAME, TAGPARSESTATE_IN_ATTR_VALUE, TAGPARSESTATE_IN_BANG_TOKEN, TAGPARSESTATE_IN_BRACKET_TOKEN, TAGPARSESTATE_IN_CDATA_BODY, TAGPARSESTATE_IN_COMMENT, TAGPARSESTATE_IN_DOUBLE_QUOTES_ATTR_VALUE, TAGPARSESTATE_IN_END_TAG_NAME, TAGPARSESTATE_IN_QTAG_ATTR_LOOKING_FOR_VALUE, TAGPARSESTATE_IN_QTAG_ATTR_NAME, TAGPARSESTATE_IN_QTAG_ATTR_VALUE, TAGPARSESTATE_IN_QTAG_DOUBLE_QUOTES_ATTR_VALUE, TAGPARSESTATE_IN_QTAG_NAME, TAGPARSESTATE_IN_QTAG_SAW_QUESTION, TAGPARSESTATE_IN_QTAG_SINGLE_QUOTES_ATTR_VALUE, TAGPARSESTATE_IN_QTAG_UNQUOTED_ATTR_VALUE, TAGPARSESTATE_IN_SINGLE_QUOTES_ATTR_VALUE, TAGPARSESTATE_IN_TAG_NAME, TAGPARSESTATE_IN_TAG_SAW_SLASH, TAGPARSESTATE_IN_UNQUOTED_ATTR_VALUE, TAGPARSESTATE_IN_UNQUOTED_ATTR_VALUE_SAW_SLASH, TAGPARSESTATE_NEED_FINAL_BRACKET, TAGPARSESTATE_NORMAL, TAGPARSESTATE_SAWCOMMENTDASH, TAGPARSESTATE_SAWDASH, TAGPARSESTATE_SAWEXCLAMATION, TAGPARSESTATE_SAWLEFTANGLE, TAGPARSESTATE_SAWRIGHTBRACKET, TAGPARSESTATE_SAWSECONDCOMMENTDASH, TAGPARSESTATE_SAWSECONDRIGHTBRACKET
-
Fields inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.SingleCharacterReceiver
charBuffer
-
Constructor Summary
Constructors
XMLEncodingDetector()
    Constructor.
-
Method Summary
java.lang.String getEncoding()
    Retrieve final encoding determination.
protected boolean noteBTag(java.lang.String tagName)
    This method is called for every <! <token> ... > construct, or 'btag'.
protected boolean noteBTagToken(java.lang.String token)
    This method gets called for every token inside a btag.
protected boolean noteEndBTag()
    This method is called for the end of every btag, or any time there's a naked '>' in the document.
protected boolean noteEndEscaped()
    Called for the end of every cdata-like tag.
protected boolean noteEndTag(java.lang.String tagName)
    This method gets called for every end tag.
protected boolean noteEscaped(java.lang.String token)
    Called for the start of every cdata-like tag, e.g. <![ <token> [ ... ]]>
protected boolean noteEscapedCharacter(char thisChar)
    This method gets called for every character that is found within an escape block, e.g. CDATA.
protected boolean noteNormalCharacter(char thisChar)
    This method gets called for every character that is not part of a tag etc.
protected boolean noteQTag(java.lang.String tagName, java.util.Map<java.lang.String,java.lang.String> attributes)
    Map version of noteQTag method.
protected boolean noteTag(java.lang.String tagName, java.util.Map<java.lang.String,java.lang.String> attributes)
    Map version of the noteTag method.
void setEncoding(java.lang.String encoding)
    Set initial encoding.
Methods inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.XMLParseState
noteQTag, noteTag
-
Methods inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState
acceptNewTag, attributeDecode, dealWithCharacter, dumpValues, isPunctuation, isWhitespace, mapChunk, newBuffer, outputAmpBuffer
-
Methods inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.SingleCharacterReceiver
dealWithCharacters, dealWithRemainder
-
Methods inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.CharacterReceiver
finishUp
-
Method Detail
-
setEncoding
public void setEncoding(java.lang.String encoding)
Set initial encoding.
Specified by:
setEncoding in interface EncodingDetector
-
getEncoding
public java.lang.String getEncoding()
Retrieve final encoding determination.
Specified by:
getEncoding in interface EncodingDetector
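One plausible reading of "initial" and "final" is that the value passed to setEncoding acts as a fallback which getEncoding returns unless detection finds something better. The sketch below illustrates that reading only. FallbackEncodingSketch and its noteDeclaredEncoding hook are hypothetical names, and the fallback behaviour is an assumption that this page does not confirm.

```java
/** Hypothetical stand-in that mirrors only the two methods documented here. */
final class FallbackEncodingSketch {
    // Current best guess; null until a caller sets an initial value.
    private String encoding;

    /** Set initial encoding (assumed to act as the fallback). */
    void setEncoding(String encoding) {
        this.encoding = encoding;
    }

    /** Retrieve final encoding determination. */
    String getEncoding() {
        return encoding;
    }

    /** Hypothetical hook: called when a declared encoding is found in the document. */
    void noteDeclaredEncoding(String declared) {
        if (declared != null && !declared.isEmpty()) {
            encoding = declared;              // a declared value replaces the initial one
        }
    }

    public static void main(String[] args) {
        FallbackEncodingSketch detector = new FallbackEncodingSketch();
        detector.setEncoding("UTF-8");
        System.out.println(detector.getEncoding());
        detector.noteDeclaredEncoding("ISO-8859-1");
        System.out.println(detector.getEncoding());
    }
}
```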
-
noteTag
protected boolean noteTag(java.lang.String tagName, java.util.Map<java.lang.String,java.lang.String> attributes) throws ManifoldCFException
Map version of the noteTag method.
Overrides:
noteTag in class XMLParseState
Returns:
true to halt further processing.
Throws:
ManifoldCFException
-
noteEndTag
protected boolean noteEndTag(java.lang.String tagName) throws ManifoldCFException
This method gets called for every end tag. Override this method to intercept tag ends.
Overrides:
noteEndTag in class TagParseState
Returns:
true to halt further processing.
Throws:
ManifoldCFException
-
noteQTag
protected boolean noteQTag(java.lang.String tagName, java.util.Map<java.lang.String,java.lang.String> attributes) throws ManifoldCFException
Map version of noteQTag method.
Overrides:
noteQTag in class XMLParseState
Returns:
true to halt further processing.
Throws:
ManifoldCFException
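As a rough illustration of how a handler with this parameter shape could pick up the "encoding" attribute, consider the standalone sketch below. It does not extend XMLParseState and is not the actual override. The class name QTagHandlerSketch, the assumption that the tag name arrives as "xml", and the choice to return false for other tags are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative handler with the same parameter shape as the documented noteQTag. */
final class QTagHandlerSketch {
    private String encoding;

    QTagHandlerSketch(String initialEncoding) {
        this.encoding = initialEncoding;
    }

    /** Returns true to halt further processing, following the documented convention. */
    boolean noteQTag(String tagName, Map<String, String> attributes) {
        if ("xml".equals(tagName)) {          // assumption: name is passed as "xml"
            String declared = attributes.get("encoding");
            if (declared != null) {
                encoding = declared;
            }
            return true;                      // preamble tag seen: nothing more to find
        }
        return false;                         // assumption: other <? ... ?> tags are skipped
    }

    String getEncoding() {
        return encoding;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("version", "1.0");
        attrs.put("encoding", "windows-1252");
        QTagHandlerSketch handler = new QTagHandlerSketch("UTF-8");
        System.out.println(handler.noteQTag("xml", attrs));
        System.out.println(handler.getEncoding());
    }
}
```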
-
noteBTag
protected boolean noteBTag(java.lang.String tagName) throws ManifoldCFException
This method is called for every <! <token> ... > construct, or 'btag'. Override it to intercept these.
Overrides:
noteBTag in class TagParseState
Returns:
true to halt further processing.
Throws:
ManifoldCFException
-
noteEndBTag
protected boolean noteEndBTag() throws ManifoldCFException
This method is called for the end of every btag, or any time there's a naked '>' in the document. Override it if you want to intercept these.
Overrides:
noteEndBTag in class TagParseState
Returns:
true to halt further processing.
Throws:
ManifoldCFException
-
noteEscaped
protected boolean noteEscaped(java.lang.String token) throws ManifoldCFException
Called for the start of every cdata-like tag, e.g. <![ <token> [ ... ]]>
Overrides:
noteEscaped in class TagParseState
Parameters:
token - may be empty!!!
Returns:
true to halt further processing.
Throws:
ManifoldCFException
-
noteEndEscaped
protected boolean noteEndEscaped() throws ManifoldCFException
Called for the end of every cdata-like tag.
Overrides:
noteEndEscaped in class TagParseState
Returns:
true to halt further processing.
Throws:
ManifoldCFException
-
noteBTagToken
protected boolean noteBTagToken(java.lang.String token) throws ManifoldCFException
This method gets called for every token inside a btag.
Overrides:
noteBTagToken in class TagParseState
Returns:
true to halt further processing.
Throws:
ManifoldCFException
-
noteNormalCharacter
protected boolean noteNormalCharacter(char thisChar) throws ManifoldCFException
This method gets called for every character that is not part of a tag etc. Override this method to intercept such characters.
Overrides:
noteNormalCharacter in class TagParseState
Returns:
true to halt further processing.
Throws:
ManifoldCFException
-
noteEscapedCharacter
protected boolean noteEscapedCharacter(char thisChar) throws ManifoldCFException
This method gets called for every character that is found within an escape block, e.g. CDATA. Override this method to intercept such characters.
Overrides:
noteEscapedCharacter in class TagParseState
Returns:
true to halt further processing.
Throws:
ManifoldCFException
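Every callback on this page returns true to halt further processing. The sketch below shows, under stated assumptions, how a feeding loop might honour that convention. HaltingFeedSketch, CharCallback, and feed are hypothetical names rather than ManifoldCF API, and halting on the first non-whitespace character is only one guess at what "beyond any possibility of finding the preamble" could mean.

```java
/** Illustrative driver loop for the "return true to halt" callback convention. */
final class HaltingFeedSketch {
    /** Hypothetical callback shape: return true to halt further processing. */
    interface CharCallback {
        boolean note(char c);
    }

    /** Feeds characters until the callback asks to halt; returns how many were consumed. */
    static int feed(CharSequence text, CharCallback callback) {
        for (int i = 0; i < text.length(); i++) {
            if (callback.note(text.charAt(i))) {
                return i + 1;                 // the halting character counts as consumed
            }
        }
        return text.length();                 // input exhausted without a halt request
    }

    public static void main(String[] args) {
        // Assumed stop rule: halt at the first non-whitespace character.
        int consumed = feed("  <root>", c -> !Character.isWhitespace(c));
        System.out.println(consumed);
    }
}
```

Halting early matters for a detector like this one because the caller only needs the first few characters of a potentially very large document.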