Class XMLFuzzyParseState
- java.lang.Object
  - org.apache.manifoldcf.connectorcommon.fuzzyml.CharacterReceiver
    - org.apache.manifoldcf.connectorcommon.fuzzyml.SingleCharacterReceiver
      - org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState
        - org.apache.manifoldcf.connectorcommon.fuzzyml.XMLFuzzyParseState
- Direct Known Subclasses: XMLFuzzyHierarchicalParseState
public class XMLFuzzyParseState extends TagParseState
Class to keep track of XML hierarchy in the face of possibly corrupt XML, with case-insensitive tags, etc. This class accepts what is supposedly XML but tolerates various kinds of handwritten corruption. Specific kinds of errors allowed include:
- Bad character encoding
- Tag case match problems; all attributes are (optionally) bashed to lower case
- Other parsing recoveries, to be added as they arise
The functionality of this class is also somewhat reduced compared with standard SAX-type parsers. No namespace interpretation is done, for instance; tag qnames are split into namespace name and local name, and nothing more. If you need more power, you can write a subclass that adds it.
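The qname handling described above can be illustrated with a small standalone sketch. It does not use the ManifoldCF classes; the class name QNameSplitSketch and the method split are hypothetical, and returning null for a missing prefix is an assumption rather than documented behavior.

```java
import java.util.Locale;

public class QNameSplitSketch {
    /**
     * Splits a tag qname at the first colon.
     * Returns {nameSpace, localName}; nameSpace is null when there is no prefix
     * (an assumption made for this sketch).
     */
    public static String[] split(String qName, boolean lowerCase) {
        // Optional case folding, mirroring the "bashed to lower case" option.
        String name = lowerCase ? qName.toLowerCase(Locale.ROOT) : qName;
        int colon = name.indexOf(':');
        if (colon == -1) {
            return new String[] { null, name };
        }
        return new String[] { name.substring(0, colon), name.substring(colon + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("DC:Title", true);
        System.out.println(parts[0] + " / " + parts[1]); // prints "dc / title"
    }
}
```

Note that the prefix is only separated from the local name; nothing resolves it to a namespace URI, which matches the "no namespace interpretation" caveat.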
-
-
Field Summary
Fields
Modifier and Type, Field
- protected boolean lowerCaseAttributes
- protected boolean lowerCaseBTags
- protected boolean lowerCaseEscapeTags
- protected boolean lowerCaseQAttributes
- protected boolean lowerCaseQTags
- protected boolean lowerCaseTags
-
Fields inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState
accumBuffer, ampBuffer, bTagDepth, currentAttrList, currentAttrName, currentAttrNameBuffer, currentState, currentTagName, currentTagNameBuffer, currentValueBuffer, inAmpersand, mapLookup, TAGPARSESTATE_IN_ATTR_LOOKING_FOR_VALUE, TAGPARSESTATE_IN_ATTR_NAME, TAGPARSESTATE_IN_ATTR_VALUE, TAGPARSESTATE_IN_BANG_TOKEN, TAGPARSESTATE_IN_BRACKET_TOKEN, TAGPARSESTATE_IN_CDATA_BODY, TAGPARSESTATE_IN_COMMENT, TAGPARSESTATE_IN_DOUBLE_QUOTES_ATTR_VALUE, TAGPARSESTATE_IN_END_TAG_NAME, TAGPARSESTATE_IN_QTAG_ATTR_LOOKING_FOR_VALUE, TAGPARSESTATE_IN_QTAG_ATTR_NAME, TAGPARSESTATE_IN_QTAG_ATTR_VALUE, TAGPARSESTATE_IN_QTAG_DOUBLE_QUOTES_ATTR_VALUE, TAGPARSESTATE_IN_QTAG_NAME, TAGPARSESTATE_IN_QTAG_SAW_QUESTION, TAGPARSESTATE_IN_QTAG_SINGLE_QUOTES_ATTR_VALUE, TAGPARSESTATE_IN_QTAG_UNQUOTED_ATTR_VALUE, TAGPARSESTATE_IN_SINGLE_QUOTES_ATTR_VALUE, TAGPARSESTATE_IN_TAG_NAME, TAGPARSESTATE_IN_TAG_SAW_SLASH, TAGPARSESTATE_IN_UNQUOTED_ATTR_VALUE, TAGPARSESTATE_IN_UNQUOTED_ATTR_VALUE_SAW_SLASH, TAGPARSESTATE_NEED_FINAL_BRACKET, TAGPARSESTATE_NORMAL, TAGPARSESTATE_SAWCOMMENTDASH, TAGPARSESTATE_SAWDASH, TAGPARSESTATE_SAWEXCLAMATION, TAGPARSESTATE_SAWLEFTANGLE, TAGPARSESTATE_SAWRIGHTBRACKET, TAGPARSESTATE_SAWSECONDCOMMENTDASH, TAGPARSESTATE_SAWSECONDRIGHTBRACKET
-
Fields inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.SingleCharacterReceiver
charBuffer
-
-
Constructor Summary
Constructors
Constructor, Description
- XMLFuzzyParseState(boolean lowerCaseAttributes, boolean lowerCaseTags, boolean lowerCaseQAttributes, boolean lowerCaseQTags, boolean lowerCaseBTags, boolean lowerCaseEscapeTags)
  Constructor.
-
Method Summary
All Methods, Instance Methods, Concrete Methods
Modifier and Type, Method, Description
- protected boolean noteBTag(java.lang.String tagName)
  This method is called for every <! <token> ...
- protected boolean noteBTagEx(java.lang.String tagName)
  New version of the noteBTag method.
- protected boolean noteBTagToken(java.lang.String token)
  This method gets called for every token inside a btag.
- protected boolean noteBTagTokenEx(java.lang.String token)
  New version of the noteBTagToken method.
- protected boolean noteEndTag(java.lang.String tagName)
  This method gets called for every end tag.
- protected boolean noteEndTagEx(java.lang.String tagName, java.lang.String nameSpace, java.lang.String localName)
  Note end tag.
- protected boolean noteEscaped(java.lang.String token)
  Called for the start of every cdata-like tag, e.g.
- protected boolean noteEscapedEx(java.lang.String token)
  New version of the noteEscaped method.
- protected boolean noteQTag(java.lang.String tagName, java.util.List<AttrNameValue> attributes)
  This method is called for every <? ...
- protected boolean noteQTagEx(java.lang.String tagName, java.util.Map<java.lang.String,java.lang.String> attributes)
  Map version of the noteQTag method.
- protected boolean noteTag(java.lang.String tagName, java.util.List<AttrNameValue> attributes)
  This method gets called for every tag.
- protected boolean noteTagEx(java.lang.String tagName, java.lang.String nameSpace, java.lang.String localName, java.util.Map<java.lang.String,java.lang.String> attributes)
  Map version of the noteTag method.
-
Methods inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState
acceptNewTag, attributeDecode, dealWithCharacter, dumpValues, isPunctuation, isWhitespace, mapChunk, newBuffer, noteEndBTag, noteEndEscaped, noteEscapedCharacter, noteNormalCharacter, outputAmpBuffer
-
Methods inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.SingleCharacterReceiver
dealWithCharacters, dealWithRemainder
-
Methods inherited from class org.apache.manifoldcf.connectorcommon.fuzzyml.CharacterReceiver
finishUp
-
Field Detail
-
lowerCaseAttributes
protected final boolean lowerCaseAttributes
-
lowerCaseTags
protected final boolean lowerCaseTags
-
lowerCaseQAttributes
protected final boolean lowerCaseQAttributes
-
lowerCaseQTags
protected final boolean lowerCaseQTags
-
lowerCaseBTags
protected final boolean lowerCaseBTags
-
lowerCaseEscapeTags
protected final boolean lowerCaseEscapeTags
-
-
Method Detail
-
noteTag
protected final boolean noteTag(java.lang.String tagName, java.util.List<AttrNameValue> attributes) throws ManifoldCFException
This method gets called for every tag. It is declared final in this class, so it cannot be overridden here; noteTagEx is the overridable counterpart for intercepting tag begins.
- Overrides: noteTag in class TagParseState
- Returns: true to halt further processing.
- Throws: ManifoldCFException
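The pairing of a final noteTag with an overridable noteTagEx, together with the "true to halt further processing" return value, can be sketched without the library. Everything below is a hypothetical stand-in (Base, StopAt, feed); it illustrates one plausible reading of the pattern and is not the ManifoldCF implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HaltSketch {
    /** Hypothetical stand-in base: the final entry point normalizes, then delegates. */
    public static abstract class Base {
        private final boolean lowerCaseTags;

        protected Base(boolean lowerCaseTags) {
            this.lowerCaseTags = lowerCaseTags;
        }

        public final boolean noteTag(String tagName) {
            String name = lowerCaseTags ? tagName.toLowerCase(Locale.ROOT) : tagName;
            return noteTagEx(name);
        }

        /** Overridable hook; returning true asks the caller to halt. */
        protected boolean noteTagEx(String tagName) {
            return false;
        }
    }

    /** Records each tag it sees and halts once the target tag arrives. */
    public static final class StopAt extends Base {
        public final List<String> seen = new ArrayList<>();
        private final String target;

        public StopAt(String target) {
            super(true);
            this.target = target;
        }

        @Override
        protected boolean noteTagEx(String tagName) {
            seen.add(tagName);
            return tagName.equals(target);
        }
    }

    /** Delivers tags in order and stops after the first call that returns true. */
    public static int feed(Base handler, String... tags) {
        int delivered = 0;
        for (String tag : tags) {
            delivered++;
            if (handler.noteTag(tag)) {
                break;
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        StopAt handler = new StopAt("item");
        int delivered = feed(handler, "RSS", "Channel", "ITEM", "Title");
        System.out.println(delivered + " " + handler.seen); // prints "3 [rss, channel, item]"
    }
}
```

Keeping the entry point final means the case folding cannot be bypassed by a subclass, which is one reason a design like this might expose a separate Ex hook.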
-
noteTagEx
protected boolean noteTagEx(java.lang.String tagName, java.lang.String nameSpace, java.lang.String localName, java.util.Map<java.lang.String,java.lang.String> attributes) throws ManifoldCFException
Map version of the noteTag method.
- Returns: true to halt further processing.
- Throws: ManifoldCFException
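As a rough illustration of what a "map version" implies, the sketch below turns a list of attribute name/value pairs into a Map, optionally lower-casing the names. The Attr class is a hypothetical stand-in, not the library's AttrNameValue, and the handling of duplicate names is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class AttrMapSketch {
    /** Hypothetical stand-in for a name/value attribute pair. */
    public static final class Attr {
        final String name;
        final String value;

        public Attr(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Builds a name-to-value map, lower-casing names when requested. */
    public static Map<String, String> toMap(List<Attr> attrs, boolean lowerCaseNames) {
        Map<String, String> map = new LinkedHashMap<>();
        for (Attr a : attrs) {
            String key = lowerCaseNames ? a.name.toLowerCase(Locale.ROOT) : a.name;
            // Assumption for this sketch: a later duplicate replaces an earlier one.
            map.put(key, a.value);
        }
        return map;
    }

    public static void main(String[] args) {
        List<Attr> attrs = new ArrayList<>();
        attrs.add(new Attr("HREF", "http://example.com/"));
        attrs.add(new Attr("Title", "Home"));
        System.out.println(toMap(attrs, true)); // prints "{href=http://example.com/, title=Home}"
    }
}
```

With lower-cased keys, a handler can look up an attribute such as href without caring how the document author capitalized it.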
-
noteEndTag
protected final boolean noteEndTag(java.lang.String tagName) throws ManifoldCFException
This method gets called for every end tag. It is declared final in this class, so it cannot be overridden here; noteEndTagEx is the overridable counterpart for intercepting tag ends.
- Overrides: noteEndTag in class TagParseState
- Returns: true to halt further processing.
- Throws: ManifoldCFException
-
noteEndTagEx
protected boolean noteEndTagEx(java.lang.String tagName, java.lang.String nameSpace, java.lang.String localName) throws ManifoldCFException
Note end tag.
- Throws: ManifoldCFException
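To show why case-insensitive tag names matter when pairing start and end tags, here is a minimal standalone sketch that tracks open tags on a stack and matches end tags after lower-casing. The class EndTagMatchSketch and its methods are hypothetical; how this class or its subclasses actually recover from mismatched end tags is not described here, so unmatched tags are simply reported.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;

public class EndTagMatchSketch {
    private final Deque<String> open = new ArrayDeque<>();

    /** Records a start tag, folded to lower case. */
    public void startTag(String qName) {
        open.push(qName.toLowerCase(Locale.ROOT));
    }

    /** Returns true if the end tag closes the innermost open tag, ignoring case. */
    public boolean endTag(String qName) {
        String name = qName.toLowerCase(Locale.ROOT);
        if (!open.isEmpty() && open.peek().equals(name)) {
            open.pop();
            return true;
        }
        // Unmatched end tag: this sketch leaves recovery to the caller.
        return false;
    }

    public int depth() {
        return open.size();
    }

    public static void main(String[] args) {
        EndTagMatchSketch state = new EndTagMatchSketch();
        state.startTag("Channel");
        state.startTag("Item");
        boolean closedItem = state.endTag("ITEM");
        boolean closedRss = state.endTag("rss");
        System.out.println(closedItem + " " + closedRss + " " + state.depth()); // prints "true false 1"
    }
}
```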
-
noteQTag
protected final boolean noteQTag(java.lang.String tagName, java.util.List<AttrNameValue> attributes) throws ManifoldCFException
This method is called for every <? ... ?> construct, or 'qtag'. This is not useful for HTML.
- Overrides: noteQTag in class TagParseState
- Returns: true to halt further processing.
- Throws: ManifoldCFException
-
noteQTagEx
protected boolean noteQTagEx(java.lang.String tagName, java.util.Map<java.lang.String,java.lang.String> attributes) throws ManifoldCFException
Map version of the noteQTag method.
- Returns: true to halt further processing.
- Throws: ManifoldCFException
-
noteBTag
protected final boolean noteBTag(java.lang.String tagName) throws ManifoldCFException
This method is called for every <! <token> ... > construct, or 'btag'. It is declared final in this class, so it cannot be overridden here; noteBTagEx is the overridable counterpart for intercepting these.
- Overrides: noteBTag in class TagParseState
- Returns: true to halt further processing.
- Throws: ManifoldCFException
-
noteBTagEx
protected boolean noteBTagEx(java.lang.String tagName) throws ManifoldCFException
New version of the noteBTag method.
- Returns: true to halt further processing.
- Throws: ManifoldCFException
-
noteEscaped
protected final boolean noteEscaped(java.lang.String token) throws ManifoldCFException
Called for the start of every cdata-like tag, e.g. <![ <token> [ ... ]]>
- Overrides: noteEscaped in class TagParseState
- Parameters: token - may be empty
- Returns: true to halt further processing.
- Throws: ManifoldCFException
-
noteEscapedEx
protected boolean noteEscapedEx(java.lang.String token) throws ManifoldCFException
New version of the noteEscaped method.
- Returns: true to halt further processing.
- Throws: ManifoldCFException
-
noteBTagToken
protected final boolean noteBTagToken(java.lang.String token) throws ManifoldCFException
This method gets called for every token inside a btag.
- Overrides: noteBTagToken in class TagParseState
- Returns: true to halt further processing.
- Throws: ManifoldCFException
-
noteBTagTokenEx
protected boolean noteBTagTokenEx(java.lang.String token) throws ManifoldCFException
New version of the noteBTagToken method.
- Returns: true to halt further processing.
- Throws: ManifoldCFException
-
-