Class XMLFuzzyHierarchicalParseState


  • public class XMLFuzzyHierarchicalParseState
    extends XMLFuzzyParseState
    Class to keep track of XML hierarchy in the face of possibly corrupt XML, case-insensitive tags, and similar problems. This class accepts what is supposedly XML but tolerates various kinds of handwritten corruption. The specific kinds of errors allowed include:
    - Bad character encoding
    - Tag case match problems; all attributes are (optionally) converted to lower case, and tag names are compared in lower case if the case-sensitive comparison fails
    - End tag matching problems, where an end tag has been lost
    - Other parsing recoveries, to be added as they arise
    The functionality of this class is also somewhat reduced compared with standard SAX-type parsers. For instance, no namespace interpretation is done; tag qnames are split into namespace name and local name, and nothing more. If you need more power, you can write a subclass that adds it.
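The tag-matching recoveries described above can be illustrated with a small, self-contained sketch. It is not this class's implementation: the class name FuzzyTagStackSketch, its methods, and the exact recovery policy (exact match first, then lower-cased match, then implicitly closing any tags whose end tags were lost) are assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration of fuzzy end-tag matching; not part of the documented API. */
class FuzzyTagStackSketch {
  // Currently open tags, innermost first.
  private final Deque<String> openTags = new ArrayDeque<>();
  // Event log, so that each recovery decision is visible.
  private final List<String> events = new ArrayList<>();

  void startTag(String qName) {
    openTags.push(qName);
    events.add("start:" + qName);
  }

  void endTag(String qName) {
    // Try a case-sensitive match first; fall back to a lower-cased comparison.
    int depth = findDepth(qName, false);
    if (depth < 0) {
      depth = findDepth(qName, true);
    }
    if (depth < 0) {
      // No open tag matches at all (assumed policy: ignore the stray end tag).
      events.add("ignored:" + qName);
      return;
    }
    // Close the tags nested inside the match (their end tags were lost),
    // then close the matching tag itself.
    for (int i = 0; i <= depth; i++) {
      events.add("end:" + openTags.pop());
    }
  }

  // Returns how many open tags sit inside the first matching tag, or -1 if none matches.
  private int findDepth(String qName, boolean ignoreCase) {
    int depth = 0;
    for (String open : openTags) {
      boolean match = ignoreCase
          ? open.toLowerCase(Locale.ROOT).equals(qName.toLowerCase(Locale.ROOT))
          : open.equals(qName);
      if (match) {
        return depth;
      }
      depth++;
    }
    return -1;
  }

  List<String> events() {
    return events;
  }

  public static void main(String[] args) {
    FuzzyTagStackSketch stack = new FuzzyTagStackSketch();
    stack.startTag("Doc");
    stack.startTag("Item");
    stack.startTag("b");
    stack.endTag("item"); // case mismatch, and the end tag for "b" is missing
    stack.endTag("x");    // stray end tag with no matching start
    stack.endTag("Doc");
    System.out.println(stack.events());
  }
}
```

In this sketch the end tag "item" closes "b" and then "Item", the stray "x" is ignored, and "Doc" closes normally. A real parser might instead report or repair such cases differently.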
    • Field Detail

      • characterBuffer

        protected java.lang.StringBuilder characterBuffer
        The current value buffer
      • captureEscaped

        protected boolean captureEscaped
        Whether we're capturing escaped characters
      • MAX_CHUNK_SIZE

        protected static final int MAX_CHUNK_SIZE
        The maximum size of a chunk of characters sent to the characters() method.
        See Also:
        Constant Field Values
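One way a size cap like MAX_CHUNK_SIZE can interact with a buffer like characterBuffer is sketched below. This is an assumption-laden illustration, not the class's actual code: the class name ChunkedCharactersSketch, the append and flush methods, the delivered list (standing in for a characters() callback), and the value 8 are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of size-capped character delivery; not part of the documented API. */
class ChunkedCharactersSketch {
  // Hypothetical value; the real constant's value is not stated in this section.
  static final int MAX_CHUNK_SIZE = 8;

  // Chunks delivered so far; stands in for calls to a characters() callback.
  final List<String> delivered = new ArrayList<>();

  // Accumulates characters until a full chunk is available.
  private final StringBuilder characterBuffer = new StringBuilder();

  void append(char c) {
    characterBuffer.append(c);
    // Deliver as soon as the cap is reached, so no chunk exceeds MAX_CHUNK_SIZE.
    if (characterBuffer.length() >= MAX_CHUNK_SIZE) {
      flush();
    }
  }

  // Delivers whatever is buffered, for example at a tag boundary or at end of input.
  void flush() {
    if (characterBuffer.length() > 0) {
      delivered.add(characterBuffer.toString());
      characterBuffer.setLength(0);
    }
  }

  public static void main(String[] args) {
    ChunkedCharactersSketch sketch = new ChunkedCharactersSketch();
    for (char c : "hello fuzzy xml".toCharArray()) {
      sketch.append(c);
    }
    sketch.flush();
    System.out.println(sketch.delivered);
  }
}
```

Capping the chunk size keeps memory use bounded for very long text runs, at the cost that a receiver may see one logical text value split across several deliveries.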
    • Constructor Detail

      • XMLFuzzyHierarchicalParseState

        public XMLFuzzyHierarchicalParseState()
        Constructor with default properties.
      • XMLFuzzyHierarchicalParseState

        public XMLFuzzyHierarchicalParseState(boolean lowerCaseAttributes,
                                              boolean lowerCaseTags,
                                              boolean lowerCaseQAttributes,
                                              boolean lowerCaseQTags,
                                              boolean lowerCaseBTags,
                                              boolean lowerCaseEscapeTags)
        Constructor that sets each lower-casing option explicitly.