Class XMLFuzzyParseState

  • Direct Known Subclasses:
    XMLFuzzyHierarchicalParseState

    public class XMLFuzzyParseState
    extends TagParseState
    Class to keep track of XML hierarchy in the face of possibly corrupt XML, case-insensitive tags, and similar problems. Basically, this class accepts what is supposedly XML but tolerates various kinds of handwritten corruption. Specific kinds of errors allowed include:
    - Bad character encoding
    - Tag case match problems; all attribute names are (optionally) converted to lower case
    - Other parsing recoveries, to be added as they arise
    The functionality of this class is also somewhat reduced compared with standard SAX-type parsers. For instance, no namespace interpretation is done: tag qnames are split into a namespace name and a local name, and that's all. If you need more power, you can readily add it in a subclass.
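The qname handling described above can be sketched as follows. This is a hypothetical standalone helper, not ManifoldCF code; it assumes the split happens at the first colon and that case folding uses Locale.ROOT.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of ManifoldCF) mirroring the qname handling
 * described above: no namespace URI is resolved; the qname is only split
 * at its first colon into a prefix and a local name (an assumption).
 */
final class QNameSplitSketch {
  private QNameSplitSketch() {}

  /** Folds a name to lower case when the corresponding flag is set. */
  static String fold(String name, boolean lowerCase) {
    return lowerCase ? name.toLowerCase(Locale.ROOT) : name;
  }

  /** Prefix before the first ':', or null when the name has no prefix. */
  static String nameSpace(String qName) {
    int colon = qName.indexOf(':');
    return colon == -1 ? null : qName.substring(0, colon);
  }

  /** Part after the first ':', or the whole name when there is no prefix. */
  static String localName(String qName) {
    int colon = qName.indexOf(':');
    return colon == -1 ? qName : qName.substring(colon + 1);
  }
}
```

For example, localName(fold("DC:Title", true)) yields "title", and nameSpace("body") yields null.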
    • Field Detail

      • lowerCaseAttributes

        protected final boolean lowerCaseAttributes
      • lowerCaseTags

        protected final boolean lowerCaseTags
      • lowerCaseQAttributes

        protected final boolean lowerCaseQAttributes
      • lowerCaseQTags

        protected final boolean lowerCaseQTags
      • lowerCaseBTags

        protected final boolean lowerCaseBTags
      • lowerCaseEscapeTags

        protected final boolean lowerCaseEscapeTags
    • Constructor Detail

      • XMLFuzzyParseState

        public XMLFuzzyParseState(boolean lowerCaseAttributes,
                                  boolean lowerCaseTags,
                                  boolean lowerCaseQAttributes,
                                  boolean lowerCaseQTags,
                                  boolean lowerCaseBTags,
                                  boolean lowerCaseEscapeTags)
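        Constructor. Each flag controls whether the names of the corresponding construct (attributes, tags, qtag attributes, qtags, btags, escape tags) are converted to lower case.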
    • Method Detail

      • noteTagEx

        protected boolean noteTagEx(java.lang.String tagName,
                                    java.lang.String nameSpace,
                                    java.lang.String localName,
                                    java.util.Map<java.lang.String,java.lang.String> attributes)
                             throws ManifoldCFException
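        Map version of the noteTag method. Called for every opening tag, with the tag name split into namespace and local name and the attributes supplied as a map. Override this method to intercept tag starts.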
        Returns:
        true to halt further processing.
        Throws:
        ManifoldCFException
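To illustrate how a subclass might use noteTagEx, here is a self-contained sketch that does not depend on ManifoldCF. TagDispatchSketch, handleTag, onTag, and FirstTitleFinder are all hypothetical: handleTag plays the role of the final list-based noteTag, building the attribute map before delegating, and onTag plays the role of noteTagEx. The attribute list type is a simplification, not the real base class's type.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical stand-in (not ManifoldCF code): handleTag plays the role of
 * the final noteTag, and onTag plays the role of noteTagEx.
 */
abstract class TagDispatchSketch {
  private final boolean lowerCaseTags;
  private final boolean lowerCaseAttributes;

  TagDispatchSketch(boolean lowerCaseTags, boolean lowerCaseAttributes) {
    this.lowerCaseTags = lowerCaseTags;
    this.lowerCaseAttributes = lowerCaseAttributes;
  }

  /** Final entry point: folds case, splits the qname, builds the map, delegates. */
  final boolean handleTag(String tagName, List<Map.Entry<String, String>> attributes) {
    String name = lowerCaseTags ? tagName.toLowerCase(Locale.ROOT) : tagName;
    int colon = name.indexOf(':');
    String nameSpace = colon == -1 ? null : name.substring(0, colon);
    String localName = colon == -1 ? name : name.substring(colon + 1);
    Map<String, String> attrMap = new HashMap<>();
    for (Map.Entry<String, String> attr : attributes) {
      String key = attr.getKey();
      attrMap.put(lowerCaseAttributes ? key.toLowerCase(Locale.ROOT) : key, attr.getValue());
    }
    return onTag(name, nameSpace, localName, attrMap);
  }

  /** Override point; returning true halts further processing. */
  protected boolean onTag(String tagName, String nameSpace, String localName,
      Map<String, String> attributes) {
    return false;
  }
}

/** Hypothetical subclass: stops at the first title tag and keeps its lang attribute. */
class FirstTitleFinder extends TagDispatchSketch {
  boolean found;
  String lang;

  FirstTitleFinder() {
    super(true, true);  // fold both tag and attribute names
  }

  @Override
  protected boolean onTag(String tagName, String nameSpace, String localName,
      Map<String, String> attributes) {
    if (localName.equals("title")) {
      found = true;
      lang = attributes.get("lang");
      return true;  // nothing more is needed, so halt
    }
    return false;
  }
}
```

Feeding it HEAD and then dc:Title with a LANG attribute returns false and then true, with lang set to "en"; returning true is how a handler stops the parse once it has what it needs.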
      • noteEndTag

        protected final boolean noteEndTag(java.lang.String tagName)
                                    throws ManifoldCFException
        Called for every end tag. This method is final and cannot be overridden; override noteEndTagEx instead to intercept tag ends.
        Overrides:
        noteEndTag in class TagParseState
        Returns:
        true to halt further processing.
        Throws:
        ManifoldCFException
      • noteEndTagEx

        protected boolean noteEndTagEx(java.lang.String tagName,
                                       java.lang.String nameSpace,
                                       java.lang.String localName)
                                throws ManifoldCFException
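        Called for every end tag, with the tag name split into namespace and local name. Override this method to intercept tag ends.
        Returns:
        true to halt further processing.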
        Throws:
        ManifoldCFException
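Since the class exists to keep track of XML hierarchy, one plausible use of the start and end tag hooks is a stack of open tags. The sketch below is hypothetical: onStart and onEnd stand in for noteTagEx and noteEndTagEx with simplified signatures, and the recovery policy for mismatched end tags is an assumption chosen for illustration, not documented ManifoldCF behavior.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical hierarchy tracker. onStart and onEnd stand in for noteTagEx
 * and noteEndTagEx. The recovery policy is an assumption: an end tag closes
 * the nearest matching open tag (and any unclosed children), and an end tag
 * with no matching open tag is ignored.
 */
class HierarchySketch {
  private final Deque<String> open = new ArrayDeque<>();

  boolean onStart(String tagName) {
    open.push(tagName);
    return false;  // keep parsing
  }

  boolean onEnd(String tagName) {
    if (!open.contains(tagName)) {
      return false;  // stray end tag: ignore it
    }
    // Pop unclosed children until the matching start tag is removed.
    while (!open.pop().equals(tagName)) {
      // nothing else to do for implicitly closed children
    }
    return false;
  }

  int depth() {
    return open.size();
  }

  /** Innermost open tag, or null at top level. */
  String current() {
    return open.peek();
  }
}
```

With html, body, and p open, a stray end tag for div leaves the depth at 3, and the end tag for body closes both p and body.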
      • noteQTagEx

        protected boolean noteQTagEx(java.lang.String tagName,
                                     java.util.Map<java.lang.String,java.lang.String> attributes)
                              throws ManifoldCFException
        Map version of the noteQTag method. Called for every <? ... ?> construct, or 'qtag', with the attributes supplied as a map.
        Returns:
        true to halt further processing.
        Throws:
        ManifoldCFException
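A qtag handler receives the attributes of <? ... ?> constructs. The following hypothetical sketch (onQTag stands in for noteQTagEx) assumes the XML declaration arrives as a qtag named xml with lower-case attribute names. It records the declared encoding and falls back to a default charset when that name is unsupported or malformed, which fits the bad-character-encoding case in the class description.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Map;

/**
 * Hypothetical qtag handler: onQTag stands in for noteQTagEx. It records the
 * encoding declared in <?xml version="1.0" encoding="..."?>, assuming the
 * attribute names have already been folded to lower case.
 */
class XmlDeclSketch {
  String declaredEncoding;

  boolean onQTag(String tagName, Map<String, String> attributes) {
    if (tagName.equalsIgnoreCase("xml")) {
      declaredEncoding = attributes.get("encoding");
    }
    return false;  // keep parsing
  }

  /** Declared charset if the JVM supports it, otherwise the fallback. */
  Charset charsetOr(Charset fallback) {
    if (declaredEncoding == null) {
      return fallback;
    }
    try {
      return Charset.isSupported(declaredEncoding) ? Charset.forName(declaredEncoding) : fallback;
    } catch (IllegalCharsetNameException e) {
      return fallback;  // malformed name, e.g. from hand-edited input
    }
  }
}
```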
      • noteBTag

        protected final boolean noteBTag(java.lang.String tagName)
                                  throws ManifoldCFException
        Called for every <! <token> ... > construct, or 'btag'. This method is final; override noteBTagEx instead to intercept btags.
        Overrides:
        noteBTag in class TagParseState
        Returns:
        true to halt further processing.
        Throws:
        ManifoldCFException
      • noteBTagEx

        protected boolean noteBTagEx(java.lang.String tagName)
                              throws ManifoldCFException
        Extended version of the noteBTag method; override it to intercept btags.
        Returns:
        true to halt further processing.
        Throws:
        ManifoldCFException
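The following hypothetical sketch shows a btag handler in the style of noteBTagEx (here named onBTag, not a ManifoldCF method). It assumes the btag name is the first token after "<!", for example DOCTYPE in <!DOCTYPE html>, and folds case itself so that it works whether or not lowerCaseBTags is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical btag handler: onBTag stands in for noteBTagEx. It assumes the
 * btag name is the first token after "<!", e.g. "DOCTYPE" for <!DOCTYPE html>,
 * or "doctype" when lowerCaseBTags is set; folding here covers both cases.
 */
class BTagSketch {
  final List<String> names = new ArrayList<>();

  boolean onBTag(String tagName) {
    names.add(tagName.toLowerCase(Locale.ROOT));
    return false;  // keep parsing
  }

  boolean sawDoctype() {
    return names.contains("doctype");
  }
}
```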
      • noteEscaped

        protected final boolean noteEscaped(java.lang.String token)
                                     throws ManifoldCFException
        Called for the start of every CDATA-like construct, e.g. <![ <token> [ ... ]]>. This method is final; override noteEscapedEx instead to intercept these.
        Overrides:
        noteEscaped in class TagParseState
        Parameters:
        token - the token between "<![" and "["; may be empty.
        Returns:
        true to halt further processing.
        Throws:
        ManifoldCFException
      • noteEscapedEx

        protected boolean noteEscapedEx(java.lang.String token)
                                 throws ManifoldCFException
        Extended version of the noteEscaped method; override it to intercept CDATA-like constructs.
        Returns:
        true to halt further processing.
        Throws:
        ManifoldCFException
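Because the token passed for a <![ <token> [ ... ]]> construct may be empty, an override of noteEscapedEx should not assume CDATA. This hypothetical classifier (not ManifoldCF code) sorts tokens into CDATA, the DTD conditional-section keywords INCLUDE and IGNORE, empty, and everything else, folding case so the result does not depend on lowerCaseEscapeTags.

```java
import java.util.Locale;

/**
 * Hypothetical classifier for the token passed to noteEscapedEx, i.e. the
 * text between "<![" and the next "[". The token may be empty, and may
 * already be lower-cased if lowerCaseEscapeTags is set.
 */
final class EscapedTokenSketch {
  enum Kind { CDATA, INCLUDE, IGNORE, EMPTY, OTHER }

  private EscapedTokenSketch() {}

  static Kind classify(String token) {
    switch (token.trim().toUpperCase(Locale.ROOT)) {
      case "":
        return Kind.EMPTY;    // e.g. <![[ ... ]]>
      case "CDATA":
        return Kind.CDATA;    // <![CDATA[ ... ]]>
      case "INCLUDE":
        return Kind.INCLUDE;  // DTD conditional section
      case "IGNORE":
        return Kind.IGNORE;   // DTD conditional section
      default:
        return Kind.OTHER;
    }
  }
}
```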
      • noteBTagTokenEx

        protected boolean noteBTagTokenEx(java.lang.String token)
                                   throws ManifoldCFException
        Extended version of the noteBTagToken method; override it to intercept btag tokens.
        Returns:
        true to halt further processing.
        Throws:
        ManifoldCFException
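A sketch of how noteBTagEx and noteBTagTokenEx might cooperate, assuming each btag name callback is followed by zero or more token callbacks for the same construct. BTagTokenSketch, onBTag, and onBTagToken are hypothetical stand-ins; the sketch extracts the root element name from a DOCTYPE declaration such as <!DOCTYPE html>.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical collector: onBTag and onBTagToken stand in for noteBTagEx and
 * noteBTagTokenEx. It assumes each btag name is followed by zero or more token
 * callbacks, e.g. "html" for <!DOCTYPE html>.
 */
class BTagTokenSketch {
  private String currentBTag;
  private final List<String> tokens = new ArrayList<>();

  boolean onBTag(String tagName) {
    currentBTag = tagName;
    tokens.clear();  // a new btag starts a fresh token list
    return false;
  }

  boolean onBTagToken(String token) {
    tokens.add(token);
    return false;
  }

  /** Root element named by the current DOCTYPE, or null if unknown. */
  String doctypeRoot() {
    if (currentBTag == null || !currentBTag.equalsIgnoreCase("doctype") || tokens.isEmpty()) {
      return null;
    }
    return tokens.get(0);
  }
}
```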