Class Parser


  • public class Parser
    extends java.lang.Object
    This is the main parser class. It provides entry points for parsing both XML and HTML. The parser accepts an input stream (which the caller is responsible for closing) and a CharacterReceiver, which does the actual parsing. This class is mainly responsible for setup and character set detection.
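The division of labor described above can be sketched in plain Java. This is a minimal, self-contained illustration, not ManifoldCF code: `CharSink`, `accept`, and `feed` are hypothetical stand-ins for CharacterReceiver and the parser's internal loop, and the "return true to stop early" convention is an assumption. What it shows is the ownership contract: the driver decodes bytes and forwards characters but never closes the stream, so the caller closes it with try-with-resources.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReceiverSketch {
    // Hypothetical stand-in for CharacterReceiver: consumes decoded characters.
    interface CharSink {
        // Returns true to stop early (an assumed convention for this sketch).
        boolean accept(char c);
    }

    // Hypothetical driver: decodes the stream and pushes each char to the sink.
    // It deliberately does NOT close the stream; ownership stays with the caller.
    static void feed(InputStream in, Charset charset, CharSink sink) throws IOException {
        Reader reader = new InputStreamReader(in, charset);
        int ch;
        while ((ch = reader.read()) != -1) {
            if (sink.accept((char) ch)) {
                break;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder out = new StringBuilder();
        byte[] bytes = "<p>hello</p>".getBytes(StandardCharsets.UTF_8);
        // The caller opens and closes the stream, mirroring the documented contract.
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            feed(in, StandardCharsets.UTF_8, c -> { out.append(c); return false; });
        }
        System.out.println(out); // prints <p>hello</p>
    }
}
```

Keeping stream ownership with the caller lets the same stream be reused or wrapped by other code without the parser interfering.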
    • Constructor Summary

      Constructors 
      Constructor Description
      Parser()
      Constructor.
    • Method Summary

      Modifier and Type Method Description
      void parseWithCharsetDetection(java.lang.String startingCharset, java.io.InputStream inputStream, CharacterReceiver characterReceiver)
      Parse an input stream with character set detection.
      void parseWithoutCharsetDetection(java.lang.String startingCharset, java.io.InputStream inputStream, CharacterReceiver characterReceiver)
      Parse an input stream without character set detection.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • Parser

        public Parser()
        Constructor. Someday there will be a constructor which accepts character detection configuration information, but for now there is none.
    • Method Detail

      • parseWithCharsetDetection

        public void parseWithCharsetDetection(java.lang.String startingCharset,
                                              java.io.InputStream inputStream,
                                              CharacterReceiver characterReceiver)
                                       throws java.io.IOException,
                                              ManifoldCFException
        Parse an input stream with character set detection. This method uses the BOM (byte order mark) and the XML encoding declaration to determine the character encoding. The caller may pass in a starting character encoding, which serves as the default if no better determination can be made.
        Parameters:
        startingCharset - is the starting character set. Pass null if this is unknown.
        inputStream - is the input stream. It is the caller's responsibility to close the stream when the parse is done.
        characterReceiver - is the character receiver that will actually do the parsing.
        Throws:
        java.io.IOException
        ManifoldCFException
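A rough sketch of BOM-then-declaration detection, under stated assumptions: this is not the Parser's actual algorithm, `CharsetSniffSketch.detect` is a hypothetical helper that inspects only a byte prefix, and it recognizes only the UTF-8 and UTF-16 BOMs (UTF-32 and BOM-less UTF-16 sniffing are omitted). The precedence follows the description above: BOM first, then the XML encoding declaration, then the caller's starting charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetSniffSketch {
    // Matches encoding="..." or encoding='...' in a leading <?xml ...?> declaration.
    // The name pattern follows XML 1.0 EncName: [A-Za-z] ([A-Za-z0-9._] | '-')*
    private static final Pattern XML_DECL_ENCODING = Pattern.compile(
        "^\\s*<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Hypothetical helper: returns a charset name, or startingCharset (possibly null).
    static String detect(byte[] prefix, String startingCharset) {
        // 1. Byte order marks take priority.
        if (prefix.length >= 3 && (prefix[0] & 0xFF) == 0xEF
                && (prefix[1] & 0xFF) == 0xBB && (prefix[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (prefix.length >= 2 && (prefix[0] & 0xFF) == 0xFE && (prefix[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (prefix.length >= 2 && (prefix[0] & 0xFF) == 0xFF && (prefix[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        // 2. No BOM: decode the prefix as ISO-8859-1 (one char per byte, never fails)
        //    and look for an XML encoding declaration.
        String head = new String(prefix, StandardCharsets.ISO_8859_1);
        Matcher m = XML_DECL_ENCODING.matcher(head);
        if (m.find()) {
            return m.group(1);
        }
        // 3. Otherwise fall back to the caller's starting charset.
        return startingCharset;
    }

    public static void main(String[] args) {
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '>'};
        byte[] decl = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
            .getBytes(StandardCharsets.US_ASCII);
        byte[] plain = "<a/>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(bom, "windows-1252"));   // UTF-8
        System.out.println(detect(decl, "windows-1252"));  // ISO-8859-1
        System.out.println(detect(plain, "windows-1252")); // windows-1252
    }
}
```

Decoding the prefix as ISO-8859-1 is a convenient choice because every byte maps to exactly one character, so the ASCII-only declaration can be searched without risking decode errors.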
      • parseWithoutCharsetDetection

        public void parseWithoutCharsetDetection(java.lang.String startingCharset,
                                                 java.io.InputStream inputStream,
                                                 CharacterReceiver characterReceiver)
                                          throws java.io.IOException,
                                                 ManifoldCFException
        Parse an input stream without character set detection.
        Parameters:
        startingCharset - is the starting character set. If null is passed, UTF-8 is assumed.
        inputStream - is the input stream. It is the caller's responsibility to close the stream when the parse is done.
        characterReceiver - is the character receiver that will actually do the parsing.
        Throws:
        java.io.IOException
        ManifoldCFException
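Without detection, the only decision left is which charset to decode with. The sketch below uses hypothetical helper names (`resolve`, `decode`), assumes the documented default (null means UTF-8), and performs no BOM or declaration sniffing; as with the documented method, closing the stream is left to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class NoDetectionSketch {
    // Mirrors the documented default: a null starting charset means UTF-8.
    static Charset resolve(String startingCharset) {
        return startingCharset == null ? StandardCharsets.UTF_8 : Charset.forName(startingCharset);
    }

    // Hypothetical helper: decodes the whole stream with the resolved charset.
    // No BOM or XML-declaration sniffing; the stream is left open for the caller.
    static String decode(InputStream in, String startingCharset) throws IOException {
        Reader reader = new InputStreamReader(in, resolve(startingCharset));
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9}; // U+00E9 encoded as UTF-8
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            System.out.println(decode(in, null).length());         // 1 (decoded as UTF-8)
        }
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            System.out.println(decode(in, "ISO-8859-1").length()); // 2 (one char per byte)
        }
    }
}
```

The two calls decode identical bytes differently, which is why passing the correct starting charset matters when detection is skipped.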