Class Parser
java.lang.Object
    org.apache.manifoldcf.connectorcommon.fuzzyml.Parser
public class Parser extends java.lang.Object
This is the main parser class. It provides entry points for parsing both XML and HTML. The parser accepts an input stream (which the caller is responsible for closing) and a CharacterReceiver that does the actual parsing. This class is mainly responsible for setup and character set detection.
Constructor Summary
Parser()
    Constructor.
Method Summary
void parseWithCharsetDetection(java.lang.String startingCharset, java.io.InputStream inputStream, CharacterReceiver characterReceiver)
    Parse an input stream with character set detection.
void parseWithoutCharsetDetection(java.lang.String startingCharset, java.io.InputStream inputStream, CharacterReceiver characterReceiver)
    Parse an input stream without character set detection.
Method Detail
parseWithCharsetDetection
public void parseWithCharsetDetection(java.lang.String startingCharset, java.io.InputStream inputStream, CharacterReceiver characterReceiver) throws java.io.IOException, ManifoldCFException
Parse an input stream with character set detection. This method uses the BOM (byte order mark) and the XML encoding declaration to determine which character encoding to use. The caller may pass in a starting character encoding, which serves as the default if no better determination is made.
Parameters:
    startingCharset - is the starting character set. Pass null if this is unknown.
    inputStream - is the input stream. It is the caller's responsibility to close the stream when the parse is done.
    characterReceiver - is the character receiver that will actually do the parsing.
Throws:
    java.io.IOException
    ManifoldCFException
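
The BOM step described above can be illustrated with a minimal sketch that uses only the standard library. It is not ManifoldCF's implementation: the class BomSniffer and its detect method are hypothetical names introduced here, only the UTF-8 and UTF-16 BOMs are recognized, and the XML encoding declaration is not inspected at all.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

/** Hypothetical helper (not part of ManifoldCF): picks a charset from a leading BOM. */
class BomSniffer {
    /**
     * Returns the charset implied by a BOM at the start of the stream, or the
     * given fallback if no BOM is recognized. A recognized BOM is consumed;
     * any other bytes read are pushed back so decoding can start at the content.
     * The stream must have a pushback buffer of at least 3 bytes.
     */
    static String detect(PushbackInputStream in, String fallback) throws IOException {
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a short stream simply yields fewer.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        String charset = fallback;
        int bomLength = 0;
        if (n >= 3 && head[0] == (byte) 0xEF && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF) {
            charset = "UTF-8";
            bomLength = 3;
        } else if (n >= 2 && head[0] == (byte) 0xFE && head[1] == (byte) 0xFF) {
            charset = "UTF-16BE";
            bomLength = 2;
        } else if (n >= 2 && head[0] == (byte) 0xFF && head[1] == (byte) 0xFE) {
            charset = "UTF-16LE";
            bomLength = 2;
        }

        // Return everything after the BOM (or everything, if there was none).
        if (n > bomLength) {
            in.unread(head, bomLength, n - bomLength);
        }
        return charset;
    }
}
```

In this sketch the fallback argument plays the role that startingCharset plays in the description above: it is returned unchanged when the stream offers no better evidence.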
parseWithoutCharsetDetection
public void parseWithoutCharsetDetection(java.lang.String startingCharset, java.io.InputStream inputStream, CharacterReceiver characterReceiver) throws java.io.IOException, ManifoldCFException
Parse an input stream without character set detection.
Parameters:
    startingCharset - is the starting character set. If null is passed, the code will presume utf-8.
    inputStream - is the input stream. It is the caller's responsibility to close the stream when the parse is done.
    characterReceiver - is the character receiver that will actually do the parsing.
Throws:
    java.io.IOException
    ManifoldCFException
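
The documented contract (a null charset is treated as utf-8, and the caller keeps ownership of the stream) can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: FixedCharsetDecoder and the CharSink interface are hypothetical stand-ins, and the real CharacterReceiver API may look different.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch (not part of ManifoldCF): decode a stream with a fixed charset. */
class FixedCharsetDecoder {
    /** Hypothetical stand-in for a character receiver; the real API may differ. */
    interface CharSink {
        void accept(char c) throws IOException;
    }

    /**
     * Decodes the stream with the given charset, or UTF-8 when the name is null,
     * and hands each decoded char to the sink. The stream is deliberately not
     * closed here, because closing it is the caller's responsibility.
     */
    static void decode(String startingCharset, InputStream in, CharSink sink) throws IOException {
        Charset charset = (startingCharset == null)
                ? StandardCharsets.UTF_8
                : Charset.forName(startingCharset);
        Reader reader = new InputStreamReader(in, charset);
        int c;
        while ((c = reader.read()) != -1) {
            sink.accept((char) c);
        }
    }
}
```

A caller would typically open the stream in a try-with-resources block around the decode call, so the stream is closed even when decoding fails.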