Class FindContentHandler
- java.lang.Object
  - org.apache.manifoldcf.crawler.connectors.webcrawler.FindHandler
    - org.apache.manifoldcf.crawler.connectors.webcrawler.FindContentHandler
- All Implemented Interfaces: IDiscoveredLinkHandler, IHTMLHandler, IMetaTagHandler
public class FindContentHandler extends FindHandler implements IHTMLHandler
This class is the handler for grepping HTML content during state transitions.
-
-
Field Summary
Fields
Modifier and Type                                   Field
protected java.lang.StringBuilder                   contentBuffer
protected java.util.List<java.util.regex.Pattern>   contentPatterns
protected static int                                MAX_LENGTH
protected static int                                OVERLAP_AMOUNT
-
Fields inherited from class org.apache.manifoldcf.crawler.connectors.webcrawler.FindHandler
parentURI, targetURI
-
-
Constructor Summary
Constructors
- FindContentHandler(java.lang.String parentURI, java.util.List<java.util.regex.Pattern> contentPatterns)
- FindContentHandler(java.lang.String parentURI, java.util.regex.Pattern contentPattern)
-
Method Summary
Methods
Modifier and Type   Method                                          Description
void                applyOverrides(LoginParameters lp)              Apply overrides.
void                finishUp()                                      Finish up all processing.
void                noteAHREF(java.lang.String rawURL)              Note discovered href.
void                noteBASEHREF(java.lang.String rawURL)           Note discovered base href.
void                noteFormEnd()                                   Note the end of a form.
void                noteFormInput(java.util.Map inputAttributes)    Note an input tag.
void                noteFormStart(java.util.Map formAttributes)     Note the start of a form.
void                noteFRAMESRC(java.lang.String rawURL)           Note discovered FRAME SRC.
void                noteIMGSRC(java.lang.String rawURL)             Note discovered IMG SRC.
void                noteLINKHREF(java.lang.String rawURL)           Note discovered href.
void                noteMetaTag(java.util.Map metaAttributes)       Note a meta tag.
void                noteTextCharacter(char textCharacter)           Note a character of text.
protected void      processBuffer()
-
Methods inherited from class org.apache.manifoldcf.crawler.connectors.webcrawler.FindHandler
getTargetURI, noteDiscoveredBase, noteDiscoveredLink
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.manifoldcf.crawler.connectors.webcrawler.IDiscoveredLinkHandler
noteDiscoveredBase, noteDiscoveredLink
-
-
-
-
Field Detail
-
contentPatterns
protected final java.util.List<java.util.regex.Pattern> contentPatterns
-
contentBuffer
protected final java.lang.StringBuilder contentBuffer
-
MAX_LENGTH
protected static final int MAX_LENGTH
- See Also: Constant Field Values
-
OVERLAP_AMOUNT
protected static final int OVERLAP_AMOUNT
- See Also: Constant Field Values
-
-
Method Detail
-
applyOverrides
public void applyOverrides(LoginParameters lp) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Apply overrides.
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteTextCharacter
public void noteTextCharacter(char textCharacter) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a character of text. Structured this way to keep overhead low for handlers that don't use text.
- Specified by: noteTextCharacter in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteMetaTag
public void noteMetaTag(java.util.Map metaAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a meta tag.
- Specified by: noteMetaTag in interface IMetaTagHandler
- Parameters: metaAttributes - the attributes that belong to the tag.
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFormStart
public void noteFormStart(java.util.Map formAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note the start of a form.
- Specified by: noteFormStart in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFormInput
public void noteFormInput(java.util.Map inputAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note an input tag.
- Specified by: noteFormInput in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFormEnd
public void noteFormEnd() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note the end of a form.
- Specified by: noteFormEnd in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteAHREF
public void noteAHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered href.
- Specified by: noteAHREF in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteBASEHREF
public void noteBASEHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered base href.
- Specified by: noteBASEHREF in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteLINKHREF
public void noteLINKHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered href.
- Specified by: noteLINKHREF in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteIMGSRC
public void noteIMGSRC(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered IMG SRC.
- Specified by: noteIMGSRC in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFRAMESRC
public void noteFRAMESRC(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered FRAME SRC.
- Specified by: noteFRAMESRC in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
finishUp
public void finishUp() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Finish up all processing. Called ONLY if we haven't already aborted.
- Specified by: finishUp in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
processBuffer
protected void processBuffer()
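This page does not describe how contentBuffer, MAX_LENGTH, OVERLAP_AMOUNT, and processBuffer() work together. One plausible reading of the names is a sliding-window grep: characters accumulate in a buffer, the buffer is scanned against the patterns once it reaches a maximum length, and a short tail is retained so a match that straddles two windows is not lost. The sketch below illustrates that technique only. The class ContentGrepSketch, its constructor parameters, the isMatched() accessor, and the sizes used in main are hypothetical and are not taken from the ManifoldCF source.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of sliding-window pattern matching.
// This is NOT the ManifoldCF implementation; names and sizes are assumptions.
class ContentGrepSketch {
    private final List<Pattern> patterns;
    private final int maxLength;      // stands in for MAX_LENGTH (actual value not shown on this page)
    private final int overlapAmount;  // stands in for OVERLAP_AMOUNT (actual value not shown on this page)
    private final StringBuilder buffer = new StringBuilder();
    private boolean matched = false;

    ContentGrepSketch(List<Pattern> patterns, int maxLength, int overlapAmount) {
        if (overlapAmount < 0 || overlapAmount >= maxLength) {
            throw new IllegalArgumentException("overlap must be in [0, maxLength)");
        }
        this.patterns = patterns;
        this.maxLength = maxLength;
        this.overlapAmount = overlapAmount;
    }

    // Accept one character of text; scan only when the window is full.
    void noteTextCharacter(char c) {
        if (matched) {
            return; // nothing more to learn once a pattern has been found
        }
        buffer.append(c);
        if (buffer.length() >= maxLength) {
            processBuffer();
            // Keep only the last overlapAmount characters for the next window.
            buffer.delete(0, buffer.length() - overlapAmount);
        }
    }

    // Scan whatever remains at end of input.
    void finishUp() {
        if (!matched) {
            processBuffer();
        }
        buffer.setLength(0);
    }

    // Check the current window against every pattern.
    private void processBuffer() {
        for (Pattern p : patterns) {
            if (p.matcher(buffer).find()) {
                matched = true;
                return;
            }
        }
    }

    boolean isMatched() {
        return matched;
    }

    public static void main(String[] args) {
        List<Pattern> pats = List.of(Pattern.compile("needle"));
        // "needle" straddles the 16-character window boundary in this input.
        String text = "a".repeat(13) + "needle" + "a".repeat(4);
        for (int overlap : new int[] {6, 0}) {
            ContentGrepSketch sketch = new ContentGrepSketch(pats, 16, overlap);
            for (char c : text.toCharArray()) {
                sketch.noteTextCharacter(c);
            }
            sketch.finishUp();
            System.out.println("overlap=" + overlap + " matched=" + sketch.isMatched());
        }
    }
}
```

In a scheme like this, the overlap has to be at least as long as the longest text a pattern is expected to match; a shorter overlap bounds memory but can miss matches that cross a window boundary, which is what the zero-overlap run in main demonstrates.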
-
-