Class FindContentHandler
- java.lang.Object
  - org.apache.manifoldcf.crawler.connectors.webcrawler.FindHandler
    - org.apache.manifoldcf.crawler.connectors.webcrawler.FindContentHandler
- All Implemented Interfaces:
- IDiscoveredLinkHandler, IHTMLHandler, IMetaTagHandler
public class FindContentHandler extends FindHandler implements IHTMLHandler
This class is the handler for grepping HTML content during state transitions.
-
-
Field Summary
Fields (Modifier and Type, Field):
- protected java.lang.StringBuilder contentBuffer
- protected java.util.List<java.util.regex.Pattern> contentPatterns
- protected static int MAX_LENGTH
- protected static int OVERLAP_AMOUNT
-
Fields inherited from class org.apache.manifoldcf.crawler.connectors.webcrawler.FindHandler
parentURI, targetURI
-
-
Constructor Summary
Constructors:
- FindContentHandler(java.lang.String parentURI, java.util.List<java.util.regex.Pattern> contentPatterns)
- FindContentHandler(java.lang.String parentURI, java.util.regex.Pattern contentPattern)
-
Method Summary
Methods (Modifier and Type, Method, Description):
- void applyOverrides(LoginParameters lp): Apply overrides.
- void finishUp(): Finish up all processing.
- void noteAHREF(java.lang.String rawURL): Note discovered href.
- void noteBASEHREF(java.lang.String rawURL): Note discovered base href.
- void noteFormEnd(): Note the end of a form.
- void noteFormInput(java.util.Map inputAttributes): Note an input tag.
- void noteFormStart(java.util.Map formAttributes): Note the start of a form.
- void noteFRAMESRC(java.lang.String rawURL): Note discovered FRAME SRC.
- void noteIMGSRC(java.lang.String rawURL): Note discovered IMG SRC.
- void noteLINKHREF(java.lang.String rawURL): Note discovered href.
- void noteMetaTag(java.util.Map metaAttributes): Note a meta tag.
- void noteTextCharacter(char textCharacter): Note a character of text.
- protected void processBuffer()
-
Methods inherited from class org.apache.manifoldcf.crawler.connectors.webcrawler.FindHandler
getTargetURI, noteDiscoveredBase, noteDiscoveredLink
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.manifoldcf.crawler.connectors.webcrawler.IDiscoveredLinkHandler
noteDiscoveredBase, noteDiscoveredLink
-
Field Detail
-
contentPatterns
protected final java.util.List<java.util.regex.Pattern> contentPatterns
-
contentBuffer
protected final java.lang.StringBuilder contentBuffer
-
MAX_LENGTH
protected static final int MAX_LENGTH
- See Also:
- Constant Field Values
-
OVERLAP_AMOUNT
protected static final int OVERLAP_AMOUNT
- See Also:
- Constant Field Values
-
-
Method Detail
-
applyOverrides
public void applyOverrides(LoginParameters lp) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Apply overrides.
- Throws:
- org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteTextCharacter
public void noteTextCharacter(char textCharacter) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a character of text. Structured this way to keep overhead low for handlers that don't use text.
- Specified by:
- noteTextCharacter in interface IHTMLHandler
- Throws:
- org.apache.manifoldcf.core.interfaces.ManifoldCFException
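The per-character callback described above can be illustrated with a minimal sketch. It assumes the intent is that the parser pushes text one character at a time, so a handler that ignores text never causes a string to be built on its behalf. Every name here (TextSink, IgnoringSink, CollectingSink, TextCallbackSketch) is hypothetical and stands in for the real interfaces, whose implementation is not shown on this page.

```java
/** Hypothetical stand-in for a text callback; NOT the real IHTMLHandler. */
interface TextSink {
    void noteTextCharacter(char textCharacter);
}

/** A handler that does not use text: the callback is an empty method. */
class IgnoringSink implements TextSink {
    @Override
    public void noteTextCharacter(char textCharacter) {
        // Intentionally empty; no buffer is allocated for ignored text.
    }
}

/** A handler that does use text: it pays for its own buffer. */
class CollectingSink implements TextSink {
    final StringBuilder text = new StringBuilder();

    @Override
    public void noteTextCharacter(char textCharacter) {
        text.append(textCharacter);
    }
}

class TextCallbackSketch {
    /** Assumed driver: pushes each character of the body to the sink. */
    static void feed(String body, TextSink sink) {
        for (int i = 0; i < body.length(); i++) {
            sink.noteTextCharacter(body.charAt(i));
        }
    }

    public static void main(String[] args) {
        CollectingSink collecting = new CollectingSink();
        feed("some page text", collecting);
        feed("some page text", new IgnoringSink());
        System.out.println(collecting.text);
    }
}
```

Under this assumption, only handlers that care about text, such as one that greps content, carry the cost of accumulating it.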
-
noteMetaTag
public void noteMetaTag(java.util.Map metaAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a meta tag.
- Specified by:
- noteMetaTag in interface IMetaTagHandler
- Parameters:
- metaAttributes - the attributes that belong to the tag.
- Throws:
- org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFormStart
public void noteFormStart(java.util.Map formAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note the start of a form.
- Specified by:
- noteFormStart in interface IHTMLHandler
- Throws:
- org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFormInput
public void noteFormInput(java.util.Map inputAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note an input tag.
- Specified by:
- noteFormInput in interface IHTMLHandler
- Throws:
- org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFormEnd
public void noteFormEnd() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note the end of a form.
- Specified by:
- noteFormEnd in interface IHTMLHandler
- Throws:
- org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteAHREF
public void noteAHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered href.
- Specified by:
- noteAHREF in interface IHTMLHandler
- Throws:
- org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteBASEHREF
public void noteBASEHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered base href.
- Specified by:
- noteBASEHREF in interface IHTMLHandler
- Throws:
- org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteLINKHREF
public void noteLINKHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered href.
- Specified by:
- noteLINKHREF in interface IHTMLHandler
- Throws:
- org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteIMGSRC
public void noteIMGSRC(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered IMG SRC.
- Specified by:
- noteIMGSRC in interface IHTMLHandler
- Throws:
- org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFRAMESRC
public void noteFRAMESRC(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered FRAME SRC.
- Specified by:
- noteFRAMESRC in interface IHTMLHandler
- Throws:
- org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
finishUp
public void finishUp() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Finish up all processing. Called ONLY if we haven't already aborted.
- Specified by:
- finishUp in interface IHTMLHandler
- Throws:
- org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
processBuffer
protected void processBuffer()
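This page does not document processBuffer, but the field names contentBuffer, MAX_LENGTH, and OVERLAP_AMOUNT suggest a chunked scan that keeps a tail of each chunk so matches spanning a chunk boundary are not lost. The following is a minimal sketch of that general technique, not the ManifoldCF implementation. The class name OverlapBufferSketch, the isFound accessor, and the constant values are all assumptions made for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of chunked regex matching with overlap. */
class OverlapBufferSketch {
    // Assumed values; the real constants are not given on this page.
    static final int MAX_LENGTH = 16;
    static final int OVERLAP_AMOUNT = 8;

    private final List<Pattern> contentPatterns;
    private final StringBuilder contentBuffer = new StringBuilder();
    private boolean found = false;

    OverlapBufferSketch(List<Pattern> contentPatterns) {
        this.contentPatterns = contentPatterns;
    }

    /** Accumulate one character; scan when the buffer fills up. */
    void noteTextCharacter(char textCharacter) {
        if (found) {
            return; // nothing more to do once a pattern has matched
        }
        contentBuffer.append(textCharacter);
        if (contentBuffer.length() >= MAX_LENGTH) {
            processBuffer();
            // Keep only the tail so a match that straddles the
            // boundary can still be seen in the next scan.
            contentBuffer.delete(0, contentBuffer.length() - OVERLAP_AMOUNT);
        }
    }

    /** Scan whatever is left at the end of the document. */
    void finishUp() {
        if (!found) {
            processBuffer();
        }
    }

    /** Test the current buffer against every pattern. */
    void processBuffer() {
        for (Pattern p : contentPatterns) {
            if (p.matcher(contentBuffer).find()) {
                found = true;
                return;
            }
        }
    }

    boolean isFound() {
        return found;
    }

    public static void main(String[] args) {
        OverlapBufferSketch sketch =
            new OverlapBufferSketch(List.of(Pattern.compile("needle")));
        for (char c : "aaaaaaaaaaaaaneedlebbbb".toCharArray()) {
            sketch.noteTextCharacter(c);
        }
        sketch.finishUp();
        System.out.println(sketch.isFound());
    }
}
```

In this sketch the buffer never grows past MAX_LENGTH, so memory use stays bounded regardless of page size. The trade-off is that only matches no longer than OVERLAP_AMOUNT are guaranteed to be found when they cross a chunk boundary.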