Class WebcrawlerConnector.ProcessActivityHTMLHandler
java.lang.Object
    org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.ProcessActivityLinkHandler
        org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.ProcessActivityHTMLHandler
All Implemented Interfaces:
    IDiscoveredLinkHandler, IHTMLHandler, IMetaTagHandler
Enclosing class:
    WebcrawlerConnector
protected class WebcrawlerConnector.ProcessActivityHTMLHandler extends WebcrawlerConnector.ProcessActivityLinkHandler implements IHTMLHandler
Class that handles HTML parsing events for a document: text characters, meta tags, forms, and discovered links.
Field Summary
Fields inherited from class org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.ProcessActivityLinkHandler
activities, baseDocumentIdentifier, contextDescription, documentIdentifier, filter, linkType
Constructor Summary
Constructor and Description
ProcessActivityHTMLHandler(java.lang.String documentIdentifier, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter, int metaRobotTagsUsage)
    Constructor.
Method Summary
Modifier and Type  Method                                        Description
void               finishUp()                                    Done with the document.
void               noteAHREF(java.lang.String rawURL)            Note a discovered A HREF.
void               noteBASEHREF(java.lang.String rawURL)         Note a discovered BASE HREF.
void               noteFormEnd()                                 Note the end of a form.
void               noteFormInput(java.util.Map inputAttributes)  Note an input tag.
void               noteFormStart(java.util.Map formAttributes)   Note the start of a form.
void               noteFRAMESRC(java.lang.String rawURL)         Note a discovered FRAME SRC.
void               noteIMGSRC(java.lang.String rawURL)           Note a discovered IMG SRC.
void               noteLINKHREF(java.lang.String rawURL)         Note a discovered LINK HREF.
void               noteMetaTag(java.util.Map metaAttributes)     Note a meta tag.
void               noteTextCharacter(char textCharacter)         Note a character of text.
boolean            shouldIndex()                                 Decide whether we should index.
Methods inherited from class org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.ProcessActivityLinkHandler
noteDiscoveredBase, noteDiscoveredLink
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.manifoldcf.crawler.connectors.webcrawler.IDiscoveredLinkHandler
noteDiscoveredBase, noteDiscoveredLink
Constructor Detail
ProcessActivityHTMLHandler
public ProcessActivityHTMLHandler(java.lang.String documentIdentifier, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter, int metaRobotTagsUsage)
Constructor.
Method Detail
shouldIndex
public boolean shouldIndex()
Decide whether we should index.
-
noteTextCharacter
public void noteTextCharacter(char textCharacter) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a character of text. Structured this way to keep overhead low for handlers that don't use text.
Specified by:
    noteTextCharacter in interface IHTMLHandler
Throws:
    org.apache.manifoldcf.core.interfaces.ManifoldCFException
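Delivering text one character at a time means a handler that ignores text pays only for an empty method call, while a handler that needs text can append to a single buffer. A minimal sketch of both cases, assuming a hypothetical stand-in interface TextSink (not part of ManifoldCF) that mirrors only the noteTextCharacter signature above, without the checked ManifoldCFException:

```java
public class TextSinkDemo {
    // Hypothetical stand-in for the text callback of IHTMLHandler. The real
    // interface declares more methods and throws ManifoldCFException.
    interface TextSink {
        void noteTextCharacter(char textCharacter);
    }

    // A handler that does not use text: the callback is a no-op, so the
    // per-character cost is a single empty method call.
    static final class IgnoringTextSink implements TextSink {
        @Override
        public void noteTextCharacter(char textCharacter) {
            // Intentionally empty.
        }
    }

    // A handler that does use text: characters go into one reusable buffer.
    static final class CollectingTextSink implements TextSink {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void noteTextCharacter(char textCharacter) {
            text.append(textCharacter);
        }

        String getText() {
            return text.toString();
        }
    }

    // Simulates a parser that emits extracted text one character at a time.
    static void feed(String extracted, TextSink sink) {
        for (int i = 0; i < extracted.length(); i++) {
            sink.noteTextCharacter(extracted.charAt(i));
        }
    }

    public static void main(String[] args) {
        CollectingTextSink collector = new CollectingTextSink();
        feed("Hello, crawler", collector);
        feed("Hello, crawler", new IgnoringTextSink());
        System.out.println(collector.getText()); // prints Hello, crawler
    }
}
```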
-
noteMetaTag
public void noteMetaTag(java.util.Map metaAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a meta tag.
Specified by:
    noteMetaTag in interface IMetaTagHandler
Parameters:
    metaAttributes - are the attributes that belong to the tag.
Throws:
    org.apache.manifoldcf.core.interfaces.ManifoldCFException
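The constructor's metaRobotTagsUsage parameter suggests that robots meta tags can influence the result of shouldIndex(). A minimal sketch of that interaction, assuming a hypothetical boolean switch in place of metaRobotTagsUsage (whose actual values are not documented on this page), lowercase attribute keys, and the common robots directives noindex, nofollow, and none:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class RobotsMetaDemo {
    // Hypothetical handler, not the ManifoldCF implementation. The boolean
    // flag is an assumed stand-in for metaRobotTagsUsage.
    static final class RobotsMetaHandler {
        private final boolean obeyRobotsMeta;
        private boolean noIndex = false;
        private boolean noFollow = false;

        RobotsMetaHandler(boolean obeyRobotsMeta) {
            this.obeyRobotsMeta = obeyRobotsMeta;
        }

        // Same shape as noteMetaTag(Map metaAttributes); assumes attribute
        // names arrive as lowercase keys.
        void noteMetaTag(Map<String, String> metaAttributes) {
            if (!obeyRobotsMeta) {
                return;
            }
            String name = metaAttributes.get("name");
            String content = metaAttributes.get("content");
            if (name == null || content == null || !name.trim().equalsIgnoreCase("robots")) {
                return;
            }
            // Directives are comma-separated and case-insensitive; "none" means both.
            for (String directive : content.split(",")) {
                String d = directive.trim().toLowerCase(Locale.ROOT);
                if (d.equals("noindex") || d.equals("none")) {
                    noIndex = true;
                }
                if (d.equals("nofollow") || d.equals("none")) {
                    noFollow = true;
                }
            }
        }

        boolean shouldIndex() {
            return !noIndex;
        }

        boolean shouldFollowLinks() {
            return !noFollow;
        }
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("name", "robots");
        meta.put("content", "NOINDEX, follow");

        RobotsMetaHandler handler = new RobotsMetaHandler(true);
        handler.noteMetaTag(meta);
        System.out.println(handler.shouldIndex());       // prints false
        System.out.println(handler.shouldFollowLinks()); // prints true
    }
}
```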
-
noteFormStart
public void noteFormStart(java.util.Map formAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note the start of a form.
Specified by:
    noteFormStart in interface IHTMLHandler
Throws:
    org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFormInput
public void noteFormInput(java.util.Map inputAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note an input tag.
Specified by:
    noteFormInput in interface IHTMLHandler
Throws:
    org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFormEnd
public void noteFormEnd() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note the end of a form.
Specified by:
    noteFormEnd in interface IHTMLHandler
Throws:
    org.apache.manifoldcf.core.interfaces.ManifoldCFException
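Because noteFormStart, noteFormInput, and noteFormEnd arrive as a bracketed sequence, a handler can accumulate per-form state between the start and end calls. A hypothetical sketch, assuming lowercase attribute keys and that only the action attribute and named inputs with a value matter; what ManifoldCF itself does with form data is not described on this page:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FormTrackingDemo {
    // Hypothetical collector; method names mirror the callbacks above.
    static final class FormCollector {
        private String currentAction = null; // null means "not inside a form"
        private final StringBuilder query = new StringBuilder();
        private final List<String> completedForms = new ArrayList<>();

        void noteFormStart(Map<String, String> formAttributes) {
            currentAction = formAttributes.getOrDefault("action", "");
            query.setLength(0);
        }

        void noteFormInput(Map<String, String> inputAttributes) {
            if (currentAction == null) {
                return; // input outside any form
            }
            String name = inputAttributes.get("name");
            if (name == null) {
                return; // unnamed inputs (e.g. a bare submit button) are not submitted
            }
            String value = inputAttributes.getOrDefault("value", "");
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(URLEncoder.encode(name, StandardCharsets.UTF_8))
                 .append('=')
                 .append(URLEncoder.encode(value, StandardCharsets.UTF_8));
        }

        void noteFormEnd() {
            if (currentAction == null) {
                return;
            }
            completedForms.add(query.length() == 0 ? currentAction : currentAction + "?" + query);
            currentAction = null;
        }

        List<String> getCompletedForms() {
            return completedForms;
        }
    }

    public static void main(String[] args) {
        FormCollector c = new FormCollector();
        c.noteFormStart(Map.of("action", "/search"));
        c.noteFormInput(Map.of("name", "q", "value", "web crawler"));
        c.noteFormInput(Map.of("type", "submit"));
        c.noteFormEnd();
        System.out.println(c.getCompletedForms()); // prints [/search?q=web+crawler]
    }
}
```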
-
noteBASEHREF
public void noteBASEHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a discovered BASE HREF.
Specified by:
    noteBASEHREF in interface IHTMLHandler
Throws:
    org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteAHREF
public void noteAHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a discovered A HREF.
Specified by:
    noteAHREF in interface IHTMLHandler
Throws:
    org.apache.manifoldcf.core.interfaces.ManifoldCFException
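A BASE HREF changes the URL that later relative links are resolved against. A minimal sketch of that ordering using java.net.URI, assuming a hypothetical LinkResolver class; ManifoldCF's own resolution goes through the inherited noteDiscoveredBase and noteDiscoveredLink methods, whose behavior is not shown on this page:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class BaseHrefDemo {
    // Hypothetical resolver, not the ManifoldCF implementation.
    static final class LinkResolver {
        private URI base;
        private final List<String> links = new ArrayList<>();

        LinkResolver(String documentIdentifier) {
            // Until a BASE HREF appears, links resolve against the document URL.
            this.base = URI.create(documentIdentifier);
        }

        void noteBASEHREF(String rawURL) {
            try {
                // A relative BASE HREF is itself resolved against the current base.
                base = base.resolve(rawURL);
            } catch (IllegalArgumentException e) {
                // Keep the previous base if the value is not a valid URI.
            }
        }

        void noteAHREF(String rawURL) {
            try {
                links.add(base.resolve(rawURL).toString());
            } catch (IllegalArgumentException e) {
                // Skip links that are not valid URIs.
            }
        }

        List<String> getLinks() {
            return links;
        }
    }

    public static void main(String[] args) {
        LinkResolver r = new LinkResolver("http://example.com/docs/page.html");
        r.noteAHREF("other.html");
        r.noteBASEHREF("http://cdn.example.com/assets/");
        r.noteAHREF("img/logo.png");
        r.noteAHREF("../top.html");
        // prints [http://example.com/docs/other.html, http://cdn.example.com/assets/img/logo.png, http://cdn.example.com/top.html]
        System.out.println(r.getLinks());
    }
}
```

URI.resolve normalizes the merged path, which is why "../top.html" collapses to "/top.html" instead of keeping the dot segment.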
-
noteLINKHREF
public void noteLINKHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a discovered LINK HREF.
Specified by:
    noteLINKHREF in interface IHTMLHandler
Throws:
    org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteIMGSRC
public void noteIMGSRC(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a discovered IMG SRC.
Specified by:
    noteIMGSRC in interface IHTMLHandler
Throws:
    org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFRAMESRC
public void noteFRAMESRC(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a discovered FRAME SRC.
Specified by:
    noteFRAMESRC in interface IHTMLHandler
Throws:
    org.apache.manifoldcf.core.interfaces.ManifoldCFException
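The parent class contributes a linkType field and a noteDiscoveredLink method, which suggests that the tag-specific callbacks (noteAHREF, noteLINKHREF, noteIMGSRC, noteFRAMESRC) funnel into one shared link path. A hypothetical sketch of that funneling pattern; the LinkKind enum, the counting, and the two-argument noteDiscoveredLink are illustrative assumptions and do not reproduce ManifoldCF's signatures:

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class LinkDispatchDemo {
    // Hypothetical tag categories, one per callback.
    enum LinkKind { ANCHOR, LINK, IMAGE, FRAME }

    static final class DispatchingHandler {
        private final List<String> discovered = new ArrayList<>();
        private final Map<LinkKind, Integer> counts = new EnumMap<>(LinkKind.class);

        void noteAHREF(String rawURL) { noteDiscoveredLink(LinkKind.ANCHOR, rawURL); }
        void noteLINKHREF(String rawURL) { noteDiscoveredLink(LinkKind.LINK, rawURL); }
        void noteIMGSRC(String rawURL) { noteDiscoveredLink(LinkKind.IMAGE, rawURL); }
        void noteFRAMESRC(String rawURL) { noteDiscoveredLink(LinkKind.FRAME, rawURL); }

        // Single shared path: every tag-specific callback ends up here, so
        // filtering or resolution logic only has to be written once.
        private void noteDiscoveredLink(LinkKind kind, String rawURL) {
            discovered.add(kind + " " + rawURL);
            counts.merge(kind, 1, Integer::sum);
        }

        List<String> getDiscovered() { return discovered; }

        int count(LinkKind kind) { return counts.getOrDefault(kind, 0); }
    }

    public static void main(String[] args) {
        DispatchingHandler h = new DispatchingHandler();
        h.noteAHREF("next.html");
        h.noteIMGSRC("logo.png");
        h.noteFRAMESRC("menu.html");
        h.noteAHREF("prev.html");
        // prints [ANCHOR next.html, IMAGE logo.png, FRAME menu.html, ANCHOR prev.html]
        System.out.println(h.getDiscovered());
        System.out.println(h.count(LinkKind.ANCHOR)); // prints 2
    }
}
```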
-
finishUp
public void finishUp() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Description copied from interface: IHTMLHandler
Done with the document.
Specified by:
    finishUp in interface IHTMLHandler
Throws:
    org.apache.manifoldcf.core.interfaces.ManifoldCFException
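An end-of-document callback such as finishUp gives a handler a place to flush state that is only complete once input ends, for example a final word with no trailing delimiter. A hypothetical sketch; the WordHandler class and its letter-or-digit word splitting are assumptions for illustration, not ManifoldCF behavior:

```java
import java.util.ArrayList;
import java.util.List;

public class FinishUpDemo {
    // Hypothetical handler that splits streamed text into words.
    static final class WordHandler {
        private final StringBuilder current = new StringBuilder();
        private final List<String> words = new ArrayList<>();
        private boolean finished = false;

        void noteTextCharacter(char c) {
            if (finished) {
                throw new IllegalStateException("document already finished");
            }
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else {
                flush(); // any other character ends the current word
            }
        }

        void finishUp() {
            flush(); // the last word has no trailing delimiter, so flush it here
            finished = true;
        }

        private void flush() {
            if (current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
        }

        List<String> getWords() {
            return words;
        }
    }

    public static void main(String[] args) {
        WordHandler h = new WordHandler();
        for (char c : "crawl the web".toCharArray()) {
            h.noteTextCharacter(c);
        }
        System.out.println(h.getWords()); // prints [crawl, the]
        h.finishUp();
        System.out.println(h.getWords()); // prints [crawl, the, web]
    }
}
```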