Class WebcrawlerConnector.ProcessActivityHTMLHandler
- java.lang.Object
  - org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.ProcessActivityLinkHandler
    - org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.ProcessActivityHTMLHandler
- All Implemented Interfaces: IDiscoveredLinkHandler, IHTMLHandler, IMetaTagHandler
- Enclosing class: WebcrawlerConnector
protected class WebcrawlerConnector.ProcessActivityHTMLHandler extends WebcrawlerConnector.ProcessActivityLinkHandler implements IHTMLHandler
Class that describes HTML handling.
Field Summary
-
Fields inherited from class org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.ProcessActivityLinkHandler
activities, baseDocumentIdentifier, contextDescription, documentIdentifier, filter, linkType
Constructor Summary
| Constructor | Description |
| --- | --- |
| ProcessActivityHTMLHandler(java.lang.String documentIdentifier, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter, int metaRobotTagsUsage) | Constructor. |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | finishUp() | Done with the document. |
| void | noteAHREF(java.lang.String rawURL) | Note discovered href. |
| void | noteBASEHREF(java.lang.String rawURL) | Note discovered base. |
| void | noteFormEnd() | Note the end of a form. |
| void | noteFormInput(java.util.Map inputAttributes) | Note an input tag. |
| void | noteFormStart(java.util.Map formAttributes) | Note the start of a form. |
| void | noteFRAMESRC(java.lang.String rawURL) | Note discovered FRAME SRC. |
| void | noteIMGSRC(java.lang.String rawURL) | Note discovered IMG SRC. |
| void | noteLINKHREF(java.lang.String rawURL) | Note discovered href. |
| void | noteMetaTag(java.util.Map metaAttributes) | Note a meta tag. |
| void | noteTextCharacter(char textCharacter) | Note a character of text. |
| boolean | shouldIndex() | Decide whether we should index. |
Methods inherited from class org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.ProcessActivityLinkHandler
noteDiscoveredBase, noteDiscoveredLink
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.manifoldcf.crawler.connectors.webcrawler.IDiscoveredLinkHandler
noteDiscoveredBase, noteDiscoveredLink
Constructor Detail
-
ProcessActivityHTMLHandler
public ProcessActivityHTMLHandler(java.lang.String documentIdentifier, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter, int metaRobotTagsUsage)
Constructor.
Method Detail
-
shouldIndex
public boolean shouldIndex()
Decide whether we should index.
-
noteTextCharacter
public void noteTextCharacter(char textCharacter) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a character of text. Structured this way to keep overhead low for handlers that don't use text.
- Specified by: noteTextCharacter in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
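The per-character design can be illustrated with a small stand-alone sketch. The types `TextSink`, `IgnoringSink`, `CollectingSink`, and `TextSinkDemo` are hypothetical and are not part of ManifoldCF; they only mirror the shape of `noteTextCharacter(char)` to show why a handler that ignores text pays almost nothing per call.

```java
// Hypothetical stand-in for the text portion of an HTML handler; not a ManifoldCF type.
interface TextSink {
    void noteTextCharacter(char textCharacter);
}

// A handler that does not use text: each call is a no-op,
// and no intermediate String is built on its behalf.
class IgnoringSink implements TextSink {
    @Override
    public void noteTextCharacter(char textCharacter) {
        // intentionally empty
    }
}

// A handler that does want the text accumulates it itself.
class CollectingSink implements TextSink {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void noteTextCharacter(char textCharacter) {
        text.append(textCharacter);
    }

    String getText() {
        return text.toString();
    }
}

class TextSinkDemo {
    // Feeds every character of the input to the sink, one call per character.
    static void feed(String input, TextSink sink) {
        for (int i = 0; i < input.length(); i++) {
            sink.noteTextCharacter(input.charAt(i));
        }
    }

    public static void main(String[] args) {
        CollectingSink collecting = new CollectingSink();
        feed("Hello", collecting);
        System.out.println(collecting.getText());
        feed("Hello", new IgnoringSink()); // no allocation, no work
    }
}
```

Whether the real handler buffers text or discards it is not stated on this page; the sketch only shows the two extremes the callback shape allows.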
-
noteMetaTag
public void noteMetaTag(java.util.Map metaAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a meta tag.
- Specified by: noteMetaTag in interface IMetaTagHandler
- Parameters: metaAttributes - are the attributes that belong to the tag.
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
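One plausible use of a meta-tag callback, given that the class also exposes `shouldIndex()` and takes a `metaRobotTagsUsage` argument, is honoring `<meta name="robots">` directives. The following is a minimal sketch under that assumption, not the connector's actual logic. The class `RobotsMetaSketch`, the `shouldFollow()` method, and the assumption that the map holds lower-case `name` and `content` keys are all hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not the ManifoldCF implementation.
class RobotsMetaSketch {
    private boolean allowIndex = true;
    private boolean allowFollow = true;

    // Assumes attribute names arrive as lower-case keys ("name", "content").
    void noteMetaTag(Map<String, String> metaAttributes) {
        String name = metaAttributes.get("name");
        String content = metaAttributes.get("content");
        if (name == null || content == null
                || !name.trim().equalsIgnoreCase("robots")) {
            return; // not a robots directive
        }
        for (String token : content.toLowerCase(Locale.ROOT).split(",")) {
            switch (token.trim()) {
                case "noindex":
                    allowIndex = false;
                    break;
                case "nofollow":
                    allowFollow = false;
                    break;
                case "none":
                    allowIndex = false;
                    allowFollow = false;
                    break;
                default:
                    break; // "index", "follow", "all", and unknown tokens change nothing
            }
        }
    }

    boolean shouldIndex() {
        return allowIndex;
    }

    boolean shouldFollow() {
        return allowFollow;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("name", "robots");
        attrs.put("content", "noindex, follow");
        RobotsMetaSketch handler = new RobotsMetaSketch();
        handler.noteMetaTag(attrs);
        System.out.println("index=" + handler.shouldIndex()
                + " follow=" + handler.shouldFollow());
    }
}
```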
-
noteFormStart
public void noteFormStart(java.util.Map formAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note the start of a form.
- Specified by: noteFormStart in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFormInput
public void noteFormInput(java.util.Map inputAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note an input tag.
- Specified by: noteFormInput in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFormEnd
public void noteFormEnd() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note the end of a form.
- Specified by: noteFormEnd in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
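The three form callbacks (`noteFormStart`, `noteFormInput`, `noteFormEnd`) arrive as a start/input/end sequence. A minimal sketch of how a handler might group them is shown below. `FormCollectorSketch` and its nested `Form` type are hypothetical, as is the assumption that the attribute maps use lower-case `action`, `name`, and `value` keys; what the real handler does with forms is not described on this page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the ManifoldCF implementation.
class FormCollectorSketch {
    // One form: its action attribute plus its named inputs in document order.
    static class Form {
        final String action;
        final Map<String, String> inputs = new LinkedHashMap<>();

        Form(String action) {
            this.action = action;
        }
    }

    private final List<Form> forms = new ArrayList<>();
    private Form current; // non-null only between start and end

    void noteFormStart(Map<String, String> formAttributes) {
        current = new Form(formAttributes.get("action"));
    }

    void noteFormInput(Map<String, String> inputAttributes) {
        if (current == null) {
            return; // input outside any form: ignored in this sketch
        }
        String name = inputAttributes.get("name");
        if (name != null) {
            String value = inputAttributes.get("value");
            current.inputs.put(name, value == null ? "" : value);
        }
    }

    void noteFormEnd() {
        if (current != null) {
            forms.add(current);
            current = null;
        }
    }

    List<Form> getForms() {
        return forms;
    }

    public static void main(String[] args) {
        FormCollectorSketch collector = new FormCollectorSketch();
        Map<String, String> form = new HashMap<>();
        form.put("action", "/login");
        Map<String, String> input = new HashMap<>();
        input.put("name", "user");
        input.put("value", "guest");
        collector.noteFormStart(form);
        collector.noteFormInput(input);
        collector.noteFormEnd();
        Form first = collector.getForms().get(0);
        System.out.println(first.action + " " + first.inputs);
    }
}
```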
-
noteBASEHREF
public void noteBASEHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered base.
- Specified by: noteBASEHREF in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteAHREF
public void noteAHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered href.
- Specified by: noteAHREF in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
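Because both `noteBASEHREF` and `noteAHREF` receive a raw URL, a handler typically has to resolve each link against the most recently seen base. The sketch below shows that interplay using only `java.net.URI`. `LinkResolverSketch` is a hypothetical class; the real handler's resolution rules, and how it uses the inherited `noteDiscoveredBase` and `noteDiscoveredLink` methods, are not documented on this page.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the ManifoldCF implementation.
class LinkResolverSketch {
    private URI base; // starts as the document's own URL
    private final List<String> links = new ArrayList<>();

    LinkResolverSketch(String documentUrl) {
        this.base = URI.create(documentUrl);
    }

    // A <base href> replaces the base used for all later links.
    void noteBASEHREF(String rawURL) {
        try {
            base = base.resolve(URI.create(rawURL.trim()));
        } catch (IllegalArgumentException e) {
            // malformed base: keep the previous one
        }
    }

    // An <a href> is resolved against the current base and recorded.
    void noteAHREF(String rawURL) {
        try {
            links.add(base.resolve(URI.create(rawURL.trim())).toString());
        } catch (IllegalArgumentException e) {
            // malformed link: skipped in this sketch
        }
    }

    List<String> getLinks() {
        return links;
    }

    public static void main(String[] args) {
        LinkResolverSketch resolver =
                new LinkResolverSketch("http://example.com/dir/page.html");
        resolver.noteAHREF("a.html");
        resolver.noteBASEHREF("http://example.com/other/");
        resolver.noteAHREF("c.html");
        for (String link : resolver.getLinks()) {
            System.out.println(link);
        }
    }
}
```

The same pattern would apply to the raw URLs passed to `noteLINKHREF`, `noteIMGSRC`, and `noteFRAMESRC`.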
-
noteLINKHREF
public void noteLINKHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered href.
- Specified by: noteLINKHREF in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteIMGSRC
public void noteIMGSRC(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered IMG SRC.
- Specified by: noteIMGSRC in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFRAMESRC
public void noteFRAMESRC(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered FRAME SRC.
- Specified by: noteFRAMESRC in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
finishUp
public void finishUp() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Description copied from interface: IHTMLHandler
Done with the document.
- Specified by: finishUp in interface IHTMLHandler
- Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException