Class WebcrawlerConnector.ProcessActivityHTMLHandler

    • Constructor Detail

      • ProcessActivityHTMLHandler

        public ProcessActivityHTMLHandler(java.lang.String documentIdentifier,
                                          org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities,
                                          WebcrawlerConnector.DocumentURLFilter filter,
                                          int metaRobotTagsUsage)
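        Constructs a handler for the document given by documentIdentifier, using the supplied activities object, URL filter, and meta robots tag usage setting.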
    • Method Detail

      • shouldIndex

        public boolean shouldIndex()
        Decide whether the document should be indexed.
      • noteTextCharacter

        public void noteTextCharacter(char textCharacter)
                               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Note a character of text. Structured this way to keep overhead low for handlers that don't use text.
        Specified by:
        noteTextCharacter in interface IHTMLHandler
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • noteMetaTag

        public void noteMetaTag(java.util.Map metaAttributes)
                         throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Note a meta tag.
        Specified by:
        noteMetaTag in interface IMetaTagHandler
        Parameters:
        metaAttributes - the attributes that belong to the tag.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
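        The interplay between noteMetaTag and shouldIndex can be illustrated with a minimal sketch. It is not the connector's actual implementation: the class RobotsMetaSketch, the method shouldFollow, and the assumption that attribute names arrive as lower-case String keys are all hypothetical, and the metaRobotTagsUsage setting, the filter, and the activities object are not modeled.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in, NOT the real ProcessActivityHTMLHandler.
class RobotsMetaSketch {
    private boolean allowIndex = true;
    private boolean allowFollow = true;

    // Same shape as noteMetaTag(Map): inspect the attributes of one meta tag.
    // Assumption: keys are lower-case attribute names with String values.
    public void noteMetaTag(Map<String, String> metaAttributes) {
        String name = metaAttributes.get("name");
        String content = metaAttributes.get("content");
        if (name == null || content == null) {
            return;
        }
        if (!name.trim().equalsIgnoreCase("robots")) {
            return; // not a robots meta tag, nothing to record
        }
        // The content value is a comma-separated list of directives.
        for (String token : content.split(",")) {
            String directive = token.trim().toLowerCase(Locale.ROOT);
            if (directive.equals("noindex")) {
                allowIndex = false;
            } else if (directive.equals("nofollow")) {
                allowFollow = false;
            } else if (directive.equals("none")) {
                allowIndex = false;
                allowFollow = false;
            }
        }
    }

    // Same shape as shouldIndex(): true unless a robots tag forbade indexing.
    public boolean shouldIndex() {
        return allowIndex;
    }

    // Hypothetical companion accessor; not part of the documented class.
    public boolean shouldFollow() {
        return allowFollow;
    }

    public static void main(String[] args) {
        RobotsMetaSketch handler = new RobotsMetaSketch();
        Map<String, String> meta = new HashMap<>();
        meta.put("name", "robots");
        meta.put("content", "noindex, follow");
        handler.noteMetaTag(meta);
        System.out.println("index=" + handler.shouldIndex()
            + " follow=" + handler.shouldFollow());
    }
}
```

        In this sketch the decision is accumulated while the parser reports tags and is read only afterwards, which matches the callback style of the methods listed on this page.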
      • noteFormStart

        public void noteFormStart(java.util.Map formAttributes)
                           throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Note the start of a form.
        Specified by:
        noteFormStart in interface IHTMLHandler
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • noteFormInput

        public void noteFormInput(java.util.Map inputAttributes)
                           throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Note an input tag.
        Specified by:
        noteFormInput in interface IHTMLHandler
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • noteFormEnd

        public void noteFormEnd()
                         throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Note the end of a form.
        Specified by:
        noteFormEnd in interface IHTMLHandler
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
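        The three form callbacks (noteFormStart, noteFormInput, noteFormEnd) are naturally used as a start/item/end sequence. The following is a minimal sketch of that pattern, assuming String-keyed attribute maps with lower-case names; FormCollectorSketch, its nested Form class, and getForms are hypothetical and do not describe what the connector does with forms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, NOT the real ProcessActivityHTMLHandler.
class FormCollectorSketch {
    // Simple record of one form: its action attribute and its input names.
    static final class Form {
        final String action;
        final List<String> inputNames = new ArrayList<>();

        Form(String action) {
            this.action = action;
        }
    }

    private final List<Form> forms = new ArrayList<>();
    private Form current; // null while the parser is outside any form

    // Same shape as noteFormStart(Map): open a new form record.
    public void noteFormStart(Map<String, String> formAttributes) {
        current = new Form(formAttributes.get("action"));
    }

    // Same shape as noteFormInput(Map): attach the input to the open form.
    public void noteFormInput(Map<String, String> inputAttributes) {
        if (current == null) {
            return; // input outside a form; this sketch ignores it
        }
        String name = inputAttributes.get("name");
        if (name != null) {
            current.inputNames.add(name);
        }
    }

    // Same shape as noteFormEnd(): close and store the open form, if any.
    public void noteFormEnd() {
        if (current != null) {
            forms.add(current);
            current = null;
        }
    }

    public List<Form> getForms() {
        return forms;
    }

    public static void main(String[] args) {
        FormCollectorSketch handler = new FormCollectorSketch();
        Map<String, String> form = new HashMap<>();
        form.put("action", "/login");
        handler.noteFormStart(form);
        Map<String, String> input = new HashMap<>();
        input.put("name", "user");
        handler.noteFormInput(input);
        handler.noteFormEnd();
        System.out.println(handler.getForms().size() + " form(s) collected");
    }
}
```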
      • noteBASEHREF

        public void noteBASEHREF(java.lang.String rawURL)
                          throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Note a discovered BASE HREF.
        Specified by:
        noteBASEHREF in interface IHTMLHandler
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • noteAHREF

        public void noteAHREF(java.lang.String rawURL)
                       throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Note a discovered A HREF.
        Specified by:
        noteAHREF in interface IHTMLHandler
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • noteLINKHREF

        public void noteLINKHREF(java.lang.String rawURL)
                          throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Note a discovered LINK HREF.
        Specified by:
        noteLINKHREF in interface IHTMLHandler
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • noteIMGSRC

        public void noteIMGSRC(java.lang.String rawURL)
                        throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Note a discovered IMG SRC.
        Specified by:
        noteIMGSRC in interface IHTMLHandler
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • noteFRAMESRC

        public void noteFRAMESRC(java.lang.String rawURL)
                          throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Note a discovered FRAME SRC.
        Specified by:
        noteFRAMESRC in interface IHTMLHandler
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
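        The URL callbacks above all receive a raw URL as it appeared in the page. One plausible way to treat such values is to resolve them against the current base, which a BASE HREF can override. The sketch below shows this with java.net.URI only. LinkResolverSketch and getLinks are hypothetical, it assumes documentIdentifier is the document's absolute URL, and it leaves out the DocumentURLFilter and IProcessActivity steps the real class is constructed with.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the real ProcessActivityHTMLHandler.
class LinkResolverSketch {
    private URI base;
    private final List<String> links = new ArrayList<>();

    // Assumption: the document identifier is the document's absolute URL.
    LinkResolverSketch(String documentIdentifier) {
        this.base = URI.create(documentIdentifier);
    }

    // Same shape as noteBASEHREF(String): later links resolve against this.
    public void noteBASEHREF(String rawURL) {
        try {
            // Resolve against the old base so a relative BASE HREF also works.
            base = base.resolve(rawURL.trim());
        } catch (IllegalArgumentException e) {
            // Malformed base: keep the previous one.
        }
    }

    // Same shape as noteAHREF(String): resolve and record the link.
    public void noteAHREF(String rawURL) {
        try {
            links.add(base.resolve(rawURL.trim()).toString());
        } catch (IllegalArgumentException e) {
            // Malformed URL: this sketch simply skips it.
        }
    }

    public List<String> getLinks() {
        return links;
    }

    public static void main(String[] args) {
        LinkResolverSketch handler =
            new LinkResolverSketch("http://example.com/a/b.html");
        handler.noteAHREF("../c.html");
        handler.noteBASEHREF("http://example.org/docs/");
        handler.noteAHREF("page.html");
        for (String link : handler.getLinks()) {
            System.out.println(link);
        }
    }
}
```

        The same resolution step would apply equally to LINK HREF, IMG SRC, and FRAME SRC values; only the A HREF case is shown to keep the sketch short.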
      • finishUp

        public void finishUp()
                      throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Description copied from interface: IHTMLHandler
        Done with the document.
        Specified by:
        finishUp in interface IHTMLHandler
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException