Class DataCache


  • public class DataCache
    extends java.lang.Object
    This class caches the data fetched from a specific URL. The data is fetched early and kept so that (1) an accurate data length can be determined, and (2) a version checksum can be computed.
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      protected static class  DataCache.DocumentData
      This class represents everything we need to know about a document that's getting passed from the getDocumentVersions() phase to the processDocuments() phase.
    • Constructor Summary

      Constructors 
      Constructor Description
      DataCache()
      Constructor.
    • Method Summary

      Modifier and Type Method Description
      java.lang.String addData(org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, java.lang.String documentIdentifier, IThrottledConnection connection)
      Add a data entry into the cache.
      void deleteData(java.lang.String documentIdentifier)
      Delete specified item of data.
      java.lang.String getContentType(java.lang.String documentIdentifier)
      Get the content type.
      java.io.InputStream getData(java.lang.String documentIdentifier)
      Fetch binary data entry from the cache.
      long getDataLength(java.lang.String documentIdentifier)
      Fetch binary data length.
      java.lang.String getReferralURI(java.lang.String documentIdentifier)
      Get the referral URI.
      int getResponseCode(java.lang.String documentIdentifier)
      Get the response code.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • DataCache

        public DataCache()
        Constructor.
    • Method Detail

      • addData

        public java.lang.String addData(org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities,
                                        java.lang.String documentIdentifier,
                                        IThrottledConnection connection)
                                 throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Add a data entry into the cache. This method is called whenever the data from a fetch is considered interesting or useful and will therefore be passed on from the getDocumentVersions() phase to the processDocuments() phase. At the moment, that is usually a 200 or a 302 response.
        Parameters:
        documentIdentifier - is the document identifier (URL).
        connection - is the connection, upon which a fetch has been done that needs to be cached.
        Returns:
        a "checksum" value, to use as a version string.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
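        The idea behind addData (read the fetched body once, keep it, and derive both an accurate length and a version string from it) can be illustrated with a minimal sketch. This is not the ManifoldCF implementation: the class ChecksumCacheSketch, its in-memory storage, the simplified addData(String, InputStream) signature, and the choice of CRC-32 as the "checksum" are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical stand-in for illustration only; not the real DataCache.
class ChecksumCacheSketch {
    // Cached bodies keyed by document identifier (assumed to be the URL).
    private final Map<String, byte[]> bodies = new HashMap<>();

    // Reads the whole stream once, stores it, and returns a checksum string
    // that a caller could use as a version string. CRC-32 is an assumption.
    String addData(String documentIdentifier, InputStream body) {
        CRC32 crc = new CRC32();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int read;
            while ((read = body.read(chunk)) != -1) {
                crc.update(chunk, 0, read);
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        bodies.put(documentIdentifier, buffer.toByteArray());
        return Long.toHexString(crc.getValue());
    }

    // The length is exact because the full body was already read.
    // Returning 0 for an unknown identifier is a choice made for this sketch.
    long getDataLength(String documentIdentifier) {
        byte[] data = bodies.get(documentIdentifier);
        return data == null ? 0L : data.length;
    }

    // Returns a fresh stream over the cached bytes, or null if nothing is cached.
    InputStream getData(String documentIdentifier) {
        byte[] data = bodies.get(documentIdentifier);
        return data == null ? null : new ByteArrayInputStream(data);
    }

    // Drops the cached entry, if any.
    void deleteData(String documentIdentifier) {
        bodies.remove(documentIdentifier);
    }

    public static void main(String[] args) {
        ChecksumCacheSketch cache = new ChecksumCacheSketch();
        byte[] page = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        String version = cache.addData("http://example.com/", new ByteArrayInputStream(page));
        System.out.println("version=" + version
                + " length=" + cache.getDataLength("http://example.com/"));
    }
}
```

        In this sketch, two fetches that return identical bytes produce the same version string, which is the property a version checksum is meant to provide. A real cache would more likely spool large bodies to disk than hold them in memory.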
      • getResponseCode

        public int getResponseCode(java.lang.String documentIdentifier)
        Get the response code.
        Parameters:
        documentIdentifier - is the document identifier.
        Returns:
        the code.
      • getContentType

        public java.lang.String getContentType(java.lang.String documentIdentifier)
        Get the content type.
        Parameters:
        documentIdentifier - is the document identifier.
        Returns:
        the content type, or null if there is none.
      • getReferralURI

        public java.lang.String getReferralURI(java.lang.String documentIdentifier)
        Get the referral URI.
        Parameters:
        documentIdentifier - is the document identifier.
        Returns:
        the referral URI, or null if none.
      • getDataLength

        public long getDataLength(java.lang.String documentIdentifier)
        Fetch binary data length.
        Parameters:
        documentIdentifier - is the document identifier.
        Returns:
        the length.
      • getData

        public java.io.InputStream getData(java.lang.String documentIdentifier)
                                    throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Fetch binary data entry from the cache.
        Parameters:
        documentIdentifier - is the document identifier (URL).
        Returns:
        a binary data stream.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • deleteData

        public void deleteData(java.lang.String documentIdentifier)
        Delete specified item of data.
        Parameters:
        documentIdentifier - is the document identifier (URL).
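        The read accessors and deleteData suggest a consume-then-release pattern on the processing side: inspect the response code, read the cached body if there is one, and remove the entry when done. The sketch below shows one way that could look. It uses a hypothetical in-memory class, CacheLifecycleSketch, that only borrows the method names documented above. The put helper, the process method, the fallback return values, and the specific handling of 200 versus 302 are assumptions and are not taken from the ManifoldCF source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the real DataCache.
class CacheLifecycleSketch {
    private final Map<String, byte[]> bodies = new HashMap<>();
    private final Map<String, Integer> codes = new HashMap<>();
    private final Map<String, String> referrals = new HashMap<>();

    // Hypothetical helper that plays the role of an earlier addData call.
    void put(String id, int responseCode, String referralURI, byte[] body) {
        codes.put(id, responseCode);
        referrals.put(id, referralURI);
        bodies.put(id, body);
    }

    // Returns -1 for an unknown identifier (a choice made for this sketch).
    int getResponseCode(String id) { return codes.getOrDefault(id, -1); }

    String getReferralURI(String id) { return referrals.get(id); }

    long getDataLength(String id) {
        byte[] data = bodies.get(id);
        return data == null ? 0L : data.length;
    }

    InputStream getData(String id) {
        byte[] data = bodies.get(id);
        return data == null ? null : new ByteArrayInputStream(data);
    }

    void deleteData(String id) {
        bodies.remove(id);
        codes.remove(id);
        referrals.remove(id);
    }

    // Consumes one cached document and always releases its cache entry.
    String process(String id) {
        try {
            int code = getResponseCode(id);
            if (code == 302) {
                // Assumed handling: report the redirect target.
                return "redirect:" + getReferralURI(id);
            }
            if (code != 200) {
                return "skipped";
            }
            long expected = getDataLength(id);
            long seen = 0;
            try (InputStream in = getData(id)) {
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    seen += n;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return "indexed:" + seen + "/" + expected;
        } finally {
            // Release the entry whether or not processing succeeded.
            deleteData(id);
        }
    }

    public static void main(String[] args) {
        CacheLifecycleSketch cache = new CacheLifecycleSketch();
        cache.put("http://example.com/page", 200, null, new byte[] {1, 2, 3});
        System.out.println(cache.process("http://example.com/page"));
    }
}
```

        Placing deleteData in a finally block keeps entries from accumulating when processing fails partway through. Whether the real connector deletes at this exact point is not stated in this page.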