Class DataCache
- java.lang.Object
-
- org.apache.manifoldcf.crawler.connectors.webcrawler.DataCache
-
public class DataCache extends java.lang.Object
This class caches the data fetched for a specific URL. The data is fetched early and kept so that (1) an accurate data length can be determined, and (2) a version checksum can be computed.
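To make the idea of a per-URL cache concrete, here is a minimal, self-contained sketch. It is not the ManifoldCF source: the class name UrlDataCacheSketch, the Entry type (a stand-in for DataCache.DocumentData), the in-memory byte array storage, and the behavior for a missing entry are all assumptions chosen for illustration. Only the idea of a map keyed by document identifier is taken from the cacheData field.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the ManifoldCF DataCache source.
class UrlDataCacheSketch {
  // Hypothetical per-document record, standing in for DataCache.DocumentData.
  static final class Entry {
    final byte[] body;
    final int responseCode;
    final String contentType; // null when the response carried none
    final String referralURI; // null when there was no referral

    Entry(byte[] body, int responseCode, String contentType, String referralURI) {
      this.body = body.clone();
      this.responseCode = responseCode;
      this.contentType = contentType;
      this.referralURI = referralURI;
    }
  }

  // Keyed by document identifier (the URL), like the cacheData field.
  private final Map<String, Entry> cacheData = new HashMap<>();

  void put(String documentIdentifier, Entry entry) {
    cacheData.put(documentIdentifier, entry);
  }

  // Returns the exact body length; 0 for an unknown identifier (sketch's choice).
  long getDataLength(String documentIdentifier) {
    Entry e = cacheData.get(documentIdentifier);
    return e == null ? 0L : e.body.length;
  }

  // Returns the content type, or null if there is none.
  String getContentType(String documentIdentifier) {
    Entry e = cacheData.get(documentIdentifier);
    return e == null ? null : e.contentType;
  }

  // Returns a fresh stream over the cached bytes, or null for an unknown identifier.
  InputStream getData(String documentIdentifier) {
    Entry e = cacheData.get(documentIdentifier);
    return e == null ? null : new ByteArrayInputStream(e.body);
  }

  void deleteData(String documentIdentifier) {
    cacheData.remove(documentIdentifier);
  }
}
```

Because the whole body is held before anything is handed on, the length is known exactly and does not depend on a possibly missing or wrong Content-Length header.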
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
protected static class | DataCache.DocumentData | This class represents everything we need to know about a document that's getting passed from the getDocumentVersions() phase to the processDocuments() phase.
-
Field Summary
Fields
Modifier and Type | Field | Description
static java.lang.String | _rcsid |
protected java.util.Map<java.lang.String,DataCache.DocumentData> | cacheData |
-
Constructor Summary
Constructors
Constructor | Description
DataCache() | Constructor.
-
Method Summary
All Methods | Instance Methods | Concrete Methods
Modifier and Type | Method | Description
java.lang.String | addData(org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, java.lang.String documentIdentifier, IThrottledConnection connection) | Add a data entry into the cache.
void | deleteData(java.lang.String documentIdentifier) | Delete specified item of data.
java.lang.String | getContentType(java.lang.String documentIdentifier) | Get the content type.
java.io.InputStream | getData(java.lang.String documentIdentifier) | Fetch binary data entry from the cache.
long | getDataLength(java.lang.String documentIdentifier) | Fetch binary data length.
java.lang.String | getReferralURI(java.lang.String documentIdentifier) | Get the referral URI.
int | getResponseCode(java.lang.String documentIdentifier) | Get the response code.
-
-
-
Field Detail
-
_rcsid
public static final java.lang.String _rcsid
- See Also:
- Constant Field Values
-
cacheData
protected java.util.Map<java.lang.String,DataCache.DocumentData> cacheData
-
-
Method Detail
-
addData
public java.lang.String addData(org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, java.lang.String documentIdentifier, IThrottledConnection connection) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Add a data entry into the cache. This method is called whenever the data from a fetch is considered interesting or useful, and will therefore be passed on from the getDocumentVersions() phase to the processDocuments() phase. At the moment that is usually a 200 or a 302 response.
- Parameters:
documentIdentifier - is the document identifier (url).
connection - is the connection, upon which a fetch has been done that needs to be cached.
- Returns:
- a "checksum" value, to use as a version string.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
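One way to obtain both an exact length and a checksum from a single pass over a fetched body is to hash the bytes while copying them to a temporary file. The sketch below illustrates that general technique only. The class ChecksumSpoolSketch, the Spooled result type, the temporary-file prefix, and the choice of SHA-256 are assumptions; this page does not say how addData() actually stores data or computes its "checksum".

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; the real addData() may spool and hash differently.
class ChecksumSpoolSketch {
  // Result of spooling one response body.
  static final class Spooled {
    final Path file;       // temporary file holding the body
    final long length;     // exact number of bytes written
    final String checksum; // hex digest, usable as a version string

    Spooled(Path file, long length, String checksum) {
      this.file = file;
      this.length = length;
      this.checksum = checksum;
    }
  }

  // Copies the stream to a temp file, counting and hashing bytes in one pass.
  static Spooled spool(InputStream body) throws IOException {
    MessageDigest digest;
    try {
      digest = MessageDigest.getInstance("SHA-256"); // assumed algorithm
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
    Path file = Files.createTempFile("cache_sketch_", ".tmp");
    long length = 0L;
    try (InputStream in = new DigestInputStream(body, digest);
         OutputStream out = Files.newOutputStream(file)) {
      byte[] buffer = new byte[8192];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
        length += n;
      }
    } catch (IOException e) {
      Files.deleteIfExists(file); // do not leave a partial file behind
      throw e;
    }
    return new Spooled(file, length, toHex(digest.digest()));
  }

  // Lower-case hexadecimal rendering of a digest.
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }
}
```

If the digest of a later fetch matches the stored version string, the content can be treated as unchanged, which is the usual reason to return a checksum as a version.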
-
getResponseCode
public int getResponseCode(java.lang.String documentIdentifier)
Get the response code.
- Parameters:
documentIdentifier - is the document identifier.
- Returns:
- the code.
-
getContentType
public java.lang.String getContentType(java.lang.String documentIdentifier)
Get the content type.
- Parameters:
documentIdentifier - is the document identifier.
- Returns:
- the content type, or null if there is none.
-
getReferralURI
public java.lang.String getReferralURI(java.lang.String documentIdentifier)
Get the referral URI.
- Parameters:
documentIdentifier - is the document identifier.
- Returns:
- the referral URI, or null if none.
-
getDataLength
public long getDataLength(java.lang.String documentIdentifier)
Fetch binary data length.
- Parameters:
documentIdentifier - is the document identifier.
- Returns:
- the length.
-
getData
public java.io.InputStream getData(java.lang.String documentIdentifier) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Fetch binary data entry from the cache.
- Parameters:
documentIdentifier - is the document identifier (url).
- Returns:
- a binary data stream.
- a binary data stream.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
deleteData
public void deleteData(java.lang.String documentIdentifier)
Delete specified item of data.
- Parameters:
documentIdentifier - is the document identifier (url).
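A caller that reads an entry with getData() will typically want the entry removed afterwards even if reading fails. The sketch below shows one way to arrange that with try-with-resources and finally. It does not use the ManifoldCF classes: CacheView is a hypothetical interface mirroring only getData() and deleteData() (without the ManifoldCFException the real getData() declares), InMemoryCache is a hypothetical test double, and treating a null stream as "no data" is an assumption, since this page does not say what getData() returns for an unknown identifier.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a read-then-release pattern.
class CacheCleanupSketch {
  // Hypothetical stand-in for the two DataCache methods used here.
  interface CacheView {
    InputStream getData(String documentIdentifier);
    void deleteData(String documentIdentifier);
  }

  // Reads the whole entry, then always deletes it, even if reading throws.
  static byte[] consumeAndRelease(CacheView cache, String documentIdentifier) throws IOException {
    try (InputStream in = cache.getData(documentIdentifier)) {
      if (in == null) {
        return new byte[0]; // assumed meaning: nothing cached for this identifier
      }
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[4096];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
      return out.toByteArray();
    } finally {
      // Runs after the stream has been closed by try-with-resources.
      cache.deleteData(documentIdentifier);
    }
  }

  // Hypothetical in-memory implementation, used only to exercise the pattern.
  static final class InMemoryCache implements CacheView {
    private final Map<String, byte[]> entries = new HashMap<>();

    void put(String documentIdentifier, byte[] body) {
      entries.put(documentIdentifier, body.clone());
    }

    boolean contains(String documentIdentifier) {
      return entries.containsKey(documentIdentifier);
    }

    @Override
    public InputStream getData(String documentIdentifier) {
      byte[] body = entries.get(documentIdentifier);
      return body == null ? null : new ByteArrayInputStream(body);
    }

    @Override
    public void deleteData(String documentIdentifier) {
      entries.remove(documentIdentifier);
    }
  }
}
```

Closing the stream before deleting the entry matters when the cached data lives in a file, because some platforms refuse to delete a file that is still open.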
-
-