Class DataCache
java.lang.Object
    org.apache.manifoldcf.crawler.connectors.webcrawler.DataCache
public class DataCache extends java.lang.Object
This class is a cache of a specific URL's data. The data is fetched early and kept so that (1) an accurate data length can be determined, and (2) a version checksum can be computed.
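Why fetching early helps can be illustrated with a minimal sketch: reading a response body once, from start to finish, yields both its exact length and a checksum. The class `StreamSummary`, its method `of`, and the choice of SHA-256 are assumptions made for this illustration only; they are not part of ManifoldCF, and the digest the real class uses is not stated here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of ManifoldCF.
final class StreamSummary {
    final long length;      // exact number of bytes read
    final String checksum;  // hex digest of those bytes

    private StreamSummary(long length, String checksum) {
        this.length = length;
        this.checksum = checksum;
    }

    // Reads the stream to its end once, counting bytes and feeding a digest.
    static StreamSummary of(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            // Assumption: SHA-256 is used here only as an example algorithm.
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            digest.update(buffer, 0, n);
            total += n;
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return new StreamSummary(total, hex.toString());
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        StreamSummary summary = StreamSummary.of(new ByteArrayInputStream(body));
        System.out.println(summary.length + " bytes, checksum " + summary.checksum);
    }
}
```

Because a stream can normally be consumed only once, a cache of this kind also has to keep the bytes somewhere so that a later phase can read them again.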
-
-
Nested Class Summary
Modifier and Type | Class | Description
protected static class | DataCache.DocumentData | This class represents everything we need to know about a document that's getting passed from the getDocumentVersions() phase to the processDocuments() phase.
-
Field Summary
Modifier and Type | Field
static java.lang.String | _rcsid
protected java.util.Map<java.lang.String,DataCache.DocumentData> | cacheData
-
Constructor Summary
Constructor | Description
DataCache() | Constructor.
-
Method Summary
Modifier and Type | Method | Description
java.lang.String | addData(org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, java.lang.String documentIdentifier, IThrottledConnection connection) | Add a data entry into the cache.
void | deleteData(java.lang.String documentIdentifier) | Delete specified item of data.
java.lang.String | getContentType(java.lang.String documentIdentifier) | Get the content type.
java.io.InputStream | getData(java.lang.String documentIdentifier) | Fetch binary data entry from the cache.
long | getDataLength(java.lang.String documentIdentifier) | Fetch binary data length.
java.lang.String | getReferralURI(java.lang.String documentIdentifier) | Get the referral URI.
int | getResponseCode(java.lang.String documentIdentifier) | Get the response code.
-
-
-
Field Detail
-
_rcsid
public static final java.lang.String _rcsid
- See Also:
- Constant Field Values
-
cacheData
protected java.util.Map<java.lang.String,DataCache.DocumentData> cacheData
-
-
Method Detail
-
addData
public java.lang.String addData(org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, java.lang.String documentIdentifier, IThrottledConnection connection) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Add a data entry into the cache. This method is called whenever the data from a fetch is considered interesting or useful and will therefore be passed on from the getDocumentVersions() phase to the processDocuments() phase. At the moment that is usually a 200 or a 302 response.
- Parameters:
documentIdentifier - is the document identifier (url).
connection - is the connection on which a fetch has been done and whose result needs to be cached.
- Returns:
- a "checksum" value, to use as a version string.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
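The caller-side decision described for addData (cache only fetches that are interesting, usually a 200 or a 302) could be sketched as below. `FetchFilter` and `isWorthCaching` are hypothetical names introduced for this illustration; the connector's actual rule is not shown on this page and may cover other cases, as the word "usually" suggests.

```java
import java.net.HttpURLConnection;

// Hypothetical helper for illustration; not part of ManifoldCF.
final class FetchFilter {
    // Assumption: only 200 (OK) and 302 (temporary redirect) count as worth caching.
    static boolean isWorthCaching(int responseCode) {
        return responseCode == HttpURLConnection.HTTP_OK
            || responseCode == HttpURLConnection.HTTP_MOVED_TEMP;
    }

    public static void main(String[] args) {
        int[] codes = {200, 302, 404};
        for (int code : codes) {
            System.out.println(code + " -> " + isWorthCaching(code));
        }
    }
}
```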
-
getResponseCode
public int getResponseCode(java.lang.String documentIdentifier)
Get the response code.
- Parameters:
documentIdentifier - is the document identifier.
- Returns:
- the code.
-
getContentType
public java.lang.String getContentType(java.lang.String documentIdentifier)
Get the content type.
- Parameters:
documentIdentifier - is the document identifier.
- Returns:
- the content type, or null if there is none.
-
getReferralURI
public java.lang.String getReferralURI(java.lang.String documentIdentifier)
Get the referral URI.
- Parameters:
documentIdentifier - is the document identifier.
- Returns:
- the referral URI, or null if none.
-
getDataLength
public long getDataLength(java.lang.String documentIdentifier)
Fetch binary data length.
- Parameters:
documentIdentifier - is the document identifier.
- Returns:
- the length.
-
getData
public java.io.InputStream getData(java.lang.String documentIdentifier) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Fetch binary data entry from the cache.
- Parameters:
documentIdentifier - is the document identifier (url).
- Returns:
- a binary data stream.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
deleteData
public void deleteData(java.lang.String documentIdentifier)
Delete specified item of data.
- Parameters:
documentIdentifier - is the document identifier (url).
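The add, read, delete lifecycle implied by the methods above can be sketched with a simplified in-memory stand-in. `SimpleDataCache` is a hypothetical class written for this illustration and is not the ManifoldCF implementation: its addData takes the fetched bytes directly rather than an IProcessActivity and an IThrottledConnection, it holds bodies in memory, its version string (length plus CRC32) is an arbitrary choice, and its behavior for unknown identifiers (0 or null) is an assumption of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical in-memory stand-in; NOT the ManifoldCF DataCache implementation.
final class SimpleDataCache {
    // Everything remembered about one fetched document.
    private static final class Entry {
        final int responseCode;
        final String contentType; // may be null
        final byte[] body;

        Entry(int responseCode, String contentType, byte[] body) {
            this.responseCode = responseCode;
            this.contentType = contentType;
            this.body = body;
        }
    }

    private final Map<String, Entry> cacheData = new HashMap<>();

    // Assumption: accepts the fetched bytes directly instead of a connection object.
    // Returns a version string built from the length and a CRC32 of the body.
    String addData(String documentIdentifier, int responseCode, String contentType, byte[] body) {
        cacheData.put(documentIdentifier, new Entry(responseCode, contentType, body.clone()));
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        return body.length + ":" + Long.toHexString(crc.getValue());
    }

    // Assumption of this sketch: unknown identifiers report a length of 0.
    long getDataLength(String documentIdentifier) {
        Entry entry = cacheData.get(documentIdentifier);
        return entry == null ? 0L : entry.body.length;
    }

    // Assumption of this sketch: unknown identifiers yield null.
    InputStream getData(String documentIdentifier) {
        Entry entry = cacheData.get(documentIdentifier);
        return entry == null ? null : new ByteArrayInputStream(entry.body);
    }

    void deleteData(String documentIdentifier) {
        cacheData.remove(documentIdentifier);
    }

    public static void main(String[] args) {
        SimpleDataCache cache = new SimpleDataCache();
        String url = "http://example.com/page";
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);

        // Version phase: store the fetch and obtain a version string.
        String version = cache.addData(url, 200, "text/html", body);
        System.out.println("version string: " + version);

        // Processing phase: read the same data back, then release it.
        System.out.println("length: " + cache.getDataLength(url));
        cache.deleteData(url);
        System.out.println("still cached: " + (cache.getData(url) != null));
    }
}
```

Deleting the entry once processing is done keeps the cache from growing with every document crawled.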
-
-