Class IncrementalIngester.PipelineAddFanout
- java.lang.Object
-
- org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.PipelineAddFanout
-
- All Implemented Interfaces:
IOutputAddActivity, IOutputCheckActivity, IOutputHistoryActivity, IOutputQualifyActivity
- Enclosing class:
- IncrementalIngester
public static class IncrementalIngester.PipelineAddFanout extends java.lang.Object implements IOutputAddActivity
This class describes the entry stage of multiple siblings in an add pipeline.
-
-
Field Summary
Fields
Modifier and Type | Field
protected IncrementalIngester.PipelineAddEntryPoint[] | entryPoints
protected IOutputHistoryActivity | finalHistoryActivity
protected IOutputQualifyActivity | finalQualifyActivity
-
Fields inherited from interface org.apache.manifoldcf.agents.interfaces.IOutputAddActivity
_rcsid
-
Fields inherited from interface org.apache.manifoldcf.agents.interfaces.IOutputHistoryActivity
CREATED_DIRECTORY, EXCEPTION, EXCLUDED_CONTENT, EXCLUDED_DATE, EXCLUDED_LENGTH, EXCLUDED_MIMETYPE, EXCLUDED_URL, HTTP_ERROR, IOEXCEPTION, UNKNOWN_SECURITY
-
-
Constructor Summary
Constructors
PipelineAddFanout(IncrementalIngester.PipelineAddEntryPoint[] entryPoints, IOutputHistoryActivity finalHistoryActivity, IOutputQualifyActivity finalQualifyActivity)
-
Method Summary
Modifier and Type | Method | Description
boolean | checkDateIndexable(java.util.Date date) | Detect if a document date is acceptable downstream or not.
boolean | checkDocumentIndexable(java.io.File localFile) | Pre-determine whether a document (passed here as a File object) is acceptable downstream.
boolean | checkLengthIndexable(long length) | Pre-determine whether a document's length is acceptable downstream.
boolean | checkMimeTypeIndexable(java.lang.String mimeType) | Detect if a mime type is acceptable downstream or not.
boolean | checkNeedToReindex() |
boolean | checkURLIndexable(java.lang.String uri) | Pre-determine whether a document's URL is acceptable downstream.
void | noDocument() | Send NO document via the pipeline to the next output connection.
java.lang.String | qualifyAccessToken(java.lang.String authorityNameString, java.lang.String accessToken) | Qualify an access token appropriately, to match access tokens as returned by mod_aa.
void | recordActivity(java.lang.Long startTime, java.lang.String activityType, java.lang.Long dataSize, java.lang.String entityURI, java.lang.String resultCode, java.lang.String resultDescription) | Record time-stamped information about the activity of the output connector.
int | sendDocument(java.lang.String documentURI, RepositoryDocument document) | Send a document via the pipeline to the next output connection.
-
-
-
Field Detail
-
entryPoints
protected final IncrementalIngester.PipelineAddEntryPoint[] entryPoints
-
finalHistoryActivity
protected final IOutputHistoryActivity finalHistoryActivity
-
finalQualifyActivity
protected final IOutputQualifyActivity finalQualifyActivity
-
-
Constructor Detail
-
PipelineAddFanout
public PipelineAddFanout(IncrementalIngester.PipelineAddEntryPoint[] entryPoints, IOutputHistoryActivity finalHistoryActivity, IOutputQualifyActivity finalQualifyActivity)
-
-
Method Detail
-
checkNeedToReindex
public boolean checkNeedToReindex()
-
checkDateIndexable
public boolean checkDateIndexable(java.util.Date date) throws ManifoldCFException, ServiceInterruption
Description copied from interface: IOutputCheckActivity
Detect if a document date is acceptable downstream or not. This method is used to determine whether it makes sense to fetch a document in the first place.
- Specified by:
checkDateIndexable in interface IOutputCheckActivity
- Parameters:
date - is the date of the document.
- Returns:
true if the document with that date can be accepted by the downstream connection.
- Throws:
ManifoldCFException
ServiceInterruption
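Because this class fans out to several sibling entry points, a check such as checkDateIndexable has to combine several answers into one boolean. This page does not say how the answers are combined. The sketch below assumes an "accept if any sibling accepts" policy, on the reasoning that a fetch is worthwhile as soon as one downstream branch wants the document. FanoutDateCheck and DateCheck are hypothetical stand-ins, not ManifoldCF types.

```java
import java.util.Date;
import java.util.List;

class FanoutDateCheck {
    /** Hypothetical stand-in for one sibling entry point's date check. */
    interface DateCheck {
        boolean checkDateIndexable(Date date);
    }

    private final List<DateCheck> siblings;

    FanoutDateCheck(List<DateCheck> siblings) {
        this.siblings = siblings;
    }

    /** Assumed policy: true as soon as any sibling accepts the date. */
    boolean checkDateIndexable(Date date) {
        for (DateCheck sibling : siblings) {
            if (sibling.checkDateIndexable(date)) {
                return true; // short-circuit: one interested branch is enough
            }
        }
        return false; // no sibling wants a document with this date
    }
}
```

Under this assumed policy, a fan-out with no siblings rejects everything, since there is no branch left to accept the document.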
-
checkMimeTypeIndexable
public boolean checkMimeTypeIndexable(java.lang.String mimeType) throws ManifoldCFException, ServiceInterruption
Description copied from interface: IOutputCheckActivity
Detect if a mime type is acceptable downstream or not. This method is used to determine whether it makes sense to fetch a document in the first place.
- Specified by:
checkMimeTypeIndexable in interface IOutputCheckActivity
- Parameters:
mimeType - is the mime type of the document.
- Returns:
true if the mime type can be accepted by the downstream connection.
- Throws:
ManifoldCFException
ServiceInterruption
-
checkDocumentIndexable
public boolean checkDocumentIndexable(java.io.File localFile) throws ManifoldCFException, ServiceInterruption
Description copied from interface: IOutputCheckActivity
Pre-determine whether a document (passed here as a File object) is acceptable downstream. This method is used to determine whether a document actually needs to be transferred. This hook is provided mainly to support search engines that only handle a small set of accepted file types.
- Specified by:
checkDocumentIndexable in interface IOutputCheckActivity
- Parameters:
localFile - is the local file to check.
- Returns:
true if the file is acceptable to the downstream connection.
- Throws:
ManifoldCFException
ServiceInterruption
-
checkLengthIndexable
public boolean checkLengthIndexable(long length) throws ManifoldCFException, ServiceInterruption
Description copied from interface: IOutputCheckActivity
Pre-determine whether a document's length is acceptable downstream. This method is used to determine whether to fetch a document in the first place.
- Specified by:
checkLengthIndexable in interface IOutputCheckActivity
- Parameters:
length - is the length of the document.
- Returns:
true if a document of that length is acceptable to the downstream connection.
- Throws:
ManifoldCFException
ServiceInterruption
-
checkURLIndexable
public boolean checkURLIndexable(java.lang.String uri) throws ManifoldCFException, ServiceInterruption
Description copied from interface: IOutputCheckActivity
Pre-determine whether a document's URL is acceptable downstream. This method is used to filter out, in advance, documents that cannot be indexed.
- Specified by:
checkURLIndexable in interface IOutputCheckActivity
- Parameters:
uri - is the URL of the document.
- Returns:
true if the URL is acceptable to the downstream connection.
- Throws:
ManifoldCFException
ServiceInterruption
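The check methods above are meant to be called before an expensive fetch or transfer. A minimal sketch of that calling pattern follows. Checks is a hypothetical stand-in that mirrors four of the method names in IOutputCheckActivity without the checked exceptions, and the order of the calls is an assumption rather than something this page prescribes.

```java
import java.util.Date;

class PreFetchGate {
    /** Hypothetical stand-in mirroring method names from IOutputCheckActivity. */
    interface Checks {
        boolean checkURLIndexable(String uri);
        boolean checkMimeTypeIndexable(String mimeType);
        boolean checkLengthIndexable(long length);
        boolean checkDateIndexable(Date date);
    }

    /** Returns true only if every piece of known metadata is acceptable downstream. */
    static boolean worthFetching(Checks checks, String uri, String mimeType,
                                 long length, Date modified) {
        // && short-circuits, so later checks are skipped once one fails.
        return checks.checkURLIndexable(uri)
            && checks.checkMimeTypeIndexable(mimeType)
            && checks.checkLengthIndexable(length)
            && checks.checkDateIndexable(modified);
    }
}
```

A caller would fetch the document body only when worthFetching returns true, which avoids transferring content that no downstream connection would index.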
-
sendDocument
public int sendDocument(java.lang.String documentURI, RepositoryDocument document) throws ManifoldCFException, ServiceInterruption, java.io.IOException
Send a document via the pipeline to the next output connection.
- Specified by:
sendDocument in interface IOutputAddActivity
- Parameters:
documentURI - is the document's URI.
document - is the document data to be processed (handed to the output data store).
- Returns:
the document status (accepted or permanently rejected); return codes are listed in IPipelineConnector.
- Throws:
java.io.IOException - only if there's an IO error reading the data from the document.
ManifoldCFException
ServiceInterruption
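The return value is a single status even though several siblings may receive the document. The sketch below assumes the fan-out reports "accepted" when at least one sibling accepts and "rejected" otherwise. That aggregation rule, the Sink interface, and the two integer constants are hypothetical stand-ins; the real return codes are the ones listed in IPipelineConnector.

```java
import java.util.List;

class FanoutSend {
    // Stand-in status codes; the real values are defined in IPipelineConnector.
    static final int ACCEPTED = 0;
    static final int REJECTED = 1;

    /** Hypothetical stand-in for one sibling entry point. */
    interface Sink {
        int sendDocument(String documentURI, String content);
    }

    /** Sends to every sibling; assumed rule: accepted if any sibling accepted. */
    static int sendToAll(List<Sink> siblings, String documentURI, String content) {
        int status = REJECTED;
        for (Sink sibling : siblings) {
            // Every sibling gets the document; there is no short-circuit here.
            if (sibling.sendDocument(documentURI, content) == ACCEPTED) {
                status = ACCEPTED;
            }
        }
        return status;
    }
}
```

The sketch passes the content as a String so that every sibling can read it. A document backed by a binary stream can be consumed only once, so a real fan-out would need to give each sibling its own readable copy.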
-
noDocument
public void noDocument() throws ManifoldCFException, ServiceInterruption
Send NO document via the pipeline to the next output connection. This is equivalent to sending an empty document placeholder.
- Specified by:
noDocument in interface IOutputAddActivity
- Throws:
ManifoldCFException
ServiceInterruption
-
qualifyAccessToken
public java.lang.String qualifyAccessToken(java.lang.String authorityNameString, java.lang.String accessToken) throws ManifoldCFException
Qualify an access token appropriately, to match access tokens as returned by mod_aa. This method includes the authority name with the access token, if any, so that each authority may establish its own token space.
- Specified by:
qualifyAccessToken in interface IOutputQualifyActivity
- Parameters:
authorityNameString - is the name of the authority to use to qualify the access token.
accessToken - is the raw, repository access token.
- Returns:
the properly qualified access token.
- Throws:
ManifoldCFException
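To illustrate the idea of a per-authority token space, here is a minimal sketch. The authority:token layout and the pass-through for a null authority name are assumptions made for illustration; this page does not specify the format that ManifoldCF actually produces.

```java
class TokenQualifier {
    /**
     * Prefixes the token with its authority so that equal raw tokens
     * from different authorities stay distinct.
     */
    static String qualify(String authorityName, String accessToken) {
        if (authorityName == null) {
            return accessToken; // assumed: no authority means no prefix
        }
        return authorityName + ":" + accessToken; // assumed separator
    }
}
```

With this layout, the same raw token issued by two different authorities yields two different qualified tokens, so the token spaces cannot collide.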
-
recordActivity
public void recordActivity(java.lang.Long startTime, java.lang.String activityType, java.lang.Long dataSize, java.lang.String entityURI, java.lang.String resultCode, java.lang.String resultDescription) throws ManifoldCFException
Record time-stamped information about the activity of the output connector.
- Specified by:
recordActivity in interface IOutputHistoryActivity
- Parameters:
startTime - is either null or the time since the start of epoch in milliseconds (Jan 1, 1970). Every activity has an associated time; the startTime field records when the activity began. A null value indicates that the start time and the finishing time are the same.
activityType - is a string which is fully interpretable only in the context of the connector involved, which is used to categorize what kind of activity is being recorded. For example, a web connector might record a "fetch document" activity. Cannot be null.
dataSize - is the number of bytes of data involved in the activity, or null if not applicable.
entityURI - is a (possibly long) string which identifies the object involved in the history record. The interpretation of this field will differ from connector to connector. May be null.
resultCode - contains a terse description of the result of the activity. The description is limited in size to 255 characters, and can be interpreted only in the context of the current connector. May be null.
resultDescription - is a (possibly long) human-readable string which adds detail, if required, to the result described in the resultCode field. This field is not meant to be queried on. May be null.
- Throws:
ManifoldCFException
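A typical call captures the start time before the work begins and passes it along with the outcome. In the sketch below, History is a hypothetical stand-in with the same parameter list as recordActivity (minus the checked exception), and the activity type and result code strings are made up for illustration.

```java
class RecordActivityExample {
    /** Hypothetical stand-in with the same parameter list as recordActivity. */
    interface History {
        void recordActivity(Long startTime, String activityType, Long dataSize,
                            String entityURI, String resultCode, String resultDescription);
    }

    /** Times a (simulated) indexing step and records it. */
    static void indexAndRecord(History history, String documentURI, byte[] data) {
        Long startTime = Long.valueOf(System.currentTimeMillis()); // ms since epoch
        // ... the actual work of sending the document would happen here ...
        history.recordActivity(startTime, "document ingest", Long.valueOf(data.length),
                               documentURI, "OK", null); // null: no extra detail needed
    }
}
```

Passing null for startTime instead would indicate, per the parameter description above, that the activity started and finished at the same moment.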
-
-