Class WorkerThread
- java.lang.Object
  - java.lang.Thread
    - org.apache.manifoldcf.crawler.system.WorkerThread
- All Implemented Interfaces:
java.lang.Runnable
public class WorkerThread extends java.lang.Thread
This class represents a worker thread. Hundreds of these threads are instantiated in order to perform crawling and extraction.
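The pattern described above, many threads draining one shared document queue, can be sketched in plain Java. This is a minimal illustration and not the ManifoldCF implementation: `SketchWorker`, the `STOP` sentinel, and the use of a `BlockingQueue<String>` in place of `DocumentQueue` are all assumptions made for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a crawler worker; not the ManifoldCF class.
class SketchWorker extends Thread {
    // Sentinel telling a worker to exit (an assumption of this sketch).
    static final String STOP = "__STOP__";

    private final BlockingQueue<String> documentQueue;
    private final AtomicInteger processed;

    SketchWorker(String id, BlockingQueue<String> documentQueue, AtomicInteger processed) {
        super("Worker thread '" + id + "'");
        this.documentQueue = documentQueue;
        this.processed = processed;
    }

    @Override
    public void run() {
        try {
            while (true) {
                String doc = documentQueue.take(); // blocks until work is available
                if (STOP.equals(doc)) {
                    return;
                }
                processed.incrementAndGet(); // placeholder for crawling and extraction
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and exit
        }
    }

    // Starts workerCount workers on one shared queue, feeds them documentCount
    // documents, waits for all workers, and returns how many were processed.
    static int runPool(int workerCount, int documentCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        SketchWorker[] workers = new SketchWorker[workerCount];
        try {
            for (int i = 0; i < workerCount; i++) {
                workers[i] = new SketchWorker(Integer.toString(i), queue, processed);
                workers[i].start();
            }
            for (int d = 0; d < documentCount; d++) {
                queue.put("doc-" + d);
            }
            for (int i = 0; i < workerCount; i++) {
                queue.put(STOP); // one sentinel per worker
            }
            for (SketchWorker worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running pool", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(runPool(4, 20)); // prints 20
    }
}
```

Because the queue is first-in, first-out, every document is taken before any sentinel, so each worker exits only after the real work is drained.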
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
protected static class    WorkerThread.CheckActivity    The check activity class
protected static class    WorkerThread.DocumentBin    DocumentBin class
protected static class    WorkerThread.DocumentReference    Class describing document reference.
protected static class    WorkerThread.DocumentToProcess    Class that represents a decision to process a document.
protected static class    WorkerThread.ExistingVersions    The implementation of the IExistingVersions interface.
protected static class    WorkerThread.OutputActivity    The ingest logger class
protected static class    WorkerThread.ProcessActivity    Process activity class wraps access to the ingester and job queue.
-
Field Summary
Fields
Modifier and Type    Field    Description
static java.lang.String    _rcsid
protected DocumentQueue    documentQueue    This is a reference to the static main document queue
protected java.lang.String    id    Thread id
protected static int    MAX_ADDS_IN_TRANSACTION    The maximum number of adds that happen in a single transaction
protected java.lang.String    processID    Process ID
protected QueueTracker    queueTracker    Queue tracker
protected WorkerResetManager    resetManager    Worker thread pool reset manager
-
Constructor Summary
Constructors
Constructor    Description
WorkerThread(java.lang.String id, DocumentQueue documentQueue, WorkerResetManager resetManager, QueueTracker queueTracker, java.lang.String processID)    Constructor.
-
Method Summary
Modifier and Type    Method    Description
protected static boolean    compareArrays(java.lang.String[] array1, java.lang.String[] array2)    Compare two sorted collection names lists.
protected static java.lang.String    computeComponentIDHash(java.lang.String componentIdentifier)
protected static java.lang.String    makeListString(java.util.List<QueuedDocument> sourceList)
protected static java.lang.String    makeListString(DocumentDescription[] sourceList)
protected static void    moveList(java.util.List<QueuedDocument> sourceList, java.util.List<QueuedDocument> targetList)
protected static void    processDeleteLists(IPipelineConnections pipelineConnections, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.util.List<QueuedDocument> deleteList, IIncrementalIngester ingester, java.lang.Long jobID, java.lang.String[] legalLinkTypes, WorkerThread.OutputActivity ingestLogger, int hopcountMethod, IReprioritizationTracker rt, long currentTime)    Clear specified documents out of the job queue and from the appliance.
protected static void    processHopcountRemovalLists(IPipelineConnections pipelineConnections, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.util.List<QueuedDocument> hopcountremoveList, IIncrementalIngester ingester, java.lang.Long jobID, java.lang.String[] legalLinkTypes, WorkerThread.OutputActivity ingestLogger, int hopcountMethod, IReprioritizationTracker rt, long currentTime)    Mark specified documents as 'hopcount removed', and remove them from the index.
protected static void    processJobQueueDeletions(java.util.List<QueuedDocument> jobmanagerDeleteList, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod, IReprioritizationTracker rt, long currentTime)    Process job queue deletions.
protected static void    processJobQueueHopcountRemovals(java.util.List<QueuedDocument> jobmanagerRemovalList, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod, IReprioritizationTracker rt, long currentTime)    Process job queue hopcount removals.
protected static java.util.List<QueuedDocument>    removeFromIndex(IPipelineConnections pipelineConnections, java.lang.String connectionName, IJobManager jobManager, java.util.List<QueuedDocument> deleteList, IIncrementalIngester ingester, WorkerThread.OutputActivity ingestLogger)    Remove a specified set of documents from the index.
protected static void    requeueDocuments(IJobManager jobManager, java.util.List<QueuedDocument> requeueList, long retryTime, long failTime, int failCount)    Requeue documents after a service interruption was detected.
void    run()
-
Methods inherited from class java.lang.Thread
activeCount, checkAccess, clone, countStackFrames, currentThread, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, onSpinWait, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, suspend, toString, yield
-
Field Detail
-
_rcsid
public static final java.lang.String _rcsid
- See Also:
- Constant Field Values
-
id
protected final java.lang.String id
Thread id
-
documentQueue
protected final DocumentQueue documentQueue
This is a reference to the static main document queue
-
resetManager
protected final WorkerResetManager resetManager
Worker thread pool reset manager
-
queueTracker
protected final QueueTracker queueTracker
Queue tracker
-
processID
protected final java.lang.String processID
Process ID
-
MAX_ADDS_IN_TRANSACTION
protected static final int MAX_ADDS_IN_TRANSACTION
The maximum number of adds that happen in a single transaction
- See Also:
- Constant Field Values
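A per-transaction cap like this is typically applied by splitting a larger list of adds into consecutive batches. The sketch below shows that idea only; `AddBatcher`, `partition`, and the `maxPerTransaction` parameter are hypothetical names, and the actual value of `MAX_ADDS_IN_TRANSACTION` is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating a per-transaction cap; not part of WorkerThread.
class AddBatcher {
    // Splits items into consecutive batches holding at most maxPerTransaction elements.
    static <T> List<List<T>> partition(List<T> items, int maxPerTransaction) {
        if (maxPerTransaction <= 0) {
            throw new IllegalArgumentException("maxPerTransaction must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxPerTransaction) {
            int end = Math.min(start + maxPerTransaction, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Each inner list would be committed in its own transaction.
        System.out.println(partition(Arrays.asList(1, 2, 3, 4, 5), 2)); // prints [[1, 2], [3, 4], [5]]
    }
}
```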
-
-
Constructor Detail
-
WorkerThread
public WorkerThread(java.lang.String id, DocumentQueue documentQueue, WorkerResetManager resetManager, QueueTracker queueTracker, java.lang.String processID) throws ManifoldCFException
Constructor.
- Parameters:
id - is the worker thread id.
- Throws:
ManifoldCFException
-
-
Method Detail
-
run
public void run()
- Specified by:
run in interface java.lang.Runnable
- Overrides:
run in class java.lang.Thread
-
compareArrays
protected static boolean compareArrays(java.lang.String[] array1, java.lang.String[] array2)
Compare two sorted collection names lists.
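When both arrays are already sorted, comparing them reduces to a length check followed by an element-by-element comparison. The sketch below assumes the method returns true exactly when the two lists hold the same names in the same order; that return convention and the names `SortedNameLists` and `sameSortedNames` are assumptions, not taken from the ManifoldCF source.

```java
// Hypothetical illustration of comparing two sorted name lists.
class SortedNameLists {
    // Returns true when both sorted arrays contain the same names in the same order.
    static boolean sameSortedNames(String[] array1, String[] array2) {
        if (array1.length != array2.length) {
            return false;
        }
        for (int i = 0; i < array1.length; i++) {
            if (!array1[i].equals(array2[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] a = {"alpha", "beta"};
        String[] b = {"alpha", "beta"};
        String[] c = {"alpha", "gamma"};
        System.out.println(sameSortedNames(a, b)); // prints true
        System.out.println(sameSortedNames(a, c)); // prints false
    }
}
```

For non-null elements this gives the same result as `java.util.Arrays.equals(array1, array2)`; sorting beforehand is what makes a positional comparison sufficient.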
-
makeListString
protected static java.lang.String makeListString(java.util.List<QueuedDocument> sourceList)
-
makeListString
protected static java.lang.String makeListString(DocumentDescription[] sourceList)
-
moveList
protected static void moveList(java.util.List<QueuedDocument> sourceList, java.util.List<QueuedDocument> targetList)
-
processHopcountRemovalLists
protected static void processHopcountRemovalLists(IPipelineConnections pipelineConnections, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.util.List<QueuedDocument> hopcountremoveList, IIncrementalIngester ingester, java.lang.Long jobID, java.lang.String[] legalLinkTypes, WorkerThread.OutputActivity ingestLogger, int hopcountMethod, IReprioritizationTracker rt, long currentTime) throws ManifoldCFException
Mark specified documents as 'hopcount removed', and remove them from the index. Documents in this state are presumed to have:
(a) nothing in the index
(b) no intrinsic links for which they are the origin
In order to guarantee this situation, this method must be capable of doing much of what the deletion method must do. Specifically, it should be capable of deleting documents from the index should they be already present.
- Throws:
ManifoldCFException
-
processDeleteLists
protected static void processDeleteLists(IPipelineConnections pipelineConnections, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.util.List<QueuedDocument> deleteList, IIncrementalIngester ingester, java.lang.Long jobID, java.lang.String[] legalLinkTypes, WorkerThread.OutputActivity ingestLogger, int hopcountMethod, IReprioritizationTracker rt, long currentTime) throws ManifoldCFException
Clear specified documents out of the job queue and from the appliance.
- Parameters:
pipelineConnections - is the basic pipeline specification for this job.
jobManager - is the job manager.
deleteList - is a list of QueuedDocument objects to clean out.
ingester - is the handle to the incremental ingestion API control object.
- Throws:
ManifoldCFException
-
removeFromIndex
protected static java.util.List<QueuedDocument> removeFromIndex(IPipelineConnections pipelineConnections, java.lang.String connectionName, IJobManager jobManager, java.util.List<QueuedDocument> deleteList, IIncrementalIngester ingester, WorkerThread.OutputActivity ingestLogger) throws ManifoldCFException
Remove a specified set of documents from the index.
- Returns:
- the list of documents whose state needs to be updated in jobqueue.
- Throws:
ManifoldCFException
-
processJobQueueDeletions
protected static void processJobQueueDeletions(java.util.List<QueuedDocument> jobmanagerDeleteList, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod, IReprioritizationTracker rt, long currentTime) throws ManifoldCFException
Process job queue deletions. Either the indexer has already been updated, or it is not necessary to update it.
- Throws:
ManifoldCFException
-
processJobQueueHopcountRemovals
protected static void processJobQueueHopcountRemovals(java.util.List<QueuedDocument> jobmanagerRemovalList, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod, IReprioritizationTracker rt, long currentTime) throws ManifoldCFException
Process job queue hopcount removals. All indexer updates have already taken place.
- Throws:
ManifoldCFException
-
requeueDocuments
protected static void requeueDocuments(IJobManager jobManager, java.util.List<QueuedDocument> requeueList, long retryTime, long failTime, int failCount) throws ManifoldCFException
Requeue documents after a service interruption was detected.
- Parameters:
jobManager - is the job manager object.
requeueList - is a list of QueuedDocument objects describing what needs to be requeued.
retryTime - is the time that the first retry ought to be scheduled for.
failTime - is the time beyond which retries lead to hard failure.
failCount - is the number of retries allowed until hard failure.
- Throws:
ManifoldCFException
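The three retry parameters can be read as a simple policy: wait until retryTime, then keep retrying until either failTime passes or failCount retries are used up. The sketch below encodes that reading only. `RetrySchedule`, `mayRetry`, and the exact comparison rules are assumptions for illustration, and the real job manager may treat special values differently.

```java
// Hypothetical model of the retry parameters; not the ManifoldCF implementation.
final class RetrySchedule {
    private final long retryTime; // earliest time of the first retry
    private final long failTime;  // retries after this time count as hard failure
    private final int failCount;  // number of retries allowed before hard failure

    RetrySchedule(long retryTime, long failTime, int failCount) {
        this.retryTime = retryTime;
        this.failTime = failTime;
        this.failCount = failCount;
    }

    // Returns true if retry number 'attempt' (1-based) may run at time 'now'.
    boolean mayRetry(long now, int attempt) {
        boolean dueYet = now >= retryTime;
        boolean withinTimeLimit = now <= failTime;
        boolean withinCountLimit = attempt <= failCount;
        return dueYet && withinTimeLimit && withinCountLimit;
    }

    public static void main(String[] args) {
        RetrySchedule schedule = new RetrySchedule(1000L, 5000L, 3);
        System.out.println(schedule.mayRetry(1000L, 1)); // prints true
        System.out.println(schedule.mayRetry(6000L, 1)); // prints false
    }
}
```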
-
computeComponentIDHash
protected static java.lang.String computeComponentIDHash(java.lang.String componentIdentifier) throws ManifoldCFException
- Throws:
ManifoldCFException
-
-