Class WorkerThread
java.lang.Object
  java.lang.Thread
    org.apache.manifoldcf.crawler.system.WorkerThread

All Implemented Interfaces:
java.lang.Runnable
public class WorkerThread extends java.lang.Thread
This class represents a worker thread. Hundreds of these threads are instantiated in order to perform crawling and extraction.
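The general pattern described above, many threads pulling work from one shared queue, can be sketched as follows. This is only an illustration: DemoWorker, runPool and the string "documents" are hypothetical names, and the sketch does not reproduce the actual WorkerThread, DocumentQueue or QueueTracker logic.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class WorkerPoolSketch {
    /** Hypothetical worker: drains a shared queue until it is empty. */
    static class DemoWorker extends Thread {
        private final ConcurrentLinkedQueue<String> queue;
        private final AtomicInteger processed;

        DemoWorker(String id, ConcurrentLinkedQueue<String> queue, AtomicInteger processed) {
            super("worker-" + id);
            this.queue = queue;
            this.processed = processed;
        }

        @Override
        public void run() {
            // poll() returns null once the queue is empty, which ends this worker.
            while (queue.poll() != null) {
                processed.incrementAndGet(); // stand-in for crawling and extraction
            }
        }
    }

    /** Starts workerCount workers, waits for all of them, and returns the number of items handled. */
    static int runPool(int workerCount, List<String> documents) {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>(documents);
        AtomicInteger processed = new AtomicInteger();
        DemoWorker[] workers = new DemoWorker[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new DemoWorker(Integer.toString(i), queue, processed);
            workers[i].start();
        }
        try {
            for (DemoWorker worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(runPool(4, List.of("doc1", "doc2", "doc3")));
    }
}
```

Each queued item is handled by exactly one worker because poll() removes it atomically, so the count returned equals the number of items supplied regardless of how many workers run.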
-
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description:

protected static class
WorkerThread.CheckActivity
The check activity class.

protected static class
WorkerThread.DocumentBin
DocumentBin class.

protected static class
WorkerThread.DocumentReference
Class describing document reference.

protected static class
WorkerThread.DocumentToProcess
Class that represents a decision to process a document.

protected static class
WorkerThread.ExistingVersions
The implementation of the IExistingVersions interface.

protected static class
WorkerThread.OutputActivity
The ingest logger class.

protected static class
WorkerThread.ProcessActivity
Process activity class wraps access to the ingester and job queue.
-
Field Summary
Fields
Modifier and Type, Field, Description:

static java.lang.String
_rcsid

protected DocumentQueue
documentQueue
This is a reference to the static main document queue.

protected java.lang.String
id
Thread id.

protected static int
MAX_ADDS_IN_TRANSACTION
The maximum number of adds that happen in a single transaction.

protected java.lang.String
processID
Process ID.

protected QueueTracker
queueTracker
Queue tracker.

protected WorkerResetManager
resetManager
Worker thread pool reset manager.
-
Constructor Summary
Constructors
Constructor, Description:

WorkerThread(java.lang.String id, DocumentQueue documentQueue, WorkerResetManager resetManager, QueueTracker queueTracker, java.lang.String processID)
Constructor.
-
Method Summary
Modifier and Type, Method, Description:

protected static boolean
compareArrays(java.lang.String[] array1, java.lang.String[] array2)
Compare two sorted collection names lists.

protected static java.lang.String
computeComponentIDHash(java.lang.String componentIdentifier)

protected static java.lang.String
makeListString(java.util.List<QueuedDocument> sourceList)

protected static java.lang.String
makeListString(DocumentDescription[] sourceList)

protected static void
moveList(java.util.List<QueuedDocument> sourceList, java.util.List<QueuedDocument> targetList)

protected static void
processDeleteLists(IPipelineConnections pipelineConnections, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.util.List<QueuedDocument> deleteList, IIncrementalIngester ingester, java.lang.Long jobID, java.lang.String[] legalLinkTypes, WorkerThread.OutputActivity ingestLogger, int hopcountMethod, IReprioritizationTracker rt, long currentTime)
Clear specified documents out of the job queue and from the appliance.

protected static void
processHopcountRemovalLists(IPipelineConnections pipelineConnections, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.util.List<QueuedDocument> hopcountremoveList, IIncrementalIngester ingester, java.lang.Long jobID, java.lang.String[] legalLinkTypes, WorkerThread.OutputActivity ingestLogger, int hopcountMethod, IReprioritizationTracker rt, long currentTime)
Mark specified documents as 'hopcount removed', and remove them from the index.

protected static void
processJobQueueDeletions(java.util.List<QueuedDocument> jobmanagerDeleteList, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod, IReprioritizationTracker rt, long currentTime)
Process job queue deletions.

protected static void
processJobQueueHopcountRemovals(java.util.List<QueuedDocument> jobmanagerRemovalList, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod, IReprioritizationTracker rt, long currentTime)
Process job queue hopcount removals.

protected static java.util.List<QueuedDocument>
removeFromIndex(IPipelineConnections pipelineConnections, java.lang.String connectionName, IJobManager jobManager, java.util.List<QueuedDocument> deleteList, IIncrementalIngester ingester, WorkerThread.OutputActivity ingestLogger)
Remove a specified set of documents from the index.

protected static void
requeueDocuments(IJobManager jobManager, java.util.List<QueuedDocument> requeueList, long retryTime, long failTime, int failCount)
Requeue documents after a service interruption was detected.

void
run()
-
Methods inherited from class java.lang.Thread
activeCount, checkAccess, clone, countStackFrames, currentThread, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, onSpinWait, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, suspend, toString, yield
Field Detail
-
_rcsid
public static final java.lang.String _rcsid
- See Also:
- Constant Field Values
-
id
protected final java.lang.String id
Thread id
-
documentQueue
protected final DocumentQueue documentQueue
This is a reference to the static main document queue
-
resetManager
protected final WorkerResetManager resetManager
Worker thread pool reset manager
-
queueTracker
protected final QueueTracker queueTracker
Queue tracker
-
processID
protected final java.lang.String processID
Process ID
-
MAX_ADDS_IN_TRANSACTION
protected static final int MAX_ADDS_IN_TRANSACTION
The maximum number of adds that happen in a single transaction.
- See Also:
- Constant Field Values
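A per-transaction cap like this is typically applied by splitting a larger set of pending adds into bounded batches. The following is a minimal sketch of that idea only; splitIntoBatches and the value 3 are hypothetical, and nothing here is taken from the WorkerThread source.

```java
import java.util.ArrayList;
import java.util.List;

class AddBatchingSketch {
    // Hypothetical value chosen for the example; the real value is listed under Constant Field Values.
    static final int MAX_ADDS_IN_TRANSACTION = 3;

    /** Splits the pending adds into consecutive batches of at most maxPerTransaction items. */
    static <T> List<List<T>> splitIntoBatches(List<T> adds, int maxPerTransaction) {
        if (maxPerTransaction <= 0) {
            throw new IllegalArgumentException("maxPerTransaction must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < adds.size(); start += maxPerTransaction) {
            int end = Math.min(start + maxPerTransaction, adds.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(adds.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> adds = List.of("a", "b", "c", "d", "e", "f", "g");
        // Each inner list would be written in its own transaction.
        for (List<String> batch : splitIntoBatches(adds, MAX_ADDS_IN_TRANSACTION)) {
            System.out.println(batch);
        }
    }
}
```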
-
-
Constructor Detail
-
WorkerThread
public WorkerThread(java.lang.String id, DocumentQueue documentQueue, WorkerResetManager resetManager, QueueTracker queueTracker, java.lang.String processID) throws ManifoldCFException
Constructor.
- Parameters:
id - is the worker thread id.
- Throws:
ManifoldCFException
-
-
Method Detail
-
run
public void run()
- Specified by:
run in interface java.lang.Runnable
- Overrides:
run in class java.lang.Thread
-
compareArrays
protected static boolean compareArrays(java.lang.String[] array1, java.lang.String[] array2)
Compare two sorted collection names lists.
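Because both inputs are already sorted, such a comparison can be done in a single pass by position. The sketch below assumes the method returns true when the two lists hold the same names; the page does not state the return convention, and sameSortedNames is a hypothetical stand-in, not the actual implementation.

```java
class SortedListCompareSketch {
    /**
     * Returns true when two sorted name arrays contain exactly the same entries.
     * Assumes neither array is null and neither contains null elements.
     */
    static boolean sameSortedNames(String[] array1, String[] array2) {
        if (array1.length != array2.length) {
            return false;
        }
        // Sorted inputs with equal contents must agree at every index.
        for (int i = 0; i < array1.length; i++) {
            if (!array1[i].equals(array2[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] a = {"alpha", "beta"};
        String[] b = {"alpha", "beta"};
        String[] c = {"alpha", "gamma"};
        System.out.println(sameSortedNames(a, b));
        System.out.println(sameSortedNames(a, c));
    }
}
```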
-
makeListString
protected static java.lang.String makeListString(java.util.List<QueuedDocument> sourceList)
-
makeListString
protected static java.lang.String makeListString(DocumentDescription[] sourceList)
-
moveList
protected static void moveList(java.util.List<QueuedDocument> sourceList, java.util.List<QueuedDocument> targetList)
-
processHopcountRemovalLists
protected static void processHopcountRemovalLists(IPipelineConnections pipelineConnections, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.util.List<QueuedDocument> hopcountremoveList, IIncrementalIngester ingester, java.lang.Long jobID, java.lang.String[] legalLinkTypes, WorkerThread.OutputActivity ingestLogger, int hopcountMethod, IReprioritizationTracker rt, long currentTime) throws ManifoldCFException
Mark specified documents as 'hopcount removed', and remove them from the index. Documents in this state are presumed to have:
(a) nothing in the index
(b) no intrinsic links for which they are the origin
In order to guarantee this situation, this method must be capable of doing much of what the deletion method must do. Specifically, it should be capable of deleting documents from the index should they be already present.
- Throws:
ManifoldCFException
-
processDeleteLists
protected static void processDeleteLists(IPipelineConnections pipelineConnections, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.util.List<QueuedDocument> deleteList, IIncrementalIngester ingester, java.lang.Long jobID, java.lang.String[] legalLinkTypes, WorkerThread.OutputActivity ingestLogger, int hopcountMethod, IReprioritizationTracker rt, long currentTime) throws ManifoldCFException
Clear specified documents out of the job queue and from the appliance.
- Parameters:
pipelineConnections - is the basic pipeline specification for this job.
jobManager - is the job manager.
deleteList - is a list of QueuedDocument objects to clean out.
ingester - is the handle to the incremental ingestion API control object.
- Throws:
ManifoldCFException
-
removeFromIndex
protected static java.util.List<QueuedDocument> removeFromIndex(IPipelineConnections pipelineConnections, java.lang.String connectionName, IJobManager jobManager, java.util.List<QueuedDocument> deleteList, IIncrementalIngester ingester, WorkerThread.OutputActivity ingestLogger) throws ManifoldCFException
Remove a specified set of documents from the index.
- Returns:
the list of documents whose state needs to be updated in jobqueue.
- Throws:
ManifoldCFException
-
processJobQueueDeletions
protected static void processJobQueueDeletions(java.util.List<QueuedDocument> jobmanagerDeleteList, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod, IReprioritizationTracker rt, long currentTime) throws ManifoldCFException
Process job queue deletions. Either the indexer has already been updated, or it is not necessary to update it.
- Throws:
ManifoldCFException
-
processJobQueueHopcountRemovals
protected static void processJobQueueHopcountRemovals(java.util.List<QueuedDocument> jobmanagerRemovalList, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod, IReprioritizationTracker rt, long currentTime) throws ManifoldCFException
Process job queue hopcount removals. All indexer updates have already taken place.
- Throws:
ManifoldCFException
-
requeueDocuments
protected static void requeueDocuments(IJobManager jobManager, java.util.List<QueuedDocument> requeueList, long retryTime, long failTime, int failCount) throws ManifoldCFException
Requeue documents after a service interruption was detected.
- Parameters:
jobManager - is the job manager object.
requeueList - is a list of QueuedDocument objects describing what needs to be requeued.
retryTime - is the time that the first retry ought to be scheduled for.
failTime - is the time beyond which retries lead to hard failure.
failCount - is the number of retries allowed until hard failure.
- Throws:
ManifoldCFException
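One way to read how retryTime, failTime and failCount interact is sketched below. This is an interpretation of the parameter descriptions only: retryAllowed and firstRetryTime are hypothetical helpers, the values are treated as plain limits, and any special values the real job manager may recognise are not modelled.

```java
class RequeueScheduleSketch {
    /** Computes when the first retry would be scheduled, given a hypothetical back-off interval. */
    static long firstRetryTime(long currentTime, long retryIntervalMillis) {
        return currentTime + retryIntervalMillis;
    }

    /**
     * Decides whether another retry is permitted under this reading of the parameters:
     * a retry attempted after failTime, or with no retries left, is a hard failure.
     */
    static boolean retryAllowed(long attemptTime, long failTime, int retriesLeft) {
        return attemptTime <= failTime && retriesLeft > 0;
    }

    public static void main(String[] args) {
        long now = 1_000L;                             // illustrative timestamps in milliseconds
        long retryTime = firstRetryTime(now, 5_000L);  // first retry at 6000
        long failTime = now + 60_000L;                 // hard-failure deadline at 61000
        int failCount = 3;                             // retries allowed until hard failure

        System.out.println(retryAllowed(retryTime, failTime, failCount));
        System.out.println(retryAllowed(failTime + 1, failTime, failCount));
        System.out.println(retryAllowed(retryTime, failTime, 0));
    }
}
```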
-
computeComponentIDHash
protected static java.lang.String computeComponentIDHash(java.lang.String componentIdentifier) throws ManifoldCFException
- Throws:
ManifoldCFException
-
-