Class HopCount
java.lang.Object
  org.apache.manifoldcf.core.database.BaseTable
    org.apache.manifoldcf.crawler.jobs.HopCount
public class HopCount extends BaseTable
This class manages the table that keeps track of hop count, and algorithmically determines this value for a document identifier upon request.
Table: hopcount
Field          Type           Description
id             BIGINT         Primary Key
jobid          BIGINT         Reference:jobs.id
linktype       VARCHAR(255)
parentidhash   VARCHAR(40)
distance       BIGINT
deathmark      CHAR(1)
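As a rough illustration of what a hop count is, the sketch below computes the minimum number of links from a root to each document with a breadth-first search. It assumes that "hop count" means the shortest link distance from the root. The class name `HopCountSketch`, the adjacency-map representation, and the helper names are hypothetical; this is not how ManifoldCF itself computes the value, since the real class works incrementally against database tables.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of ManifoldCF.
final class HopCountSketch {
  /** Returns the minimum number of links from root to each reachable document. */
  static Map<String, Integer> distancesFromRoot(Map<String, List<String>> links, String root) {
    Map<String, Integer> distance = new HashMap<>();
    Deque<String> queue = new ArrayDeque<>();
    distance.put(root, 0);
    queue.add(root);
    while (!queue.isEmpty()) {
      String current = queue.remove();
      int next = distance.get(current) + 1;
      for (String child : links.getOrDefault(current, List.of())) {
        // In breadth-first order, the first visit to a node is via a shortest path.
        if (!distance.containsKey(child)) {
          distance.put(child, next);
          queue.add(child);
        }
      }
    }
    return distance;
  }

  /** Looks up one document, using -1 to stand for "infinity" (unreachable). */
  static int hopCount(Map<String, Integer> distance, String documentHash) {
    return distance.getOrDefault(documentHash, -1);
  }
}
```

Here the root is modeled as the empty string, mirroring the convention that seeds are referenced from an empty document hash.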
-
-
Nested Class Summary
Nested Classes
Modifier and Type   Class   Description
protected static class
HopCount.Answer
This class represents an answer - which consists both of an answer value, and also the dependencies of that answer (i.e.
protected class
HopCount.DocumentHash
The Document Hash structure contains the document nodes we are interested in, including those we need answers for to proceed.
protected static class
HopCount.DocumentNode
This class keeps track of the data associated with a node in the hash map.
protected static class
HopCount.DocumentReference
This class describes a document reference.
protected static class
HopCount.NodeQueue
A queue object allows document nodes to be ordered appropriately for the most efficient execution.
protected static class
HopCount.NodeReference
This class describes a node link reference.
protected static class
HopCount.Question
A class describing a document identifier and a link type, to be used in looking up the appropriate node in the hash.
-
Field Summary
Fields
Modifier and Type   Field   Description
static java.lang.String
_rcsid
static int
ANSWER_INFINITY
static int
ANSWER_UNKNOWN
protected HopDeleteDeps
deleteDepsManager
Hop "delete" dependencies manager
static java.lang.String
distanceField
static java.lang.String
idField
protected IntrinsicLink
intrinsicLinkManager
Intrinsic link table manager.
static java.lang.String
jobIDField
static java.lang.String
linkTypeField
protected ILockManager
lockManager
Lock manager
static int
MARK_DELETING
static int
MARK_NORMAL
static int
MARK_QUEUED
static java.lang.String
markForDeathField
protected static java.util.Map
markMap
static java.lang.String
parentIDHashField
protected static java.lang.Boolean
storeHopCount
If the global cluster property "storehopcount" is set to false (it defaults to true), support for hopcount handling is disabled completely: the hopcount will never be recorded in the "intrinsiclink" or "hopcount" tables for any job.
protected IThreadContext
threadContext
Thread context
Fields inherited from class org.apache.manifoldcf.core.database.BaseTable
dbInterface, tableName
-
-
Constructor Summary
Constructors
Constructor   Description
HopCount(IThreadContext tc, IDBInterface database)
Constructor.
-
Method Summary
Modifier and Type   Method   Description
protected boolean[]
addToProcessingQueue(java.lang.Long jobID, java.lang.String[] affectedLinkTypes, java.lang.String[] documentIDHashes, HopCount.Answer[] startingAnswers, java.lang.String sourceDocumentIDHash, java.lang.String linkType, int hopcountMethod)
Add documents to the processing queue.
void
deinstall()
Uninstall.
void
deleteDocumentIdentifiers(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] documentHashes, int hopcountMethod)
Remove a set of document identifier hashes.
void
deleteMatchingDocuments(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String joinTableName, java.lang.String joinTableIDColumn, java.lang.String joinTableJobColumn, java.lang.String joinTableCriteria, java.util.ArrayList joinTableParams, int hopcountMethod)
Remove a set of document identifiers specified by criteria.
void
deleteOwner(java.lang.Long jobID)
Delete an owner (and clean up the corresponding hopcount rows).
protected void
doDeleteDocuments(java.lang.Long jobID, java.lang.String[] documentHashes)
Invalidate links that start with a specific set of documents.
protected void
doDeleteDocuments(java.lang.Long jobID, java.lang.String joinTableName, java.lang.String joinTableIDColumn, java.lang.String joinTableJobColumn, java.lang.String joinTableCriteria, java.util.ArrayList joinTableParams)
Invalidate links that start with a specific set of documents, described by a table join.
protected void
doDeleteInvalidation(java.lang.Long jobID, java.lang.String[] sourceDocumentHashes)
Invalidate targets of links which have a given set of source documents.
protected void
doFinish(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] sourceDocumentHashes, int hopcountMethod)
Method that does the work of "finishing" a set of child references.
protected boolean[]
doRecord(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String[] targetDocumentIDHashes, java.lang.String linkType, int hopcountMethod, java.lang.String processID)
Do the work of recording source-target references.
int[]
findHopCounts(java.lang.Long jobID, java.lang.String[] parentIdentifierHashes, java.lang.String linkType)
Calculate a bunch of hop-counts.
void
finishParents(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] sourceDocumentHashes, int hopcountMethod)
Complete a recalculation pass for a set of source documents.
void
finishSeedReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod)
Finish seed references.
protected IResultSet
getDocumentChildren(java.lang.Long jobID, java.lang.String documentIDHash)
Get document's children.
void
install(java.lang.String jobsTable, java.lang.String jobsColumn)
Install or upgrade.
protected void
markForDelete(java.lang.Long jobID, java.util.ArrayList list, java.lang.String commonNewExpression, java.util.ArrayList commonNewList)
protected void
markForDocumentDelete(java.lang.Long jobID, java.util.ArrayList list)
static java.lang.String
markToString(int mark)
Go from mark to string.
protected int
maxClauseMarkForDelete(java.lang.Long jobID)
protected int
maxClauseMarkForDocumentDelete(java.lang.Long jobID)
protected int
maxClausePerformFindMissingRecords(java.lang.Long jobID, java.lang.String[] affectedLinkTypes)
Calculate max clauses.
protected int
maxClausePerformGetCachedDistanceDeps()
protected int
maxClausePerformGetCachedDistances(java.lang.Long jobID)
Calculate the max clauses.
protected int
maxClauseProcessFind(java.lang.Long jobID, java.lang.String linkType)
Find max clause count.
protected void
performFindMissingRecords(java.lang.Long jobID, java.lang.String[] affectedLinkTypes, java.util.ArrayList list, java.util.Map<HopCount.Question,java.lang.Long> matchMap)
Limited find for missing records.
protected void
performGetCachedDistanceDeps(java.util.Map<java.lang.Long,HopCount.DocumentNode> depsMap, java.util.ArrayList list)
Do a limited fetch of cached distance dependencies.
protected void
performGetCachedDistances(HopCount.DocumentNode[] rval, java.util.Map<HopCount.Question,java.lang.Integer> indexMap, java.util.Map<java.lang.Long,HopCount.DocumentNode> depsMap, java.lang.Long jobID, java.util.ArrayList ltList, java.util.ArrayList list)
Do a limited fetch of cached distances.
protected void
performMarkAddDeps(java.lang.String query, java.util.ArrayList list)
Do the work of marking add-dep-dependent links in the hopcount table.
protected void
processFind(int[] rval, java.util.Map rvalMap, java.lang.Long jobID, java.lang.String linkType, java.util.ArrayList list)
Process a portion of a find request for hopcount information.
boolean
processQueue(java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod)
Process a stage of the propagation queue for a job.
protected HopCount.DocumentNode[]
readCachedNodes(java.lang.Long jobID, HopCount.Question[] unansweredQuestions)
Find the cached distance from a set of identifiers to the root.
boolean
recordReference(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String targetDocumentIDHash, java.lang.String linkType, int hopcountMethod, java.lang.String processID)
Record a reference from source to target.
boolean[]
recordReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String[] targetDocumentIDHashes, java.lang.String linkType, int hopcountMethod, java.lang.String processID)
Record a set of references from source to target.
void
recordSeedReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] targetDocumentIDHashes, int hopcountMethod, java.lang.String processID)
Record references from a set of documents to the root.
void
restart()
Clean up after all process IDs.
void
restart(java.lang.String processID)
Reset, at startup time.
void
restartCluster()
Restart entire cluster.
void
revertParents(java.lang.Long jobID, java.lang.String[] sourceDocumentHashes)
Revert newly-added links, because of a possibly incomplete document processing phase.
static int
stringToMark(java.lang.String value)
Go from string to mark.
protected void
writeCachedDistance(java.lang.Long jobID, java.lang.String[] legalLinkTypes, HopCount.DocumentNode dn, int hopcountMethod)
Write a distance into the cache.
Methods inherited from class org.apache.manifoldcf.core.database.BaseTable
addTableIndex, analyzeTable, beginTransaction, buildConjunctionClause, constructCountClause, constructDistinctOnClause, constructDoubleCastClause, constructOffsetLimitClause, constructRegexpClause, constructSubstringClause, endTransaction, findConjunctionClauseMax, getDatabaseCacheKey, getDBInterface, getMaxInClause, getMaxOrClause, getSleepAmt, getTableIndexes, getTableName, getTableSchema, getTransactionID, getWindowedReportMaxRows, makeTableKey, noteModifications, performAddIndex, performAlter, performCommit, performCreate, performDelete, performDrop, performInsert, performModification, performQuery, performQuery, performRemoveIndex, performUpdate, prepareRowForSave, readRow, reindexTable, signalRollback, sleepFor
-
-
-
-
Field Detail
-
_rcsid
public static final java.lang.String _rcsid
- See Also:
- Constant Field Values
-
ANSWER_UNKNOWN
public static final int ANSWER_UNKNOWN
- See Also:
- Constant Field Values
-
ANSWER_INFINITY
public static final int ANSWER_INFINITY
- See Also:
- Constant Field Values
-
idField
public static final java.lang.String idField
- See Also:
- Constant Field Values
-
jobIDField
public static final java.lang.String jobIDField
- See Also:
- Constant Field Values
-
linkTypeField
public static final java.lang.String linkTypeField
- See Also:
- Constant Field Values
-
parentIDHashField
public static final java.lang.String parentIDHashField
- See Also:
- Constant Field Values
-
distanceField
public static final java.lang.String distanceField
- See Also:
- Constant Field Values
-
markForDeathField
public static final java.lang.String markForDeathField
- See Also:
- Constant Field Values
-
MARK_NORMAL
public static final int MARK_NORMAL
- See Also:
- Constant Field Values
-
MARK_QUEUED
public static final int MARK_QUEUED
- See Also:
- Constant Field Values
-
MARK_DELETING
public static final int MARK_DELETING
- See Also:
- Constant Field Values
-
markMap
protected static java.util.Map markMap
-
intrinsicLinkManager
protected IntrinsicLink intrinsicLinkManager
Intrinsic link table manager.
-
deleteDepsManager
protected HopDeleteDeps deleteDepsManager
Hop "delete" dependencies manager
-
threadContext
protected IThreadContext threadContext
Thread context
-
lockManager
protected final ILockManager lockManager
Lock manager
-
storeHopCount
protected static java.lang.Boolean storeHopCount
If the global cluster property "storehopcount" is set to false (it defaults to true), support for hopcount handling is disabled completely: the hopcount will never be recorded in the "intrinsiclink" or "hopcount" tables for any job.
-
-
Constructor Detail
-
HopCount
public HopCount(IThreadContext tc, IDBInterface database) throws ManifoldCFException
Constructor.
- Parameters:
database - is the database handle.
- Throws:
ManifoldCFException
-
-
Method Detail
-
install
public void install(java.lang.String jobsTable, java.lang.String jobsColumn) throws ManifoldCFException
Install or upgrade.
- Throws:
ManifoldCFException
-
deinstall
public void deinstall() throws ManifoldCFException
Uninstall.
- Throws:
ManifoldCFException
-
stringToMark
public static int stringToMark(java.lang.String value) throws ManifoldCFException
Go from string to mark.
- Parameters:
value - is the string.
- Returns:
- the status value.
- Throws:
ManifoldCFException
-
markToString
public static java.lang.String markToString(int mark) throws ManifoldCFException
Go from mark to string.
- Parameters:
mark - is the mark.
- Returns:
- the string.
- Throws:
ManifoldCFException
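The two conversions above are inverses of each other, presumably backed by the markMap field. The sketch below shows one way such a pair could work. Everything concrete in it is an assumption made for illustration: the class name `MarkCodec`, the numeric values of the constants, the one-character codes "N", "Q", and "D", and the use of IllegalArgumentException in place of ManifoldCFException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the mark conversion; all concrete values are assumed.
final class MarkCodec {
  static final int MARK_NORMAL = 0;    // assumed value
  static final int MARK_QUEUED = 1;    // assumed value
  static final int MARK_DELETING = 2;  // assumed value

  // Maps the stored one-character code to the in-memory mark (codes are assumed).
  private static final Map<String, Integer> MARK_MAP = new HashMap<>();
  static {
    MARK_MAP.put("N", MARK_NORMAL);
    MARK_MAP.put("Q", MARK_QUEUED);
    MARK_MAP.put("D", MARK_DELETING);
  }

  /** Converts a stored string into a mark, rejecting unknown strings. */
  static int stringToMark(String value) {
    Integer mark = MARK_MAP.get(value);
    if (mark == null) {
      throw new IllegalArgumentException("Bad mark value: '" + value + "'");
    }
    return mark;
  }

  /** Converts a mark back into its stored string, rejecting unknown marks. */
  static String markToString(int mark) {
    for (Map.Entry<String, Integer> entry : MARK_MAP.entrySet()) {
      if (entry.getValue() == mark) {
        return entry.getKey();
      }
    }
    throw new IllegalArgumentException("Bad mark value: " + mark);
  }
}
```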
-
deleteOwner
public void deleteOwner(java.lang.Long jobID) throws ManifoldCFException
Delete an owner (and clean up the corresponding hopcount rows).
- Throws:
ManifoldCFException
-
restart
public void restart(java.lang.String processID) throws ManifoldCFException
Reset, at startup time.
- Parameters:
processID - is the process ID.
- Throws:
ManifoldCFException
-
restart
public void restart() throws ManifoldCFException
Clean up after all process IDs.
- Throws:
ManifoldCFException
-
restartCluster
public void restartCluster() throws ManifoldCFException
Restart entire cluster.
- Throws:
ManifoldCFException
-
recordSeedReferences
public void recordSeedReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] targetDocumentIDHashes, int hopcountMethod, java.lang.String processID) throws ManifoldCFException
Record references from a set of documents to the root. These will be marked as "new" or "existing", and will have a null linktype.
- Throws:
ManifoldCFException
-
finishSeedReferences
public void finishSeedReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod) throws ManifoldCFException
Finish seed references. Seed references are special in that the only source is the root.
- Throws:
ManifoldCFException
-
recordReference
public boolean recordReference(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String targetDocumentIDHash, java.lang.String linkType, int hopcountMethod, java.lang.String processID) throws ManifoldCFException
Record a reference from source to target. This reference will be marked as "new" or "existing".
- Throws:
ManifoldCFException
-
recordReferences
public boolean[] recordReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String[] targetDocumentIDHashes, java.lang.String linkType, int hopcountMethod, java.lang.String processID) throws ManifoldCFException
Record a set of references from source to target. These references will be marked as "new" or "existing".
- Throws:
ManifoldCFException
-
finishParents
public void finishParents(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] sourceDocumentHashes, int hopcountMethod) throws ManifoldCFException
Complete a recalculation pass for a set of source documents. All child links that are not marked as "new" or "existing" will be removed. At the completion of this pass, the links will have their "new" flag cleared.
- Throws:
ManifoldCFException
-
revertParents
public void revertParents(java.lang.Long jobID, java.lang.String[] sourceDocumentHashes) throws ManifoldCFException
Revert newly-added links, because of a possibly incomplete document processing phase. All child links marked as "new" will be removed, and all links marked as "existing" will be reset to be "base".
- Throws:
ManifoldCFException
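The record, finish, and revert operations described above can be read as a small state machine over each child link, with the states "base", "new", and "existing". The sketch below models that reading in memory for a single source document. The class `LinkLifecycleSketch` and its method names are hypothetical, and it is assumed that recording an unseen link creates it as "new" while recording a "base" link moves it to "existing". The real class keeps this state in the intrinsic link table and also updates hop counts, which is omitted here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of link states for one source document.
final class LinkLifecycleSketch {
  enum State { BASE, NEW, EXISTING }

  // Child document hash -> current link state.
  private final Map<String, State> children = new LinkedHashMap<>();

  /** Record a reference: unseen links become NEW, BASE links become EXISTING. */
  void record(String childHash) {
    State state = children.get(childHash);
    if (state == null) {
      children.put(childHash, State.NEW);
    } else if (state == State.BASE) {
      children.put(childHash, State.EXISTING);
    }
  }

  /** Finish a pass: drop links that were not re-recorded, then clear the flags. */
  void finish() {
    children.values().removeIf(state -> state == State.BASE);
    children.replaceAll((childHash, state) -> State.BASE);
  }

  /** Revert a pass: drop NEW links and reset EXISTING links to BASE. */
  void revert() {
    children.values().removeIf(state -> state == State.NEW);
    children.replaceAll((childHash, state) -> State.BASE);
  }

  /** Returns a copy of the current link states. */
  Map<String, State> snapshot() {
    return new LinkedHashMap<>(children);
  }
}
```

In this model a link that is not re-recorded survives a revert but not a finish, which is the distinction the two method descriptions draw.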
-
doRecord
protected boolean[] doRecord(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String[] targetDocumentIDHashes, java.lang.String linkType, int hopcountMethod, java.lang.String processID) throws ManifoldCFException
Do the work of recording source-target references.
- Throws:
ManifoldCFException
-
deleteMatchingDocuments
public void deleteMatchingDocuments(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String joinTableName, java.lang.String joinTableIDColumn, java.lang.String joinTableJobColumn, java.lang.String joinTableCriteria, java.util.ArrayList joinTableParams, int hopcountMethod) throws ManifoldCFException
Remove a set of document identifiers specified by criteria. This will remove hopcount rows and also intrinsic links that have the specified document identifiers as sources.
- Throws:
ManifoldCFException
-
deleteDocumentIdentifiers
public void deleteDocumentIdentifiers(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] documentHashes, int hopcountMethod) throws ManifoldCFException
Remove a set of document identifier hashes. This will also remove the intrinsic links that have these document identifier hashes as sources, as well as invalidating cached hop counts that depend on them.
- Throws:
ManifoldCFException
-
findHopCounts
public int[] findHopCounts(java.lang.Long jobID, java.lang.String[] parentIdentifierHashes, java.lang.String linkType) throws ManifoldCFException
Calculate a bunch of hop-counts. The values returned are only guaranteed to be an upper bound, unless the queue has recently been processed (via processQueue below). -1 will be returned to indicate "infinity".
- Throws:
ManifoldCFException
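A caller has to treat the -1 sentinel specially before comparing hop counts with a limit. The sketch below shows one way to do that. It assumes the returned array is index-aligned with parentIdentifierHashes; the class `HopCountFilterSketch`, the method `withinLimit`, and the maxHops parameter are hypothetical and not part of this API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for interpreting a findHopCounts-style result.
final class HopCountFilterSketch {
  /** Returns the hashes whose hop count is finite and no larger than maxHops. */
  static List<String> withinLimit(String[] documentHashes, int[] hopCounts, int maxHops) {
    if (documentHashes.length != hopCounts.length) {
      throw new IllegalArgumentException("Arrays must have the same length");
    }
    List<String> accepted = new ArrayList<>();
    for (int i = 0; i < hopCounts.length; i++) {
      int distance = hopCounts[i];
      // -1 stands for "infinity": no known path from the root.
      if (distance != -1 && distance <= maxHops) {
        accepted.add(documentHashes[i]);
      }
    }
    return accepted;
  }
}
```

Because the values are only upper bounds until the queue has been processed, a document rejected here might still qualify after propagation completes.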
-
maxClauseProcessFind
protected int maxClauseProcessFind(java.lang.Long jobID, java.lang.String linkType)
Find max clause count.
-
processFind
protected void processFind(int[] rval, java.util.Map rvalMap, java.lang.Long jobID, java.lang.String linkType, java.util.ArrayList list) throws ManifoldCFException
Process a portion of a find request for hopcount information.
- Throws:
ManifoldCFException
-
processQueue
public boolean processQueue(java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod) throws ManifoldCFException
Process a stage of the propagation queue for a job.
- Parameters:
jobID - is the job we need to have the hopcount propagated for.
- Returns:
- true if the queue is empty.
- Throws:
ManifoldCFException
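Since each call handles only one stage and returns true once the queue is empty, a caller that needs exact hop counts would typically repeat the call until it returns true. The sketch below shows such a loop against a hypothetical `Stage` interface standing in for a bound call to processQueue. The interface, the class `QueueDrainSketch`, and the maxStages safety limit are assumptions; the real method also throws ManifoldCFException, which the stand-in omits.

```java
// Hypothetical driver loop; Stage stands in for a bound processQueue call.
final class QueueDrainSketch {
  /** Returns true when the queue is empty, mirroring processQueue's contract. */
  interface Stage {
    boolean processStage();
  }

  /** Runs stages until the queue reports empty; returns how many stages ran. */
  static int drain(Stage stage, int maxStages) {
    for (int stagesRun = 1; stagesRun <= maxStages; stagesRun++) {
      if (stage.processStage()) {
        return stagesRun;
      }
    }
    throw new IllegalStateException("Queue not empty after " + maxStages + " stages");
  }
}
```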
-
maxClausePerformFindMissingRecords
protected int maxClausePerformFindMissingRecords(java.lang.Long jobID, java.lang.String[] affectedLinkTypes)
Calculate max clauses
-
performFindMissingRecords
protected void performFindMissingRecords(java.lang.Long jobID, java.lang.String[] affectedLinkTypes, java.util.ArrayList list, java.util.Map<HopCount.Question,java.lang.Long> matchMap) throws ManifoldCFException
Limited find for missing records.
- Throws:
ManifoldCFException
-
addToProcessingQueue
protected boolean[] addToProcessingQueue(java.lang.Long jobID, java.lang.String[] affectedLinkTypes, java.lang.String[] documentIDHashes, HopCount.Answer[] startingAnswers, java.lang.String sourceDocumentIDHash, java.lang.String linkType, int hopcountMethod) throws ManifoldCFException
Add documents to the processing queue. For the supplied link types and document IDs, the corresponding hopcount records will be marked as queued. If, for example, the affected link types are 'link' and 'redirect', and the specified document IDs are 'A', 'B', and 'C', then six hopcount rows will be created and/or queued.
The initial distance and delete dependencies for each of these hopcount rows are calculated by starting with the passed-in hopcount values and dependencies for each of the affectedLinkTypes for the specified "source" document. The result estimates are then generated by passing these values and dependencies over the links to the target document identifiers, presuming that the link is of the supplied link type.
- Parameters:
jobID - is the job the documents belong to.
affectedLinkTypes - are the set of affected link types.
documentIDHashes - are the documents to add.
startingAnswers - are the hopcounts and delete dependencies for the source document as they are currently known. The size of this array is the same as the size of the affectedLinkTypes array.
sourceDocumentIDHash - is the source document identifier for the links from source to target documents.
linkType - is the link type for this queue addition.
hopcountMethod - is the desired method of managing hopcounts.
- Returns:
- a boolean array, index-aligned with documentIDHashes, marking the documents whose distances may have changed.
- Throws:
ManifoldCFException
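The "six rows" in the example come from pairing every affected link type with every document ID. The sketch below enumerates those pairs. The class `QueueRowsSketch` and the "linktype:document" key format are hypothetical and serve only to make the combinations visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enumeration of the (link type, document) pairs that get queued.
final class QueueRowsSketch {
  /** Returns one key per combination of affected link type and document ID. */
  static List<String> rows(String[] affectedLinkTypes, String[] documentIDHashes) {
    List<String> keys = new ArrayList<>();
    for (String linkType : affectedLinkTypes) {
      for (String documentIDHash : documentIDHashes) {
        keys.add(linkType + ":" + documentIDHash);
      }
    }
    return keys;
  }
}
```

With link types 'link' and 'redirect' and documents 'A', 'B', and 'C', this produces 2 x 3 = 6 keys, one per hopcount row.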
-
performMarkAddDeps
protected void performMarkAddDeps(java.lang.String query, java.util.ArrayList list) throws ManifoldCFException
Do the work of marking add-dep-dependent links in the hopcount table.
- Throws:
ManifoldCFException
-
doFinish
protected void doFinish(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] sourceDocumentHashes, int hopcountMethod) throws ManifoldCFException
Method that does the work of "finishing" a set of child references. The API for hopcount involves doing the following for every document that is recrawled or reassessed, INCLUDING the seeds (in which case the document hash is the empty string):
(1) Record all target references of the source documents, which either adds intrinsic links or moves them to the "existing" state.
(2) When done adding, call this method, which should (depending on hopcount mode) mark hopcount records in need of reassessment, and delete the intrinsic links that have the right source document and were marked as neither "new" nor "existing", but just "base".
- Throws:
ManifoldCFException
-
doDeleteDocuments
protected void doDeleteDocuments(java.lang.Long jobID, java.lang.String joinTableName, java.lang.String joinTableIDColumn, java.lang.String joinTableJobColumn, java.lang.String joinTableCriteria, java.util.ArrayList joinTableParams) throws ManifoldCFException
Invalidate links that start with a specific set of documents, described by a table join.
- Throws:
ManifoldCFException
-
doDeleteDocuments
protected void doDeleteDocuments(java.lang.Long jobID, java.lang.String[] documentHashes) throws ManifoldCFException
Invalidate links that start with a specific set of documents.
- Throws:
ManifoldCFException
-
maxClauseMarkForDocumentDelete
protected int maxClauseMarkForDocumentDelete(java.lang.Long jobID)
-
markForDocumentDelete
protected void markForDocumentDelete(java.lang.Long jobID, java.util.ArrayList list) throws ManifoldCFException
- Throws:
ManifoldCFException
-
doDeleteInvalidation
protected void doDeleteInvalidation(java.lang.Long jobID, java.lang.String[] sourceDocumentHashes) throws ManifoldCFException
Invalidate targets of links which have a given set of source documents. This also removes intrinsic links that were not re-added that point to children of the source documents. The purpose of that queue is to re-establish non-infinite values for all nodes that are described in IntrinsicLinks, that are still connected to the root.
- Throws:
ManifoldCFException
-
maxClauseMarkForDelete
protected int maxClauseMarkForDelete(java.lang.Long jobID)
-
markForDelete
protected void markForDelete(java.lang.Long jobID, java.util.ArrayList list, java.lang.String commonNewExpression, java.util.ArrayList commonNewList) throws ManifoldCFException
- Throws:
ManifoldCFException
-
getDocumentChildren
protected IResultSet getDocumentChildren(java.lang.Long jobID, java.lang.String documentIDHash) throws ManifoldCFException
Get document's children.
- Returns:
- rows that contain the children. Column names are 'linktype', 'childidentifier'.
- Throws:
ManifoldCFException
-
readCachedNodes
protected HopCount.DocumentNode[] readCachedNodes(java.lang.Long jobID, HopCount.Question[] unansweredQuestions) throws ManifoldCFException
Find the cached distance from a set of identifiers to the root. This is tricky, because if there is a queue assessment going on, some values are not valid. In general, one would treat a missing record as meaning "infinity". But if the missing record is simply invalidated at the moment, we want it to be treated as "missing". So... we pick up the record despite it potentially being marked, and we then examine the mark to figure out what to do.
- Returns:
- the corresponding list of nodes, taking into account unknown distances.
- Throws:
ManifoldCFException
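The three outcomes described above (no record, an invalidated record, a valid record) can be sketched as a small classification step. The enum names and the class `CachedRowSketch` are hypothetical. Treating both a queued and a deleting mark as "invalidated" is an assumption, since the text does not say which marks count.

```java
// Hypothetical classification of a cached hopcount row by its mark.
final class CachedRowSketch {
  enum Mark { NORMAL, QUEUED, DELETING }
  enum Status { KNOWN, UNKNOWN, INFINITY }

  /**
   * Classifies a row by its mark; a null mark means no row exists at all.
   * Assumption: any mark other than NORMAL means the row is currently invalidated.
   */
  static Status classify(Mark markOrNull) {
    if (markOrNull == null) {
      return Status.INFINITY;  // truly missing record: no path to the root
    }
    if (markOrNull == Mark.NORMAL) {
      return Status.KNOWN;     // cached distance can be used as-is
    }
    return Status.UNKNOWN;     // present but invalidated: must be recomputed
  }
}
```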
-
maxClausePerformGetCachedDistanceDeps
protected int maxClausePerformGetCachedDistanceDeps()
-
performGetCachedDistanceDeps
protected void performGetCachedDistanceDeps(java.util.Map<java.lang.Long,HopCount.DocumentNode> depsMap, java.util.ArrayList list) throws ManifoldCFException
Do a limited fetch of cached distance dependencies.
- Throws:
ManifoldCFException
-
maxClausePerformGetCachedDistances
protected int maxClausePerformGetCachedDistances(java.lang.Long jobID)
Calculate the max clauses.
-
performGetCachedDistances
protected void performGetCachedDistances(HopCount.DocumentNode[] rval, java.util.Map<HopCount.Question,java.lang.Integer> indexMap, java.util.Map<java.lang.Long,HopCount.DocumentNode> depsMap, java.lang.Long jobID, java.util.ArrayList ltList, java.util.ArrayList list) throws ManifoldCFException
Do a limited fetch of cached distances.
- Throws:
ManifoldCFException
-
writeCachedDistance
protected void writeCachedDistance(java.lang.Long jobID, java.lang.String[] legalLinkTypes, HopCount.DocumentNode dn, int hopcountMethod) throws ManifoldCFException
Write a distance into the cache.
- Throws:
ManifoldCFException
-
-