Class HopCount
java.lang.Object
  org.apache.manifoldcf.core.database.BaseTable
    org.apache.manifoldcf.crawler.jobs.HopCount

public class HopCount extends BaseTable
This class manages the table that keeps track of hop count, and algorithmically determines this value for a document identifier upon request.
Table: hopcount
Field          Type           Description
id             BIGINT         Primary Key
jobid          BIGINT         Reference:jobs.id
linktype       VARCHAR(255)
parentidhash   VARCHAR(40)
distance       BIGINT
deathmark      CHAR(1)
-
-
Nested Class Summary
protected static class HopCount.Answer
    This class represents an answer - which consists both of an answer value, and also the dependencies of that answer (i.e.
protected class HopCount.DocumentHash
    The Document Hash structure contains the document nodes we are interested in, including those we need answers for to proceed.
protected static class HopCount.DocumentNode
    This class keeps track of the data associated with a node in the hash map.
protected static class HopCount.DocumentReference
    This class describes a document reference.
protected static class HopCount.NodeQueue
    A queue object allows document nodes to be ordered appropriately for the most efficient execution.
protected static class HopCount.NodeReference
    This class describes a node link reference.
protected static class HopCount.Question
    A class describing a document identifier and a link type, to be used in looking up the appropriate node in the hash.
-
Field Summary
static java.lang.String _rcsid
static int ANSWER_INFINITY
static int ANSWER_UNKNOWN
protected HopDeleteDeps deleteDepsManager
    Hop "delete" dependencies manager
static java.lang.String distanceField
static java.lang.String idField
protected IntrinsicLink intrinsicLinkManager
    Intrinsic link table manager.
static java.lang.String jobIDField
static java.lang.String linkTypeField
protected ILockManager lockManager
    Lock manager
static int MARK_DELETING
static int MARK_NORMAL
static int MARK_QUEUED
static java.lang.String markForDeathField
protected static java.util.Map markMap
static java.lang.String parentIDHashField
protected static java.lang.Boolean storeHopCount
    If the global cluster property "storehopcount" is set to false (defaults to true), disable support for hopcount handling completely, the hopcount will never be recorded in the "intrinsiclink" or "hopcount" tables for any job at all.
protected IThreadContext threadContext
    Thread context
Fields inherited from class org.apache.manifoldcf.core.database.BaseTable
dbInterface, tableName
-
-
Constructor Summary
HopCount(IThreadContext tc, IDBInterface database)
    Constructor.
-
Method Summary
protected boolean[] addToProcessingQueue(java.lang.Long jobID, java.lang.String[] affectedLinkTypes, java.lang.String[] documentIDHashes, HopCount.Answer[] startingAnswers, java.lang.String sourceDocumentIDHash, java.lang.String linkType, int hopcountMethod)
    Add documents to the processing queue.
void deinstall()
    Uninstall.
void deleteDocumentIdentifiers(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] documentHashes, int hopcountMethod)
    Remove a set of document identifier hashes.
void deleteMatchingDocuments(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String joinTableName, java.lang.String joinTableIDColumn, java.lang.String joinTableJobColumn, java.lang.String joinTableCriteria, java.util.ArrayList joinTableParams, int hopcountMethod)
    Remove a set of document identifiers specified as a criteria.
void deleteOwner(java.lang.Long jobID)
    Delete an owner (and clean up the corresponding hopcount rows).
protected void doDeleteDocuments(java.lang.Long jobID, java.lang.String[] documentHashes)
    Invalidate links that start with a specific set of documents.
protected void doDeleteDocuments(java.lang.Long jobID, java.lang.String joinTableName, java.lang.String joinTableIDColumn, java.lang.String joinTableJobColumn, java.lang.String joinTableCriteria, java.util.ArrayList joinTableParams)
    Invalidate links that start with a specific set of documents, described by a table join.
protected void doDeleteInvalidation(java.lang.Long jobID, java.lang.String[] sourceDocumentHashes)
    Invalidate targets of links which have a given set of source documents.
protected void doFinish(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] sourceDocumentHashes, int hopcountMethod)
    Method that does the work of "finishing" a set of child references.
protected boolean[] doRecord(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String[] targetDocumentIDHashes, java.lang.String linkType, int hopcountMethod, java.lang.String processID)
    Do the work of recording source-target references.
int[] findHopCounts(java.lang.Long jobID, java.lang.String[] parentIdentifierHashes, java.lang.String linkType)
    Calculate a bunch of hop-counts.
void finishParents(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] sourceDocumentHashes, int hopcountMethod)
    Complete a recalculation pass for a set of source documents.
void finishSeedReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod)
    Finish seed references.
protected IResultSet getDocumentChildren(java.lang.Long jobID, java.lang.String documentIDHash)
    Get document's children.
void install(java.lang.String jobsTable, java.lang.String jobsColumn)
    Install or upgrade.
protected void markForDelete(java.lang.Long jobID, java.util.ArrayList list, java.lang.String commonNewExpression, java.util.ArrayList commonNewList)
protected void markForDocumentDelete(java.lang.Long jobID, java.util.ArrayList list)
static java.lang.String markToString(int mark)
    Go from mark to string.
protected int maxClauseMarkForDelete(java.lang.Long jobID)
protected int maxClauseMarkForDocumentDelete(java.lang.Long jobID)
protected int maxClausePerformFindMissingRecords(java.lang.Long jobID, java.lang.String[] affectedLinkTypes)
    Calculate max clauses
protected int maxClausePerformGetCachedDistanceDeps()
protected int maxClausePerformGetCachedDistances(java.lang.Long jobID)
    Calculate the max clauses.
protected int maxClauseProcessFind(java.lang.Long jobID, java.lang.String linkType)
    Find max clause count.
protected void performFindMissingRecords(java.lang.Long jobID, java.lang.String[] affectedLinkTypes, java.util.ArrayList list, java.util.Map<HopCount.Question,java.lang.Long> matchMap)
    Limited find for missing records.
protected void performGetCachedDistanceDeps(java.util.Map<java.lang.Long,HopCount.DocumentNode> depsMap, java.util.ArrayList list)
    Do a limited fetch of cached distance dependencies
protected void performGetCachedDistances(HopCount.DocumentNode[] rval, java.util.Map<HopCount.Question,java.lang.Integer> indexMap, java.util.Map<java.lang.Long,HopCount.DocumentNode> depsMap, java.lang.Long jobID, java.util.ArrayList ltList, java.util.ArrayList list)
    Do a limited fetch of cached distances
protected void performMarkAddDeps(java.lang.String query, java.util.ArrayList list)
    Do the work of marking add-dep-dependent links in the hopcount table.
protected void processFind(int[] rval, java.util.Map rvalMap, java.lang.Long jobID, java.lang.String linkType, java.util.ArrayList list)
    Process a portion of a find request for hopcount information.
boolean processQueue(java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod)
    Process a stage of the propagation queue for a job.
protected HopCount.DocumentNode[] readCachedNodes(java.lang.Long jobID, HopCount.Question[] unansweredQuestions)
    Find the cached distance from a set of identifiers to the root.
boolean recordReference(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String targetDocumentIDHash, java.lang.String linkType, int hopcountMethod, java.lang.String processID)
    Record a reference from source to target.
boolean[] recordReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String[] targetDocumentIDHashes, java.lang.String linkType, int hopcountMethod, java.lang.String processID)
    Record a set of references from source to target.
void recordSeedReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] targetDocumentIDHashes, int hopcountMethod, java.lang.String processID)
    Record references from a set of documents to the root.
void restart()
    Clean up after all process IDs.
void restart(java.lang.String processID)
    Reset, at startup time.
void restartCluster()
    Restart entire cluster.
void revertParents(java.lang.Long jobID, java.lang.String[] sourceDocumentHashes)
    Revert newly-added links, because of a possibly incomplete document processing phase.
static int stringToMark(java.lang.String value)
    Go from string to mark.
protected void writeCachedDistance(java.lang.Long jobID, java.lang.String[] legalLinkTypes, HopCount.DocumentNode dn, int hopcountMethod)
    Write a distance into the cache.
Methods inherited from class org.apache.manifoldcf.core.database.BaseTable
addTableIndex, analyzeTable, beginTransaction, buildConjunctionClause, constructCountClause, constructDistinctOnClause, constructDoubleCastClause, constructOffsetLimitClause, constructRegexpClause, constructSubstringClause, endTransaction, findConjunctionClauseMax, getDatabaseCacheKey, getDBInterface, getMaxInClause, getMaxOrClause, getSleepAmt, getTableIndexes, getTableName, getTableSchema, getTransactionID, getWindowedReportMaxRows, makeTableKey, noteModifications, performAddIndex, performAlter, performCommit, performCreate, performDelete, performDrop, performInsert, performModification, performQuery, performQuery, performRemoveIndex, performUpdate, prepareRowForSave, readRow, reindexTable, signalRollback, sleepFor
-
-
-
-
Field Detail
-
_rcsid
public static final java.lang.String _rcsid
- See Also:
- Constant Field Values
-
ANSWER_UNKNOWN
public static final int ANSWER_UNKNOWN
- See Also:
- Constant Field Values
-
ANSWER_INFINITY
public static final int ANSWER_INFINITY
- See Also:
- Constant Field Values
-
idField
public static final java.lang.String idField
- See Also:
- Constant Field Values
-
jobIDField
public static final java.lang.String jobIDField
- See Also:
- Constant Field Values
-
linkTypeField
public static final java.lang.String linkTypeField
- See Also:
- Constant Field Values
-
parentIDHashField
public static final java.lang.String parentIDHashField
- See Also:
- Constant Field Values
-
distanceField
public static final java.lang.String distanceField
- See Also:
- Constant Field Values
-
markForDeathField
public static final java.lang.String markForDeathField
- See Also:
- Constant Field Values
-
MARK_NORMAL
public static final int MARK_NORMAL
- See Also:
- Constant Field Values
-
MARK_QUEUED
public static final int MARK_QUEUED
- See Also:
- Constant Field Values
-
MARK_DELETING
public static final int MARK_DELETING
- See Also:
- Constant Field Values
-
markMap
protected static java.util.Map markMap
-
intrinsicLinkManager
protected IntrinsicLink intrinsicLinkManager
Intrinsic link table manager.
-
deleteDepsManager
protected HopDeleteDeps deleteDepsManager
Hop "delete" dependencies manager
-
threadContext
protected IThreadContext threadContext
Thread context
-
lockManager
protected final ILockManager lockManager
Lock manager
-
storeHopCount
protected static java.lang.Boolean storeHopCount
If the global cluster property "storehopcount" is set to false (it defaults to true), support for hopcount handling is disabled completely: the hopcount will never be recorded in the "intrinsiclink" or "hopcount" tables for any job.
-
-
Constructor Detail
-
HopCount
public HopCount(IThreadContext tc, IDBInterface database) throws ManifoldCFException
Constructor.
- Parameters:
database - is the database handle.
- Throws:
ManifoldCFException
-
-
Method Detail
-
install
public void install(java.lang.String jobsTable, java.lang.String jobsColumn) throws ManifoldCFException
Install or upgrade.
- Throws:
ManifoldCFException
-
deinstall
public void deinstall() throws ManifoldCFException
Uninstall.
- Throws:
ManifoldCFException
-
stringToMark
public static int stringToMark(java.lang.String value) throws ManifoldCFException
Go from string to mark.
- Parameters:
value - is the string.
- Returns:
- the status value.
- Throws:
ManifoldCFException
-
markToString
public static java.lang.String markToString(int mark) throws ManifoldCFException
Go from mark to string.
- Parameters:
mark - is the mark.
- Returns:
- the string.
- Throws:
ManifoldCFException
-
deleteOwner
public void deleteOwner(java.lang.Long jobID) throws ManifoldCFException
Delete an owner (and clean up the corresponding hopcount rows).
- Throws:
ManifoldCFException
-
restart
public void restart(java.lang.String processID) throws ManifoldCFException
Reset, at startup time.
- Parameters:
processID - is the process ID.
- Throws:
ManifoldCFException
-
restart
public void restart() throws ManifoldCFException
Clean up after all process IDs.
- Throws:
ManifoldCFException
-
restartCluster
public void restartCluster() throws ManifoldCFException
Restart entire cluster.
- Throws:
ManifoldCFException
-
recordSeedReferences
public void recordSeedReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] targetDocumentIDHashes, int hopcountMethod, java.lang.String processID) throws ManifoldCFException
Record references from a set of documents to the root. These will be marked as "new" or "existing", and will have a null linktype.
- Throws:
ManifoldCFException
-
finishSeedReferences
public void finishSeedReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod) throws ManifoldCFException
Finish seed references. Seed references are special in that the only source is the root.
- Throws:
ManifoldCFException
-
recordReference
public boolean recordReference(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String targetDocumentIDHash, java.lang.String linkType, int hopcountMethod, java.lang.String processID) throws ManifoldCFException
Record a reference from source to target. This reference will be marked as "new" or "existing".
- Throws:
ManifoldCFException
-
recordReferences
public boolean[] recordReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String[] targetDocumentIDHashes, java.lang.String linkType, int hopcountMethod, java.lang.String processID) throws ManifoldCFException
Record a set of references from source to target. These references will be marked as "new" or "existing".
- Throws:
ManifoldCFException
-
finishParents
public void finishParents(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] sourceDocumentHashes, int hopcountMethod) throws ManifoldCFException
Complete a recalculation pass for a set of source documents. All child links that are not marked as "new" or "existing" will be removed. At the completion of this pass, the links will have their "new" flag cleared.
- Throws:
ManifoldCFException
-
revertParents
public void revertParents(java.lang.Long jobID, java.lang.String[] sourceDocumentHashes) throws ManifoldCFException
Revert newly-added links, because of a possibly incomplete document processing phase. All child links marked as "new" will be removed, and all links marked as "existing" will be reset to be "base".
- Throws:
ManifoldCFException
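
The "base", "new", and "existing" link flags described for finishParents and revertParents can be pictured with a small in-memory model. The sketch below is illustrative only: the class LinkStateSketch, its State enum, and its methods are hypothetical names, it covers the links of a single source document, and it does not touch the intrinsiclink or hopcount tables. It also assumes that re-recording a link that is already "new" leaves it "new".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of the link flags; hypothetical, not ManifoldCF code. */
public class LinkStateSketch {
  public enum State { BASE, NEW, EXISTING }

  // Child document hash -> current flag, for one source document.
  private final Map<String, State> links = new LinkedHashMap<>();

  /** Recording a reference: unseen links become NEW, BASE links become EXISTING. */
  public void record(String child) {
    State current = links.get(child);
    if (current == null) {
      links.put(child, State.NEW);
    } else if (current == State.BASE) {
      links.put(child, State.EXISTING);
    }
    // Assumption: a link that is already NEW or EXISTING keeps its flag.
  }

  /** Finish: remove links still marked BASE, then clear the flags on the rest. */
  public void finish() {
    links.values().removeIf(s -> s == State.BASE);
    links.replaceAll((child, s) -> State.BASE);
  }

  /** Revert: remove NEW links, and reset EXISTING links to BASE. */
  public void revert() {
    links.values().removeIf(s -> s == State.NEW);
    links.replaceAll((child, s) -> State.BASE);
  }

  /** Returns the surviving child hashes in insertion order. */
  public List<String> children() {
    return new ArrayList<>(links.keySet());
  }
}
```

In this model, a crawl pass that is interrupted calls revert() and leaves the previously known children intact, while a completed pass calls finish() and drops any child that was not seen again.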
-
doRecord
protected boolean[] doRecord(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String[] targetDocumentIDHashes, java.lang.String linkType, int hopcountMethod, java.lang.String processID) throws ManifoldCFException
Do the work of recording source-target references.
- Throws:
ManifoldCFException
-
deleteMatchingDocuments
public void deleteMatchingDocuments(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String joinTableName, java.lang.String joinTableIDColumn, java.lang.String joinTableJobColumn, java.lang.String joinTableCriteria, java.util.ArrayList joinTableParams, int hopcountMethod) throws ManifoldCFException
Remove a set of document identifiers specified by join criteria. This will remove hopcount rows and also intrinsic links that have the specified document identifiers as sources.
- Throws:
ManifoldCFException
-
deleteDocumentIdentifiers
public void deleteDocumentIdentifiers(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] documentHashes, int hopcountMethod) throws ManifoldCFException
Remove a set of document identifier hashes. This will also remove the intrinsic links that have these document identifier hashes as sources, and invalidate cached hop counts that depend on them.
- Throws:
ManifoldCFException
-
findHopCounts
public int[] findHopCounts(java.lang.Long jobID, java.lang.String[] parentIdentifierHashes, java.lang.String linkType) throws ManifoldCFException
Calculate a set of hop counts. The values returned are only guaranteed to be an upper bound, unless the queue has recently been processed (via processQueue below). -1 will be returned to indicate "infinity".
- Throws:
ManifoldCFException
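
Because findHopCounts reports "infinity" as -1, a caller cannot compare the returned values against a limit directly. The following is a minimal sketch of how a result array might be filtered; the class HopCountFilterSketch, its INFINITY constant, and the withinLimit method are hypothetical and only mirror the -1 convention stated above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that filters documents by hop count; not ManifoldCF code. */
public class HopCountFilterSketch {
  // The description above says -1 signals "infinity"; this local constant mirrors that.
  public static final int INFINITY = -1;

  /** Returns the hashes whose hop count is finite and no larger than maxHops. */
  public static List<String> withinLimit(String[] hashes, int[] hopCounts, int maxHops) {
    List<String> kept = new ArrayList<>();
    for (int i = 0; i < hashes.length; i++) {
      int distance = hopCounts[i];
      // Check for infinity first: -1 would otherwise pass a "<= maxHops" test.
      if (distance != INFINITY && distance <= maxHops) {
        kept.add(hashes[i]);
      }
    }
    return kept;
  }
}
```

Since the returned values are only an upper bound until the queue has been processed, a document rejected by such a filter may still qualify later.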
-
maxClauseProcessFind
protected int maxClauseProcessFind(java.lang.Long jobID, java.lang.String linkType)
Find max clause count.
-
processFind
protected void processFind(int[] rval, java.util.Map rvalMap, java.lang.Long jobID, java.lang.String linkType, java.util.ArrayList list) throws ManifoldCFException
Process a portion of a find request for hopcount information.
- Throws:
ManifoldCFException
-
processQueue
public boolean processQueue(java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod) throws ManifoldCFException
Process a stage of the propagation queue for a job.
- Parameters:
jobID - is the job we need to have the hopcount propagated for.
- Returns:
- true if the queue is empty.
- Throws:
ManifoldCFException
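
Since processQueue handles one stage per call and returns true once the queue is empty, a caller that needs exact hop counts would presumably invoke it repeatedly. The sketch below shows that loop shape under stated assumptions: QueueDrainSketch and drain are hypothetical names, and the BooleanSupplier stands in for a call such as processQueue(jobID, legalLinkTypes, hopcountMethod), with exception handling left out.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical drain loop around a "returns true when empty" stage; not ManifoldCF code. */
public class QueueDrainSketch {
  /**
   * Runs the stage until it reports that the queue is empty.
   * Returns the number of stages that were executed.
   */
  public static int drain(BooleanSupplier stage) {
    int stages = 0;
    boolean empty;
    do {
      empty = stage.getAsBoolean(); // stand-in for one processQueue(...) call
      stages++;
    } while (!empty);
    return stages;
  }
}
```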
-
maxClausePerformFindMissingRecords
protected int maxClausePerformFindMissingRecords(java.lang.Long jobID, java.lang.String[] affectedLinkTypes)
Calculate max clauses
-
performFindMissingRecords
protected void performFindMissingRecords(java.lang.Long jobID, java.lang.String[] affectedLinkTypes, java.util.ArrayList list, java.util.Map<HopCount.Question,java.lang.Long> matchMap) throws ManifoldCFException
Limited find for missing records.
- Throws:
ManifoldCFException
-
addToProcessingQueue
protected boolean[] addToProcessingQueue(java.lang.Long jobID, java.lang.String[] affectedLinkTypes, java.lang.String[] documentIDHashes, HopCount.Answer[] startingAnswers, java.lang.String sourceDocumentIDHash, java.lang.String linkType, int hopcountMethod) throws ManifoldCFException
Add documents to the processing queue. For the supplied link types and document IDs, the corresponding hopcount records will be marked as queued. If, for example, the affected link types are 'link' and 'redirect', and the specified document IDs are 'A', 'B', and 'C', then six hopcount rows will be created and/or queued. The initial distance and delete dependencies for each of these hopcount rows are calculated by starting with the passed-in hopcount values and dependencies for each of the affectedLinkTypes for the specified "source" document. The result estimates are then generated by passing these values and dependencies over the links to the target document identifiers, presuming that the link is of the supplied link type.
- Parameters:
jobID - is the job the documents belong to.
affectedLinkTypes - are the set of affected link types.
documentIDHashes - are the documents to add.
startingAnswers - are the hopcounts and delete dependencies for the source document as they are currently known. The size of this array is the same as the size of the affectedLinkTypes array.
sourceDocumentIDHash - is the source document identifier for the links from source to target documents.
linkType - is the link type for this queue addition.
hopcountMethod - is the desired method of managing hopcounts.
- Returns:
- a boolean array indicating which of the documentIDHashes may have had their distances changed.
- Throws:
ManifoldCFException
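
The "six hopcount rows" in the addToProcessingQueue example come from pairing every affected link type with every document. A minimal sketch of that enumeration follows; QueueRowsSketch, rowKeys, and the "linkType:documentID" key format are hypothetical and are used only to make the counting visible.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical enumeration of (link type, document) pairs; not ManifoldCF code. */
public class QueueRowsSketch {
  /** Lists one "linkType:documentID" key per (link type, document) pair. */
  public static List<String> rowKeys(String[] affectedLinkTypes, String[] documentIDs) {
    List<String> keys = new ArrayList<>();
    for (String documentID : documentIDs) {
      for (String linkType : affectedLinkTypes) {
        // Each pair corresponds to one hopcount row that is created and/or queued.
        keys.add(linkType + ":" + documentID);
      }
    }
    return keys;
  }
}
```

With link types 'link' and 'redirect' and documents 'A', 'B', and 'C', this yields 2 x 3 = 6 keys, matching the count given in the description.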
-
performMarkAddDeps
protected void performMarkAddDeps(java.lang.String query, java.util.ArrayList list) throws ManifoldCFException
Do the work of marking add-dep-dependent links in the hopcount table.
- Throws:
ManifoldCFException
-
doFinish
protected void doFinish(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] sourceDocumentHashes, int hopcountMethod) throws ManifoldCFException
Method that does the work of "finishing" a set of child references. The API for hopcount involves doing the following for every document that is recrawled or reassessed, INCLUDING the seeds (in which case the document hash is the empty string):
(1) Record all target references of the source documents, which either adds intrinsic links or moves them to the "existing" state.
(2) When done adding, call this method, which should (depending on hopcount mode) mark hopcount records in need of reassessment, and delete the intrinsic links that have the right source document and were marked neither "new" nor "existing", but just "base".
- Throws:
ManifoldCFException
-
doDeleteDocuments
protected void doDeleteDocuments(java.lang.Long jobID, java.lang.String joinTableName, java.lang.String joinTableIDColumn, java.lang.String joinTableJobColumn, java.lang.String joinTableCriteria, java.util.ArrayList joinTableParams) throws ManifoldCFException
Invalidate links that start with a specific set of documents, described by a table join.
- Throws:
ManifoldCFException
-
doDeleteDocuments
protected void doDeleteDocuments(java.lang.Long jobID, java.lang.String[] documentHashes) throws ManifoldCFException
Invalidate links that start with a specific set of documents.
- Throws:
ManifoldCFException
-
maxClauseMarkForDocumentDelete
protected int maxClauseMarkForDocumentDelete(java.lang.Long jobID)
-
markForDocumentDelete
protected void markForDocumentDelete(java.lang.Long jobID, java.util.ArrayList list) throws ManifoldCFException
- Throws:
ManifoldCFException
-
doDeleteInvalidation
protected void doDeleteInvalidation(java.lang.Long jobID, java.lang.String[] sourceDocumentHashes) throws ManifoldCFException
Invalidate targets of links which have a given set of source documents. This also removes intrinsic links that were not re-added that point to children of the source documents. The purpose of that queue is to re-establish non-infinite values for all nodes that are described in IntrinsicLinks, that are still connected to the root.
- Throws:
ManifoldCFException
-
maxClauseMarkForDelete
protected int maxClauseMarkForDelete(java.lang.Long jobID)
-
markForDelete
protected void markForDelete(java.lang.Long jobID, java.util.ArrayList list, java.lang.String commonNewExpression, java.util.ArrayList commonNewList) throws ManifoldCFException
- Throws:
ManifoldCFException
-
getDocumentChildren
protected IResultSet getDocumentChildren(java.lang.Long jobID, java.lang.String documentIDHash) throws ManifoldCFException
Get document's children.
- Returns:
- rows that contain the children. Column names are 'linktype','childidentifier'.
- Throws:
ManifoldCFException
-
readCachedNodes
protected HopCount.DocumentNode[] readCachedNodes(java.lang.Long jobID, HopCount.Question[] unansweredQuestions) throws ManifoldCFException
Find the cached distance from a set of identifiers to the root. This is tricky, because if there is a queue assessment going on, some values are not valid. In general, one would treat a missing record as meaning "infinity". But if the missing record is simply invalidated at the moment, we want it to be treated as "missing". So... we pick up the record despite it potentially being marked, and we then examine the mark to figure out what to do.
- Returns:
- the corresponding list of nodes, taking into account unknown distances.
- Throws:
ManifoldCFException
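
The distinction readCachedNodes draws between a record that is absent and one that is present but invalidated can be sketched as a three-way decision. This is one reading of the description, not the actual implementation: CachedDistanceSketch and interpret are hypothetical names, the UNKNOWN value of -2 is an assumption (the real ANSWER_UNKNOWN value is not shown on this page), and "invalidated" is assumed to mean any mark other than normal.

```java
/** Hypothetical reading of the cached-distance rule; not ManifoldCF code. */
public class CachedDistanceSketch {
  // Local stand-ins. INFINITY follows the -1 convention of findHopCounts;
  // UNKNOWN is an assumed placeholder for ANSWER_UNKNOWN.
  public static final int INFINITY = -1;
  public static final int UNKNOWN = -2;

  /**
   * @param cachedDistance the stored distance, or null when no row exists
   * @param invalidated true when the row carries a mark other than "normal" (assumption)
   * @return the distance to use, or INFINITY / UNKNOWN
   */
  public static int interpret(Integer cachedDistance, boolean invalidated) {
    if (cachedDistance == null) {
      return INFINITY;      // no record at all: treated as "infinity"
    }
    if (invalidated) {
      return UNKNOWN;       // record exists but is being reassessed: treated as "missing"
    }
    return cachedDistance;  // record exists and is valid
  }
}
```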
-
maxClausePerformGetCachedDistanceDeps
protected int maxClausePerformGetCachedDistanceDeps()
-
performGetCachedDistanceDeps
protected void performGetCachedDistanceDeps(java.util.Map<java.lang.Long,HopCount.DocumentNode> depsMap, java.util.ArrayList list) throws ManifoldCFException
Do a limited fetch of cached distance dependencies
- Throws:
ManifoldCFException
-
maxClausePerformGetCachedDistances
protected int maxClausePerformGetCachedDistances(java.lang.Long jobID)
Calculate the max clauses.
-
performGetCachedDistances
protected void performGetCachedDistances(HopCount.DocumentNode[] rval, java.util.Map<HopCount.Question,java.lang.Integer> indexMap, java.util.Map<java.lang.Long,HopCount.DocumentNode> depsMap, java.lang.Long jobID, java.util.ArrayList ltList, java.util.ArrayList list) throws ManifoldCFException
Do a limited fetch of cached distances
- Throws:
ManifoldCFException
-
writeCachedDistance
protected void writeCachedDistance(java.lang.Long jobID, java.lang.String[] legalLinkTypes, HopCount.DocumentNode dn, int hopcountMethod) throws ManifoldCFException
Write a distance into the cache.
- Throws:
ManifoldCFException
-
-