Class IntrinsicLink
- java.lang.Object
  - org.apache.manifoldcf.core.database.BaseTable
    - org.apache.manifoldcf.crawler.jobs.IntrinsicLink
public class IntrinsicLink extends BaseTable
This class manages the table that keeps track of intrinsic relationships between documents.
Table: intrinsiclink
Field         Type          Description
jobid         BIGINT        Reference: jobs.id
linktype      VARCHAR(255)
parentidhash  VARCHAR(40)
childidhash   VARCHAR(40)
isnew         CHAR(1)
processid     VARCHAR(16)
-
-
Nested Class Summary
Nested Classes
Modifier and Type, Class
    Description
protected static class IntrinsicLink.DuplicateFinder
-
Field Summary
Fields
Modifier and Type, Field
    Description
static java.lang.String _rcsid
static java.lang.String childIDHashField
static java.lang.String jobIDField
protected static int LINKSTATUS_BASE
    The standard value for this field.
protected static int LINKSTATUS_EXISTING
    This value means that the link existed before, and has been found during this scan.
protected static int LINKSTATUS_NEW
    This value means that the link is brand-new; it did not exist before this pass.
protected static java.util.Map linkstatusMap
static java.lang.String linkTypeField
static java.lang.String newField
static java.lang.String parentIDHashField
static java.lang.String processIDField
Fields inherited from class org.apache.manifoldcf.core.database.BaseTable
dbInterface, tableName
-
-
Constructor Summary
Constructors
Constructor
    Description
IntrinsicLink(IDBInterface database)
    Constructor.
-
Method Summary
Modifier and Type, Method
    Description
void analyzeTables()
    Analyze job tables that need analysis.
void deinstall()
    Uninstall.
void deleteOwner(java.lang.Long jobID)
    Delete an owner (and clean up the corresponding hopcount rows).
IResultSet getDocumentChildren(java.lang.Long jobID, java.lang.String parentIDHash)
    Get document's children.
java.lang.String[] getDocumentUniqueParents(java.lang.Long jobID, java.lang.String childIDHash)
    Get document's parents.
void install(java.lang.String jobsTable, java.lang.String jobsColumn)
    Install or upgrade.
protected int maxClausePerformExistsCheck(java.lang.Long jobID, java.lang.String linkType, java.lang.String childIDHash)
    Calculate the max clauses for the exists check.
protected int maxClausePerformRemoveDocumentLinks(java.lang.Long jobID)
protected int maxClausePerformRemoveLinks(java.lang.Long jobID)
protected int maxClausesPerformRestoreLinks(java.lang.Long jobID)
protected int maxClausesPerformRevertLinks(java.lang.Long jobID)
protected void performExistsCheck(java.util.Set<java.lang.String> presentMap, java.lang.Long jobID, java.lang.String linkType, java.lang.String childIDHash, java.util.List<java.lang.String> list)
    Do the exists check, in batch.
protected void performRemoveDocumentLinks(java.util.List<java.lang.String> list, java.lang.Long jobID)
protected void performRemoveLinks(java.util.List<java.lang.String> list, java.lang.Long jobID, java.lang.String commonNewExpression, java.util.ArrayList commonNewParams)
protected void performRestoreLinks(java.lang.Long jobID, java.util.List<java.lang.String> list)
protected void performRevertLinks(java.lang.Long jobID, java.util.List<java.lang.String> list)
java.lang.String[] recordReferences(java.lang.Long jobID, java.lang.String sourceDocumentIDHash, java.lang.String[] targetDocumentIDHashes, java.lang.String linkType, java.lang.String processID)
    Record references from source to targets.
void removeDocumentLinks(java.lang.Long jobID, java.lang.String[] documentIDHashes)
    Remove all links that mention a specific set of documents.
void removeDocumentLinks(java.lang.Long jobID, java.lang.String joinTableName, java.lang.String joinTableIDColumn, java.lang.String joinTableJobColumn, java.lang.String joinTableCriteria, java.util.ArrayList joinTableParams)
    Remove all links that mention a specific set of documents, as described by a join.
void removeLinks(java.lang.Long jobID, java.lang.String commonNewExpression, java.util.ArrayList commonNewParams, java.lang.String[] sourceDocumentIDHashes)
    Remove all target links of the specified source documents that are not marked as "new" or "existing", and return the others to their base state.
void restart()
    Clean up after all process IDs.
void restart(java.lang.String processID)
    Reset, at startup time.
void restartCluster()
void restoreLinks(java.lang.Long jobID, java.lang.String[] sourceDocumentIDHashes)
    Return all target links of the specified source documents to their base state.
void revertLinks(java.lang.Long jobID, java.lang.String[] sourceDocumentIDHashes)
    Throw away links added during (aborted) processing.
static java.lang.String statusToString(int status)
    Convert link status to string.
static int stringToStatus(java.lang.String status)
    Convert string to link status.
Methods inherited from class org.apache.manifoldcf.core.database.BaseTable
addTableIndex, analyzeTable, beginTransaction, buildConjunctionClause, constructCountClause, constructDistinctOnClause, constructDoubleCastClause, constructOffsetLimitClause, constructRegexpClause, constructSubstringClause, endTransaction, findConjunctionClauseMax, getDatabaseCacheKey, getDBInterface, getMaxInClause, getMaxOrClause, getSleepAmt, getTableIndexes, getTableName, getTableSchema, getTransactionID, getWindowedReportMaxRows, makeTableKey, noteModifications, performAddIndex, performAlter, performCommit, performCreate, performDelete, performDrop, performInsert, performModification, performQuery, performQuery, performRemoveIndex, performUpdate, prepareRowForSave, readRow, reindexTable, signalRollback, sleepFor
-
-
-
-
Field Detail
-
_rcsid
public static final java.lang.String _rcsid
- See Also:
- Constant Field Values
-
LINKSTATUS_BASE
protected static final int LINKSTATUS_BASE
The standard value for this field. Means that the link existed prior to this scan, and no new link was found yet.
- See Also:
- Constant Field Values
-
LINKSTATUS_NEW
protected static final int LINKSTATUS_NEW
This value means that the link is brand-new; it did not exist before this pass.
- See Also:
- Constant Field Values
-
LINKSTATUS_EXISTING
protected static final int LINKSTATUS_EXISTING
This value means that the link existed before, and has been found during this scan.
- See Also:
- Constant Field Values
-
jobIDField
public static final java.lang.String jobIDField
- See Also:
- Constant Field Values
-
linkTypeField
public static final java.lang.String linkTypeField
- See Also:
- Constant Field Values
-
parentIDHashField
public static final java.lang.String parentIDHashField
- See Also:
- Constant Field Values
-
childIDHashField
public static final java.lang.String childIDHashField
- See Also:
- Constant Field Values
-
newField
public static final java.lang.String newField
- See Also:
- Constant Field Values
-
processIDField
public static final java.lang.String processIDField
- See Also:
- Constant Field Values
-
linkstatusMap
protected static java.util.Map linkstatusMap
-
-
Constructor Detail
-
IntrinsicLink
public IntrinsicLink(IDBInterface database) throws ManifoldCFException
Constructor.
- Parameters:
database - is the database handle.
- Throws:
ManifoldCFException
-
-
Method Detail
-
install
public void install(java.lang.String jobsTable, java.lang.String jobsColumn) throws ManifoldCFException
Install or upgrade.
- Throws:
ManifoldCFException
-
deinstall
public void deinstall() throws ManifoldCFException
Uninstall.
- Throws:
ManifoldCFException
-
analyzeTables
public void analyzeTables() throws ManifoldCFException
Analyze job tables that need analysis.
- Throws:
ManifoldCFException
-
deleteOwner
public void deleteOwner(java.lang.Long jobID) throws ManifoldCFException
Delete an owner (and clean up the corresponding hopcount rows).
- Throws:
ManifoldCFException
-
restart
public void restart(java.lang.String processID) throws ManifoldCFException
Reset, at startup time. Since links can only be added in a transactionally safe way by processing of documents, and cached records of hopcount are updated only when requested, it is safest to simply move any "new" or "new existing" links back to base state on startup. Then, the next time that page is processed, the links will be updated properly.
- Parameters:
processID - is the process to restart.
- Throws:
ManifoldCFException
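The reset described above can be pictured with a small in-memory model. This is a minimal sketch, not the class's actual implementation (which works against the database): `LinkRow` and `RestartSketch` are hypothetical names, the single-letter status codes are assumed for illustration, and clearing the process ID on reset is also an assumption.

```java
import java.util.List;

// Hypothetical in-memory stand-in for one intrinsiclink row.
class LinkRow {
    final String parentIdHash;
    final String childIdHash;
    String status;     // assumed codes: "B" = base, "N" = new, "E" = existing
    String processId;  // process that last marked the row, or null

    LinkRow(String parentIdHash, String childIdHash, String status, String processId) {
        this.parentIdHash = parentIdHash;
        this.childIdHash = childIdHash;
        this.status = status;
        this.processId = processId;
    }
}

class RestartSketch {
    // Move every "new" or "existing" row marked by the given process back to
    // base state, and return how many rows were changed.
    static int resetForProcess(List<LinkRow> rows, String processId) {
        int changed = 0;
        for (LinkRow row : rows) {
            boolean inFlight = "N".equals(row.status) || "E".equals(row.status);
            if (inFlight && processId.equals(row.processId)) {
                row.status = "B";
                row.processId = null;  // assumption: ownership is cleared on reset
                changed++;
            }
        }
        return changed;
    }
}
```

Rows marked by other processes are left alone, which is what distinguishes this variant from the no-argument restart() that cleans up after all process IDs.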
-
restart
public void restart() throws ManifoldCFException
Clean up after all process IDs.
- Throws:
ManifoldCFException
-
restartCluster
public void restartCluster() throws ManifoldCFException
- Throws:
ManifoldCFException
-
recordReferences
public java.lang.String[] recordReferences(java.lang.Long jobID, java.lang.String sourceDocumentIDHash, java.lang.String[] targetDocumentIDHashes, java.lang.String linkType, java.lang.String processID) throws ManifoldCFException
Record references from source to targets. These references will be marked as either "new" or "existing".
- Returns:
- the target document IDs that are considered "new".
- Throws:
ManifoldCFException
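The "new" versus "existing" marking can be illustrated with a map from target hash to status for a single job, source document and link type. This is only a sketch under stated assumptions: `RecordReferencesSketch` is a hypothetical class, the status codes are illustrative, and leaving already-marked rows untouched is an assumption about how repeated targets are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RecordReferencesSketch {
    // Assumed single-letter status codes; illustrative only.
    static final String BASE = "B";
    static final String NEW = "N";
    static final String EXISTING = "E";

    // linksForSource maps target hash -> status for one (job, source, link type).
    // Returns the targets that were not present before this call.
    static String[] record(Map<String, String> linksForSource, String[] targetHashes) {
        List<String> brandNew = new ArrayList<>();
        for (String target : targetHashes) {
            String prior = linksForSource.get(target);
            if (prior == null) {
                // Never seen before: insert as "new" and report it to the caller.
                linksForSource.put(target, NEW);
                brandNew.add(target);
            } else if (BASE.equals(prior)) {
                // Known from an earlier scan and found again: mark "existing".
                linksForSource.put(target, EXISTING);
            }
            // Rows already marked "new" or "existing" are left as they are.
        }
        return brandNew.toArray(new String[0]);
    }
}
```

Because a repeated target is already marked by the time it is seen again, each new target is reported only once even if the input array contains duplicates.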
-
maxClausePerformExistsCheck
protected int maxClausePerformExistsCheck(java.lang.Long jobID, java.lang.String linkType, java.lang.String childIDHash)
Calculate the max clauses for the exists check.
-
performExistsCheck
protected void performExistsCheck(java.util.Set<java.lang.String> presentMap, java.lang.Long jobID, java.lang.String linkType, java.lang.String childIDHash, java.util.List<java.lang.String> list) throws ManifoldCFException
Do the exists check, in batch.
- Throws:
ManifoldCFException
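The pairing of a max-clause method with a batch method that takes a list suggests a chunked processing pattern: accumulate hashes until the clause limit is reached, run one batch, then start over. The sketch below shows that general pattern only; `BatchSketch` and `inBatches` are hypothetical names, and how the real class computes its limit and builds its queries is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchSketch {
    // Feed ids to the handler in chunks of at most maxClause entries,
    // preserving the original order.
    static void inBatches(String[] ids, int maxClause, Consumer<List<String>> handler) {
        if (maxClause < 1) {
            throw new IllegalArgumentException("maxClause must be at least 1");
        }
        List<String> batch = new ArrayList<>();
        for (String id : ids) {
            if (batch.size() == maxClause) {
                handler.accept(batch);      // flush a full batch
                batch = new ArrayList<>();  // start a fresh one
            }
            batch.add(id);
        }
        if (!batch.isEmpty()) {
            handler.accept(batch);          // flush the remainder
        }
    }
}
```

Keeping each batch within the clause limit bounds the size of any single query, at the cost of more round trips for long input arrays.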
-
removeDocumentLinks
public void removeDocumentLinks(java.lang.Long jobID, java.lang.String joinTableName, java.lang.String joinTableIDColumn, java.lang.String joinTableJobColumn, java.lang.String joinTableCriteria, java.util.ArrayList joinTableParams) throws ManifoldCFException
Remove all links that mention a specific set of documents, as described by a join.
- Throws:
ManifoldCFException
-
removeDocumentLinks
public void removeDocumentLinks(java.lang.Long jobID, java.lang.String[] documentIDHashes) throws ManifoldCFException
Remove all links that mention a specific set of documents.
- Throws:
ManifoldCFException
-
maxClausePerformRemoveDocumentLinks
protected int maxClausePerformRemoveDocumentLinks(java.lang.Long jobID)
-
performRemoveDocumentLinks
protected void performRemoveDocumentLinks(java.util.List<java.lang.String> list, java.lang.Long jobID) throws ManifoldCFException
- Throws:
ManifoldCFException
-
removeLinks
public void removeLinks(java.lang.Long jobID, java.lang.String commonNewExpression, java.util.ArrayList commonNewParams, java.lang.String[] sourceDocumentIDHashes) throws ManifoldCFException
Remove all target links of the specified source documents that are not marked as "new" or "existing", and return the others to their base state.
- Throws:
ManifoldCFException
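The effect described above, for the links of one source document, can be sketched with a map from target hash to status. This is an in-memory illustration only: `RemoveLinksSketch` is a hypothetical class, the status codes are assumed, and the `commonNewExpression` filtering that the real method accepts is left out.

```java
import java.util.Iterator;
import java.util.Map;

class RemoveLinksSketch {
    // linksForSource maps target hash -> status ("B", "N" or "E"; assumed codes).
    static void finishScan(Map<String, String> linksForSource) {
        Iterator<Map.Entry<String, String>> it = linksForSource.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            if ("B".equals(entry.getValue())) {
                // Still in base state: the link was not found during this scan.
                it.remove();
            } else {
                // Marked "new" or "existing": keep it as the baseline for the next scan.
                entry.setValue("B");
            }
        }
    }
}
```

After the call, every surviving link is in base state, so the next scan can apply the same new/existing marking from a clean starting point.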
-
maxClausePerformRemoveLinks
protected int maxClausePerformRemoveLinks(java.lang.Long jobID)
-
performRemoveLinks
protected void performRemoveLinks(java.util.List<java.lang.String> list, java.lang.Long jobID, java.lang.String commonNewExpression, java.util.ArrayList commonNewParams) throws ManifoldCFException
- Throws:
ManifoldCFException
-
restoreLinks
public void restoreLinks(java.lang.Long jobID, java.lang.String[] sourceDocumentIDHashes) throws ManifoldCFException
Return all target links of the specified source documents to their base state.
- Throws:
ManifoldCFException
-
maxClausesPerformRestoreLinks
protected int maxClausesPerformRestoreLinks(java.lang.Long jobID)
-
performRestoreLinks
protected void performRestoreLinks(java.lang.Long jobID, java.util.List<java.lang.String> list) throws ManifoldCFException
- Throws:
ManifoldCFException
-
revertLinks
public void revertLinks(java.lang.Long jobID, java.lang.String[] sourceDocumentIDHashes) throws ManifoldCFException
Throw away links added during (aborted) processing.
- Throws:
ManifoldCFException
-
maxClausesPerformRevertLinks
protected int maxClausesPerformRevertLinks(java.lang.Long jobID)
-
performRevertLinks
protected void performRevertLinks(java.lang.Long jobID, java.util.List<java.lang.String> list) throws ManifoldCFException
- Throws:
ManifoldCFException
-
getDocumentChildren
public IResultSet getDocumentChildren(java.lang.Long jobID, java.lang.String parentIDHash) throws ManifoldCFException
Get document's children.
- Returns:
- rows that contain the children. Column names are 'linktype','childidentifier'.
- Throws:
ManifoldCFException
-
getDocumentUniqueParents
public java.lang.String[] getDocumentUniqueParents(java.lang.Long jobID, java.lang.String childIDHash) throws ManifoldCFException
Get document's parents.
- Returns:
- a set of document identifier hashes that constitute parents of the specified identifier.
- Throws:
ManifoldCFException
-
stringToStatus
public static int stringToStatus(java.lang.String status)
Convert string to link status.
-
statusToString
public static java.lang.String statusToString(int status)
Convert link status to string.
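A two-way conversion of this kind might look like the sketch below. The numeric values and single-letter codes are assumptions made for illustration (the isnew column is CHAR(1), so a one-character code is plausible), `LinkStatusCodec` is a hypothetical class, and throwing on unknown input is a choice of this sketch rather than documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

class LinkStatusCodec {
    // Assumed numeric values; the real constants are not listed on this page.
    static final int BASE = 0;
    static final int NEW = 1;
    static final int EXISTING = 2;

    // Assumed single-letter codes, one per status.
    private static final Map<String, Integer> BY_CODE = new HashMap<>();
    static {
        BY_CODE.put("B", BASE);
        BY_CODE.put("N", NEW);
        BY_CODE.put("E", EXISTING);
    }

    // Convert a stored code to its numeric status.
    static int stringToStatus(String code) {
        Integer status = BY_CODE.get(code);
        if (status == null) {
            throw new IllegalArgumentException("Unknown link status code: " + code);
        }
        return status;
    }

    // Convert a numeric status to the code that would be stored.
    static String statusToString(int status) {
        switch (status) {
            case BASE: return "B";
            case NEW: return "N";
            case EXISTING: return "E";
            default: throw new IllegalArgumentException("Unknown link status: " + status);
        }
    }
}
```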
-
-