Class IntrinsicLink
java.lang.Object
  org.apache.manifoldcf.core.database.BaseTable
    org.apache.manifoldcf.crawler.jobs.IntrinsicLink

public class IntrinsicLink extends BaseTable
This class manages the table that keeps track of intrinsic relationships between documents.
Table: intrinsiclink
Field         Type          Description
jobid         BIGINT        Reference:jobs.id
linktype      VARCHAR(255)
parentidhash  VARCHAR(40)
childidhash   VARCHAR(40)
isnew         CHAR(1)
processid     VARCHAR(16)
-
-
Nested Class Summary
Nested Classes
Modifier and Type       Class                           Description
protected static class  IntrinsicLink.DuplicateFinder
-
Field Summary
Fields
Modifier and Type               Field                Description
static java.lang.String         _rcsid
static java.lang.String         childIDHashField
static java.lang.String         jobIDField
protected static int            LINKSTATUS_BASE      The standard value for this field.
protected static int            LINKSTATUS_EXISTING  This value means that the link existed before, and has been found during this scan.
protected static int            LINKSTATUS_NEW       This value means that the link is brand-new; it did not exist before this pass.
protected static java.util.Map  linkstatusMap
static java.lang.String         linkTypeField
static java.lang.String         newField
static java.lang.String         parentIDHashField
static java.lang.String         processIDField
-
Fields inherited from class org.apache.manifoldcf.core.database.BaseTable
dbInterface, tableName
-
-
Constructor Summary
Constructors
Constructor                           Description
IntrinsicLink(IDBInterface database)  Constructor.
-
Method Summary
Modifier and Type  Method
    Description

void analyzeTables()
    Analyze job tables that need analysis.
void deinstall()
    Uninstall.
void deleteOwner(java.lang.Long jobID)
    Delete an owner (and clean up the corresponding hopcount rows).
IResultSet getDocumentChildren(java.lang.Long jobID, java.lang.String parentIDHash)
    Get document's children.
java.lang.String[] getDocumentUniqueParents(java.lang.Long jobID, java.lang.String childIDHash)
    Get document's parents.
void install(java.lang.String jobsTable, java.lang.String jobsColumn)
    Install or upgrade.
protected int maxClausePerformExistsCheck(java.lang.Long jobID, java.lang.String linkType, java.lang.String childIDHash)
    Calculate the maximum number of clauses for the exists check.
protected int maxClausePerformRemoveDocumentLinks(java.lang.Long jobID)
protected int maxClausePerformRemoveLinks(java.lang.Long jobID)
protected int maxClausesPerformRestoreLinks(java.lang.Long jobID)
protected int maxClausesPerformRevertLinks(java.lang.Long jobID)
protected void performExistsCheck(java.util.Set<java.lang.String> presentMap, java.lang.Long jobID, java.lang.String linkType, java.lang.String childIDHash, java.util.List<java.lang.String> list)
    Do the exists check, in batch.
protected void performRemoveDocumentLinks(java.util.List<java.lang.String> list, java.lang.Long jobID)
protected void performRemoveLinks(java.util.List<java.lang.String> list, java.lang.Long jobID, java.lang.String commonNewExpression, java.util.ArrayList commonNewParams)
protected void performRestoreLinks(java.lang.Long jobID, java.util.List<java.lang.String> list)
protected void performRevertLinks(java.lang.Long jobID, java.util.List<java.lang.String> list)
java.lang.String[] recordReferences(java.lang.Long jobID, java.lang.String sourceDocumentIDHash, java.lang.String[] targetDocumentIDHashes, java.lang.String linkType, java.lang.String processID)
    Record references from source to targets.
void removeDocumentLinks(java.lang.Long jobID, java.lang.String[] documentIDHashes)
    Remove all links that mention a specific set of documents.
void removeDocumentLinks(java.lang.Long jobID, java.lang.String joinTableName, java.lang.String joinTableIDColumn, java.lang.String joinTableJobColumn, java.lang.String joinTableCriteria, java.util.ArrayList joinTableParams)
    Remove all links that mention a specific set of documents, as described by a join.
void removeLinks(java.lang.Long jobID, java.lang.String commonNewExpression, java.util.ArrayList commonNewParams, java.lang.String[] sourceDocumentIDHashes)
    Remove all target links of the specified source documents that are not marked as "new" or "existing", and return the others to their base state.
void restart()
    Clean up after all process IDs.
void restart(java.lang.String processID)
    Reset, at startup time.
void restartCluster()
void restoreLinks(java.lang.Long jobID, java.lang.String[] sourceDocumentIDHashes)
    Return all target links of the specified source documents to their base state.
void revertLinks(java.lang.Long jobID, java.lang.String[] sourceDocumentIDHashes)
    Throw away links added during (aborted) processing.
static java.lang.String statusToString(int status)
    Convert link status to string.
static int stringToStatus(java.lang.String status)
    Convert string to link status.
-
Methods inherited from class org.apache.manifoldcf.core.database.BaseTable
addTableIndex, analyzeTable, beginTransaction, buildConjunctionClause, constructCountClause, constructDistinctOnClause, constructDoubleCastClause, constructOffsetLimitClause, constructRegexpClause, constructSubstringClause, endTransaction, findConjunctionClauseMax, getDatabaseCacheKey, getDBInterface, getMaxInClause, getMaxOrClause, getSleepAmt, getTableIndexes, getTableName, getTableSchema, getTransactionID, getWindowedReportMaxRows, makeTableKey, noteModifications, performAddIndex, performAlter, performCommit, performCreate, performDelete, performDrop, performInsert, performModification, performQuery, performQuery, performRemoveIndex, performUpdate, prepareRowForSave, readRow, reindexTable, signalRollback, sleepFor
-
Field Detail
-
_rcsid
public static final java.lang.String _rcsid
- See Also:
- Constant Field Values
-
LINKSTATUS_BASE
protected static final int LINKSTATUS_BASE
The standard value for this field. It means that the link existed prior to this scan and has not yet been found again during it.
- See Also:
- Constant Field Values
-
LINKSTATUS_NEW
protected static final int LINKSTATUS_NEW
This value means that the link is brand-new; it did not exist before this pass.
- See Also:
- Constant Field Values
-
LINKSTATUS_EXISTING
protected static final int LINKSTATUS_EXISTING
This value means that the link existed before, and has been found during this scan.
- See Also:
- Constant Field Values
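Taken together, the three status values describe a small per-link state machine that is driven by each scan. The following is a minimal sketch of that reading, not the class's actual code: the numeric values and the helper `onLinkFound` are assumptions made for illustration, since the real constant values are only listed under Constant Field Values.

```java
// Illustrative sketch only; constant values and the helper name are assumptions,
// not taken from the IntrinsicLink source.
class LinkStatusLifecycleSketch {
  static final int LINKSTATUS_BASE = 0;      // assumed value
  static final int LINKSTATUS_NEW = 1;       // assumed value
  static final int LINKSTATUS_EXISTING = 2;  // assumed value

  /**
   * Status a link should carry after it is found during the current scan.
   * @param current the stored status, or null if no row exists for the link yet.
   */
  static int onLinkFound(Integer current) {
    if (current == null) {
      return LINKSTATUS_NEW;        // did not exist before this pass
    }
    if (current == LINKSTATUS_BASE) {
      return LINKSTATUS_EXISTING;   // existed before, and has now been found again
    }
    return current;                 // already NEW or EXISTING: nothing changes
  }

  public static void main(String[] args) {
    System.out.println(onLinkFound(null));
    System.out.println(onLinkFound(LINKSTATUS_BASE));
  }
}
```

A link that is still in the base state when a scan finishes was not seen during that scan.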
-
jobIDField
public static final java.lang.String jobIDField
- See Also:
- Constant Field Values
-
linkTypeField
public static final java.lang.String linkTypeField
- See Also:
- Constant Field Values
-
parentIDHashField
public static final java.lang.String parentIDHashField
- See Also:
- Constant Field Values
-
childIDHashField
public static final java.lang.String childIDHashField
- See Also:
- Constant Field Values
-
newField
public static final java.lang.String newField
- See Also:
- Constant Field Values
-
processIDField
public static final java.lang.String processIDField
- See Also:
- Constant Field Values
-
linkstatusMap
protected static java.util.Map linkstatusMap
-
-
Constructor Detail
-
IntrinsicLink
public IntrinsicLink(IDBInterface database) throws ManifoldCFException
Constructor.
- Parameters:
database - is the database handle.
- Throws:
ManifoldCFException
-
-
Method Detail
-
install
public void install(java.lang.String jobsTable, java.lang.String jobsColumn) throws ManifoldCFException
Install or upgrade.
- Throws:
ManifoldCFException
-
deinstall
public void deinstall() throws ManifoldCFException
Uninstall.
- Throws:
ManifoldCFException
-
analyzeTables
public void analyzeTables() throws ManifoldCFException
Analyze job tables that need analysis.
- Throws:
ManifoldCFException
-
deleteOwner
public void deleteOwner(java.lang.Long jobID) throws ManifoldCFException
Delete an owner (and clean up the corresponding hopcount rows).
- Throws:
ManifoldCFException
-
restart
public void restart(java.lang.String processID) throws ManifoldCFException
Reset, at startup time. Since links can only be added in a transactionally safe way by processing of documents, and cached records of hopcount are updated only when requested, it is safest to simply move any "new" or "new existing" links back to base state on startup. Then, the next time that page is processed, the links will be updated properly.
- Parameters:
processID - is the process to restart.
- Throws:
ManifoldCFException
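The reset described above can be pictured with a small in-memory model. This is a sketch under stated assumptions rather than the real implementation: the `Link` class, the one-character codes "B", "N" and "E", and the clearing of the process ID are all hypothetical, and the real method works against the database table.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative in-memory model; not the ManifoldCF implementation.
class RestartSketch {
  /** One row of the link table, reduced to the two columns that matter here. */
  static class Link {
    String isNew;      // assumed codes: "B" = base, "N" = new, "E" = existing
    String processId;  // process that last touched the row, or null

    Link(String isNew, String processId) {
      this.isNew = isNew;
      this.processId = processId;
    }
  }

  /** Moves every in-flight link owned by processID back to base state; returns the count. */
  static int restart(List<Link> rows, String processID) {
    int reset = 0;
    for (Link link : rows) {
      boolean inFlight = "N".equals(link.isNew) || "E".equals(link.isNew);
      if (inFlight && processID.equals(link.processId)) {
        link.isNew = "B";
        link.processId = null; // assumption: ownership is cleared on reset
        reset++;
      }
    }
    return reset;
  }

  public static void main(String[] args) {
    List<Link> rows = new ArrayList<>();
    rows.add(new Link("N", "p1"));
    rows.add(new Link("E", "p2"));
    System.out.println(restart(rows, "p1") + " link(s) reset");
  }
}
```

Rows owned by other processes are left alone, which is what allows one process to restart without disturbing work in flight elsewhere.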
-
restart
public void restart() throws ManifoldCFException
Clean up after all process IDs.
- Throws:
ManifoldCFException
-
restartCluster
public void restartCluster() throws ManifoldCFException
- Throws:
ManifoldCFException
-
recordReferences
public java.lang.String[] recordReferences(java.lang.Long jobID, java.lang.String sourceDocumentIDHash, java.lang.String[] targetDocumentIDHashes, java.lang.String linkType, java.lang.String processID) throws ManifoldCFException
Record references from source to targets. These references will be marked as either "new" or "existing".
- Returns:
- the target document IDs that are considered "new".
- Throws:
ManifoldCFException
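One plausible reading of the contract above is sketched below with an in-memory map standing in for the table. Everything here is an assumption made for illustration: the key layout, the status codes "B", "N" and "E", and the helpers `seedBase` and `statusOf`. The job ID and process ID parameters are left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory model; not the ManifoldCF implementation.
class RecordReferencesSketch {
  static final String BASE = "B", NEW = "N", EXISTING = "E"; // assumed codes

  // Key: source|target|linkType, value: status code.
  private final Map<String, String> links = new HashMap<>();

  private static String key(String source, String target, String linkType) {
    return source + "|" + target + "|" + linkType;
  }

  /** Hypothetical helper: pre-populates a link that survived an earlier scan. */
  void seedBase(String source, String target, String linkType) {
    links.put(key(source, target, linkType), BASE);
  }

  /** Hypothetical helper: returns the stored status code, or null if no link exists. */
  String statusOf(String source, String target, String linkType) {
    return links.get(key(source, target, linkType));
  }

  /** Records links from one source to many targets; returns targets that had no prior row. */
  List<String> record(String source, String[] targets, String linkType) {
    List<String> newTargets = new ArrayList<>();
    for (String target : targets) {
      String k = key(source, target, linkType);
      String current = links.get(k);
      if (current == null) {
        links.put(k, NEW);          // first sighting: mark "new" and report it
        newTargets.add(target);
      } else if (BASE.equals(current)) {
        links.put(k, EXISTING);     // known link found again: mark "existing"
      }
      // Already NEW or EXISTING (e.g. a duplicate target): leave as is.
    }
    return newTargets;
  }

  public static void main(String[] args) {
    RecordReferencesSketch table = new RecordReferencesSketch();
    table.seedBase("a", "b", "link");
    System.out.println(table.record("a", new String[] {"b", "c"}, "link"));
  }
}
```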
-
maxClausePerformExistsCheck
protected int maxClausePerformExistsCheck(java.lang.Long jobID, java.lang.String linkType, java.lang.String childIDHash)
Calculate the maximum number of clauses for the exists check.
-
performExistsCheck
protected void performExistsCheck(java.util.Set<java.lang.String> presentMap, java.lang.Long jobID, java.lang.String linkType, java.lang.String childIDHash, java.util.List<java.lang.String> list) throws ManifoldCFException
Do the exists check, in batch.
- Throws:
ManifoldCFException
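The pairing of a `maxClause…` method with a `perform…` method suggests a batching pattern: a long list of ID hashes is split so that no single query exceeds a clause limit. The sketch below shows only the splitting step, under that assumption; `partition` is a hypothetical helper and is not part of this class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of splitting work into clause-limited batches.
class BatchSketch {
  /** Splits items into consecutive batches of at most maxClause elements each. */
  static <T> List<List<T>> partition(List<T> items, int maxClause) {
    if (maxClause <= 0) {
      throw new IllegalArgumentException("maxClause must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < items.size(); start += maxClause) {
      int end = Math.min(start + maxClause, items.size());
      batches.add(new ArrayList<>(items.subList(start, end))); // copy so each batch is independent
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> hashes = Arrays.asList("h1", "h2", "h3", "h4", "h5");
    for (List<String> batch : partition(hashes, 2)) {
      // In the real class, one batch would correspond to one perform... call.
      System.out.println(batch);
    }
  }
}
```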
-
removeDocumentLinks
public void removeDocumentLinks(java.lang.Long jobID, java.lang.String joinTableName, java.lang.String joinTableIDColumn, java.lang.String joinTableJobColumn, java.lang.String joinTableCriteria, java.util.ArrayList joinTableParams) throws ManifoldCFException
Remove all links that mention a specific set of documents, as described by a join.
- Throws:
ManifoldCFException
-
removeDocumentLinks
public void removeDocumentLinks(java.lang.Long jobID, java.lang.String[] documentIDHashes) throws ManifoldCFException
Remove all links that mention a specific set of documents.
- Throws:
ManifoldCFException
-
maxClausePerformRemoveDocumentLinks
protected int maxClausePerformRemoveDocumentLinks(java.lang.Long jobID)
-
performRemoveDocumentLinks
protected void performRemoveDocumentLinks(java.util.List<java.lang.String> list, java.lang.Long jobID) throws ManifoldCFException
- Throws:
ManifoldCFException
-
removeLinks
public void removeLinks(java.lang.Long jobID, java.lang.String commonNewExpression, java.util.ArrayList commonNewParams, java.lang.String[] sourceDocumentIDHashes) throws ManifoldCFException
Remove all target links of the specified source documents that are not marked as "new" or "existing", and return the others to their base state.
- Throws:
ManifoldCFException
-
maxClausePerformRemoveLinks
protected int maxClausePerformRemoveLinks(java.lang.Long jobID)
-
performRemoveLinks
protected void performRemoveLinks(java.util.List<java.lang.String> list, java.lang.Long jobID, java.lang.String commonNewExpression, java.util.ArrayList commonNewParams) throws ManifoldCFException
- Throws:
ManifoldCFException
-
restoreLinks
public void restoreLinks(java.lang.Long jobID, java.lang.String[] sourceDocumentIDHashes) throws ManifoldCFException
Return all target links of the specified source documents to their base state.
- Throws:
ManifoldCFException
-
maxClausesPerformRestoreLinks
protected int maxClausesPerformRestoreLinks(java.lang.Long jobID)
-
performRestoreLinks
protected void performRestoreLinks(java.lang.Long jobID, java.util.List<java.lang.String> list) throws ManifoldCFException
- Throws:
ManifoldCFException
-
revertLinks
public void revertLinks(java.lang.Long jobID, java.lang.String[] sourceDocumentIDHashes) throws ManifoldCFException
Throw away links added during (aborted) processing.
- Throws:
ManifoldCFException
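The differences between removeLinks, restoreLinks and revertLinks are easiest to see side by side. The sketch below models the links of one source document as a map from target to status code. It is an illustration under assumptions, not the real implementation: the codes "B", "N" and "E" are hypothetical, and the treatment of "existing" links in `revertLinks` (returned to base) is a guess, since the description above only says that added links are thrown away.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative in-memory model; not the ManifoldCF implementation.
class LinkCleanupSketch {
  static final String BASE = "B", NEW = "N", EXISTING = "E"; // assumed codes

  /** Drops links not found in this scan (still BASE); returns the rest to BASE. */
  static void removeLinks(Map<String, String> statusByTarget) {
    Iterator<Map.Entry<String, String>> it = statusByTarget.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, String> entry = it.next();
      if (BASE.equals(entry.getValue())) {
        it.remove();
      } else {
        entry.setValue(BASE);
      }
    }
  }

  /** Keeps every link and returns all of them to BASE. */
  static void restoreLinks(Map<String, String> statusByTarget) {
    for (Map.Entry<String, String> entry : statusByTarget.entrySet()) {
      entry.setValue(BASE);
    }
  }

  /** Drops links added in this pass (NEW); assumption: everything else returns to BASE. */
  static void revertLinks(Map<String, String> statusByTarget) {
    Iterator<Map.Entry<String, String>> it = statusByTarget.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, String> entry = it.next();
      if (NEW.equals(entry.getValue())) {
        it.remove();
      } else {
        entry.setValue(BASE);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, String> links = new HashMap<>();
    links.put("b", BASE);
    links.put("c", NEW);
    links.put("d", EXISTING);
    removeLinks(links);
    System.out.println(links.keySet());
  }
}
```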
-
maxClausesPerformRevertLinks
protected int maxClausesPerformRevertLinks(java.lang.Long jobID)
-
performRevertLinks
protected void performRevertLinks(java.lang.Long jobID, java.util.List<java.lang.String> list) throws ManifoldCFException
- Throws:
ManifoldCFException
-
getDocumentChildren
public IResultSet getDocumentChildren(java.lang.Long jobID, java.lang.String parentIDHash) throws ManifoldCFException
Get document's children.
- Returns:
- rows that contain the children. Column names are 'linktype', 'childidentifier'.
- Throws:
ManifoldCFException
-
getDocumentUniqueParents
public java.lang.String[] getDocumentUniqueParents(java.lang.Long jobID, java.lang.String childIDHash) throws ManifoldCFException
Get document's parents.
- Returns:
- a set of document identifier hashes that constitute parents of the specified identifier.
- Throws:
ManifoldCFException
-
stringToStatus
public static int stringToStatus(java.lang.String status)
Convert string to link status.
-
statusToString
public static java.lang.String statusToString(int status)
Convert link status to string.
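Because the isnew column is CHAR(1), these two methods presumably translate between the integer constants and a one-character code, with linkstatusMap backing the string-to-integer direction. The sketch below shows such a round trip. The codes "B", "N" and "E", the numeric values, and the error handling are assumptions; the real class may behave differently for unknown inputs.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative codec sketch; codes, values and error handling are assumptions.
class StatusCodecSketch {
  static final int LINKSTATUS_BASE = 0;      // assumed value
  static final int LINKSTATUS_NEW = 1;       // assumed value
  static final int LINKSTATUS_EXISTING = 2;  // assumed value

  static final Map<String, Integer> linkstatusMap = new HashMap<>();
  static {
    linkstatusMap.put("B", LINKSTATUS_BASE);
    linkstatusMap.put("N", LINKSTATUS_NEW);
    linkstatusMap.put("E", LINKSTATUS_EXISTING);
  }

  /** Converts a stored one-character code to its integer status. */
  static int stringToStatus(String status) {
    Integer value = linkstatusMap.get(status);
    if (value == null) {
      throw new IllegalArgumentException("Unknown link status code: " + status);
    }
    return value;
  }

  /** Converts an integer status to its stored code, or null if the status is unknown. */
  static String statusToString(int status) {
    switch (status) {
      case LINKSTATUS_BASE: return "B";
      case LINKSTATUS_NEW: return "N";
      case LINKSTATUS_EXISTING: return "E";
      default: return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(statusToString(stringToStatus("N")));
  }
}
```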
-
-