Class HopCount


  • public class HopCount
    extends BaseTable
    This class manages the table that keeps track of hop count, and algorithmically determines this value for a document identifier upon request.

    hopcount
    Field          Type           Description
    id             BIGINT         Primary Key
    jobid          BIGINT         Reference:jobs.id
    linktype       VARCHAR(255)
    parentidhash   VARCHAR(40)
    distance       BIGINT
    deathmark      CHAR(1)


    • Field Detail

      • parentIDHashField

        public static final java.lang.String parentIDHashField
        See Also:
        Constant Field Values
      • markForDeathField

        public static final java.lang.String markForDeathField
        See Also:
        Constant Field Values
      • markMap

        protected static java.util.Map markMap
      • intrinsicLinkManager

        protected IntrinsicLink intrinsicLinkManager
        Intrinsic link table manager.
      • deleteDepsManager

        protected HopDeleteDeps deleteDepsManager
        Hop "delete" dependencies manager
      • threadContext

        protected IThreadContext threadContext
        Thread context
      • lockManager

        protected final ILockManager lockManager
        Lock manager
      • storeHopCount

        protected static java.lang.Boolean storeHopCount
        If the global cluster property "storehopcount" is set to false (it defaults to true), support for hopcount handling is disabled completely: the hopcount will never be recorded in the "intrinsiclink" or "hopcount" tables for any job.
    • Method Detail

      • stringToMark

        public static int stringToMark​(java.lang.String value)
                                throws ManifoldCFException
        Go from string to mark.
        Parameters:
        value - is the string.
        Returns:
        the status value.
        Throws:
        ManifoldCFException
      • markToString

        public static java.lang.String markToString​(int mark)
                                             throws ManifoldCFException
        Go from mark to string.
        Parameters:
        mark - is the mark.
        Returns:
        the string.
        Throws:
        ManifoldCFException
      • recordSeedReferences

        public void recordSeedReferences​(java.lang.Long jobID,
                                         java.lang.String[] legalLinkTypes,
                                         java.lang.String[] targetDocumentIDHashes,
                                         int hopcountMethod,
                                         java.lang.String processID)
                                  throws ManifoldCFException
        Record references from a set of documents to the root. These will be marked as "new" or "existing", and will have a null linktype.
        Throws:
        ManifoldCFException
      • finishSeedReferences

        public void finishSeedReferences​(java.lang.Long jobID,
                                         java.lang.String[] legalLinkTypes,
                                         int hopcountMethod)
                                  throws ManifoldCFException
        Finish seed references. Seed references are special in that the only source is the root.
        Throws:
        ManifoldCFException
      • recordReference

        public boolean recordReference​(java.lang.Long jobID,
                                       java.lang.String[] legalLinkTypes,
                                       java.lang.String sourceDocumentIDHash,
                                       java.lang.String targetDocumentIDHash,
                                       java.lang.String linkType,
                                       int hopcountMethod,
                                       java.lang.String processID)
                                throws ManifoldCFException
        Record a reference from source to target. This reference will be marked as "new" or "existing".
        Throws:
        ManifoldCFException
      • recordReferences

        public boolean[] recordReferences​(java.lang.Long jobID,
                                          java.lang.String[] legalLinkTypes,
                                          java.lang.String sourceDocumentIDHash,
                                          java.lang.String[] targetDocumentIDHashes,
                                          java.lang.String linkType,
                                          int hopcountMethod,
                                          java.lang.String processID)
                                   throws ManifoldCFException
        Record a set of references from source to target. Each of these references will be marked as "new" or "existing".
        Throws:
        ManifoldCFException
      • finishParents

        public void finishParents​(java.lang.Long jobID,
                                  java.lang.String[] legalLinkTypes,
                                  java.lang.String[] sourceDocumentHashes,
                                  int hopcountMethod)
                           throws ManifoldCFException
        Complete a recalculation pass for a set of source documents. All child links that are not marked as "new" or "existing" will be removed. At the completion of this pass, the links will have their "new" flag cleared.
        Throws:
        ManifoldCFException
      • revertParents

        public void revertParents​(java.lang.Long jobID,
                                  java.lang.String[] sourceDocumentHashes)
                           throws ManifoldCFException
        Revert newly-added links, because of a possibly incomplete document processing phase. All child links marked as "new" will be removed, and all links marked as "existing" will be reset to be "base".
        Throws:
        ManifoldCFException
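        The "base", "new", and "existing" marks described for recordReference, finishParents, and revertParents can be pictured with a small in-memory model. The sketch below is only an illustration under stated assumptions: the class LinkMarkSketch, its Mark enum, and its methods are hypothetical names, and the real class works against database tables rather than a map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of link marks; not the ManifoldCF implementation. */
final class LinkMarkSketch {
    enum Mark { BASE, NEW, EXISTING }

    // Key: target document hash; value: the current mark of the link to it.
    private final Map<String, Mark> links = new HashMap<>();

    /** Recording marks an unknown link "new" and a "base" link "existing". */
    void record(String targetHash) {
        Mark current = links.get(targetHash);
        if (current == null) {
            links.put(targetHash, Mark.NEW);
        } else if (current == Mark.BASE) {
            links.put(targetHash, Mark.EXISTING);
        }
    }

    /** Finish: remove links still marked "base", then clear the remaining marks. */
    void finish() {
        links.values().removeIf(mark -> mark == Mark.BASE);
        links.replaceAll((target, mark) -> Mark.BASE);
    }

    /** Revert: remove links marked "new" and reset "existing" links to "base". */
    void revert() {
        links.values().removeIf(mark -> mark == Mark.NEW);
        links.replaceAll((target, mark) -> Mark.BASE);
    }

    Mark markOf(String targetHash) {
        return links.get(targetHash);
    }

    int size() {
        return links.size();
    }
}
```

        In this model a link that is not re-recorded during a pass stays "base" and is dropped by finish(), while revert() restores the state that existed before the pass began.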
      • doRecord

        protected boolean[] doRecord​(java.lang.Long jobID,
                                     java.lang.String[] legalLinkTypes,
                                     java.lang.String sourceDocumentIDHash,
                                     java.lang.String[] targetDocumentIDHashes,
                                     java.lang.String linkType,
                                     int hopcountMethod,
                                     java.lang.String processID)
                              throws ManifoldCFException
        Do the work of recording source-target references.
        Throws:
        ManifoldCFException
      • deleteMatchingDocuments

        public void deleteMatchingDocuments​(java.lang.Long jobID,
                                            java.lang.String[] legalLinkTypes,
                                            java.lang.String joinTableName,
                                            java.lang.String joinTableIDColumn,
                                            java.lang.String joinTableJobColumn,
                                            java.lang.String joinTableCriteria,
                                            java.util.ArrayList joinTableParams,
                                            int hopcountMethod)
                                     throws ManifoldCFException
        Remove a set of document identifiers specified by join criteria. This will remove hopcount rows and also intrinsic links that have the specified document identifiers as sources.
        Throws:
        ManifoldCFException
      • deleteDocumentIdentifiers

        public void deleteDocumentIdentifiers​(java.lang.Long jobID,
                                              java.lang.String[] legalLinkTypes,
                                              java.lang.String[] documentHashes,
                                              int hopcountMethod)
                                       throws ManifoldCFException
        Remove a set of document identifier hashes. This will also remove the intrinsic links that have these document identifier hashes as sources, as well as invalidating cached hop counts that depend on them.
        Throws:
        ManifoldCFException
      • findHopCounts

        public int[] findHopCounts​(java.lang.Long jobID,
                                   java.lang.String[] parentIdentifierHashes,
                                   java.lang.String linkType)
                            throws ManifoldCFException
        Calculate hop counts for a set of document identifier hashes. The values returned are guaranteed only to be upper bounds, unless the queue has recently been processed (via processQueue, below). A value of -1 is returned to indicate "infinity".
        Throws:
        ManifoldCFException
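        Because -1 stands for "infinity", a caller comparing a returned value against a hop limit has to test for the sentinel first; a plain numeric comparison would treat an unreachable document as the closest one. A minimal sketch, assuming a hypothetical helper class HopCountLimitSketch that is not part of this API:

```java
/** Hypothetical helper for interpreting hop-count values; not part of ManifoldCF. */
final class HopCountLimitSketch {
    /** Sentinel documented for findHopCounts: -1 means "infinity". */
    static final int INFINITY = -1;

    /** True only if the distance is finite and does not exceed maxHops. */
    static boolean withinLimit(int hopCount, int maxHops) {
        return hopCount != INFINITY && hopCount <= maxHops;
    }

    public static void main(String[] args) {
        int[] hopCounts = {0, 3, -1, 7};
        for (int hopCount : hopCounts) {
            System.out.println(hopCount + " within 5 hops: " + withinLimit(hopCount, 5));
        }
    }
}
```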
      • maxClauseProcessFind

        protected int maxClauseProcessFind​(java.lang.Long jobID,
                                           java.lang.String linkType)
        Find max clause count.
      • processFind

        protected void processFind​(int[] rval,
                                   java.util.Map rvalMap,
                                   java.lang.Long jobID,
                                   java.lang.String linkType,
                                   java.util.ArrayList list)
                            throws ManifoldCFException
        Process a portion of a find request for hopcount information.
        Throws:
        ManifoldCFException
      • processQueue

        public boolean processQueue​(java.lang.Long jobID,
                                    java.lang.String[] legalLinkTypes,
                                    int hopcountMethod)
                             throws ManifoldCFException
        Process a stage of the propagation queue for a job.
        Parameters:
        jobID - is the job we need to have the hopcount propagated for.
        Returns:
        true if the queue is empty.
        Throws:
        ManifoldCFException
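        Since each call handles one stage and the return value reports whether the queue is empty, one plausible calling pattern is to repeat the call until it returns true. The sketch below shows only that loop shape. QueueDrainSketch and its BooleanSupplier parameter are assumptions standing in for a real call to processQueue, which would also need to handle ManifoldCFException.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical driver loop; the supplier stands in for one processQueue call. */
final class QueueDrainSketch {
    /**
     * Runs stages until one reports that the queue is empty.
     * Returns the number of stages that were run.
     */
    static int drain(BooleanSupplier processStage) {
        int stages = 0;
        boolean empty;
        do {
            empty = processStage.getAsBoolean();
            stages++;
        } while (!empty);
        return stages;
    }
}
```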
      • maxClausePerformFindMissingRecords

        protected int maxClausePerformFindMissingRecords​(java.lang.Long jobID,
                                                         java.lang.String[] affectedLinkTypes)
        Calculate the max clauses.
      • performFindMissingRecords

        protected void performFindMissingRecords​(java.lang.Long jobID,
                                                 java.lang.String[] affectedLinkTypes,
                                                 java.util.ArrayList list,
                                                 java.util.Map<HopCount.Question,​java.lang.Long> matchMap)
                                          throws ManifoldCFException
        Limited find for missing records.
        Throws:
        ManifoldCFException
      • addToProcessingQueue

        protected boolean[] addToProcessingQueue​(java.lang.Long jobID,
                                                 java.lang.String[] affectedLinkTypes,
                                                 java.lang.String[] documentIDHashes,
                                                 HopCount.Answer[] startingAnswers,
                                                 java.lang.String sourceDocumentIDHash,
                                                 java.lang.String linkType,
                                                 int hopcountMethod)
                                          throws ManifoldCFException
        Add documents to the processing queue. For the supplied link types and document IDs, the corresponding hopcount records will be marked as being queued. If, for example, the affected link types are 'link' and 'redirect', and the specified document IDs are 'A', 'B', and 'C', then six hopcount rows will be created and/or queued. The initial distance and delete dependencies for each of these rows are calculated by starting with the passed-in hopcount values and dependencies for each of the affectedLinkTypes for the specified "source" document. The resulting estimates are then generated by passing these values and dependencies over the links to the target document identifiers, presuming that each link is of the supplied link type.
        Parameters:
        jobID - is the job the documents belong to.
        affectedLinkTypes - are the set of affected link types.
        documentIDHashes - are the documents to add.
        startingAnswers - are the hopcounts and delete dependencies for the source document as they are currently known. The size of this array is the same as the size of the affectedLinkTypes array.
        sourceDocumentIDHash - is the source document identifier for the links from source to target documents.
        linkType - is the link type for this queue addition.
        hopcountMethod - is the desired method of managing hopcounts.
        Returns:
        a boolean array, corresponding to documentIDHashes, that flags the documents whose distances may have changed.
        Throws:
        ManifoldCFException
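        The "six hopcount rows" in the description above are the cross product of the affected link types and the document IDs. The sketch below enumerates that cross product only. QueueRowsSketch and the "linktype:document" row labels are hypothetical, and the distance and dependency estimates are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical enumeration of (link type, document) pairs; not the real queueing code. */
final class QueueRowsSketch {
    /** Returns one label per (affected link type, document) combination. */
    static List<String> rows(String[] affectedLinkTypes, String[] documentIDHashes) {
        List<String> result = new ArrayList<>();
        for (String documentIDHash : documentIDHashes) {
            for (String linkType : affectedLinkTypes) {
                result.add(linkType + ":" + documentIDHash);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] linkTypes = {"link", "redirect"};
        String[] documents = {"A", "B", "C"};
        // Two link types times three documents gives six rows.
        System.out.println(rows(linkTypes, documents));
    }
}
```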
      • performMarkAddDeps

        protected void performMarkAddDeps​(java.lang.String query,
                                          java.util.ArrayList list)
                                   throws ManifoldCFException
        Do the work of marking add-dep-dependent links in the hopcount table.
        Throws:
        ManifoldCFException
      • doFinish

        protected void doFinish​(java.lang.Long jobID,
                                java.lang.String[] legalLinkTypes,
                                java.lang.String[] sourceDocumentHashes,
                                int hopcountMethod)
                         throws ManifoldCFException
        Method that does the work of "finishing" a set of child references. The API for hopcount involves doing the following for every document that is recrawled or reassessed, INCLUDING the seeds (in which case the document hash is the empty string):
        (1) Record all target references of the source documents, which either adds intrinsic links or moves them to the "existing" state.
        (2) When done adding, call this method, which should (depending on hopcount mode) mark hopcount records in need of reassessment, and delete the intrinsic links that have the right source document and were not marked as "new" or "existing", but rather just "base".
        Throws:
        ManifoldCFException
      • doDeleteDocuments

        protected void doDeleteDocuments​(java.lang.Long jobID,
                                         java.lang.String joinTableName,
                                         java.lang.String joinTableIDColumn,
                                         java.lang.String joinTableJobColumn,
                                         java.lang.String joinTableCriteria,
                                         java.util.ArrayList joinTableParams)
                                  throws ManifoldCFException
        Invalidate links that start with a specific set of documents, described by a table join.
        Throws:
        ManifoldCFException
      • doDeleteDocuments

        protected void doDeleteDocuments​(java.lang.Long jobID,
                                         java.lang.String[] documentHashes)
                                  throws ManifoldCFException
        Invalidate links that start with a specific set of documents.
        Throws:
        ManifoldCFException
      • maxClauseMarkForDocumentDelete

        protected int maxClauseMarkForDocumentDelete​(java.lang.Long jobID)
      • doDeleteInvalidation

        protected void doDeleteInvalidation​(java.lang.Long jobID,
                                            java.lang.String[] sourceDocumentHashes)
                                     throws ManifoldCFException
        Invalidate targets of links which have a given set of source documents. This also removes intrinsic links that were not re-added that point to children of the source documents. The purpose of that queue is to re-establish non-infinite values for all nodes that are described in IntrinsicLinks, that are still connected to the root.
        Throws:
        ManifoldCFException
      • maxClauseMarkForDelete

        protected int maxClauseMarkForDelete​(java.lang.Long jobID)
      • markForDelete

        protected void markForDelete​(java.lang.Long jobID,
                                     java.util.ArrayList list,
                                     java.lang.String commonNewExpression,
                                     java.util.ArrayList commonNewList)
                              throws ManifoldCFException
        Throws:
        ManifoldCFException
      • getDocumentChildren

        protected IResultSet getDocumentChildren​(java.lang.Long jobID,
                                                 java.lang.String documentIDHash)
                                          throws ManifoldCFException
        Get document's children.
        Returns:
        rows that contain the children. Column names are 'linktype','childidentifier'.
        Throws:
        ManifoldCFException
      • readCachedNodes

        protected HopCount.DocumentNode[] readCachedNodes​(java.lang.Long jobID,
                                                          HopCount.Question[] unansweredQuestions)
                                                   throws ManifoldCFException
        Find the cached distance from a set of identifiers to the root. This is tricky, because if a queue assessment is in progress, some values are not valid. In general, a missing record is treated as meaning "infinity". But if the record is simply invalidated at the moment, it should be treated as "missing" instead. The record is therefore picked up even if it is marked, and the mark is then examined to decide how to treat it.
        Returns:
        the corresponding list of nodes, taking into account unknown distances.
        Throws:
        ManifoldCFException
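        The distinction drawn above between an absent record and an invalidated one can be sketched as a three-way classification. This is an illustration under assumptions: CachedDistanceSketch, its Row type, and the boolean "invalidated" flag are hypothetical simplifications of the mark stored with each hopcount row.

```java
import java.util.Map;

/** Hypothetical three-way reading of a cached distance; not the real lookup code. */
final class CachedDistanceSketch {
    enum State { KNOWN, INFINITE, UNKNOWN }

    /** Hypothetical cached row: a distance plus a flag meaning "currently invalidated". */
    static final class Row {
        final int distance;
        final boolean invalidated;

        Row(int distance, boolean invalidated) {
            this.distance = distance;
            this.invalidated = invalidated;
        }
    }

    /** No row means "infinity"; an invalidated row means the distance is not known yet. */
    static State classify(Map<String, Row> cache, String documentIDHash) {
        Row row = cache.get(documentIDHash);
        if (row == null) {
            return State.INFINITE;
        }
        return row.invalidated ? State.UNKNOWN : State.KNOWN;
    }
}
```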
      • maxClausePerformGetCachedDistanceDeps

        protected int maxClausePerformGetCachedDistanceDeps()
      • maxClausePerformGetCachedDistances

        protected int maxClausePerformGetCachedDistances​(java.lang.Long jobID)
        Calculate the max clauses.