Class SeedingActivity

    • Field Detail

      • processID

        protected final java.lang.String processID
      • connectionName

        protected final java.lang.String connectionName
      • jobID

        protected final java.lang.Long jobID
      • legalLinkTypes

        protected final java.lang.String[] legalLinkTypes
      • overrideSchedule

        protected final boolean overrideSchedule
      • hopcountMethod

        protected final int hopcountMethod
      • documentHashList

        protected final java.lang.String[] documentHashList
      • documentList

        protected final java.lang.String[] documentList
      • documentPrereqList

        protected final java.lang.String[][] documentPrereqList
      • documentCount

        protected int documentCount
      • remainingDocumentHashList

        protected final java.lang.String[] remainingDocumentHashList
      • remainingDocumentCount

        protected int remainingDocumentCount
    • Method Detail

      • addSeedDocument

        public void addSeedDocument(java.lang.String documentIdentifier,
                                    java.lang.String[] prereqEventNames)
                             throws ManifoldCFException
        Record a "seed" document identifier. Seeds passed to this method are loaded into the job's queue at the beginning of the job's execution and, for continuous crawling jobs, periodically throughout the crawl. All documents passed to this method are placed on the "pending documents" list and marked as seed documents. All pending documents are then processed to determine whether they have changed or have been deleted. Putting more documents on the pending list than strictly necessary does no harm beyond the extra work required, so it is always acceptable to send more documents to this method rather than fewer.
        Specified by:
        addSeedDocument in interface ISeedingActivity
        Parameters:
        documentIdentifier - is the identifier of the document to add to the "pending" queue.
        prereqEventNames - is the list of prerequisite events required for this document, or null if none.
        Throws:
        ManifoldCFException
      • addSeedDocument

        public void addSeedDocument(java.lang.String documentIdentifier)
                             throws ManifoldCFException
        Record a "seed" document identifier. Seeds passed to this method are loaded into the job's queue at the beginning of the job's execution and, for continuous crawling jobs, periodically throughout the crawl. All documents passed to this method are placed on the "pending documents" list and marked as seed documents. All pending documents are then processed to determine whether they have changed or have been deleted. Putting more documents on the pending list than strictly necessary does no harm beyond the extra work required, so it is always acceptable to send more documents to this method rather than fewer.
        Specified by:
        addSeedDocument in interface ISeedingActivity
        Parameters:
        documentIdentifier - is the identifier of the document to add to the "pending" queue.
        Throws:
        ManifoldCFException
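
The two addSeedDocument overloads above can be illustrated with a minimal sketch. It assumes a hypothetical stand-in interface, SeedSink, that mirrors the two signatures so the example compiles without ManifoldCF on the classpath; the stand-in omits the ManifoldCFException that the real methods declare. RecordingSeedSink, SeedingSketch, and the document identifiers are likewise invented for illustration, and delegating the one-argument overload with a null prerequisite list is an assumption, not documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in mirroring the two addSeedDocument overloads.
// The real methods also declare "throws ManifoldCFException".
interface SeedSink {
  void addSeedDocument(String documentIdentifier, String[] prereqEventNames);
  void addSeedDocument(String documentIdentifier);
}

// Hypothetical in-memory sink that only remembers what was seeded.
class RecordingSeedSink implements SeedSink {
  final List<String> seeded = new ArrayList<>();

  @Override
  public void addSeedDocument(String documentIdentifier, String[] prereqEventNames) {
    // A null prerequisite list means "no prerequisite events".
    String events = (prereqEventNames == null) ? "none" : Arrays.toString(prereqEventNames);
    seeded.add(documentIdentifier + " prereqs=" + events);
  }

  @Override
  public void addSeedDocument(String documentIdentifier) {
    // Assumption: the short overload behaves like the long one with no prerequisites.
    addSeedDocument(documentIdentifier, null);
  }
}

class SeedingSketch {
  // Sending more identifiers than strictly needed is acceptable,
  // so every candidate is passed along without filtering.
  static void seedAll(SeedSink sink, List<String> candidateIds) {
    for (String id : candidateIds) {
      sink.addSeedDocument(id);
    }
  }

  public static void main(String[] args) {
    RecordingSeedSink sink = new RecordingSeedSink();
    seedAll(sink, Arrays.asList("doc-1", "doc-2"));
    sink.addSeedDocument("doc-3", new String[] {"login-event"});
    System.out.println(sink.seeded);
  }
}
```

In a real connector the sink would be the ISeedingActivity instance handed to the connector during seeding, and the exception would need to be propagated or handled.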
      • addUnqueuedSeedDocument

        public void addUnqueuedSeedDocument(java.lang.String documentIdentifier)
                                     throws ManifoldCFException
        This method receives document identifiers that should be considered part of the seeds but do not need to be queued for processing at this time. (This method is used to keep the hopcount tables up to date.) It may receive more identifiers than strictly necessary, including identifiers that have also been sent to the addSeedDocument() method above. However, the connector must constrain the identifiers it sends according to the document specification. This method needs to be called only if the connector supports hopcount determination (which it should signal by returning more than zero legal relationship types from the getRelationshipTypes() method).
        Specified by:
        addUnqueuedSeedDocument in interface ISeedingActivity
        Parameters:
        documentIdentifier - is the identifier of the document to consider as a seed, but not to put in the "pending" queue.
        Throws:
        ManifoldCFException
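
The division of labor between addSeedDocument and addUnqueuedSeedDocument can be sketched as follows. This is only an illustration under stated assumptions: HopcountSeedSink is a hypothetical stand-in for the two methods (without the ManifoldCFException the real ones declare), and the needsProcessing map and supportsHopcount flag are invented inputs representing what a connector might already know about its seeds.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the two seeding methods discussed above.
interface HopcountSeedSink {
  void addSeedDocument(String documentIdentifier);
  void addUnqueuedSeedDocument(String documentIdentifier);
}

// Hypothetical sink that records which list each identifier went to.
class RecordingHopcountSink implements HopcountSeedSink {
  final List<String> queued = new ArrayList<>();
  final List<String> unqueued = new ArrayList<>();

  @Override
  public void addSeedDocument(String documentIdentifier) {
    queued.add(documentIdentifier);
  }

  @Override
  public void addUnqueuedSeedDocument(String documentIdentifier) {
    unqueued.add(documentIdentifier);
  }
}

class HopcountSeedingSketch {
  // needsProcessing maps each seed identifier to whether it should be queued now.
  static void seed(HopcountSeedSink sink,
                   Map<String, Boolean> needsProcessing,
                   boolean supportsHopcount) {
    for (Map.Entry<String, Boolean> entry : needsProcessing.entrySet()) {
      if (Boolean.TRUE.equals(entry.getValue())) {
        sink.addSeedDocument(entry.getKey());
      } else if (supportsHopcount) {
        // Still a seed, but no processing is needed right now;
        // reported only so hopcount bookkeeping stays current.
        sink.addUnqueuedSeedDocument(entry.getKey());
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Boolean> seeds = new LinkedHashMap<>();
    seeds.put("root-changed", true);
    seeds.put("root-unchanged", false);
    RecordingHopcountSink sink = new RecordingHopcountSink();
    seed(sink, seeds, true);
    System.out.println("queued=" + sink.queued + " unqueued=" + sink.unqueued);
  }
}
```

When the connector reports no relationship types, the sketch skips the unqueued call entirely, matching the statement that the method is required only for connectors that support hopcount determination.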
      • recordActivity

        public void recordActivity(java.lang.Long startTime,
                                   java.lang.String activityType,
                                   java.lang.Long dataSize,
                                   java.lang.String entityIdentifier,
                                   java.lang.String resultCode,
                                   java.lang.String resultDescription,
                                   java.lang.String[] childIdentifiers)
                            throws ManifoldCFException
        Record time-stamped information about the activity of the connector.
        Specified by:
        recordActivity in interface IHistoryActivity
        Parameters:
        startTime - is either null or the activity's start time, in milliseconds since the start of the epoch (Jan 1, 1970). Every activity has an associated time; the startTime field records when the activity began. A null value indicates that the start time and the finishing time are the same.
        activityType - is a string that categorizes the kind of activity being recorded. It is fully interpretable only in the context of the connector involved. For example, a web connector might record a "fetch document" activity. Cannot be null.
        dataSize - is the number of bytes of data involved in the activity, or null if not applicable.
        entityIdentifier - is a (possibly long) string which identifies the object involved in the history record. The interpretation of this field will differ from connector to connector. May be null.
        resultCode - contains a terse description of the result of the activity. The description is limited in size to 255 characters, and can be interpreted only in the context of the current connector. May be null.
        resultDescription - is a (possibly long) human-readable string which adds detail, if required, to the result described in the resultCode field. This field is not meant to be queried on. May be null.
        childIdentifiers - is a set of child entity identifiers associated with this activity. May be null.
        Throws:
        ManifoldCFException
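
A call to recordActivity might be assembled as in the sketch below. HistorySink is a hypothetical stand-in with the same parameter list as the method above (minus the declared ManifoldCFException), and clampResultCode and recordFetch are invented helpers. The "fetch document" activity type comes from the parameter description; the URL and result code are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in with the same parameter list as recordActivity.
// The real method also declares "throws ManifoldCFException".
interface HistorySink {
  void recordActivity(Long startTime, String activityType, Long dataSize,
                      String entityIdentifier, String resultCode,
                      String resultDescription, String[] childIdentifiers);
}

class HistorySketch {
  // Size limit stated in the resultCode parameter description.
  static final int MAX_RESULT_CODE_LENGTH = 255;

  // Trim a result code to the documented limit; null passes through unchanged.
  static String clampResultCode(String resultCode) {
    if (resultCode == null || resultCode.length() <= MAX_RESULT_CODE_LENGTH) {
      return resultCode;
    }
    return resultCode.substring(0, MAX_RESULT_CODE_LENGTH);
  }

  // Record a fetch that began at startMillis (milliseconds since the epoch).
  static void recordFetch(HistorySink sink, long startMillis, String url,
                          byte[] body, String resultCode) {
    sink.recordActivity(
        Long.valueOf(startMillis),        // startTime
        "fetch document",                 // activityType, must not be null
        Long.valueOf((long) body.length), // dataSize in bytes
        url,                              // entityIdentifier
        clampResultCode(resultCode),      // resultCode, at most 255 characters
        null,                             // resultDescription not needed here
        null);                            // no child identifiers
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    HistorySink sink = (start, type, size, entity, code, description, children) ->
        log.add(type + " " + entity + " " + size + " bytes -> " + code);
    recordFetch(sink, System.currentTimeMillis(), "http://example.com/a", new byte[42], "OK");
    System.out.println(log);
  }
}
```

Clamping the result code on the caller's side is a defensive choice made for this sketch; the source states the limit but does not say what happens when it is exceeded.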
      • writeSeedDocuments

        protected void writeSeedDocuments(java.lang.String[] docIDHashes,
                                          java.lang.String[] docIDs,
                                          java.lang.String[][] prereqEventNames)
                                   throws ManifoldCFException
        Write the specified documents after calculating their priorities.
        Throws:
        ManifoldCFException
      • checkJobStillActive

        public void checkJobStillActive()
                                 throws ManifoldCFException,
                                        ServiceInterruption
        Check whether the current job is still active. This method allows an individual connector that needs to wait on some long-term condition to give up waiting when the job itself is aborted. If the connector should abort, this method raises a properly formed ServiceInterruption which, if thrown to the caller, signals that the current seeding activity remains incomplete and must be retried when the job is resumed.
        Specified by:
        checkJobStillActive in interface IAbortActivity
        Throws:
        ManifoldCFException
        ServiceInterruption
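
The waiting pattern described above can be sketched as a polling loop that checks the job on every pass and lets the exception propagate. AbortCheck and JobAbortedException are hypothetical stand-ins for IAbortActivity and ServiceInterruption so the example compiles on its own; waitUntilReady and the poll interval are invented for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for ServiceInterruption.
class JobAbortedException extends Exception {
  JobAbortedException(String message) {
    super(message);
  }
}

// Hypothetical stand-in for the checkJobStillActive method above.
interface AbortCheck {
  void checkJobStillActive() throws JobAbortedException;
}

class WaitSketch {
  // Poll until ready reports true. Returns the number of polls that found
  // the condition not yet satisfied.
  static int waitUntilReady(BooleanSupplier ready, AbortCheck activity, long pollMillis)
      throws JobAbortedException, InterruptedException {
    int polls = 0;
    while (!ready.getAsBoolean()) {
      // Deliberately not caught: the exception must reach the caller so the
      // seeding activity is treated as incomplete and retried later.
      activity.checkJobStillActive();
      Thread.sleep(pollMillis);
      polls++;
    }
    return polls;
  }

  public static void main(String[] args) throws Exception {
    int[] remaining = {3};
    // The condition becomes true after three unsuccessful polls; the job never aborts.
    int polls = waitUntilReady(() -> remaining[0]-- <= 0, () -> { }, 1L);
    System.out.println("polls before ready: " + polls);
  }
}
```

The important design point is that the loop neither swallows nor wraps the exception, since the text above says it must be thrown to the caller to have its effect.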
      • createGlobalString

        public java.lang.String createGlobalString(java.lang.String simpleString)
        Create a global string from a simple string.
        Specified by:
        createGlobalString in interface INamingActivity
        Parameters:
        simpleString - is the simple string.
        Returns:
        a global string.
      • createConnectionSpecificString

        public java.lang.String createConnectionSpecificString(java.lang.String simpleString)
        Create a connection-specific string from a simple string.
        Specified by:
        createConnectionSpecificString in interface INamingActivity
        Parameters:
        simpleString - is the simple string.
        Returns:
        a connection-specific string.
      • createJobSpecificString

        public java.lang.String createJobSpecificString(java.lang.String simpleString)
        Create a job-based string from a simple string.
        Specified by:
        createJobSpecificString in interface INamingActivity
        Parameters:
        simpleString - is the simple string.
        Returns:
        a job-specific string.
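
The three naming methods differ only in scope, which the following sketch tries to make concrete. NamingSketch is a hypothetical stand-in, and the prefix format it produces is invented purely for illustration; this page does not specify how the real strings are formed. The sketch assumes the scoping is driven by the connectionName and jobID fields listed under Field Detail.

```java
// Hypothetical stand-in for the three INamingActivity methods.
// The string formats below are invented; the real ones are not documented here.
class NamingSketch {
  private final String connectionName;
  private final Long jobID;

  NamingSketch(String connectionName, Long jobID) {
    this.connectionName = connectionName;
    this.jobID = jobID;
  }

  // Same result for every connection and job.
  String createGlobalString(String simpleString) {
    return "global:" + simpleString;
  }

  // Same result for all jobs that share this connection.
  String createConnectionSpecificString(String simpleString) {
    return "connection:" + connectionName + ":" + simpleString;
  }

  // Unique to this job.
  String createJobSpecificString(String simpleString) {
    return "job:" + jobID + ":" + simpleString;
  }

  public static void main(String[] args) {
    NamingSketch jobA = new NamingSketch("web", Long.valueOf(1L));
    NamingSketch jobB = new NamingSketch("web", Long.valueOf(2L));
    System.out.println(jobA.createGlobalString("lock"));
    System.out.println(jobA.createConnectionSpecificString("lock"));
    System.out.println(jobA.createJobSpecificString("lock"));
    System.out.println(jobB.createJobSpecificString("lock"));
  }
}
```

Two jobs on the same connection would share the global and connection-specific names but receive different job-specific names, which is the property a caller would rely on when choosing among the three methods.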