Class FileConnector
- java.lang.Object
-
- org.apache.manifoldcf.core.connector.BaseConnector
-
- org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
-
- org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector
-
- All Implemented Interfaces:
org.apache.manifoldcf.core.interfaces.IConnector, org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
public class FileConnector extends org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
This is the "repository connector" for a file system. It is a relative of the share crawler and should have comparable basic functionality, except that it cannot use Active Directory or look at other shares.
-
-
Field Summary
Fields
Modifier and Type | Field
static java.lang.String | _rcsid
protected static java.lang.String[] | activitiesList
protected static java.lang.String | ACTIVITY_READ
protected static java.lang.String | RELATIONSHIP_CHILD
-
Fields inherited from class org.apache.manifoldcf.core.connector.BaseConnector
currentContext, params
-
Fields inherited from interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
GLOBAL_DENY_TOKEN, JOBMODE_CONTINUOUS, JOBMODE_ONCEONLY, MODEL_ADD, MODEL_ADD_CHANGE, MODEL_ADD_CHANGE_DELETE, MODEL_ALL, MODEL_CHAINED_ADD, MODEL_CHAINED_ADD_CHANGE, MODEL_CHAINED_ADD_CHANGE_DELETE, MODEL_PARTIAL
-
-
Constructor Summary
Constructors
Constructor | Description
FileConnector() | Constructor.
-
Method Summary
Modifier and Type | Method | Description
java.lang.String | addSeedDocuments(org.apache.manifoldcf.crawler.interfaces.ISeedingActivity activities, org.apache.manifoldcf.core.interfaces.Specification spec, java.lang.String lastSeedVersion, long seedTime, int jobMode) | Queue "seed" documents.
protected static boolean | checkInclude(java.io.File file, java.lang.String fileName, org.apache.manifoldcf.core.interfaces.Specification documentSpecification) | Check if a file or directory should be included, given a document specification.
protected static boolean | checkIngest(java.io.File file, org.apache.manifoldcf.core.interfaces.Specification documentSpecification) | Check if a file should be ingested, given a document specification.
protected static boolean | checkMatch(java.lang.String sourceMatch, int sourceIndex, java.lang.String match) | Check a match between two strings with wildcards.
protected static java.lang.String | convertToURI(java.lang.String documentIdentifier) | Convert a document identifier to a URI.
protected static java.lang.String | convertToWGETURI(java.lang.String path) | Convert a document identifier to a URI.
protected static java.lang.String | findConvertPath(org.apache.manifoldcf.core.interfaces.Specification spec, java.io.File theFile) | This method finds the part of the path that should be converted to a URI.
java.lang.String[] | getActivitiesList() | List the activities we might report on.
java.lang.String[] | getBinNames(java.lang.String documentIdentifier) | For any given document, list the bins that it is a member of.
int | getConnectorModel() | Tell the world what model this connector uses for getDocumentIdentifiers().
java.lang.String[] | getRelationshipTypes() | Return the list of relationship types that this connector recognizes.
protected static java.lang.String | mapExtensionToMimeType(java.lang.String fileName) | Map an extension to a mime type
protected static int | matchSubPath(java.lang.String subPath, java.lang.String fullPath) | Match a sub-path.
void | outputSpecificationBody(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber, int actualSequenceNumber, java.lang.String tabName) | Output the specification body section.
void | outputSpecificationHeader(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber, java.util.List<java.lang.String> tabsArray) | Output the specification header section.
protected static boolean | processCheck(boolean caseSensitive, java.lang.String sourceMatch, int sourceIndex, java.lang.String match, int matchIndex) | Recursive worker method for checkMatch.
void | processDocuments(java.lang.String[] documentIdentifiers, org.apache.manifoldcf.crawler.interfaces.IExistingVersions statuses, org.apache.manifoldcf.core.interfaces.Specification spec, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, int jobMode, boolean usesDefaultAuthority) | Process a set of documents.
java.lang.String | processSpecificationPost(org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber) | Process a specification post.
void | viewSpecification(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber) | View specification.
-
Methods inherited from class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
getFormCheckJavascriptMethodName, getFormPresaveCheckJavascriptMethodName, getMaxDocumentRequest, requestInfo
-
Methods inherited from class org.apache.manifoldcf.core.connector.BaseConnector
check, clearThreadContext, connect, deinstall, disconnect, getConfiguration, install, isConnected, outputConfigurationBody, outputConfigurationBody, outputConfigurationHeader, outputConfigurationHeader, outputConfigurationHeader, pack, packFixedList, packList, packList, poll, processConfigurationPost, processConfigurationPost, setThreadContext, unpack, unpackFixedList, unpackList, viewConfiguration, viewConfiguration
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.manifoldcf.core.interfaces.IConnector
check, clearThreadContext, connect, deinstall, disconnect, getConfiguration, install, isConnected, outputConfigurationBody, outputConfigurationHeader, poll, processConfigurationPost, setThreadContext, viewConfiguration
-
Field Detail
-
_rcsid
public static final java.lang.String _rcsid
- See Also:
- Constant Field Values
-
ACTIVITY_READ
protected static final java.lang.String ACTIVITY_READ
- See Also:
- Constant Field Values
-
RELATIONSHIP_CHILD
protected static final java.lang.String RELATIONSHIP_CHILD
- See Also:
- Constant Field Values
-
activitiesList
protected static final java.lang.String[] activitiesList
-
-
Method Detail
-
getConnectorModel
public int getConnectorModel()
Tell the world what model this connector uses for getDocumentIdentifiers(). This must return a model value as specified above.
- Specified by:
getConnectorModel in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
- Overrides:
getConnectorModel in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
- Returns:
- the model type value.
-
getRelationshipTypes
public java.lang.String[] getRelationshipTypes()
Return the list of relationship types that this connector recognizes.
- Specified by:
getRelationshipTypes in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
- Overrides:
getRelationshipTypes in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
- Returns:
- the list.
-
getActivitiesList
public java.lang.String[] getActivitiesList()
List the activities we might report on.
- Specified by:
getActivitiesList in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
- Overrides:
getActivitiesList in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
-
getBinNames
public java.lang.String[] getBinNames(java.lang.String documentIdentifier)
For any given document, list the bins that it is a member of.
- Specified by:
getBinNames in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
- Overrides:
getBinNames in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
-
convertToWGETURI
protected static java.lang.String convertToWGETURI(java.lang.String path) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Convert a document identifier to a URI. The URI is the URI that will be the unique key from the search index, and will be presented to the user as part of the search results.
- Parameters:
path - is the document filePath.
- Returns:
- the document uri.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
convertToURI
protected static java.lang.String convertToURI(java.lang.String documentIdentifier) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Convert a document identifier to a URI. The URI is the URI that will be the unique key from the search index, and will be presented to the user as part of the search results.
- Parameters:
documentIdentifier - is the document identifier.
- Returns:
- the document uri.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
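As a rough illustration of turning a local file path into a URI string, the sketch below relies on the JDK's java.io.File.toURI(). It is only one plausible approach and is not taken from FileConnector's implementation; the class name PathToUriSketch and the method toUriString are hypothetical.

```java
import java.io.File;

// Hypothetical helper for illustration; not part of ManifoldCF.
final class PathToUriSketch {
  // Converts a file system path to a "file:" URI string using only the JDK.
  static String toUriString(String path) {
    // File.toURI() makes the path absolute and escapes characters
    // (such as spaces) that are not legal in a URI.
    return new File(path).toURI().toASCIIString();
  }

  public static void main(String[] args) {
    // The printed prefix depends on the current working directory.
    System.out.println(toUriString("docs/annual report.txt"));
  }
}
```

Because the result is derived from the absolute path, the same file always yields the same string, which is the property a unique index key needs.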
-
addSeedDocuments
public java.lang.String addSeedDocuments(org.apache.manifoldcf.crawler.interfaces.ISeedingActivity activities, org.apache.manifoldcf.core.interfaces.Specification spec, java.lang.String lastSeedVersion, long seedTime, int jobMode) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Queue "seed" documents. Seed documents are the starting places for crawling activity. Documents are seeded when this method calls appropriate methods in the passed-in ISeedingActivity object.
This method can choose to find repository changes that happen only during the specified time interval. The seeds recorded by this method will be viewed by the framework based on what the getConnectorModel() method returns. It is not a big problem if the connector chooses to create more seeds than are strictly necessary; it is merely a question of overall work required.
The end time and seeding version string passed to this method may be interpreted for greatest efficiency. For continuous crawling jobs, this method will be called once, when the job starts, and at various periodic intervals as the job executes. When a job's specification is changed, the framework automatically resets the seeding version string to null. The seeding version string may also be set to null on each job run, depending on the connector model returned by getConnectorModel().
Note that it is always OK to send more documents rather than fewer to this method. The connector will be connected before this method can be called.
- Specified by:
addSeedDocuments in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
- Overrides:
addSeedDocuments in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
- Parameters:
activities - is the interface this method should use to perform whatever framework actions are desired.
spec - is a document specification (that comes from the job).
seedTime - is the end of the time range of documents to consider, exclusive.
lastSeedVersion - is the last seeding version string for this job, or null if the job has no previous seeding version string.
jobMode - is an integer describing how the job is being run, whether continuous or once-only.
- Returns:
- an updated seeding version string, to be stored with the job.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
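The relationship between lastSeedVersion, seedTime, and the returned version string can be sketched in isolation. The example below is a simplified model, not FileConnector's code: SeedSink is a hypothetical stand-in for ISeedingActivity, the modifiedTimes map stands in for a repository, and storing the previous end time as the seeding version string is an assumption made for illustration.

```java
import java.util.Map;

// Hypothetical, simplified model of incremental seeding; not ManifoldCF code.
final class SeedingSketch {
  // Stand-in for the framework's seeding activity object (assumption).
  interface SeedSink {
    void addSeed(String documentIdentifier);
  }

  // Seeds every document modified in [start, seedTime), where start is parsed
  // from lastSeedVersion, or is 0 when there is no previous version string.
  static String seed(SeedSink sink, Map<String, Long> modifiedTimes,
                     String lastSeedVersion, long seedTime) {
    long start = (lastSeedVersion == null) ? 0L : Long.parseLong(lastSeedVersion);
    for (Map.Entry<String, Long> entry : modifiedTimes.entrySet()) {
      long modified = entry.getValue();
      // seedTime is exclusive, matching the parameter description above.
      if (modified >= start && modified < seedTime) {
        sink.addSeed(entry.getKey());
      }
    }
    // The returned string is what the next call receives as lastSeedVersion.
    return Long.toString(seedTime);
  }
}
```

When the framework resets the version string to null, this sketch falls back to seeding everything older than seedTime, which is consistent with the note that sending more documents than necessary is acceptable.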
-
processDocuments
public void processDocuments(java.lang.String[] documentIdentifiers, org.apache.manifoldcf.crawler.interfaces.IExistingVersions statuses, org.apache.manifoldcf.core.interfaces.Specification spec, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, int jobMode, boolean usesDefaultAuthority) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Process a set of documents. This is the method that should cause each document to be fetched, processed, and the results either added to the queue of documents for the current job, and/or entered into the incremental ingestion manager. The document specification allows this class to filter what is done based on the job. The connector will be connected before this method can be called.
- Specified by:
processDocuments in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
- Overrides:
processDocuments in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
- Parameters:
documentIdentifiers - is the set of document identifiers to process.
statuses - are the currently-stored document versions for each document in the set of document identifiers passed in above.
activities - is the interface this method should use to queue up new document references and ingest documents.
jobMode - is an integer describing how the job is being run, whether continuous or once-only.
usesDefaultAuthority - will be true only if the authority in use for these documents is the default one.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
findConvertPath
protected static java.lang.String findConvertPath(org.apache.manifoldcf.core.interfaces.Specification spec, java.io.File theFile)
This method finds the part of the path that should be converted to a URI. Returns null if the path should not be converted.
- Parameters:
spec - is the document specification.
- Returns:
- the part of the path to be converted, or null.
-
mapExtensionToMimeType
protected static java.lang.String mapExtensionToMimeType(java.lang.String fileName)
Map an extension to a mime type
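A minimal sketch of an extension-to-MIME-type lookup follows. The table contents, the fallback to application/octet-stream, and the names MimeTypeSketch and mimeTypeFor are all assumptions for illustration; the source does not say which extensions FileConnector recognizes or what it returns for unknown ones.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup table; the real connector's mappings are not documented here.
final class MimeTypeSketch {
  private static final String FALLBACK = "application/octet-stream";
  private static final Map<String, String> TYPES = new HashMap<>();
  static {
    TYPES.put("txt", "text/plain");
    TYPES.put("html", "text/html");
    TYPES.put("pdf", "application/pdf");
  }

  // Returns the MIME type for the file name's extension, or FALLBACK
  // when there is no extension or the extension is unknown.
  static String mimeTypeFor(String fileName) {
    int dot = fileName.lastIndexOf('.');
    if (dot < 0 || dot == fileName.length() - 1) {
      return FALLBACK;
    }
    // Lower-case with a fixed locale so "PDF" and "pdf" map identically.
    String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    return TYPES.getOrDefault(extension, FALLBACK);
  }
}
```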
-
outputSpecificationHeader
public void outputSpecificationHeader(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber, java.util.List<java.lang.String> tabsArray) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, java.io.IOException
Output the specification header section. This method is called in the head section of a job page which has selected a repository connection of the current type. Its purpose is to add the required tabs to the list, and to output any javascript methods that might be needed by the job editing HTML. The connector will be connected before this method can be called.
- Specified by:
outputSpecificationHeader in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
- Overrides:
outputSpecificationHeader in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
- Parameters:
out - is the output to which any HTML should be sent.
locale - is the locale the output is preferred to be in.
ds - is the current document specification for this job.
connectionSequenceNumber - is the unique number of this connection within the job.
tabsArray - is an array of tab names. Add to this array any tab names that are specific to the connector.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException
-
outputSpecificationBody
public void outputSpecificationBody(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber, int actualSequenceNumber, java.lang.String tabName) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, java.io.IOException
Output the specification body section. This method is called in the body section of a job page which has selected a repository connection of the current type. Its purpose is to present the required form elements for editing. The coder can presume that the HTML that is output from this configuration will be within appropriate <html>, <body>, and <form> tags. The name of the form is always "editjob". The connector will be connected before this method can be called.
- Specified by:
outputSpecificationBody in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
- Overrides:
outputSpecificationBody in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
- Parameters:
out - is the output to which any HTML should be sent.
locale - is the locale the output is preferred to be in.
ds - is the current document specification for this job.
connectionSequenceNumber - is the unique number of this connection within the job.
actualSequenceNumber - is the connection within the job that has currently been selected.
tabName - is the current tab name. (actualSequenceNumber, tabName) form a unique tuple within the job.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException
-
processSpecificationPost
public java.lang.String processSpecificationPost(org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Process a specification post. This method is called at the start of a job's edit or view page, whenever there is a possibility that form data for a connection has been posted. Its purpose is to gather form information and modify the document specification accordingly. The name of the posted form is always "editjob". The connector will be connected before this method can be called.
- Specified by:
processSpecificationPost in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
- Overrides:
processSpecificationPost in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
- Parameters:
variableContext - contains the post data, including binary file-upload information.
locale - is the locale the output is preferred to be in.
ds - is the current document specification for this job.
connectionSequenceNumber - is the unique number of this connection within the job.
- Returns:
- null if all is well, or a string error message if there is an error that should prevent saving of the job (and cause a redirection to an error page).
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
viewSpecification
public void viewSpecification(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, java.io.IOException
View specification. This method is called in the body section of a job's view page. Its purpose is to present the document specification information to the user. The coder can presume that the HTML that is output from this configuration will be within appropriate <html> and <body> tags. The connector will be connected before this method can be called.
- Specified by:
viewSpecification in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
- Overrides:
viewSpecification in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
- Parameters:
out - is the output to which any HTML should be sent.
locale - is the locale the output is preferred to be in.
ds - is the current document specification for this job.
connectionSequenceNumber - is the unique number of this connection within the job.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException
-
checkInclude
protected static boolean checkInclude(java.io.File file, java.lang.String fileName, org.apache.manifoldcf.core.interfaces.Specification documentSpecification) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Check if a file or directory should be included, given a document specification.
- Parameters:
fileName - is the canonical file name.
documentSpecification - is the specification.
- Returns:
- true if it should be included.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
checkIngest
protected static boolean checkIngest(java.io.File file, org.apache.manifoldcf.core.interfaces.Specification documentSpecification) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Check if a file should be ingested, given a document specification. It is presumed that only documents that pass checkInclude() will be checked with this method.
- Parameters:
file - is the file.
documentSpecification - is the specification.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
matchSubPath
protected static int matchSubPath(java.lang.String subPath, java.lang.String fullPath)
Match a sub-path. The sub-path must match the complete starting part of the full path, in a path sense. The returned value should point into the file name beyond the end of the matched path, or be -1 if there is no match.
- Parameters:
subPath - is the sub path.
fullPath - is the full path.
- Returns:
- the index of the start of the remaining part of the full path, or -1.
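The phrase "in a path sense" can be illustrated with a small sketch: a prefix counts only if it ends on a path-component boundary. The code below is an assumption about reasonable behavior, not FileConnector's implementation. It takes the separator as an explicit parameter (a hypothetical addition that keeps the example platform-independent) and compares case-sensitively.

```java
// Hypothetical sketch of a component-aware prefix match; not ManifoldCF code.
final class SubPathSketch {
  // Returns the index in fullPath where the remainder starts, or -1 if
  // subPath is not a whole-component prefix of fullPath.
  static int matchSubPath(String subPath, String fullPath, char separator) {
    if (subPath.isEmpty()) {
      return 0; // an empty sub-path matches everything
    }
    if (!fullPath.startsWith(subPath)) {
      return -1;
    }
    int end = subPath.length();
    if (end == fullPath.length()) {
      return end; // exact match, nothing remains
    }
    if (subPath.charAt(end - 1) == separator) {
      return end; // sub-path already ends on a separator
    }
    if (fullPath.charAt(end) == separator) {
      return end + 1; // skip the separator that follows the sub-path
    }
    // The prefix stopped in the middle of a component, e.g. "/a/b" vs "/a/bc".
    return -1;
  }
}
```

With this sketch, matching "/data/docs" against "/data/docs/a.txt" yields an index whose substring is "a.txt", while "/data/do" does not match at all.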
-
checkMatch
protected static boolean checkMatch(java.lang.String sourceMatch, int sourceIndex, java.lang.String match)
Check a match between two strings with wildcards.
- Parameters:
sourceMatch - is the expanded string (no wildcards)
sourceIndex - is the starting point in the expanded string.
match - is the wildcard-based string.
- Returns:
- true if there is a match.
-
processCheck
protected static boolean processCheck(boolean caseSensitive, java.lang.String sourceMatch, int sourceIndex, java.lang.String match, int matchIndex)
Recursive worker method for checkMatch. Returns 'true' if there is a path that consumes both strings in their entirety in a matched way.
- Parameters:
caseSensitive - is true if file names are case sensitive.
sourceMatch - is the source string (without wildcards)
sourceIndex - is the current point in the source string.
match - is the match string (with wildcards)
matchIndex - is the current point in the match string.
- Returns:
- true if there is a match.
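The idea of a recursive matcher that must consume both strings entirely can be sketched as follows. The method names mirror the documented signatures, but the body is an illustration only: the source does not say which wildcard characters are supported, so the use of '*' (any run of characters) and '?' (exactly one character) is an assumption, as is the case-sensitive default in checkMatch.

```java
// Hypothetical wildcard matcher; the wildcard syntax is an assumption.
final class WildcardSketch {
  // Matches source (from sourceIndex onward) against a wildcard pattern.
  static boolean checkMatch(String source, int sourceIndex, String pattern) {
    return processCheck(true, source, sourceIndex, pattern, 0);
  }

  // Returns true if the rest of source and the rest of pattern can both be
  // consumed completely in a matching way.
  static boolean processCheck(boolean caseSensitive, String source, int sourceIndex,
                              String pattern, int patternIndex) {
    if (patternIndex == pattern.length()) {
      // Pattern exhausted: succeed only if the source is exhausted too.
      return sourceIndex == source.length();
    }
    char p = pattern.charAt(patternIndex);
    if (p == '*') {
      // Let '*' absorb zero or more characters, trying each split point.
      for (int i = sourceIndex; i <= source.length(); i++) {
        if (processCheck(caseSensitive, source, i, pattern, patternIndex + 1)) {
          return true;
        }
      }
      return false;
    }
    if (sourceIndex == source.length()) {
      return false; // pattern still needs a character but the source is empty
    }
    char s = source.charAt(sourceIndex);
    boolean same = p == '?'
        || (caseSensitive ? s == p : Character.toLowerCase(s) == Character.toLowerCase(p));
    return same && processCheck(caseSensitive, source, sourceIndex + 1, pattern, patternIndex + 1);
  }
}
```

The sourceIndex argument lets a caller skip an already-matched path prefix and apply the pattern only to the remainder of the name.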
-
-