Class WebcrawlerConfig
- java.lang.Object
-
- org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConfig
-
public class WebcrawlerConfig extends java.lang.ObjectConstants for the Webcrawler connector configuration.
-
-
Field Summary
Fields Modifier and Type Field Description static java.lang.String_rcsidstatic java.lang.StringATTR_ASPSESSIONREMOVALaspsessionremoval attributestatic java.lang.StringATTR_BINREGEXPThe bin regular expressionstatic java.lang.StringATTR_BVSESSIONREMOVALbvsessionremoval attributestatic java.lang.StringATTR_DESCRIPTIONdescription attributestatic java.lang.StringATTR_DOMAINDomain/realm part of credentials (if any)static java.lang.StringATTR_INSENSITIVEWhether the match is case insensitivestatic java.lang.StringATTR_JAVASESSIONREMOVALjavasessionremoval attributestatic java.lang.StringATTR_LOWERCASEmap to lower casestatic java.lang.StringATTR_MAPMap attributestatic java.lang.StringATTR_MATCHMatch attributestatic java.lang.StringATTR_MATCHREGEXPForm name or link target regexp for authentication pagestatic java.lang.StringATTR_NAMEname attributestatic java.lang.StringATTR_NAMEREGEXPAuthentication parameter name regexpstatic java.lang.StringATTR_OVERRIDETARGETURLURL to fetch next in a sequence (an override)static java.lang.StringATTR_PASSWORDPassword part of credentialsstatic java.lang.StringATTR_PHPSESSIONREMOVALphpsessionremoval attributestatic java.lang.StringATTR_REGEXPregexp attributestatic java.lang.StringATTR_REORDERreorder attributestatic java.lang.StringATTR_TOKENtoken attributestatic java.lang.StringATTR_TRUSTEVERYTHING"Trust everything" attribute - replacing truststore if set to 'true'static java.lang.StringATTR_TRUSTSTORETrust store section of authentication recordstatic java.lang.StringATTR_TYPEType of securitystatic java.lang.StringATTR_URLREGEXPRegexp for access control nodestatic java.lang.StringATTR_USERNAMEUsername part of credentialsstatic java.lang.StringATTR_VALUEThe value attribute (used for maxconnections and maxkbpersecond)static java.lang.StringATTRVALUE_BASICType value for basic authenticationstatic java.lang.StringATTRVALUE_CONTENTAuthentication page type: Accessstatic java.lang.StringATTRVALUE_FALSEValue falsestatic java.lang.StringATTRVALUE_FORMAuthentication page type: Formstatic java.lang.StringATTRVALUE_LINKAuthentication page type: Linkstatic java.lang.StringATTRVALUE_NOValue nostatic java.lang.StringATTRVALUE_NTLMType value for NTLM authenticationstatic java.lang.StringATTRVALUE_REDIRECTIONAuthentication page type: Redirectionstatic java.lang.StringATTRVALUE_SESSIONType value for session-based authenticationstatic java.lang.StringATTRVALUE_TRUEValue truestatic java.lang.StringATTRVALUE_YESValue yesstatic java.lang.StringNODE_ACCESSForced acl access token node.static java.lang.StringNODE_ACCESSCREDENTIALAccess control description nodestatic java.lang.StringNODE_AUTHPAGEAuthentication page description nodestatic java.lang.StringNODE_AUTHPARAMETERAuthentication parameter nodestatic java.lang.StringNODE_BINDESCThe bin description nodestatic java.lang.StringNODE_EXCLUDEHEADERExclude header node.static java.lang.StringNODE_EXCLUDESExclude regexps node.static java.lang.StringNODE_EXCLUDESCONTENTINDEXExclude any page containing specified regex in their body from indexstatic java.lang.StringNODE_EXCLUDESINDEXExclude regexps node.static java.lang.StringNODE_FORCEINCLUSIONForce the inclusion of redirections.static java.lang.StringNODE_INCLUDESInclude regexps node.static java.lang.StringNODE_INCLUDESINDEXInclude regexps node.static java.lang.StringNODE_LIMITTOSEEDSLimit to seeds.static java.lang.StringNODE_MAPMap entry specification node.static java.lang.StringNODE_MAXCONNECTIONSThe max connections nodestatic java.lang.StringNODE_MAXFETCHESPERMINUTEThe max fetch rate nodestatic java.lang.StringNODE_MAXKBPERSECONDThe bandwidth nodestatic java.lang.StringNODE_SEEDSThe seeds node.static java.lang.StringNODE_TRUSTTrust store description nodestatic java.lang.StringNODE_URLSPECCanonicalization rule.static java.lang.StringPARAMETER_EMAILEmail (a parameter)static java.lang.StringPARAMETER_META_ROBOTS_TAGS_USAGEMeta robots tags usage (a parameter)static java.lang.StringPARAMETER_PROXYAUTHDOMAINProxy auth domain (parameter)static java.lang.StringPARAMETER_PROXYAUTHPASSWORDProxy auth password (parameter)static java.lang.StringPARAMETER_PROXYAUTHUSERNAMEProxy auth username (parameter)static java.lang.StringPARAMETER_PROXYHOSTProxy host name (parameter)static java.lang.StringPARAMETER_PROXYPORTProxy port (parameter)static java.lang.StringPARAMETER_ROBOTSUSAGERobots usage (a parameter)
-
Constructor Summary
Constructors Constructor Description WebcrawlerConfig()
-
-
-
Field Detail
-
_rcsid
public static final java.lang.String _rcsid
- See Also:
- Constant Field Values
-
PARAMETER_ROBOTSUSAGE
public static final java.lang.String PARAMETER_ROBOTSUSAGE
Robots usage (a parameter)- See Also:
- Constant Field Values
-
PARAMETER_META_ROBOTS_TAGS_USAGE
public static final java.lang.String PARAMETER_META_ROBOTS_TAGS_USAGE
Meta robots tags usage (a parameter)- See Also:
- Constant Field Values
-
PARAMETER_EMAIL
public static final java.lang.String PARAMETER_EMAIL
Email (a parameter)- See Also:
- Constant Field Values
-
PARAMETER_PROXYHOST
public static final java.lang.String PARAMETER_PROXYHOST
Proxy host name (parameter)- See Also:
- Constant Field Values
-
PARAMETER_PROXYPORT
public static final java.lang.String PARAMETER_PROXYPORT
Proxy port (parameter)- See Also:
- Constant Field Values
-
PARAMETER_PROXYAUTHDOMAIN
public static final java.lang.String PARAMETER_PROXYAUTHDOMAIN
Proxy auth domain (parameter)- See Also:
- Constant Field Values
-
PARAMETER_PROXYAUTHUSERNAME
public static final java.lang.String PARAMETER_PROXYAUTHUSERNAME
Proxy auth username (parameter)- See Also:
- Constant Field Values
-
PARAMETER_PROXYAUTHPASSWORD
public static final java.lang.String PARAMETER_PROXYAUTHPASSWORD
Proxy auth password (parameter)- See Also:
- Constant Field Values
-
NODE_BINDESC
public static final java.lang.String NODE_BINDESC
The bin description node- See Also:
- Constant Field Values
-
ATTR_BINREGEXP
public static final java.lang.String ATTR_BINREGEXP
The bin regular expression- See Also:
- Constant Field Values
-
ATTR_INSENSITIVE
public static final java.lang.String ATTR_INSENSITIVE
Whether the match is case insensitive- See Also:
- Constant Field Values
-
NODE_MAXCONNECTIONS
public static final java.lang.String NODE_MAXCONNECTIONS
The max connections node- See Also:
- Constant Field Values
-
NODE_MAXKBPERSECOND
public static final java.lang.String NODE_MAXKBPERSECOND
The bandwidth node- See Also:
- Constant Field Values
-
NODE_MAXFETCHESPERMINUTE
public static final java.lang.String NODE_MAXFETCHESPERMINUTE
The max fetch rate node- See Also:
- Constant Field Values
-
ATTR_VALUE
public static final java.lang.String ATTR_VALUE
The value attribute (used for maxconnections and maxkbpersecond)- See Also:
- Constant Field Values
-
NODE_ACCESSCREDENTIAL
public static final java.lang.String NODE_ACCESSCREDENTIAL
Access control description node- See Also:
- Constant Field Values
-
ATTR_URLREGEXP
public static final java.lang.String ATTR_URLREGEXP
Regexp for access control node- See Also:
- Constant Field Values
-
ATTR_TYPE
public static final java.lang.String ATTR_TYPE
Type of security- See Also:
- Constant Field Values
-
ATTRVALUE_BASIC
public static final java.lang.String ATTRVALUE_BASIC
Type value for basic authentication- See Also:
- Constant Field Values
-
ATTRVALUE_NTLM
public static final java.lang.String ATTRVALUE_NTLM
Type value for NTLM authentication- See Also:
- Constant Field Values
-
ATTRVALUE_SESSION
public static final java.lang.String ATTRVALUE_SESSION
Type value for session-based authentication- See Also:
- Constant Field Values
-
ATTR_DOMAIN
public static final java.lang.String ATTR_DOMAIN
Domain/realm part of credentials (if any)- See Also:
- Constant Field Values
-
ATTR_USERNAME
public static final java.lang.String ATTR_USERNAME
Username part of credentials- See Also:
- Constant Field Values
-
ATTR_PASSWORD
public static final java.lang.String ATTR_PASSWORD
Password part of credentials- See Also:
- Constant Field Values
-
NODE_AUTHPAGE
public static final java.lang.String NODE_AUTHPAGE
Authentication page description node- See Also:
- Constant Field Values
-
ATTRVALUE_FORM
public static final java.lang.String ATTRVALUE_FORM
Authentication page type: Form- See Also:
- Constant Field Values
-
ATTRVALUE_LINK
public static final java.lang.String ATTRVALUE_LINK
Authentication page type: Link- See Also:
- Constant Field Values
-
ATTRVALUE_REDIRECTION
public static final java.lang.String ATTRVALUE_REDIRECTION
Authentication page type: Redirection- See Also:
- Constant Field Values
-
ATTRVALUE_CONTENT
public static final java.lang.String ATTRVALUE_CONTENT
Authentication page type: Access- See Also:
- Constant Field Values
-
ATTR_MATCHREGEXP
public static final java.lang.String ATTR_MATCHREGEXP
Form name or link target regexp for authentication page- See Also:
- Constant Field Values
-
ATTR_OVERRIDETARGETURL
public static final java.lang.String ATTR_OVERRIDETARGETURL
URL to fetch next in a sequence (an override)- See Also:
- Constant Field Values
-
NODE_AUTHPARAMETER
public static final java.lang.String NODE_AUTHPARAMETER
Authentication parameter node- See Also:
- Constant Field Values
-
ATTR_NAMEREGEXP
public static final java.lang.String ATTR_NAMEREGEXP
Authentication parameter name regexp- See Also:
- Constant Field Values
-
NODE_TRUST
public static final java.lang.String NODE_TRUST
Trust store description node- See Also:
- Constant Field Values
-
ATTR_TRUSTSTORE
public static final java.lang.String ATTR_TRUSTSTORE
Trust store section of authentication record- See Also:
- Constant Field Values
-
ATTR_TRUSTEVERYTHING
public static final java.lang.String ATTR_TRUSTEVERYTHING
"Trust everything" attribute - replacing truststore if set to 'true'- See Also:
- Constant Field Values
-
NODE_MAP
public static final java.lang.String NODE_MAP
Map entry specification node. Has two attributes: 'match' and 'map'.- See Also:
- Constant Field Values
-
NODE_SEEDS
public static final java.lang.String NODE_SEEDS
The seeds node. The value of this node contains the seeds, as a large text area.- See Also:
- Constant Field Values
-
NODE_INCLUDES
public static final java.lang.String NODE_INCLUDES
Include regexps node. The value of this node contains the regexps that must match the canonical URL in order for that URL to be included in the crawl. These regexps are newline separated, and # starts a comment.- See Also:
- Constant Field Values
-
NODE_EXCLUDES
public static final java.lang.String NODE_EXCLUDES
Exclude regexps node. The value of this node contains the regexps that if any one matches, causes the URL to be excluded from the crawl. These regexps are newline separated, and # starts a comment.- See Also:
- Constant Field Values
-
NODE_INCLUDESINDEX
public static final java.lang.String NODE_INCLUDESINDEX
Include regexps node. The value of this node contains the regexps that must match the canonical URL in order for that URL to be included for indexing. These regexps are newline separated, and # starts a comment.- See Also:
- Constant Field Values
-
NODE_EXCLUDESINDEX
public static final java.lang.String NODE_EXCLUDESINDEX
Exclude regexps node. The value of this node contains the regexps that if any one matches, causes the URL to be excluded from indexing. These regexps are newline separated, and # starts a comment.- See Also:
- Constant Field Values
-
NODE_EXCLUDESCONTENTINDEX
public static final java.lang.String NODE_EXCLUDESCONTENTINDEX
Exclude any page containing specified regex in their body from index- See Also:
- Constant Field Values
-
NODE_LIMITTOSEEDS
public static final java.lang.String NODE_LIMITTOSEEDS
Limit to seeds. When value attribute is true, only seed domains will be permitted.- See Also:
- Constant Field Values
-
NODE_FORCEINCLUSION
public static final java.lang.String NODE_FORCEINCLUSION
Force the inclusion of redirections. When value attribute is true, redirected URL will be included.- See Also:
- Constant Field Values
-
NODE_URLSPEC
public static final java.lang.String NODE_URLSPEC
Canonicalization rule. Attributes are regexp, description, reorder, javasessionremoval, aspsessionremoval, phpsessionremoval, bvsessionremoval- See Also:
- Constant Field Values
-
NODE_ACCESS
public static final java.lang.String NODE_ACCESS
Forced acl access token node. Attribute is "token".- See Also:
- Constant Field Values
-
NODE_EXCLUDEHEADER
public static final java.lang.String NODE_EXCLUDEHEADER
Exclude header node. The value of this node lists a single header (in lower case) that should be excluded from the document metadata- See Also:
- Constant Field Values
-
ATTR_REGEXP
public static final java.lang.String ATTR_REGEXP
regexp attribute- See Also:
- Constant Field Values
-
ATTR_DESCRIPTION
public static final java.lang.String ATTR_DESCRIPTION
description attribute- See Also:
- Constant Field Values
-
ATTR_REORDER
public static final java.lang.String ATTR_REORDER
reorder attribute- See Also:
- Constant Field Values
-
ATTR_JAVASESSIONREMOVAL
public static final java.lang.String ATTR_JAVASESSIONREMOVAL
javasessionremoval attribute- See Also:
- Constant Field Values
-
ATTR_ASPSESSIONREMOVAL
public static final java.lang.String ATTR_ASPSESSIONREMOVAL
aspsessionremoval attribute- See Also:
- Constant Field Values
-
ATTR_PHPSESSIONREMOVAL
public static final java.lang.String ATTR_PHPSESSIONREMOVAL
phpsessionremoval attribute- See Also:
- Constant Field Values
-
ATTR_BVSESSIONREMOVAL
public static final java.lang.String ATTR_BVSESSIONREMOVAL
bvsessionremoval attribute- See Also:
- Constant Field Values
-
ATTR_LOWERCASE
public static final java.lang.String ATTR_LOWERCASE
map to lower case- See Also:
- Constant Field Values
-
ATTR_NAME
public static final java.lang.String ATTR_NAME
name attribute- See Also:
- Constant Field Values
-
ATTR_TOKEN
public static final java.lang.String ATTR_TOKEN
token attribute- See Also:
- Constant Field Values
-
ATTRVALUE_YES
public static final java.lang.String ATTRVALUE_YES
Value yes- See Also:
- Constant Field Values
-
ATTRVALUE_NO
public static final java.lang.String ATTRVALUE_NO
Value no- See Also:
- Constant Field Values
-
ATTRVALUE_FALSE
public static final java.lang.String ATTRVALUE_FALSE
Value false- See Also:
- Constant Field Values
-
ATTRVALUE_TRUE
public static final java.lang.String ATTRVALUE_TRUE
Value true- See Also:
- Constant Field Values
-
ATTR_MATCH
public static final java.lang.String ATTR_MATCH
Match attribute- See Also:
- Constant Field Values
-
ATTR_MAP
public static final java.lang.String ATTR_MAP
Map attribute- See Also:
- Constant Field Values
-
-