Configuration File Triggers
The triggers in the configuration file are defined below.
The following triggers are used for both harvesting (OAI-PMH compliant) and crawling (non-OAI-PMH compliant):
The following triggers are used only when harvesting:
Triggers for harvesting only | Definition | Example |
---|---|---|
OAIFROMDATE | Start the OAI repository harvest at records with this date. The example uses the form YYYYMMDDhhmmss (January 1, 2004, 12:00:00). | 20040101120000 |
OAIFORMAT | The format that the XML Harvester uses to request data. Supported formats include MARC21 and OAI_DC. Other formats can be supported, although additional parsing may be required; contact Innovative to request support for formats other than MARC21 or OAI_DC. If MARC21 is specified, XMLTYPE must be set to MARC. If OAI_DC is specified, XMLTYPE must be set to DC. This entry is case sensitive: match the case used at the repository. | oai_dc |
OAIFROMEMAIL | The email address from which the XML Harvester sends an email to the OAI repository when it requests records from that repository. | library@iii.com |
OAISET | The set name (setSpec) to request from the OAI repository. The set name must match the value at the repository. If necessary, examine the URL of the set to determine the actual set name used at the repository. | lcphotos |
OAIUNTILDATE | Stop the OAI repository harvest at records with this date, given in the same form as OAIFROMDATE. | 20040201120000 |
USEOAI | Indicates whether the repository is OAI-compliant. Valid values are true and false. Set this trigger to "true" when harvesting from an OAI-compliant repository. | true |
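The harvesting triggers appear to correspond to the arguments of an OAI-PMH `ListRecords` request: OAIFORMAT to `metadataPrefix`, OAIFROMDATE and OAIUNTILDATE to `from` and `until`, and OAISET to `set`. The following is a minimal, hypothetical sketch of that mapping, not the XML Harvester's actual implementation. It assumes the triggers are held in a dict of strings, that trigger dates use the 14-digit YYYYMMDDhhmmss form shown in the examples and are in UTC, and that the repository base URL (`https://example.org/oai`) and the helper names are illustrative only. It also applies the OAIFORMAT/XMLTYPE pairing rule from the table above.

```python
from datetime import datetime
from urllib.parse import urlencode

# OAIFORMAT -> required XMLTYPE, per the OAIFORMAT row (lookup is case-insensitive).
EXPECTED_XMLTYPE = {"marc21": "MARC", "oai_dc": "DC"}


def trigger_to_oai_datetime(value: str) -> str:
    """Convert a trigger date (assumed YYYYMMDDhhmmss, assumed UTC) to OAI-PMH form."""
    return datetime.strptime(value, "%Y%m%d%H%M%S").strftime("%Y-%m-%dT%H:%M:%SZ")


def build_list_records_url(base_url: str, triggers: dict) -> str:
    """Build a hypothetical ListRecords request URL from harvesting triggers."""
    if triggers.get("USEOAI", "false").lower() != "true":
        raise ValueError("USEOAI must be true to harvest from an OAI repository")

    # Passed through unchanged, because the repository treats the format as case sensitive.
    fmt = triggers["OAIFORMAT"]
    expected = EXPECTED_XMLTYPE.get(fmt.lower())
    if expected and triggers.get("XMLTYPE") != expected:
        raise ValueError(f"OAIFORMAT {fmt} requires XMLTYPE {expected}")

    params = {"verb": "ListRecords", "metadataPrefix": fmt}
    if "OAIFROMDATE" in triggers:
        params["from"] = trigger_to_oai_datetime(triggers["OAIFROMDATE"])
    if "OAIUNTILDATE" in triggers:
        params["until"] = trigger_to_oai_datetime(triggers["OAIUNTILDATE"])
    if "OAISET" in triggers:
        params["set"] = triggers["OAISET"]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    example_triggers = {
        "USEOAI": "true",
        "OAIFORMAT": "oai_dc",
        "XMLTYPE": "DC",
        "OAISET": "lcphotos",
        "OAIFROMDATE": "20040101120000",
        "OAIUNTILDATE": "20040201120000",
    }
    print(build_list_records_url("https://example.org/oai", example_triggers))
```

OAI-PMH repositories declare their supported date granularity in their Identify response. A repository that supports only day granularity (YYYY-MM-DD) returns a `badArgument` error for the seconds-level form used in this sketch, so a real harvester would need to truncate the dates in that case.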
The following triggers are used only when crawling:
Triggers for crawling only | Definition | Example |
---|---|---|
ALLOWED_TYPE | (repeatable) Defines a MIME type that can be crawled. | text/html |
CRAWLDOWNONLY | If set to true, stops the XML Harvester from crawling out of the starting directory. | true |
CREATE856FROMURI | Create an 856 field from the Uniform Resource Identifier (URI) of the source XML page and put the URI in subfield |u of the 856 field. | true |
CREATEOVERLAYFROMURI | Create an overlay of a record from URI. | false |
DEPTH | Maximum search depth. The XML Harvester will not crawl beyond this level of the repository's tree. Zero equals no limit. | 5 |
EXTENSION | (repeatable) Directories and file extensions that the XML Harvester should read in search of XML pages. | .htm |
FILTER | (repeatable) A simple text filter. Precede the filter string with "+" to include matches or with "-" to exclude them. Regular expressions are not supported. | -this_string |
MAXCOUNT | Maximum number of hits before stopping. | 1000 |
DATE | Indicates that only XML pages updated on or after the specified date are crawled. | |
ONEHOST | Determines whether the XML Harvester will follow links outside the specified repository. | true |
ROBOTFLAG | Determines whether the XML Harvester honors the repository's robots file (robots.txt), a list of directories or files to which the repository administrator disallows access. | true |
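Several crawling triggers (ONEHOST, CRAWLDOWNONLY, DEPTH, and FILTER) can be read as checks applied to each link the XML Harvester discovers. The following is a hypothetical sketch of such checks, not the XML Harvester's implementation. It assumes triggers are held in a dict of strings (with FILTER as a list, since it is repeatable), that the starting page is at depth 0, that FILTER strings are matched as plain substrings of the URL, that a URL matching any "-" filter is rejected, and that when any "+" filters exist a URL must match at least one of them. The function names are illustrative only.

```python
from urllib.parse import urlparse


def start_directory(start_url: str) -> str:
    """Directory portion of the starting URL's path (everything up to the last "/")."""
    path = urlparse(start_url).path
    return path[: path.rfind("/") + 1] or "/"


def passes_filters(url: str, filters: list) -> bool:
    """Apply "+" (include) and "-" (exclude) substring filters; no regular expressions."""
    includes = [f[1:] for f in filters if f.startswith("+")]
    excludes = [f[1:] for f in filters if f.startswith("-")]
    if any(s in url for s in excludes):
        return False
    # With no "+" filters, everything not excluded passes.
    return not includes or any(s in url for s in includes)


def in_scope(url: str, depth: int, start_url: str, triggers: dict) -> bool:
    """Decide whether a discovered link should be crawled (hypothetical rules)."""
    start, target = urlparse(start_url), urlparse(url)
    same_host = target.netloc == start.netloc

    # ONEHOST: do not follow links outside the starting host.
    if triggers.get("ONEHOST") == "true" and not same_host:
        return False

    # CRAWLDOWNONLY: stay at or below the starting directory on the same host.
    if triggers.get("CRAWLDOWNONLY") == "true":
        if not (same_host and target.path.startswith(start_directory(start_url))):
            return False

    # DEPTH: 0 means no limit; otherwise reject links deeper than the maximum.
    max_depth = int(triggers.get("DEPTH", "0"))
    if max_depth and depth > max_depth:
        return False

    return passes_filters(url, triggers.get("FILTER", []))
```

For example, with ONEHOST and CRAWLDOWNONLY set to true and a starting URL of `https://example.org/photos/index.html`, a link to `https://example.org/docs/page.xml` is rejected because it lies outside `/photos/`. Filter entries that start with neither "+" nor "-" are ignored in this sketch.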