Configuration File Triggers

The triggers in the configuration file are defined below.

The following triggers are used for both harvesting (OAI-PMH compliant) and crawling (non-OAI-PMH compliant):

Triggers for harvesting and crawling

856TEXT
    Definition: For the automatically generated 856 fields (from the trigger CREATE856FROMURI), this is the text that will be inserted into the subfield z of the 856 (public note).
    Example: Gotham City Library

9xxMARCTAG
    Definition: MARC tag to use for storing database name and timestamp.
    Example: 998

DBNAME
    Definition: Descriptive name for the OAI repository or XML collection that is being searched. Used in the naming of the MARC file that is created.
    Example: LCPhotos

DCMAPPING
    Definition: Overrides the system's hard-coded DC-to-MARC-tags mapping. Maps the incoming DC tags to the MARC tags in the trigger. The trigger's fields and their purposes are:
        XML Tag: The tag of the XML field to map.
        MARC Field: The MARC 21 field to which the XML data is mapped.
        Indicator 1: Indicator 1 for the MARC field.
        Indicator 2: Indicator 2 for the MARC field.
        First Subfield Tag: The first subfield tag of the MARC field.
    Example: contributor|110| 1|a|false
       and
    specialnote|505| 0|a|false

RECID_MARCTAG
    Definition: Not required, but strongly recommended. The MARC tag where the unique record identifier is placed (e.g., 001).
    Example: 001

URL
    Definition: Required. The URL to connect to the OAI repository or XML collection.
    Example: http://memory.loc.gov/cgi-bin/oai2_0

XML_TYPE
    Definition: Required. This identifies which Innovative parser is used for the XML data. There are two currently supported values: DC and MARCXML.
    Example: MARCXML
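
The requirements stated for these shared triggers (URL and XML_TYPE are required, XML_TYPE accepts DC or MARCXML, RECID_MARCTAG is strongly recommended) can be sketched as a small pre-flight check. This is only an illustration: the function name check_triggers and the use of a Python dictionary to hold trigger values are assumptions made for the example, not part of the XML Harvester, and the on-disk syntax of the configuration file is not assumed here.

```python
# Hypothetical helper: not part of the XML Harvester itself.
REQUIRED = ("URL", "XML_TYPE")
SUPPORTED_XML_TYPES = {"DC", "MARCXML"}


def check_triggers(triggers):
    """Return a list of human-readable problems found in a trigger dict."""
    problems = []
    for name in REQUIRED:
        if not triggers.get(name):
            problems.append(f"{name} is required")
    xml_type = triggers.get("XML_TYPE")
    if xml_type and xml_type not in SUPPORTED_XML_TYPES:
        problems.append(f"XML_TYPE {xml_type!r} is not DC or MARCXML")
    if not triggers.get("RECID_MARCTAG"):
        problems.append("RECID_MARCTAG is not set (strongly recommended)")
    return problems


# Values taken from the Example entries above.
example = {
    "DBNAME": "LCPhotos",
    "URL": "http://memory.loc.gov/cgi-bin/oai2_0",
    "XML_TYPE": "MARCXML",
    "RECID_MARCTAG": "001",
    "9xxMARCTAG": "998",
}
print(check_triggers(example))  # → []
```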

The following triggers are used only when harvesting:

Triggers for harvesting only

OAIFROMDATE
    Definition: Start the OAI repository harvest at records of this date.
    Example: 20040101120000

OAIFORMAT
    Definition: The format that the XML Harvester uses to request data. Supported formats include MARC21 and OAI_DC. Other formats can be supported, although additional parsing may be required. Contact Innovative to request data in formats other than MARC21 or OAI_DC. If MARC21 is specified, XML_TYPE must be set to MARCXML. If OAI_DC is specified, XML_TYPE must be set to DC. This entry is case sensitive: match the case used at the repository.
    Example: oai_dc

OAIFROMEMAIL
    Definition: When the XML Harvester requests records from the OAI repository, it sends an email to the repository from the address specified in this trigger.
    Example: library@iii.com

OAISET
    Definition: The set name (setSpec) to request from the OAI repository. The set name must match the value at the repository. If necessary, examine the URL of the set to determine the actual set name used at the repository.
    Example: lcphotos

OAIUNTILDATE
    Definition: Stop the OAI repository harvest at records of this date.
    Example: 20040201120000

USEOAI
    Definition: Identifies that the repository is an OAI-compliant repository. Values are true and false. This trigger must be set to "true" if harvesting from an OAI-compliant repository.
    Example: true
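
To show how the harvesting triggers relate to the OAI-PMH protocol, the following sketch builds a ListRecords request URL from a set of trigger values. It rests on several assumptions: the trigger dates are read as YYYYMMDDhhmmss (inferred from the examples) and treated as UTC, the helper names to_oai_datestamp and list_records_url are invented for this example, and the way the XML Harvester actually composes its requests is not documented here. The parameter names verb, metadataPrefix, set, from, and until come from the OAI-PMH specification.

```python
from datetime import datetime
from urllib.parse import urlencode


def to_oai_datestamp(trigger_value):
    """Convert a YYYYMMDDhhmmss value (assumed format) to an OAI-PMH UTC datestamp."""
    parsed = datetime.strptime(trigger_value, "%Y%m%d%H%M%S")
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def list_records_url(triggers):
    """Build an OAI-PMH ListRecords URL from a dict of trigger values."""
    # OAIFORMAT is passed through unchanged because it is case sensitive.
    params = {"verb": "ListRecords", "metadataPrefix": triggers["OAIFORMAT"]}
    if triggers.get("OAISET"):
        params["set"] = triggers["OAISET"]
    if triggers.get("OAIFROMDATE"):
        params["from"] = to_oai_datestamp(triggers["OAIFROMDATE"])
    if triggers.get("OAIUNTILDATE"):
        params["until"] = to_oai_datestamp(triggers["OAIUNTILDATE"])
    return triggers["URL"] + "?" + urlencode(params)


# Values taken from the Example entries in this document.
harvest = {
    "URL": "http://memory.loc.gov/cgi-bin/oai2_0",
    "OAIFORMAT": "oai_dc",
    "OAISET": "lcphotos",
    "OAIFROMDATE": "20040101120000",
    "OAIUNTILDATE": "20040201120000",
}
print(list_records_url(harvest))
```

The URL is only constructed and printed, not requested, so the sketch can be run without network access.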

The following triggers are used only when crawling:

Triggers for crawling only

ALLOWED_TYPE (repeatable)
    Definition: Defines a MIME type that can be crawled.
    Example: text/html

CRAWLDOWNONLY
    Definition: If set to true, stops the XML Harvester from crawling out of the starting directory.
    Example: true

CREATE856FROMURI
    Definition: Create an 856 field from the Uniform Resource Identifier (URI) of the source XML page and put the URI in subfield |u of the 856 field.
    Example: true

CREATEOVERLAYFROMURI
    Definition: Create an overlay of a record from URI.
    Example: false

DEPTH
    Definition: Maximum search depth. The XML Harvester will not crawl beyond this level of the repository's tree. Zero equals no limit.
    Example: 5

EXTENSION (repeatable)
    Definition: Directories and file extensions that the XML Harvester should read in search of XML pages.
    Example: .htm

FILTER (repeatable)
    Definition: A simple inclusion filter (indicated by preceding the filter string with a "+") or exclusion filter (indicated by preceding the filter string with a "-"). Regular expressions are not supported.
    Example: -this_string

MAXCOUNT
    Definition: Maximum number of hits before stopping.
    Example: 1000

DATE
    Definition: Indicates that only XML pages updated on or after the specified date should be crawled.

ONEHOST
    Definition: Determines whether the XML Harvester will follow links outside the specified repository.
    Example: true

ROBOTFLAG
    Definition: Determines whether the XML Harvester honors the robot file. This file is a list of directories or files to which the repository administrator disallows access.
    Example: true
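
The scoping triggers FILTER, ONEHOST, and CRAWLDOWNONLY can be illustrated with a rough sketch of how a crawler might decide whether to follow a link. Everything here is an assumption made for illustration: the function names passes_filters and in_scope are hypothetical, ONEHOST set to true is taken to mean "stay on the starting host", an exclusion match is taken to override inclusion, and a link with no "+" filters defined is taken to be allowed. The XML Harvester's actual rules may differ.

```python
import posixpath
from urllib.parse import urlsplit


def passes_filters(url, filters):
    """Apply FILTER-style strings with plain substring tests (no regular expressions)."""
    includes = [f[1:] for f in filters if f.startswith("+")]
    excludes = [f[1:] for f in filters if f.startswith("-")]
    if any(s in url for s in excludes):
        return False
    # Assumption: with no "+" filters, anything not excluded is allowed.
    return not includes or any(s in url for s in includes)


def in_scope(url, start_url, onehost=True, crawldownonly=True):
    """Check a link against ONEHOST- and CRAWLDOWNONLY-style restrictions."""
    start, target = urlsplit(start_url), urlsplit(url)
    if onehost and target.hostname != start.hostname:
        return False
    if crawldownonly:
        # Directory of the starting page, always with a trailing slash.
        start_dir = posixpath.dirname(start.path).rstrip("/") + "/"
        if not target.path.startswith(start_dir):
            return False
    return True


start = "http://example.org/docs/index.xml"
print(passes_filters("http://example.org/docs/this_string/a.xml", ["-this_string"]))  # → False
print(in_scope("http://example.org/docs/sub/a.xml", start))  # → True
print(in_scope("http://example.org/other/a.xml", start))  # → False
```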