Harvesting XML Metadata

If your organization has acquired XML Harvester and a load table, you can harvest XML metadata from a repository and load the converted records into the Innovative system.

Additional load tables can be purchased. Innovative recommends purchasing one load table for every repository you harvest. See Editing the XML Harvester Configuration File.

Metadata are structured elements that describe an information resource or, more generally, any definable entity. Bibliographic record data, for example, can be considered metadata. Harvesting is the process of collecting metadata from a server. Servers designed to provide metadata for harvesting are called repositories.

The standard used by XML Harvester is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAI-PMH requests are expressed as Hypertext Transfer Protocol (HTTP) requests. XML Harvester can harvest from an OAI-PMH-compliant repository or crawl a nonstandard repository.
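Because OAI-PMH requests are ordinary HTTP requests, a request can be illustrated as a URL with a query string. The following Python sketch is not part of XML Harvester; it only shows how a standard OAI-PMH request is assembled. The base URL is a hypothetical placeholder, and the helper name build_oai_request is an assumption made for this example. The verb and argument names (ListRecords, metadataPrefix, from) come from the OAI-PMH specification.

```python
from urllib.parse import urlencode


def build_oai_request(base_url, verb, arguments=None):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    # Skip arguments that were not supplied so they stay out of the query string.
    for name, value in (arguments or {}).items():
        if value is not None:
            query[name] = value
    return f"{base_url}?{urlencode(query)}"


# Hypothetical repository base URL; substitute the repository's real one.
BASE_URL = "https://repository.example.org/oai"

# Request every record in unqualified Dublin Core changed since a given date.
url = build_oai_request(
    BASE_URL,
    "ListRecords",
    {"metadataPrefix": "oai_dc", "from": "2024-01-01"},
)
print(url)
```

Sending this URL with any HTTP client returns an XML response from a compliant repository.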

Contact Innovative to configure XML Harvester to automatically perform scheduled metadata harvesting.

The XML Harvester process works as follows:

  1. Edit the configuration file to specify the:
    • format of the request
    • mapping of the metadata to MARC
    • URL of the repository
    • additional harvesting parameters
  2. Harvest the repository.
  3. XML Harvester converts records into a file that can be loaded into the system by using a standard load table.
  4. Load the records into the Innovative system.
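The conversion in step 3 can be pictured with a small Python sketch. It is an illustration only and does not reproduce XML Harvester's own conversion or file format. It assumes the repository returned unqualified Dublin Core (oai_dc), and the DC_TO_MARC table is a simplified, hypothetical mapping chosen for the example; the real mapping is whatever you specify in the configuration file. The sample response and the function name records_to_marc_fields are also assumptions.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, simplified Dublin Core to MARC mapping (tag, subfield).
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
}

# Made-up ListRecords response used in place of a live repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def records_to_marc_fields(xml_text):
    """Return one list of (tag, subfield, value) tuples per harvested record."""
    root = ET.fromstring(xml_text)
    converted = []
    for record in root.iterfind(".//oai:record", NS):
        fields = []
        for name, (tag, subfield) in DC_TO_MARC.items():
            for element in record.iterfind(f".//dc:{name}", NS):
                text = (element.text or "").strip()
                if text:  # ignore empty elements
                    fields.append((tag, subfield, text))
        converted.append(fields)
    return converted


for fields in records_to_marc_fields(SAMPLE_RESPONSE):
    for tag, subfield, value in fields:
        print(f"{tag} ${subfield} {value}")
```

Each tuple corresponds to one MARC field that a load table could then process.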

Metadata and Metadata Request Formats

The XML Harvester-supported formats for metadata and metadata requests are:

Specify the metadata request format by editing the OAIFORMAT trigger in the configuration file.

Specify the metadata format by editing the XML_TYPE trigger in the configuration file.

Specifying a Repository of XML Records

XML Harvester is OAI-PMH compliant. Innovative recommends using repositories that conform to the OAI-PMH standard.

Specify the repository by editing the URL trigger in the configuration file.
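Before entering a repository URL, it can help to confirm that the repository answers the OAI-PMH Identify request. The Python sketch below is independent of XML Harvester and is offered only as one way to do that check. It parses an Identify response and reports the repository name, base URL, and protocol version. The sample response and the function name describe_repository are assumptions for this example; a real response would come from requesting the repository's base URL with ?verb=Identify appended.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Made-up Identify response standing in for a live repository's answer.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <request verb="Identify">https://repository.example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>https://repository.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""


def describe_repository(identify_xml):
    """Return basic repository details, or None if no Identify element is present."""
    root = ET.fromstring(identify_xml)
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        # Not an OAI-PMH Identify response (or the repository returned an error).
        return None
    return {
        "name": identify.findtext("oai:repositoryName", default="", namespaces=OAI_NS),
        "base_url": identify.findtext("oai:baseURL", default="", namespaces=OAI_NS),
        "protocol_version": identify.findtext(
            "oai:protocolVersion", default="", namespaces=OAI_NS
        ),
    }


info = describe_repository(SAMPLE_IDENTIFY)
print(info)
```

The baseURL value reported by a compliant repository is a reasonable candidate for the URL you specify, though the exact value expected by the configuration file should be confirmed against Editing the XML Harvester Configuration File.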

See also:
Using Data Exchange
Administering the Sierra Scheduler