Using Matchkeys

To enable the use of matchkeys in record matching, ask your Central System Administrator to contact Innovative. This feature requires that all Local Servers in your INN-Reach System use matchkeys as a match field.

Innovative can configure INN-Reach to use system-generated strings called matchkeys as either a Primary Match Field or a Secondary Match Field in record matching. Local Servers generate matchkeys, which the INN-Reach Central Server uses to match the incoming records to master records in the INN-Reach Catalog.

If your INN-Reach System uses matchkeys, INN-Reach:

  1. Generates a matchkey for each bibliographic record that the Local Server contributes.
    1. Constructs the initial matchkey string from data stored in local bibliographic record fields. For more information on the elements of this string, see Data Elements of the Matchkey below.

      Note that if there are duplicate instances of the MARC-tagged fields used to create a matchkey (for example, two MARC 245 fields), the system uses only the first field to generate the matchkey.
    2. Normalizes the matchkey string, using the following rules:
      • Makes all Unicode characters A-Z (that is, Unicode values 65-90) lower case.
      • Removes any leading spaces.
      • Collapses multiple spaces to single spaces.
      • Replaces remaining spaces with the underscore '_' character.
    3. Performs the following optional substitutions:
      • Inserts spaces in bytes 69-72 (pagination data).
      • Inserts spaces in bytes 73-75 (edition data).
      • Substitutes specified characters for diacritics. If contributing sites want INN-Reach to consider words with certain diacritics equivalent to words without those diacritics (for example, cafe and café), the system can be configured to process the diacritics in the bibliographic data and substitute desired characters in the matchkey string.
  2. Transmits the matchkey to the INN-Reach Central Server with the bibliographic record.
  3. Uses the matchkey to identify matching bibliographic records on the INN-Reach Central Server. If a central bibliographic record has no MARC 245 field, or if the MARC 245 field does not contain either subfield $a or $b, the INN-Reach Central Server creates a matchkey that consists of the bibliographic record number from the owning site, the '@' symbol, and the owning site's site code. The remaining bytes in the matchkey are right-padded with spaces.
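
The normalization rules in step 1 and the fallback key described in step 3 can be sketched in Python. This is an illustrative sketch only, not Innovative's implementation: the function names, the sample record number, and the sample site code are hypothetical, and the sketch assumes the rules are applied in the order listed and that positions in the matchkey correspond to characters.

```python
import re

MATCHKEY_LENGTH = 110  # total matchkey length stated in this document


def normalize_matchkey(raw: str) -> str:
    """Apply the four normalization rules in the order they are listed."""
    # Lowercase only A-Z (Unicode values 65-90); other characters are left unchanged.
    lowered = "".join(chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch for ch in raw)
    # Remove leading spaces. Trailing spaces are not mentioned in the rules, so they are kept.
    stripped = lowered.lstrip(" ")
    # Collapse runs of spaces to a single space.
    collapsed = re.sub(r" {2,}", " ", stripped)
    # Replace the remaining spaces with underscores.
    return collapsed.replace(" ", "_")


def fallback_matchkey(record_number: str, site_code: str) -> str:
    """Build the key used when there is no usable MARC 245 $a or $b:
    record number, '@', site code, right-padded with spaces."""
    return f"{record_number}@{site_code}".ljust(MATCHKEY_LENGTH)


if __name__ == "__main__":
    print(normalize_matchkey("  The  Great   Café "))
    print(repr(fallback_matchkey("b1234567", "site1")))
```

Because only A-Z is lowercased, an accented capital such as "À" passes through unchanged unless the optional diacritic substitution is configured.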

Data Elements of the Matchkey

A matchkey is a 110-character alphanumeric Unicode string composed of data from the local bibliographic record. (Note that the term "numeric characters" used here refers only to the Arabic digits 0-9 [Unicode values 48-57, hexadecimal 30-39]. Non-Arabic digits are not considered numeric.)

The elements of the string depend on the presence or absence of the MARC 245 field in the local bibliographic record:

Position (Bytes) | Element | MARC Source Fields | Notes
0-59 | Title | 245 $a $b | 60 characters maximum. If the combined length of $a and $b exceeds 60 characters:
  • The first 45 characters of $a and $b are used.
  • For each word that begins after the 45th character, only its first character is used, up to the last word; as much of the last word as fits is then appended.
If the last word of a title starts before or at the 44th byte, the word is kept in its entirety or until the maximum number of characters (60) is reached.

If the combined length of $a and $b is fewer than 60 characters, or if the result of title key construction is fewer than 60 characters, the remaining bytes are right-padded with spaces.

The apostrophe "'" character (Unicode value 39) and curly braces '{' '}' (Unicode values 123 and 125) are stripped. The ampersand '&' character (Unicode value 38) is converted to the word "and." All other punctuation characters (Unicode values 33-37, 40-47, 58-64, 91-96, 124, and 126) are replaced with spaces.

The leading articles "a," "an," and "the" are stripped along with any spaces that immediately follow them.

If the first MARC 245 field in the record indicates that there is a corresponding MARC 880 field, the system uses the content of the 880 field for bytes 0-59 (title data) instead of the 245 field.
60-64 | General Media Designation (GMD) | 245 $h | The first five contiguous alphanumeric characters are used. If there are fewer than five alphanumeric characters, these bytes are right-padded with spaces. If there is no source field, these bytes are assigned spaces.

Diacritics and non-alphanumeric characters are automatically removed from this element before further processing.
65-68 | Pub. Year | 260 $c | This field is parsed from right to left. The system considers four contiguous numeric characters to represent a year. All of the years listed in the field are considered, rather than just the first year that is found. If there are multiple years, precedence is given to years that are not preceded by a 'c'. If no year is found, or if there is no source field, these bytes are assigned spaces.
69-72 | Pagination | 300 $a | The first four contiguous numeric characters are used. If a non-numeric character is encountered, then no additional scanning of the field occurs. If there are fewer than four contiguous numeric characters, or if there is no source field, these bytes are assigned spaces.

Innovative can configure the system to automatically assign spaces to these bytes.
73-75 | Edition Statement | 250 $a | The first three contiguous numeric characters are used. If there are not three contiguous numeric characters, then the longest sequence of contiguous numeric characters available is used (for example, two contiguous numeric characters or the first numeric character). If there are no numeric characters, then the first three contiguous alphabetic characters are used. If there are not three contiguous alphabetic characters, then the longest sequence of contiguous alphabetic characters available is used (for example, two contiguous alphabetic characters or the first alphabetic character). If there are no alphabetic characters, or if there is no source field, these bytes are assigned spaces.

Diacritics are automatically removed from this element before further processing.

Innovative can configure the system to automatically assign spaces to these bytes.
76-77 | Publisher Name | 260 $b | The first two alphanumeric characters or spaces are used. If there are fewer than two alphanumeric characters or spaces, or if there is no source field, these bytes are assigned spaces.

Diacritics and non-alphanumeric characters (other than spaces) are automatically removed from this element before further processing.
78 | Type of '_' | Leader | If there is a '_' tagged leader in the record, this value is taken from the 10th absolute byte of the leader. If there is no '_' tagged leader but there is an 008 field that has leader information, this value is taken from the 50th absolute byte of the 008 field. If there is no '_' tagged leader or an 008 field that has leader information, a space is assigned to this byte.
79-98 | Title Part | 245 $p | The first 20 alphanumeric characters or spaces are used. If there are fewer than 20 alphanumeric characters or spaces, these bytes are right-padded with spaces. If there is no source field, these bytes are assigned spaces.

Non-alphanumeric characters (other than spaces) are automatically removed from this element before further processing.
99-109 | Title Number | 245 $n | The first ten alphanumeric characters or spaces are used. If there are fewer than ten alphanumeric characters or spaces, these bytes are right-padded with spaces. If there is no source field, these bytes are assigned spaces.

Non-alphanumeric characters (other than spaces) are automatically removed from this element before further processing.
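
The byte layout in the table, and the selection rules for the Pub. Year and Edition Statement elements, can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the system's actual code: all names (`LAYOUT`, `split_matchkey`, `pub_year`, `edition`) are hypothetical. The sketch assumes that positions correspond to characters, that a year is a run of exactly four digits, that the rightmost qualifying year wins when several qualify, that "alphabetic" means A-Z and a-z, and that short edition values are right-padded with spaces. Diacritic removal is not shown.

```python
import re
from typing import Dict, Optional

# Element name -> (first position, last position), inclusive, as given in the table.
LAYOUT = {
    "title": (0, 59), "gmd": (60, 64), "pub_year": (65, 68),
    "pagination": (69, 72), "edition": (73, 75), "publisher": (76, 77),
    "leader_byte": (78, 78), "title_part": (79, 98), "title_number": (99, 109),
}


def split_matchkey(key: str) -> Dict[str, str]:
    """Slice a 110-character matchkey into its named elements."""
    return {name: key[start:end + 1] for name, (start, end) in LAYOUT.items()}


def pub_year(field_260c: Optional[str]) -> str:
    """Pick a year from 260 $c, scanning candidates from right to left."""
    if not field_260c:
        return " " * 4
    # Only the digits 0-9 count as numeric; \d would also match non-Arabic digits.
    years = list(re.finditer(r"(?<![0-9])[0-9]{4}(?![0-9])", field_260c))
    years.reverse()  # right-to-left order
    for match in years:
        # Prefer a year that is not immediately preceded by 'c'.
        if match.start() == 0 or field_260c[match.start() - 1] != "c":
            return match.group()
    # Every candidate was preceded by 'c': fall back to the rightmost one, if any.
    return years[0].group() if years else " " * 4


def edition(field_250a: Optional[str]) -> str:
    """Pick up to three characters from 250 $a: digits first, then letters."""
    if not field_250a:
        return " " * 3
    for pattern in (r"[0-9]+", r"[A-Za-z]+"):
        runs = re.findall(pattern, field_250a)
        if not runs:
            continue
        for run in runs:
            if len(run) >= 3:
                return run[:3]  # first three contiguous characters
        # No run of three: use the longest run available, padded to three.
        return max(runs, key=len).ljust(3)
    return " " * 3
```

For example, under these assumptions `pub_year("c1999, 2005")` selects `2005`, and `edition("2nd ed.")` yields `2` followed by two spaces.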