Comparing Titles in Incoming and Existing Bibliographic Records

Innovative can configure your INN-Reach Catalog to compare the titles of incoming and master bibliographic records when matching on Primary Match Fields, Secondary Match Fields, or both. To enable or disable title comparison in matching, Central System Administrators must contact Innovative.

The title comparison uses data from the following subfields of the MARC 245 (TITLE) field:

245$a Title
245$b Remainder of title
245$n Number of part/section of a work
245$p Name of part/section of a work

To compare titles, the system normalizes the data in the MARC 245 fields from the incoming and existing records, and compares the normalized data as follows:

  1. Performs a strict comparison of title data. To do so, it concatenates and compares data from $a and $b:
    Match?  System Action
    Yes     Jumps to the subfield $n comparison.
    No      Jumps to the lenient title comparison.
  2. Performs a lenient comparison of title data. To do so, it compares data from subfield $a only:
    Match?  System Action
    Yes     Jumps to the subfield $n comparison.
    No      Stops evaluating the records and identifies them as not a match.
  3. Compares data for $n:
    Match?  System Action
    Yes     Jumps to the subfield $p comparison.
    No      Stops evaluating the records and identifies them as not a match.
  4. Compares data for $p:
    Match?  System Action
    Yes     Continues evaluating the records as potential matches per additional criteria. If there are no additional evaluations to be performed on the records, the system identifies the records as a match.
    No      Stops evaluating the records and identifies them as not a match.
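
The four comparison steps above can be sketched in code. This is a minimal illustration, not Innovative's implementation: the function name titles_match, the dictionary layout (one already-normalized string per subfield code), and the treatment of a missing subfield as an empty string are all assumptions made for the example.

```python
from typing import Mapping


def titles_match(incoming: Mapping[str, str], existing: Mapping[str, str]) -> bool:
    """Return True if the title comparison passes (hypothetical helper).

    Each record maps a subfield code ("a", "b", "n", "p") to data that
    has already been normalized.
    """

    def sub(record: Mapping[str, str], code: str) -> str:
        # Assumption: a missing subfield compares as an empty string.
        return record.get(code, "")

    def title(record: Mapping[str, str]) -> str:
        # Concatenate $a and $b for the strict comparison.
        return " ".join(part for part in (sub(record, "a"), sub(record, "b")) if part)

    # Step 1: strict comparison of $a + $b.
    if title(incoming) != title(existing):
        # Step 2: lenient comparison of $a only.
        if sub(incoming, "a") != sub(existing, "a"):
            return False
    # Step 3: compare $n.
    if sub(incoming, "n") != sub(existing, "n"):
        return False
    # Step 4: compare $p.
    return sub(incoming, "p") == sub(existing, "p")


# The strict comparison fails ($b differs) but the lenient one succeeds.
print(titles_match({"a": "flor segu", "b": "bein the magi"}, {"a": "flor segu"}))
```

A True result here only means the title comparison did not rule the records out; as step 4 notes, additional matching criteria may still apply.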

Normalizing MARC 245 Subfield Data

To normalize data from the MARC 245 subfields, the system:

  1. Makes all characters in the data lower case.
  2. Strips punctuation.
  3. Strips initial English articles (for example, "a", "an", "the") based on the second indicator in the MARC 245 field.
  4. Elides spaces.
  5. Replaces space-slash-space sequences (that is, " / ") with a single space. For example, "Bright-Sided / How Positive Thinking Is Undermining America" would normalize to "bright-sided how positive thinking is undermining america".
  6. Replaces non-ASCII (UTF-8) characters with their closest Western ASCII equivalents (for example, "ç" becomes "c").
  7. Strips data within square brackets ([ ]).
  8. Extracts "words" from the remaining data. For the purposes of this comparison, a word is defined as the first four characters of a contiguous sequence of non-space characters. A sequence shorter than four characters is kept whole.

    Depending on the subfield, the system extracts either a subset or all of the words identified, as follows:

    Subfield   The system extracts
    $a or $b   The first three words in the first instance of the subfield.
    $n         All words in the first instance of the subfield.
    $p         All words in the subfield. If multiple instances of the subfield are present in a single field, the system extracts and compares the words from the first instance. If that comparison results in a match, the system extracts and compares the words from the second instance. The system does not evaluate instances of any subfields subsequent to the second.
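
The normalization steps and the word extraction rule above can be sketched as follows. This is an illustration under stated assumptions rather than the system's actual code. The function names normalize and extract_words are hypothetical. The steps are applied in an order that is convenient for code (for instance, bracketed data is removed before punctuation is stripped, so the brackets are still there to find). "Elides spaces" is interpreted as collapsing runs of whitespace. Hyphens are kept, to agree with the "bright-sided" example in step 5. The article is skipped by treating the MARC 245 second indicator as a count of nonfiling characters, which is how MARC 21 defines it.

```python
import re
import unicodedata
from typing import List, Optional


def normalize(text: str, nonfiling: int = 0) -> str:
    """Return normalized subfield data (hypothetical helper, not vendor code)."""
    # Skip the initial article: the 245 second indicator gives the number
    # of nonfiling characters at the start of the title.
    text = text[nonfiling:]
    # Strip data within square brackets, brackets included.
    text = re.sub(r"\[[^\]]*\]", " ", text)
    # Replace space-slash-space with a single space.
    text = text.replace(" / ", " ")
    # Fold accented characters to ASCII (assumed approach: NFKD decomposition,
    # then dropping anything that is still not ASCII).
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.lower()
    # Strip punctuation; hyphens are kept (assumption, see lead-in).
    text = re.sub(r"[^a-z0-9\s-]", "", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())


def extract_words(text: str, limit: Optional[int] = None) -> List[str]:
    """Return the first four characters of each space-separated sequence."""
    words = [token[:4] for token in text.split()]
    return words if limit is None else words[:limit]


print(normalize("Bright-Sided / How Positive Thinking Is Undermining America"))
# bright-sided how positive thinking is undermining america

# $a and $b use the first three words; $n and $p use all of them (limit=None).
print(extract_words(normalize("The Westing Game", nonfiling=4), limit=3))
# ['west', 'game']
```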

For example:

Original Subfield Data: t245 |aDragon Slippers
Normalized Subfield Data: drag slip

Original Subfield Data: t245 |aThe Westing Game
Normalized Subfield Data: west game

Original Subfield Data: t245 |aMy One Hundred Adventures
Normalized Subfield Data: my one hund

Original Subfield Data: t245 |aFlora Segunda: Being the Magickal Mishaps of a Girl of Spirit, Her Glass-Gazing Sidekick, Two Ominous Butlers (One Blue), a House with Eleven Thousand Rooms, and a Red Dog
Normalized Subfield Data: flor segu bein

Original Subfield Data: t245 |aFlora Segunda:|bBeing the Magickal Mishaps of a Girl of Spirit, Her Glass-Gazing Sidekick, Two Ominous Butlers (One Blue), a House with Eleven Thousand Rooms, and a Red Dog
Normalized Subfield Data: flor segu bein

Original Subfield Data: t245 |aLe garçon qui pouvait voler
Normalized Subfield Data: le garc qui

Original Subfield Data: t245 |aLe garcon qui pouvait voler
Normalized Subfield Data: le garc qui

Original Subfield Data: t245 |aDissertation abstracts.|nA|pThe humanities and arts|pThe sciences
Normalized Subfield Data:
    subfield $a: diss abst
    subfield $n: a
    subfield $p (1st instance): huma and arts
    subfield $p (2nd instance): scie
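
To show how the instance rules apply to a field such as the last example above, the following sketch splits the pipe-delimited notation into subfields and selects the instances the system would evaluate. The function names parse_subfields and instances are hypothetical, and the parser assumes the display notation used in these examples (a field label, then "|" followed by a one-character subfield code). It does not handle data that itself contains a "|" character.

```python
from typing import List, Tuple


def parse_subfields(field: str) -> List[Tuple[str, str]]:
    """Split 't245 |aTitle|bRest' into (code, data) pairs (hypothetical helper)."""
    # Everything before the first "|" is the field label, for example "t245".
    _, _, body = field.partition("|")
    return [(chunk[0], chunk[1:].strip()) for chunk in body.split("|") if chunk]


def instances(pairs: List[Tuple[str, str]], code: str, limit: int) -> List[str]:
    """Return at most `limit` instances of one subfield, in field order."""
    values = [data for found, data in pairs if found == code]
    return values[:limit]


field = "t245 |aDissertation abstracts.|nA|pThe humanities and arts|pThe sciences"
pairs = parse_subfields(field)

# $a, $b, and $n: only the first instance is used.
print(instances(pairs, "a", 1))  # ['Dissertation abstracts.']
print(instances(pairs, "n", 1))  # ['A']
# $p: the first two instances are evaluated; any later ones are ignored.
print(instances(pairs, "p", 2))  # ['The humanities and arts', 'The sciences']
```

The selected data would still need to be normalized and reduced to words before comparison.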