Comparing Titles in Incoming and Existing Bibliographic Records
Innovative can configure your INN-Reach Catalog to compare the titles of incoming and master bibliographic records when matching on Primary Match Fields and/or Secondary Match Fields. To change whether or not the system performs title comparisons in matching, Central System Administrators must contact Innovative.
The title comparison uses data from the following subfields of the MARC 245 (TITLE) field:
Subfield | Description |
---|---|
245$a | Title |
245$b | Remainder of title |
245$n | Number of part/section of a work |
245$p | Name of part/section of a work |
To compare titles, the system normalizes the data in the MARC 245 fields from the incoming and existing records, and compares the normalized data as follows:
- Performs a strict comparison of title data. To do so, it concatenates and compares data from $a and $b:
Match? | System Action |
---|---|
Yes | Jumps to the subfield $n comparison. |
No | Jumps to the lenient title comparison. |
- Performs a lenient comparison of title data. To do so, it compares data from subfield $a only:
Match? | System Action |
---|---|
Yes | Jumps to the subfield $n comparison. |
No | Stops evaluating the records and identifies them as not a match. |
- Compares data for $n:
Match? | System Action |
---|---|
Yes | Jumps to the subfield $p comparison. |
No | Stops evaluating the records and identifies them as not a match. |
- Compares data for $p:
Match? | System Action |
---|---|
Yes | Continues evaluating the records as potential matches per additional criteria. If there are no additional evaluations to be performed on the records, the system identifies the records as a match. |
No | Stops evaluating the records and identifies them as not a match. |
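The comparison sequence above can be sketched in Python as follows. This is only an illustration, not Innovative's implementation: the `TitleKey` container and the `titles_match` function are hypothetical names, the inputs are assumed to be already-normalized word lists for each subfield, and an absent subfield is assumed to compare as an empty list. Multiple $p instances are collapsed into a single list here for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TitleKey:
    """Normalized words for each 245 subfield (hypothetical container)."""
    a: List[str] = field(default_factory=list)
    b: List[str] = field(default_factory=list)
    n: List[str] = field(default_factory=list)
    p: List[str] = field(default_factory=list)


def titles_match(incoming: TitleKey, existing: TitleKey) -> bool:
    """Return True if the title comparison passes, False if it stops as 'not a match'."""
    # Strict comparison: $a and $b concatenated.
    strict = incoming.a + incoming.b == existing.a + existing.b
    # Lenient comparison: $a only, consulted when the strict comparison fails.
    lenient = incoming.a == existing.a
    if not (strict or lenient):
        return False
    # $n comparison (assumption: a missing $n is an empty list on both sides).
    if incoming.n != existing.n:
        return False
    # $p comparison; passing here means any additional match criteria still apply.
    return incoming.p == existing.p
```

For instance, a record keyed as `TitleKey(a=["flor", "segu"], b=["bein"])` fails the strict comparison against `TitleKey(a=["flor", "segu"])` but passes the lenient one, so evaluation proceeds to $n and $p.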
Normalizing MARC 245 Subfield Data
To normalize data from the MARC 245 subfields, the system:
- Makes all characters in the data lower case.
- Strips punctuation.
- Strips initial English articles (for example, "a", "an", "the") based on the second indicator in the MARC 245 field.
- Elides spaces.
- Replaces space-slash-space sequences (that is, " / ") with a single space. For example, "Bright-Sided / How Positive Thinking Is Undermining America" would normalize to "bright-sided how positive thinking is undermining america".
- Replaces UTF-8 codes with the Western ASCII equivalents.
- Strips data within square brackets ([ ]).
- Extracts "words" from the remaining data. For the purposes of this comparison, a word is defined as the first four letters of a sequence of continuous nonspace characters.
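As a rough illustration of the steps above, the following Python sketch reduces one subfield value to its list of four-letter "words". It is not the system's actual code, and several details are assumptions: the function name `normalize_subfield` is hypothetical, the order of operations is chosen so that bracketed data is removed before punctuation is stripped, hyphens are kept because the "bright-sided" example keeps one, and the 245 second indicator is modeled as a simple count of leading characters to skip (`nonfiling`).

```python
import re
import string
import unicodedata
from typing import List

# Assumption: hyphens survive punctuation stripping, as in the "bright-sided" example.
_PUNCTUATION = string.punctuation.replace("-", "")
_STRIP_PUNCT = str.maketrans("", "", _PUNCTUATION)


def normalize_subfield(data: str, nonfiling: int = 0) -> List[str]:
    """Return the normalized four-letter words for one 245 subfield value."""
    text = data[nonfiling:]                   # skip the initial article per the 2nd indicator
    text = re.sub(r"\[[^\]]*\]", " ", text)   # strip data within square brackets
    text = text.replace(" / ", " ")           # space-slash-space becomes a single space
    # Fold accented characters to their closest ASCII equivalents.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = text.lower().translate(_STRIP_PUNCT)
    # split() collapses runs of whitespace; each word keeps its first four characters.
    return [token[:4] for token in text.split()]
```

Under these assumptions, `normalize_subfield("The Westing Game", nonfiling=4)` yields `["west", "game"]`, and `normalize_subfield("Dragon Slippers")` yields `["drag", "slip"]`.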
Depending on the subfield, the system extracts either a subset or all of the words identified, as follows:
Subfield | The system extracts |
---|---|
$a or $b | The first three words in the first instance of the subfield. |
$n | All words in the first instance of the subfield. |
$p | All words in the subfield. If multiple instances of the subfield are present in a single field, the system extracts and compares the words from the first instance. If that comparison results in a match, the system extracts and compares the words from the second instance. The system does not evaluate instances of any subfields subsequent to the second. |
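The per-subfield extraction rules above amount to two limits: how many words to keep and how many instances to consider. A minimal Python sketch, assuming each subfield instance has already been reduced to a list of four-letter words, might look like this. The names `WORD_LIMIT`, `INSTANCE_LIMIT`, and `select_words` are hypothetical.

```python
from typing import Dict, List, Optional

# Words kept per instance; None means "keep every word".
WORD_LIMIT: Dict[str, Optional[int]] = {"a": 3, "b": 3, "n": None, "p": None}
# Instances considered per subfield; only $p looks at a second instance.
INSTANCE_LIMIT: Dict[str, int] = {"a": 1, "b": 1, "n": 1, "p": 2}


def select_words(code: str, instances: List[List[str]]) -> List[List[str]]:
    """Return the word lists that take part in the comparison for one subfield code."""
    kept = instances[:INSTANCE_LIMIT[code]]
    # Slicing with None as the upper bound keeps the whole list.
    return [words[:WORD_LIMIT[code]] for words in kept]
```

For example, `select_words("p", [["huma", "and", "arts"], ["scie"], ["hist"]])` keeps only the first two instances, while `select_words("a", [["flor", "segu", "bein", "the"]])` keeps the first three words of the single instance.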
For example:
Original Subfield Data | Normalized Subfield Data |
---|---|
t245 |aDragon Slippers | drag slip |
t245 |aThe Westing Game | west game |
t245 |aMy One Hundred Adventures | my one hund |
t245 |aFlora Segunda: Being the Magickal Mishaps of a Girl of Spirit, Her Glass-Gazing Sidekick, Two Ominous Butlers (One Blue), a House with Eleven Thousand Rooms, and a Red Dog | flor segu bein |
t245 |aFlora Segunda:|bBeing the Magickal Mishaps of a Girl of Spirit, Her Glass-Gazing Sidekick, Two Ominous Butlers (One Blue), a House with Eleven Thousand Rooms, and a Red Dog | flor segu bein |
t245 |aLe gar{uU+00E7}on qui pouvait voler | le garc qui |
t245 |aLe garcon qui pouvait voler | le garc qui |
t245 |aDissertation abstracts.|nA|pThe humanities and arts|pThe sciences | subfield $a: diss abst; subfield $n: a; subfield $p (1st instance): huma and arts; subfield $p (2nd instance): scie |