Comparing Titles in Incoming and Existing Bibliographic Records

Innovative can configure your INN-Reach Catalog to compare the titles of incoming and master bibliographic records when matching on Primary Match Fields and/or Secondary Match Fields. To enable or disable title comparison in matching, Central System Administrators must contact Innovative.

The title comparison uses data from the following subfields of the MARC 245 (TITLE) field:

245$a	Title
245$b	Remainder of title
245$n	Number of part/section of a work
245$p	Name of part/section of a work

To compare titles, the system normalizes the data in the MARC 245 fields from the incoming and existing records, and compares the normalized data as follows:

Performs a strict comparison of title data. To do so, it concatenates and compares data from $a and $b:

Match?	System Action
Yes	Jumps to the subfield $n comparison.
No	Jumps to the lenient title comparison.

Performs a lenient comparison of title data. To do so, it compares data from subfield $a only:

Match?	System Action
Yes	Jumps to the subfield $n comparison.
No	Stops evaluating the records and identifies them as not a match.

Compares data for $n:

Match?	System Action
Yes	Jumps to the subfield $p comparison.
No	Stops evaluating the records and identifies them as not a match.

Compares data for $p:

Match?	System Action
Yes	Continues evaluating the records as potential matches per additional criteria. If there are no additional evaluations to be performed on the records, the system identifies the records as a match.
No	Stops evaluating the records and identifies them as not a match.
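
The decision sequence above can be sketched in Python as follows. This is an illustrative sketch only, not Innovative's implementation: the function name compare_titles and the dictionary layout (subfield code mapped to a list of extracted words, with "p" holding one list per instance) are hypothetical. The sketch also assumes that a subfield absent from both records counts as matching, which the description above does not state.

```python
from itertools import zip_longest


def compare_titles(incoming: dict, existing: dict) -> bool:
    """Return True if the records remain potential matches after the
    title comparison, False if they are identified as not a match.

    Each record is a dict such as
    {"a": ["diss", "abst"], "n": ["a"], "p": [["huma", "and", "arts"], ["scie"]]}.
    """
    def words(record, code):
        # Assumption: a missing subfield is treated as an empty word list.
        return record.get(code, [])

    # Strict comparison: concatenated $a and $b.
    strict = (words(incoming, "a") + words(incoming, "b")
              == words(existing, "a") + words(existing, "b"))

    # Lenient comparison ($a only) runs only when the strict one fails.
    if not strict and words(incoming, "a") != words(existing, "a"):
        return False

    # Subfield $n comparison.
    if words(incoming, "n") != words(existing, "n"):
        return False

    # Subfield $p comparison: first instance, then second; later ones ignored.
    pairs = zip_longest(words(incoming, "p")[:2], words(existing, "p")[:2],
                        fillvalue=[])
    for inc_p, ext_p in pairs:
        if inc_p != ext_p:
            return False

    # Still a potential match; other matching criteria may apply afterwards.
    return True
```

A True result here corresponds to the "Yes" row of the $p table: the records continue to any additional evaluations rather than being declared a match outright.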

Normalizing MARC 245 Subfield Data

To normalize data from the MARC 245 subfields, the system:

Makes all characters in the data lower case.
Strips punctuation.
Strips initial English articles (for example, "a", "an", "the") based on the second indicator in the MARC 245 field.
Elides spaces.
Replaces space-slash-space sequences (that is, " / ") with a single space. For example, "Bright-Sided / How Positive Thinking Is Undermining America" would normalize to "bright-sided how positive thinking is undermining america".
Replaces non-ASCII (UTF-8) characters with their Western ASCII equivalents (for example, "ç" becomes "c").
Strips data within square brackets ([ ]).

Extracts "words" from the remaining data. For the purposes of this comparison, a word is defined as the first four letters of a sequence of continuous nonspace characters.
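
As a rough illustration, the normalization and word-extraction steps might look like the Python sketch below. It is not the system's actual code. The function names and the nonfiling parameter are hypothetical. The sketch assumes that the second indicator gives the number of leading characters to skip (the MARC 21 nonfiling-characters convention), that bracketed data and " / " sequences are handled before punctuation is stripped so that those steps have something to act on, and that hyphens are kept, as in the "bright-sided" example above.

```python
import re
import unicodedata


def normalize_subfield(text: str, nonfiling: int = 0) -> str:
    """Approximate the normalization steps for one 245 subfield."""
    # Skip the initial article using the second indicator (assumed to be
    # a count of nonfiling characters, e.g. 4 for "The ").
    text = text[nonfiling:]
    text = text.lower()
    # Strip data within square brackets.
    text = re.sub(r"\[[^\]]*\]", " ", text)
    # Replace space-slash-space sequences with a single space.
    text = text.replace(" / ", " ")
    # Fold accented characters to ASCII (for example, "ç" becomes "c").
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Strip punctuation; hyphens are kept (assumption, see lead-in).
    text = re.sub(r"[^\w\s-]", "", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())


def extract_words(normalized: str) -> list:
    """A word is the first four characters of each nonspace sequence."""
    return [word[:4] for word in normalized.split()]
```

For instance, extract_words(normalize_subfield("The Westing Game", nonfiling=4)) gives the words "west" and "game".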

Depending on the subfield, the system extracts either a subset or all of the words identified, as follows:

Subfield:	The system extracts:
$a or $b	The first three words in the first instance of the subfield.
$n	All words in the first instance of the subfield.
$p	All words in the subfield. If multiple instances of the subfield are present in a single field, the system extracts and compares the words from the first instance. If that comparison results in a match, the system extracts and compares the words from the second instance. The system does not evaluate any instances of the subfield after the second.
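
The per-subfield limits in the table can be sketched in Python as shown below. This is a hypothetical helper, not the system's code: the name extract_for_comparison and the input format (a list of code and already-normalized text pairs, in field order) are assumptions. The sketch applies the three-word limit to $a and $b separately, which is one reading of the table; the system may instead apply the limit after concatenating the two subfields.

```python
def words(text: str) -> list:
    # A word is the first four characters of each nonspace sequence.
    return [w[:4] for w in text.split()]


def extract_for_comparison(subfields: list) -> dict:
    """Collect the words used for comparison from one 245 field.

    subfields: (code, normalized_text) pairs in field order, for example
    [("a", "dissertation abstracts"), ("n", "a"), ("p", "sciences")].
    """
    result = {"a": [], "b": [], "n": [], "p": []}
    seen = set()
    for code, text in subfields:
        if code in ("a", "b"):
            # First three words of the first instance only.
            if code not in seen:
                result[code] = words(text)[:3]
        elif code == "n":
            # All words of the first instance only.
            if code not in seen:
                result["n"] = words(text)
        elif code == "p":
            # All words, but only for the first two instances.
            if len(result["p"]) < 2:
                result["p"].append(words(text))
        seen.add(code)
    return result
```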

For example:

Original Subfield Data	Normalized Subfield Data
t245 |aDragon Slippers	drag slip
t245 |aThe Westing Game	west game
t245 |aMy One Hundred Adventures	my one hund
t245 |aFlora Segunda: Being the Magickal Mishaps of a Girl of Spirit, Her Glass-Gazing Sidekick, Two Ominous Butlers (One Blue), a House with Eleven Thousand Rooms, and a Red Dog	flor segu bein
t245 |aFlora Segunda:|bBeing the Magickal Mishaps of a Girl of Spirit, Her Glass-Gazing Sidekick, Two Ominous Butlers (One Blue), a House with Eleven Thousand Rooms, and a Red Dog	flor segu bein
t245 |aLe gar{uU+00E7}on qui pouvait voler	le garc qui
t245 |aLe garcon qui pouvait voler	le garc qui
t245 |aDissertation abstracts.|nA|pThe humanities and arts|pThe sciences	subfield $a: diss abst; subfield $n: a; subfield $p (1st instance): huma and arts; subfield $p (2nd instance): scie