More About Phrase Indexes
A phrase index is built from a specified string of text in a variable-length field. Each index entry contains the entire contents of the MARC field's specified subfields, in the order in which they appear in the field.
Each phrase index is identified by a one-letter abbreviation known as an "index tag." Index tags are not the same as field group tags.
Here are the standard system phrase indexes:
Index Tag | Index Name | Type of Record |
---|---|---|
a | Author | Bibliographic and authority |
b | Barcode | Item and patron |
c | Call no. | Bibliographic, item and holdings |
d | Subject | Bibliographic and authority |
g | Gov't Doc no. | Bibliographic and item |
i | Standard no. | Bibliographic |
k | Titlekey | Bibliographic, built on the first seven words of the title; used to find duplicates when keying new records |
m | Resource subject | Resource; used in the ERM module |
n | Name | Patron |
o | Control no. | Bibliographic |
p | Prof/TA | Course |
r | Course | Course |
t | Title | Bibliographic and authority |
u | Unique ID | Patron |
w or x | Keyword | Bibliographic, authority, holdings, order, vendor, invoice, resource, contact |
y | Resource title | Resource; used in the ERM module |
z | ARN | Authority |
For more about these indexes, see:
- The Heading Indexes: Author, Subject, and Title
- The Keyword Index: Title Sort Keys
- The Titlekey Index: Title Duplication Keys
- The Subject Index: Rotating Subject Headings
- Number Indexes
The Heading Indexes: Author, Subject, and Title
The author, subject, and title indexes are called "heading" indexes: each index entry consists of an entire heading, not individual words. When you search one of these indexes, you must enter the words of the search in the order in which they appear in the index entry. For example, a title search for "lady portrait" would not retrieve the record for Henry James' "Portrait of a Lady".
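A minimal sketch of why word order matters in a heading index. It assumes, for illustration only, that the index behaves like a sorted list of whole headings searched from the start of each entry; the function names are hypothetical and the system's actual search logic is not described here.

```python
import bisect


def build_heading_index(headings):
    """Each entry is one whole heading (lowercased here), kept in sorted order."""
    return sorted(h.lower() for h in headings)


def search_heading_index(index, query):
    """Return entries that begin with the query words, in the same order."""
    q = query.lower()
    start = bisect.bisect_left(index, q)  # first entry that could start with q
    results = []
    for entry in index[start:]:
        if not entry.startswith(q):  # entries sharing the prefix are contiguous
            break
        results.append(entry)
    return results


index = build_heading_index([
    "Portrait of a lady",
    "Portrait of the artist as a young man",
    "Lady Chatterley's lover",
])
print(search_heading_index(index, "portrait of a"))  # words in heading order
print(search_heading_index(index, "lady portrait"))  # words out of order
```

The first search finds the heading because its words match the start of the entry in order; the second finds nothing, because no heading begins with "lady portrait".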
Initial Articles in Titles
Initial articles in titles (as defined in MARC 21 records by the value of the second indicator in the 245 field) are not indexed in the title index. Punctuation marks are usually excluded from the title and subject indexes, but the library may choose to include certain ones, such as the '#' character, which might appear in titles to indicate a musical "sharp" (e.g., Quartet in C# Minor), or the '+' character, which might appear in the title or subject entry for a book on C++ programming.
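The following sketch shows how these two rules might combine when a title is put into index form. It is an illustration under stated assumptions: `nonfiling` stands for the character count from the 245 second indicator, `keep` is a hypothetical setting for the punctuation the library chooses to retain, and lowercasing and the handling of non-ASCII letters (simply dropped here) are simplifications.

```python
import re


def title_index_form(title: str, nonfiling: int = 0, keep: str = "") -> str:
    """Return a simplified index form of a title (illustrative only)."""
    # Skip the initial article, counted in characters as in the 245 2nd indicator.
    text = title[nonfiling:].lower()
    # Replace each run of characters other than letters, digits, and the
    # library's chosen punctuation with a single space.
    text = re.sub(rf"[^a-z0-9{re.escape(keep)}]+", " ", text)
    return " ".join(text.split())


print(title_index_form("The portrait of a lady", nonfiling=4))
print(title_index_form("C++ programming"))             # '+' not retained
print(title_index_form("C++ programming", keep="+#"))  # '+' retained
```

Without '+' in `keep`, "C++ programming" collapses to "c programming", which is why a library with many such titles might choose to index that character.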
The recommended fields and subfields to be included in the standard author index are:
Standard Author Index
MARC Tag | Indicator | Subfields |
---|---|---|
100 | all | abcdq |
110 | all | abcd |
111 | all | acdegq |
400 | 2nd ind. = 0 | abcd |
410 | 2nd ind. = 0 | abcde |
411 | 2nd ind. = 0 | acdegq |
700 | all | abcdq |
710 | all | abcde |
711 | all | acdegq |
800 | all | abcdeq |
810 | all | abcde |
811 | all | acdegq |
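To illustrate how a recommendation table like the one above drives a phrase index, here is a minimal sketch that applies the author index settings to a record. The record layout (tag, second indicator, list of subfield code/value pairs), the function name, and joining subfields with a single space are all assumptions. Each qualifying field yields one entry containing its selected subfields in field order, as described for phrase indexes.

```python
# Recommended author index settings from the table above:
# tag -> (required 2nd indicator, or None for "all"; subfields to include)
AUTHOR_INDEX_SPEC = {
    "100": (None, "abcdq"), "110": (None, "abcd"), "111": (None, "acdegq"),
    "400": ("0", "abcd"), "410": ("0", "abcde"), "411": ("0", "acdegq"),
    "700": (None, "abcdq"), "710": (None, "abcde"), "711": (None, "acdegq"),
    "800": (None, "abcdeq"), "810": (None, "abcde"), "811": (None, "acdegq"),
}


def phrase_index_entries(fields, spec=AUTHOR_INDEX_SPEC):
    """Build one index entry per qualifying field (hypothetical record layout)."""
    entries = []
    for tag, ind2, subfields in fields:
        rule = spec.get(tag)
        if rule is None:
            continue  # field not included in this index
        required_ind2, codes = rule
        if required_ind2 is not None and ind2 != required_ind2:
            continue  # e.g. 400 is indexed only when the 2nd indicator is 0
        # Keep the selected subfields, in the order they appear in the field.
        text = " ".join(value for code, value in subfields if code in codes)
        if text:
            entries.append(text)
    return entries


record = [
    ("100", " ", [("a", "James, Henry,"), ("d", "1843-1916."), ("e", "author.")]),
    ("245", "4", [("a", "The portrait of a lady")]),
    ("700", " ", [("a", "Smith, John,"), ("e", "editor.")]),
]
print(phrase_index_entries(record))
```

Subfield e is dropped from the 100 and 700 fields because it is not in their recommended subfield lists, and the 245 field produces no author entry.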
The recommended fields and subfields to be included in the standard subject index are:
Standard Subject Index
MARC Tag | 2nd Indicator | Subfields |
---|---|---|
600 | 0 and blank | all |
610 | 0 and blank | all |
630 | 0 and blank | all |
650 | 0 and blank | all |
651 | 0 and blank | all |
690 | 0 and blank | all |
691 | 0 and blank | all |
The recommended fields and subfields to be included in the standard title index are:
Standard Title Index
MARC Tag | Indicator | Subfields |
---|---|---|
100 | all | fglnoprstv |
110 | all | fkloprstv |
111 | all | fklpstv |
130 | all | all but h |
210 | all | all |
211 | all | all |
212 | all | all |
214 | all | all |
240 | all | all but h |
245 | all | all but h, c |
246 | all | all but h |
247 | all | all but h |
400 | 2nd ind. = 0 | tpv |
410 | 2nd ind. = 0 | tpv |
411 | 2nd ind. = 0 | tpv |
440 | all | all but x |
700 | all | fglmnoprstv |
710 | all | fklmoprstv |
711 | all | fklpstv |
730 | all | all but h, x |
740 | all | all but h, x |
800 | all | fglmnoprstv |
810 | all | fklmoprstv |
811 | all | fklpstv |
830 | all | all but h |
The Keyword Index: Title Sort Keys
Title Keys
The title sort key is sometimes referred to as simply the title key. The system uses a similar, though not identical, key for duplicate checking. This latter key is the title duplication key and its use is described in Duplicate Checking.
Each entry in the keyword index is assigned a seven-character title sort key. When the results of a keyword index search are displayed in a record browse screen sorted by title, the records appear in alphabetical order of their title sort keys. See Advanced Searching: Ranking Options for more information on the sort order of record browse displays.
The system derives the title sort key from the first subfield of the first Title field in the record by:
- Normalizing the Title field by removing non-filing words and replacing punctuation characters with spaces.
- Mapping any braced diacritics, other than those encoding CJK or Thai characters, to the form in which they are indexed.
- Taking the first five characters of the first word. If the first word has fewer than five characters, the system pads it with trailing spaces to five characters.
- Appending the first letter of the second and third words. If the title has only one or two words, a space is used in place of each missing word.
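The four steps above can be sketched in Python as follows. This is an illustration, not the system's code, and it rests on several assumptions: the caller passes only the text of the first subfield, `nonfiling` is the count of non-filing characters, the key is lowercased (as the examples suggest), and `BRACED_TO_INDEXED` is a small hypothetical excerpt of a diacritic mapping table. Braced codes missing from that table are simply dropped.

```python
import re

# Hypothetical excerpt of a braced-diacritic mapping table:
# the hex code inside {uXXXX} -> the letter used for indexing.
BRACED_TO_INDEXED = {"00C9": "e", "00E4": "a", "00E9": "e", "00FC": "u"}
BRACED = re.compile(r"\{u([0-9A-Fa-f]{4})\}")


def normalize_words(title: str, nonfiling: int = 0) -> list:
    text = title[nonfiling:]  # remove non-filing characters, e.g. "Les "
    # Map braced diacritics; unmapped codes are dropped (an assumption).
    text = BRACED.sub(lambda m: BRACED_TO_INDEXED.get(m.group(1).upper(), ""), text)
    # Lowercase and replace each run of punctuation with one space.
    text = re.sub(r"[^0-9a-z]+", " ", text.lower())
    return text.split()


def title_sort_key(title: str, nonfiling: int = 0) -> str:
    words = normalize_words(title, nonfiling)
    key = (words[0][:5] if words else "").ljust(5)  # first five characters, padded
    for word in words[1:3]:
        key += word[0]  # first letter of the second and third words
    return key.ljust(7)  # a space for each missing word


print(repr(title_sort_key("Handbook of clinical laboratory data.")))
print(repr(title_sort_key("Non-dispersive infra-red gas analysis")))
print(repr(title_sort_key("Les {u00C9}tats-Unis et la France", nonfiling=4)))
```

Printing with `repr` makes the padding visible: the second title yields `'non  di'`, with two spaces filling out the three-letter first word.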
Some examples are:
Title Sort Key Examples
Title Field | Title Sort Key |
---|---|
\|aHandbook of clinical laboratory data.\|cCo-editors: Willard R. Faulkner, John W. King [and] Henry C. Damm. | handboc |
\|aDas opernbuch :\|bein f{u00FC}hrer durch den | opernef |
\|aNon-dispersive infra-red gas analysis\|bin science, medicine, and industry\|c[by] D. W. Hill [and] T. Powell. | non  di |
\|aPhotosynthesis\|c[by] G.E. Fogg. | photo |
\|aA-Z of astronomy | a    zo |
\|a1001 questions answered about astronomy. | 1001 qa |
\|aA la m{u00E9}moire de Paul d'Estournelles | a    lm |
\|aPr{u00E9}cis de m{u00E9}canique rationnelle | precidm |
\|aLes {u00C9}tats-Unis et la France | etatsue |
Titles that consist entirely of braced diacritics, including those encoding CJK or Thai characters, are usually displayed in RightResult order, so sorting such records by title is not recommended.
The Titlekey Index: Title Duplication Keys
The title duplication key is formed from the first letter of each of the first seven words of the normalized form of the field.
If the field contains fewer than seven words, the key is made up of the first letter of each word except the last, followed by the letters of the last word, up to seven characters.
If there are not enough letters in the last word, asterisks are used to complete the key.
Initial articles (a, an, the) are excluded on the basis of the SKIP field in each bibliographic record.
When forming the title duplication key, the system eliminates subtitles and series numbering by stopping at the first colon, semicolon, or space-slash (" /").
Any braced diacritics, other than those encoding CJK or Thai characters, are mapped to the form in which they are indexed.
Only the first 49 characters of the field, and only alphabetic characters, are used to form the title duplication key. Ampersands are treated as the word "and".
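A minimal Python sketch of these rules, assuming `skip` holds the value of the record's SKIP field as a character count. The order in which the rules are applied (for example, truncating to 49 characters after mapping diacritics), treating non-letters as word breaks, and the small `BRACED_TO_INDEXED` table are assumptions made for illustration. The sketch returns `None` when no alphabetic text remains.

```python
import re
from typing import Optional

# Hypothetical excerpt of a braced-diacritic mapping table.
BRACED_TO_INDEXED = {"00C9": "e", "00E4": "a", "00E9": "e", "00FC": "u"}
BRACED = re.compile(r"\{u([0-9A-Fa-f]{4})\}")


def title_duplication_key(field: str, skip: int = 0) -> Optional[str]:
    text = field[skip:]  # drop the initial article counted by SKIP
    # Stop before a subtitle or series numbering.
    text = re.split(r":|;| /", text, maxsplit=1)[0]
    text = BRACED.sub(lambda m: BRACED_TO_INDEXED.get(m.group(1).upper(), ""), text)
    text = text[:49]  # only the first 49 characters are used
    text = text.replace("&", " and ")  # an ampersand counts as the word "and"
    # Only alphabetic characters are used; anything else separates words.
    words = re.sub(r"[^a-z]+", " ", text.lower()).split()[:7]
    if not words:
        return None
    # First letter of each word but the last, then the letters of the last word.
    key = "".join(word[0] for word in words[:-1]) + words[-1]
    return key[:7].ljust(7, "*")  # complete a short key with asterisks


print(title_duplication_key("Gone with the wind"))
print(title_duplication_key("River run"))
print(title_duplication_key("McGraw Hill series in Speech; 36"))
```

With seven or more words the last word kept is the seventh, so only its first letter survives the truncation to seven characters, which gives the same result as taking the first letter of each of the first seven words.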
Some examples are:
Title Duplication Key Examples
Title/Series Field | Title Duplication Key |
---|---|
The rise and fall of the Roman Empire | rafotre |
Gone with the wind | gwtwind |
Newsweek | newswee |
River run | rrun*** |
War within : from victorian to modernist thought | wwithin |
McGraw Hill series in Speech; 36 | mhsispe |
Sportparent / American Sport Education Program | sportpa |
Träumen (entered as Tr{u00E4}umen) | traumen |
If a title or series consists entirely of braced diacritics encoding CJK or Thai characters, no title duplication key is constructed; instead, the entire title is searched. For titles encoded in other character sets, no meaningful title duplication key can be constructed, so duplicate checking is not supported.
The Subject Index: Rotating Subject Headings
Subject headings can be "rotated." This means that a heading can be retrieved by matching any of the following subdivisions: title of work ('t'), general subdivision ('x'), period subdivision ('y'), or place subdivision ('z'). A subdivision is rotated only if its first character is not a number. For example, the heading
6510 SOCIETY OF FRIENDS|yCIVIL WAR, 1861-1865|zPENNSYLVANIA
will be rotated to both:
PENNSYLVANIA--SOCIETY OF FRIENDS--CIVIL WAR, 1861-1865
and
CIVIL WAR 1861-1865--SOCIETY OF FRIENDS--PENNSYLVANIA
Thus, this heading can be retrieved by entering "Society" or "Pennsylvania" or "Civil War". If the period subdivision (subfield y) in the above heading had been "|y1861-1865", then that subdivision would not have been rotated, since subdivisions are not rotated if the first character is a number. In that case, the heading would have been retrievable by entering "Society" or "Pennsylvania", but not "1861-1865."
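The rotation rule can be sketched as follows. In this illustration the unrotated heading is listed first, and each eligible subdivision then moves to the front while the remaining parts keep their original order, matching the two rotated forms shown above. The function name, the `(code, text)` input layout, and the `not_rotated` set (a stand-in for the list of frequently occurring subdivisions) are hypothetical, and punctuation normalization is not modeled, so the comma in "CIVIL WAR, 1861-1865" is kept.

```python
ROTATABLE_CODES = {"t", "x", "y", "z"}


def rotated_entries(main, subdivisions, not_rotated=frozenset()):
    """Return the heading plus one rotated form per eligible subdivision."""
    parts = [main] + [text for _, text in subdivisions]
    entries = ["--".join(parts)]  # the heading in its original order
    for i, (code, text) in enumerate(subdivisions, start=1):
        if code not in ROTATABLE_CODES:
            continue
        if text[:1].isdigit() or text.upper() in not_rotated:
            continue  # numeric or too-frequent subdivisions are not rotated
        others = parts[:i] + parts[i + 1:]
        entries.append("--".join([text] + others))
    return entries


for entry in rotated_entries(
    "SOCIETY OF FRIENDS",
    [("y", "CIVIL WAR, 1861-1865"), ("z", "PENNSYLVANIA")],
):
    print(entry)
```

If the period subdivision is "1861-1865" instead, only the unrotated heading and the PENNSYLVANIA rotation are produced.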
The advantage of rotated subject indexing is that it lets library staff make full use of the global update function, which can update rotated headings in batch.
Some subject subdivisions occur so frequently that they are not useful as access points, so they are not rotated. An alphabetical list of these subdivisions is provided.
To ensure rotated subject headings work on your system, do not use a BROWSE_d Web option.
Number Indexes
Call Number Index
During record loading, the system chooses a single call number from each record (from among all the call numbers in the record) and stores it in the call number field. Your library can specify which fields the system should check, and the order in which it should check them, to extract the call number to use.
You can extract call numbers from different fields for different library collections based on the library holding symbol. You can edit the Holding Symbol file with Advanced System Access & Administration. See Holding Symbol for more information.
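A minimal sketch of this selection logic, assuming a per-holding-symbol list of MARC tags checked in order. The holding symbols "MAIN" and "GOVDOC", the default list, and the tag orders are hypothetical configuration values; 050, 082, 086, and 090 are MARC tags commonly used for LC, Dewey, government document, and local LC-type call numbers.

```python
# Hypothetical configuration: holding symbol -> MARC tags to check, in order.
CALL_NUMBER_TAGS = {
    "MAIN": ["090", "050", "082"],
    "GOVDOC": ["086", "090", "050"],
}
DEFAULT_TAGS = ["050", "082"]


def choose_call_number(fields, holding_symbol):
    """Return the first call number found, checking tags in the configured order."""
    for tag in CALL_NUMBER_TAGS.get(holding_symbol, DEFAULT_TAGS):
        for field_tag, value in fields:
            if field_tag == tag:
                return value
    return None  # no configured call number field in the record


record = [("050", "QA76.9 .D3"), ("082", "005.74"), ("086", "Y 4.G 74/7:C 73")]
print(choose_call_number(record, "MAIN"))    # no 090, so the 050 is used
print(choose_call_number(record, "GOVDOC"))  # the 086 is checked first
```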
The resulting call number field may be indexed using one of the following:
- a character-by-character indexing scheme
- LC classification logic
- Dewey classification logic
- SUDOCS logic
- NLM logic
Before indexing, the data in the call number field is normalized.
The LC classification logic allows the system to keep a call number index for LC call numbers in true shelflist order with Cutter numbers always in a specified position. If your library always uses LC call numbers, you should select the LC classification logic. Dewey, SUDOCS, and NLM logic order the call number index based on the intricacies of those classification schemes.
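To show why a character-by-character scheme does not give shelflist order for LC call numbers, here is a simplified sketch of a classification-aware sort key. It is not the system's LC logic: it recognizes only class letters, a class number, and up to two Cutters, and it ignores dates, volume numbers, and other elements. Anything it cannot parse falls back to character order after all parsed call numbers.

```python
import re
from decimal import Decimal

# Simplified LC call number: class letters, class number, up to two Cutters.
LC_CALL_NUMBER = re.compile(
    r"([A-Z]{1,3})\s*(\d+(?:\.\d+)?)"  # class letters and number, e.g. QA76.9
    r"(?:\s*\.?([A-Z])(\d+))?"         # first Cutter, e.g. .B45
    r"(?:\s*\.?([A-Z])(\d+))?"         # optional second Cutter
)


def lc_sort_key(call_number: str):
    match = LC_CALL_NUMBER.match(call_number.strip().upper())
    if match is None:
        return (1, call_number.upper())  # unparsable: character order
    letters, number, *cutter_parts = match.groups()
    cutters = tuple(
        (letter, digits)  # digit strings compare like decimals: .B45 < .B5
        for letter, digits in zip(cutter_parts[0::2], cutter_parts[1::2])
        if letter is not None
    )
    # The class number is compared numerically, so QA9 sorts before QA76.
    return (0, letters, Decimal(number), cutters)


calls = ["QA76.9 .B45", "QA9 .C3", "QA76 .A1"]
print(sorted(calls))                   # character-by-character order
print(sorted(calls, key=lc_sort_key))  # simplified LC order
```

Character order places "QA9 .C3" last because "7" sorts before "9", while the class-number comparison puts it first, as on the shelf.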
A library that uses call numbers from different classification schemes can choose to have multiple call number indexes. In that case, when a call number is entered in a record, the system determines which indexing scheme to use based on the MARC tag of the call number field.
Government Docs Number Index
The recommended fields and subfields to be included in the standard SUDOCS index are:
Standard SUDOCS Index
MARC Tag | Indicator | Subfields |
---|---|---|
086 | all | a |
Standard Number Index
Your library can choose to index ISBNs and ISSNs without punctuation. For example, the ISSN 0148-8759 would be indexed as 01488759 and could be retrieved by searching for "01488759", "0148-8759", or "0148 8759".
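A minimal sketch of this normalization, assuming the same function is applied both to the stored number and to the search term so that every punctuated form matches. The function name is hypothetical, and the sketch assumes the input contains only the number (qualifiers such as "(pbk.)" are not handled); it keeps digits and the check character X.

```python
import re


def normalize_standard_number(value: str) -> str:
    """Strip hyphens, spaces, and other punctuation from an ISBN or ISSN."""
    return re.sub(r"[^0-9X]", "", value.upper())


indexed = normalize_standard_number("0148-8759")
for query in ["01488759", "0148-8759", "0148 8759"]:
    print(query, normalize_standard_number(query) == indexed)
```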
The recommended fields and subfields to be included in the standard ISBN/ISSN index are:
Standard ISBN/ISSN Index
MARC Tag | Indicator | Subfields |
---|---|---|
020 | all | a |
022 | all | a |
028 | all | all |
Bib Utility or Control Number Index
The recommended fields and subfields to be included in the standard OCLC # index are:
Standard OCLC # Index
MARC Tag | Indicator | Subfields |
---|---|---|
001 | n/a | all |