Non-Roman Script Materials in North American Libraries: Automation and International Exchange

John Eilts
Research Libraries Group, Mountain View, California, USA

The cataloging of nonroman script materials in North America before automation was accomplished in many very divergent ways. The physical limitations of the early printed book catalog format were probably the factors most instrumental in perpetuating differences in practice among the research libraries. Eventually the major research libraries settled into two primary methods and one hybrid method, whether they used the book catalog format or, later, the revolutionary and ubiquitous card format. The card format gained significantly as the preferred method after the Library of Congress introduced printed catalog cards at affordable prices in 1901.

The methods used to represent nonroman scripts were often dictated by local conditions, primarily the staff expertise and equipment available. The first method was to record the bibliographic information in the original script. In this model (Figure A) the native script was used for the entire cataloging record. Some research libraries even tried to maintain subject access in the native language.

Figure A.

In the second model (Figure B) the entire record was romanized (transcribed in Latin letters representing either the closest pronunciation of the language for an English speaker, or an attempt to represent the written characters). This method was used to maintain a unified catalog, since the first method required a separate catalog sequence for each script or language. It also required neither special typographic equipment nor specialized staff to maintain separate files in each language.

Romanization was problematic in languages where certain characters (e.g., vowels in the Semitic languages) were often not written, and in ideographic scripts such as Chinese. A virtual "Tower of Babel" of romanization systems was in use in the early part of this century. It was not until the latter half of the century that the North American library community published standards for romanization.
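For a fully written alphabetic script such as Cyrillic, romanization reduces to a character-by-character table lookup. The Python sketch below assumes a partial version of the ALA-LC table for Russian; the table name `RUSSIAN_TO_LATIN` and the function `romanize` are hypothetical, and several letters are omitted. The same approach breaks down in the cases named above: an Arabic or Hebrew text without written vowels gives the table nothing from which to produce them, and the reading of a Chinese character is not recoverable from its written form letter by letter.

```python
# Partial ALA-LC romanization table for Russian (lowercase keys only).
# Illustrative subset: ё and the pre-1918 letters are omitted.
RUSSIAN_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "\u012d",  # ĭ
    "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p",
    "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh",
    "ц": "t\u0361s",  # t͡s, with a ligature tie
    "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "\u02ba",    # ʺ hard sign
    "ы": "y",
    "ь": "\u02b9",    # ʹ soft sign
    "э": "\u0117",    # ė
    "ю": "i\u0361u",  # i͡u
    "я": "i\u0361a",  # i͡a
}


def romanize(text: str) -> str:
    """Transliterate letter by letter; characters not in the table pass through."""
    out = []
    for ch in text:
        latin = RUSSIAN_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # spaces, digits, punctuation, Latin letters
        elif ch.isupper():
            # Capitalize only the first output letter: Щ -> Shch, not SHCH.
            out.append(latin[:1].upper() + latin[1:])
        else:
            out.append(latin)
    return "".join(out)


print(romanize("Москва"))  # Moskva
print(romanize("Чехов"))   # Chekhov
```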

Figure B.

Avicenna, 980-1037.
    al-Qānūn fī al-ṭibb / taʼlīf Abū ʻAlī al-Ḥusayn ibn ʻAlī ibn Sīnā ;
 taḥqīq wa-taʻlīq Saʻīd al-Laḥḥām. -- Bayrūt, Lubnān : Dār al-Fikr, 1994.
    4 v. ; 25 cm.
    Includes bibliographical references.

    1. Medicine, Arab--Early works to 1800. I. Laḥḥām, Saʻīd. II. Title.

The third model (Figure C) is a mixture of scripts. This was by far the most prevalent method used in the card catalog. It had the advantage of allowing all languages to be filed in one dictionary sequence. This is the format the Library of Congress used in its own printed catalog cards, which were distributed throughout the world.

Figure C.

Averroës, 1126-1198.
  [Peraḳim mi-tokh "ha-Beʼur ha-emtsaʻi"]
  42 p. ; 25 cm.

  1. Aristotle. Physics. 2. Aristotle. Metaphysics. I. Kalonymus ben Kalonymus,
 b. 1286. II. Sermoneta, Giuseppe. III. Ravitzky, Aviezer. IV. Title.                   
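Because the headings in the mixed model are romanized, every card can be filed by its Latin-script heading in a single sequence, whatever the script of the body. The Python sketch below computes a filing key under a simplifying assumption loosely modeled on library filing practice, not on any specific filing code: diacritics, and the modifier letters used for alif, ayn, and the Cyrillic hard and soft signs, are ignored and case is folded. The names `filing_key` and `IGNORED_IN_FILING` are hypothetical; the sample headings come from Figures B and C.

```python
import unicodedata

# Spacing modifier letters used in ALA-LC romanization for alif (ʼ), ayn (ʻ),
# and the Cyrillic hard and soft signs (ʺ, ʹ). Ignoring them in filing is an
# assumption of this sketch.
IGNORED_IN_FILING = {"\u02bc", "\u02bb", "\u02ba", "\u02b9"}


def filing_key(heading: str) -> str:
    """Simplified filing key: strip diacritics and modifier letters, fold case."""
    decomposed = unicodedata.normalize("NFD", heading)
    kept = (
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch not in IGNORED_IN_FILING
    )
    return "".join(kept).casefold()


headings = [
    "Avicenna, 980-1037",
    "Averroës, 1126-1198",
    "Laḥḥām, Saʻīd",
    "Kalonymus ben Kalonymus, b. 1286",
    "Aristotle",
]
for heading in sorted(headings, key=filing_key):
    print(heading)
```

The headings print as one alphabetical sequence: Aristotle, Averroës, Avicenna, Kalonymus, Laḥḥām.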

A fourth model (Figure D) was used by some libraries with much smaller collections in nonroman scripts. These libraries generally represented the materials with translated bibliographic information. This made processing easier and did not necessarily require special expertise on the staff. Often a library patron who knew the language was asked to provide a translation, which may or may not have been accurate and was limited by the patron's knowledge of his own language and of English.

Figure D.

       [Wisdom of Confucius.  Taiwan, 1977]
       63 p. ; 23 cm.
        In Chinese.
        1. Philosophy, Chinese. 2. Ethics, Chinese.

It is fortunate that, through all the stages in the development of the Anglo-American Cataloging Rules, libraries processing nonroman script materials tended to follow the mainstream rather than invent a separate set of rules. The use of recognized standards for processing nonroman script materials has ensured that these materials will be in the mainstream as automated systems develop the capability to handle the various scripts.

When the computer came into the library world in the 1960's, it was used primarily as a tool to automate the time-consuming and error-prone manual production of the unit card and the over-typing of added entries. This was a tremendous labor-saving measure, and it had the by-product of retaining the record in electronic form, which made the distribution of cataloging information relatively easy.
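The manual step being automated can be stated precisely: one unit card is duplicated for every access point, and each duplicate has its added-entry heading typed above the main entry. The Python sketch below models that step for a plain-text card; `UnitRecord` and `card_set` are hypothetical names, and card layout details such as indentation and spacing are ignored. The sample data is taken from Figure C.

```python
from dataclasses import dataclass, field


@dataclass
class UnitRecord:
    """Hypothetical model of a unit card: main entry, body, and tracings."""
    main_entry: str
    body: list[str]  # title, imprint, collation, notes
    added_entries: list[str] = field(default_factory=list)


def card_set(record: UnitRecord) -> list[str]:
    """Produce the main-entry card plus one copy per added entry, with the
    added-entry heading 'over-typed' on the line above the main entry."""
    unit_card = "\n".join([record.main_entry, *record.body])
    cards = [unit_card]
    for heading in record.added_entries:
        cards.append(heading + "\n" + unit_card)
    return cards


record = UnitRecord(
    main_entry="Averroës, 1126-1198.",
    body=["42 p. ; 25 cm."],
    added_entries=[
        "Aristotle. Physics.",
        "Aristotle. Metaphysics.",
        "Sermoneta, Giuseppe.",
        "Ravitzky, Aviezer.",
    ],
)
for card in card_set(record):
    print(card)
    print("-" * 20)
```

Four added entries yield five cards from one keyed record, which is where the labor saving came from.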

The Library of Congress began to use the USMARC format and to make copies of its records available in machine-readable form in the 1960's. This fostered the development of bibliographic networks. Although the Library of Congress was initially the major source of records, it rapidly became apparent that networking also allowed cataloging information to be shared among all users of the same network.

The earliest implementations of USMARC supported only basic ASCII (American Standard Code for Information Interchange--the Latin letters A-Z, numbers, and some punctuation) and ANSEL (American National Standard for Extended Latin--additional letters and modifiers needed to represent other European languages and to romanize nonroman script languages).

For this reason, libraries needing to automate the processing of nonroman script materials had either to create records in which the bibliographic data was fully romanized or to continue with increasingly expensive manual processes.
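Two points about this character repertoire can be shown concretely. First, a fully romanized heading such as "Bayrūt" already falls outside plain ASCII and needs the extended Latin modifiers. Second, ANSEL records a nonspacing diacritic before the letter it modifies, whereas Unicode places combining marks after it. The Python sketch below (function names hypothetical) checks the first point and reorders decomposed Unicode text into ANSEL's sequence. It does not emit ANSEL byte values, which would require the full code table. Spacing characters such as ayn (ʻ) are not combining marks and pass through unchanged.

```python
import unicodedata


def needs_extended_latin(text: str) -> bool:
    """True if the text cannot be written in plain ASCII alone."""
    return any(ord(ch) > 127 for ch in text)


def to_ansel_order(text: str) -> str:
    """Decompose text and move each run of nonspacing diacritics in front of
    the base letter it modifies, as ANSEL does (Unicode places them after)."""
    out: list[str] = []
    pending: list[str] = []  # diacritics following the current base letter
    base = None
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            pending.append(ch)
            continue
        out.extend(pending)   # diacritics go first...
        if base is not None:
            out.append(base)  # ...then the letter they modify
        base, pending = ch, []
    out.extend(pending)
    if base is not None:
        out.append(base)
    return "".join(out)


print(needs_extended_latin("Bayrut"))  # False
print(needs_extended_latin("Bayrūt"))  # True
print([hex(ord(c)) for c in to_ansel_order("ū")])  # ['0x304', '0x75']
```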

It was not until the 1980's that MARBI (the American Library Association joint committee responsible for developing and maintaining the USMARC format) added extensions to USMARC to accommodate nonroman scripts in bibliographic records. The nonroman script data were added to the basic MARC record as “alternate graphic representations” in multiple occurrences of field 880 (Figure E).

Figure E.

The enabling format changes were followed by the creation of the RLIN East Asian Character Code (REACC--a unified code set for automating the Chinese, Japanese, and Korean (CJK) scripts, which was later adopted as an American National Standard and renamed EACC). In 1983 the Research Libraries Group (RLG) introduced CJK processing in its RLIN network, and OCLC implemented its CJK system in 1986. This dual-network implementation has allowed the Library of Congress's cataloging to be distributed in electronic form and, even more importantly, CJK script data to be exchanged between the two major North American bibliographic networks.

Since then RLG has implemented support for the Cyrillic script (1986), the Hebraic script (1988), and the Arabic script (1992). Records created by the Library of Congress in CJK, Hebraic, and Arabic scripts are currently available for distribution worldwide through the Cataloging Distribution Service.

The model adopted for nonroman scripts in the automated environment was the mixed model shown in Figure C, above, with the added requirement that the complete bibliographic description also be romanized. Libraries participating in the bibliographic networks are able to process their own materials in the original script as well as use the data created by others.
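In a record following this model, each romanized field and its vernacular counterpart in field 880 point to each other through subfield $6 (linkage). For example, the romanized 245 carries $6 880-01, and the 880 carries $6 245-01/(3/r, which names the linked tag, an occurrence number, a script identification code (here Arabic), and right-to-left orientation. The Python sketch below pairs the two fields. The `Field` class and function names are hypothetical, only a few script codes are listed, and the codes follow current MARC 21 documentation rather than any particular system described here. The sample title is the one from Figure B.

```python
from dataclasses import dataclass
from typing import Optional

# A few MARC script identification codes used in subfield $6 of field 880.
SCRIPT_CODES = {"(3": "Arabic", "(2": "Hebrew", "(N": "Cyrillic", "$1": "CJK"}


@dataclass
class Field:
    tag: str
    subfields: list[tuple[str, str]]  # (code, value) pairs in record order

    def get(self, code: str) -> Optional[str]:
        for c, value in self.subfields:
            if c == code:
                return value
        return None


def parse_linkage(value: str) -> dict:
    """Split a $6 value such as '245-01/(3/r' or '880-01' into its parts."""
    parts = value.split("/")
    tag, occurrence = parts[0].split("-")
    return {
        "tag": tag,
        "occurrence": occurrence,
        "script": SCRIPT_CODES.get(parts[1]) if len(parts) > 1 else None,
        "right_to_left": len(parts) > 2 and parts[2] == "r",
    }


def pair_fields(fields: list[Field]) -> list[tuple[Field, Field]]:
    """Match each romanized field with the 880 holding its vernacular form."""
    vernacular = {}
    for f in fields:
        link = f.get("6")
        if f.tag == "880" and link:
            parsed = parse_linkage(link)
            vernacular[(parsed["tag"], parsed["occurrence"])] = f
    pairs = []
    for f in fields:
        link = f.get("6")
        if f.tag != "880" and link:
            match = vernacular.get((f.tag, parse_linkage(link)["occurrence"]))
            if match is not None:
                pairs.append((f, match))
    return pairs


record = [
    Field("245", [("6", "880-01"), ("a", "al-Qānūn fī al-ṭibb /")]),
    Field("880", [("6", "245-01/(3/r"), ("a", "القانون في الطب /")]),
]
for roman, vern in pair_fields(record):
    print(roman.get("a"), "<->", vern.get("a"))
```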

After decades of building a complex of standards for descriptive cataloging, subject analysis, classification, machine-readable encoding, romanization, character sets, etc., libraries in the 1990's are now looking at simplifying the cataloging process. Cataloging materials to the full extent of these standards has become an extremely expensive operation. Libraries have experienced funding problems for the past two decades and have already made the fullest possible use of cataloging copy from other sources. They have now turned their attention to minimizing the amount of data required, in order to maximize the productivity of their shrinking cataloging staff.

To address the problems faced by North American libraries, the Library of Congress formed the Cooperative Cataloging Council (CCC) in January 1993 to develop a strategic plan for the national cooperative cataloging program. The five main goals of the program are:

  1. Increase timely availability of bibliographic and authority records in a more cost-effective manner;
  2. Develop and maintain mutually acceptable standards for records;
  3. Promote timely access and cost-effectiveness in cataloging and expand pool of catalogers using the mutually accepted standards;
  4. Increase the sharing and use of foreign bibliographic and authority records;
  5. Provide for ongoing planning and operations among participants to further the mission.

In the spring of 1994, following the successful report of the Standards Task Group, which developed standards for a "core level" bibliographic record, the CCC appointed a task group to apply these principles to the JACKPHY (Japanese, Arabic, Chinese, Korean, Persian, Hebrew, Yiddish) languages. In addition to defining the necessary script elements for a core record (the Latin script elements had been defined by the Standards Task Group), this group identified a fundamental problem: the lack of vernacular script data in authority files. The issue was not establishing headings in the vernacular scripts but including vernacular script forms in the authority record.

It was generally agreed that the authoritative form in a North American library is the one established in the Anglo-American Name Authority File. The USMARC format already provides for including nonroman script data in authority records as alternate graphic representations, in much the same manner as in the bibliographic formats, but this provision has yet to be implemented. This is partly due to the integrated nature of authority record creation in North America. Authority records are created at the Library of Congress in its automated system (MUMS), and also by research libraries in the network of their choice--either OCLC or RLIN. These systems regularly send new and updated headings to one another for inclusion in the other databases, and are never more than 24 hours out of synchronization. However, the three systems do not all have the same nonroman script capabilities; in fact, MUMS does not currently allow the creation or display of any nonroman scripts.

The challenge now facing North American libraries in nonroman script cataloging is the international exchange of data. North American libraries have developed standards for cataloging, character sets, data formats, subject analysis, classification, etc. The task ahead is to develop the programs needed to translate these formats to and from those in use in the countries where the scripts are native, and, in areas where direct translation of formats would cause a loss of data, to begin discussions and negotiations with the concerned parties so that all can benefit from the international exchange of bibliographic data.
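One way to make "a loss of data" concrete is to run a translation table over a record and report every field that has no target. The Python sketch below does this at the tag level only. It uses UNIMARC as an illustrative target format, which is an assumption, since the paper names no specific foreign format. The table covers a handful of descriptive fields, and `translate` is a hypothetical name. A real conversion would also map indicators and subfields. Here field 880 simply has no entry in the toy table, so it is reported as unmapped.

```python
# Illustrative tag correspondences from USMARC to UNIMARC for a few
# descriptive fields. A real conversion also maps indicators and subfields.
USMARC_TO_UNIMARC = {
    "020": "010",  # ISBN
    "245": "200",  # title and statement of responsibility
    "250": "205",  # edition statement
    "260": "210",  # publication, distribution, etc.
    "300": "215",  # physical description
}


def translate(
    fields: list[tuple[str, str]], tag_map: dict[str, str]
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Convert (tag, data) pairs; return the converted fields and the fields
    with no mapping, i.e. the data a direct translation would lose."""
    converted, unmapped = [], []
    for tag, data in fields:
        target = tag_map.get(tag)
        if target is None:
            unmapped.append((tag, data))
        else:
            converted.append((target, data))
    return converted, unmapped


source = [
    ("245", "al-Qānūn fī al-ṭibb"),
    ("880", "القانون في الطب"),
    ("300", "4 v. ; 25 cm."),
]
converted, unmapped = translate(source, USMARC_TO_UNIMARC)
print(converted)
print(unmapped)  # the 880 field has no entry in this toy map
```

The unmapped list is the point where, as the paragraph above suggests, negotiation with the other party would replace mechanical conversion.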