65th IFLA Council and General Conference
August 20 - August 28, 1999
Code Number: 013-117_E
Division Number: IV
Professional Group: Classification and Indexing
Joint Meeting with: -
Meeting Number: 117
Simultaneous Interpretation: No
The Preparation of an Index for the Chinese DDC 21: Issues and Approaches
Editorial Board, Chinese Library Classification
1. Selection of Index Type
Generally speaking, the indexes of the world's library classification schemes can be divided into the following three types:
- Relative index: The indexes of both DDC 21 and CLC belong to this type.
- Chain index: The indexes of the Colon Classification (India) and the Bliss Bibliographic Classification (U.K.) belong to this type.
- Thesaurus-type index: The indexes of some special subject schedules of Universal Decimal Classification (UDC) belong to this type. The indexes include not only captions and notations of classes, but also UF (used for), NT (narrower term), BT (broader term) and RT (related term) which are commonly found in thesauri. Actually, this type of index has already become a kind of thesaurus.
Any indexing method, especially computer-generated ones such as KWIC (keyword in context), KWOC (keyword out of context), and PRECIS (Preserved Context Index System), can be used to prepare a classification index. Which type of index should be prepared for the Chinese DDC? A simple option is to prepare a relative index, as DDC and CLC do. However, a relative index has some shortcomings. First, it is difficult to generate such an index automatically. The process requires a great deal of human involvement and is therefore both time-consuming and resource-consuming. Second, the specificity of the entries in a relative index is limited. Many compound terms that express complicated concepts might not be fully rotated in the index, so they might not be found under particular search points.
In preparing an index for the Chinese DDC 21, we have set up the following goals:
- Index entries should have high specificity, matching the classes in the scheme as much as possible.
- The index should provide multiple search/access points to each class.
- The index should be easy to compile, easy to use, and should be able to employ computer technology in the compiling process. The process should be efficient while the quality should be guaranteed; and
- The index should fully consider the characteristics of the Chinese language.
To attain these goals, it was decided to prepare a Chinese rotation index in KWIC format, combining some of the merits of other types of classification indexes. The major advantages of a rotated index include swift and easy generation by computer, low cost and high specificity. Through an index entry, a user can easily translate the subject term(s) he/she used into the notation of a class. In addition, a user can directly find the captions of a class and its alternative classes through index entries. This will give the user enough information to make a choice.
This kind of rotated index allows classifiers to search for a word (a single- or multiple-character word in Chinese text) or a word element (one character of a Chinese word) in the captions of classes. Multiple search points guide a user to the appropriate entries in the index. One characteristic of the Chinese language is that many concepts in a "family" share the same word(s) or word elements. In the index, a word or word element that is used in different subject areas is therefore naturally gathered in one place. This makes it convenient for the user to check or search for subject concepts that have hierarchical or associative relationships with one another.
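The rotation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used for the Chinese DDC 21: the function name `kwic_entries`, its parameters, the Chinese sample captions, and the stop-word list are all hypothetical, and captions are assumed to be already segmented into words. Sorting here is by Unicode code point, whereas a real Chinese index would sort by pinyin or stroke order.

```python
from collections import defaultdict

def kwic_entries(classes, stop_words=frozenset(), include_elements=False):
    """Build a rotated (KWIC-style) index from segmented captions.

    classes: dict mapping a notation to its caption as a list of words.
    stop_words: words that should not become headings.
    include_elements: if True, every character of a multi-character
        word also becomes a heading (a "word element").
    Returns a dict: heading -> sorted list of (caption, notation).
    """
    index = defaultdict(list)
    for notation, words in classes.items():
        caption = "".join(words)  # the full caption is kept as context
        headings = []
        for word in words:
            if word in stop_words:
                continue
            headings.append(word)
            if include_elements and len(word) > 1:
                headings.extend(word)  # add each character separately
        # dict.fromkeys drops repeated headings but keeps their order
        for heading in dict.fromkeys(headings):
            index[heading].append((caption, notation))
    # Code-point order only; not a proper Chinese collation
    return {h: sorted(e) for h, e in sorted(index.items())}

# Hypothetical sample data: notation -> segmented caption
sample = {
    "020": ["图书馆学", "与", "信息", "科学"],
    "025.3": ["书目", "分析", "与", "控制"],
    "500": ["自然", "科学", "与", "数学"],
}

index = kwic_entries(sample, stop_words={"与"})
for caption, notation in index["科学"]:
    print("科学", "|", caption, notation)
```

With `include_elements=True`, a search under the single character 学 would also gather the captions containing 图书馆学, 科学 and 数学, which is the "family" effect the paragraph describes.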
2. Method of index generation
There are two methods of preparing an index for the Chinese DDC. One is to translate the index of the English DDC 21 into Chinese after the translation of the whole classification has been completed. The other is to generate a rotated index automatically for the Chinese DDC 21 from the translated classification scheme. The first method is, in fact, direct translation. Besides consuming a great deal of time and manpower, the translated terms could seldom remain consistent with the captions of the classes in the Chinese DDC scheme. Additionally, much work would have to be done in sorting the Chinese index entries. The second method is called rearrangement. A new index is generated directly from the classification schedule of the Chinese DDC 21. This approach saves much time and manpower because it makes effective use of the translated classification scheme. The drawback of such an index is its inconsistency with the index of the English DDC 21.
Weighing the advantages and disadvantages of both methods, it was decided to choose the rearrangement method. The selected type of indexing method is KWIC. An entry includes a heading (keyword) and its context within a caption, cross-references and alternative classes, and the locators (i.e. related notations). The index can provide many search points (the depth of indexing is greater than 2). The specificity of the index entries is high, and the index is easy to use.
3. Human-computer cooperation during the process of preparing the index
Dr. Ranganathan developed the chain procedure by cleverly exploiting the correspondence between classes and subject headings. It has had widespread influence in the library and information field and has been used to prepare alphabetical indexes of classification schemes and alphabetical catalogues. The chain procedure is a semi-automatic method of index preparation. Today a computer can effectively translate the chain of a class into index entries. Only minimal manual work is needed: once a class chain has been verified and edited, the index entries are produced automatically.
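As a rough illustration of how a verified class chain can be turned into index entries by program, the sketch below applies the usual chain-indexing pattern: each link of the chain becomes a lead term, qualified by its superordinate terms in reverse order. The function name `chain_entries` and the simplified English captions are assumptions made for this example only.

```python
def chain_entries(chain):
    """Derive chain-index entries from one class chain.

    chain: list of (notation, term) pairs ordered from the broadest
        class to the most specific one.
    Returns (entry_text, notation) pairs, most specific link first.
    """
    entries = []
    for i in range(len(chain) - 1, -1, -1):
        notation, term = chain[i]
        # Superordinate terms, nearest parent first, act as qualifiers
        qualifiers = [t for _, t in reversed(chain[:i])]
        entries.append((": ".join([term] + qualifiers), notation))
    return entries

# Simplified, illustrative chain (captions shortened for the example)
chain = [
    ("600", "Technology"),
    ("630", "Agriculture"),
    ("636", "Animal husbandry"),
    ("636.7", "Dogs"),
]

for text, notation in chain_entries(chain):
    print(f"{text}  {notation}")
```

The manual step mentioned above corresponds to preparing and checking the `chain` list; everything after that is mechanical.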
In 1999, we generated an index (approximately 1000 pages) for CLC using the chain procedure and a keyword indexing system, with software we developed. This experience is very helpful in preparing the index for the Chinese DDC 21. When translating DDC 21 into Chinese, we produced machine-readable data in the process of computer typesetting. These data form the basis for producing the index. The system we have successfully developed for generating indexes is composed of six modules:
- The dictionary production module: This module collects the terms used for the index of the Chinese DDC 21 and then produces a "term dictionary" by rearranging the collected words.
- The machine-processing module: After indicators have been added manually, the computer processes the classification scheme of the Chinese DDC. The final result is a set of machine-produced entries that are independent and complete and have clear semantic meanings.
- The automatic word segmentation module: This module segments phrases automatically using the "term dictionary", in order to ensure and improve the speed and consistency of term segmentation. Where necessary, the automatically segmented words may be modified manually.
- The computer-aided verification and testing module: This module verifies and tests the data that are processed automatically by the computer. It reduces the chance of human errors occurring in routine operation and deletes unsuitable term segmentations by using stop-words.
- The index producing and sorting module: This module completes the rotation of entries, links and arranges the entries, and outputs the index entries. It also deletes all indicators and markup used in segmentation.
- The statistics and management module: The main functions include generating statistical data, monitoring the production of index entries, and controlling the final index product, including checking the number of index entries and the size of the printed index.
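The paper does not say which algorithm the segmentation module uses. The sketch below therefore assumes forward maximum matching, a common dictionary-based technique for Chinese text, followed by a stop-word filter of the kind mentioned for the verification module. The function names, the small "term dictionary", and the stop-word list are all hypothetical.

```python
def segment(text, term_dictionary):
    """Segment text by forward maximum matching (an assumed technique).

    At each position the longest dictionary term is taken; if no term
    matches, a single character is emitted so the scan always advances.
    """
    max_len = max((len(term) for term in term_dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in term_dictionary:
                words.append(candidate)
                i += size
                break
    return words

def drop_stop_words(words, stop_words):
    """Remove words that should not become search points."""
    return [word for word in words if word not in stop_words]

# Hypothetical "term dictionary" and stop-word list
term_dictionary = {"图书馆学", "图书馆", "信息", "科学"}
stop_words = {"与"}

words = segment("图书馆学与信息科学", term_dictionary)
headings = drop_stop_words(words, stop_words)
print(words)
print(headings)
```

A word missing from the dictionary falls apart into single characters, which is where the manual correction described for this module would come in.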
The process of preparing the rotated index for the Chinese DDC 21 reflects the combination of computer technology with manual work. We have taken care to exploit the computer's strengths in handling data for tasks such as segmenting words, producing index entries, rotating, sorting, verifying and testing, and outputting. Human effort has focused on intellectual work. The major steps in preparing the index for the Chinese DDC 21 can be summarized as follows:
- The computer program automatically deletes all typesetting markup, including history notes and method notes, from the machine-readable data of the classification scheme of the Chinese DDC, then moves the processed data to a relational database.
- The manual processing of the data includes:
  - Adding some special indicators according to the relationships among classes;
  - Using special symbols to indicate the defined or modified locations of specific words or word elements;
  - Deleting those classes that will not be useful in searching;
  - Correcting or rewriting a small portion of class captions when adding special symbols becomes too complicated.
- After special symbols are added manually, the computer program generates phrases that have independent, complete and clear semantic meanings, including one or more class caption entries and scope and usage notes.
- The computer program automatically segments caption phrases using a "term dictionary".
- Index editors add some words that cannot be segmented automatically because of the limits of the "term dictionary". These words are merged into the search points of the index.
- After automatic verification and testing, further human quality control of index entries is necessary, based on a comparison between the entries and the classification scheme.
- The computer program produces index entries, arranges and rotates the entries, and prints the draft of the index. The draft is then proofread manually.
- The computer sends the final product to a laser printer. The final product is checked by the index editors again.
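The first step of the workflow above, stripping typesetting markup and loading the cleaned captions into a relational database, might look roughly like the following. The markup syntax is not described in the paper, so the angle-bracket tags, the `<NOTE>` element, the table layout, and the sample records are all assumptions made for illustration. SQLite stands in for whatever database was actually used.

```python
import re
import sqlite3

# Hypothetical markup: notes wrapped in <NOTE>...</NOTE>, other
# typesetting codes written as angle-bracket tags.
NOTE = re.compile(r"<NOTE>.*?</NOTE>")
TAG = re.compile(r"<[^>]+>")

def clean(raw_caption):
    """Remove notes first, then any remaining typesetting tags."""
    return TAG.sub("", NOTE.sub("", raw_caption)).strip()

def load(records):
    """Store cleaned (notation, caption) pairs in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE classes (notation TEXT PRIMARY KEY, caption TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO classes VALUES (?, ?)",
        [(notation, clean(caption)) for notation, caption in records],
    )
    conn.commit()
    return conn

# Invented sample records in the assumed markup
raw_records = [
    ("020", "<B>Library and information sciences</B>"),
    ("025.3", "<B>Bibliographic analysis and control</B> <NOTE>sample history note</NOTE>"),
]

conn = load(raw_records)
for row in conn.execute("SELECT notation, caption FROM classes ORDER BY notation"):
    print(row)
```

Once the captions are in a table, the later manual steps (adding indicators, deleting classes that are not useful for searching) become ordinary edits to rows.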
Hou Hanqing, et al. Computer-aided compilation of permuted index of Chinese Thesaurus. Journal of Information Science (in Chinese), v. 17, no. 4, 1998.
Wilson, T.D. An introduction to subject indexing. London: Clive Bingley, 1971.