Liu Xiangsheng
Wang Dongbo
In China, research on classification systems and thesauri (CS&T) as tools for organizing knowledge, together with their compilation and application, reflects certain aspects of the present development of Chinese librarianship.
The article describes the general progress of CS&T in China over a period of 46 years (especially the most recent 15 years). It covers the brief history and current status of classification systems; the brief history and current status of thesauri; classified thesauri, the tools for organizing knowledge that integrate classification systems with thesauri; general trends in the development of CS&T in China; Chinese natural language searching methods; and research and teaching in the field of CS&T in China.
In China, research on classification systems and thesauri (hereafter referred to as CS&T) as tools for organizing knowledge, together with their compilation and application, reflects certain aspects of the present development of Chinese librarianship.
The modern classifications in China, while distinctly Chinese in character, were influenced by DDC (Dewey Decimal Classification) in their principles of compilation, techniques of presentation and systems of division. DDC itself, however, has never been widely adopted in China.
The classifications used in contemporary China (after October 1949) were mainly compiled after the founding of the People's Republic of China. The Chinese socialist government made scientific, cultural and educational undertakings indispensable parts of socialist construction. The task that libraries faced was to change their working styles to serve the new society. Classifications, which bear the characteristics of their nation and times, had to change accordingly. In the early days, some classifications were compiled on the basis of the old systems. The government realized that it was necessary to compile a completely new classification for general libraries because the method of "new wine in old bottles" couldn't solve the problem. As Zheng Zhenduo, former vice minister of the Ministry of Culture, pointed out, "new classification must be compiled by a group, not by individuals."
The comprehensive classifications compiled over more than 40 years are listed in historical sequence as follows:
These classifications, which applied advanced compilation techniques, share common characteristics: they are new systems, and they reflect the Chinese nation and the times.
Chinese Library Classification (hereafter referred to as CLC) is one of the most important and complete classifications. At present, it is used in over 90% of libraries and information institutions of every type in China, and in particular by all public libraries. Compilation of CLC by 36 major libraries and information institutions began in 1971. Its trial edition was published in 1973, the 1st edition in 1975, the 2nd edition in 1980, and the 3rd edition in 1990. It has now grown into a series, including the basic edition (ca. 30,000 classes), the enlarged edition (also entitled Zhongguo Tu Shu Zi Liao Fen Lei Fa (Classification for Monographs and Materials), ca. 50,000 classes), the abridged edition (ca. 4,000 classes), CLC for Children's Libraries, CLC for Newspapers and Magazines, the User's Manual, the Compared List of CLC and Chinese Thesaurus (also entitled Zhongguo Fen Lei Zhu Ti Ci Biao (Chinese Classified Thesaurus), which functions as a relative index to CLC), etc. Several classifications for specialized subjects compatible with CLC are now being compiled, and some have been published, e.g. Zhongguo Tu Shu Guan Tu Shu Fen Lei Fa Jiao Yu Fen Lei Biao (CLC for Education). CLC is divided into 5 categories and 22 divisions; among them, the division of industrial technology is especially detailed, with 16 subdivisions. CLC is convenient for users because its system is consistent with the development of contemporary science and technology, its classes are arranged reasonably, and it uses a mixed notation of letters and numbers. In 1985, CLC won the first-class prize of the National Award for the Advancement of Science and Technology. The abridged edition of CLC has been published in Uighur and Mongolian, and the basic edition (the 2nd edition) has been translated into Japanese and published by the Konno Institute for Chinese Language (in Japan).
Library Classification of People's University of China and Library Classification of the Chinese Academy of Sciences are used in libraries and information institutions such as those subordinate to the Chinese Academy of Sciences and the Chinese Academy of Social Sciences, some college and university libraries, and some special libraries.
Chinese Archive Classification (CAC) consists of three classification schedules. One of them is the main body, entitled Chinese Archive Classification, which is used only for classifying the archives of the People's Republic of China. The others are supplements, entitled respectively Qing Dai Dang An Fen Lei Biao (Classification Schedule for Archives of Qing Dynasty) and Min Guo Dang An Fen Lei Biao (Classification Schedule for Archives in the Republican Period).
CAC is actually a series of classifications used for archives of all kinds and at all levels throughout the country. At present, several specialized subject classifications, enlarged and adapted from the main body of CAC, are being compiled and published successively.
In addition to the above-mentioned general classifications, there are more than 10 specialized classifications, such as Jun Shi Tu Shu Zi Liao Fen Lei Fa (Classification for Military Science). Quite a few of them, however, have not been published openly. Moreover, many other classification systems are used in abstracting and indexing periodicals.
Some foreign classifications have been translated into Chinese. Universal Decimal Classification (UDC) was used in scientific and technical information institutions in the 1960s and is now used for indexing Chinese standards documents. Dewey Decimal Classification (DDC) is used in several libraries for parts of their collections, and a Chinese edition of DDC is expected to be translated and published in China in the near future. International Patent Classification (IPC) is now used for indexing Chinese patent documents.
In 1974, work began on "Project 748", a systems project on Chinese information processing. One part of the Project was the compilation of Han Yu Zhu Ti Ci Biao (Chinese Thesaurus, hereafter referred to as CT), planned for computerized information retrieval systems. CT, one of the largest thesauri in the world, has 91,158 descriptors and 17,410 non-descriptors. Headed by the Institute of Scientific and Technical Information of China and the National Library of China, 1,378 compilers specializing in particular subject fields, from 505 institutions, were in charge of its compilation. CT was published in 1980 and won the second-class prize of the National Award for the Advancement of Science and Technology in 1985. With the compilation of CT, the Chinese tools for organizing knowledge entered a new era in which classification systems and thesauri developed side by side.
CT is mainly used in general libraries and information institutions. The low specificity of its terms becomes apparent when it is used in specialized libraries and information institutions, especially in specialized databases of journal papers. To meet the needs of such databases, some of the institutions that participated in compiling CT later produced, one after another, specialized subject thesauri based on CT that provide more specific subject terms. Since then, more and more thesauri have come into being, and more than 100 have been compiled so far. These specialized subject thesauri, which have been published openly one after another and cover almost all subject fields, contain a substantial quantity of terms. Thesauri are thus the main indexing tools used in Chinese databases.
There are two other general thesauri in China: Zhongguo Dang An Zhu Ti Ci Biao (Chinese Thesaurus for Archives) and Jun Yong Zhu Ti Ci Biao (Military Thesaurus). Military Thesaurus is a series of large-scale thesauri, with a dictionary of descriptors as one volume.
At the end of 1986, 40 institutions, headed by the National Library of China, began to compile the Chinese Classified Thesaurus (hereafter referred to as CCT). The compilation of CCT was a huge and complicated project, and it was finished in 1994. The printed CCT, in 2 parts and 6 volumes totalling 6,215 pages, was published in the same year.
CCT is a multifunctional indexing tool. It is not only a complete CLC and Classification for Monographs and Materials, but also a revised and enlarged CT. Through the two-way correspondence between class numbers and descriptors, the classification schedule part can serve as the category index and hierarchical index of CT, and the thesaurus part can serve as the relative index of CLC and Classification for Monographs and Materials. This is a great convenience for classifying, subject indexing and retrieval. Subject searching in a bibliographic database that holds class numbers but no descriptors, when linked with the machine-readable version of CCT, has been demonstrated to give satisfactory retrieval effectiveness.
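The two-way correspondence described above can be illustrated with a minimal sketch. It assumes a simple in-memory table; the class name `ClassifiedThesaurus`, its methods, and the class numbers and descriptors in the usage example are hypothetical placeholders, not entries or structures taken from CCT.

```python
from collections import defaultdict


class ClassifiedThesaurus:
    """Toy two-way table linking class numbers and descriptors (illustrative only)."""

    def __init__(self):
        self.class_to_terms = defaultdict(set)   # class number -> descriptors
        self.term_to_classes = defaultdict(set)  # descriptor -> class numbers

    def link(self, class_number, descriptor):
        # Record the correspondence in both directions at once.
        self.class_to_terms[class_number].add(descriptor)
        self.term_to_classes[descriptor].add(class_number)

    def descriptors_for(self, class_number):
        # Classification schedule used as a category index to the thesaurus.
        return sorted(self.class_to_terms.get(class_number, ()))

    def classes_for(self, descriptor):
        # Thesaurus used as a relative index to the classification.
        return sorted(self.term_to_classes.get(descriptor, ()))

    def search_by_descriptor(self, descriptor, records):
        # Subject search over records that carry class numbers but no descriptors:
        # translate the descriptor into class numbers, then match on those.
        wanted = self.term_to_classes.get(descriptor, set())
        return [title for title, class_number in records if class_number in wanted]
```

With placeholder links such as `link("Y1", "thesauri")` and `link("Y2", "thesauri")`, a call to `search_by_descriptor("thesauri", records)` would return the titles of records classed at Y1 or Y2 even though those records were never assigned descriptors.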
Two other similar classified thesauri have been published, in the fields of medical science and education. There are also two thesaurofacets: Educational Thesaurofacet, and Retrieval Vocabularies of Social Sciences and Humanities, a large-scale multi-disciplinary thesaurofacet.
The China National Technical Committee for Standardization of Information and Documentation (CNTCSID), founded in 1979 and corresponding to ISO TC46, has set up its 5th Subcommittee for the standardization of CS&T and indexing. Since 1980, the Subcommittee has worked actively on standardization in this field and on related research, for example: recommending CLC and CT as national standards; inspecting the quality of CAC; and formulating the following national standards:
Close attention is also paid to the compatibility of thesauri. More than 100 specialized subject thesauri have been compiled since the 1980s. These thesauri are consistent not only with ISO2788 Documentation -- Guidelines for Establishment and Development of Monolingual Thesauri and GB13190 Documentation -- Guidelines for Establishment and Development of Chinese Thesauri in their construction techniques, but also with CT in their choice of terms. To make compatibility easier to achieve, a project called the National Term Bank is being carried out. So far, many thesauri have been entered into the Bank, which serves as a compatibility center. In addition, the term bank of Military Thesaurus is being set up.
Another way to achieve compatibility is to develop series of classifications and thesauri. CLC, CAC, Military Thesaurus, etc., each of which consists of a series of classifications or thesauri, are compatible large-scale knowledge tools. This practice may be characteristic of China.
In China, computers are widely applied in library and information work, and the computerization of CS&T is an inevitable trend. Computers are commonly used in the production of printed CS&T. In addition, computer use has reached a higher level in the automatic generation of some parts of CS&T and in their management. Examples include Educational Thesaurus, CCT, Retrieval Vocabularies for Social Sciences and Humanities, and Military Thesaurus. Some progress has also been made on the automatic assignment of class numbers in indexing. This practical experience will speed up the computerization of CS&T.
Reforming enumerative classification toward combination, namely increasing its combined components, has become a trend. Such a scheme can eliminate the disadvantages of a limited capacity for concepts and of the contradiction between centralization and decentralization, and it can develop retrieval functions more effectively in computerized information retrieval systems. The library community has held two nationwide seminars and reached a common view on the problem. At present, better technical schemes for the reform are being explored.
Because it can be verified in practice, the integration of classification systems with thesauri will be an important trend in the field of tools for organizing knowledge in China in the future.
Besides the technique of automatic Chinese term extraction, other ways of using natural language for searching are also being explored. One is to input one or several Chinese characters, or one or several words formed from Chinese characters, and then search the machine-readable text directly. This is an inherent function of computer word processing, but it is slow. A way to quicken the searching is called single Chinese character searching, i.e. indexing by individual Chinese character, or non-indexed system. In this method, every Chinese character from the title, abstract or text is put into the index (useless words can also be excluded), and several Chinese characters can then be combined when searching. In addition, there are keyword extraction by man-machine interaction, free indexing without a word list, etc. Compared with automatic Chinese term extraction, these methods are all easier to realize and have already been used in some systems.
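Single Chinese character searching can be sketched as a character-level inverted index. The example below is a minimal illustration, not a description of any system mentioned in this paper: the function names `build_char_index` and `search_chars` are hypothetical, and treating "combining characters" as a requirement that they appear adjacently and in order is an assumption (a plain Boolean AND over characters would be another reading).

```python
from collections import defaultdict


def build_char_index(docs, stop_chars=frozenset()):
    """Map each character to the set of (doc_id, position) pairs where it occurs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text):
            if ch.isspace() or ch in stop_chars:
                continue  # whitespace and excluded ("useless") characters are not indexed
            index[ch].add((doc_id, pos))
    return index


def search_chars(index, query):
    """Return ids of documents that contain the query characters consecutively."""
    if not query:
        return set()
    # Start from every occurrence of the first character ...
    candidates = set(index.get(query[0], ()))
    # ... and keep only those followed by the remaining characters, in order.
    for offset, ch in enumerate(query[1:], start=1):
        postings = index.get(ch, set())
        candidates = {(d, p) for d, p in candidates if (d, p + offset) in postings}
    return {doc_id for doc_id, _ in candidates}
```

No word segmentation or word list is needed: for titles such as 图书分类法 and 分类主题词表, the query 分类 is answered by intersecting the postings of 分 and 类. A query that contains an excluded character finds nothing in this sketch, which is one cost of dropping characters from the index.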
The disadvantage of all methods that use natural language is that there are a great many synonyms, near-synonyms, polysemous words and ambiguous meanings, and a lack of semantic connection between words. These factors affect not only the recall ratio but also the precision ratio. Therefore, control of these factors, i.e. post-control, is still needed. Research on post-controlled vocabularies is receiving more and more attention from CS&T researchers, and there have been some good results.
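One common form of post-control is to leave indexing uncontrolled and expand the query with equivalent terms at search time. The sketch below illustrates that idea only, under the assumption of a hand-built table of synonym groups; the function names and the sample data are hypothetical and do not describe any particular Chinese post-controlled vocabulary.

```python
def build_post_control_table(synonym_groups):
    """Map every term to the full group of terms treated as equivalent to it."""
    table = {}
    for group in synonym_groups:
        members = frozenset(group)
        for term in members:
            table[term] = members
    return table


def expand_query(term, table):
    """Return the term together with its recorded synonyms (the term alone if none)."""
    return set(table.get(term, {term}))


def search_with_post_control(term, free_index, table):
    """Search a free-text index (term -> set of doc ids) with every variant of the term."""
    hits = set()
    for variant in expand_query(term, table):
        hits |= free_index.get(variant, set())
    return hits
```

If documents were freely indexed under 计算机, 电脑 and 电子计算机, a search on any one of these would retrieve all of them once the three are recorded as one group, which addresses the loss of recall caused by synonyms. Handling polysemy and precision would need more than this table, for example scope notes or disambiguation at query time.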
The study and discussion of CS&T in China can be divided into two main fields: (i) research on the compilation, revision and evaluation of CS&T, carried out through essays published by scholars and library staff nationwide and through seminars; (ii) research on theory in the field, mainly conducted by teachers and postgraduates in universities and colleges. In China, the active researchers in CS&T include Pi Gaopin, Zhang Qiyu, Li Xinghui, Bai Guoying, Liu Xiangsheng, Zhao Zongren, Qian Qilin, Hou Hanqing, Qiu Feng, Wang Yongcheng, Chen Shunian, Zeng Lei, Dai Weimin, etc.
More than 40 departments of library and information science or information management in universities and colleges offer courses on CS&T. Postgraduates are educated at Wuhan University, Peking University, Air Force Political College (in Shanghai), East China Normal University (in Shanghai), Beijing Normal University, Zhongshan University (in Guangzhou), the Documentation and Information Center of the Chinese Academy of Sciences, etc.
In China, CS&T are known by the common name "Information Retrieval Language". The discipline that comprehensively studies methods for organizing knowledge, such as CS&T, is generally called "Information Linguistics". The discipline that studies classification separately is designated "Classification Science". There are many published monographs, such as "Xian Dai Xi Fang Zhu Yao Tu Shu Fen Lei Fa Shu Ping (A Review on Main Modern Western Library Classifications)", by Liu Guojun, in 1980; "Tu Shu Fen Lei Xue (Classification Science)", by Bai Guoying, in 1981; "Qing Bao Jian Suo Yu Yan (Information Retrieval Language)", by Zhang Qiyu, in 1983; "Zhu Ti Fa Yu Zhu Ti Biao Yin (Subject Indexing Language and Subject Indexing)", by Liu Xiangsheng, in 1985; "Han Yu Zhu Ti Ci Biao Biao Yin Shou Ce (Guide to Chinese Thesaurus)", by Qian Qilin, in 1985; "Qing Bao Yu Yan Xue Ji Chu (The Fundamentals of Information Linguistics)", by Zhang Qiyu, in 1987; "Qing Bao Jian Suo Yu Zhu Ti Ci Biao (Information Retrieval and Thesaurus)", by Qiu Feng, in 1988; "Tu Shu Fen Lei Xue (Classification Science)", by Zhou Jiliang et al., in 1989; "Zhu Ti Fa Dao Lun (Introduction to Subject Indexing Language)", by Hou Hanqing et al., in 1991; and "Dang Dai Fen Lei Fa Zhu Ti Fa Suo Yin Fa Yan Jiu (Research on Contemporary Library Classification, Thesaurus and Indexing)", by Hou Hanqing et al., in 1993. There are also more than 5,000 articles in the field of CS&T.
The changes in research on CS&T over the last ten years can be summed up in the following four aspects:
The developments of this period have raised the theoretical level of Chinese CS&T rapidly and substantially, thus narrowing the gap between China and other advanced countries.
LIU Xiangsheng, research librarian of the National Library of China; Standing Council Member and Secretary-General of the China Society for Library Science; Editor-in-Chief of Chinese Library Classification; member of China Technical Committee of Standardization for Information and Documentation
WANG Dongbo, associate research librarian and deputy director of Department of Cataloging for Chinese Monographs, National Library of China.