This paper describes the characteristics and current state of research and development on digital libraries in Japan. Government-related agencies such as the National Diet Library and the Information Technology Promotion Agency, the National Center for Science Information Systems, and B-ISDN Business Chance & Culture Creation are active in developing digital libraries. Having identified three kinds of digital libraries, the paper describes the activities carried out at those institutions in terms of their aims and products. Some universities, such as Nara Institute of Science and Technology and the University of Library and Information Science, as well as for-profit private companies, have embarked on projects from several points of view. The paper then describes their undertakings, focusing on fundamental ideas, technologies, and techniques such as browsers, full-text search methods, visualization, SGML applications, document structures, and user interfaces. Non-technical aspects indispensable for the development are also briefly mentioned.
The move from traditional libraries to digital ones is an outstanding example, and several libraries are required to deal with it properly. Facing this shift, traditional libraries struggle to demonstrate their identity and raison d'etre in a changing world. Meanwhile, scholars and researchers in universities and private companies are actively participating in research and development (R&D) related to digital libraries. It is hoped that the results of this R&D will help libraries fulfil the intellectual duties assigned to them.
Since digitizing printed materials is essential to building digital libraries, a variety of digitization projects have been carried out around the world. The attempt by the National Diet Library is one example. Digitization yields either page images or full texts, depending on the technology used.
A major merit of digital libraries is the marked improvement in retrieval capability compared with so-called paper libraries. Full-text search facilities are therefore indispensable in the digital library environment, and producing full-text forms is very valuable from the standpoint of information searching.
Some consider information technology a panacea for recovery from the trade depression, since it may offer industry a major business opportunity. In particular, the technologies needed to develop digital libraries are considered promising in terms of economic effects. Digital library projects require a variety of new technologies, and the results of the R&D may create a large new market.
Taking account of these economic effects, R&D activities on digital libraries have been carried out under strong initiatives by government agencies in Japan. The Ministry of International Trade and Industry has financially supported such projects via the Information Technology Promotion Agency. Some other ministries have also embarked on similar projects. With these characteristics in mind, this paper describes the state of R&D related to digital libraries in Japan.
Digital libraries can be categorized as follows, although other categorizations are possible.
The network is, however, a metaphorical library, since it lacks the idea of a collection. In many cases the information offered through the network is only a disordered aggregation of what its participating members provide. A typical example is the Internet, which cannot have a unique collection policy of its own.
There are computer-based information systems that simulate information seeking behavior in libraries, and mechanisms that adopt a book metaphor for visualizing information for users. In those systems we can access the stored information as if we were in a virtual library. Those are also metaphorical libraries and could be considered user interfaces.
The national union catalog network project aims to lay the groundwork for a future digital library network. Twenty-seven public libraries and the National Diet Library are participating in the project as of March 1996. Bibliographic information from participating libraries is integrated into the union catalog database located in the Center for Information Infrastructure. The database is to be searched by participating libraries through the network (using ISDN lines). The next step is to develop effective methods for identifying and integrating bibliographic information and for producing index files in the digital library environment. A more user-friendly interface and a new interlibrary loan system are also being examined.
On the other hand, the digital library demonstration project aims to uncover technical issues and problems that may be encountered when developing and operating digital libraries in the future. The ultimate goal of the project is the "Type 1" digital library. A variety of materials are being digitized, such as rare books, books published in the Meiji Era, and journal articles on politics and economics. Most of them were chosen from the collection of the National Diet Library. Search systems are being developed to retrieve the digitized data. They enable both bibliographic and full-text searching. A menu-driven method is also provided.
At the same time, the Information Technology Promotion Agency has established a special committee, composed of scholars, researchers, and librarians. The purpose is to discuss approaches, problems, and issues related to the development and operation of digital libraries and put the results to practical use.
NACSIS-ELS (NACSIS Electronic Library System) is a prototype of a next-generation information service system that integrates and replaces conventional online information retrieval systems(5). The major objective of this project is to investigate the feasibility of the basic idea underlying NACSIS services and to make the system functions more practical and sustainable. It has been in trial service since February 1995 in a distributed processing environment.
NACSIS-ELS currently provides image and bibliographic databases. The image databases consist of raster images of scientific and scholarly journals as well as conference proceedings. They hold every page, including the cover, table of contents, and back cover. Publications of twenty-two Japanese academic societies are used for image capture. Books are not considered candidates for the electronic materials stored in the databases. At present, the NACSIS-ELS databases contain page images, not full-text forms. However, the capability to handle full-text forms is under development and is to be embedded into NACSIS-ELS(6).
Bibliographic databases are provided so that users can locate secondary information about the images. They contain records comprising titles, author names, affiliations, and abstracts of scientific papers derived from the NACSIS Information Retrieval Service (NACSIS-IR). NACSIS-ELS also stores cataloging information used for NACSIS-CAT, the cataloging service for university libraries, which began in 1984.
Users can look for journal articles either by conventional Boolean search of the bibliographic databases or by browsing images, and the requested page images are transmitted from the image databases to the users' workstations over the Internet. In addition, the system enables users to obtain high-quality printouts from local printers. NACSIS-ELS offers the usual online search functions found in NACSIS-IR, which has been available since 1987. It also enables users to browse pages and print them if needed.
The evaluation of the performance of NACSIS-ELS was started in December 1994, by monitoring comments from users of the system. The operational (commercial) service is scheduled to start in April 1997. The Advisory Board for the Electronic Library System was established to make the shift from the trial stage to the operational one smooth.
NACSIS-ELS functions mainly as a document delivery system. It might be classified as a "Type 2" library.
NACSIS also participates in a project to develop a digital archive for music(7). The project is part of the IDEAMA (International Digital ElectroAcoustic Music Archive) project, whose purpose is to preserve the early tape music of this century. Surveys of the bibliographic information and location of the tapes have been carried out.
A variety of multimedia application projects have been carried out using the extremely high-level information technology offered by BBCC. The digital library project is one example. These projects will contribute substantially to the development of the Kansai Branch of the National Diet Library. The library is scheduled to be built in the City by 2003 and is expected to be the largest digital library in Japan.
Leading companies in the computer and telecommunications industry, such as Fujitsu, Hitachi, NTT, and Toshiba, each take part in the projects. In particular, the Ariadne system has had considerable influence on other R&D related to digital libraries. It was developed by an ad hoc group named the Electronic Library Research Group, organized by scholars from several universities and Fujitsu. Ariadne has the following outstanding characteristics(8),(9).
The system emphasizes user-friendliness. Ariadne can be categorized as a "Type 1" library.
ULIS itself has carried out several research projects. One of them is the development of a bookshelf-like browser named SOPAC(12). SOPAC is an OPAC that uses a bookshelf metaphor. When a user accesses the system, the physical image of each book is displayed with its size and thickness. The image serves as a button for requesting bibliographic information about the book. SOPAC has been developed on top of the operational OPAC system of the university.
A variety of collaborative work goes on in libraries. Users consult librarians to get the information they need. We exchange ideas and information with others to carry out intellectual activities in libraries. Taking account of this, a collaboration support system was developed, and its validity was examined through a survey(12).
A library collection is essentially multilingual in the sense that it includes materials written in foreign languages. Digital libraries therefore need facilities for dealing with multilingual documents. For example, a digital library should have functions to input, search, and display texts written in different character sets. ULIS has developed a browsing tool for multilingual documents stored in the WWW environment(13).
The Japanese character set includes three subsets: Kanji, Katakana, and Hiragana. Together they are said to comprise more than eight thousand characters. In addition, Japanese writing is ambiguous for two reasons. One is the transliteration of foreign words. Many English words are transliterated into Katakana, but their spelling is often not unique, because the large gap between the sound systems of the two languages makes it difficult to represent English sounds in Katakana. The other is that many Kanji have variant shapes. Moreover, some characters in different subsets have similar shapes and are not easy to tell apart by eye.
These are major obstacles to the correct recognition of text in digitized materials. It is therefore necessary, especially in Japan, to develop a full-text search method that can tolerate such errors. In this sense it is also worthwhile to explore approximate or fuzzy matching techniques in the OCR environment.
Two search techniques that can tolerate recognition errors were developed at Hitachi Central Research Laboratory(14). If spelling variants can be generated when converting printed characters or when expanding queries, searching digitized texts will not cause so many retrieval errors. This is the idea underlying both techniques. They are based on the sequential search method developed by Hitachi and are implemented in a full-text search system named Bibliotheca/TS.
Here the OCR keeps multiple candidates for ambiguous characters and outputs them as they are. The first technique expands search terms, generating equivalent query strings that should match erroneously recognized text. Each query term given by a searcher is expanded by a mechanism that generates spelling variants. As a result, the expanded character strings, which include wrongly recognized characters, are matched against the digitized data produced by OCR. Suppose a character X has similarly shaped characters X1 and X2; then the search term XY is expanded into three terms: XY, X1Y, and X2Y.
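The query expansion idea can be illustrated with a minimal sketch in Python. It is not the Bibliotheca/TS implementation: the confusion table `CONFUSABLE` and the function names `expand_query` and `search` are hypothetical, and a plain substring test stands in for the sequential search method.

```python
from itertools import product

# Hypothetical confusion table (an assumption for illustration): each character
# maps to the similarly shaped strings that OCR might output in its place.
CONFUSABLE = {
    "X": ["X1", "X2"],
}


def expand_query(term, confusable):
    """Return the term plus every variant made by swapping in look-alikes."""
    # For each character, offer the character itself followed by its look-alikes.
    choices = [[ch] + confusable.get(ch, []) for ch in term]
    # The Cartesian product enumerates every combination of choices.
    return ["".join(combo) for combo in product(*choices)]


def search(term, text, confusable):
    """Return the expanded variants that occur in the OCR output text."""
    return [variant for variant in expand_query(term, confusable) if variant in text]


print(expand_query("XY", CONFUSABLE))
```

With this table, the term XY expands into XY, X1Y, and X2Y, as in the example above. Note that the number of variants grows multiplicatively with the number of ambiguous characters in a term, so a practical system would presumably need to limit the expansion.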
The second technique, on the other hand, treats the multiple candidates themselves as the digitized text. In this case the recognition result includes erroneously converted characters. The number of candidates is controlled by setting a threshold on the degree of similarity.
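A minimal Python sketch of this second approach follows. The data layout is an assumption made for illustration: each character position in the OCR output is a list of (character, similarity score) pairs, and the names `keep_candidates` and `find_matches` are hypothetical rather than taken from the Hitachi system.

```python
def keep_candidates(position, threshold):
    """Keep the candidate characters whose similarity score reaches the threshold."""
    return {ch for ch, score in position if score >= threshold}


def find_matches(query, ocr_positions, threshold):
    """Return start offsets where each query character is among the kept candidates."""
    kept = [keep_candidates(position, threshold) for position in ocr_positions]
    hits = []
    for start in range(len(kept) - len(query) + 1):
        if all(query[j] in kept[start + j] for j in range(len(query))):
            hits.append(start)
    return hits


# Hypothetical OCR output: the second glyph is ambiguous among "1", "i", and "l".
ocr_output = [
    [("d", 0.98)],
    [("1", 0.55), ("i", 0.40), ("l", 0.30)],
    [("g", 0.97)],
]

print(find_matches("dig", ocr_output, threshold=0.35))  # "i" is kept, so "dig" matches
print(find_matches("dig", ocr_output, threshold=0.50))  # only "1" is kept, so no match
```

Lowering the threshold keeps more candidates per position, which tolerates more recognition errors at the cost of more false matches.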
To get rid of those, a technique that integrates visualization and navigation functions, named information outlining, was developed at IBM Tokyo Research Laboratory(15). It uses three mechanisms: viewers, mappers, and linkers. Viewers capture and visualize stored data from various aspects to help users locate the information they need. The other two support the viewers. Mappers extract a specific subset of data from a collection, while linkers define relationships among data.
Viewers provide four kinds of view: chronological, geographical, folder, and categorical. The prototype system uses newspaper articles, patent disclosures, and images as its collection. When newspaper articles are used as the collection, the chronological view shows the number of articles published in particular periods, and the geographical view indicates the number of articles referring to particular geographical areas.
These data are displayed in color in two-dimensional space using bar charts and push-buttons. Novice users are not familiar with term-based retrieval and tend to submit questions such as "what is new" or "what has been happening recently." The system tries to handle such questions. Thus, the technique can be regarded as an interface for those users.
A prototype digital library of "Type 3" was developed by the Telematique International Research Laboratories and was shown at the National Diet Library in 1988. The library, named SungWoKung, supports the information seeking behavior of users by offering both book and library metaphors(16). The Laboratories have also developed technologies such as image processing and character recognition.
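The counts behind the chronological and geographical views can be sketched in a few lines of Python. This only illustrates the aggregation step, not the IBM prototype: the article records, their field names, and the monthly granularity are all assumptions.

```python
from collections import Counter
from datetime import date

# Hypothetical newspaper article records; the field names are illustrative only.
articles = [
    {"published": date(1996, 1, 12), "places": ["Tokyo"]},
    {"published": date(1996, 1, 30), "places": ["Osaka", "Tokyo"]},
    {"published": date(1996, 2, 3), "places": ["Nara"]},
]


def chronological_view(articles):
    """Count articles per publication month, keyed as 'YYYY-MM'."""
    return Counter(a["published"].strftime("%Y-%m") for a in articles)


def geographical_view(articles):
    """Count articles referring to each geographical area."""
    return Counter(place for a in articles for place in a["places"])


print(chronological_view(articles))
print(geographical_view(articles))
```

Each count would correspond to the height of one bar in a bar-chart display, which is what allows a question such as "what is new" to be answered by looking at the most recent periods rather than by supplying search terms.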
If particular roles of texts can be identified automatically such as the purpose, results, and conclusion of a document, it is possible to improve the performance of full-text search by considering the roles. Thus, Sumita and others developed an idea to automatically extract a document structure and generate abstracts(17).
Generally speaking, the application of SGML (Standard Generalized Markup Language) in the commercial arena is still in its infancy. There is, however, one farsighted attempt. The Chemical Society of Japan has changed the publication style of the Bulletin of the Chemical Society of Japan to a digital form based on SGML. The electronic version started in January 1993. Several years of extensive research had been done before the undertaking, and new research has started to experiment with the electronic contribution of papers(18).
Furthermore, inter-organizational cooperation must be established to carry out digitization at the operational level. For example, a union catalog that tells "what is already digitized" and "what is to be digitized" should be compiled to avoid needless duplication of digitization activities.
Research on human behavior in information seeking and processing is also vital. To earn a high reputation, digital libraries should reflect human information needs and understand user behavior better than traditional ones do.
We must remember that change has equal opportunity to improve a situation and to make it worse and attempts to transform academic culture evoke tensions and fears(21).
2) Pilot Electronic Library Project. Information Technology Promotion Agency, Japan. 10p.
3) Librarianship in Japan. revised ed. Japan Library Association, 1994. p.81-85.
4) Catalog of the National Center for Science Information Systems. 1995. 25p. (in Japanese)
5) Adachi, Jun and Hashizume, Hiromichi. NACSIS Electronic Library System : Its design and implementation. Proceedings of International Symposium on Digital Libraries 1995. University of Library and Information Science, Tsukuba Japan, 1995. p.36-41.
6) Oyama, Keizo. Digital libraries and SGML databases : An ideal and the reality. Digital Libraries, no.5, p.33-43 (1995) (in Japanese)
7) Miyazawa, Akira and Hashizume, Hiromichi. Digital archive for music. Proceedings of International Symposium on Digital Libraries 1995. University of Library and Information Science, Tsukuba Japan, 1995. p.273-4.
8) Matsumoto, Hiroshi. High speed network for digital libraries. University of Library and Information Science, Tsukuba Japan, 1995. p.65-72.
9) Nagao, Makoto et al. Development of the electronic library "Ariadne"(1) - Issues related to the systems design -, Journal of Information Processing and Management, 38(3), p.191-206 (1995) (in Japanese)
10) Imai, Masakazu et al. Design of a digital university library : Mandala Library. Proceedings of International Symposium on Digital Libraries 1995. University of Library and Information Science, Tsukuba Japan, 1995. p.119-24.
11) Chihara, Kunihiro. Digital library and multimedia. Information Science Symposium 1996. p.25-32. (in Japanese)
12) Sugimoto, Shigeo et al. Enhancing usability of network-based library information system - experimental studies of a user interface for OPAC and of a collaboration tool for library services. Proceedings of Digital Libraries '95. p.115-122.
13) Sakaguchi, Tetsuo et al. A browsing tool for multi-lingual documents for users without multi-lingual fonts. Proceedings of the First ACM International Conference on Digital Libraries. 1996. p.63-71.
14) Fujisawa, Hiromichi and Marukawa, Katsumi. Full-text search and document recognition of Japanese text. Proceedings of 4th Symposium on Document Analysis and Information Retrieval. 1995. p.55-80.
15) Morohashi, Masayuki et al. Information outlining : Filling the gap between visualization and navigation in digital libraries. Proceedings of International Symposium on Digital Libraries 1995. University of Library and Information Science, Tsukuba Japan, 1995. p.151-8.
16) Sato, Mamoru. Electronic libraries "SungWoKung" which support retrievals of books using presentation of CG pictures of libraries, Information Processing Society of Japan SIG Reports, 91-F1-24, p.1-8 (1991) (in Japanese)
17) Sumita, Kazuo et al. Effective document retrieval for digital library - document structure analysis and automatic abstract generation -, Digital Libraries, no.5, p.35-41 (1995) (in Japanese)
18) Ishizuka, Hidehiro et al. Generation of full-text database based on SGML through an electronic contribution by an author - An experiment in the Chemical Society of Japan -, Information Processing Society of Japan SIG Reports, 94-F1-35, p.1-8 (1994) (in Japanese)
19) Naemura, Kenji. Considerations on the significance of and the means for protecting intellectual property rights related to the electronic dissemination of research information, Information Processing Society of Japan SIG Reports, 95-F1-40, p.17-24 (1995) (in Japanese)
20) Nawa, Kotaro. Digital library and copyright law, Digital Libraries, no.4, p.8-12 (1995) (in Japanese)
21) Peek, Robin P. Scholarly publishing, facing the new frontiers : In Scholarly Publishing - The Electronic Frontier. MIT Press, 1996. p.14.