62nd IFLA General Conference - Conference Proceedings - August 25-31, 1996

Current State of Research and Development on Digital Libraries in Japan

Kimio Hosono,
Professor, School of Library and Information Science, Keio University, Tokyo, Japan


ABSTRACT

This paper describes the characteristics and current state of research and development on digital libraries in Japan. Government-related organizations such as the National Diet Library, the Information Technology Promotion Agency, the National Center for Science Information Systems, and B-ISDN Business Chance & Culture Creation are active in developing digital libraries. Having identified three kinds of digital libraries, the paper describes the activities carried out at those institutions in terms of their aims and products. Some universities, such as the Nara Institute of Science and Technology and the University of Library and Information Science, as well as for-profit private companies, have embarked on projects from several points of view. The paper then describes their undertakings, focusing on fundamental ideas, technologies, and techniques such as browsers, full-text search methods, visualization, SGML applications, document structures, and user interfaces. Non-technical aspects indispensable for the development are also briefly mentioned.


PAPER

1. Introduction

Looking back on the history of our intellectual activities, we see that a variety of technologies have had a tremendous influence on information handling and library services. That past influence, however, was not as dominant as the influence of recent computer and telecommunications technologies. Information technology has become indispensable to society as it diffuses rapidly among us. It dominates not only our daily life but also our intellectual life, and it has a substantial influence on the industrial structure. As a result, a so-called paradigm shift seems to be taking place.

The move from traditional libraries to digital libraries is an outstanding example, and several libraries are required to deal properly with it. Facing this shift, traditional libraries struggle to show their identity and raison d'etre to a changing world. On the other hand, some scholars and researchers in universities and private companies are actively participating in research and development (R&D) related to digital libraries. Hopefully, the results of this R&D will help libraries fulfil the intellectual duties assigned to them.

Since digitization of printed materials is essential to building digital libraries, a variety of digitization projects have been carried out around the world. An attempt by the National Diet Library is one example. Digitizing materials produces either page images or full texts, depending on the technology used.

A major merit of digital libraries is the marked improvement in retrieval capability compared with so-called paper libraries. Full-text search facilities are therefore indispensable in the digital library environment, and producing full-text forms is very valuable from the standpoint of searching for information.

Some consider information technology a panacea for recovery from the trade depression, since it may open large business opportunities for industry. In particular, the technology needed to develop digital libraries is considered promising in terms of its economic effects. Digital library projects require a variety of new technologies, and the results of the R&D may create a large new market.

Taking account of these economic effects, R&D activities on digital libraries in Japan have been carried out under strong initiatives by government agencies. The Ministry of International Trade and Industry has financially supported such projects via the Information Technology Promotion Agency. Some other ministries have also embarked on similar projects. With these characteristics in mind, this paper describes the state of R&D related to digital libraries in Japan.

2. Types of Digital Libraries

The concept of a digital library is difficult to specify, and its definition is not yet fixed. Besides the term "digital library", several other terms are used to express the concept, such as "electronic library", "virtual library", "hyper library", and "library without walls". This shows the breadth of the concept's connotation. It signifies not only a building but also the functions carried out in it.

Digital libraries can be categorized as follows, though other categories are possible.

1) Type 1: Digitized libraries

These are libraries in which not only the procedures are computer-based but the library resources are also digitized. They can be placed on the same line as traditional libraries, except that their collection is in digital form. Information services based on such materials should be integrated in a sophisticated way. In an extreme case, the introduction of new information technology will completely change existing work and replace almost all jobs so far done by staff. Users can easily access the materials from remote places at any time they want, and need not visit the library. This scenario may not, however, be realized for the time being, especially at large-scale libraries.

2) Type 2: Information resource networks

An information network like the Internet gives us considerable opportunity to obtain desired information. It strikingly widens the geographical range over which we can reach organizations that provide information, and it has a notable effect on our information-seeking behavior. We navigate the network to search for information as if we were in a library. In this sense, such a network functions as a library, though it provides only digitized materials. It could be called a "library without walls".

The network is, however, a metaphorical library, since it lacks the idea of a collection. In many cases the information offered through the network is only a disordered aggregation of what the participating members provide. A typical example is the Internet, which cannot have a unique collection policy of its own.

3) Type 3: Virtual libraries

We are accustomed to information handling procedures carried out in libraries, such as browsing books and journals, scanning card catalogs, and berrypicking(1) at bookshelves. These are fundamental behaviors in performing intellectual activities. Books are familiar information media for us, and the thickness and size of a book tell us about the volume of its content. It is therefore quite reasonable to apply those procedures and features to mechanisms for searching for desired information.

There are computer-based information systems that simulate information-seeking behavior in libraries, and mechanisms that adopt a book metaphor for visualizing information for users. In those systems we can access the stored information as if we were in a virtual library. Those are also metaphorical libraries and could be considered user interfaces.

3. Research and Development in Government-Related Organizations

3.1 Information Technology Promotion Agency, Japan

The Information Technology Promotion Agency, a specially authorized public corporation affiliated with the Ministry of International Trade and Industry, has been planning the "Pilot Electronic Library System Project". It is a joint project with the National Diet Library. Its purpose is to conduct the various technological experiments and research necessary for setting up a digital library(2). The Project consists of two components: a national union catalog network project and a digital library demonstration project. Actual construction of the pilot digital library system started in April 1994. The Center for Information Infrastructure, in which those experiments are carried out, was opened on the Fujisawa campus of Keio University in September 1995.

The national union catalog network project aims to lay the groundwork for a future digital library network. As of March 1996, twenty-seven public libraries and the National Diet Library are participating in the project. Bibliographic information from participating libraries is integrated into the union catalog database located in the Center for Information Infrastructure, and participating libraries search it through the network (using ISDN lines). The next step is to develop effective methods for identifying and integrating bibliographic information and for producing index files in a digital library environment. A more user-friendly interface and a new interlibrary loan system are also being examined.
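The integration step described above can be illustrated with a minimal sketch. It is not the project's actual method: the record fields (`title`, `year`, `library`) and the matching rule (normalized title plus year) are assumptions made only for illustration.

```python
import unicodedata
from collections import defaultdict


def match_key(record):
    """Build a normalized key from title and year (an assumed matching rule)."""
    title = unicodedata.normalize("NFKC", record["title"]).casefold()
    # Drop spaces and punctuation so trivial differences do not split records.
    title = "".join(ch for ch in title if ch.isalnum())
    return (title, record["year"])


def build_union_catalog(records):
    """Group records by match key and collect the libraries holding each item."""
    catalog = defaultdict(set)
    for rec in records:
        catalog[match_key(rec)].add(rec["library"])
    return dict(catalog)
```

A real union catalog would need far more careful identification (editions, transliteration, variant characters); the sketch only shows why a shared key is needed before records from different libraries can be merged.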

The digital library demonstration project, on the other hand, aims to uncover the technical issues and problems that may be encountered when developing and operating digital libraries in the future. The ultimate goal of the project is the "Type 1" digital library. A variety of materials are digitized, such as rare books, books published in the Meiji Era, and journal articles on politics and economics. Most of them were chosen from the collection of the National Diet Library. Search systems have been developed to retrieve the digitized data. They enable both bibliographic and full-text searching, and a menu-driven method is also provided.

At the same time, the Information Technology Promotion Agency has established a special committee, composed of scholars, researchers, and librarians. The purpose is to discuss approaches, problems, and issues related to the development and operation of digital libraries and put the results to practical use.

3.2 NACSIS-ELS

The National Center for Science Information Systems (NACSIS) of the Ministry of Education, Science and Culture, established in April 1986, has played an important role in developing, fostering, and promoting an effective and efficient flow of information and library materials among libraries and end-users(3). It distributes catalog data on monographs and serials to member academic libraries to facilitate efficient cataloging in their own libraries and quick interlibrary loans among them. NACSIS also acts as a database producer and vendor. As of 1995, it offers an information retrieval service based on about 50 databases, most of which are domestic(4). To promote information exchange among end-users and to deliver the information retrieval service to them, NACSIS operates the Science Information Network, which covers academic institutions nationally. Its role is not limited to these services; a variety of R&D projects are also a focus of its activities.

NACSIS-ELS (NACSIS Electronic Library System) is a prototype of a next-generation information service system that integrates and replaces conventional online information retrieval systems(5). The major objective of the project is to investigate the feasibility of the basic idea underlying NACSIS services and to make the system functions more practical and sustainable. It has been in trial service since February 1995 in a distributed processing environment.

NACSIS-ELS currently provides image and bibliographic databases. The image databases consist of raster images of scientific and scholarly journals and conference proceedings. They hold all pages, including the cover, table of contents, and back cover. Publications of twenty-two Japanese academic societies are used for the image capture. Books are not candidates for the electronic materials stored in the databases. At present the NACSIS-ELS databases contain page images, not full-text forms. The capability to handle full-text forms, however, is under development and is to be embedded into NACSIS-ELS(6).

Bibliographic databases are provided so that users can locate secondary information about the images. They contain records comprising titles, author names, affiliations, and abstracts of scientific papers derived from the NACSIS Information Retrieval Service (NACSIS-IR). NACSIS-ELS also stores cataloging information used for NACSIS-CAT, the cataloging service for university libraries that began in 1984.

Users can look for journal articles by conventional Boolean search of the bibliographic databases as well as by browsing images. The page images they request are transmitted from the image databases to their workstations over the Internet. In addition, the system lets users obtain high-quality printouts on local printers. NACSIS-ELS offers the usual online search functions found in NACSIS-IR, which has been available since 1987, and it also lets users browse pages and print them if needed.
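As a generic illustration of what conventional Boolean search over bibliographic records involves, the following sketch builds a small inverted index and combines posting sets with AND, OR, and NOT. It is not the NACSIS-ELS implementation; the record layout (an id mapped to a text string) and all function names are hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercase word to the set of record ids that contain it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in text.lower().split():
            index[word].add(rec_id)
    return index


def search_and(index, *terms):
    """Records containing every term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, *terms):
    """Records containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def search_not(index, hits, term):
    """Remove from hits the records that contain term."""
    return hits - index.get(term.lower(), set())
```

In a system of the kind described, the record ids returned by such a search would then be used to request the corresponding page images.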

The evaluation of the performance of NACSIS-ELS was started in December 1994, by monitoring comments from users of the system. The operational (commercial) service is scheduled to start in April 1997. The Advisory Board for the Electronic Library System was established to make the shift from the trial stage to the operational one smooth.

NACSIS-ELS functions mainly as a document delivery system. It might be classified as a "Type 2" library.

NACSIS also participates in a project to develop a digital archive for music(7). The project is part of the IDEAMA (International Digital ElectroAcoustic Music Archive) project, whose purpose is to preserve the early tape music of this century. Surveys of the bibliographic information and location of the tapes have been carried out.

3.3 BBCC

The B-ISDN Business Chance & Culture Creation (BBCC) is a nonprofit organization composed of more than 190 member organizations, including the Ministry of Posts and Telecommunications, local governments, and private companies(8). It was established in Kansai Science City in December 1992 to promote the use of the Broadband Integrated Services Digital Network (B-ISDN) and to develop new business opportunities for it.

A variety of multimedia application projects have been carried out using the very advanced information technology offered by BBCC. The digital library project is one example. These projects will contribute substantially to the development of the Kansai Branch of the National Diet Library. The library is scheduled to be built in the City by 2003 and is expected to be the largest digital library in Japan.

Leading companies in the computer and telecommunications industry, such as Fujitsu, Hitachi, NTT, and Toshiba, each take part in the projects. In particular, the Ariadne system has had considerable influence on other R&D related to digital libraries. It was developed by an ad hoc group named the Electronic Library Research Group, organized by scholars from several universities and Fujitsu. Ariadne has the following outstanding characteristics(8),(9).

  1. Texts, sounds, still images, and moving images are stored in the database.
  2. High-speed transmission of multimedia data is available over the B-ISDN network, so that it is easy to access the library from remote places.
  3. A variety of search techniques are provided, such as conventional keyword searching of bibliographic and full-text data, searching by term hierarchy, and searching with natural language queries.
  4. Several functions that help users make efficient use of the library are provided, such as reading more than one document concurrently, referring to dictionaries, marking a page, and automatic keyword translation from Japanese to English and vice versa.

The emphasis of the system is on user-friendliness. Ariadne can be categorized as a "Type 1" library.

4. Research and Development in Universities

4.1 Nara Institute of Science and Technology (NAIST)

NAIST, established in 1991, is a very young and small university, and its collection of books and other library materials is accordingly small. The Institute consists of two graduate schools, Information Science and Biological Sciences. Both the nature of the fields that the schools cover and the size of the collection led to the idea of building the first digital library in Japan, named the Mandala Library. In addition to the basic functions that university libraries usually have in common, the library is planned to have the following functions and characteristics(10).

  1. The targets of digitization are books, journals, microfilms, technical reports, reprints, videos, and other materials.
  2. They are digitized as images in most cases. Some are converted to full text by Optical Character Recognition (OCR).
  3. A full-text search function is available for the texts obtained by OCR.
  4. Users can access the library from all workstations in NAIST.
  5. The WWW client software Mosaic and Netscape are used for the user interface.
  6. The library provides users in NAIST with SDI services for newly acquired materials.
  7. Information produced in NAIST is distributed to outside users as well.

The Library is now in the prototype stage and is expected to be realized in April 1996. It may be regarded as a "Type 1" library, and it can also be classified as a "Type 2" library if its networking aspect is emphasized. The library has been developed by scholars in NAIST, so the whole process is quite research-oriented. In fact, the Institute is carrying out several R&D projects related to digital libraries(11). One example is research aimed at understanding the structure of document images.

4.2 The University of Library and Information Science (ULIS)

ULIS has played a very important role in promoting R&D on digital libraries by organizing an international symposium and several workshops. The first workshop was held in August 1994, and similar ones have been organized five times since then. An international symposium was held in August 1995. The output of those meetings is enormous. They not only contributed to the exchange of information and ideas but also created a positive atmosphere in the library world toward building digital libraries.

ULIS itself has carried out several research projects. One of them is the development of a bookshelf-like browser named SOPAC(12). SOPAC is an OPAC that uses a bookshelf metaphor. When a user accesses the system, the physical image of each book is displayed with its size and thickness. The image serves as a button for requesting bibliographic information about the book. SOPAC has been developed on top of the operational OPAC system of the university.

A variety of collaborative work goes on in libraries. Users consult librarians to get the information they need, and we exchange ideas and information with others in performing intellectual activities in libraries. Taking this into account, a collaboration support system was developed, and its validity was examined on the basis of a survey(12).

A library collection is essentially multilingual in the sense that it includes materials written in foreign languages. Digital libraries therefore need facilities for dealing with multilingual documents. For example, it is desirable for a digital library to have functions to input, search, and display texts written in different character sets. ULIS has developed a browsing tool for multilingual documents stored in the WWW environment(13).

5. Research and Development in For-profit Private Companies

Several private companies have enthusiastically embarked on the development of digital libraries in the belief that it will bring them good business opportunities. They are involved in such developments in several different ways. Building a digital library, as mentioned in the section on BBCC, is a typical example. They also investigate new technologies and techniques that process and/or visualize digitized data more effectively and efficiently.

5.1 Full-text search methods

OCR is a technology for producing digitized full texts from paper-based materials. This transformation is useful because keyword search techniques can then be applied to any part of the text. Digitization by OCR, however, has a serious drawback: it may introduce character recognition errors and produce misspelled words. Recognition errors cause wrong search results and lower the recall and precision of digital libraries. For materials written in Japanese the problem is much worse than for materials in, for example, English.

The Japanese character set comprises three subsets: Kanji, Katakana, and Hiragana. Together they are said to include more than eight thousand characters. In addition, Japanese writing is ambiguous for two reasons. One is the transliteration of foreign words. Many English words are transliterated into Katakana, but their spelling is often not unique, because the large gap between the pronunciation systems of the two languages makes it difficult to represent English sounds in Katakana. The other is that many Kanji have variant shapes. Moreover, some characters in different subsets have similar shapes and are not easy to distinguish by eye.

These are the major obstacles to correct recognition of texts in digitized materials. A full-text search method that can tolerate such errors is therefore needed, especially in Japan. In this sense it is also worthwhile to explore approximate or fuzzy matching techniques in the OCR environment.

Two search techniques that can tolerate recognition errors were developed at Hitachi Central Research Laboratory(14). The underlying idea is that if spelling variants can be generated, either when converting printed characters or when expanding queries, searching digitized texts will cause far fewer retrieval errors. Both techniques are based on the sequential search method developed by Hitachi and implemented in a full-text search system named Bibliotheca/TS.

Here OCR keeps multiple candidates for ambiguous characters and outputs them as they are. The first technique expands search terms, generating equivalent query strings that should match erroneously recognized texts. Each query term given by a searcher is expanded by a mechanism that generates spelling variants. As a result, the expanded character strings, which include wrongly recognized characters, are matched against the digitized data produced by OCR. Suppose a character X has characters of similar shape, such as X1 and X2; then a search term XY is expanded into three terms: XY, X1Y, and X2Y.
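The query-expansion idea can be sketched as follows. This is an illustrative reconstruction, not the Hitachi implementation: the confusion table is hypothetical, and Latin look-alikes (l, 1, I and O, 0) stand in for the similar-shaped Japanese characters discussed above.

```python
from itertools import product

# Hypothetical confusion table: each character maps to the look-alike
# characters that OCR might output in its place.
CONFUSABLE = {
    "l": ["1", "I"],
    "O": ["0"],
}


def expand_query(term, table=CONFUSABLE):
    """Return the term plus every variant obtained by swapping in look-alikes."""
    choices = [[ch] + table.get(ch, []) for ch in term]
    return ["".join(combo) for combo in product(*choices)]


def search(term, ocr_text, table=CONFUSABLE):
    """True if the term or any of its expanded variants occurs in the OCR text."""
    return any(variant in ocr_text for variant in expand_query(term, table))
```

With this table, `expand_query("la")` yields the three strings `la`, `1a`, and `Ia`, mirroring the XY, X1Y, X2Y example. The number of variants grows multiplicatively with the number of ambiguous characters in a term, which is one reason a practical system has to limit the confusion table.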

The second technique, on the other hand, treats the multiple candidates themselves as the digitized text. In this case the recognition result includes erroneously converted characters. The number of candidates is controlled by setting a threshold on the degree of similarity.
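A minimal sketch of this second approach is given below, assuming the OCR output is available as a list of positions, each holding (character, similarity score) pairs. That data layout, the score scale, and the function names are assumptions for illustration and are not taken from the cited work.

```python
def keep_candidates(position, threshold):
    """Keep the candidate characters whose similarity score reaches the threshold."""
    return {ch for ch, score in position if score >= threshold}


def find_term(term, lattice, threshold=0.5):
    """Return start offsets where each character of term is among the
    retained candidates at the corresponding text position."""
    slots = [keep_candidates(pos, threshold) for pos in lattice]
    hits = []
    for start in range(len(slots) - len(term) + 1):
        if all(ch in slots[start + i] for i, ch in enumerate(term)):
            hits.append(start)
    return hits


# Hypothetical OCR output for the printed word "label": the first character
# was ambiguous, so two candidates survive with similar scores.
lattice = [
    [("1", 0.60), ("l", 0.55)],
    [("a", 0.90)],
    [("b", 0.90)],
    [("e", 0.90)],
    [("l", 0.80), ("I", 0.30)],
]
```

Raising the threshold shrinks the candidate sets, which saves storage and reduces false matches but risks discarding the correct character; lowering it does the opposite.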

5.2 Visualization and navigation function

Neither term-based retrieval nor straightforward sequential searching is necessarily adequate for large-scale collections, particularly in the digital library environment, for the following reasons.

  1. They are not designed for collections of heterogeneous types of materials such as texts, images, movies, and sounds.
  2. Users often do not know what they really want. Thus, an initial query is not specific enough in many cases.
  3. There are no effective methods for searchers to locate desired information in a massive collection.
  4. There are so many domain-dependent representations of data that no single browser can fully represent all the details of the data(15).

To overcome these problems, a technique that integrates visualization and navigation functions, named information outlining, has been developed at IBM Tokyo Research Laboratory(15). It uses three mechanisms: viewers, mappers, and linkers. Viewers capture and visualize stored data from various aspects to help users locate the information they need. The other two support the viewers: mappers extract a specific subset of data from a collection, while linkers define relationships among data.

Viewers provide four kinds of view: chronological, geographical, folder, and categorical. The prototype system uses newspaper articles, patent disclosures, and images as its collection. When newspaper articles are used as the collection, the chronological view shows the number of articles published in particular periods, and the geographical view indicates the number of articles referring to particular geographical areas.

Those data are displayed in color in two-dimensional space using bar charts and push-buttons. Novice users are not familiar with term-based retrieval and tend to submit questions like "what is new" or "what has been happening recently." The system tries to handle such questions. The technique can thus be regarded as an interface for those users.
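The counts behind the chronological and geographical views can be sketched as simple aggregations. This is only an illustration of the idea, not the IBM prototype: the `Article` fields, the choice of month as the period, and the function names are hypothetical, and linkers are omitted.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    title: str
    published: date
    places: tuple  # geographical areas the article refers to


def map_subset(collection, predicate):
    """Mapper: extract the subset of the collection that satisfies predicate."""
    return [item for item in collection if predicate(item)]


def chronological_view(articles):
    """Viewer: number of articles published in each (year, month) period."""
    return Counter((a.published.year, a.published.month) for a in articles)


def geographical_view(articles):
    """Viewer: number of articles referring to each geographical area."""
    return Counter(place for a in articles for place in a.places)
```

The resulting counters are the kind of data that could be drawn as bar charts, with each bar acting as a button that narrows the collection through a mapper.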

5.3 Others

R&D projects related to digital libraries are not limited to those mentioned above. There are some others.

A prototype "Type 3" digital library was developed by the Telematique International Research Laboratories and was shown at the National Diet Library in 1988. The library, named SungWoKung, supports users' information-seeking behavior by offering both book and library metaphors(16). The Laboratories have also developed technologies such as image processing and character recognition.

If the particular roles of parts of a text, such as the purpose, results, and conclusion of a document, can be identified automatically, the performance of full-text search can be improved by taking those roles into account. Sumita and others therefore developed a method to extract document structure automatically and generate abstracts(17).

Generally speaking, the application of SGML (Standard Generalized Markup Language) in the commercial arena is still in its infancy. There is, however, one farsighted attempt. The Chemical Society of Japan has changed the publication style of the Bulletin of the Chemical Society of Japan to a digital form based on SGML. The electronic version started in January 1993. Several years of extensive research preceded the undertaking, and new research has started to experiment with electronic submission of papers(18).

6. Concluding Remarks

R&D projects related to digital libraries have so far been slanted toward technology-oriented aspects. This is not peculiar to Japan. Sound development of "Type 1" digital libraries, however, depends greatly on non-technological aspects. Copyright is one of the most troublesome issues among them, and it has been discussed on several occasions(19),(20).

Furthermore, inter-organizational cooperation must be established to carry out digitization at the operational level. For example, a union catalog that tells "what is already digitized" and "what is to be digitized" should be compiled to avoid needless duplication of digitization activities.

Research on human information-seeking and information-processing behavior is also vital. To earn a high reputation, digital libraries should reflect human information needs and take account of user behavior more than traditional libraries do.

We must remember that change has equal opportunity to improve a situation and to make it worse, and that attempts to transform academic culture evoke tensions and fears(21).

References

1) Bates, Marcia J. The design of browsing and berrypicking techniques for the online search interface, Online Review, 13(5), p.407-24 (1989)

2) Pilot Electronic Library Project. Information Technology Promotion Agency, Japan. 10p.

3) Librarianship in Japan. revised ed. Japan Library Association, 1994. p.81-85.

4) Catalog of the National Center for Science Information Systems. 1995. 25p. (in Japanese)

5) Adachi, Jun and Hashizume, Hiromichi. NACSIS Electronic Library System: Its design and implementation. Proceedings of International Symposium on Digital Libraries 1995. University of Library and Information Science, Tsukuba, Japan, 1995. p.36-41.

6) Oyama, Keizo. Digital libraries and SGML databases: An ideal and the reality. Digital Libraries, no.5, p.33-43 (1995) (in Japanese)

7) Miyazawa, Akira and Hashizume, Hiromichi. Digital archive for music. Proceedings of International Symposium on Digital Libraries 1995. University of Library and Information Science, Tsukuba, Japan, 1995. p.273-4.

8) Matsumoto, Hiroshi. High speed network for digital libraries. University of Library and Information Science, Tsukuba, Japan, 1995. p.65-72.

9) Nagao, Makoto et al. Development of the electronic library "Ariadne"(1) - Issues related to the systems design -, Journal of Information Processing and Management, 38(3), p.191-206 (1995) (in Japanese)

10) Imai, Masakazu et al. Design of a digital university library: Mandala Library. Proceedings of International Symposium on Digital Libraries 1995. University of Library and Information Science, Tsukuba, Japan, 1995. p.119-24.

11) Chihara, Kunihiro. Digital library and multimedia. Information Science Symposium 1996. p.25-32. (in Japanese)

12) Sugimoto, Shigeo et al. Enhancing usability of network-based library information system - experimental studies of a user interface for OPAC and of a collaboration tool for library services. Proceedings of Digital Libraries '95. p.115-122.

13) Sakaguchi, Tetsuo et al. A browsing tool for multi-lingual documents for users without multi-lingual fonts. Proceedings of the First ACM International Conference on Digital Libraries. 1996. p.63-71.

14) Fujisawa, Hiromichi and Marukawa, Katsumi. Full-text search and document recognition of Japanese text. Proceedings of 4th Symposium on Document Analysis and Information Retrieval. 1995. p.55-80.

15) Morohashi, Masayuki et al. Information outlining: Filling the gap between visualization and navigation in digital libraries. Proceedings of International Symposium on Digital Libraries 1995. University of Library and Information Science, Tsukuba, Japan, 1995. p.151-8.

16) Sato, Mamoru. Electronic libraries "SungWoKung" which support retrievals of books using presentation of CG pictures of libraries, Information Processing Society of Japan SIG Reports, 91-F1-24, p.1-8 (1991) (in Japanese)

17) Sumita, Kazuo et al. Effective document retrieval for digital library - document structure analysis and automatic abstract generation -, Digital Libraries, no.5, p.35-41 (1995) (in Japanese)

18) Ishizuka, Hidehiro et al. Generation of full-text database based on SGML through an electronic contribution by an author - An experiment in the Chemical Society of Japan -, Information Processing Society of Japan SIG Reports, 94-F1-35, p.1-8 (1994) (in Japanese)

19) Naemura, Kenji. Considerations on the significance of and the means for protecting intellectual property rights related to the electronic dissemination of research information, Information Processing Society of Japan SIG Reports, 95-F1-40, p.17-24 (1995) (in Japanese)

20) Nawa, Kotaro. Digital library and copyright law, Digital Libraries, no.4, p.8-12 (1995) (in Japanese)

21) Peek, Robin P. Scholarly publishing, facing the new frontiers. In Scholarly Publishing - The Electronic Frontier. MIT Press, 1996. p.14.