
64th IFLA General Conference
August 16 - August 21, 1998

Code Number: 132-160(WS)-E
Division Number: V.
Professional Group: Serial Publications
Joint Meeting with: Document Delivery & Interlending and the UAP Core Programme
Meeting Number: 160.
Simultaneous Interpretation: No

The UNIverse Project - A European Demonstration which Adds Value to the Virtual Union Catalogue

Suzanne Ward

UNIverse Project Officer
The British Library Document Supply Centre
Boston Spa, UK

Abstract

This paper covers one of the leading projects currently being funded under the European Commission's Telematics for Libraries 4th Framework Programme. The paper will describe the technical background to the project, outlining the way in which we have built on the concept of the virtual union catalogue to overcome the problems traditionally associated with distributed catalogue searching. It will also explore the way in which the virtual union catalogue plays a fundamental role in integrating key library services to provide a sophisticated software solution to users in today's networked information environment.

Paper

PROJECT BACKGROUND

We describe the project as a "Large Scale Demonstrator for Global, Open Distributed Library Systems". This gives you an idea of the scale of the impact we are hoping for. The project has been running since October 1996 and will continue until the end of April 1999. In total, the project is costing approximately 4,000,000 ECU.

The project has a Consortium of some 17 partners based in a number of European countries (Denmark, Greece, Ireland, Luxembourg, The Netherlands, Norway, UK). The partners are a mix of software developers and different types of libraries (national, academic, public, special), each contributing their expertise to develop a new type of system which meets technological requirements and user needs. The project is coordinated by Fretwell-Downing Informatics, UK.

Summary of partners:

Main partners: Fretwell Downing; Danish Technical Knowledge Centre; The British Library; Irish Library Council; National Library of Greece; Freshwater Biological Association
Associate partners: Q-Ray; Ex-Libris; Index Data; Technical University of Delft; Technical University Library of Norway; University of Sheffield; University College Dublin; Forbairt; Kyros; Southampton Oceanography Centre

A number of these partners were involved in earlier projects which laid the foundations on which UNIverse builds. The first of these was IRIS, the first operational Z39.50-based service (and now a commercial system in Ireland). Its software allowed users to search several library systems simultaneously; a single hit-list was returned and requests could be made by email. DALI built on the achievements of IRIS so that requests could be made using the ISO ILL protocol and multimedia documents could be delivered. EUROPAGATE developed a Z39.50/ISO Search/Retrieve gateway to improve the interoperability of different catalogue systems, and also examined MARC data conversion to accommodate the different characteristics of the servers being searched. UNIverse takes these achievements further to deliver a sophisticated software solution. The project's primary objectives are to:

- produce software which makes distributed library catalogues available as a virtual union catalogue, integrating key library services around that union catalogue;
- produce a demonstrator service among a group of some thirty test-site libraries.

VIRTUAL UNION CATALOGUE

The virtual union catalogue lies at the core of the system. What we mean by the "virtual union catalogue" is this - whereas a union catalogue is one which exists in a unified physical format, a virtual union catalogue is one which is only brought together across telecommunication networks. The key features we have incorporated into our virtual union catalogue address many of the problems traditionally associated with distributed catalogue searching:

- you can search multiple physical databases in parallel, despite differences in access method, record syntax, character set and language, with the results presented as if a single database were being searched;
- the multiplicity of data sources is hidden from the user, and a high quality of service is achieved, both in performance and in data quality, through record deduplication and merging;
- through the use of open distributed processing techniques, the architecture has potentially unlimited scalability while maintaining high performance.

We now look in detail at how we have added value to the virtual union catalogue. It is in some of these areas in particular that the project is making a significant contribution to the Research & Development outcomes of the Libraries Programme.

Searching multiple physical databases in parallel
Multiple physical databases with a variety of access methods are searched in parallel using the Z39.50 (or ISO 23950) protocol. The protocol standardises the way in which networked computers communicate with each other so that they can search and retrieve data held on different systems. One major advantage of this approach is that users only need to learn the interface and query language of their local system in order to search any number of remote Z39.50-enabled databases.
In UNIverse we provide Z39.50 searching through a WWW interface. This combines the accessibility of the Web with the standardised, structured approach of the protocol. The WWW gateway gives any user with a Web browser access to any Z39.50 target, and the results are displayed consistently whichever database is selected.
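The paper does not publish UNIverse's code. As a minimal sketch of the idea only, the following Python sends one query to several targets concurrently and merges the answers into a single, consistently ordered hit-list. The targets are plain functions standing in for Z39.50 connections (a hypothetical simplification; no real Z39.50 client is used), and `search_all`, the `source` field and the 10-second timeout are illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical stand-in for a Z39.50 target: takes a query string and
# returns a list of record dicts.
Target = Callable[[str], list[dict]]

def search_all(query: str, targets: dict[str, Target]) -> list[dict]:
    """Send one query to every target in parallel and merge the results
    into a single hit-list, tagging each record with its source."""
    hits: list[dict] = []
    with ThreadPoolExecutor(max_workers=len(targets) or 1) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in targets.items()}
        for name, future in futures.items():
            try:
                records = future.result(timeout=10)
            except Exception:
                continue  # one unavailable target should not sink the search
            for rec in records:
                hits.append({**rec, "source": name})
    # One presentation order, whichever database a record came from.
    return sorted(hits, key=lambda r: r.get("title", "").lower())
```

For example, `search_all("ecology", {"A": lib_a, "B": lib_b})` returns one list in which each record carries the name of the target that supplied it; a target that raises an error or times out is simply left out of the results.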
The Java programming language is being considered because it makes Web pages more interactive. Java also uses Unicode (a universal character set) for its characters and strings. Since UNIverse needs to address catalogues in a wide range of languages, Unicode support is essential, as it opens up the possibility of building a system that can handle multiple languages.
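The benefit of a single Unicode representation can be illustrated outside Java too. In this Python sketch, whose `str` type is Unicode, the Greek and Danish sources and their 8-bit character sets are hypothetical assumptions: decoding each field with the character set its source declares yields strings that can be stored and compared side by side. Real MARC data often uses other character sets (for example ANSEL), which would need their own converters.

```python
# Hypothetical raw title fields from two national catalogues, each
# arriving in its own 8-bit character set.
RAW_FIELDS = [
    ("iso8859_7", "Ελληνική Βιβλιοθήκη".encode("iso8859_7")),  # Greek
    ("latin_1", "Søgning i kataloger".encode("latin_1")),       # Danish
]

def to_unicode(fields: list[tuple[str, bytes]]) -> list[str]:
    """Decode each field with the character set its source declares.
    The results share one Unicode string type, whatever the language."""
    return [raw.decode(charset) for charset, raw in fields]

titles = to_unicode(RAW_FIELDS)
```

Decoding with the wrong character set would produce garbled text, which is why the character set must travel with each record until conversion.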
Record syntax conversion
As the project aimed to support cross-database searching, we had to consider how to handle records retrieved from different databases in different record formats. We decided that a gateway was needed to convert the various record syntaxes to a common format, in which the records could more easily undergo further processing. UNIMARC was initially considered as the common format. However, because MARC standards, including UNIMARC itself, are interpreted differently in different countries, an alternative had to be found.
We chose GRS-1 (Generic Record Syntax 1), which represents a record as a more abstract structure of tagged elements than MARC (which uses different attributes and tags in different kinds of MARC records). GRS-1 is also more general than MARC and so has applications both within and outside the library domain. The UNIverse software converts incoming records twice. As they arrive at the Collator, they are converted to GRS-1. This means that deduplication can be carried out while all the records are in the same format (independent of both the original and the desired record syntax). The records are then converted into the record syntax preferred by the searcher's software.
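The two-step conversion can be sketched as follows. This is a deliberately simplified illustration, not GRS-1 itself (real GRS-1 records are nested, ASN.1-encoded element trees): the "common format" here is just a sorted list of (element, value) pairs, the mappings cover only two fields, and the names `to_common` and `from_common` are hypothetical.

```python
# Hypothetical field mappings into shared element names. Tags are real
# MARC tags (USMARC 245/100, UNIMARC 200/700) but fields are simplified
# to whole strings with no subfields or indicators.
TO_COMMON = {
    "usmarc":  {"245": "title", "100": "author"},
    "unimarc": {"200": "title", "700": "author"},
}
FROM_COMMON = {fmt: {v: k for k, v in m.items()} for fmt, m in TO_COMMON.items()}

def to_common(record: dict[str, str], fmt: str) -> list[tuple[str, str]]:
    """Incoming step: source record -> (element, value) pairs.
    Tags with no mapping are dropped in this sketch."""
    mapping = TO_COMMON[fmt]
    return sorted((mapping[tag], value) for tag, value in record.items() if tag in mapping)

def from_common(elements: list[tuple[str, str]], fmt: str) -> dict[str, str]:
    """Outgoing step: common elements -> the searcher's preferred syntax."""
    mapping = FROM_COMMON[fmt]
    return {mapping[name]: value for name, value in elements if name in mapping}
```

Deduplication would operate on the output of `to_common`, so it never needs to know which syntax a record arrived in or which one the searcher wants.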
Deduplication
In order to maintain the quality of service when searching across multiple databases at once, it is important to reduce the effect of duplicate database content. This can be achieved through a deduplication process. In UNIverse our key requirements for deduplication have been:
- accuracy - two records should not be presented as duplicates when they are not;
- thoroughness/completeness - record matching may be carried out against a number of criteria and if a deduplication service is to be useful then it must check a significant number of these possibilities;
- speed - in an interactive service the deduplication process must not noticeably degrade response time.
In UNIverse we have decided to cluster records that are potential duplicates according to a matching algorithm which is either predefined in the system or generated on demand. Records identified as duplicates are not deleted. This approach avoids depending on a consistent name for each record-object, since current naming schemes such as ISBN, DOI and URN are domain-specific and sometimes not even unique. A further advantage of clustering is that a more aggressive matching algorithm can be used, since no data are lost if records are incorrectly identified as duplicates.
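A minimal sketch of clustering without deletion, assuming a hypothetical matching rule (normalised title plus year). The paper does not specify UNIverse's matching criteria; the swappable `key` parameter simply mirrors the idea that the algorithm may be predefined or supplied at run time.

```python
import re
from collections import defaultdict

def match_key(record: dict) -> tuple[str, str]:
    """Hypothetical matching rule: lower-cased title with punctuation and
    extra spaces removed, plus publication year."""
    title = re.sub(r"[^a-z0-9 ]", "", record.get("title", "").lower())
    return (" ".join(title.split()), str(record.get("year", "")))

def cluster(records: list[dict], key=match_key) -> list[list[dict]]:
    """Group potential duplicates; every input record is kept."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return list(groups.values())
```

Because each cluster still holds every source record, an over-eager `key` only groups too much; the user can still see all the original records.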
Multi-lingual thesaurus
Thesauri can be used as information retrieval tools, making it possible for the user to browse through thesauri in order to find a term to be used in a query for bibliographic records. A multi-lingual thesaurus allows users to search a combination of databases using their own language. These thesauri are available electronically and are accessible over Z39.50. The UNIverse software translates the query term into the equivalent terms in another language.
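One common way to organise a multi-lingual thesaurus is around language-independent concepts, each with a term in every language. This Python sketch assumes that structure; the concept identifiers, the term entries and the `translate` function are hypothetical illustrations, not the thesauri UNIverse actually accesses over Z39.50.

```python
# Toy thesaurus: each concept maps language codes to a preferred term.
THESAURUS = {
    "C001": {"en": "water pollution", "da": "vandforurening", "nl": "waterverontreiniging"},
    "C002": {"en": "fisheries", "da": "fiskeri", "nl": "visserij"},
}

def translate(term: str, source: str, targets: list[str]) -> dict[str, str]:
    """Map a query term in the user's language to the equivalent terms
    in the target languages. Returns {} if the term is not in the thesaurus."""
    term = term.lower()
    for concept in THESAURUS.values():
        if concept.get(source) == term:
            return {lang: concept[lang] for lang in targets if lang in concept}
    return {}
```

A Danish user searching for "vandforurening" could then have the query sent as "water pollution" to English-language catalogues and as "waterverontreiniging" to Dutch ones.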
The Explain service
This is a further important function which will help maintain the quality of service in UNIverse. Although parallel searching of multiple databases is possible, users should be encouraged to search databases they know something about, rather than submit searches to inappropriate databases, which both wastes the user's time and uses network capacity unnecessarily. The Explain service provides this information. It is made up of two parts:
- The Profile Database
- Query Adaptation
The Profile Database is part of the UNIverse server and holds data relevant to remote targets. The aim is to specify the data schema that will hold the data for the remote targets. Its content includes:
- the appropriate target - the target that handles the information that is sought;
- the appropriate service - a description of the services provided by a target, their current status etc;
- the description of the contents of the target database;
- the queryable attributes of the target - list of attributes, terms and restrictions the target can handle. This is relevant to the Query Adaptation facility described below;
- quality of service information;
- administration information - contact name, down time etc.
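A Profile Database entry could be modelled roughly as below. The field names are hypothetical and simply mirror the categories listed above; `subjects` and `avg_response_s` are assumed concrete forms of "description of contents" and "quality of service information". The helper shows how such a profile could steer users towards appropriate, available targets.

```python
from dataclasses import dataclass

@dataclass
class TargetProfile:
    name: str
    services: dict[str, str]        # service name -> current status
    content_description: str
    subjects: set[str]              # hypothetical subject keywords
    queryable_attributes: set[str]  # attributes the target can evaluate
    avg_response_s: float           # quality-of-service figure
    contact: str
    down_time: str = ""

def appropriate_targets(profiles: list[TargetProfile], subject: str,
                        service: str = "search") -> list[str]:
    """Names of targets that cover the subject and currently offer the
    service, fastest first."""
    suitable = [p for p in profiles
                if subject in p.subjects and p.services.get(service) == "available"]
    return [p.name for p in sorted(suitable, key=lambda p: p.avg_response_s)]
```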
The Query Adaptation facility will be called prior to issuing a query to a remote Z39.50 target. The facility makes use of the Profile database to map the user's query onto an 'optimal' query that can be evaluated at each target. The utility has a simple call interface accepting the following parameters:
- the requested query
- the target name
- a measure of how semantically near a converted query must be to the original.
The algorithm that converts the requested query into the nearest query type that the target can evaluate needs to be designed and implemented. The philosophy that will govern this design is the establishment of criteria that will allow the closeness of the requested query to the resulting query to be measured. These criteria should not only give a final indication of the closeness, but they should also guide the intermediate decisions about the substitution process.
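Since the paper notes that the conversion algorithm was still to be designed, the following is only one possible sketch of the three-parameter interface. It assumes (hypothetically) that each attribute substitution carries a closeness score between 0 and 1, that the overall closeness is the product of the scores, and that the target's supported attributes come from its profile. Each substitution step picks the closest acceptable option, so the same measure guides the intermediate decisions and gives the final result.

```python
# Hypothetical closeness scores (1.0 = same meaning) for replacing an
# attribute a target cannot evaluate with one it can.
SUBSTITUTES = {
    "title":   [("title-keyword", 0.9), ("any", 0.6)],
    "author":  [("name", 0.8), ("any", 0.5)],
    "subject": [("subject-keyword", 0.9), ("any", 0.4)],
}

# Stand-in for the queryable attributes held in the Profile Database.
TARGET_ATTRIBUTES = {
    "lib-a": {"title", "author", "subject"},
    "lib-b": {"title-keyword", "any"},
}

Query = list[tuple[str, str]]  # (attribute, term) pairs, ANDed together

def adapt_query(query: Query, target: str, min_closeness: float):
    """Return (adapted_query, closeness), or None if the adapted query
    would fall below min_closeness."""
    supported = TARGET_ATTRIBUTES[target]
    adapted: Query = []
    closeness = 1.0
    for attr, term in query:
        if attr in supported:
            adapted.append((attr, term))
            continue
        # Intermediate decision: take the closest substitute the target accepts.
        options = [(alt, s) for alt, s in SUBSTITUTES.get(attr, []) if alt in supported]
        if not options:
            return None
        alt, score = max(options, key=lambda o: o[1])
        closeness *= score
        if closeness < min_closeness:
            return None  # already too far from the original; give up early
        adapted.append((alt, term))
    return adapted, closeness
```

With these assumed scores, a title-and-author query sent to `lib-b` becomes a title-keyword plus "any" query with closeness 0.45, which would be rejected at a threshold of 0.5 and accepted at 0.4.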

AFTER THE VIRTUAL UNION CATALOGUE

The way in which UNIverse overcomes many of the problems associated with multiple database searching has been demonstrated above. As is increasingly the case these days, however, once a user has retrieved a hit-list and located an item, they want to move quickly to seeing the item for themselves. With this in mind, UNIverse has integrated a requesting service into the software, so that users can either place a request for an Interlibrary Loan or Document Delivery themselves, or have their request submitted to their library and then authorised by library staff before it is sent to a supplier.

A number of standard request formats are built into the software, e.g. email, the ISO ILL protocol and ART (for BLDSC requesting). Requests can be fulfilled in a variety of ways, ranging from post and fax to electronic delivery. Whether an item is supplied as a loan (ILL) or as a non-returnable item depends on what each supplier can support. When implementing the UNIverse software, librarians need to decide whether to mediate requests made by their users or to let users place requests directly with the supplier. The software permits different authorisation levels to be set for different groups of users.
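The paper does not say how authorisation levels are expressed. As one hypothetical policy, each user group could be allowed a number of unmediated requests per period, with anything beyond that (or any request from a group with a limit of 0) held for library staff. The group names, limits and `route_request` function below are illustrative assumptions only.

```python
from enum import Enum

class Route(Enum):
    DIRECT = "sent directly to the supplier"
    MEDIATED = "held for authorisation by library staff"

# Hypothetical policy: unmediated requests allowed per group per period.
# A limit of 0 means every request from that group is mediated.
GROUP_LIMITS = {"staff": 20, "postgraduate": 5, "undergraduate": 0}

def route_request(user_group: str, requests_so_far: int) -> Route:
    """Decide whether a new request goes straight to a supplier or is
    queued for staff. Unknown groups default to mediation."""
    limit = GROUP_LIMITS.get(user_group, 0)
    return Route.DIRECT if requests_so_far < limit else Route.MEDIATED
```

Defaulting unknown groups to mediation keeps the library in control if a user's group has not been configured.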

Also from the hit-list you can move into the Record Supply/Collaborative cataloguing service. Some UNIverse service providers are willing to sell bibliographic records, others would like to share records. The software caters for both of these scenarios.

DEMONSTRATING THE SOFTWARE

We are about to enter a demonstration phase with approximately 30 libraries across Europe. The libraries have been organised into five groups. There are three national groups (Irish, Greek and UK) and two subject-based groups (technological and environmental) which bring together libraries from different countries with a common subject interest. In each user group there are several libraries acting as service providers to the other members within their group. These services include making their catalogues/databases available for searching and retrieval, offering document delivery/ILL facilities and record supply/collaborative cataloguing.

In effect, we will have five examples of virtual union catalogues - from the five user groups. By the nature of the technology we are using, all these services are available to any library within the project regardless of their user group (thus creating a sixth example of a virtual union catalogue), but for the sake of speed and performance we are encouraging users to make use of the services available "locally" (i.e. within their group) before going outside of their group.

An evaluation programme is being developed to put the achievements of the project into perspective. Not only will the operation and performance of the software be assessed; the responses of librarians and end-users to this new integrated library service will also be analysed. The focus will be on their reactions to using the services and on whether this type of approach has had any impact on the way in which their library operates.

CONCLUSION

By the time the UNIverse project ends, it will have addressed many of the issues currently facing those looking to modernise and improve library services. The picture is never static, though, and follow-up projects are already looking to build on UNIverse to address more of the issues concerned with remote access and user authentication. With UNIverse, however, a robust structure should be in place for building virtual union catalogues and reaping the benefits of this type of access.

For more information, look at the project WWW site: http://www.fdgroup.co.uk/research/universe
