61st IFLA General Conference - Conference Proceedings - August 20-25, 1995
Access to newspaper collections and content in a time of change
GEOFF SMITH, British Library Newspaper Library, London, UK
Libraries which hold newspaper collections and provide services based on them, and users wishing access to newspapers and their contents, have been experiencing times of change for some years and will continue to do so in the future. New technologies have been and continue to be developed which offer opportunities for improvements in access, while traditional methods of collecting and preservation remain in widespread use and are an essential underpinning for improvements to access in the future. In the course of this paper I shall be reviewing the strengths and weaknesses of both traditional and computerised approaches to newspaper access. I shall assess the current position in relation to newspaper access, and will give my views on how this might change in the foreseeable future.
Hard copy and microfilm
Historically libraries have made newspapers available for longer term research use by collecting them as they are published, storing them, often after boxing or binding into volumes, cataloguing them, usually by place of publication and title, and making them available for consultation in their reading rooms. Since the 1950s the microfilming of newspaper titles has offered an economic and space efficient alternative to this for preserving the full content of newspaper titles in image form. This could be either by in-house filming of newspapers held by the library, or through the purchase of film produced by commercial organisations or by other libraries. Microfilming to appropriate preservation standards remains the method most widely accepted by libraries for the long term preservation of the full content of newspapers and for making duplicate sets of newspapers more widely available. Its importance and practicality can be seen through the continuing efforts and achievements of co-operative programmes of preservation microfilming such as the United States Newspaper Program and NEWSPLAN in Great Britain and Ireland. Microfilm is still the medium most widely used by libraries for providing access to runs of newspapers, although it is worth pointing out that many users have a strong preference for use of the original hard copy where this is available to them.
From the point of view of access, however, the limitations and weaknesses of these traditional approaches are well known to the librarians involved and to the research users of newspapers. For anyone collecting and preserving newspapers in original hard copy form the first essential requirement is for a great deal of storage space. The British Library Newspaper Library's 600,000 bound volumes and parcels of newspapers occupy some 18 linear miles (29 linear kilometres) of shelving. The Library is in one of its periodic crises of storage space, and is having to look at remote storage of low use material as a way of coping until further development of the Colindale site is possible. The material is bulky, heavy and prone to rapid deterioration, both through wear and tear and through the in-built tendency to self-destruction of the acidic wood-pulp based paper on which most newspapers have been printed since the mid-nineteenth century.
Microfilm offers an answer to the space problem, as well as being light and easy to store and retrieve. However, from the researcher's point of view both hard copy and microfilm have significant drawbacks in terms of access and use. Where major archive collections have been built up, whether by legal deposit or large scale collecting over an extended period, they are likely to be centralised and often for reference only, so that the individual researcher must travel to the archive. The cost and scale of systematic collection mean that local collections or the researcher's own institution are likely to have only limited, perhaps specialist or localised, subsets of the total newspaper output. In some countries newspapers on microfilm may be available through inter-library loan, though this is generally not the case in Great Britain. Identification of the location of newspaper holdings may be problematic where no union catalogue has been set up and made available. The United States Newspaper Program has made the systematic recording of bibliographic details and locations of newspapers on the shared OCLC database the key activity of its individual state projects, along with the preservation on microfilm of the newspapers recorded. In Britain there is no union catalogue of newspapers, though such a facility is highly desirable, and should be seen as an important element in the provision of systematic access on a national level. I hope that the conversion of the British Library Newspaper Library's catalogue into machine readable form, so that it can be mounted on the British Library's online public access catalogue in 1996, will be the first step towards the creation of a union catalogue of newspapers in Great Britain.
The biggest problem for users of hard copy and microfilm newspaper collections is the difficulty of subject access: unless the newspaper is one of the few to which a subject index is published, or unless an index to a title or group of titles has been created locally, finding articles on a particular topic is very time consuming unless one has a precise knowledge of the dates on which relevant articles were likely to have been published. Despite these difficulties of access, in Britain and, I am sure, elsewhere the demand for use of newspapers as research material continues to grow. The range of uses of the material varies widely, from academic research at all levels in areas such as history, social and media studies, through to research to obtain source material for TV, publishing or print journalism, to evidence for legal cases, material for business use, and personal research on family history, local history and varied areas of popular culture such as sport, fashion and entertainment.
Where subject access is the primary requirement of a collection the traditional alternative to collecting runs of complete newspapers has been the creation of press cuttings libraries. This has been the approach adopted commonly by newspapers themselves within their libraries. It involves the physical cutting out of articles from the newspapers in which they appeared and their filing by subject heading or subject classification. The articles are then seen in the context of other articles on the same subject, but out of the context of the newspaper issue in which they originally appeared. The process of creating and maintaining a cuttings library is labour intensive because of the physical cutting involved, the manual collation by subject and the filing involved. In many commercial organisations which rely on press cuttings there is pressure to reduce costs, either by the reduction of staff costs or of the number of titles covered, and it is such organisations which have been in the forefront of applying computerised methods to the problems of accessing newspaper information.
On-line information retrieval access
Though the traditional approaches described above are still widespread today, they are now accompanied by options for improved access offered by new technologies and automated systems. The process of change began in the 1980s with the advent of on-line database systems storing and making available the full text of newspaper articles. Such systems are powerful, providing fast access to multiple titles and many years of data. They are generally updated daily, so that data is complete up to the previous day's issues. They are wide ranging in coverage. FT Profile, the main British on-line newspaper host, carries the text of some 20 British newspapers; the American systems, NEXIS and DIALOG, carry a large number of American and other titles. Powerful though they are, these systems too have drawbacks. They are expensive to use, so that few public sector organisations are able to offer free end user access to them. The systems are primarily used by business and research organisations. Even though they cover multiple titles they are not comprehensive; it is primarily national broadsheet titles or specialist titles which are represented. The popular press, in which there is much research interest, is typically not represented on these on-line systems. Local newspapers in Britain are also usually not available on-line. Nor is the date coverage comprehensive. The on-line files typically begin in the mid 1980s, following the introduction of computerised systems for the production of newspapers, since the machine readable data used for electronic on-line access is a by-product of these computerised production systems. Finally, what is available on-line is not the complete newspaper but rather the text of selected articles from it. Excluded may be items for which the newspaper concerned does not hold copyright, e.g. press agency material and signed articles by individual columnists, as well as the photographs, advertisements, cartoons, crosswords and much other material which give the text articles their context and which give individual character to the titles concerned. The user does not see the page on which the article appeared, so that the size of the headline, the prominence given to the particular article and the accompanying photographs are all lost. However, for the user whose primary requirement is for fast, up to date, remote access by subject to major titles from recent years, and for whom cost of use is not a limiting factor, many of the problems of access have already been solved.
Since the beginning of the 1990s libraries which have had a problem with the high variable costs of the use of on-line systems have had a fixed price alternative which can allow them to provide end user information retrieval searching at no additional cost once the subscription and equipment costs are paid. This is through the use of CD-ROMs which store and provide access to the text content of individual newspaper titles, or which make available newspaper indexes or abstracts which would previously only have been available in print or on-line form. Many newspaper titles have been published in the UK and United States in this form, and some European titles are also available: "Le Monde", "Frankfurter Allgemeine Zeitung", "Corriere della Sera" and, I am sure, others round the world with which I am not familiar.
CD-ROM systems have some of the power of on-line systems - in particular the ability to search with text retrieval software any of the data elements present, from article text through headline, by-line, date of publication and in some cases subject heading. The limitations of CD-ROM systems at present are also those of the on-line systems: only recent years are available; only a limited number of titles is available; and the content is incomplete compared to what is in the original newspaper. The updating of disks is less frequent than with on-line services, with disks typically being produced on a quarterly basis. There are additional problems of each title or group of titles being published separately and, usually, each year of data appearing on a separate disk. This quickly leads to problems of disk management within the library, with the need to consider the use (and cost) of networking and jukeboxes to share multiple disks between a number of users.
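By way of illustration only, the kind of field-by-field searching described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of any actual CD-ROM product: the `Article` record, its field names, the `search` function and the sample records are all invented for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    """One newspaper article record; the field names are illustrative."""
    headline: str
    byline: str
    published: date
    text: str


def search(articles, term, field="text"):
    """Return articles whose chosen field contains the term, ignoring case."""
    needle = term.lower()
    return [a for a in articles if needle in str(getattr(a, field)).lower()]


# Invented sample records, for illustration only.
SAMPLE = [
    Article("Library opens new reading room", "A. Reporter", date(1994, 3, 1),
            "The reading room offers microfilm readers."),
    Article("Storage crisis at archive", "B. Writer", date(1994, 5, 12),
            "Shelving for bound volumes is nearly full."),
]

by_headline = search(SAMPLE, "storage", field="headline")
by_text = search(SAMPLE, "microfilm")
```

Here `by_headline` holds only the second record and `by_text` only the first. Real text retrieval software would normally work from a prebuilt index rather than scanning every record, but the principle of restricting a search to one data element is the same.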
Where a library has the hard copy or microfilm of the newspapers concerned, then CD-ROM is a good medium for holding and providing access to index and abstract information, particularly for covering multiple years and titles where the hard copy equivalent would be split across many different volumes. Examples of such useful products include the American "Newspaper Abstracts", "British Newspaper Index", and "Palmer's Index to 'The Times', 1785-1905".
Both on-line and CD-ROM systems have been available for a number of years, and are mature and widely used technologies. We are now beginning to see other technical developments which will affect and, in time, improve access to newspapers.
Digitisation from original newspapers or from microfilm
There has recently been much interest in the potential of digitising library materials, that is, creating electronic images of the pages of a document through scanning either the original item or a microfilm copy of it. If optical character recognition is then applied to the image it can in theory generate ASCII text which can then be indexed automatically by text retrieval software. From the point of view of the traditional archive and of the user, the advantages of electronic storage and display of newspaper data in page image form, combined with text retrieval access, would be many: they would include the great reduction in physical space required for electronic storage compared to microfilm or hard copy; the ability of multiple users to access the same material at the same time and at high speed; remote access, which would overcome the limitations of centralised reference collections; and keyword subject access, which could become available to previously unindexed material and unlock the riches of the collection content. At present, as far as large scale newspaper collections are concerned, the above remains a dream rather than an achievable reality. Some projects have already been carried out to digitise newspaper pages, and the results of some of these have been made available via the Internet. An interesting example is the American "Valley of the Shadow" project, which has included digitisation of Virginia and Pennsylvania newspapers from the period of the American Civil War.
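The automatic indexing step mentioned above can be illustrated with a small inverted index, which maps each word to the pages on which it occurs. The following Python is a minimal sketch under stated assumptions: the page identifiers and page texts are invented stand-ins for character recognition output, the function names are hypothetical, and no particular text retrieval product is being described.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased word to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(page_id)
    return index


def lookup(index, query):
    """Return the sorted ids of pages containing every word in the query."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Invented page texts standing in for character recognition output.
PAGES = {
    "p1": "Troops gather in the valley as war begins",
    "p2": "Market prices and shipping notices",
    "p3": "News of the war reaches the market towns",
}

INDEX = build_index(PAGES)
```

With this sample data, `lookup(INDEX, "war")` returns `["p1", "p3"]` and `lookup(INDEX, "war market")` returns `["p3"]`. The sketch assumes clean text; recognition errors in older newspapers would leave misspelt words in the index, which is one reason the quality of character recognition matters so much for keyword access.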
The British Library's own experience of scanning from microfilm is that the digitisation of newspaper pages from microfilm is possible and achievable, but that at this stage it is slow, labour intensive and expensive. The throughput rate is crucially dependent on the quality of the microfilm and of the organisation of the material on the film. The hardware and software systems for the digitisation of library material and suitable for use with newspapers are still in an early stage of development. They can only improve, and improvement is certainly required in terms of speed, ease of set-up, optimisation software, and in the automatic cleaning and enhancement of images. Character recognition software has not yet shown itself to be capable of recognising text from older newspapers. Despite the advances in these systems in recent years, further progress is still needed in terms of power, image resolution, storage capabilities, character recognition, telecommunications and ease of use, and in their relative costs, to overcome existing limitations and to make the large scale conversion of newspaper collections to digital form an affordable and achievable reality.
Newspaper electronic archiving
Many of the limits of digitisation can be overcome when the material concerned originates in electronic form. Much interest is being shown within the British newspaper industry in electronic archiving systems able to store and make use of all of the information used in the electronic production of the newspapers. Instead of storing only the text of articles for use in text retrieval systems, such modern archive systems can also store photographs, graphic information, colour details and the page makeup control information which is used to create the individual newspaper page. Such systems are therefore able to use the stored text to allow the power of text searching but combine this with the ability to display photographs and graphics and even the image of the page on which the article appeared. Because the page is only recreated at the display / output stage there is much less of a storage overhead compared to storing a full bit-mapped page image. Since the resolution of the displayed or printed page is determined by the display or output device used, whether in hard copy, an on-line display, or from CD-ROM, it is possible to use such systems to 'republish' or reproduce the original material to the same quality as the original. What publicly accessible services will be based on such systems, and how, is not yet clear. CD-ROM versions of individual titles seem likely, possibly starting with popular tabloid titles for which text only versions have never been a meaningful option. Whether there will be centralised multi-title archive databases equivalent to FT Profile or NEXIS, or whether access will be on a distributed basis to individual newspaper or publishing group titles and systems, remains to be seen, as does whether the systems will allow free or only commercially charged access.
A growing number of newspaper related sources is becoming available via the Internet. Newspapers such as the "Daily Telegraph" in England and the "Irish Times" in Ireland have mounted material allowing access to articles from the current day's paper as well as, in some cases, articles from the back files of the last one or two years. Similar access via the Internet is available to material from a number of American newspapers. The publishers often take the opportunity to make available further information about their titles and groups and to present the data in ways which are more interesting visually than in traditional on-line systems. It seems certain that more newspaper publishers will explore the potential of the Internet to allow users to gain direct access to their material in the future, although how long such access will remain free remains to be seen.
The Internet also offers librarians very powerful capabilities to assist their own users and those remote from them in locating and accessing relevant material, and provides a mechanism for making widely available information about their own collections and services. The power of the World Wide Web to provide hypertext linkages for rapid access to and movement between related sources offers a challenge to librarians to provide the navigational tools to help users in this. In the field of newspaper information and access an excellent example of what can be achieved is the World Wide Web home page of the Library of Congress's Newspaper and Current Periodicals Room.
What then is the current position in relation to newspaper access and how is this likely to change in the foreseeable future? Those whose research needs can be met by access to current and recent years of major newspapers from developed countries, and the libraries providing services to such researchers, already have a choice of options available, from access to the newspapers themselves to microfilm surrogates, on-line full text or CD-ROM versions. Their choice will widen with the availability of more titles made available electronically, for example popular tabloid newspapers in electronic 'facsimile' form, and with growth in the amount of newspaper material available via the Internet. Their choice will be limited primarily by budgets and by the technology available to them.
For researchers requiring access to other material, to pre-1980 newspapers, to most local newspapers, and to newspapers from countries whose newspaper production systems are less technically advanced, the options are fewer and the pace of change will be less swift. It is my view that the large scale, systematic conversion of hard copy or microfilm newspaper collections into digital form with automatically generated full text indexing is still some way from being achievable technically and economically, even though many of the technical building blocks do now exist and further improvements are certain. For this material, access to hard-copy originals, or, more frequently, to microfilm sets of newspapers will remain the commonly used approach. Individually and co-operatively, libraries must continue to collect newspapers on a systematic basis and must ensure the preservation of the material in their collections for the future through programmes of archival microfilming. Even if the benefits of electronic access to the primary content of newspapers are not yet achievable on a large scale for this material, there is still valuable work to be done in improving secondary access tools such as catalogues, indexes and user guides and in making use of new technologies such as CD-ROM and Internet access for the dissemination of these.
From the point of view of the library with newspaper collections and of the research user of newspaper material, change is in progress and will continue. Our challenge is to understand the process and nature of change, to seek to influence and manage it, and to embrace change where it offers real and affordable benefits to us and our users. At the same time we must keep sight of the strengths and value of our traditional approaches and should continue with them where they are still the most appropriate and effective options.