University College London (UCL-CS) has been involved with the CORE project since 1988, relying heavily on the work of Bellcore and using the ACS data. Our CODA Project work, supported by the British Library Research and Development Department, provides facilities similar to those of the CORE project, but has concentrated on additional areas covering the use of ODA as a distribution medium and the use of relatively low-bandwidth networks such as the ISDN. This paper discusses the way the database is set up (which involves conversion from an SGML representation into an ODA one), the methods of indexing, the access methods provided, and our user experience. We also discuss the motivation for many of our implementation choices. Because of ACS constraints, the CODA data cannot be made available outside the University of London.
When the CODA project started in 1991, UCL-CS was involved with ESPRIT PODA projects in the use of ODA, while the CORE project was using no standard language for the representation of the text; hence the use of ODA was a natural choice for the CODA project. Later the ACS textual material became available in SGML form; even so, there are significant advantages in the use of ODA, which are discussed in the paper. For example, ODA is a blind open interchange format for which a number of converters are available - unlike SGML, in which the interchange is dependent on the DTD. Our decision to use the ODA formulation required an SGML-to-ODA converter.
Having an on-line database of scientific journals offers many advantages over conventional paper-based journals. Searching texts electronically is much easier than searching manually, and far more productive searches can be undertaken using a computer system. In our environment all the journals are indexed so that, despite the size of the database, searches are very fast. Electronic access provides additional advantages:
At the start of the CODA project, the most sensible device for storing such a large database was an Optical Juke Box (JB); hence a 90 GB HP magneto-optical JB was acquired, to which a high-speed storage server with some 18 GB of disc space is attached as a front end. A reverse index of all the document text is held in the disc storage. For the whole ten years of data, this index occupies about 4 GB. All document text searching is done from the disc storage; the retrieval of the documents themselves is from the JB, which holds the documents in all forms. To assuage the worries of publishers, we have added various forms of integrity control, authentication and audit trails.
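
The division of labour described above, with searching done against a reverse index on fast disc and document retrieval done from the slower JB, can be illustrated with a minimal sketch. It is not the CODA implementation: the names `InvertedIndex`, `archive` and `retrieve` are hypothetical, the tokenizer is deliberately crude, and a Python dictionary stands in for the jukebox.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case alphanumeric terms (a crude assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Reverse index: maps each term to the set of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents containing every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


# Hypothetical stand-in for the jukebox: slow bulk storage of full documents.
archive = {
    "doc1": "Optical storage of chemical journals",
    "doc2": "A reverse index speeds text searching",
    "doc3": "Searching journals held on optical storage",
}

# The index is built once and kept on the fast front-end store.
index = InvertedIndex()
for doc_id, text in archive.items():
    index.add(doc_id, text)


def retrieve(doc_id):
    """Fetch a full document from the slow store, only after a search hit."""
    return archive[doc_id]


# Searching touches only the index; the archive is read only for the hits.
hits = sorted(index.search("optical storage"))
documents = [retrieve(doc_id) for doc_id in hits]
```

The point of the design is that a query never touches the slow store: only the documents actually selected by the user are fetched from it.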