Last update at http://inet.nttam.com : Thu May 11 14:47:11 1995

Experiences with On-line access to Chemical Journals: Presentation to INET '95 Conference

On-line access to Chem. J. Proc. INET 95 P.T.Kirstein

 

Experiences with On-line access to Chemical Journals

 
Peter Kirstein <kirstein@cs.ucl.ac.uk>

Goli Montasser-Kohsari <gmontass@cs.ucl.ac.uk>

Abstract

1. Overview of the Project

The American Chemical Society (ACS), Bellcore, Chemical Abstracts Service (CAS), Cornell University and OCLC collaborated in the CORE project [1] to deliver electronic information from primary publications to end-user chemists. As part of this experiment Bellcore scanned most of the pages of ACS journals published between 1989 and 1994 - some several times. They processed the typesetting tapes from the same journal issues into a Standard Generalised Mark-up Language (SGML) [2] format so that the text could be indexed and/or typeset. By the end of the project, they were providing electronic access to a large database containing approximately 60,000 ACS journal articles, representing 400,000 pages, for the period 1989-94. The data is held at the Cornell University Mann Library for access by Cornell chemists.

In the Computer Science department of University College London (UCL-CS), we have been involved with members of the CORE project since 1988. This activity relies heavily on the work of Bellcore, and uses the data provided by the ACS. It is supported by the British Library Research and Development Department (BLRDD). While we have provided facilities similar to the CORE project, we have also been interested in applications for the data on which the CORE project has not focused, or which it has not been in a position to support. The UCL activity is referred to as the C-ODA project, and also covers areas such as applications of ISO standards, and usage of relatively low-bandwidth networks such as the ISDN. The paper discusses the way the database is set up - which involves conversion from an SGML representation into the Open Document Architecture (ODA) [3], the methods of indexing, the access methods provided, and our user experience. We also discuss the motivation for many of our implementation choices, and the lessons to be learned from our experiences.

The ACS has consented to allow the data to be used for these projects, with certain restrictions on distribution - mainly that the data will not be available outside Cornell U for the CORE project, and outside the University of London for C-ODA. There are differences between the CORE and C-ODA projects [4] but these are not discussed in this paper.

This project started in 1991, when UCL-CS was heavily involved with ESPRIT PODA projects (e.g. [5]) in the use of ODA. At that time, the CORE project was using no standard language for the representation of the text, so that ODA was a natural choice for the C-ODA project. Later the ACS textual material became available in SGML form. The relative advantages of the two forms are discussed in Section 5.

2. The Publishing and Access Chain



2.1 The Chain Itself

While work with the ACS databases as processed by Bellcore was the main activity in the project, we obtained a good insight into how the publishing chain should proceed for this type of activity. The fact that it did not always do so only made our task harder. The conventional publishing and access chain for journal articles in science and engineering is as follows:

a)  Articles are submitted in a number of forms by the authors.

The format chosen by authors seems to be predominantly TeX or LaTeX and PostScript, but this is not always the case. Many authors like to use their favourite word processing system, and would prefer Word or WordPerfect.

b)  The article is registered by the publisher.

The article is now in the publisher's system. A lengthy pre-printing chain is then initiated. This should involve minimal reprocessing by the publishers, since it is not yet clear that the article will be published.

c)  The article is submitted to reviewers.

d)  The reviews are returned to the publishers.  Parts may be
    passed on to the authors also for subsequent action in revising
    the article.

e)  The author provides a revised manuscript, in the same form as in
    (a).

Once the cycle of (e), (c) and (d) has been completed, the article is ready for printing.

f)  The articles can be translated into a proprietary mark-up language
    (with a specific  DTD) for typesetting, and printing.

Up to the end of 1994, the ACS produced its journals by sticking the diagrams onto the masters before printing. This meant that the typesetting tapes did not include the diagrams; they did include equations and tables. For a full electronic form, the figures must also be provided electronically.

In most cases, the article is then returned to the author for proof-reading. This may well involve hand-written revisions, which must be included by the publisher. This cycle is avoided if the submission is in camera-ready form. The article is now ready for combining with other articles.

g)  The article is combined with other articles, and an issue is
    prepared.  This includes provision of a Table of Contents.

h)  For normal publishing, the article is then printed.  For an
    electronic publishing chain, instead of being printed, the data is
    converted into a form which is suitable as a distribution format,
    and then sent to the 'electronic library' organisations.

 
i)  The issue, and its component articles, are registered in secondary
    publications.  Some of these publications include only author,
    article name and reference;  others also include abstract and
    keywords.

The form suitable for electronic distribution may be quite different from that desired by the publishers for the article preparation phase of (a) - (f). This is discussed further in Section 5. Once the article has been published, it is necessary to enter the reader access phase.

j)  The researcher searches for a specific article, or for an
    article about the subject.  The search may be by looking at the
    Table of Contents of the journal itself, or by querying  one of the
    secondary publications of (i).  It may even be possible to search
    electronically the journal article itself.

k)  The search suggests that a particular article, or part of an
    article, be consulted.  This article may then be requested
    (electronically or by some other means).

l)  The article is delivered to the researcher - again either in
    hard-copy form or electronically.

After perusing the article, the researcher is either satisfied, or continues the cycle of (j) - (l).

2.2 The Different Parties' Interests

An organisation like the ACS is particularly concerned with the automation of the process in the preparation of the article; steps (b) - (g). The end-user is most concerned with the steps (j) - (l). The actual publishing itself is step (h); clearly all aspects of this concern the publisher. For publishing via paper or CD-ROM, it is then necessary to continue the whole distribution chain. For some publishers, e.g. the ACS, the secondary publications of (i) are also produced; in the case of the ACS this is Chemical Abstracts. For electronic publishing, it may be that the electronic document store is actually provided by another Value Added Service, though many publishers would also like to provide this service.

In the CORE and C-ODA projects, there was no question of steps (a) - (e); the articles had already been published. This meant that steps (g) - (i) had been completed in some form. However, the form of the original publication had been on paper. It was thus necessary to completely redo step (f), to provide a new form of the complete issues in electronic form. The Table of Contents of (g) was part of the new step (f). For some versions of the activity, OCLC or Chemical Abstracts provided the secondary publications (i); for others, we did searching on the original articles, so that a formal secondary publication was not required.

In the C-ODA and CORE projects, the main role of the ACS was to provide the typesetting tapes and the journals themselves - and to permit them to be used for the Pilots. Normally the provision of the pre-printed form of the data (step f) would be done by the publishers; in these projects this was done mainly by Bellcore from the combination of scanning the original journals, extracting the material not on the typesetting tapes, and combining it with the typesetting information. The document stores themselves were held at the user premises - Cornell U and UCL. This in itself may be different in future such projects, if the publishers provide the Value Added Services. Both OCLC and Chemical Abstracts (a division of the ACS) provided some of the secondary services; the Chemical Abstracts data was provided, and the OCLC Newton search engine was used in the CORE project by Cornell U - though not by UCL.

3. The Different Data Sets



3.1 The Source Data

The ACS has been preserving the typesetting tapes of all their journals for the past 10 years, and for up to 17 years for some particular journals. The typesetting tapes contain all the textual information of the journals, including highlighting, equations, and tables, and also a large amount of contextual information. This contextual information includes what we may describe as Document Management Attributes (DMAs), and also some of the structural information of the articles. The historical typesetting tapes do not contain, however, any of the graphical images, nor any layout or presentation information. Bellcore derived the graphic images by scanning the microfilm copies of the published journals and using custom OCR techniques to identify page components such as figures, tables and schemas (captionless figures), since no other record of the images is available [6]. Moreover the formats of even the text material have changed in the last 15 years; for this reason, while some of the text data does indeed go back to 1980, that before 1989 is incomplete - only 3500 of the articles are prior to that date.

The initial format of the typesetting tapes was not SGML, but a proprietary scheme encoded in an IBM database format. This was converted into SGML by Bellcore as part of the CORE project. They passed the SGML versions of the documents on to us (along with the scanned image components), with the permission of the ACS. The SGML uses a special DTD employed only for this data, but based on the Association of American Publishers (AAP) DTD. We gratefully acknowledge this assistance from Bellcore and the ACS. The size of the full SGML database for all the journals of the ACS from 1988 - 1994 is about 6 GB.

In practice, the tables and equations were not translated from the typesetting tapes. Instead the graphics, tables and equations were derived from the scanned page images in bit-mapped form. When this process had been completed, there were two data sets - the one representing the text in SGML, and the one representing figures, tables and equations. The graphics extraction was quite error-prone; a 95% success rate at finding figures was considered reasonably good. There were many problems in extracting the graphics, and the processing had to be done several times. We have received mainly the 1989-94 data; the size of the database for the extracted graphics for this period is 2 GB.

The CORE project did not work only with the text/image form of the data; they were also interested in providing the full image data to their users. The size of that database is critically dependent on the resolution of the scanning; at the 300 dpi eventually adopted, the size of this database for the period 1991-94 is 55 GB.

The ACS Chemical Abstracts data (CAS) was provided to the CORE project, together with the Newton Search Engine from OCLC. OCLC also indexed the textual data for use with its Newton Search Engine, and provided the indexed database to Cornell U.

3.2 Databases for Electronic Access

There are many forms of database which could be provided for this type of data. Many organisations provide Abstract Services - often searchable electronically; when these are used the actual journal articles can be requested (often electronically) and delivered in paper form or even by facsimile. Some organisations provide facilities for full text search. The ACS has been providing this for some of its journals; the journal articles are then still delivered in paper or facsimile form. Both the CORE and C-ODA projects wished to provide electronically not only the facilities for search, but also the documents themselves - in text and image form. For this reason, it was necessary to provide three forms of database:

 
I.   A database of the text data - in a form suitable for electronic
     search;
II.  The text portion of the journal articles - in a form suitable for
     user access;
III. Any part of the database representing the original journal
     articles not contained in (II) - but linked to it, and in a form
     for user access.

The format of the typesetting tapes need not be the same as (I) or even (II).

3.3 The Derived Databases

UCL has produced several databases from the source data. We decided to provide full text search using the text searching package WAIS [7]. WAIS is the Wide Area Information Server tool developed and placed in the public domain by Thinking Machines Corp, and now being developed further as a commercial product by WAIS Inc. WAIS provides tools for full-text indexing of different types of data, and for allowing that index to be queried by a remote machine. It is a classic client-server system with a back-end (the WAIS server) which searches an index based upon queries provided by a front end (WAISQ - WAIS Question). The WAIS server can provide both lists of documents with their 'scores' according to some query, and whole documents when a user selects a document from a list. Xwaisq is an X-based question program which is provided with the Public WAIS distribution. From the SGML form of the data, we have derived a field-indexed version of the text data which can be searched very fast. The 3 GB of SGML data becomes 5 GB in the fully indexed WAIS database.
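The idea of a full-text index that returns scored document lists can be illustrated with a minimal sketch. It is not the WAIS indexing or ranking algorithm, which is not described here; the scoring rule (summed term counts), the function names and the three-document collection are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: word -> {document id: occurrence count}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][doc_id] = count
    return index


def search(index, query):
    """Return (document id, score) pairs, best first.

    The score is simply the summed count of the query words in the
    document; this is an assumed stand-in for a real relevance measure.
    """
    scores = Counter()
    for word in tokenize(query):
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical miniature collection standing in for the journal articles.
DOCS = {
    "a1": "Refinement of petroleum fractions by catalytic cracking",
    "a2": "Crystal structure refinement of a copper complex",
    "a3": "Petroleum petroleum analysis",
}
INDEX = build_index(DOCS)
```

A client would first ask for the scored list, for example search(INDEX, "petroleum refinement"), and then fetch the full text of whichever document the user selects from that list.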

The text portion of the journal articles themselves was provided in several forms - many discussed in [4]. Our main user activity was with the whole database in Open Document Architecture (ODA) [3] form. The comparisons between ODA and SGML are given in [4] and Section 5; of some importance is that ODA has a unitary data content which includes text and image, so that the image portion with extracted graphics could be encompassed in the same database.

WAIS has the capability to access documents via the WWW using WAISGATE [8]. To exercise this feature, we derived an HTML database of the text with a simple SGML-HTML converter. The HTML database was provided only for the 1993 data.

The extracted image data tapes mentioned in Section 3.1, provided by Bellcore, were used for the tables, equations and figures. In fact the tables and equations were available on the original ACS typesetting tapes, but none of the parties had the resources to convert the equations and tables into SGML - though this is what should really have been done.

Finally we also have the image data for 1988-94. However, we have made available to our users only the articles from 1991-94 in the full image form. Partly this is because we are not particularly interested in pursuing this approach - because we also want to provide remote access (via lower speed networks), and do not like the amount of storage this form of presentation requires. Accessing the image data requires a different form of reader [9],[10].

4. Activities in the C-ODA Project

The C-ODA project had two main strands - replicating the work undertaken by Bellcore and its partners in the USA, and also extending the work into using more efficient storage, lower data transmission, and more generalised document searching tools. The starting point for all is the work of Michael Lesk at Bellcore, who built a number of tools to convert the original ACS data from their typesetting form into the SGML format [2]. This data is augmented with scanned images of the journals and diagrams to form a rich enough base of information to build the database upon - again an activity undertaken by Lesk [6].

We have set up the document database of Section 3 so that it can be queried in a convenient manner, and so that users can browse their results on-screen using a number of different tools. We have provided facilities for a number of end-user chemists to access the database at various locations within the University of London (UL) - both at UCL with Local Area Network (LAN) access, and via the University of London Wide Area Network (WAN) facilities, which include the Internet and the ISDN. A portion of the data was provided originally in the same form as in the CORE project; now the database is supplemented by transforming the whole of the data which we have into the ODA/ODIF format (9 GB), and making it available to the UL chemists in that form. At present we are using a large set of the 1988-1994 collection of ACS journals, and we provide a number of interfaces to access that data, including several using WAIS [7], a tool developed at Bellcore called Xpixlook [9], and SuperBook [10], a Hypertext Browser also from Bellcore. We are also evaluating how SuperBook can be extended to give intelligent Hypertext guidance to users [11].

At the time we started the C-ODA project, due to the size of the dataset, the most sensible device for storing the documents was an Optical Juke Box (JB) - hence we acquired a 90 GB HP magneto-optical JB for this purpose. With the cost of magnetic storage falling more rapidly than that of magneto-optical storage, this may no longer be the case. A high-speed storage server, with some 18 GB of disc space, is attached to the JB as a front end. We have developed a JB interface library which virtualises the JB as a single large storage device, so that the application programs do not need to track the locations of files among the discs in the JB. A reverse index of all the document text is held in the disc storage. For the whole of ten years of data this will contain about 6.1 GB. All searching of document contents is done from the disc storage; the retrieval of the documents themselves is from the JB, which holds the documents in all forms.
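The virtualisation described above can be sketched as follows. This is not the UCL interface library itself: the class name, the catalogue layout, the least-recently-used staging policy and the file names are all assumptions, chosen only to show how an application can read by logical path while the library hides the platter and stages data on magnetic disc.

```python
class VirtualJukebox:
    """Presents many platters as one store, with a small magnetic-disc cache.

    Hypothetical sketch: real platter handling is replaced by an in-memory
    catalogue, and platter_loads merely counts the slow accesses.
    """

    def __init__(self, cache_slots=2):
        self.catalogue = {}      # logical path -> (platter number, data)
        self.cache = {}          # logical path -> data staged on magnetic disc
        self.cache_slots = cache_slots
        self.platter_loads = 0   # number of slow platter accesses so far

    def store(self, path, platter, data):
        """Record which platter holds the file; callers never see this again."""
        self.catalogue[path] = (platter, data)

    def read(self, path):
        """Return the file contents, staging them on disc on first access."""
        if path in self.cache:
            # Re-insert so that this entry becomes the most recently used.
            data = self.cache.pop(path)
            self.cache[path] = data
            return data
        _platter, data = self.catalogue[path]
        self.platter_loads += 1
        if len(self.cache) >= self.cache_slots:
            # Evict the least recently used entry (first key in the dict).
            del self.cache[next(iter(self.cache))]
        self.cache[path] = data
        return data
```

An application would call read("1993/ja001.oda") (a made-up path) without knowing the platter number; repeated reads of the same file are then served from the staged copy rather than from the JB.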

The work we have undertaken in this project is as follows:

5. Document Distribution Formats



5.1 Document Distribution Needs

A distribution format should have the following properties as a minimum:

Electronic Journal (EJ) Delivery involves a publisher generating documents and distributing the electronic form to organisations which will pass these on to the users. For the sake of argument we will call these organisations 'electronic libraries', even though they may not be what are currently recognised as libraries. The reader of these documents will require them in one of two ways. Either they will be receiving a new issue of the EJ, in which case they will wish to inspect the table of contents, browse the articles, and/or read a number of articles in-depth. Alternatively, they will wish to search against a collection of journals, using some kind of query mechanism, and then browse or read the articles that were found. However, it is also possible that a reader may wish to browse old journals, or search in a new issue, and the user should be able to do both.

When viewing the EJ, the reader will expect that the articles be clear and contain formatting suitable for supporting the document structure. Moreover, not all readers and screens are equal, and so some method of changing the size of the documents would be advantageous.

Having an on-line database of scientific journals offers many advantages over the conventional paper-based journals; many of these advantages fall into the areas of search and access. Electronic searching of texts for information is much easier than manual searching; far more productive searching can be undertaken using a computer system. In our environment all the journals are indexed so that, despite the size of the database, searches are very fast. Electronic access provides additional advantages:

5.2 The Use of SGML and ODA

SGML is a system of specifying intellectual mark-up for documents. The point of intellectual mark-up is that one denotes what an element represents, rather than what it looks like. The mark-up should describe a document's structure and other attributes rather than specify processing that is to be performed on it, as descriptive mark-up need be done only once and will suffice for all future processing. For example, one would mark the title of an article with the tag <title>, rather than say 'Centred, Bold, 16pt Times Roman'. The description of <title> is then contained in a "Document Type Definition" or "DTD".
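The benefit of descriptive mark-up, that the text is marked once and the presentation is chosen later, can be shown with a small sketch. It is not an SGML parser and uses no DTD: it handles only simple, non-nested elements, and the tag names other than <title>, the two style tables and the sample article are assumptions for illustration.

```python
import re

# Hypothetical style sheets: the same descriptive tags, two presentations.
PRINT_STYLE = {
    "title": "Centred, Bold, 16pt Times Roman",
    "para": "Justified, 10pt Times Roman",
}
SCREEN_STYLE = {
    "title": "Left-aligned, Bold, 24pt Helvetica",
    "para": "Ragged right, 14pt Helvetica",
}

# Hypothetical article fragment marked up by what each element is.
ARTICLE = "<title>On-line Journals</title><para>Text of the article.</para>"


def elements(marked_up):
    """Return (tag, content) pairs for simple, non-nested <tag>...</tag> elements."""
    return re.findall(r"<(\w+)>(.*?)</\1>", marked_up, flags=re.S)


def render(marked_up, style):
    """Pair each element's content with the presentation the style assigns to its tag."""
    return [(style[tag], content.strip()) for tag, content in elements(marked_up)]
```

Calling render(ARTICLE, PRINT_STYLE) and render(ARTICLE, SCREEN_STYLE) gives two presentations of the same marked-up text, without the mark-up being touched.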

ODA supports this functionality using a mechanism called a 'Document Class' but also allows presentation information to be bound to the document elements. ODA has been designed primarily as an interchange format for documents. ODA is supported by commercial word-processor manufacturers, and converters are available between ODA and commercial word-processor formats.

SGML uses an ASCII-based representation which has certain in-built limitations and advantages. It is not possible to embed arbitrary binary data within an SGML document, since elements are terminated by a special character sequence - and clearly that sequence is possible in arbitrary binary data. It is possible to circumvent this using escape sequences, but there is no defined way to do this within the ISO SGML standard. The accepted method is to refer to external entities for such items. ODA uses a binary representation expressed as ASN.1 streams; as such it is not subject to such restrictions.

One of the reasons SGML is widely used is that it is easy to generate and transmit the ASCII representation. On receiving an SGML file, it is possible to scrutinise it effectively using just a standard text editor. Another is that it is so flexible that it is possible to express the styles of the publisher by providing a specific DTD or set of DTDs.

ODA and SGML are suitable formats for document distribution for the following reasons:

Our general impression is that SGML is an excellent authoring format, due to its more sophisticated data-modelling potential, and that ODA is an excellent distribution format. The concept of authoring in SGML and distribution in ODA brings together the best of both worlds. It is comparatively straightforward to move between the two forms; the really important step is to have the distribution mechanism using one such format, rather than something like PostScript. While PostScript is suitable for printing, it is impossible to transform back into a revisable form.

6. Storing Data





Goli Montasser-Kohsari
Tue May 9 15:37:08 BST 1995
6.1 The Use of an Optical JukeBox

We have installed a large document store, consisting of a Hewlett Packard optical JB with 4 Sony drives, a Sun SparcStation (Sparc-5 with 96 MB of primary store) as a dedicated server, and 18 GB of magnetic storage. The main storage consists of 144 magneto-optical platters each with 600 MB of data; this allows some 90 GB of rewritable storage. Access to arbitrary data is slow - 15 seconds. However it is possible to stage the data into the disc storage.

At the time of writing the paper UCL had received, but not yet put into operation, integrated software for storing the data in the JB; some recent JB software allows an application running on a workstation to access transparently any disk in an optical JB via standard Unix functions. It treats the whole JB as an integrated disc store - while still giving us some control on what to cache in the magnetic store. We are still investigating the advantages of that type of software.

We store all the text data on magnetic storage. This allows content searching to be done relatively fast. The available textual data requires approximately 6 GB of storage. In practice we are just moving this function to a 4-processor Sparcserver 1000 - which is not the JB controller. This will allow many researchers to be supported simultaneously.

It is an important aspect of the C-ODA project that the JB uses magneto-optical rewriteable storage. The CORE project used Write Once Read Many (WORM) storage; as a result, CORE was very concerned about getting the data right before it was put onto the JB. Since we have found that many passes through the whole data are required in practice, this has made all their data manipulation a very long-winded process; CORE has usually worked for a longer time with smaller databases on disc store, and been very hesitant to commit to using the JB.

6.2 Database Sizes and Access Times

We now have considerable experience of the size of the data, and of the access times [4]. We have the text components of the database for most of 1988-1994, and the bitmap form for much of 1989-1994. The database typically contains 15-16K articles per annum for recent years; over the same period we typically also have 8K articles in bit-map form.

The full data for 1989-94, including the SGML and the extracted images, requires some 50 GB of storage; this we have loaded onto the JB. From the above it is clear that the actual data management of these large collections, when they pass through so many stages of processing, is difficult.

We treat each year as a separate database, and the search for any particular word combination is done on each database. Thus, for example, searching for any single word (e.g. Robb), would take less than a second on each database; in one such search, 847 documents were found. It is also possible to do a field search on the same data; if the same database was searched in a field sense (e.g. author = Robb), then the search time was little changed, but the number of documents retrieved was more manageable and precise - only 23 documents. Finally, the current version of C-WAIS has limited facilities for parallel work in a multiprocessor system; it can operate on several databases in parallel - one to each processor, but not with more than one processor per database. Thus our multiprocessor WAIS server, which has four processors, can operate significantly faster than a single processor one.
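The arrangement described above, one database per year, searched either by any word or by a named field, with at most one worker per database, might be sketched as below. This is not C-WAIS: the record layout, the sample records and the function names are assumptions, and Python threads stand in for the separate processors only to show the division of work.

```python
from concurrent.futures import ThreadPoolExecutor
import re

# Hypothetical per-year databases; each record has an author field and body text.
DATABASES = {
    1992: [
        {"id": "92-1", "author": "Robb", "text": "An ab initio study."},
        {"id": "92-2", "author": "Smith", "text": "We thank M. Robb for advice."},
    ],
    1993: [
        {"id": "93-1", "author": "Jones", "text": "Following Robb et al., we ..."},
    ],
}


def words(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_year(records, word, field=None):
    """Search one year's database, in every field or in one named field only."""
    word = word.lower()
    hits = []
    for record in records:
        if field is None:
            haystack = " ".join(v for k, v in record.items() if k != "id")
        else:
            haystack = record.get(field, "")
        if word in words(haystack):
            hits.append(record["id"])
    return hits


def search_all(word, field=None, workers=4):
    """Search every year's database, one task per database."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            year: pool.submit(search_year, records, word, field)
            for year, records in DATABASES.items()
        }
        return {year: futures[year].result() for year in sorted(futures)}
```

In this toy collection search_all("Robb") matches every record, whereas search_all("Robb", field="author") matches only the article written by Robb, which mirrors the difference between the 847 and the 23 documents reported above.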

7. User Interfaces

Having an on-line database of scientific journals offers many advantages over the conventional paper-based journals; and many of these advantages fall into the areas of search and access. Much of the UCL-CS interest in the project is in providing different means of search and access, and gauging the comparative value of the different methods. Electronic searching of texts for information is much easier and more productive than manual. We support full-text retrieval - every single word in the document is indexed so that the searches go beyond any keywords that the author/classifier has deemed appropriate. Again, search responses are virtually instantaneous - with the limited number of users we currently support.

Electronic access provides additional advantages. Access is non-exclusive - any number of people can access the same journal simultaneously. Access is distributed - it is not necessary to be in close proximity to the database in order to access its information. Access can be integrated with the users' facilities, allowing extraction of information for other purposes.

Most search requests are based upon some type of word-based search, the system looking for occurrences of the words in its document base. Searches may be restricted to certain kinds of data in the documents such as titles, author names, or abstracts - or may be applied to the whole of the text in the document. One of the interfaces (WAIS) will support relevance feedback - this mechanism allows the user to mark one or more documents in the database as being relevant to the query and the search algorithms will favour similar/related documents in subsequent searches. Algebraic text searching allows greater control over the text queries if more than one word is to be searched for in the document database; it allows the user to specify rules about how those documents are to be searched. Say a search is looking for the words "petroleum" and "refinement". The number of documents containing both words could be quite high, although there is no guarantee that a document containing both words is about the refinement of petroleum - the occurrences could have been on separate pages. However, if the search were to look for "petroleum" and "refinement" in the same paragraph, then one would expect a higher "hit-rate" of appropriate documents. Some of the interfaces will allow some degree of algebraic searching.
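The difference between the two queries in the petroleum example can be made concrete with a short sketch. It assumes that paragraphs are separated by blank lines, and the function names and sample texts are hypothetical; none of the interfaces mentioned here is claimed to work in this way.

```python
import re


def words(text):
    """Return the set of lower-case alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def contains_all(document, terms):
    """True if every term occurs somewhere in the document."""
    return {term.lower() for term in terms} <= words(document)


def same_paragraph(document, terms):
    """True if at least one paragraph contains every term.

    Paragraphs are assumed to be separated by blank lines.
    """
    wanted = {term.lower() for term in terms}
    paragraphs = re.split(r"\n\s*\n", document)
    return any(wanted <= words(paragraph) for paragraph in paragraphs)


# Hypothetical documents: both contain both words, only one in one paragraph.
SCATTERED = "Petroleum prices rose sharply.\n\nRefinement of the crystal structure followed."
FOCUSED = "The refinement of petroleum is costly.\n\nOther matters are discussed later."
```

Both sample documents satisfy contains_all for the two terms, but only FOCUSED satisfies same_paragraph, so the stricter rule discards the document in which the words are unrelated.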

Browsing is another type of searching - just looking through documents for contents of interest - much as one would skim a book. For browsing to be effective, it is essential that page update be quick.

At UCL-CS we are particularly interested in widening the scope of the project to include remote access to the document database; this involves relatively low-bandwidth communications - for example, Basic-Rate ISDN lines operating at 64 Kbps. At this speed a typical page in bitmap form, occupying 100 KB, takes at least 12 seconds to deliver. However delivery of the document form is nearer 1 second per page, or perhaps three or four seconds if images are also transmitted. Our technology provides access to the database outside the high-bandwidth LAN at UCL - though the ACS constraints do not allow us to offer such a service outside the University of London. We enforce our constraints by the use of security techniques (cf Section 9). We expect to introduce at a later stage other document stores, which have fewer constraints on their usage than the current ACS ones.
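The delivery times quoted above follow from simple arithmetic, sketched here under the assumptions that 1 KB is 1000 bytes, that 64 Kbps is 64,000 bits per second, and that protocol overhead is ignored (which is why the text says "at least"). The names are illustrative only.

```python
ISDN_B_CHANNEL_BPS = 64_000   # one Basic-Rate ISDN B channel, bits per second
BITMAP_PAGE_BYTES = 100_000   # typical bitmap page quoted in the text (100 KB)


def transfer_seconds(size_bytes, line_bits_per_second):
    """Minimum time to send size_bytes over the line, ignoring overhead."""
    return size_bytes * 8 / line_bits_per_second


def bytes_deliverable(seconds, line_bits_per_second):
    """Largest payload the line can carry in the given time, ignoring overhead."""
    return seconds * line_bits_per_second / 8
```

With these figures transfer_seconds(BITMAP_PAGE_BYTES, ISDN_B_CHANNEL_BPS) gives 12.5 seconds, consistent with the figure above, and bytes_deliverable(1, ISDN_B_CHANNEL_BPS) gives 8000 bytes, so a page delivered in about one second can occupy no more than roughly 8 KB in its document form.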

A fuller discussion of the various User interfaces available in the C-ODA and CORE projects is given in [4], [9].

8. User Experiences

The following highlight the immediate concerns of the users.

We are now planning to make access available via the WWW; this will certainly ease the problems of chemists getting started; there are far better facilities available for WWW support than other means.

9. Database Security Features

Restricting document access is important in two ways. First, publishers are going to make document access available only if it can be constrained and charged. C-WAIS can restrict access to workstations with specific IP addresses; this already allows access to be restricted to workstations in the University of London if this is desired. Second, we have added a public key system to the data access mechanism; this allows non-repudiable access to be achieved - which is important for later billing. Another use of such techniques is that it is straightforward to add a digital signature to each article, which indicates the source of the document (e.g. the UCL-CS datastore), and the copyright holder (e.g. ACS); this could even be augmented by the identity of the person who accessed the store. Such additions could be used in several ways:
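Address-based restriction of the first kind can be sketched with Python's standard ipaddress module. This is not how C-WAIS implements the check, which is not described here; the permitted networks below are reserved documentation ranges chosen as placeholders, not real University of London addresses.

```python
import ipaddress

# Hypothetical permitted networks (reserved documentation address ranges).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def access_permitted(client_address, networks=ALLOWED_NETWORKS):
    """True if the client's IP address lies inside any permitted network."""
    address = ipaddress.ip_address(client_address)
    return any(address in network for network in networks)
```

A server would call access_permitted with the address of the connecting workstation before answering a query. Such a check constrains where requests come from but does not identify the person making them, which is why the public key mechanism is needed for billing.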

The implementation of this technology has been described in [13]; it depends on the OSISEC [14] security package developed at UCL which implements the services described within the X.509 Authentication Framework, viz.: Data Confidentiality, Data Integrity, Origin Authenticity, and Non-Repudiation of Data Origin.
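The first restriction mentioned in this section, limiting access to workstations with specific IP numbers, can be illustrated with a short sketch. This is not the C-WAIS implementation, whose details are not given here; it is a minimal example using Python's standard `ipaddress` module. The allow-list uses reserved documentation address ranges as stand-ins, not real University of London networks, and the names `ALLOWED_NETWORKS` and `is_allowed` are hypothetical.

```python
import ipaddress

# Hypothetical allow-list. These are reserved documentation ranges,
# used here only as placeholders for the permitted campus networks.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True only if client_ip parses and lies in an allowed network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed addresses are refused rather than raising an error.
        return False
    return any(address in network for network in ALLOWED_NETWORKS)


for candidate in ("192.0.2.17", "198.51.100.200", "not-an-ip"):
    print(candidate, "->", "accept" if is_allowed(candidate) else "refuse")
```

A check of this kind only constrains where a request comes from. It does not identify the user and gives no non-repudiation, which is why the public key mechanism is needed in addition.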



10. Conclusions

Database Construction

User Access

Document Formats

User Interest and Facilities

Acknowledgements

We acknowledge the help given to the project by a number of people. David Gold did much of the work described here while he was leading the project; Mike Lesk (Bellcore) has been a major driving force behind both the CORE and C-ODA projects; Lorrin Garson (ACS) has kindly allowed us to use the ACS data and discussed their new production process; Chemistry users have been important in the trials; Peter Williams (Sterling Software and UCL) and Sammy Sameshima (now Hitachi Software) have been instrumental in the security work.

References

[1] M. Lesk, The CORE Electronic Chemistry Library, Proc. ACM SIG Information Retrieval Conference, Chicago, 1991.

[2] ISO, Information processing -- Text and office systems -- Standard Generalised Mark-up Language (SGML), ISO, IS 8879, 1986.

[3] ISO, Office Document Architecture (ODA) and Interchange Format, ISO, IS 8613, 1988.

[4] P. Kirstein and A. Montasser-Kohsari, The C-ODA project - experience and tools, to be published in Comp. J.

[5] S. Golkar, P. Kirstein and A. Montasser-Kohsari, ODA activities at University College London, Comp. Netw. and ISDN Syst., 21, 187-196, 1991.

[6] M. Lesk, Images in document retrieval: extraction of figures from pages, Proc. Anglo-French-US Conf. Image Storage in Libraries and Museums, York, June 25-26, 1990.

[7] B. Kahle, Wide Area Information Server Concepts, Tech. Rep., TM Limited, 1989.

[8] --, WAIS Server and WAIS Workstation for UNIX, Administrators Manual Release 2.0, WAIS Inc., Menlo Park, CA, USA.

[9] M. Lesk, Electronic Chemical Journals, 66, 14, 747A-755A, 1994.

[10] J. Remde, L. Gomez, and T. Landauer, SuperBook: an automatic tool for information exploration - Hypertext?, Proc. Hypertext '87, Chapel Hill, N.C., pp. 175-188, 1987.

[11] M. Hu, An Intelligent Hypertext System, Ph.D. thesis, University College London, UK, 1994.

[12] G. Montasser-Kohsari and P. Kirstein, On-Line Access to Multimedia Documents, BLRDD R&D Report 6139, London, 1994.

[13] J. Sameshima and P. Kirstein, Secure Document Interchange - a Secure User Agent, to be published in Proc. JENC 95, Terena, 1995.

[14] P. Kirstein and P. Williams, Preparing to Pilot OSI Authentication and Security Services on a Medium-scale, Proc. 4th JENC, pp. 50-54, 1993.

The Authors

Peter Kirstein is a Professor in the Department of Computer Science at University College London. He has been leading research projects in computer communications, computer networks, telematic services and related activities for over 20 years. Recent projects which he has led include the ESPRIT PODA project on ODA, the VALUE PASSWORD project on security, and a BLRDD project on accessing electronic documents.

Goli Montasser-Kohsari is a Senior Research Fellow in the Department of Computer Science at University College London. She has a PhD from Newcastle U in Computer Science. She was responsible for the UCL-CS activity on PODA-SAX, and had the technical leadership of the C-ODA project. Goli has been responsible for all the recent UCL-CS activity on ODA implementation, ODA-SGML conversion, and C-ODA piloting.

About this document

This document was generated using the LaTeX2HTML translator Version 95.1 (Fri Jan 20 1995) Copyright © 1993, 1994, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

The command line arguments were:
latex2html inetw1.tex.

The translation was initiated by Goli Montasser-Kohsari on Tue May 9 15:37:08 BST 1995

