Peter B. BOYCE <email@example.com>
American Astronomical Society
The Web makes it possible for anyone to put material where everyone can see it, usurping, as it were, the traditional job of the publisher, that of distributing information. But, the other tasks of the traditional publisher are not being taken care of by most people putting information on the Web. Self-publishing on the Web can be seductively rapid and promotes the exchange of current information without regard to physical location. A well-organized and seamlessly interlinked, distributed online information system provides benefits to users that go far beyond the mere availability of single articles. As the system developed for astronomy demonstrates, such a system can revolutionize research and study capabilities. But, without adherence to open standards, provision for easy maintenance, a good measure of quality control, and production in a robust format, it is impossible to weld individual articles into a working system. The products of individual Web authors, and even of many Web-publishing organizations, fall far short of what is required to provide a well-linked, functioning, and long-lived information system.
The capabilities of the Web are enormously seductive. We can prepare a paper or other work in HTML (Hypertext Markup Language) and, literally, reach the eyeballs of a million people around the world. Seeing our work on the Web gives most of us a rush that exceeds the sight of our book featured in the window of the local bookstore. It is quite enjoyable to plug into the Internet in some far-away place and show off our latest work to friends or colleagues. But, once we get over the initial thrill of seeing ourselves on the screen, we should be asking ourselves some sober questions about Web publishing. Why are we doing it? What purpose are we trying to achieve? How long do we expect our work to be read? How do we expect that potential interested readers will find our work? More importantly, how can we place our own work within a logical collection of material on the same or similar subject so that the readers can put it in context?
Let me at once define the limits of this paper. I will be discussing scholarly publishing. I am not discussing trade publications, entertainment, or personal pages (except insofar as they provide a home for scholarly material one may produce). For scholars, the Web provides wonderful opportunities to interact with colleagues, to collect and share data, and to make their work available to a broad audience of peers, colleagues, and students, as well as to the public at large. Particularly in this scholarly arena, the worth of a work will be enhanced if it is located (in the logical, not the geographical sense) with other works on the same subject, where it can be found by colleagues working on the same topic and can, through the references and comments of future scholars, be incorporated into the edifice of scholarly knowledge.
The great power of the Web lies in being able to search over a wide variety of disparate sources, finding, extracting, collating, and combining different material to generate new knowledge. The question is how to make this possible. The days of having to physically acquire a copy of everything of interest are gone. It should not be necessary to develop and maintain single, monolithic collections of material any more. The Web provides the power to distribute material among a number of places, since all the material can be searched and recovered no matter at which site it resides. There are already a few examples of such information resources, including electronic journals, databases, and bibliographic services, that are changing forever the methods of working with the data. One such example is the field of astronomy, where the American Astronomical Society has been publishing its journals in electronic form for the past four years (Boyce et al., 1997). The developments in astronomy are instructive in showing the potential of effective electronic delivery of information. The results are amazing, and the feedback from the community is instructive.
First, let us consider the system of scholarly information dissemination as it had come to exist before the advent of the World Wide Web. In many disciplines, particularly in the science, technology, and medicine (STM) area, the body of knowledge within a discipline is carried in a number of scholarly journals, each composed of articles which refer to previous articles, often in other journals. Tracking down all the relevant work on a given subject was often a time-consuming task. Still, the system of journals printed on paper provided a workable record of the state of knowledge and progress within a field.
The publishers of the vast number of scholarly journals have provided a number of services that bear enumerating. As well as disseminating information as widely as possible (or as widely as the paper paradigm will permit), good publishers have traditionally provided:
All of these services have enhanced the ability of scholars to disseminate their results, locate and identify information relevant to the topic of interest, and ensure that information will be available to future generations of scholars.
Although not a new topic (1), this subject does not receive the discussion it deserves. As the author has pointed out in several recent talks (see Boyce, 1999), the paper versions of the scholarly journals have traditionally served a multitude of purposes, not all of which are readily apparent.
This last function is subtle, but critically important. The journals, in effect, define how to write an original research paper; how to cite past work, describe your procedures, summarize new data or findings, and draw conclusions. In practice, the journals provide a necessary and important self-regulatory role. My colleagues often judge the importance of their latest project by asking themselves if it is significant enough that it would be accepted for publication in the Astrophysical Journal. To draw upon another example from the field of physics, if the Physical Reviews had not existed, the well-known preprint server at the Los Alamos National Laboratory would not function. Everyone who posts their work to the preprint server is writing ultimately for publication in the Physical Reviews -- or other similarly prestigious journal. The omnipresent example of a universally accepted set of norms keeps the quality of the preprints high, both in style and substance.
The system of scholarly information is undergoing a rapid transformation. The old system, which has grown up since World War II, is being modified. There is a certain movement, particularly within the university community, to encourage the development of a system that will replace the current system of journals as the avenue for exchanging scholarly information. A number of suggestions along this line have been made, but, in general, they look at the problem from a narrow perspective: usually either from that of a librarian whose budget has been stretched beyond the breaking point by the rising cost of journals, or a researcher who is anxious to become aware of the latest information as rapidly as possible. As new models for exchanging scholarly information are tried, let us do so with the goal that the new system should provide all the functions of the present system as well as opening up new capabilities.
The Web is upon us. We, as an academic community, are starting to embrace new paradigms for information transfer. To my mind, we have, as a whole, been remarkably timid in trying new methods and new tools to improve our ability to disseminate, find, retrieve, and use information. The new tools and capabilities should offer tremendous improvement in the distribution and exchange of knowledge, if only we can use them to advantage.
Astronomy provides one example of the power and synergy that result from linking the whole system. We now have four years of experience in publishing a thoroughly linked, scholarly journal on the Web (Boyce et al., 1997). Our journal has a number of advanced features such as versions in HTML (for reading on the screen) and PDF (for printing out), data tables in machine-readable format, and video clips where appropriate. But, from the user's standpoint, by far the most important feature is the abundance of links incorporated into the HTML version.
The links to references in past papers and to future papers that cite the present article are consistently rated by our readers as the most important feature of the electronic version.
The links to past references are an obvious need, paralleling the normal practice in paper journals, with the difference being that electronic links are nearly instantaneous, bringing the abstract (and even the full text) of the referenced article directly to the reader's screen within seconds. But, the electronic journal need not be content with simple backward links. For the past three years, we have been including links to future papers, a service that cannot be duplicated in the world of paper journals.
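Mechanically, forward citation links can be produced by inverting the backward reference lists that every article already carries. The sketch below illustrates the idea in Python; the bibliographic identifiers and field layout are invented for illustration and are not taken from the actual journal or ADS implementation.

```python
# Sketch: deriving forward citation links by inverting backward references.
# The article identifiers below are hypothetical examples.
from collections import defaultdict

# Each article lists the identifiers of the earlier papers it references.
references = {
    "1997ApJ...480..100A": ["1990AJ....100...50B", "1985ApJ...290...10C"],
    "1998ApJ...500..200D": ["1990AJ....100...50B"],
}

# Invert the mapping: for each cited paper, collect the later papers citing it.
citations = defaultdict(list)
for article, refs in references.items():
    for ref in refs:
        citations[ref].append(article)

# The 1990 paper is now linked forward to the two later papers that cite it.
print(sorted(citations["1990AJ....100...50B"]))
# → ['1997ApJ...480..100A', '1998ApJ...500..200D']
```

Because the inversion can be rerun whenever a new issue appears, forward links stay current without any effort on the part of past authors.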
But this was still just the beginning for us. The well-established astronomical databases -- which are organized by the names of stars, nebulae, clusters, and galaxies -- have, for the past 15 years, included the references to the literature where the data items were published. It has proven simple to incorporate links from the databases to the electronic versions of the articles, and vice versa, from the journals directly to the databases.
Searching for information is of critical importance, so with support from NASA, astronomers have built a searchable database of abstracts covering 150 years of the core literature in astronomy, starting with the first issue of the Astronomical Journal in November 1849. Eichorn et al. (1998) have described this service, the Astrophysics Data System (ADS). Note that since we can manage the 1900 transition, we are, of necessity, Y2K compliant.
But having titles and abstracts is not enough for most users, so the ADS has nearly completed scanning and making available the historical or "legacy" literature. The major difference between this effort and JSTOR is that the historical astronomical literature in the ADS has active backward reference and forward citation links that knit the whole system together for seamless use. The historical literature is linked both forward and backward in time with the new electronic journals being published for the AAS by the University of Chicago Press.
The concept behind the ADS is not unique. Medicine has the same facility, with the PubMed database making the links. In the case of astronomy, all of the core literature is available on line, and so are the majority of the important databases. The coverage is more complete than for any other field of which I am aware.
To summarize, many of the electronic astronomical journals are linked to each other directly, and all are linked indirectly through the ADS abstract system. The ADS also provides search capability and links to abstracts and full text, whether in the electronic journals or the scanned page images of the historical literature within astronomy. With NASA support, this collection is available for free. The ADS also provides the links from abstracts to the machine-readable data tables that reside in the online astronomical databases (CDS, NED, ADC), which can be searched by astronomical object. We call this system of protocols and links Urania. It is not a collection of objects; it is the underlying, enabling protocols -- an important distinction. Needless to say, Urania is entirely a creature of the Web that could not function effectively if it were created on paper. The unique thing about astronomy's system is the tight interlinking among all the distributed information sources.
There are three keys to making this system work:
Figure 1 demonstrates that the interlinked astronomical information system can be entered at many points, as illustrated by the red arrows entering from the left. One can browse the journals -- jumping to the abstracts of the references and forward citations, then reading the full text, or going to the relevant data in the online databases as shown with the green arrows.
One can search the abstract collection, and get to the full text, the online journals, and the data. Or knowing the reference one can go directly to the historical collection -- full page images of all the core journals, most of whose references are linked to the abstracts and full text.
Or one can enter the databases by the name of the astronomical object of interest, retrieve the published data on that object, and link immediately into the articles where the data were originally published. One of the great tools, particularly useful in this form for astronomy, is the ability (now only in prototype form) to search over a huge collection of data for a list of all objects that meet certain characteristics (e.g., objects that are in a certain region of the sky, are brighter than a certain magnitude, emit a large amount of X-ray energy, and have more than the expected amount of infrared radiation). This capability for discovering new members of a class of objects is changing the way astronomers do their research. The time spent on tedious literature searches can now be used in converting this information into real knowledge about the universe.
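The kind of multi-criterion search just described can be sketched as a simple filter over a catalog. The Python fragment below is a minimal illustration only; the catalog rows, field names, and thresholds are invented assumptions, not the schema of any actual astronomical database.

```python
# Sketch of a multi-criterion object search over a (hypothetical) catalog.
# Each row: sky position (ra, dec in degrees), visual magnitude,
# an X-ray flux measure, and an infrared-excess measure.
catalog = [
    {"name": "Obj A", "ra": 120.0, "dec": 30.0, "mag": 14.2, "xray": 5.0, "ir_excess": 2.1},
    {"name": "Obj B", "ra": 121.5, "dec": 31.0, "mag": 17.8, "xray": 0.1, "ir_excess": 0.3},
    {"name": "Obj C", "ra": 122.0, "dec": 29.5, "mag": 13.5, "xray": 7.2, "ir_excess": 3.0},
]

def find_candidates(cat, ra_range, dec_range, mag_limit, min_xray, min_ir):
    """Return objects in a sky region brighter than mag_limit
    (smaller magnitude = brighter) with strong X-ray and infrared emission."""
    return [
        row["name"] for row in cat
        if ra_range[0] <= row["ra"] <= ra_range[1]
        and dec_range[0] <= row["dec"] <= dec_range[1]
        and row["mag"] < mag_limit
        and row["xray"] >= min_xray
        and row["ir_excess"] >= min_ir
    ]

print(find_candidates(catalog, (119, 123), (29, 32), 15.0, 1.0, 1.0))
# → ['Obj A', 'Obj C']
```

A real system would run such a query across distributed databases of millions of objects, but the logic -- combining positional, brightness, and multi-wavelength constraints to select a class of objects -- is the same.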
One can consider the Urania information system, as powerful as it is for users, to be a prototype of one of many such collections of even more sophisticated interlinked information resources that I expect to appear in the future.
The successful set of preprint servers at LANL was started by Paul Ginsparg, a theoretical physicist himself, who was dissatisfied with the failure of the traditional publishers in physics to move rapidly toward methods of electronic publishing. Within the disciplines of physics, astronomy, and math, the preprint servers are very popular. They provide a means of rapid communication of the latest results, and secondarily provide an overview of what people are working on. In other words, they are providing the first two functions of the traditional journal: news about the field, and rapid notification of recent results.
Although it is the desire of Paul Ginsparg to have the preprint servers supplant the traditional scholarly journals, there are a number of reasons why this probably will not happen. In astronomy, the community has shown a willingness to use the preprint servers to stay abreast of breaking developments and to use the electronic versions of the established journals as the repository of the core knowledge of the field and to validate the reputation of the authors. Experience shows that both systems, existing side by side and even, hopefully, working together, will best serve the users.
There are a number of reasons why the scholarly journals, at least in astronomy, should be the vehicle of choice for authors. First is the fact that only the journal articles become part of the whole distributed and linked information system. While the articles in the preprint servers can refer to each other, the system is more cumbersome than in the journals, and depends on the willingness and skill of the author to make the links. Searching in the preprint server is by author, title, and keyword, whereas the journals offer full text searching and the ADS offers searching of the full abstract. In fact, the ADS now provides full text searching of the abstracts of the preprints, with links directly into the preprint server, a service which is proving to be popular.
Another problem with the preprint servers is the capability for authors to make changes to their articles. In some sense, this is good -- and a departure from the tradition of paper. But, it means even more articles for a reader to read through, and it is often difficult to determine what has been changed between one version and the next. With the growing tidal wave of information, which scholars everywhere are trying to stay ahead of, it would seem that the last thing readers need is more articles to read.
But even more important is the problem introduced by multiple versions of an article. Which version of the article does the reference you see in a paper refer to? It is not helpful, it is confusing, when a critical reference to an article is rendered moot because the author has changed the article after the critical article was written. The preprint servers added version control after this began happening, but, to my mind, it still leaves the preprint server as a mechanism for rapid communication, not for archival storage of one's work.
Archival? The preprint servers call themselves archives, but the formats used to prepare the articles are not archival in nature. They are servers of information, not true archives. PDF is one format used widely, but it cannot be claimed to be truly archival, or even capable of being read 20 years from now. Many authors use LaTeX to prepare their articles, another format that cannot claim to be of true archival quality. All electronic information will have to be migrated to new forms as the software and hardware used for reading them evolve. Perhaps LaTeX and PDF will be translatable because they simply represent page images of material. That is the only hope for survival of material stored in those forms. To my mind, this is not a sure thing. And who will pay for the translation?
But, the underlying reason I have for not supporting the preprint servers as repositories of any but transient information is the underlying philosophy of treating articles as separate (and virtually independent) entities composed of page images. Even though they are being delivered electronically, this concept is rooted in the old thinking that derives from the world of the paper journal. It is clear from the experience with the astronomical information system that the world of electronic information will be vastly different. Articles are no longer independent entities. They are tied to other articles forward and backward in time, and soon to pieces of other articles such as data tables, or video clips. And articles will become closely tied to various databases and, eventually, to information sources that we have not yet envisioned. I find the format and the philosophy of the preprint servers, as they now exist, to be more rooted in the past than one might expect.
But, of course, Paul Ginsparg has shown a remarkable capacity to turn technology to the advantage of the users of information. And this is a hopeful sign for all of us.
Having demonstrated what can be done in the way of effective information dissemination, let us examine what is lacking in the way most scholarly information is now being distributed on the Internet. With authors able to post their work directly on the world's "bulletin board," many of the tasks performed by the traditional publisher are being attended to poorly, if at all, by most purveyors of information.
Much "self-published" material on the Web is poorly presented, is not part of a stable system of interlinked information, cannot be easily found, and certainly will not last or be readable a decade from now. The consequences of this failure to produce locatable information on the Web, in a format that can be maintained, with links to relevant information and data, are already making the Web an inefficient medium for transferring information.
This problem is exacerbated by the single-minded and complete dependence of most young people on the electronic resources. Their motto is, "If it isn't on the Web, it doesn't exist at all" (Stevens-Rayburn, 1998; http://www.eso.org/gen-fac/libraries/lisa3/stevens-rayburns.html).
Let us look at some of the aspects that go into making Urania a success:
Nowadays it is even more imperative to use the Web effectively, not just to advertise the author's presence, but to ensure that the author's information can be found, will be part of a broad system of electronic information, and will survive into the future.
The tasks fulfilled by the traditional publisher, which have evolved and matured over centuries to make an effective paper-based information system, are still required. Yet, it is clear that the processes and materials of traditional publishing are changing. In the long run, the traditional structure of information itself is being revolutionized. Whether the traditional publishers can adapt to using the new medium and new formats of information exchange is very much an open question. Some STM publishers are adapting well to the new era of distributed, interlinked information. Others are not.
In any case, authors should be aware that simple posting of material to their own Web site is not an effective method for information transfer. The material is much less likely to be found by an interested reader, and the longevity is certainly not sufficient for scholarly work. Posting to a preprint server is only slightly better.
Peter B. Boyce is currently a visiting professor at the Centre de Données astronomiques de Strasbourg, Université Louis Pasteur, Strasbourg, France.
(1) The author first became conscious of the importance of considering the broad set of functions of a journal while listening to a talk by Washington Taylor (http://publish.aps.org/EPRINT/KATHD/taylor.html) at a workshop on Electronic Preprints held at Los Alamos, Oct. 14-15, 1994.
Boyce, Peter B. (1999) Electronic Scholarly Journals. Talk given at the Université Louis Pasteur.
Boyce, Peter B., et al. (1997) Electronic Publishing: Experience is Telling Us Something. Serials Review, 23(1). (As submitted)