Tobias Rademann <Tobias.Rademann@rz.ruhr-uni-bochum.de>
Ruhr University Bochum
The Internet has begun to establish itself as a universal medium for accessing information of all kinds on a global scale. Naturally, a vast majority of these resources can also be employed in an educational environment. This paper focuses on two questions: Which information resources are available on today's Internet for K-12 and higher education, and how can they best be found? For this purpose, a classification scheme of the most important online resources will be presented in the first part, while the second is concerned with presenting background information on the different types of search engines and outlining general search strategies for research assignments.
With world knowledge doubling just about every three to five years and the Internet on the verge of becoming the primary source for information of all kinds, one question becomes ever more important: How can we properly handle this highly powerful online medium, finding the information and services we need while avoiding the most pressing problem of the Information Age: Information Overload?
For various reasons, this question is especially important in the field of education: Quite apart from the fact that today's generation of students must be educated so that they will have no difficulty handling this new online medium in their future lives, it is above all in K-12 and higher education that access to multicultural, multidimensional information is of utmost importance. The Internet promises to meet this need by offering access to "in-depth information from around the world at our fingertips." However, this offer comes with several major problems: Partly owing to the recency and complexity of this development, today's teachers and lecturers usually lack experience in the proper employment of such a medium in class and often have no idea of the breadth of information accessible via this network. In addition, there are also problems on the side of the students: While the younger ones have no difficulty handling a computer but lack the background knowledge necessary to judge the importance of the various bits of information accessible, the older ones--like the teachers--may possess this background knowledge but lack experience with the Internet as such. Thus two major questions arise: 1. Which resources are available on the Internet for educational purposes? 2. How can information on a certain subject best be found?
Even though educational material only accounts for a comparatively small part of the overall data on the Internet, there already exists a broad and highly useful variety of online resources which can be employed in almost every subject. Given this broadness and the considerable degree of differences among the numerous sources themselves, however, it is necessary to present a general classification of their different types, which will, in turn, help both teachers and students to gain a better understanding of the various tasks they can be used for.
While there are a considerable number of highly specialized educational projects accessible on the World Wide Web (WWW), general information resources can typically be divided into the following six categories: electronic books (EBs), electronic periodicals (EPs), online databases, encyclopedias, educational sites, and newsgroups. The following paragraphs will now present a brief survey of their most important characteristics, investigating issues such as overall content, quality, suitability for educational tasks, and finally the costs associated with their use. Where appropriate, some high-quality examples will be given.
While printed books undoubtedly represent the most common and probably also the most important didactic medium for accessing information in today's educational environment, the situation on the Internet is rather different. Given the advantages which arise from the possibility of having dynamic rather than static documents, and Web sites combining all kinds of multimedia data such as animations or sound- and video-sequences, it is practically against the nature of this network to offer anything comparable to the genre of fixed-content books. Bearing in mind that few readers currently attempt to read long books on screen, chapters would have to be printed out anyway. Thus, it is above all working papers and electronic periodicals (cf. below)--sources whose content may be modified fairly quickly--that will be found on the Web.
Today's genre of online electronic books primarily consists of electronically edited copies of well-known traditional works by such authors as Shakespeare or Jane Austen, which are used for various kinds of linguistic analyses.1 This, however, clearly restricts their use for educational purposes to subjects primarily concerned with linguistic rather than literary, scientific, or social issues.
At present, there is usually no fee for accessing most of the electronic books that have been made available online. However, it can be expected for the future that, as readers become more used to reading online sources and the highly complex issues concerning copyright law have been modified accordingly, their use will most likely no longer be free of charge.
According to Rademann (1996), there are three major groups of online electronic periodicals, namely electronic newspapers, electronic magazines and journals, and last but not least electronic news and information services (NIS).2 Generally speaking, this subgenre has shown a considerable expansion over the past few months, and has now become one of the leading sources for online information. As most EPs are published on the Web, they are easily accessible and intuitive to use for most of today's Internet users. With respect to their overall content, it can be said that EPs basically cover the same topics as their printed counterparts, i.e. news, current events, opinions, values, theories, and similar information. Some even feature classifieds, cartoons, crossword puzzles, etc. However, due to their lower production and distribution costs in combination with the prospect of reaching an almost unlimited target audience at no extra cost throughout the world, there will probably soon be a considerable number of highly specialized EPs available: While there is, of course, still the possibility of publishing general-interest newspapers and magazines, it is especially in the field of science and leisure activities that publications can begin to focus on the highly specific needs of an often extremely selective readership, as the respective target audience now comprises interested readers from around the world and not only from a regionally restricted area.
As far as their overall quality is concerned, EPs can be subdivided into two groups: The major EPs such as USA Today, Time Magazine, or CNN International are of remarkable quality: The scope of their articles regarding both depth and breadth is unequalled, the sites are clearly structured and easy to navigate, the information available tends to be very much up-to-date, and the articles include both commentary and factual reporting. In addition, most sites also feature comprehensive archives, which represent a valuable tool for academic research. The smaller EPs, on the other hand, whose target audience is presumably not yet sufficiently large to justify maintaining sophisticated Web sites, tend to be of noticeably lower quality. Often, no more than a handful of articles are published, and these rarely contain additional material such as photographs.
Obviously, the majority of the EPs already available are most appropriate for gathering background information about current events (such as, for example, the 1996 US Presidential Election or the Beef War), while it is likely that in the not too distant future, special-purpose (scientific) magazines can be employed for gathering the latest information on almost every subject included in the curriculum.
Fortunately, the vast majority of EPs are currently free of charge, which--in combination with their other advantages--makes them an ideal tool for educational tasks. However, as their target audience grows, more and more will probably be made available on a subscription basis only as, e.g., the Wall Street Journal Interactive Edition (WSJIE).
There are various forms of electronic online databases, such as, for instance, library catalogues, special-purpose databases, or address books. However, for educational objectives, it is only the first two which are of primary interest. Although more and more databases are being made available on the Web, most of them are still only accessible via Telnet, which means their layout is much less sophisticated than that of, say, Web-based EPs, and which appears to make them somewhat harder to handle for the average user.
Library catalogues (as for example the LOC services) have become increasingly refined and ever more comprehensive, assisting both students and teachers with their research on a broad variety of topics. Special-purpose databases (such as Uncover), on the other hand, usually comprise articles of selected (electronic or printed) periodicals. They can be searched and will generate a--hypertext-oriented--output listing those articles that match the user's query.
While library catalogues are usually free-of-charge and suitable for assisting students in gathering literature on special tasks such as assignments or term papers, special-purpose databases are often accessible on a pay-per-view basis only; however, they can be employed for retrieving valuable background information on almost every subject.
While printed encyclopedias are among the most widely-used sources for obtaining background information, their electronic counterparts have only recently begun to emerge. The best-known encyclopedia is, of course, the Encyclopedia Britannica, which launched its online services in mid-1996; it is likely that, given time, other major publishers will follow suit.
For the time being, the content of the online version is still somewhat restricted as compared to that of the printed copies, especially with respect to photographs and other multimedia elements. However, there is no doubt that this will soon change, with online encyclopedias eventually able to offer much broader and more current in-depth information on any topic (including 3D-animations, sound and video sequences, etc.). Another advantage is their hypertext-based layout, which makes them easy to navigate. Finally, their highly-advanced search interfaces are of utmost value because they ensure that every article relevant to a given query will be retrieved instantly.
It has to be said, however, that electronic encyclopedias tend to be free of charge for a comparatively short trial period only. Nevertheless, given their overall potential, educational institutions should seriously consider making them available to their students.
As hinted at in the introductory paragraphs of this paper, there exists a broad variety of electronic online resources for information of all kinds on today's Internet. Some of the educational institutions connected to the Web have already begun to publish their own data, such as additional material for their classes, student papers, or even entire online courses. And although these are primarily aimed at assisting local students, they can, of course, also be accessed by third parties.
Obviously, it is impossible here to give a more detailed description of the content of these sites.3 Suffice it to say that they cover an unlimited breadth of topics on all aspects of K-12 and higher education. In order to find supplementary online material for a given course, the teacher will have to consult the major search engines and choose appropriate categories or keywords, ensuring the query is suitable for identifying data which satisfies the respective requirements. Once such sources have been located, however, they tend to be highly valuable for educational purposes and are usually also free of charge.
By means of newsgroups, individuals can exchange information with hundreds of fellow users throughout the world. These threaded discussion forums, distributed via Usenet rather than e-mail, are hierarchically organized into a broad variety of topics (more than 15,000), and every interested user may subscribe to a group in order to read messages and post messages to the other members. There are both moderated and unmoderated newsgroups, a fact which might have to be taken into consideration by a teacher before recommending individual groups to his students.
In an educational environment, newsgroups can be employed for two major tasks: First of all, they can assist in retrieving information (e.g. simply by reading articles posted to a newsgroup on a certain subject or by asking for help by posting articles). In addition, newsgroups can support both intercultural communication (e.g. in foreign language courses) as well as cooperation between regionally dispersed students/schools (e.g. in larger projects and assignments).
Bearing in mind that there are already more than 150 million pages available on the WWW comprising no less than 50 to 60 billion words, looking for a certain piece of information via this medium is like "looking for a piece of a puzzle--a puzzle with more than [60 billion] pieces scattered in millions of locations in more than a hundred countries".4 As a result, numerous methods for finding a given source on the Web have been attempted over the past few years, all of which can be subdivided into two groups, namely goal-driven and arbitrary searches. While the latter are usually referred to as "surfing," there are two approaches to the former (cf. Fig. 01): A user may systematically follow links that are in one way or another connected with his subject ("browsing"), or he might employ search engines5 to assist him in locating the information he seeks. While not much can be said about surfing or intuitive Web browsing, various types of search engines exist that assist with different tasks. Generally speaking, search engines can be divided into two major classes comprising three different types: the traditional search engines (including subject trees and keyword-driven search engines, KWDSEs), and the more modern searching aids, the so-called Intelligent Agents (cf. Fig. 01).
Fig. 01: Classification of Web-Searching
However, while existing search engines are becoming increasingly sophisticated, the fact that there is still no such thing as the perfect search engine leads to two important conclusions: In order to gain optimal results when performing a search on today's Internet, it is first of all vital to choose the right type of search engine and to use more than one representative of this class; in this context, it is of utmost importance to have at least a basic understanding of how these search engines work (i.e. their anatomy), including such issues as the sources they index, how they do this indexing, and which query-syntax they require. Only when these issues have been sufficiently considered can the user begin to concentrate on drawing up adequate search strategies for any research assignment.6
In order to be able to assist the user in retrieving the kind of information he longs to find on the Web, search engines must naturally a) have some kind of knowledge of what is currently available on this medium (raising questions concerning their source-indexing) and they must b) have some kind of understanding of what the user wants from them (raising the question as to how human-computer interaction takes place).
With respect to the traditional search engines, the first is achieved by a method called indexing:7 Both KWDSEs and directories use so-called robots (often also referred to as spiders, worms, or wanderers) that automatically crawl through the Web and record different features of the pages they visit as they move from one site to another. These spiders are nothing but computer programs using a comparatively simple recursive algorithm to navigate through the WWW by following the URLs included in newly found HTML pages.
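The recursive link-following procedure described above can be sketched in a few lines of Python. The miniature three-page "web" and its URLs are invented for illustration; a real robot would fetch each page over HTTP rather than read it from a dictionary.

```python
import re

def crawl(url, web, visited=None):
    """Recursively visit pages, following every URL found on each page.

    `web` stands in for the live WWW: a dict mapping URLs to HTML text.
    """
    if visited is None:
        visited = set()
    if url in visited or url not in web:
        return visited
    visited.add(url)
    # Follow the target of every anchor tag included in the page.
    for link in re.findall(r'href="([^"]+)"', web[url]):
        crawl(link, web, visited)
    return visited

# A miniature three-page "web" (invented URLs).
web = {
    "http://a.example": '<a href="http://b.example">B</a>',
    "http://b.example": '<a href="http://c.example">C</a> <a href="http://a.example">A</a>',
    "http://c.example": "a leaf page with no links",
}
print(sorted(crawl("http://a.example", web)))
```

Note that the `visited` set is what keeps the recursion from looping forever on pages that link back to each other, as pages A and B do here.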
Although this procedure is identical for every traditional search tool, the major differences between the individual search engines result from the way their respective robots handle the sources they come across (a process also known as parsing). There are various ways in which sources may be indexed in the database of a given search engine, depending on the so-called "cataloguing issues": Some robots index the entire content of every document they come across, whereas others try to construct summaries; still others store only the first hundred words, the headline(s), or even just the HTML title and header.8

As far as the indexing methods of subject trees and directories are concerned, they belong to those search engines that require human intervention: While a robot automatically gathers the data, it is humans who classify the individual sources according to their content on the basis of a (usually highly complex) classification scheme. As a rule, data is categorized in these trees by one of two methods, namely by subject (hence the term subject tree) or by geographical location. The output regularly consists of a systematically arranged tree (or map) that may then be browsed by the reader. Naturally, the different categories become ever more specific the deeper the user follows a link, finally leading him to the leaves of the tree, i.e. the most specific sections, which contain a list of hypertext links to the sources that should be useful for retrieving the information he is looking for.

For KWDSEs, on the other hand, the classification issues vary considerably from engine to engine, ranging from highly complex, full-site indexing (e.g. Alta Vista) to simple HTML-header listings (e.g. WWWW). However, quite in contrast to the subject trees and directories, no human intervention is required for building the individual databases; the entire process is fully automated.
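The different cataloguing policies mentioned above can be illustrated with a small, hypothetical parsing routine; the policy names and the crude tag-stripping are assumptions made for this sketch, not the actual behaviour of any particular engine.

```python
import re

def parse_page(html, policy):
    """Index a page under one of several (simplified) cataloguing policies."""
    text = re.sub(r"<[^>]+>", " ", html)   # crude tag stripping
    words = text.split()
    if policy == "full":                   # index the entire document
        return words
    if policy == "first100":               # store only the first hundred words
        return words[:100]
    if policy == "title":                  # record the HTML title alone
        match = re.search(r"<title>(.*?)</title>", html, re.I | re.S)
        return match.group(1).split() if match else []
    raise ValueError("unknown policy: " + policy)

page = "<html><title>Chemistry Notes</title><body>atoms form bonds</body></html>"
print(parse_page(page, "title"))  # → ['Chemistry', 'Notes']
```

The choice of policy is a trade-off: full-text indexing yields the richest database but the largest one, while title-only indexing keeps the database small at the cost of missing everything the title does not mention.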
The output for a given search almost always consists of a hypertext list of URLs, usually complemented by either a short summary or the first few lines of the respective document.
Quite in contrast to traditional search engines, the modern search engines (i.e. Intelligent Agents) do not attempt to index the Web. Each time they are confronted with a new task, they will search the Web online in order to retrieve information. For this purpose, Intelligent Agents use neural network technology by means of which they try to spot patterns and interrelationships in natural language and sample Web pages which will then be matched with newly-found online sources.9 The output typically consists of a list of rated URLs which may be visited by the user once he has recalled the agent.
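As a rough illustration of how an agent might rate newly-found pages against its training material, the following sketch uses plain vocabulary overlap (the Jaccard coefficient) in place of the neural-network pattern matching the commercial products employ; the sample texts and URLs are invented.

```python
def rate(training_text, page_text):
    """Rate a page's likely relevance to the agent's training text (0.0-1.0)."""
    train = set(training_text.lower().split())
    page = set(page_text.lower().split())
    if not train or not page:
        return 0.0
    # Jaccard coefficient: shared vocabulary over combined vocabulary.
    return len(train & page) / len(train | page)

training = "the chemistry of organic compounds and their reactions"
pages = {
    "http://notes.example": "lecture notes on organic chemistry reactions",
    "http://sport.example": "football league results and fixtures",
}
# Present the URLs in descending order of rated relevance.
ranked = sorted(pages, key=lambda u: rate(training, pages[u]), reverse=True)
print(ranked)
```

Even this crude measure places the chemistry page above the sports page; a real agent refines such scores by generalizing over patterns in the training text rather than matching words literally.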
With regard to the second question raised above, namely the fact that search engines must have at least a basic understanding of what the user expects from them, it can be said that, although much research has been done in this field lately, even the more recent attempts still require a considerable amount of background knowledge on the part of the user with respect to how the search engine works, rather than the other way round. As there is obviously no need for query-based human-computer interaction when retrieving information from subject trees and directories, this background knowledge primarily concerns the query formulation with KWDSEs: The traditional approach for refining a query has been to employ Boolean logic, using operators such as "and," "or," "not," as well as adjacency operators such as "near," "close to," etc.10 As elementary and simple as it might appear at first sight, Boolean logic has always demanded a high degree of background knowledge and experience in formulating queries on the side of the user and is almost inapplicable for the lay user, at least with respect to more complex queries.11
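The Boolean operators discussed above can be made concrete with a small evaluator over a document's word set; the nested-tuple query format is an assumption of this sketch, and adjacency operators such as "near" would additionally require word positions, which are omitted here.

```python
def matches(words, query):
    """Evaluate a Boolean query against a document's set of words.

    A query is either a bare keyword string or a nested tuple such as
    ("and", q1, q2), ("or", q1, q2), or ("not", q).
    """
    if isinstance(query, str):
        return query.lower() in words
    op, *args = query
    if op == "and":
        return all(matches(words, a) for a in args)
    if op == "or":
        return any(matches(words, a) for a in args)
    if op == "not":
        return not matches(words, args[0])
    raise ValueError("unknown operator: " + op)

doc = set("the internet offers access to information of all kinds".split())
print(matches(doc, ("and", "internet", ("not", "television"))))  # → True
```

The nesting is precisely what makes Boolean queries hard for the lay user: a query such as ("and", A, ("or", B, ("not", C))) must be composed correctly before the engine can evaluate it at all.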
With Intelligent Agents, on the other hand, there is no such thing as a query in the traditional sense.12 After they have been created by the user, they have to be trained before they can perform their first task (search). This training involves the input of plain English text describing the topic to be investigated. Once this has been done, the agent will independently navigate the Web, rating the material it comes across according to its likely relevance. Although the ability to parse natural language is said to be the most important advantage of Intelligent Agents, it is apparent that here, too, considerable practice is required to get a feeling for which information is necessary for their adequate configuration. Thus the promise that the user needs no background knowledge or experience in handling Intelligent Agents is not quite true for the time being.
It is apparent that these differences in architecture have considerable implications for the suitability of a search tool for a given task: The foregoing should have illustrated that, generally speaking, subject trees and directories are of advantage if the user is looking for a variety of links on a fairly broad subject (such as "educational resources for Biology"), while it is usually too tiresome to use these tools for a highly specific query. In addition, another advantage of subject trees is the fact that the user requires hardly any background knowledge of the way these services work, as the only thing he has to do to retrieve his information is follow the respective links by intuition. What is more, the procedure required with subject trees and directories often helps researchers by suggesting a rich variety of related topics and online sources. However, resulting from the time-consuming manual processing of every new link, the most important drawback of these engines is that their databases only contain a fraction of the Web's resources. Finally, it must not be forgotten that the links which can be found in subject trees and directories are always of a subjective nature, since it was humans who classified them in the first place. KWDSEs, on the other hand, should only be employed when looking for highly specific information on the Net, such as the WWW site of The London Times, for example. Bearing in mind that they also index every other kind of material available (such as commercial sites, advertisements, personal homepages, etc.), a search for Web sites on "chemistry," for instance, would return an immense number of irrelevant sources, comprising all kinds of documents in which this word occurs.
Although it is said that Intelligent Agents can be used for a broad variety of search tasks, the necessity to train and retrain them and the fact that they have to scan the Web online makes them suitable for major research projects with a longer temporal dimension only.
The preceding sections have demonstrated that each type of search engine is suitable for certain tasks only and that no two search engines ever yield the same output for a given query. To gain the best results possible for a research project, it is thus of utmost importance to select the right type of search engine for the individual task and/or to combine the different types for more complex searches. All aspects considered, there are basically two steps involved in doing research in online sources: At first, the teacher or the students will usually want to get an overview of the information resources that are available on the Internet and could provide useful material for their respective assignment; once these have been located, they will want to browse the individual sources for specific information on their task. Thus a typical search strategy for research tasks will have to have a bipartite structure, too.
On the basis of the information presented in the section on search engine anatomy above, it is clear that in order to get a general survey of the different information resources that are available on the Web, the researcher will have to consult a subject tree or directory. Depending on whether he wants to gather information on a specific topic (as, e.g., on the "Maastricht Treaty") or on a specific type of information resource concerned with a specific subject (i.e. all available magazines dealing with Irish issues), he will either have to browse these trees by the topic of the assignment or by the respective information resource category. At the end of this process, the user will have come up with a more or less comprehensive list of URLs that might be useful for his task.
In order to first of all verify the suitability of these sources for the respective task, and for the purpose of retrieving specific information at a later point of time, the teacher and/or the students will have to browse each of them separately in a second step. This process will heavily depend on the type of information resource that is being investigated: Such sources as EBs and educational Web sites will usually have no extra search facility implemented on their pages (which means they have to be browsed manually in order to get an idea of the scope of the information they offer), whereas EPs, online databases, and encyclopedias tend to have a keyword-driven on-site search engine to scan and access their various archives.13 Finally, newsgroups may be scanned with the help of some of the major KWDSEs, too, such as Alta Vista or DejaNews.
While the type of information resource that is to be examined is the principal factor that determines the kind of search engine to be employed during the second part of any research assignment, its temporal scope must also be taken into consideration: If students have to conduct a long-term research assignment, there are various additional tools that will help them with high-level goal-driven information retrieval. Especially with EPs of all kinds, students might refer to so-called personalized news services. Although there is an increasing number of commercial news delivery services (NDSs) such as Farcast, NewsHound, or IBM's InfoSage that will automatically deliver abstracts or whole texts matching a user's profile to his e-mail address each day,14 it is most of all the non-commercial services which will be of principal interest in an educational environment. Here, it can be observed that most major EPs have set up personalized newspapers and magazines, where a user may specify the sections and keywords he is interested in (for an excellent example, see The London Times, but also the WSJIE or Pathfinder PE). Once this has been done, a CGI-script will automatically create the user's individual edition of that very newspaper or journal, containing all articles that match his interests. Finally, it has to be mentioned that agent-based press services (such as Autonomy's Press Office) have also been released recently. Once they have been trained sufficiently, these programs scan several EPs, gathering all the relevant material and presenting it in an EP-like layout to their owner.
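The filtering such a personalized-news CGI-script performs can be approximated as follows; the article data and the simple keyword-matching rule are invented for illustration and are far cruder than what the services named above actually do.

```python
def personal_edition(articles, profile):
    """Select the articles that mention at least one of the reader's keywords.

    `articles` maps headlines to body text; `profile` lists the reader's
    interests. A publisher's CGI-script would apply essentially this
    filter to each day's issue before assembling the personal page.
    """
    keywords = {k.lower() for k in profile}
    return [headline for headline, body in articles.items()
            if keywords & set(body.lower().split())]

articles = {
    "EU extends beef ban": "the beef war between london and brussels continues",
    "Cup final report": "the match ended in a goalless draw",
}
print(personal_edition(articles, ["Beef", "Maastricht"]))
```

A reader whose profile lists "Beef" and "Maastricht" receives only the first article; the second mentions neither keyword and is dropped from his edition.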
To conclude, it has to be said that during a research project, it might become necessary to locate the URL of a specific source. Here, it is, of course, best to consult a keyword-driven search engine. As these engines are used for retrieving highly specific information, they are well-suited for finding out about the existence of a given newspaper, encyclopedia, etc. In order to ensure that he gets the broadest coverage of the Web, the user would naturally either consult a multi-threaded meta-search engine (such as MetaCrawler or Inference Find) or submit his query to more than one KWDSE. It is not wise to use a subject tree or directory here, because browsing these would take too much time.
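The fan-out performed by a multi-threaded meta-search engine can be sketched as follows (sequentially rather than in parallel, for simplicity); the engines here are stand-in callables with invented URLs, not the query interfaces of any real KWDSE.

```python
def meta_search(query, engines):
    """Submit one query to several engines and merge their result lists.

    `engines` is a list of callables, each mapping a query string to a
    list of URLs; duplicates are dropped while the order is preserved.
    """
    seen, merged = set(), []
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Two stand-in engines with partially overlapping indexes.
engine_a = lambda q: ["http://one.example", "http://two.example"]
engine_b = lambda q: ["http://two.example", "http://three.example"]
print(meta_search("london times", [engine_a, engine_b]))
```

Because each engine indexes only part of the Web, merging the deduplicated results of several engines is what gives the user broader coverage than any single KWDSE could provide.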
|Step / Task||Search Tool|
|I. Obtain Overview on Topic||==> Subject Trees / Directories|
|II. Retrieve Information from Individual Sources||Short Term: ==> Online Archives; Long Term: ==> Personalized News Services, ==> Agent-based Software|
|add. Find Specific Source / Newsgroup||==> KWDSEs|
Fig. 02: Search Strategies involved in Research Tasks
It has long been clear that with the help of the Internet, a new form of multimedia learning environment can be created that supports the various facets of education in numerous ways and enables students to communicate with their peers (as well as with professionals, for that matter) from all over the world. The amount of information available on this medium for educational purposes is already hardly quantifiable, and it can be expected that, as time goes by, there will be an ever increasing pool of electronic online resources providing multifaceted, multicultural in-depth information.
While it has been demonstrated that there are various possibilities for doing research on the Web, it has also been explained that no matter what students or teachers are looking for, they first have to set up an appropriate search strategy containing information as to which search engine should be employed at which stage of the project. This, however, is only possible if they possess a sound background knowledge of the implications the goals of the respective inquiry have on the choice of the different types of search engines. In addition, it has been mentioned that the user would further require a certain degree of experience in handling the various search engines, no matter if they belong to the more traditional or the more modern ones.
It has been illustrated that the different types of search engines are suitable for certain tasks only: While KWDSEs should be employed when looking for information on a highly specified subject, subject trees and directories should be referred to when searching data on a less well-defined topic or on information concerning regional aspects. As for Intelligent Agents, their field of application is almost unlimited. Nevertheless, mainly resulting from the necessity to (re)train the respective agent, their employment is most advantageous if the research project is of a longer temporal dimension.
To conclude, it should also be mentioned that every teacher or lecturer will have to consider various additional issues when using the Internet's resources as supplementary material in class: The most important of these is the fact that no quality-control mechanism has yet been implemented on the Web, which means that students will have to be taught how to judge the reliability and the value of the various sources they come across in order not to "get lost" in the immense amount of data that lies at their fingertips.15 Thus, drawing up suitable search strategies for retrieving potentially relevant information is only the first step--it will always have to be followed by a quality analysis and a detailed discussion in class of the material obtained.
EN1 One of the best-known projects concerned with electronic books is the so-called "Project Gutenberg" which aims at collecting, editing, and maintaining a large number of electronic copies of well-known classics (see Gutenberg (1997)). For some interesting accounts on how computers can be used to assist in analysing linguistic aspects in electronic books see, e.g., Prof. Johnson's course CHUM650 at Dakota State University and his various publications (Johnson (1997)).
EN2 Rademann (1996), pp. 9-18.
EN3 For an overview the interested reader might refer to the respective section in the WWW-Virtual Library or Yahoo. One of the most well-known projects is the "Whole Frog-Project" by the Biology Department of the Univ. of California, Berkeley (see Johnston (1997)).
EN4 Conte (1996).
EN5 A search engine is "a program that searches through some dataset. In the context of the Web, the word 'search engine' is most often used for search forms that search through databases of HTML documents [...]" (Koster (1996)).
EN6 For a slightly more elaborate account on this topic see Rademann (1996), pp. 61-87.
EN7 For the following, cf., among others, Koster (1996).
EN8 Cf., e.g., Joss/Wszola (1996).
EN9 Further attempts use thesauri, phonetic look-up tables, or query-by-example. See Joss/Wszola (1996) for a more detailed description of these techniques.
EN10 For some more background information on this subject, cf. Claus/Schwill (1986), pp. 79-80, but also Selzer-Ray (1997), Alta Vista (1996a), Alta Vista (1996b), InfoSeek (1996), and Lycos (1996). The ANUL report's section 'Brief guide to searching indexes' gives a brief but quite interesting overview on the different forms of Boolean query syntax that have to be used for the most common search engines (cf. ANUL (1996)).
EN11 See Joss/Wszola (1996).
EN12 The data on Intelligent Agents presented in this paper draws on the most sophisticated example available at the moment, namely Autonomy's Agent Package (cf. Autonomy Systems Ltd. (1996)).
EN13 As their handling is comparable to that of the KWDSEs described above, it will not be discussed any further here.
EN14 The costs as well as the scope of journals that are covered by each of these NDSs vary considerably (for an overview see Rademann (1996), p. 85, Tab. 09). In addition, most of them are currently focusing on the international business executive rather than on the average reader.
EN15 See, e.g., Kirk (1996). For additional information the interested reader might want to check the online "Bibliography on Evaluating Internet Resources" (one of many), maintained by Auer (1996), or the WWW VL page on "Evaluation of information resources" (WWW VL (1996)).
© Tobias Rademann, 1996-97.