Kim H. Veltman <email@example.com>
University of Toronto
One of the paradoxes of human nature is that we frequently use new technologies to speed up old tasks rather than as tools to explore new possibilities. In education, for example, many see the Internet simply as a way of putting their traditional texts, lesson plans, and courses online. Others see it as a medium for children to share e-mail messages or to discuss each other's learning online.
This essay explores some of the new possibilities that are being developed as part of the author's System for Universal Media Searching (SUMS), a software for conceptual navigation on the Internet. General aspects of this system have been considered elsewhere. This paper outlines a vision, some aspects of which have been initiated on a local scale, other elements of which will require international cooperation. The first part of the paper explores how computers can provide a more systematic view of institutions, persons, and peer review (who) and curricula and equivalents, online reference sections, and libraries (what). In our view, the challenge lies in integrating the needs of a particular course into a larger framework for learning, where the end result is not defined a priori, to make education truly an agency for open learning. The second part of the paper focuses on new methods for classing, searching, and ordering materials on the Internet, all of which are prerequisites for systematic searching and metasearching in order to make these resources more powerful tools for learning. To put it differently, part one explores the creation of new educational content while part two focuses on new strategies for accessing that content.
Lists of institutions and persons in education have traditionally been local. The World Wide Web has begun systematic lists of schools and other institutions, such as these key international sites for education:
With respect to persons, a principal knows who are the best teachers and the less qualified teachers in a particular school. A school board superintendent has some sense of exceptional teachers. What is needed is an online Who's Who of teachers, facilitators, consultants, and other persons in education, on a local, national, and international basis. This list will be created using peer reviewers not chosen by the person in question. It will also include letters of recommendation from individuals chosen by the person, as a safeguard for those whose qualities may not be appreciated by those in their immediate vicinity. In the SUMS framework, such lists will come under the question who, while peer reviews are accessed via Learning, Methods, Evaluate, and Review as outlined in Appendix 1.
Each school will enter its own lists. The names of individuals in these lists will be forwarded to a centralized national list. Detailed information concerning the individuals in these lists will remain local. This detailed information will be confidential in principle and accessible only to the persons directly concerned, e.g., a principal, superintendent, or supervisor. Others wishing to have detailed access must acquire permission from the principal and the teacher in question. On the other hand, the basic facts about any teacher--such as his or her curriculum vitae--would become public knowledge; for example, parents wishing to be reassured about the qualifications of their child's teacher could look them up. This approach will apply equally to members of the school board and will thus provide basic community access to the educational structure.
In Canada, since education is a provincial matter, each provincial government will also enter the names of the individuals in their Ministry of Education as a part of the government online process. Connections between the Ministry of Education and school boards will be identified. This entails no threat to the provincial mandate for education but allows federal governments an integrating role for standards between different provinces and other countries, thus laying a framework for coordination with G7 pilot project 3 on education and training.
Traditionally, each province or state designed its own curriculum, which was then interpreted by each local school board and in some cases by each local school. This arose partly of necessity. A curriculum tended to be a list of high-level goals, such as "ability to analyze literary texts" or "transformation of three-dimensional space." Precisely which literary texts or which mathematical problems were suited for these goals was left to local discretion. Traditionally, a Ministry of Education would generate the curriculum goals. Publishers working with designated teachers would then propose possible materials to achieve the goals. The ministry would vet these and some would be accepted as official textbooks, approved for the curriculum.
In the case of the Metro Toronto Separate School Board (MTSSB), these abstract goals of the mathematics curriculum have been translated into corresponding lists of concrete mathematical problems. Such lists should be made public, giving school boards across the country a chance to provide additional, corresponding, concrete mathematical problems. At the MTSSB, one of the mathematics teachers (John MacDonald) has been given a year's research leave to take this approach a step further, using this list of concrete mathematical problems and linking it with actual textbook examples.
This work can serve as a model for other school boards provincially and nationally, with those in other provinces adding links to their local textbooks. This should be made available online nationally. As a result, a teacher or student in Ontario who is planning to move to British Columbia can check on the mathematical problems for their level and examine corresponding textbook examples in the new place. This approach will provide a new framework for establishing equivalents between education in different provinces and countries and help in the development of international standards such as those toward which the International Baccalaureate has been striving.
The above effectively contextualizes three levels of the learning process: curriculum, courses, and specific problems in individual texts. In the long term more is needed; namely, a full contextualization of knowledge, which will in turn link all levels of learning as outlined below:
Such a thoroughgoing contextualization is necessary in a highly mobile, multicultural environment if teachers and school boards wish to have serious instruments for accountability. If this framework is made public, it will provide parents, students, and teachers alike with a basis for explaining the links between exams, courses, curricula, and the corpus of knowledge generally. This framework will provide a basis for assessing differences in education in other provinces and other countries.
A concrete example helps to explain this more clearly. On a given test or exam, a student may receive a mark of 95%. This test may, however, represent only 10% of a given textbook and only 5% of the course to which that text belongs. The same test may represent only 1% of the actual curriculum pertaining to that subject and .001% of all knowledge in the field. Students who receive 95% clearly should be given a sense of achievement: This mark is much better than a 75% or a 55%. Yet at the same time they need to be made aware of the relativity of their success and recognize the enormity of that which still needs to be learned when one shifts focus from a particular test to the subject as a whole.
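The proportions in this example can be worked through in a few lines. The following Python sketch is illustrative only: the function name contextualize and the choice to multiply the mark by each coverage share are assumptions made here, not part of SUMS.

```python
# Share of each larger context that the test covers, taken from the example above.
COVERAGE = {
    "textbook": 0.10,      # the test represents 10% of the textbook
    "course": 0.05,        # 5% of the course
    "curriculum": 0.01,    # 1% of the curriculum
    "field": 0.00001,      # .001% of all knowledge in the field
}


def contextualize(mark, coverage=COVERAGE):
    """Scale a test mark (0..1) by the share of each context the test covers."""
    return {level: mark * share for level, share in coverage.items()}


if __name__ == "__main__":
    # Print the 95% mark restated relative to each larger context.
    for level, fraction in contextualize(0.95).items():
        print(f"{level:<10} {fraction:.5%}")
```

Read this way, a mark of 95% remains a real achievement on the test itself, while the same figure restated against the course or the curriculum shows how much of the subject still lies ahead.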
The system will begin by focusing on traditional lessons and courses in terms of individual learning. It will also encourage newsgroups, listservs, and collaborative learning where appropriate.
Traditionally, each school had its own library and/or resource center, which provided students with some glimpse of materials beyond the boundaries of their everyday textbooks. It is an obvious next step to use the Internet for these purposes. The Web already has a Virtual Library site and the G7 has made its pilot project 4 a Bibliotheca Universalis.
Libraries have a reference section consisting of classification systems, subject catalogues, dictionaries, encyclopedias, bibliographies, and abstracts, i.e., tools for finding other materials in this and other libraries. These correspond to the first five levels of knowledge in the SUMS approach. A priority should be to collect all Internet references to these reference materials and make them available online. Some of these materials, such as the Oxford English Dictionary and the Encyclopaedia Britannica, require subscriptions. These companies should be approached concerning board-, province-, or nationwide site licenses for educational institutions. This will have the enormous advantage of giving students in a remote site access to the same educational resources that might be found in an affluent school in a major urban center.
In keeping with the quest for contextualization outlined in the section above, a next step will be to make a list of all textbooks available online. Those not yet available should be put online. This will again ensure that people in remote locations will have universal access to the same materials as people in urban centers.
A next step will be to correlate the names and titles used in curricula, courses, and textbooks with those in major libraries, galleries, and museums. If Shakespeare is mentioned in an English course, the system will automatically allow a user to find out which copies of Shakespeare are available in local, regional, national, and eventually international libraries. Enterprising students will be able to explore to what extent the person or topic they are studying is a local phenomenon or international in scope. Such activities can begin as initiatives in given schools and then be coordinated by school boards, provincial education ministries, and national organizations such as Schoolnet in Canada.
Given the emerging Z39.50 protocols, aspects of such searches can increasingly be automated. Indeed, some will argue that all of these exercises should be automated entirely and that knowbots acting as electronic butlers should entirely replace the students' and teachers' manual searching. In our view, this is philosophically dangerous in that learning to search is a vitally important active exercise that should not be totally replaced by a passive experience.
Simply having access to the combined resources of libraries throughout the world will constitute one form of new knowledge in that it greatly increases the sample on which claims are based. Someone wishing to study Shakespeare will not only have access to titles in the local library but will be able to trace the spread of editions in different languages around the world. A student interested in a particular mathematical problem will potentially have access to examples from all over the world.
The integration of spreadsheet packages (such as Excel) with databases means that statistical views of data in various kinds of charts will become increasingly easy in the future. Such graphical visualizations of facts will help users to see new patterns.
The rapid development of geographical information systems (GIS), combined with the evolution of computer-aided design (CAD) packages, means that students will have access to dynamic maps so they can trace historical changes in the boundaries of an empire, country, province, county, town, or estate. They can connect these geographical sites with images of churches, historical buildings, and monuments, any of which can be reconstructed at various levels of complexity. These maps can be treated chronologically--one can trace the development of Romanesque or Gothic churches as they spread across Europe. Or one can trace the spread of a given motif or theme.
Companies such as Autodesk have extended the notion of object-oriented programming to the building blocks of the manmade world through what they call "industry foundation" classes. A door is now treated as a dynamic object that contains all the information pertaining to doors in different contexts. Hence, if one chooses a door for a 50-story skyscraper, the door object will automatically acquire certain characteristics that are very different from a door for a cottage or for a factory warehouse. This concept can readily be applied to local cultures, both present and historical, adding an enormous richness to our awareness of doors, such that the variety of the particular becomes an incentive for global variation.
In the past, knowledge was primarily about physical events and objects. When was the Battle of Hastings? Where are the manuscripts of Euclid's Optics? What happens in Act II, Scene III, of Shakespeare's Hamlet? Such questions were also the focus of education. The advent of computers is expanding the definition of what we know and thus the horizons of education. As noted above, we can now trace the development of a style geographically and chronologically, in space and time. We can trace the spread of religious movements such as Christianity and Islam. With statistics, knowledge becomes patterns as well as facts, tracing, for example, not only the publication of a given text, but the history of its publication in different cities and different languages.
In the past, "what if" questions were vigorously opposed by most historians and by scholars in general. This was in no small part because our tools for reconstruction and simulation were so limited that they precluded any serious models for imitating reality. The advent of advanced graphics packages, virtual reality, autostereoscopic displays, and holography is changing all this. Those at the frontiers of the Italian National Research Council are exploring how fully realistic virtual reality models of ancient Pompeii could be used to test different theories concerning the economics of the time. Simulating storms is becoming a serious preoccupation for scientists. Knowledge is not only about what is and was; knowledge extends to what could be and could have been. Knowing how to define a serious or a useful simulation will increasingly become one of the challenges of education.
As a first step toward understanding these changes, countries have begun making inventories and lists of available educational materials. In Canada, for example, Schoolnet is providing a national framework for access to educational sites. Other countries are developing their own versions of Schoolnet. For the G7 countries, some of the key sites are listed below:
In some countries, such as France, this process is proceeding in a centralized manner at a national level. In other countries, such as Italy, there is a regional approach, with some regions much more advanced than others. Each region tends to have its own interface. Ideally, a standard interface such as that provided by SUMS might be translated into the major languages and serve as a common interface for all these countries. More is needed than a common interface. We need new approaches to classing, ordering, searching, and learning.
Most of the early search engines saw no need for classing objects. The assumption was that one merely typed in a random subject and found whatever happened to exist on the Net concerning that subject. Once one found something of interest one could then make a bookmark. This was a sensible strategy, except that once one had a few hundred bookmarks, one tended to get lost.
Another solution is to shift attention from the results to the criteria for finding results, i.e., to create a list of terms in which one is particularly interested such that one has at the outset a set of cubbyholes to arrange the titles that one finds. These lists can be arranged both alphabetically (as a flat file) and hierarchically (as a tree file), which amounts to creating a personal classification system.
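A personal classification of this kind can be sketched in a few lines. The Python fragment below is a minimal illustration, assuming that each term is written as a path from broader to narrower; the sample terms and the function names flat_file, tree_file, and render are hypothetical and are not taken from SUMS.

```python
# Hypothetical personal terms, each written as a path from broader to narrower.
TERMS = [
    "Mathematics/Geometry/Perspective",
    "Mathematics/Algebra",
    "Art/Painting",
    "Art/Architecture/Gothic",
]


def flat_file(terms):
    """Alphabetical list of every individual term (the flat file)."""
    return sorted({part for path in terms for part in path.split("/")})


def tree_file(terms):
    """Nested dictionary of broader -> narrower terms (the tree file)."""
    tree = {}
    for path in terms:
        node = tree
        for part in path.split("/"):
            # Reuse the branch if it already exists, otherwise create it.
            node = node.setdefault(part, {})
    return tree


def render(tree, depth=0):
    """Return the tree as indented lines, siblings in alphabetical order."""
    lines = []
    for term in sorted(tree):
        lines.append("  " * depth + term)
        lines.extend(render(tree[term], depth + 1))
    return lines


if __name__ == "__main__":
    print(", ".join(flat_file(TERMS)))
    print("\n".join(render(tree_file(TERMS))))
```

The same set of terms thus serves both views: the flat file for quick alphabetical lookup and the tree file for seeing where a term sits among its broader and narrower neighbours.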
In more complex cases, another solution is to use the fields in one's database as the basic set of terms with which to search the Internet. These then become a set of hooks for catching titles. A fourth alternative is to consult a subject list produced by librarians as a means of acquiring a set of terms one wishes to use in searches.
Librarians, who are experts in dealing with large amounts of information, have long recognized the need for classing in the sense of creating classification systems. At a very practical level, classification schemes help determine where books appear on shelves. By consulting the classification number of a given book, we immediately have a bibliography of other books on that subject in this library. This classification number can then be applied to a series of libraries, either randomly or networked in a system such as the Research Libraries Group (RLG) or the Online Computer Library Center (OCLC).
By consulting the terms clustered around this particular classification number, we have a bibliography of related books in the field. In a traditional library, this is called "browsing," that subtle art whereby we often discover that the book that truly interests us is lying on the shelf just a few books from the title with which we originally began. The advent of the Z39.50 protocol and networked library catalogs on the Internet means that we can now do electronic browsing: choose a topic with a given classification number and then study the titles pertaining to it. This is possible today in cases such as the National Library Catalogue of Norway, which includes classification numbers as one of its search parameters. Libraries such as Göttingen have put their old and new systematic classifications online but have not yet linked these to their corresponding titles in the online version.
Electronic browsing allows other possibilities. A particular subject has its given number in one classification system. It will have different numbers in other classification schemes. Each classification scheme is like a method for creating mental cubbyholes. These differ from culture to culture, so multiple classification schemes are ways into the mental cubbyholes of different cultures and, in the case of a particular subject, they reveal the different contexts or constellations within which this book can be placed. Thus, one subject leads to a series of related subjects.
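The way one subject leads to a series of related subjects can be illustrated with a small lookup. The Python sketch below is an assumption-laden illustration: the scheme names, the class marks, and the neighbouring subjects are all invented placeholders, not real notations from any library classification.

```python
# Placeholder data: scheme names and class marks are invented for illustration.
CROSSWALK = {
    "perspective": {
        "scheme_a": ["A-100", "A-245"],
        "scheme_b": ["B.7"],
    },
}

# Subjects shelved beside each class mark in each (hypothetical) scheme.
NEIGHBOURS = {
    ("scheme_a", "A-100"): ["drawing", "descriptive geometry"],
    ("scheme_a", "A-245"): ["architectural rendering"],
    ("scheme_b", "B.7"): ["optics", "descriptive geometry"],
}


def related_subjects(subject):
    """Map each neighbouring subject to the set of schemes that place it nearby."""
    found = {}
    for scheme, marks in CROSSWALK.get(subject, {}).items():
        for mark in marks:
            for other in NEIGHBOURS.get((scheme, mark), []):
                found.setdefault(other, set()).add(scheme)
    return found


if __name__ == "__main__":
    for other, schemes in sorted(related_subjects("perspective").items()):
        print(f"{other}: {', '.join(sorted(schemes))}")
```

A subject that turns up beside the starting term in several schemes is a likely candidate for closer study, while one that appears in only a single scheme hints at a context particular to that scheme.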
Taken together, these six approaches represent a spectrum of strategies from a completely random search to one using multiple classification systems:
This spectrum is best seen as a set of complementary rather than competing strategies. In some cases, a random search may be sufficient. In other cases, one may wish progressively to move from a personal classification to a combination of multiple formal classification systems. For this to be effective, it is important to have a set of authority files, so that the terms used in a personal system are effectively a subset of those used in major library classification systems such as Library of Congress and Dewey.
It is also essential that the items for which one is searching have been carefully classed ahead of time. In the case of libraries this is almost always true. In the case of the Internet, the opposite is true: almost nothing has yet been classed.
For material to be searched it must be classed or tagged. This applies particularly to the addresses of materials on the Internet. A series of steps is required. Step one, which was the focus of a recent Schoolnet contract, identified some of the basic addresses using an object-oriented approach to classing. Step two will class all the main addresses for the Canadian context, using a template-based software, and will put these titles online. Step three will license the software to schools across the country such that students and teachers can further class materials they find within these general headings. Many of these references, once they have been checked by the teachers, can be added by students to the national database. In this way the educational system will become a cooperative process in building up its own cumulative list of references. This approach is extendible to all levels of education and training. Thus, students training to become radio engineers will have as their assignment to find, class, and add materials to the national database on radio engineering.
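The cooperative step, in which a student's reference joins the cumulative list only after a teacher has checked it, might be modelled as follows. This is a minimal Python sketch under stated assumptions: the class name ReferenceList, its methods, and the example address are hypothetical and do not describe the actual template-based software.

```python
class ReferenceList:
    """Cumulative list in which an entry becomes visible only after a teacher's check."""

    def __init__(self):
        self.pending = []    # submitted by students, not yet checked
        self.approved = []   # checked by a teacher and added to the shared list

    def submit(self, student, url, heading):
        """A student classes a reference under one of the general headings."""
        entry = {"student": student, "url": url, "heading": heading}
        self.pending.append(entry)
        return entry

    def approve(self, entry, teacher):
        """A teacher checks a pending entry and moves it to the shared list."""
        self.pending.remove(entry)
        self.approved.append({**entry, "checked_by": teacher})

    def under(self, heading):
        """Return the approved addresses classed under a heading."""
        return [e["url"] for e in self.approved if e["heading"] == heading]


if __name__ == "__main__":
    refs = ReferenceList()
    entry = refs.submit("student-1", "http://example.org/antennas", "radio engineering")
    refs.approve(entry, "teacher-1")
    print(refs.under("radio engineering"))
```

Keeping pending and approved entries apart preserves the teacher's check described above while still letting every approved reference accumulate for later students.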
At the higher levels of education, students have always done something like this every time they created footnotes and bibliographies for their essays and term papers, the fundamental difference being that these efforts were inevitably noncumulative. Now, every time a student finds a new reference he or she can add it to a cumulative list. This principle applies not only to Internet addresses but also to content. As the contents of major libraries become available online, the importance of this approach will become ever more fundamental. Building the classing process into the process of learning will allow students to contribute to the organization of new knowledge.
The first two classification systems will be organized in schools. Such classing retains certain ambiguities. For instance, in a system such as the Library of Congress, perspective occurs under architecture, art, mathematics, and technology. In a simplified classification it is likely to occur only once, giving users titles concerning all of these meanings, although they may well be interested in only one of them. If the terms developed in classing methods 1 and 2 are linked with the categories in classing methods 3-5, produced by libraries and major institutions, much greater precision in search strategies can be attained. The challenge lies in linking these systematically. This should be done in conjunction with a research center such as FIS.
Once the basic materials concerning the school system have been entered, the scope of the enterprise can be expanded to include a complete corpus of knowledge that can be accessed by searches and metasearches. Existing search engines typically assume that a simple alphabetical search of all materials is sufficient. This results in too many hits, most items of which are of no interest to the user. Some of the most popular lists of search engines are provided below (an analysis of their relative merits and the specific role of the SUMS engine has been made by Andrew McCutcheon in Appendix 2):
The user may be an elementary school teacher, a high school OAC level teacher, a university professor, or a postdoctoral research scholar. All four may ask for mathematics, but they want something very different. For such a search engine to function, it is important to identify the level of education of the site in question. In this way, when the users identify the level of education that interests them, they will receive only the titles and materials pertinent for that level. The search engine must be distributed; it must use multiple protocols. It is not a simple hierarchy but uses a multiple approach via meters. It assumes nonlinear, multiple cataloging in order to provide an interface standard that is public. It must be HTML-compliant and work toward SGML, use the Z39.50 protocol, and rely on SQL for its query interfaces. These protocols will change as software and hardware evolve.
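Filtering by level of education through an SQL query interface can be sketched as follows. The example uses Python's standard sqlite3 module purely as a stand-in for whatever database a real engine would use; the table name, column names, level labels, and sample titles are all assumptions made for illustration.

```python
import sqlite3


def build_catalogue():
    """Create an in-memory catalogue in which every site carries a level tag."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sites (title TEXT, subject TEXT, level TEXT)")
    conn.executemany(
        "INSERT INTO sites VALUES (?, ?, ?)",
        [
            ("Counting games", "mathematics", "elementary"),
            ("Calculus problem sets", "mathematics", "secondary"),
            ("Graduate topology notes", "mathematics", "university"),
            ("Poetry for beginners", "english", "elementary"),
        ],
    )
    return conn


def search(conn, subject, level):
    """Return only the titles classed at the requested level of education."""
    rows = conn.execute(
        "SELECT title FROM sites WHERE subject = ? AND level = ? ORDER BY title",
        (subject, level),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    catalogue = build_catalogue()
    print(search(catalogue, "mathematics", "elementary"))
    print(search(catalogue, "mathematics", "university"))
```

The same request for mathematics thus yields different titles for the elementary teacher and the university professor, provided that each site has been classed by level beforehand.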
Basic searching will entail only personal classification systems listed in figure 4. Metasearching will entail standard subject lists and classification systems in isolation and in combination. Metasearching will use these established systems to go from broader terms to narrower terms or the reverse. It will use the broader terms of universal systems such as Dewey (cf. http://www.oclc.org/oclc/fp/mrdui/mrdui.htm) or Library of Congress (cf. http://www.w3.org/vl/LibraryOfCongress.html) or find the corresponding term in specialized bibliographies relating to a given field, e.g., the Mathematics Subject Classification of the American Mathematical Society. Such an approach will permit advanced browsing and triangulation of concepts. Metasearching will also entail specific strategies for specialized fields. For instance, botany will involve classifications in the tradition of Linnaeus with different phyla, types of genus, species, etc., whereas chemistry will have periodic tables as entry points for searches.
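Moving from a term to its broader or narrower terms amounts to walking a hierarchy. The Python sketch below assumes a tiny, invented fragment of a subject hierarchy; the terms and the function names broader_terms and narrower_terms are hypothetical and are not drawn from Dewey, Library of Congress, or any other published scheme.

```python
# Hypothetical fragment of a subject hierarchy: term -> its broader term.
BROADER = {
    "geometry": "mathematics",
    "algebra": "mathematics",
    "projective geometry": "geometry",
    "perspective": "projective geometry",
}


def broader_terms(term):
    """Walk upward from a term to the top of the hierarchy."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def narrower_terms(term):
    """List the immediate narrower terms of a term, alphabetically."""
    return sorted(child for child, parent in BROADER.items() if parent == term)


if __name__ == "__main__":
    print(broader_terms("perspective"))
    print(narrower_terms("mathematics"))
```

A search that returns too few titles can be widened by retrying with each broader term in turn, and one that returns too many can be narrowed by offering the narrower terms as choices.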
Metadata is a rapidly evolving field. A recent conference on the subject offers an important survey of some of the major players. Their related site offers a useful list of other projects in the United States. Some of the key players are listed below. Such efforts are part of a much larger move toward international standards, which is beyond the scope of this paper but which includes a whole spectrum of organizations, from the International Organization for Standardization and national standards bodies to consortia, major companies, and purely proprietary interests.
In addition to classing and searching for materials, a tool is needed for ordering what one has found. The same lists of choices used in SUMS for searching for new materials can be used to organize the materials one has found. In the ordering mode, the choices lead to lists of what one has found on a given topic. Such an approach marks a quantum advance over the present use of bookmarks on the Internet. Bookmarks simply list everything one has found helter-skelter in a list, which soon becomes confusing if one is collecting a large number of references. The SUMS approach links these with basic topics and then with one's personal classification scheme so that one can readily find them again.
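The difference between a helter-skelter list of bookmarks and an ordered one can be shown briefly. The Python sketch below is an illustration under simple assumptions: a bookmark is filed under the first personal term that appears in its title, and the function name order_bookmarks, the sample titles, and the example.org addresses are all hypothetical.

```python
from collections import defaultdict


def order_bookmarks(bookmarks, classification):
    """Group (title, url) pairs under the first personal term found in the title."""
    ordered = defaultdict(list)
    for title, url in bookmarks:
        # Case-insensitive match against the personal terms, in their given order.
        topic = next(
            (term for term in classification if term.lower() in title.lower()),
            "unclassed",
        )
        ordered[topic].append((title, url))
    return dict(ordered)


if __name__ == "__main__":
    bookmarks = [
        ("Gothic churches of France", "http://example.org/gothic"),
        ("Euclid's Optics and perspective", "http://example.org/optics"),
        ("Weather today", "http://example.org/weather"),
    ]
    for topic, items in order_bookmarks(bookmarks, ["Gothic", "Perspective"]).items():
        print(topic, [title for title, _ in items])
```

Matching on titles is the crudest possible rule; the point is only that the terms already chosen for searching can double as the headings under which the results are kept.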
In the SUMS approach, each of the categories used for classing, searching, and ordering knowledge can contribute to learning. Students define their level of education and choose a topic that interests them, and the system indicates what courses are available locally, regionally, nationally, and, eventually, internationally.
This approach may seem much more plodding than the present trend to give persons access to everything at once on the Internet, but it brings the enormous advantage of a framework for providing students and teachers with the materials appropriate to their level of education, while leaving entirely open to them the possibility of going to the next level if they are convinced that they have achieved everything at their present level. Because each level will be linked with specific tests, reviews, and evaluations, students will have opportunities to establish objectively what is involved at a given level of education and will not be able to blame their performance or lack thereof on the standards of a given teacher. To achieve this will require help in coordinating the four requirements outlined above.
Education and training are crucial for all nations. A systematic, multifunctional software for classing, searching, ordering, and learning will offer a common interface to disparately organized materials and offer new means of establishing common standards both nationally and internationally. This has major implications for the dissemination of educational materials across large geographical expanses, in individual countries such as Canada and around the world. It will offer new points of entry into global information structures. SUMS offers a model for such an interface.
The System for Universal Media Searching (SUMS, Copyright 1992-1996) approaches knowledge in terms of the questions who, what, where, when, how, why in combination with ten basic choices:
These basic choices break down into hundreds of lists, giving thousands of choices in combination with the questions. One of the rationales for these handy lists of choices was that they can be ported to a remote device. It is instructive to note that such an idea is being adopted by Schoolnet in its Media Awareness Project, where there is a virtual site controller (http://www.screen.com/mnet/eng/). It is equally instructive to note that the basic questions in SUMS have their equivalents in the quickfind section of that same project--e.g., What (Topic, Keywords), Where (Country), How (Medium)--as well as in various meters such as Level of Education (Target, cf. http://184.108.40.206/qf_quickfind.db). This appendix provides a breakdown of only one of the 10 basic choices: learning.
Learning is closely connected with formal education but is a more fundamental concept because it includes informal learning as well. It includes five basic elements:
A student may write an exam and obtain a mark of 95%. Yet this may represent only 40% of the text, 10% of the course, 2% of the curriculum, and .002% of the corpus of knowledge in that field. One of the challenges of learning is to contextualize the achievements of students (and teachers), allowing them to understand these links in such a way that they are not discouraged. In concrete terms, this requires a systematic linking of facts and modules in each level of the system as listed below:
Goals list all the benchmarks, curriculum documents, and equivalents in other provinces and other countries:
Curriculum entails the standard items one wishes to achieve in a given course. It includes information about
Curriculum outcomes can in turn be divided into two categories:
Learning methods include all the formal methods of learning:
Different kinds of courses are identified:
At the university level, a course will have a series of choices:
Courses are but one way of learning. A second method is collaborative learning, which leads to a series of further options:
Very much needed are methods that will help us assess which kind of learning is more effective using collaboration and which kind of learning is better pursued on an individual basis.
Simulation and training are other methods of learning applicable in some cases. When students have learned, they are tested in various ways:
As part of the greater move toward accountability, these learning methods include four stages for checking progress: evaluation, review, tracking, and reports. Evaluation breaks down into at least three choices:
Of these, the first breaks down into at least another five categories:
Access to this information will be limited to the appropriate level in each case. The same approach applies to reviews:
It also applies to reports:
These interim and final reports will have as their audience the whole spectrum of the system:
These reports will be in the form of grades:
At the level of theory, a great deal of effort has been dedicated to identifying kinds of learning. Four basic kinds are generally agreed upon:
Precisely what these mean and how they subdivide is a matter of debate. For example, the Ontario guidelines divide cognitive in one way, while theorists such as Bouchard do so differently:
SUMS is not concerned with taking sides or pretending to know which of these alternatives is best. If the province of Ontario uses one alternative and wishes to use that exclusively, the SUMS framework allows this. In time, an international version of SUMS will present users with a list of various versions such that scholars, educators, and students can explore the ways in which they relate and the extent to which they are equivalent. This same principle applies to the other three kinds of learning:
In the future, these features of kinds of learning will be related directly back to the curriculum, courses, texts, exams, etc., such that there is a greater contextualization of knowledge. It will then be possible for a teacher or student to start from some learning skill (such as focus or perceptual performance), determine what things in the curriculum and the course are directed to those goals, and see precisely which courses, texts, and tests exist for those skills. Alternatively, a student or teacher could begin with some item in their course of study and trace back to what skills this item or set of exercises is meant to develop, i.e., the reasons why it is being learned.
At present, most schools give students some basic psychological tests, the results of which are usually only consulted if the student becomes a so-called problem child, in which case the school psychologist uses the results in trying to help the child. There are various kinds of learner:
The basic tests break down into further categories:
In addition to the basic tests, there are others, such as those of Jung, which are the basis of the Myers-Briggs tests:
These two sets of four combine to produce 16 categories (not listed here). Other systems include the seven kinds of intelligence identified by Sternberg:
These too will become part of the system. In the future there will be links between these categories and the different goals and contents of curricula such that these psychological types serve as filters of access to information. These filters can function in different ways. People with, say, a geographic intelligence will be presented with geographical materials as best suited to them. They will also be given a particular approach to materials entailing other kinds of intelligence. The precise details of this content are not the concern of SUMS, which focuses on a systematic framework for gaining access to the results.
Much of scholarship is about making links. Each new medium allows some of those links to be made more easily. At the same time, each new medium poses the challenge of creating many new links that were previously impossible. The advent of computers means there will be whole generations working to create systematic connections. SUMS offers a comprehensive framework for dealing with the results.
Some will object that all this is much too complex for the everyday needs of schools and that a much simpler approach would do fine. Here it bears remembering that there is a basic and intermediate level in addition to the complexities of advanced navigation just outlined. A simpler approach is given in the examples below.
At a basic level, a person begins from the list of SCOPE WHY and chooses Education, Learning. He or she would then be led through the basic questions of why, how, when, where, what, and who.
If the user chooses 5, the list looks like this:
The user is then asked to define how:
If he or she chooses 3, the following choices appear:
The user is now asked to define what:
Sometimes the user wants to ask something directly. Having stated that the scope of the search is education, the user presses When Events, at which time the system offers a list of education-related events:
Having chosen one of these alternatives, the user is given the appropriate information. Alternatively, he or she may be concerned with finding information about individual persons. The user presses Who and is given an appropriate list of categories:
The first of these leads to a series of new choices:
The second of these in the previous list leads to a series of new choices:
The third of these leads to a series of new choices:
The fourth of these in the previous list leads to:
Some aspects of learning also come under the scope of other basic choices. For instance, if one is interested in publications, one goes to LEVELS Titles Books, which category then adjusts to one's educational scope and the level at which one is studying. An advanced list might, for instance, look like this:
If one chooses catalogs, the results will adjust for the area in question. For example, if one is in Toronto and chooses regional catalogs, the list will be something like the following:
Other materials pertinent to education and learning are to be found under MEDIA Internet WWW. In a national context in Canada, that list might include the following:
Under the heading MEDIA Television, one will find other choices:
This approach presents users with the relevant information based on their needs as defined by the series of questions--why, how, etc.--rather than simply trying to find all information on a given topic.
SUMS began as a project at the University of Toronto, developing conceptual navigation techniques. The most recent product has been the SUMS World Wide Web search engine. Traditional search engines are faced with a variety of problems. By combining the best elements of catalogs and indexing systems with the concepts of object-oriented databases, SUMS hopes to eliminate these problems. The goal of SUMS is to create a search engine that uses a human approach to find information.
A catalog consists of hierarchical categories into which documents are classified. Catalog systems like Yahoo!, Lycos A2Z, and The Whole Internet Catalog narrow searches very quickly, and while most of the documents found are on topic, such catalogs are not without problems. Yahoo! employs 20 people to search for new documents and categorize them. These employees are not experts on the information they catalog, and documents are sometimes placed in the wrong category. As the Internet continues to evolve, the variety of information available will widen, so the categories in the catalog will have to be changed and documents will have to be shuffled around. To add to the amount of maintenance a catalog requires, Web content is increasing almost exponentially. Eventually, 20 employees will be unable to keep up with the rate of content creation and more employees will have to be hired. When more employees are hired, the catalog becomes less consistent, as each employee has a slightly different idea of where a document should be placed. In addition to dealing with new documents, catalogs must also deal with changed documents. Documents are constantly moving, and their contents are changing. Unless the catalog is notified, these changes can go unnoticed. While a catalog is a good interim solution, it will fall short of encompassing the entire Web as the amount of content grows.
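To make the idea of narrowing a search through hierarchical categories concrete, here is a small sketch. It does not describe how Yahoo! or any other catalog is actually built; the category names, the example.org addresses, and the function names are all hypothetical.

```python
# A toy catalog: nested categories whose leaves are lists of document
# addresses. Categories and addresses are invented for illustration.
catalog = {
    "Education": {
        "Schools": ["http://example.org/school-list"],
        "Teaching": {
            "Lesson Plans": ["http://example.org/plans"],
        },
    },
    "Science": {
        "Biology": ["http://example.org/fish"],
    },
}

def documents_under(node):
    """Collect every document filed at or below a catalog node."""
    if isinstance(node, list):
        return list(node)
    found = []
    for child in node.values():
        found.extend(documents_under(child))
    return found

def narrow(catalog, path):
    """Walk down the named categories, then list what remains."""
    node = catalog
    for category in path:
        node = node[category]
    return documents_under(node)

print(narrow(catalog, ["Education"]))
print(narrow(catalog, ["Education", "Teaching"]))
```

Each step down the path discards everything filed elsewhere, which is why a catalog narrows so quickly. It is also why a document filed in the wrong category becomes effectively invisible.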
Indexing systems like Inktomi, HotBot, Lycos, The OpenText Index, Webcrawler, Infoseek, and WWW Worm store the addresses of World Wide Web documents and a list of keywords that appear in those documents. The keywords are retrieved by programs that roam the Web, jumping from site to site. When a keyword is searched for, a list of documents that contain that keyword is returned. Because no human work is required in the actual storing of the documents, an index doesn't have to deal with many of the problems faced by a catalog. Growth is not a problem, because an index only needs to purchase faster computers with more storage space to keep up. An index is also bias free, because an index works purely with facts--either a document contains a keyword or it does not. Despite these advantages, indexing systems tend to return many off-topic documents, because they blindly search for keywords. A keyword taken out of context could have several meanings in different documents. An indexing system tends to get out of date very quickly, referencing documents that no longer exist or no longer contain the same information that they did when the document was added to the index. This is especially true when World Wide Web content grows and bandwidth does not grow with it, because the programs that build the index take more time to complete the job. In an attempt to make an index context sensitive, a technique called word proximity is used, in which keywords are stored in the order that they appear in the document. The index is then storing almost exact copies of other people's documents, which could lead to copyright problems.
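The behavior of a keyword index, and of the word proximity technique, can be illustrated with a short sketch. This is not the implementation of any of the engines named above; the sample documents, the example.org addresses, and the function names are assumptions made for the example.

```python
from collections import defaultdict

def build_index(documents):
    """Map each keyword to {address: [word positions]}."""
    index = defaultdict(dict)
    for address, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(address, []).append(position)
    return index

def search(index, keyword):
    """Return the addresses of all documents containing the keyword."""
    return sorted(index.get(keyword.lower(), {}))

def search_near(index, first, second, max_gap=1):
    """Addresses where `second` follows `first` within max_gap words."""
    first_hits = index.get(first.lower(), {})
    second_hits = index.get(second.lower(), {})
    matches = []
    for address in first_hits.keys() & second_hits.keys():
        if any(0 < q - p <= max_gap
               for p in first_hits[address]
               for q in second_hits[address]):
            matches.append(address)
    return sorted(matches)

# Hypothetical documents, invented for illustration.
documents = {
    "http://example.org/pets": "the dog chased the fish",
    "http://example.org/food": "hot dog stand near the fish market",
    "http://example.org/garden": "a tree and a bush",
}
index = build_index(documents)

print(search(index, "dog"))              # both the pet and the food page
print(search_near(index, "hot", "dog"))  # only the food page
```

A plain keyword search for "dog" returns the food page alongside the pet page, which is the off-topic problem described above. Storing word positions lets a proximity search tell the two apart, but once every position is stored, the index holds enough to reconstruct the word order of each document.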
Hybrid systems such as Excite and AltaVista that combine the best elements of a catalog and an index would theoretically solve the problems of both. Documents are stored in an index, with similar documents being stored together. The goal is to create a kind of classification system that modifies itself as the type of content on the Web changes. A group of keywords entered in a search is supposed to represent a concept, and groups of documents that match that concept are returned. AltaVista makes assertions about documents based on the images used in them, their address, and links they contain, and then catalogs them appropriately. In practice, these systems fall short of their lofty goals. The power of a catalog is to allow people to narrow a search by walking through a tree of named categories. The idea of building a list of documents grouped by concept is less impressive than it seems because the concepts are not visible. The search engine is guessing at the concept the user is trying to search for. The users cannot specify exactly what they are looking for in the same way they could if they were using a catalog. Such a system still faces the problem of out-of-date documents, because it also uses programs to compile its index. Irrelevant information is less of a problem in a hybrid system than in an index, but it remains a challenge.
SUMS combines the best elements of a catalog, an index, and an object-oriented database. Instead of entering keywords, a user performs object searches. An object is a group of data that represents a search concept more accurately than a keyword. In addition to containing data, objects are created and stored in parent/child relationships. An easy way to visualize this is to use a hierarchical list:
Life
    Plant
        Bush
        Tree
    Animal
        Fish
        Dog
Each item in the hierarchical list is a child of the nearest less-indented item above it. "Fish" and "Dog" are children of "Animal," which is in turn a child of "Life." "Bush" and "Tree" are children of "Plant," which is also a child of "Life."
Recall that objects are groups of data. The data associated with an object are called its properties. Children inherit the properties of their parents. Here are some sample properties for the objects defined in the above example.
| Object | Properties |
|---|---|
| Plant | Name, Climate Preference |
| Bush | Name, Climate Preference |
| Tree | Name, Climate Preference, Type of Tree |
| Animal | Name, Food Preference |
| Fish | Name, Food Preference, Spawning Time |
| Dog | Name, Food Preference, Breed |
Objects become increasingly specific as they reach a deeper level in the list. This feature of the object list allows a user to make a search very specific very quickly. Note that an object's properties do not necessarily become more specific. The "Bush" object has exactly the same properties as its parent object, "Plant." It is in the list to be used as a category to differentiate between types of plants and to narrow the scope of searches.
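The parent/child relationships and inherited properties described above can be sketched in a few lines. This is an illustration only, not the SUMS implementation: the class name `SearchObject` and its methods are hypothetical, and the sketch assumes that "Name" is first defined on "Plant" and "Animal," since the sample properties do not say what "Life" itself carries.

```python
class SearchObject:
    """A node in a hypothetical object list with inherited properties."""

    def __init__(self, name, own_properties=(), parent=None):
        self.name = name
        self.own_properties = list(own_properties)
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def properties(self):
        """Inherited properties first, then those added at this level."""
        inherited = self.parent.properties() if self.parent else []
        added = [p for p in self.own_properties if p not in inherited]
        return inherited + added

    def path(self):
        """Names from the root of the list down to this object."""
        prefix = self.parent.path() if self.parent else []
        return prefix + [self.name]

# Assumption: "Life" carries no properties of its own in this sketch.
life = SearchObject("Life")
plant = SearchObject("Plant", ["Name", "Climate Preference"], life)
bush = SearchObject("Bush", [], plant)
tree = SearchObject("Tree", ["Type of Tree"], plant)
animal = SearchObject("Animal", ["Name", "Food Preference"], life)
fish = SearchObject("Fish", ["Spawning Time"], animal)
dog = SearchObject("Dog", ["Breed"], animal)

print(tree.properties())
print(bush.properties() == plant.properties())
print(dog.path())
```

"Bush" declares nothing of its own, so its property list equals that of "Plant". It still occupies its own place in the list, which is what lets it narrow a search.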
The SUMS object list is a hierarchical list of objects. To search for information using the SUMS search engine, these steps are followed:
The object list uses the broader/narrower terms property of a catalog through the parent/child relationship system, but the object list has several advantages over a standard catalog:
Because content providers catalog their own information, and aliases allow multiple classification systems to be used, SUMS is almost as objective as an index while being more functional:
By working hand in hand with content providers instead of assimilating their data, SUMS effectively solves the problems faced by indexing systems and catalogs.
In the last month, several features have been added:
SUMS plans to take the following steps in the next eight months:
Immediately, SUMS could be used to index and catalog the School Net Web site. This could replace the existing search function. After that, SUMS could be distributed to and used by School Net's partners, allowing the search function to extend beyond the School Net site to its partners' sites.