Leslie L. Daigle, Bunyip Information Systems, Inc.
Sima Newell, Department of Electrical Engineering, McGill University, Montreal, Canada
This paper describes work done to encapsulate mechanisms for locating thematically related Internet resources. Some relationships between information resources can be divined by their location (e.g., on a particular company's site) or object type (e.g., by file extensions); capturing semantic or content relationships between information resources relies heavily on encapsulation of human experience. Uniform resource agents (URAs) are introduced as a mechanism for encapsulating that experience, and mechanisms for locating these locators are discussed.
This paper describes more fully the concepts of building intelligent information systems in the Internet and preliminary work done on effectively describing agent capabilities.
The URA work was originally conceived as an extension to the family of uniform resource identifiers (URIs), the proposed uniform resource names (URNs), and uniform resource characteristics (URCs). The approach of formalizing the characteristics of an information task in a standardized object structure is seen as a means of identifying a class of resources and contributes to the level of abstraction with which users can refer to Internet resources. The term "agent" was chosen to reflect the degree of task delegation involved in performing this level of information location.
The evolution of Internet information systems has been characterized by building upon successive layers of encapsulated technologies. Machine address numbers were devised and then encapsulated in advertised machine names, which has allowed the evolution of the Domain Name System (DNS) [RFC1034, RFC1035]. Protocols were developed for accessing Internet resources of various descriptions, and then uniform mechanisms for specifying resource locations, standardized across protocol types, were developed (URLs) [RFC1738]. Each layer of Internet information primitives has served as a building block for the next level of abstraction and sophistication of information access, location, discovery and management.
The URA work described here is an experimental system designed to take another step in encapsulation. While TCP/IP protocols for routing, addressing, etc., have permitted the connection and accessibility of a plethora of information services on the Internet, these must yet be considered a diverse collection of heterogeneous resources. The World Wide Web effort is the most successful to date in attempting to knit these resources into a cohesive whole. However, the activity best supported by this structure is (human) browsing of these resources as documents. The uniform resource agent (URA) initiative explores the possibility of specifying an activity with the same kind of precision accorded to resource naming and identification.
A fully instantiated URA carries out a task delegated by an invoker (human or otherwise). The nature of the task is determined by the agent that the invoker instantiates; that originating object encapsulates some knowledge of the existence of relevant Internet resources and the information required to access them. In this way, URAs can be used to allow invokers to carry out high-level Internet resource activities while insulating them from the details of Internet protocols, etc. Also, by formally specifying a high-level Internet activity in an agent, the activity can be repeated by the same or another invoker at a later date, the activity can be incorporated into a still higher-level activity, the agent object can be modified to carry out a related task, etc. The URA is a procedural representation of some information resource knowledge.
As a simple example, consider the client task of subscribing to a mailing list. There are many mechanisms for providing the user information necessary to complete a subscription. Currently, all applications that provide the ability to subscribe to mailing lists must contain net-aware code to carry out the task once the requisite personal data have been solicited from the user. Furthermore, any application program that embeds the ability to subscribe in its code necessarily limits the set of mailing lists to which a client can subscribe (i.e., to those types foreseen by the software's creators). If, instead, there is an agent to which this task can be delegated, all applications can make use of the agent, and that agent becomes responsible for carrying out the necessary interactions to complete the subscription. Furthermore, that agent may be a client to other agents which can supply particular information about how to subscribe to new types of mail servers, etc.
More detail describing the underlying philosophy of this particular approach can be found in [IIAW95].
A number of Internet-aware agent and transportable code systems have been released recently--Java [JAVA], TCL [TCL] and Safe-TCL, Telescript [TELE], and the TACOMA system [TACOMA], to name a few of them. Some of these systems, like Java, focus on providing mechanisms for creating and distributing (inter)active documents in the World Wide Web. Others, like TACOMA, have more general intentions of providing environments for mobile, interacting processes.
While each of these systems makes its individual contribution to solving the transportation, communication, and security issues normally associated with agent systems, they yield more objects that exist within the Internet information space. That is, while they may permit individual users to have a more sophisticated interaction with a particular information resource, they do not address the more general Internet problems of naming, identifying, locating resources, and locating the same or similar resources again at a later date. It is this set of problems that URAs specifically set out to address.
Nevertheless, URAs are an important contribution to the area of agents and the Internet. A reasonably complex URA might perform some post-processing on the results returned from the information resources--rather than simply returning hits from a search (formatted one way or another), the URA could use the information to reason about things (e.g., "There are no new MIDI files since you last checked").
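The kind of post-processing mentioned above can be illustrated with a minimal sketch. It assumes that a URA keeps a small store of previously seen results between invocations; the function name report_new_hits, the "seen" key, and the file names are hypothetical and are not taken from the URA prototype.

```python
from typing import Dict, List


def report_new_hits(current_hits: List[str], experience: Dict[str, List[str]]) -> str:
    """Compare this run's hits with those remembered from earlier runs."""
    seen = set(experience.get("seen", []))
    fresh = [hit for hit in current_hits if hit not in seen]
    # Remember everything seen so far for the next invocation.
    experience["seen"] = sorted(seen | set(current_hits))
    if not fresh:
        return "There are no new MIDI files since you last checked."
    return "New MIDI files: " + ", ".join(fresh)


# Hypothetical search results from three successive invocations.
experience: Dict[str, List[str]] = {}
first = report_new_hits(["a.mid", "b.mid"], experience)
second = report_new_hits(["a.mid", "b.mid"], experience)
third = report_new_hits(["a.mid", "c.mid"], experience)
```

Only the second call reports that nothing is new; the third reports the single unseen file.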
In principle (not yet in the prototypical implementations we've developed), URAs can call on other URAs to carry out their specialized tasks. This yields a structure of agents working together to carry out a larger conceptual task than any of the agents handle themselves. The URAs are building blocks, if you will. This is not unlike the approach described in Marvin Minsky's "Society of Mind" [MINS85].
The URA technology itself provides a framework within which agents can be built; the degree of "intelligence" associated with each agent is at the discretion of its creator. URAs are as standalone as the next agent--any agent requires some kind of environment in which to work, and it happens that the only existing environment for URAs is built into Silk. By design, however, URAs are transportable, modifiable, and self-contained.
It is important to note that the URA work does not define a language in which to code agent activities. The specifications, as they have appeared in Internet-Drafts, lay out the proposed virtual structure of such an object, indicating that URAs could in fact be written in a variety of different programming languages (anything from Tcl, which is used for the prototype work, to Java, to Pascal binaries). The framework description does outline two basic methods that must be defined and manageable for any URA: "get information" and "execute." The "get information" method causes the URA to give information about its virtual structure--such as a specification of the input required to carry out a given task. The "execute" method allows an external source to specify required input (if any) and invoke an instance of a URA. Thus, the URA framework provides the creator of a URA with the virtual object structure in which to specify an activity (the virtual structure includes components for specifying input requirements, target resources, experience data, activity script, response filter script, as well as general meta-data about the URA). A URA is a formalization of an activity.
The standard methods provide a mechanism for a client (human, software, or other agent) to learn about a particular URA it has never encountered before ("getinfo"), and invoke it ("execute"), without having to know anything about how the URA carries out its task. Thus, URAs become manageable "black boxes" for activities.
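A minimal sketch of this two-method contract follows. It is written in Python for illustration only (the prototype URAs are Tcl scripts), and the class names URA and EchoURA, the helper invoke_unknown, and the "required_input" key are assumptions made for this example rather than part of the URA specification.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class URA(ABC):
    """Hypothetical base class exposing only the two standard methods."""

    @abstractmethod
    def getinfo(self) -> Dict[str, Any]:
        """Describe the URA, including the input it requires."""

    @abstractmethod
    def execute(self, activation_data: Dict[str, str]) -> Any:
        """Run the activity with the supplied input and return its result."""


class EchoURA(URA):
    """Toy URA used only to exercise the interface; it contacts no network."""

    def getinfo(self) -> Dict[str, Any]:
        return {"name": "echo",
                "abstract": "Returns its input in upper case.",
                "required_input": ["text"]}

    def execute(self, activation_data: Dict[str, str]) -> Any:
        return activation_data["text"].upper()


def invoke_unknown(ura: URA, available: Dict[str, str]) -> Any:
    """Client side: learn what the URA needs, then run it as a black box."""
    required: List[str] = ura.getinfo()["required_input"]
    missing = [name for name in required if name not in available]
    if missing:
        raise ValueError(f"missing activation data: {missing}")
    return ura.execute({name: available[name] for name in required})


result = invoke_unknown(EchoURA(), {"text": "hello", "unused": "x"})
```

The client never inspects how EchoURA does its work; it relies only on what getinfo reports and on the result of execute.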
At the center of the URA architecture is the concept of a (persistent) specification of an activity. For purposes that should become clear as the expected usage of URAs is described in more detail, we choose to support this concept with the following requirements of the architecture:
To capture the necessary information for carrying out the type of Internet activity described in the introductory paragraphs of this document, six basic (virtual) components of a URA object have been identified. Any implementation of a URA type is expected to be able to conform to this structure within the context of a URAgency.
The six basic components of a URA object are:
HEADER: Identification of the URA object, including a URA name, type and abstract, creator name, resources required by the URA, etc.
ACTIVATION DATA: Specification of the data elements required to carry out the URA activity. For example, in the case of an Internet search for "people," this could include specification of fields for person name, organization, e-mail address, etc.
TARGETS: Specification of the URL/URNs to be accessed to carry out the activity. Note that, until URNs are in common use, the ability to tweak URLs will be necessary. A key issue for URAs is the ability to transport them and activate them far from the creator's originating site. This may have implications in terms of accessibility of resource sites. For example, a software search created in Canada will likely access a Canadian Archie server, and North American ftp sites. However, an invoker in Australia should not be obliged to edit the URA object in order to render it relevant in Australia. The creator, then, can use this section to specify the expected type of service, with variables for the parts that can be modified in context (e.g., the host name for an Archie server, or a mirror ftp site).
EXPERIENCE INFORMATION: Specification of data elements that are not strictly involved in conversing with the targets in order to carry out the agent's activity. This space can be used to store information from one invocation of a URA instance to the next. This kind of information could include date of last execution, or URLs of resources located on a previous invocation of the agent.
ACTIVITY: If URAs were strictly data objects, specifying required data and URL/URNs would suffice to capture the essence of the composite net interaction. However, the variability of Internet resource accesses and the scope of what URAs could accomplish in the net environment seem to suggest the need to give the creator some means of organizing the instantiation of the component URL/URNs. Thus, the body of the URA should contain a scripting mechanism that minimally allows conditional instantiation of individual URL/URNs. These conditions could be based on which (content) data elements the user provided, or accessibility of one URL/URN, etc. It also provides a mechanism for suggesting scheduling of URL/URN instantiation.
The activity is specified by a script or program in a language specified by the URA type, or by the URA header information. All the required activation data, targets, and experience information are referenced by their specification names.
RESPONSE FILTER: The main purpose of the ACTIVITY module is to specify the steps necessary to take the ACTIVATION DATA, contact the TARGETS, and collect responses from those services. The purpose of the RESPONSE FILTER module is to transform those responses into the result of the URA invocation. This transformation may be along the lines of reformatting some text, or it may be a more elaborate interpretation (e.g., providing a relevance rating for a retrieved HTML page).
The response filter is specified by a script or program in a language specified by the URA type, or by the URA header information. All the required activation data, targets, and experience information are referenced by their specification names.
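The six components can be pictured as one object, as in the sketch below. This is an illustration under stated assumptions, not the structure defined in the Internet-Drafts: the class URAObject, the functions software_search, as_text and run, the "$variable" template syntax, the "archie://" notation, and all host names are invented for the example. The activity shows conditional instantiation of targets, and the site settings show how an invoker in another region could use the same URA without editing it. No network access is performed; the activity simply returns the URLs it would contact.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Callable, Dict, List


@dataclass
class URAObject:
    header: Dict[str, str]        # name, type, abstract, creator, ...
    activation_data: List[str]    # input fields the invoker may supply
    targets: Dict[str, str]       # target name -> URL template with $variables
    activity: Callable[["URAObject", Dict[str, str], Dict[str, str]], List[str]]
    response_filter: Callable[[List[str]], str]
    experience: Dict[str, str] = field(default_factory=dict)  # kept between runs


def software_search(ura: URAObject, data: Dict[str, str],
                    site: Dict[str, str]) -> List[str]:
    """ACTIVITY: instantiate only the targets the supplied data calls for."""
    urls = []
    if "filename" in data:
        urls.append(Template(ura.targets["archie"]).substitute(site, **data))
    if "directory" in data:
        urls.append(Template(ura.targets["mirror"]).substitute(site, **data))
    return urls  # a real activity would contact these and collect responses


def as_text(responses: List[str]) -> str:
    """RESPONSE FILTER: turn collected responses into the invocation result."""
    return "\n".join(responses)


def run(ura: URAObject, data: Dict[str, str], site: Dict[str, str]) -> str:
    responses = ura.activity(ura, data, site)
    ura.experience["last_request_count"] = str(len(responses))
    return ura.response_filter(responses)


ura = URAObject(
    header={"name": "software-search", "type": "sketch"},
    activation_data=["filename", "directory"],
    targets={"archie": "archie://$archie_host/$filename",  # made-up notation
             "mirror": "ftp://$ftp_mirror/$directory/"},
    activity=software_search,
    response_filter=as_text,
)
# Hypothetical per-site settings; the URA object itself is not edited.
canada = {"archie_host": "archie.ca.example.org", "ftp_mirror": "ftp.ca.example.org"}
australia = {"archie_host": "archie.au.example.org", "ftp_mirror": "ftp.au.example.org"}
from_canada = run(ura, {"filename": "example.tar"}, canada)
from_australia = run(ura, {"filename": "example.tar", "directory": "pub/tcl"}, australia)
```

Keeping the variable parts of each target outside the URA object is one way to meet the transportability goal; other designs (for example, resolving mirrors at execution time) would serve equally well.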
Having introduced the required capabilities of the URAgency and virtual structure of URA objects, it is now time to elaborate on the tasks and interactions that are best supported by URAs.
URAs are constructed by identifying net-based resources of interest (targets) to carry out a particular task. The activation data component of a URA is the author's mechanism for specifying (to the invoker) the elements of information that are required for successful execution. An invoker creates an instance of a URA object by providing data that are consistent with, or fill in, this template. Such an instance encapsulates everything that the agent "needs to know" in order to contact the specified target(s), make a request of the resource ("get," "search," etc.) and return a result to the invoker. This encapsulation is a sophisticated identification of the task results.
For example, in the case of a mailing list subscription URA, the creator will identify the target URL for a resource that handles list subscription (e.g., a Hypertext Markup Language (HTML) form), and specify the data required by that resource (user name, user mail address, mailing list identifier, etc.). When an invoker provides that information and instantiates the URA, the resulting object completely encapsulates all that is needed in order to subscribe the user--the subscription result is identified.
URAs are manipulated through the application of methods. This, in turn, is governed by the URAgency with which the invoker is interacting. However, because the virtual structure of URAs is represented consistently across URA types and URAgencies, a URAgency can act as one of the targets of a URA. Since methods can be applied to URAs remotely, URAs can act as invokers of URAs. This can yield a complex structure of task modules.
For example, a URA designed to carry out a generalized search of bookselling resources might make use of individual URAs tailored to each resource. Thus, the top-level URA becomes the orchestrating URA for access to a number of disparate resources, while being insulated from the minute details of accessing those resources.
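A small sketch of such an orchestrating URA is given below. It assumes each resource-specific URA can be treated as a callable that takes activation data and returns a list of hits; the store names, the book title, and the function book_search are hypothetical, and a simulated failure stands in for an unreachable resource.

```python
from typing import Callable, Dict, List

# Each resource-specific URA is modeled as a function from activation data to hits.
SubURA = Callable[[Dict[str, str]], List[str]]


def store_a(data: Dict[str, str]) -> List[str]:
    return [f"Store A: {data['title']}"]


def store_b(data: Dict[str, str]) -> List[str]:
    raise ConnectionError("resource unavailable")  # simulated outage


def store_c(data: Dict[str, str]) -> List[str]:
    return []  # reachable, but nothing found


def book_search(data: Dict[str, str], sub_uras: Dict[str, SubURA]) -> Dict[str, List[str]]:
    """Top-level URA: delegate to each sub-URA and merge what comes back."""
    hits: List[str] = []
    unreachable: List[str] = []
    for name, sub_ura in sub_uras.items():
        try:
            hits.extend(sub_ura(data))
        except ConnectionError:
            unreachable.append(name)  # one failing resource does not stop the rest
    return {"hits": hits, "unreachable": unreachable}


outcome = book_search({"title": "The Society of Mind"},
                      {"a": store_a, "b": store_b, "c": store_c})
```

The top-level function knows nothing about how each store is queried, which is the insulation the text describes.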
The experimental work with URAs includes a prototype implementation of URA objects. These are written in the Tcl scripting language. The URAgency that was created to handle these URAs is part of the Silk Desktop Internet Resource Discovery tool. Silk provides a graphical user interface environment that allows the user to access and search for Internet information without having to know where to look or how to look. Silk presents a list of the available URAs to carry out these activities (e.g., "search for tech reports," "hotlist," etc.). For each activity, the user is prompted for the activation data, and Silk's URAgency executes the URA. The Silk software also supports the creation and maintenance of URA object instances. Users can add new URAs by creating new Tcl scripts, following the guidelines in the "URA Writer's Guide," which is available with the Silk software (see [SILK]). The Silk graphical interface hides some of the mechanics of the underlying URAgency. A more directly accessible version of this URAgency will become available.
This section of the paper describes some experimental work that is being carried out to identify mechanisms to locate useful URAs in the Internet environment.
Given a set of agents that perform Internet tasks, the next problem is to locate those that are of interest to a given user. Although users may have similar interests, the types of information that they will seek can differ significantly. Suppose, for example, that Jill is a user interested in Electromagnetic Interference (EMI) who has five years of experience working as an antenna designer. The type of information she is looking for is probably quite different from what Jack, an amateur radio operator, is interested in.
Similarly, if Jill has created a URA that seeks EMI-related information from the Internet, and then filters it to her liking, this URA will differ from that which Jack has built. The problem is thus threefold. First, a method must be devised to model users in some meaningful fashion. Second, there is a need to associate meta-data with an agent that is a descriptor of this agent. Finally, the information from the user model and the meta-data must be compared intelligently in order to match users with available agents that could be of interest to them.
The example of Jack and Jill illustrates that although they have used identical keywords ("EMI" and "antennas") to describe what they are looking for, these keywords do not encompass the reasons for their interests. These reasons are the source of the different information needs of the two users, and they stem from the background and experience of each user. Hence, a simple set of keywords does not provide enough information to find meaningful agents (and thus data); it is necessary to take into account the user's background and experience.
The model we have chosen to build a user "profile" is based on the standard one used in business: the curriculum vitae. This document typically summarizes a user's background by describing his or her education, work experience and other interests. Stereotypes [RICH89] can be drawn from the information in each category. For instance, one might assume that if Jill's work experience is as a designer, she is not very athletic. However, if one of her hobbies is skiing, this fact lowers the original belief of her nonathleticism. In effect, each of the three categories in turn contributes more specific information about the user.
Thus, conflicts between stereotypes may be resolved by assuming that information about work experience is more specific than that about education, and that data about a user's hobbies are in turn more specific than those about his or her work experience. (What all this boils down to is that education contributes to the development of an individual, whereas someone's work experience and hobbies contribute to his or her individuality.) In addition, the greater the level of education, and the more work experience a user has, the more specific his or her interest must be in the corresponding areas.
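The ordering of specificity described above can be sketched as a simple conflict-resolution rule. This is a simplification: the text speaks of a more specific fact lowering an earlier belief, whereas the sketch lets the most specific category win outright. The names SPECIFICITY and resolve_stereotypes, and the trait labels, are hypothetical.

```python
from typing import Dict, List, Tuple

# Higher number = more specific category, following the ordering in the text.
SPECIFICITY = {"education": 0, "work": 1, "hobbies": 2}

Belief = Tuple[str, str, bool]  # (category, trait, believed value)


def resolve_stereotypes(beliefs: List[Belief]) -> Dict[str, bool]:
    """For each trait, keep the belief drawn from the most specific category."""
    chosen: Dict[str, Tuple[int, bool]] = {}
    for category, trait, value in beliefs:
        rank = SPECIFICITY[category]
        if trait not in chosen or rank > chosen[trait][0]:
            chosen[trait] = (rank, value)
    return {trait: value for trait, (_, value) in chosen.items()}


# Jill: the designer stereotype suggests "not athletic"; her skiing hobby conflicts.
jill = resolve_stereotypes([
    ("education", "technical", True),
    ("work", "athletic", False),
    ("hobbies", "athletic", True),
])
```

A graded version could replace the boolean with a numeric belief that each more specific category adjusts rather than overrides.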
When a user creates a URA to perform Internet tasks related to a specific area or topic, the analogous problem arises of describing the agent's behavior in a more meaningful manner than simply with a set of topic-related keywords. An agent does not have a "background," so how can it be described? The answer lies in noting that an agent is created by a user (who does have a characterizable background). Thus, the creator's own experiences may be a starting point for the agent's description.
Consider these four descriptors associated with a topic: its name, technical level, scope, and privacy level.
The technical level describes the depth of knowledge needed in the topic area in order to understand the results returned by the agent. (In our example, Jill's technical level in EMI would have been greater than Jack's.) Note that "technical" does not imply "scientific"--a paper discussing art history may also have a high technical level. The scope describes the breadth of the information sought: whether a URA will return specific details about a narrow topic, or about a more general area. The privacy level indicates whether the creator's knowledge of or interest in a topic may be disclosed to others, and thus if the URA may be shared in a public database. These descriptors are independent of one another. They form a basis set to describe the topic.
A URA's creator may explicitly decide the values of the four descriptors. If not, their initial values may be inferred from the creator's profile. In addition, many topics and subtopics could be associated with an agent. Each of these should have explicit or inferred initial values for the four descriptors.
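As an illustration, the four descriptors and the explicit-or-inferred rule might be represented as follows. Everything beyond the four descriptor names is an assumption for the sake of the example: the 0 to 5 scales, the inference of technical level from years of experience, the inference of a narrower scope from greater experience, and the default of keeping a topic private.

```python
from dataclasses import dataclass
from typing import Dict, Optional

MAX_LEVEL = 5  # assumed scale: 0 (lowest) to 5 (highest)


@dataclass(frozen=True)
class TopicDescriptor:
    name: str
    technical_level: int  # depth of knowledge needed to understand the results
    scope: int            # 0 = narrow, specific details; 5 = general area
    private: bool         # True = do not list in a public database


def describe_topic(name: str, years_of_experience: Dict[str, int],
                   technical_level: Optional[int] = None,
                   scope: Optional[int] = None,
                   private: Optional[bool] = None) -> TopicDescriptor:
    """Use explicit values where given; otherwise infer them from the profile."""
    years = years_of_experience.get(name, 0)
    inferred_level = min(MAX_LEVEL, years)        # assumption: one level per year
    inferred_scope = MAX_LEVEL - inferred_level   # assumption: experience narrows scope
    return TopicDescriptor(
        name=name,
        technical_level=inferred_level if technical_level is None else technical_level,
        scope=inferred_scope if scope is None else scope,
        private=True if private is None else private,  # assumption: private by default
    )


jill_emi = describe_topic("EMI", {"EMI": 5})        # all values inferred
jack_emi = describe_topic("EMI", {}, private=False)  # privacy set explicitly
```

The inferred values are only starting points; because the descriptors are independent, a creator remains free to set any of them explicitly.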
Given a set of users with certain profile characteristics, and a set of agents with appropriate meta-information descriptors, it should then be possible to create an intelligent system to return a meaningful set of URAs that are of interest to any particular user. This is the topic of work currently in progress (watch [SILKURA] for more information as the work progresses). The key is that a finite set of descriptors is used to define the agents, and that a similar set of descriptors is associated with each topic of interest to a user. If these descriptors are carefully chosen, the problem of finding information of interest in the Internet may be more effectively resolved.
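Since the matching system is still work in progress, the following is only a toy sketch of what comparing descriptors could look like, not the method under development. It assumes numeric 0 to 5 scales for technical level and scope, excludes agents marked private, and ranks the rest by a simple distance; the names Descriptor, Agent, rank_agents and the sample agents are all hypothetical.

```python
from typing import List, NamedTuple


class Descriptor(NamedTuple):
    topic: str
    technical_level: int  # assumed scale 0..5
    scope: int            # assumed scale 0 (narrow) .. 5 (broad)
    private: bool


class Agent(NamedTuple):
    name: str
    descriptor: Descriptor


def rank_agents(interest: Descriptor, agents: List[Agent]) -> List[str]:
    """Return names of shareable agents on the same topic, closest first."""
    candidates = [a for a in agents
                  if not a.descriptor.private and a.descriptor.topic == interest.topic]

    def distance(agent: Agent) -> int:
        d = agent.descriptor
        return (abs(d.technical_level - interest.technical_level)
                + abs(d.scope - interest.scope))

    return [a.name for a in sorted(candidates, key=distance)]


agents = [
    Agent("jill_emi", Descriptor("EMI", 5, 1, False)),
    Agent("jack_emi", Descriptor("EMI", 1, 4, False)),
    Agent("secret_emi", Descriptor("EMI", 5, 1, True)),
    Agent("art", Descriptor("art history", 5, 1, False)),
]
for_expert = rank_agents(Descriptor("EMI", 4, 1, False), agents)
for_amateur = rank_agents(Descriptor("EMI", 1, 5, False), agents)
```

Two users with the same topic keyword receive the same candidates in opposite order, which is the distinction that keywords alone cannot express.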
Although still in preliminary stages, this work has already evoked interest and shown promise in the area of providing mechanisms for building more advanced tools to interact with the Internet at a more sophisticated level than just browsing web pages. The experiments in locating locators should further elaborate the necessary mechanisms for creating useful structures from these agent collectives.
One of the major difficulties that has been faced in developing a collection of URAs is the brittleness induced by interacting with services that are geared primarily toward human users. Small changes in output formats (easily discernible by the human eye) can be entirely disruptive to a software client that must apply a parsing and interpretation mechanism based on placement of cues in the text. This problem is certainly not unique to URAs--any software acting upon results from such a service is affected. Perhaps there is a need for an evolution of "service entrances" to information servers on the Internet--mechanisms for getting "just the facts" from an information server. Of course, one way to provide such access is for the service provider to develop and distribute a URA that interacts with the service. When the service's interface changes, the service provider will be moved to update the URA that was built to access it reliably.
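The brittleness can be seen in a few lines. The sketch below contrasts a parser that depends on where a cue sits in human-oriented text with one that reads a labeled field of the kind a "service entrance" might offer. The page contents, the "hits=" field, and both function names are invented for illustration.

```python
def hits_by_position(page: str) -> int:
    """Brittle: assumes the count is the second word of the first line."""
    return int(page.splitlines()[0].split()[1])


def hits_from_fields(page: str) -> int:
    """Reads a labeled field, wherever it appears in the response."""
    for line in page.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "hits":
            return int(value)
    raise ValueError("no hits field in response")


old_page = "Found 12 matches\nfoo.mid\n"
new_page = "We found 12 matches for you\nfoo.mid\n"   # cosmetic rewording
machine_page = "service=midi-search\nhits=12\n"        # hypothetical service entrance

old_count = hits_by_position(old_page)
try:
    hits_by_position(new_page)
    broke = False
except ValueError:
    broke = True  # the second word is now "found", not a number
machine_count = hits_from_fields(machine_page)
```

A human reader sees no meaningful difference between the two pages, yet the positional parser fails on the second.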
[IIAW95] Leslie L. Daigle, Peter Deutsch, "Agents for Internet Information Clients," CIKM'95 Intelligent Information Agents Workshop, December 1995. Available from <http://www.bunyip.com/products/silk/silktree/uratree/iiaw95.ps>
[JAVA] "The Java Language: A White Paper." Available from <http://java.sun.com/1.0alpha2/doc/overview/java/index.html>
[MINS85] Marvin Minsky, "The Society of Mind," Simon and Schuster, New York, 1985.
[RFC1034] P.V. Mockapetris, "Domain Names: Concepts and Facilities," RFC 1034, November 1987.
[RFC1035] P.V. Mockapetris, "Domain Names: Implementation and Specification," RFC 1035, November 1987.
[RFC1738] T. Berners-Lee, L. Masinter, M. McCahill, "Uniform Resource Locators (URL)," RFC 1738, December 1994.
[RICH89] E. Rich, "Stereotypes and User Modeling," in User Models in Dialog Systems, Springer-Verlag, New York, 1989.
[SILK] Bunyip's Silk Project Homepage: <http://www.bunyip.com/products/silk/>
[SILKURA] Silk URA Information: <http://www.bunyip.com/products/silk/silktree/uraintro.html>
[TACOMA] D. Johansen, R. van Renesse, F.B. Schneider, "An Introduction to the TACOMA Distributed System," Technical Report 95-23, Department of Computer Science, University of Tromso, Norway, June 1995.
[TCL] J.K. Ousterhout, "Tcl and the Tk Toolkit," Addison Wesley, 1994.
[TELE] J.E. White, "Telescript Technology: The Foundation for the Electronic Marketplace," General Magic White Paper, General Magic Inc., 1994.