Existing information tools do not provide a way to share users' experience with information resources. Imagine a user who wants to find resources that meet certain criteria. The user will try various information tools and search tools to locate the resources. After some trials, and possibly with the aid of human experts, the user will locate the necessary resources. Another user looking for similar resources will need to repeat the same sequence of trials.
In this paper, we describe the NetAgent system, in which users interact with distributed agents before performing actual searches. Each agent records users' successful search patterns and keeps them for future use. NetAgent allows users to search for information in context. Instead of a single indexing system, NetAgent deploys multiple topic-specific indexing agents. A community of users shares a common context over the global information space by interacting with common agents. A user can find a topic-specific indexing agent suitable for the user's community by following a path that agents suggest. By doing this, a user can avoid many of the search results that are useless in the context of the user's community.
The key idea of this paper is to enhance current indexing techniques by achieving the following:
This paper is organized as follows. Section 2 presents the design of NetAgent, and Section 3 describes its implementation. Section 4 analyzes how NetAgent deals with the scalability problems. Finally, Section 5 concludes this paper.
The idea of deploying agents for collaborative indexing comes from an analogy with the real world. In general, when people have a question on a topic and cannot find the answer, they look for experts in that area and ask them.
In the networked electronic world, users looking for resources on a particular topic send messages to relevant mailing lists or news groups asking if anyone knows where to look. This means that a significant part of search information relies on users' experience.
In the NetAgent model, a user narrows down the resource domain by interacting with agents. Once a resource domain is selected, actual searches are done over the resources. At each stage of the interactions, the user decides which agents will be selected for further interactions.
In our model, an information space is composed of two levels (see Figure 1):
Figure 1. Two-level indexing model
A user can issue a query to search over the global information space. When a user issues a query to an agent, the agent responds with a list of other useful agents. The user can repeatedly send queries to the resulting agents until he or she is satisfied with the result.
Agents communicate by message passing. An agent can send a message to another agent only when it holds an association with the target agent. An association is a tuple of a search key, a target agent, and a new query for the target agent. Agents determine where to propagate a query message by selecting associations.
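The association tuple described above can be sketched as a small data structure. This is only an illustration, not the NetAgent implementation: the names Association and select_associations, the sample agent names, and the exact-match rule on the search key are all assumptions made for the example.

```python
from typing import List, NamedTuple


class Association(NamedTuple):
    """One routing entry held by an agent."""
    key: str        # search key that triggers this association
    target: str     # target agent (hypothetical agent names below)
    new_query: str  # query to send on to the target agent


def select_associations(table: List[Association], terms: List[str]) -> List[Association]:
    """Return associations whose key equals one of the query terms.

    Case-insensitive exact matching is an assumption of this sketch.
    """
    wanted = {t.lower() for t in terms}
    return [a for a in table if a.key.lower() in wanted]


table = [
    Association("multimedia", "multimedia-agent", "subject=mpeg"),
    Association("hypertext", "hypermedia-agent", "subject=hypertext"),
]
selected = select_associations(table, ["Multimedia", "mpeg"])
```

An agent with no matching association for any term returns an empty list here; what a real agent does in that case is outside the scope of this sketch.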
The NetAgent design offers three principal features for resource domain selection: (1) registration of a new association, (2) trading of associations using weight values, and (3) feedback for changing the weight values of associations.
There are three types of agents according to their roles: personal agents, trading agents, and information agents. A personal agent interacts with a human user as well as with other agents. A user can send a query to the network of agents through his or her personal agent. A query from a personal agent is propagated to a set of information agents via trading agents. A trading agent suggests which agents to follow for a given topic. Its role is to support collaborative indexing by keeping the user-registered index data on a topic. An information agent is an agent that can access resources. It performs resource retrieval by converting agent messages into the protocol of an information tool. Because information tools are passive and cannot generate agent messages, information agents actively retrieve resources using the tools and reply to other agents.
A message between agents is in Agent Access Format (AAF). The AAF is designed to be compliant with the Uniform Resource Locator format [4] to allow future information tools to access NetAgent. The syntax of AAF is:
x-na://Host/Agent?Expr1&Expr2&...&Exprn
x-na is a temporarily assigned name of the URL scheme for NetAgent. Host is the fully qualified domain name [9] of the network host where the agent resides. Agent is an agent name that is unique within a host. Expr is a simple expression composed of an attribute name, an operator and an attribute value. The AND operator is assumed for all expressions in a message. Users can give multiple queries instead of using OR operators. Possible attribute names are shown in Table 1.
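To make the AAF syntax concrete, the following is a minimal parsing sketch. It is not taken from NetAgent itself: the names AgentAddress and parse_aaf are hypothetical, and the sketch assumes that "=" is the only operator and that attribute values may be wrapped in double quotes.

```python
from typing import List, NamedTuple, Tuple

SCHEME = "x-na://"


class AgentAddress(NamedTuple):
    host: str
    agent: str
    exprs: List[Tuple[str, str]]  # (attribute name, value), implicitly ANDed


def parse_aaf(text: str) -> AgentAddress:
    """Split an AAF string into host, agent name and attribute expressions."""
    if not text.startswith(SCHEME):
        raise ValueError("not an x-na address: " + text)
    rest = text[len(SCHEME):]
    # Split off the query first so that "/" inside a value is left alone.
    location, _, query = rest.partition("?")
    host, _, agent = location.partition("/")
    if not host or not agent:
        raise ValueError("missing host or agent: " + text)
    exprs = []
    for part in query.split("&"):
        part = part.strip()
        if not part:
            continue
        name, op, value = part.partition("=")
        if not op:
            # Assumption: only the "=" operator is handled in this sketch.
            raise ValueError("unsupported expression: " + part)
        exprs.append((name.strip(), value.strip().strip('"')))
    return AgentAddress(host, agent, exprs)


addr = parse_aaf('x-na://news.kaist.ac.kr/NEWS-a?path=comp.multimedia&subject="mpeg"')
```

Because every expression is joined by an implicit AND, the parsed result is a flat list; an OR would be expressed by issuing a second query.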
A trading agent may change the input expression and return it as an association. For example, if a "multimedia" agent gets an expression containing the term "multimedia", it will remove the term because it is no longer necessary.
Table 1. Attribute names in AAF expression
Name      Type      Description
--------  --------  -------------------------------------------------
subject   string    Title or subject of a resource
area      string    User-given area of a resource
content   string    Partial match for a string in a resource content
date      yy.mm.dd  Date appearing in a resource
author    string    The creator of a resource
source    string    Source tool (e.g. nntp, gopher, wais, mail, etc)
filetype  string    File type (e.g. text, postscript, gif, rtf, etc)
filename  string    File name for a resource
filesize  integer   Size of a resource in bytes
filedate  yy.mm.dd  Date of the file for a resource
path      string    Resource domain name where a resource is located
Among the attributes, area is used to give a hint to information agents about finding resource domains. If area is not given, subject attributes are examined instead. The following is an example query expression for "find all resources on MPEG that were created by the author Kim":
subject="multimedia" & subject="mpeg" & author="Kim"
In addition to the expressions from the user, more parameters are included by agents in the message. These parameters are hidden from the users. Message types and their parameters are listed in Table 2.
Table 2. Message types and parameters
Message Type  Parameters                      Description
------------  ------------------------------  ----------------------------------
QUERY         community access-path           Request for summary information
RETRIEVE      community access-path local-id  Request for resource retrieve
FEEDBACK      community access-path           Request for changing weight values
ASSOCIATION   hit-count total-count owner     Reply on trading information
REPORT        subject date author local-id    Reply on summary information
RESULT        message                         Reply on the content of a resource
DEBUG         message                         Debugging messages
ERROR         message                         Error messages
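As an illustration of how an agent might use Table 2, the sketch below transcribes the table into a lookup and reports which parameters a message lacks. The function name missing_params is hypothetical, and treating every listed parameter as mandatory is an assumption; the table itself does not say so.

```python
from typing import Dict, List

# Parameters per message type, transcribed from Table 2.
MESSAGE_PARAMS: Dict[str, List[str]] = {
    "QUERY": ["community", "access-path"],
    "RETRIEVE": ["community", "access-path", "local-id"],
    "FEEDBACK": ["community", "access-path"],
    "ASSOCIATION": ["hit-count", "total-count", "owner"],
    "REPORT": ["subject", "date", "author", "local-id"],
    "RESULT": ["message"],
    "DEBUG": ["message"],
    "ERROR": ["message"],
}


def missing_params(msg_type: str, params: Dict[str, str]) -> List[str]:
    """Return the Table 2 parameters that are absent from a message."""
    if msg_type not in MESSAGE_PARAMS:
        raise ValueError("unknown message type: " + msg_type)
    return [name for name in MESSAGE_PARAMS[msg_type] if name not in params]


# The access-path value here is a made-up placeholder.
gaps = missing_params("RETRIEVE", {"community": "kaist.ac.kr", "access-path": "a"})
```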
Figure 2. Three-phase query processing
The topic of an agent is the list of keywords the agent is supposed to index. The topic mechanism prevents users from registering irrelevant keywords with an agent. Narrowing down an agent's interest to a single topic makes topic-specific indexing more effective. For example, for a "multimedia" agent, a user can register a new association only if the association concerns the pre-defined keywords the agent permits (e.g. multimedia mail, CD-ROM, video, etc).
A personal agent has a community value and records it in all outgoing queries so that trading agents can match on community. A personal agent in an organization will commonly refer to an organization-specific trading agent, and will provide shared feedback to that agent. Thus the organization-specific trading agent (e.g. a laboratory agent) will have good knowledge of the organization members' preferences.
A user who wants to be attached to more than one community can register new associations that refer to other organizations' agents, although he or she cannot offer feedback to those agents.
We chose domain names of the Domain Name System [9] to define the community. This has the benefit of simplicity. Another design alternative is to use a group of user names, but we regard that as too fine-grained. An agent will accept a FEEDBACK message only if the agent's community is a superset of the message's community.
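The topic check can be sketched as follows. The source does not specify the matching rule, so this sketch assumes that a key is acceptable when any topic word occurs inside it as a substring; the function name key_allowed and the sample topic words are illustrative only.

```python
from typing import Iterable


def key_allowed(key: str, topic: Iterable[str]) -> bool:
    """True if any topic word occurs inside the key (assumed substring rule)."""
    lowered = key.lower()
    return any(word.lower() in lowered for word in topic)


# Illustrative topic words for a multimedia-oriented trading agent.
topic = ["multi", "media", "hyper", "virtual"]
accepted = key_allowed("multimedia mail", topic)
rejected = key_allowed("cooking", topic)
```

A stricter rule, such as whole-word matching against a fixed keyword list, would reject more registrations; which rule NetAgent applies is not stated in the text.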
Each trading agent can be regarded as an indexing server for a specific area. A general agent is a special kind of trading agent that does not limit its domain to certain keywords. Most organization-level agents would refer to the general agent for all unclassified keys. The role of a general agent is an operational issue rather than a design issue. Registration on general agents would be performed manually by subject experts, or by automatic indexing methods studied in the information retrieval area [12].
For example, if the weight value of an association is 50/100, the agent has suggested the association to users 100 times, and users actually followed it 50 times to retrieve resources. The total count is incremented by an ASSOCIATION message, and the hit count is incremented by a FEEDBACK message.
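The counting rule above can be written down directly. The class and method names below are hypothetical; only the two increments and the hit/total reading of the weight come from the text.

```python
from dataclasses import dataclass


@dataclass
class Weight:
    hit_count: int = 0
    total_count: int = 0

    def suggested(self) -> None:
        """The association was sent out in an ASSOCIATION message."""
        self.total_count += 1

    def confirmed(self) -> None:
        """A FEEDBACK message for the association was accepted."""
        self.hit_count += 1

    def ratio(self) -> float:
        """Fraction of suggestions that led to a retrieval (0.0 if never suggested)."""
        return self.hit_count / self.total_count if self.total_count else 0.0


w = Weight(50, 100)   # the 50/100 example from the text
w.suggested()         # one more suggestion: 50/101
w.confirmed()         # the user followed it: 51/101
```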
A personal agent copies its community to all messages it generates. A trading agent accepts a feedback only if the community of the message belongs to the same community that the trading agent serves.
For example, suppose a trading agent of community "mit.edu" suggests to a personal agent of community "kaist.ac.kr" to follow an association. If the personal agent retrieves a resource using the suggested association, a feedback message will be sent to the trading agent. But the feedback will be ignored since the community of the feedback message ("kaist.ac.kr") does not match the community of the trading agent ("mit.edu").
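Since communities are DNS domain names, the acceptance test for feedback can be sketched as a label-wise suffix comparison. This reading of "superset" is an assumption, as are the function name covers and the subdomain cs.kaist.ac.kr used in the example.

```python
def covers(agent_community: str, message_community: str) -> bool:
    """True if the message's domain equals, or lies under, the agent's domain."""
    agent = agent_community.lower().rstrip(".").split(".")
    msg = message_community.lower().rstrip(".").split(".")
    # Compare whole labels so that "aist.ac.kr" does not cover "kaist.ac.kr".
    return len(msg) >= len(agent) and msg[-len(agent):] == agent


same = covers("kaist.ac.kr", "kaist.ac.kr")       # same community
inner = covers("kaist.ac.kr", "cs.kaist.ac.kr")   # hypothetical subdomain
other = covers("mit.edu", "kaist.ac.kr")          # the example from the text
```

Under this rule the feedback from "kaist.ac.kr" to the "mit.edu" trading agent is rejected, as in the example above.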
Figure 3 shows a registry user interface. In this example, a newsgroup is registered with the "multimedia" trading agent.
Figure 3. The registry in a Mosaic window
Figure 4. A personal agent in a Mosaic window
An FTP agent was implemented using Glimpse (Global Implicit Search) [8]. Glimpse is an indexing and query system that allows users to search through lots of files in many directories by building a very small index (2-5% of the text). The FTP agent translates the given agent query into Glimpse search commands, and generates agent summary reports from Glimpse search output.
A news agent converts an agent query message into NNTP operations. Summary information is extracted from USENET news headers. Some file-related attributes like filename and filedate are ignored by a news agent.
A WAIS agent converts the message of an agent into the parameters of waisq, a WAIS client program. The output of waisq is translated back to agent summary reports.
Information agents are designed to exploit the information inherent in each information tool as much as possible for efficient searches. For example, for a given topic, an FTP agent tries file name matches first. If a matched file is a text file, it then executes a full text search. A news agent gathers information from USENET news headers such as subject and keyword. Among the news headers, the From, Organization, Newsgroups, Subject and Date fields are matched against the given query; From and Organization are used for matching the author. The WWW agent would navigate the hypermedia space by following a limited number of links from the starting node for full text searches.
The granularity of indexing is different for each information agent. Among the attributes in the query expression, only a subset will be supported by a given information tool. For example, for a given topic, an FTP agent can try all attribute matches except author, since an FTP agent cannot extract author information from an unstructured document.
Although a query does not specify the resource domain, the information agents should be able to suggest some domains. Information agents usually keep seed index data for this purpose. For example, a news agent tries to find candidate newsgroups by using a news system file that describes USENET newsgroups (e.g. /usr/lib/news/newsgroups in most Unix news server systems). Since the newsgroups file includes keywords of all news groups, it provides a good starting point for domain selection. Similarly, the WAIS information agent refers to the WAIS directory-of-servers source to find the default starting points of a WAIS search.
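One way to picture the per-agent attribute subsets is a filter that drops the expressions an agent cannot evaluate. The sketch below is an assumption-laden illustration: the name usable_exprs is hypothetical, and the sets list only the exclusions the text names explicitly (author for FTP, filename and filedate for news), which may be incomplete.

```python
from typing import Dict, List, Set, Tuple

# Attributes each agent kind is stated to ignore; possibly incomplete.
UNSUPPORTED: Dict[str, Set[str]] = {
    "FTP": {"author"},
    "NEWS": {"filename", "filedate"},
}


def usable_exprs(agent_kind: str, exprs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the (attribute, value) pairs the agent kind can match."""
    skip = UNSUPPORTED.get(agent_kind, set())
    return [(name, value) for name, value in exprs if name not in skip]


query = [("subject", "mpeg"), ("author", "Kim"), ("filename", "mpeg.txt")]
ftp_query = usable_exprs("FTP", query)
news_query = usable_exprs("NEWS", query)
```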
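The newsgroups-file lookup can be sketched as a keyword scan over lines of the form "group name, whitespace, description". The function name candidate_newsgroups and the sample descriptions are illustrative assumptions; a real news agent would read the file from the news server instead of an inline string.

```python
from typing import List


def candidate_newsgroups(newsgroups_text: str, keyword: str) -> List[str]:
    """Return group names whose name or description contains the keyword."""
    keyword = keyword.lower()
    found = []
    for line in newsgroups_text.splitlines():
        fields = line.split(None, 1)  # group name, then free-text description
        if not fields:
            continue  # skip blank lines
        name = fields[0]
        description = fields[1] if len(fields) > 1 else ""
        if keyword in name.lower() or keyword in description.lower():
            found.append(name)
    return found


# Illustrative content only; descriptions are not quoted from a real server.
sample = (
    "comp.multimedia\tInteractive multimedia technologies.\n"
    "sci.virtual-worlds\tVirtual reality technology and culture.\n"
    "rec.food.cooking\tFood, cooking, cookbooks, and recipes.\n"
)
groups = candidate_newsgroups(sample, "multimedia")
```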
topic: multi media hyper virtual
community: kaist.ac.kr
register: krnic.net
default          193 565 taeha x-na://news.kaist.ac.kr/NEWS-a?path=comp.multimedia
default          168 371 taeha x-na://krnic.net/WAIS-a?path=comp.multi.src
default          112 439 taeha x-na://cosmos.kaist.ac.kr/FTP-a?path=/pub/mmc
default          132 604 taeha x-na://krnic.net/WAIS-a?path=Digital-All.src
multimedia mail   52 112 taeha x-na://news.kaist.ac.kr/NEWS-a?path=comp.mail.multi-media
virtual reality   54 106 taeha x-na://news.kaist.ac.kr/NEWS-a?path=sci.virtual-worlds
hypertext         53 132 taeha hypermedia
hypermedia        65 156 taeha x-na://krnic.net/WAIS-a?path=SIGHyper.src
hypermedia        58 129 taeha x-na://news.kaist.ac.kr/NEWS-a?path=alt.hypertext
Figure 5 is an example of a configuration file of a trading agent. Each entry represents an association by the key, hit count, total count, owner, and the AAF of the destination agent. In this example, hit count and total count are initially set to 50 and 100 respectively. If no matching associations are found for a given query, only the default associations are used. A default association has a small hit count as its initial value so that it appears with low priority in association reports.
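A trading agent's lookup over such entries might be sketched as below. The entry layout (key, hit count, total count, owner, destination) and the fallback to default associations follow the description above; the names Entry, parse_entry and rank, exact matching on the key, and ordering by the hit/total ratio are assumptions of this sketch.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    key: str
    hit_count: int
    total_count: int
    owner: str
    destination: str


def parse_entry(line: str) -> Entry:
    """Split from the right, because a key such as 'multimedia mail' has spaces."""
    key, hit, total, owner, destination = line.strip().rsplit(None, 4)
    return Entry(key, int(hit), int(total), owner, destination)


def rank(entries: List[Entry], query_key: str) -> List[Entry]:
    """Return matching entries, best ratio first; fall back to the defaults."""
    matches = [e for e in entries if e.key == query_key]
    if not matches:
        matches = [e for e in entries if e.key == "default"]
    # Assumption: priority is the hit/total ratio, highest first.
    return sorted(
        matches,
        key=lambda e: e.hit_count / e.total_count if e.total_count else 0.0,
        reverse=True,
    )


lines = [
    "default 193 565 taeha x-na://news.kaist.ac.kr/NEWS-a?path=comp.multimedia",
    "multimedia mail 52 112 taeha x-na://news.kaist.ac.kr/NEWS-a?path=comp.mail.multi-media",
    "hypermedia 65 156 taeha x-na://krnic.net/WAIS-a?path=SIGHyper.src",
    "hypermedia 58 129 taeha x-na://news.kaist.ac.kr/NEWS-a?path=alt.hypertext",
]
entries = [parse_entry(line) for line in lines]
best = rank(entries, "hypermedia")[0]
fallback = rank(entries, "video")
```

With these sample entries, the alt.hypertext association (58/129) ranks above the SIGHyper.src one (65/156), and a query for an unknown key returns only the default entry.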
In addition to explicit registration by users, the owner of an agent can modify the agent configuration file manually to add heuristics to the agent. A trading agent loads its associations when it becomes active, and saves them back to the file when the trading is finished.
Figure 6. Sample interagent communication at run time
Figure 6 represents a message flow among NetAgent system components. Each host invokes a process called a host server for an incoming query. A host server distributes messages to the agents in the host. The following is a sample scenario.
Figure 7. An example of a resource domain selection window
Figure 8. An example of selecting an association
Figure 9. An example of summary information on a News source
NetAgent provides navigate-then-search instead of blind search. By navigating among trading agents, a user can find a limited set of resource domains for a topic and search the selected resource domains effectively. Because users get responses from selected trading agents that hold index data on a topic, they can avoid the overhead of information filtering.
The use of community also filters out much irrelevant data. Community-specific trading agents keep separate sets of index data even though they share the same keywords.
In direct information distribution, users usually send a message to a group with a common interest, identified by mailing lists or news groups. The problem with direct information distribution is that reaching the right set of users is not always possible.
Another problem is that individual users have to be aware of changes in the global information space, or have to configure an information tool manually to keep their personal view up to date.
In indirect information distribution, the changes are done in the global information space without sending notice to users. The problem is that only a small subset of the entire related associations will be affected by the manual changes. Users would not notice the changes if they are accessing unaffected information spaces.
In the NetAgent approach, trading agents guide a user to the relevant resource domains. An active user registers new information as an association with a topic-specific trading agent. Other, casual users benefit from the active user's registration. If user feedback determines that a new association is useful to a trading agent, the association will be used broadly by trading agents. Thus users do not bear the overhead of direct information distribution, but can still find new resource domains through active users' contributions.
The NetAgent approach is a combination of operation mapping and data mapping, which we call context mapping. NetAgent does not try to fully translate operation and data formats. Instead, information agents first provide summary information from various information tools. Users can decide whether to access a resource from its summary, rather than having to switch to a specific information tool. Thus NetAgent users only need to specify the resource, and agents perform actions on behalf of the user to access various information tools.
Future research areas include the following:
[1] F. Anklesaria, M. McCahill, P. Lindner, D. Johnson, D. Torrey and B. Alberti, "The Internet Gopher Protocol (a distributed document search and retrieval protocol)," Request For Comments 1436, 1993.
[2] M. Andreessen, "NCSA Mosaic Technical Summary," Technical Report, NCSA, University of Illinois at Urbana-Champaign, 1993. ftp://zaphod.ncsa.uiuc.edu/Web/mosaic-papers/mosaic.ps.Z
[3] T. Berners-Lee, R. Cailliau, J. Groff and B. Pollermann, "World-Wide Web: The Information Universe," Electronic Networking: Research, Applications and Policy, vol. 1, no. 2, pp. 52-58, Westport CT, Meckler Publications, 1992. ftp://ftp.cern.ch/pub/www/doc/ENRAP_9202.ps
[4] T. Berners-Lee, L. Masinter and M. McCahill, "Uniform Resource Locators," Request For Comments 1738, 1994.
[5] C.M. Bowman, P.B. Danzig and M.F. Schwartz, "Research Problems for Scalable Internet Resource Discovery," Proc. INET'93 Conference (San Francisco), 1993.
[6] B. Kahle and A. Medlar, "An Information System for Corporate Users: Wide Area Information Servers," ConneXions - The Interoperability Report, vol. 5, no. 11, pp. 2-9, 1991. ftp://think.com/wais/wais-corporate-paper.text
[7] B. Kantor and P. Lapsley, "Network News Transfer Protocol -- A Proposed Standard for the Stream-Based Transmission of News," Request For Comments 977, 1986.
[8] U. Manber and S. Wu, "GLIMPSE: A Tool to Search Through Entire File Systems," Proc. USENIX'94 Winter Conference (San Francisco, CA), pp. 23-32, 1994.
[9] P. Mockapetris, "Domain Names - Concepts and Facilities," Request For Comments 1034, 1987.
[10] T. Park and K. Chon, "Collaborative Indexing over Networked Information Resources by Distributed Agents," Distributed Systems Engineering Journal, pp. 362-374, January 1995.
[11] J. Postel and J. Reynolds, "File Transfer Protocol," Request For Comments 959, 1985.
[12] G. Salton, Automatic Text Processing, pp. 275-312, Addison-Wesley Publishing Company Inc., 1989.
[13] L. Wall and R.L. Schwartz, Programming Perl, O'Reilly and Associates Inc., 1990.
Kilnam Chon is a professor in the Computer Science Department, KAIST. He has special interests in computer networking and distributed systems. He has been the chair of the Asia Pacific Networking Group and the Korea Networking Council. He is also a co-chair of the Coordinating Committee for Intercontinental Research Networking.