Last update at http://inet.nttam.com : Mon Aug 7 21:39:48 1995

Abstract -- Searching Internet Resources Using IP Multicast
Application Technology Track, A5: Navigating the Web



Searching Internet Resources Using IP Multicast

Kashima, Hiroaki ( kashima@csce.kyushu-u.ac.jp)
Ishida, Yoshiki ( yoshiki@cc.kyushu-u.ac.jp)
Furukawa, Zengo ( zengo@ec.kyushu-u.ac.jp)
Ushijima, Kazuo ( ushijima@csce.kyushu-u.ac.jp)

Abstract

The Internet is flooded with information. Many information-providing systems now let people publish many kinds of information. At the same time, finding the information one actually needs is becoming steadily harder, because a huge volume of data is being provided by many people on the Internet.

Existing information retrieval systems are designed to search static data, that is, data that changes little or whose changes are completely managed. Information-providing systems such as the World-Wide Web, however, have caused an explosion of information on the Internet, and this exploding data also changes its structure and contents dynamically. Search systems built for static data cannot keep up with the volume and the changes of such data.

We call this kind of information "super-dispersed information": information that is unsearchable, or may become unsearchable, because it overwhelms existing search systems. Super-dispersed information has four characteristics that make it hard to find: (1) the number of data items is very large, (2) the data are widely dispersed over the Internet, (3) the volume of data is increasing rapidly, and (4) the data constantly change in structure and content.

Multicast is well suited to searching super-dispersed information: it requires no search structure (such as a central index), and a user can query multiple servers at once.

Super-dispersed information forms "weak" groups. A weak group is a group of people who know about the information in some way, and the boundaries between such groups are unclear. A multicast group corresponds to a weak group, so a user can specify the search area by selecting a suitable group.

To find information, a user first sends a request to a small, nearby group. Each server in the group receives the request, searches its own data, and sends back the result. If the user cannot find what is needed, the user selects a larger group and tries again.
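
The expanding search can be sketched in a few lines of Python. This is a minimal illustration, not the march implementation: groups are modeled as lists of hypothetical server functions ordered from the nearest group outward, and a substring match stands in for whatever query a real server would evaluate.

```python
from typing import Callable, Iterable, List, Sequence

# A hypothetical server: searches its own local data for a query and
# returns the matching items (an empty list if nothing matches).
Server = Callable[[str], List[str]]


def expanding_group_search(groups: Sequence[Iterable[Server]],
                           query: str) -> List[str]:
    """Send `query` to each group in turn, smallest first, and stop at the
    first group in which some server reports a match."""
    for group in groups:
        results: List[str] = []
        for server in group:
            results.extend(server(query))  # every member searches its own data
        if results:
            return results
    return []  # not found even in the largest group


if __name__ == "__main__":
    def make_server(files: List[str]) -> Server:
        return lambda q: [f for f in files if q in f]  # substring match (assumed)

    local = [make_server(["README", "ls-lR"])]
    wider = local + [make_server(["xv-3.10a.tar.gz"])]
    print(expanding_group_search([local, wider], "xv"))  # ['xv-3.10a.tar.gz']
```

Note that the local server is queried again when the wider group is searched; this is the repeated-search problem listed as (1) among the problems of this method.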

Anonymous FTP is a good example of super-dispersed information: there are now too many anonymous FTP sites on the Internet to count. The "archie" system can be used to find a file, but because archie is a centralized system, the server process places a heavy load on its host, and the service is unavailable whenever the server is down.

It is therefore useful to build a locating system for anonymous FTP files based on IP multicast. We have built such a system and named it "march" (the Multicast ARCHie).

March servers join the march group, which is identified by a class D (multicast) IP address. Basically, each march server is assigned to one anonymous FTP server on the same host and searches the files of that server. A march server can also be assigned several anonymous FTP servers at once.
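
A march server's two basic tasks, joining the multicast group and searching its local anonymous FTP tree, can be sketched with the Python standard library as follows. The request format (a shell-style file name pattern in one UDP datagram), the plain-text reply, and the group, port, and directory a caller would pass are all assumptions for illustration, not details of march.

```python
import fnmatch
import ipaddress
import os
import socket
import struct
from typing import List


def is_class_d(address: str) -> bool:
    """True if `address` is a class D (IPv4 multicast, 224.0.0.0/4) address."""
    return ipaddress.IPv4Address(address).is_multicast


def search_ftp_tree(root: str, pattern: str) -> List[str]:
    """List files under the anonymous FTP tree `root` whose names match the
    shell-style `pattern` (assumed query syntax), relative to `root`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatchcase(name, pattern):
                matches.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(matches)


def open_march_socket(group: str, port: int) -> socket.socket:
    """UDP socket bound to `port` that has joined the multicast `group`."""
    if not is_class_d(group):
        raise ValueError(f"{group} is not a class D address")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # struct ip_mreq: group address, then local interface (0.0.0.0 = any).
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock


def serve_forever(group: str, port: int, ftp_root: str) -> None:
    """Search the local FTP tree for each request and unicast the matches
    (one path per line, empty if none) back to the requesting client."""
    sock = open_march_socket(group, port)
    while True:
        data, client = sock.recvfrom(1024)
        pattern = data.decode("utf-8", errors="replace").strip()
        reply = "\n".join(search_ftp_tree(ftp_root, pattern))
        # Assumes the reply fits in a single UDP datagram.
        sock.sendto(reply.encode("utf-8"), client)
```

A server would be started with, for example, `serve_forever("239.1.2.3", 5555, "/home/ftp")`, where all three arguments are hypothetical. Replies larger than one UDP datagram are not handled in this sketch.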

To find a file name on many march servers at once, we send a request to the march group. Because current IP multicast cannot easily form a multicast group that behaves as a "weak" group, we simulate a weak group by setting the TTL of the request to a suitable value. First, we send the request with a small TTL so that it reaches only the local area. If the file name is not found, we increase the TTL and send the request again to reach a wider area.
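
The client side of this TTL-based search might look like the following sketch. The TTL steps are the conventional MBone scope thresholds (1 for the local subnet, 32 for a site, 64 for a region, 128 for a continent, 255 unrestricted); treat them as an assumption, since the actual reach of a given TTL depends on how routers are configured. The convention that an empty reply means "nothing found" is likewise hypothetical.

```python
import socket
import time
from typing import Callable, List, Optional, Sequence, Tuple

Reply = Tuple[bytes, Tuple[str, int]]  # (payload, (server address, port))

# Conventional MBone TTL thresholds for scoping multicast traffic.
# Assumption: real boundaries depend on each site's router configuration.
TTL_STEPS: Sequence[Tuple[str, int]] = (
    ("subnet", 1),
    ("site", 32),
    ("region", 64),
    ("continent", 128),
    ("unrestricted", 255),
)


def open_client_socket() -> socket.socket:
    """UDP socket on one ephemeral port, reused for every retry."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.bind(("", 0))
    return sock


def multicast_query(sock: socket.socket, group: str, port: int,
                    request: bytes, ttl: int, timeout: float = 2.0) -> List[Reply]:
    """Send `request` to `group` with the given TTL and collect every
    reply that arrives within `timeout` seconds."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    sock.sendto(request, (group, port))
    replies: List[Reply] = []
    deadline = time.monotonic() + timeout
    while (remaining := deadline - time.monotonic()) > 0:
        sock.settimeout(remaining)
        try:
            replies.append(sock.recvfrom(65535))
        except socket.timeout:
            break
    return replies


def expanding_ttl_search(query: Callable[[int], List[Reply]],
                         steps: Sequence[Tuple[str, int]] = TTL_STEPS
                         ) -> Tuple[Optional[str], List[Reply]]:
    """Try each area in turn, widening the TTL until some server reports
    a match. Returns (area name, non-empty replies) or (None, [])."""
    for area, ttl in steps:
        # An empty payload means "searched, nothing found" (assumed protocol).
        hits = [r for r in query(ttl) if r[0]]
        if hits:
            return area, hits
    return None, []
```

Creating the socket once with `open_client_socket()` and passing `lambda ttl: multicast_query(sock, group, port, b"xv*", ttl)` to `expanding_ttl_search` keeps the same client port across retries. The named areas in `TTL_STEPS` are one possible form of the area-to-TTL mapping discussed in problem (3).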

This method has the following problems.

(1)
If the request is re-sent with a larger TTL, servers near the user receive the same request again, and searching the same data repeatedly for the same request is wasteful. This can be avoided, however, if servers remember the client's address and port.
(2)
Current IP multicast provides no reliability, so some search results may be lost, or, at worst, all the results from some servers.
(3)
It is difficult for a beginner to choose a multicast TTL, because the TTL limits the number of hops a packet may travel rather than naming an area. A mapping from areas to TTL values is therefore necessary.
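
The remedy suggested in (1), servers remembering which client sent which request, can be sketched as a small cache keyed by the client's address, port, and request. This is an illustrative assumption rather than the design of march: the 60-second lifetime is arbitrary, and on a repeated request the server returns its earlier result instead of searching again, which may also give a client a second chance to receive a reply lost as described in (2).

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, int, bytes]  # (client address, client port, request)


class DuplicateFilter:
    """Remember recent requests per client so that a request re-sent with a
    larger TTL is answered from memory instead of being searched again."""

    def __init__(self, lifetime: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lifetime = lifetime  # seconds an entry is remembered (assumed value)
        self.clock = clock
        self._seen: Dict[Key, Tuple[float, bytes]] = {}

    def handle(self, client: Tuple[str, int], request: bytes,
               search: Callable[[bytes], bytes]) -> bytes:
        """Return the reply for `request` from `client`, calling `search`
        only if this client has not sent the same request recently."""
        now = self.clock()
        # Forget entries older than `lifetime` so the table stays small.
        self._seen = {k: v for k, v in self._seen.items()
                      if now - v[0] < self.lifetime}
        key = (client[0], client[1], request)
        if key in self._seen:
            return self._seen[key][1]  # cached reply; no new search
        reply = search(request)
        self._seen[key] = (now, reply)
        return reply
```

A server loop would call `handle()` for each received datagram in place of a direct search. This only helps if the client keeps the same port across retries; a client that opens a new socket for each retry defeats it.
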

In this paper, we argue that unstructured information retrieval systems are useful and that multicast is suitable for finding super-dispersed information. We then present an implementation of "march", an unstructured information locating system for anonymous FTP, and discuss its merits and drawbacks.