[Help] Last update at http://inet.nttam.com : Mon Aug 7 21:39:42 1995

Abstract -- Maintaining Link Consistency in Distributed Hyperwebs Application Technology Track
A1: Information Space Environments

[Table [Next]
[Paper [Paper
[Read Comments]


Maintaining Link Consistency in Distributed Hyperwebs

Kappe, Frank ( fkappe@iicm.tu-graz.ac.at)

Abstract

The problem is quite familiar to all web surfers: Every now and then you activate a WWW link, but the resource the link refers to cannot be fetched. This may either be a temporary problem of the network or the server, but it may also indicate that the resource has been permanently removed. Since the current WWW implementations rely on Uniform Resource Locators for accessing information, it may also mean that the resource has only been moved to a new location. It may also happen that a resource is eventually replaced by a different one under the same name (location).

The net effect of this is that a certain percentage of references are invalid. We may expect that this percentage will rise as time goes by since more and more documents become outdated and are eventually removed, services are shut down or moved to different servers, URLs get re-used, etc. Obviously, it would be desirable to have some support for automatically removing such dangling references to a resource that is deleted, or at least to inform the maintainers of those resources.

This paper presents a scalable architecture for automatic maintenance of referential integrity in large (thousands of servers) distributed information systems. A central feature of the proposed architecture is the p-flood algorithm, which is a scalable, robust, prioritizable, probabilistic server-server protocol for efficient distribution of update information to a large collection of servers.

Extensive simulations of p-flood show logarithmic response of propagation delay to the number of servers, almost no effect of spurious network errors on propagation delay and traffic, and remarkable robustness with respect to persistent server faults.

The total additional traffic generated by p-flood is negligible compared to current Internet traffic (for example, the messages generated to distribute 10,000 modifications of surface objects per day to 1,000 servers account for about 0.005% of current Internet traffic (estimated from NSFnet statistics)). Another nice property of flooding is that the traffic is evenly split among all servers, in other words the traffic per server does not depend on the number of servers (only on the number of updates).

The p-flood algorithm is now being implemented in the HyperG system, but may in principle also be implemented as an add-on for existing WWW and Gopher servers.