Enhancing International World Wide Web Access in Mozambique Through the Use of Mirroring and Caching Proxies

J. J. Eksteen and J. P. L. Cloete

Division for Communication and Information Networking Technology,

CSIR, Pretoria, 0001, RSA

April 2, 1997


Access to the World Wide Web (WWW) from Mozambique is severely restricted by the very limited bandwidth between Mozambique and the rest of the Internet. To alleviate this problem, a dual caching/mirroring mechanism was implemented between the Eduardo Mondlane University site in Maputo, Mozambique, and the CSIR site in Pretoria, South Africa.

1. Introduction

The Centro de Informática da Universidade Eduardo Mondlane (CIUEM), Maputo, Mozambique, would like to provide improved access to international World Wide Web (WWW) sites from within Mozambique. Because of the severely limited bandwidth between Mozambique and the rest of the world (a 9.6 Kbps connection with compression) and the high demands placed on this link (see Figure 1), a method must be found to make the best possible use of the available bandwidth.

Figure 1: A typical example of the monthly utilization of the international link from CIUEM to the rest of the world. (Source: http://www.frd.ac.za/uninet/mrtg/uem.html)
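For perspective, a rough ceiling on the raw volume such a link can carry in one day is worked out below. It ignores compression (which raises the effective figure for compressible text) and protocol overhead (which lowers it), so it is an order-of-magnitude estimate only.

```latex
\begin{aligned}
C_{\text{day}} &= \frac{9600\ \text{bit/s}}{8\ \text{bit/byte}} \times 86\,400\ \text{s/day} \\
               &= 1200\ \text{byte/s} \times 86\,400\ \text{s/day} \\
               &= 103\,680\,000\ \text{byte/day} \approx 104\ \text{MB/day}
\end{aligned}
```

Every page served from a cache instead of over the link leaves more of this budget for other traffic.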

It is standard practice to use caching proxies where bandwidth is a problem, whether because of limited availability [1] or high cost [2]. This usually takes the form of client-based (proxy) caching, which exploits locality of reference to improve access performance. Access patterns in distributed information systems exhibit three locality of reference (LOR) properties [3]: temporal (a recently accessed object is likely to be accessed again), geographical (an object accessed by one client is likely to be accessed by a nearby client) and spatial (an object near an accessed object is likely to be accessed). The current usage profile in Mozambique makes it possible to exploit these properties through a dual caching/mirroring proxy system. The implementation of the system is discussed in Section 2, possible enhancements are mentioned in Section 3, and we conclude in Section 4.
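Temporal locality is what lets any proxy cache save bandwidth: a repeated request is answered locally. The sketch below is a minimal illustration under stated assumptions: it uses least-recently-used (LRU) replacement, chosen for simplicity rather than as a description of Squid's internals, and the request trace is invented. It counts how many requests would be served without crossing the link.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity URL cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # URL -> cached body, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, url, fetch):
        """Return the body for url, calling fetch(url) only on a miss."""
        if url in self.entries:
            self.entries.move_to_end(url)  # mark as most recently used
            self.hits += 1
            return self.entries[url]
        self.misses += 1
        body = fetch(url)  # in practice this would cross the international link
        self.entries[url] = body
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return body


# Hypothetical trace: a few popular pages requested repeatedly.
trace = ["/a", "/b", "/a", "/c", "/a", "/b", "/d", "/a"]
cache = LRUCache(capacity=3)
for url in trace:
    cache.get(url, fetch=lambda u: "body of " + u)
print(cache.hits, cache.misses)  # 4 4
```

Half of this small trace is served locally; with a small user base the trace is short, which is why the next section argues for filling the cache proactively instead of relying on repeats alone.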

2. Design and Implementation

The user base for Web use in Mozambique is relatively small at this stage, so temporal LOR alone cannot be relied upon to populate the proxy cache: too few users request the same objects for the cache to fill through normal use. A proactive approach is therefore followed. Because the user base is small and fairly uniform, it is easier to determine common areas of interest, which exploits a kind of geographical LOR. Objects relevant to these areas can be collected regularly to ensure that current information is available. In the WWW context, a number of layers of links beneath the primary object are also fetched, exploiting the spatial LOR property. The small user base and proactive multi-layer caching make it easier to develop a culture of cache use: if the information in the cache is sufficiently up to date, users will prefer it to waiting for information that may not be much more current.

Currently, a number of sites of interest are identified.¹ An automatic process retrieves each site's home page and a specified number of further pages (by recursively following HTML links on the retrieved pages) through the proxy. Instead of writing the retrieved pages to the local disk, the mirroring process discards them, both to conserve system resources and to avoid possible copyright conflicts. Because the proxy acts as intermediary, the retrieved pages are stored in its cache just as if a client had accessed them. When a client later requests such a page, it is served from the cache rather than over the congested international link. The process can be run during off-peak hours, so that it uses otherwise idle bandwidth and does not compete with other traffic. To limit accesses to international sites outside Southern Africa, a mirror that is updated more frequently than the CIUEM site is maintained at the CSIR. This minimizes the number of requests for a current copy of a URL that must cross a second international link (South Africa to the USA).
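The retrieve-and-discard process can be sketched in Python as follows. This is a minimal sketch, not the wget-based setup used in this work: the names `LinkCollector`, `make_proxy_fetcher` and `prefetch` are hypothetical, the page budget (`extra_pages`) and layer limit (`max_depth`) are illustrative stand-ins for recursion settings, and links are followed only within the starting host. Passing the proxy-backed fetcher in as a parameter keeps the crawl logic testable without a network.

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute, fragment-free URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute = urljoin(self.base_url, value)
                self.links.append(urldefrag(absolute).url)


def make_proxy_fetcher(proxy_url):
    """Return a fetch(url) function that sends HTTP requests via proxy_url."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url}))

    def fetch(url):
        with opener.open(url, timeout=60) as response:
            return response.read().decode("latin-1")

    return fetch


def prefetch(start_url, extra_pages, max_depth, fetch):
    """Walk a site breadth-first so every visited page passes through the
    proxy. Bodies are parsed for links and then dropped; nothing is written
    to disk. Returns the URLs visited, in order."""
    host = urlparse(start_url).hostname
    queue = deque([(start_url, 0)])
    seen = {start_url}
    visited = []
    while queue and len(visited) < extra_pages + 1:
        url, depth = queue.popleft()
        try:
            page = fetch(url)  # the proxy caches this response
        except OSError:
            continue  # unreachable or failing page (URLError is an OSError)
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links below this layer
        collector = LinkCollector(url)
        collector.feed(page)
        for link in collector.links:
            if urlparse(link).hostname == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
        # `page` is simply discarded when the loop moves on
    return visited
```

In use, `fetch = make_proxy_fetcher("http://proxy.example:3128")` (a hypothetical proxy host on Squid's default port 3128) followed by `prefetch("http://www.example.org/", extra_pages=50, max_depth=2, fetch=fetch)` would pull one site into the cache. Current GNU wget releases offer a `--delete-after` option intended for this kind of prefetching through a proxy.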

There are a number of cases in which this setup is of no benefit to users. These include dynamic pages (CGI scripts, Active Server Pages, etc.), advertising sites and search engines.
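Before prefetching, a mirroring script can skip URLs that are unlikely to be cacheable. A rough heuristic is sketched below; the marker list (`cgi-bin`, `.cgi`, `.asp`) and the rule that any query string signals dynamic content are assumptions chosen for illustration, and the function does not attempt to catch advertising sites.

```python
from urllib.parse import urlparse

# Hypothetical path markers of per-request (dynamic) content.
DYNAMIC_MARKERS = ("cgi-bin", ".cgi", ".asp")


def worth_prefetching(url):
    """Heuristic: return False for URLs that are likely generated per request."""
    parts = urlparse(url)
    if parts.query:  # e.g. search-engine result pages
        return False
    path = parts.path.lower()
    return not any(marker in path for marker in DYNAMIC_MARKERS)
```

A heuristic like this only saves wasted fetches; whether a response is actually cached is still decided by the proxy.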

To keep the system as cost-effective and simple as possible, we used freely available tools: the Squid caching proxy (http://squid.nlanr.net) and the GNU wget mirroring software (ftp://prep.ai.mit.edu/pub/gnu/wget-xxx.tar.gz), which can retrieve pages through a proxy. The design of the system is shown in Figure 2(a), and the process is explained in Figure 2(b).
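The benefit of the more frequently updated CSIR mirror can be illustrated with a toy two-tier model. The sketch assumes that the CIUEM cache forwards its misses to the CSIR cache, which in turn fetches from origin servers across the SA-USA link; it ignores expiry and refresh, and the class names and URLs are hypothetical. It counts how many additional origin requests the CIUEM run causes once the CSIR tier is already populated.

```python
class Origin:
    """Stands in for web servers across the SA-USA link; counts requests."""

    def __init__(self):
        self.requests = 0

    def __call__(self, url):
        self.requests += 1
        return "content of " + url


class CacheTier:
    """A proxy cache that forwards misses to an upstream callable."""

    def __init__(self, upstream):
        self.upstream = upstream  # another CacheTier or an Origin
        self.store = {}
        self.forwarded = 0  # requests sent over this tier's uplink

    def __call__(self, url):
        if url not in self.store:
            self.forwarded += 1
            self.store[url] = self.upstream(url)
        return self.store[url]


origin = Origin()
csir = CacheTier(origin)  # hypothetical CSIR tier
for url in ["/home", "/news", "/sport"]:
    csir(url)  # CSIR's own, more frequent refresh run
before = origin.requests

ciuem = CacheTier(csir)  # hypothetical CIUEM tier, parent = CSIR
for url in ["/home", "/news", "/sport", "/home"]:
    ciuem(url)  # CIUEM refresh run plus one repeated client request

print(origin.requests - before, ciuem.forwarded)  # 0 3
```

In this simplified model the CIUEM run still uses the Mozambique-SA link three times, but adds no traffic on the SA-USA link.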

3. Enhancements to the System

Several enhancements are possible. These include better selection of mirrored material through analysis of the log files, enforced use of the proxy through firewalling, an alternative connection used solely by the cache, and the use of more local neighboring caches. Alternatives such as the TeleWeb system [4], search site proxies² and intelligent agents might also be investigated.
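Log-file analysis for selecting mirrored material could start from simple request counts. The sketch below assumes Squid's native access.log format, in which the requested URL is the seventh whitespace-separated field; the function name `top_sites` and the sample lines are hypothetical. Hosts that rank highly but are not yet mirrored would be candidates for the site list.

```python
from collections import Counter
from urllib.parse import urlparse


def top_sites(log_lines, n=10):
    """Rank hosts by request count in Squid native-format access.log lines."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed or truncated lines
        host = urlparse(fields[6]).hostname
        if host:
            counts[host] += 1
    return counts.most_common(n)


# Invented sample lines in the native layout.
sample = [
    "857352345.123 250 10.0.0.5 TCP_MISS/200 4120 GET http://www.example.org/ - DIRECT/www.example.org text/html",
    "857352350.456 10 10.0.0.7 TCP_HIT/200 4120 GET http://www.example.org/news.html - NONE/- text/html",
    "857352360.789 900 10.0.0.5 TCP_MISS/200 2048 GET http://news.example.com/ - DIRECT/news.example.com text/html",
]
print(top_sites(sample, 2))
```

Raw counts favour hosts that are already popular; weighting by bytes or by misses only would be a natural refinement.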

4. Conclusion

This paper has outlined a dual mirroring/caching system implemented to improve access to relevant World Wide Web information from Mozambique. The system's primary goal is to alleviate the problems caused by limited bandwidth.

References

[1] N. G. Smith, "The UK National Web Cache: A State of the Art Report," in Proceedings of the Fifth International World Wide Web Conference, vol. 28 of Computer Networks and ISDN Systems, pp. 1407-1414, Elsevier, May 1996.

[2] D. Neal, "The Harvest Object Cache in New Zealand," in Proceedings of the Fifth International World Wide Web Conference, vol. 28 of Computer Networks and ISDN Systems, pp. 1415-1430, Elsevier, May 1996.

[3] A. Bestavros, "WWW Traffic Reduction and Load Balancing through Server-Based Caching," IEEE Concurrency, vol. 5, no. 1, pp. 56-67, January-March 1997.

[4] B. N. Schilit, F. Douglis, D. M. Kristol, P. Krzyzanowski, J. Sienicki, and J. A. Trotter, "TeleWeb: Loosely connected access to the World Wide Web," in Proceedings of the Fifth International World Wide Web Conference, vol. 28 of Computer Networks and ISDN Systems, pp. 1431-1444, Elsevier, May 1996.

¹ This is done on an ad hoc basis until meaningful results can be obtained from the log files.

² A search site proxy is a proxy that downloads the top search matches in the background.

