J. J. Eksteen and J. P. L. Cloete
Division for Communication and Information Networking Technology,
CSIR, Pretoria, 0001, RSA
April 2, 1997
Access to the WWW from Mozambique is severely restricted
by the very limited bandwidth between Mozambique and the rest
of the Internet. To alleviate this problem, a dual caching/mirroring
mechanism was implemented between Eduardo Mondlane University
in Maputo, Mozambique, and the CSIR site in Pretoria, South Africa.
1 Introduction
The Centro de Informática da Universidade Eduardo Mondlane (CIUEM),
Maputo, Mozambique, would like to provide improved access to international
World Wide Web (WWW) sites from within Mozambique. Because the bandwidth
between Mozambique and the rest of the world is severely limited
(a 9.6 kbps connection with compression) and the demands placed on
this link are high (see Figure 1), a method must be found to maximize
utilization of the available bandwidth.
Figure 1: A typical example of the monthly utilization
of the international link from CIUEM to the rest of the world.
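To put the 9.6 kbps figure in perspective, the sketch below computes roughly how much data such a link can carry. It assumes the link runs at its nominal rate around the clock and ignores both compression gains and protocol overhead, so the results are order-of-magnitude estimates only; the 30-day month and the 50 kB page size are illustrative values, not measurements from Figure 1.

```python
# Back-of-envelope capacity of the CIUEM international link.
# Assumptions: nominal 9.6 kbps, link busy 100% of the time,
# compression and protocol overhead ignored.
LINK_BPS = 9_600                 # bits per second
SECONDS_PER_DAY = 24 * 60 * 60


def max_bytes(seconds: int, bps: int = LINK_BPS) -> int:
    """Bytes transferable in `seconds` at `bps` bits per second."""
    return bps * seconds // 8


def transfer_seconds(size_bytes: int, bps: int = LINK_BPS) -> float:
    """Time to move `size_bytes` over an otherwise idle link."""
    return size_bytes * 8 / bps


per_day = max_bytes(SECONDS_PER_DAY)
per_month = max_bytes(30 * SECONDS_PER_DAY)   # 30-day month assumed

print(f"per day:    {per_day / 1e6:.1f} MB")              # per day:    103.7 MB
print(f"per month:  {per_month / 1e9:.2f} GB")            # per month:  3.11 GB
print(f"50 kB page: {transfer_seconds(50_000):.1f} s")    # 50 kB page: 41.7 s
```

Even on an idle link, a single modest page takes tens of seconds, and every user shares the same daily ceiling.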
Caching proxies are standard practice where bandwidth is a problem,
either because it is scarce or because it is expensive. They usually
take the form of client-based (proxy) caching and rely on locality
of reference to improve access performance. Access patterns in
distributed information systems exhibit three locality of reference
(LOR) properties: temporal (a recently accessed object is likely
to be accessed again), geographical (an object accessed by one client
is likely to be accessed by a nearby client) and spatial (an object
near an accessed object is likely to be accessed). The current usage
profile in Mozambique makes it possible to exploit these properties
to improve information access through a dual caching/mirroring proxy
system. The implementation of the system is discussed in Section 2,
possible enhancements are mentioned in Section 3, and we conclude
in Section 4.
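A conventional caching proxy relies mainly on the temporal property described above: an object is fetched once and reused while it stays in the cache. The minimal least-recently-used (LRU) cache below is a generic sketch, not Squid's actual storage or replacement policy; the class name, the `fetch` callback and the capacity are illustrative assumptions. It counts how many requests are served locally instead of over the link.

```python
from collections import OrderedDict
from typing import Callable


class LRUCache:
    """Least-recently-used cache that counts hits and misses."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, url: str, fetch: Callable[[str], bytes]) -> bytes:
        if url in self.store:
            self.hits += 1
            self.store.move_to_end(url)       # now the most recently used
            return self.store[url]
        self.misses += 1
        body = fetch(url)                     # a miss would cross the slow link
        self.store[url] = body
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)    # evict the least recently used
        return body


cache = LRUCache(capacity=2)
for u in ["a", "b", "a", "a", "c", "b"]:
    cache.get(u, lambda url: url.encode())
print(cache.hits, cache.misses)   # 2 4
```

Only repeated requests produce hits, which is why a small user base gives a purely reactive cache few opportunities to help.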
2 Implementation
The user base for Web use in Mozambique is relatively small at this stage, so few objects are requested repeatedly. It can therefore be assumed that relying on temporal LOR alone would not populate the proxy cache sufficiently, and a pro-active approach is followed instead. Because the user base is small and fairly uniform, it is easier to determine common areas of interest, which exploits a form of geographical LOR. Objects relevant to these areas can be collected regularly to ensure that current information is available. In the WWW context, pages a number of link levels below the primary object are also fetched to exploit spatial LOR. The small user base and pro-active multi-level caching make it easier to develop a culture of cache use: if the information in the cache is sufficiently up to date, users will prefer it to waiting for a copy that might not be much more current.
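The multi-level prefetch can be pictured as a depth-limited breadth-first walk over a site's link graph. The function name and the toy link graph below are hypothetical; a real mirror process would extract the links by parsing each retrieved HTML page rather than reading them from a dictionary.

```python
from collections import deque
from typing import Dict, List


def pages_to_prefetch(links: Dict[str, List[str]], start: str, depth: int) -> List[str]:
    """Pages reachable from `start` within `depth` link levels, in BFS order."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, level = queue.popleft()
        if level == depth:
            continue                      # do not follow links below the depth limit
        for target in links.get(page, []):
            if target not in seen:        # fetch each page only once
                seen.add(target)
                order.append(target)
                queue.append((target, level + 1))
    return order


# Hypothetical site: home page -> sections -> articles -> images.
site = {
    "/": ["/news", "/about"],
    "/news": ["/news/1", "/news/2"],
    "/news/1": ["/news/1/photo"],
    "/about": ["/"],
}
print(pages_to_prefetch(site, "/", 1))  # ['/', '/news', '/about']
print(pages_to_prefetch(site, "/", 2))  # ['/', '/news', '/about', '/news/1', '/news/2']
```

Each extra level can multiply the number of pages fetched, so the depth is the main knob for trading link time against cache coverage.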
Currently a number of sites of interest are identified¹. An automatic process retrieves each site's home page and a specified number of further pages (by recursively following HTML links on the retrieved pages) through the proxy. Instead of writing the retrieved pages to the local disk, the mirror process discards them, both to conserve system resources and to avoid possible copyright conflicts. Because the proxy acts as intermediary, cacheable pages end up in its cache exactly as if a client had accessed them. When a client later requests such a page, it is served from the cache and not over the congested international link. The process can be run in off-peak hours to maximize bandwidth utilization without competing with other access activities. To prevent excessive accesses to international sites outside Southern Africa, a mirror that is updated more frequently than the CIUEM site is maintained at the CSIR. Fetching a current copy of a URL will then cross the second international link (SA-USA) less often.
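The paper does not give the exact wget invocation. One way to express a discard-after-fetch run through a proxy with a present-day GNU wget is sketched below; the options used (`--recursive`, `--level`, `--no-directories`, `--delete-after`, and `-e` with the `use_proxy` and `http_proxy` wgetrc commands) exist in current wget releases, but the 1997 version may have differed. The function name and proxy address are hypothetical.

```python
from typing import List


def wget_prefetch_command(url: str, proxy: str, depth: int) -> List[str]:
    """Build a wget command that warms a proxy cache and keeps no local copies."""
    if depth < 1:
        # wget treats --level=0 as unlimited recursion, so refuse it here.
        raise ValueError("depth must be at least 1")
    return [
        "wget",
        "--recursive",              # follow HTML links on retrieved pages
        f"--level={depth}",         # ...but only this many levels down
        "--no-directories",         # do not recreate the site's directory tree
        "--delete-after",           # remove each file once it has been fetched
        "-e", "use_proxy=on",       # route requests through the proxy...
        "-e", f"http_proxy={proxy}",  # ...at this address
        url,
    ]


cmd = wget_prefetch_command("http://www.example.org/", "http://proxy.example:3128/", 2)
print(" ".join(cmd))
```

A scheduler could pass the returned list to `subprocess.run(cmd, check=True)` during off-peak hours, in line with the off-peak operation described above.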
In several cases this setup is of no benefit to users, because the content is generated on demand: dynamic pages (CGI scripts, Active Server Pages, etc.), advertising sites and search engines.
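Some of these cases can be screened out before a URL is added to the prefetch list. The heuristic below is a hypothetical sketch: the markers it checks (a query string, a `/cgi-bin/` path, and `.cgi`, `.asp` or `.pl` suffixes) are common signs of generated content, but they will not catch every dynamic page, and they do not address advertising sites.

```python
from urllib.parse import urlsplit

# Hypothetical markers of generated (per-request) content.
DYNAMIC_PATH_MARKERS = ("/cgi-bin/",)
DYNAMIC_SUFFIXES = (".cgi", ".asp", ".pl")


def looks_dynamic(url: str) -> bool:
    """Guess whether prefetching `url` is pointless because it is generated."""
    parts = urlsplit(url)
    path = parts.path.lower()
    if parts.query:                       # e.g. find?q=... from a search form
        return True
    if any(marker in path for marker in DYNAMIC_PATH_MARKERS):
        return True
    return path.endswith(DYNAMIC_SUFFIXES)


for u in ["http://www.example.org/index.html",
          "http://www.example.org/cgi-bin/search",
          "http://www.example.org/news.asp",
          "http://search.example.com/find?q=Maputo"]:
    print(u, looks_dynamic(u))
```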
To keep the system as cost-effective and simple as possible,
we used freely available tools: the Squid caching proxy
(http://squid.nlanr.net) and the GNU wget mirroring software
(ftp://prep.ai.mit.edu/pub/gnu/wget-xxx.tar.gz), which can retrieve
pages through a proxy. The design of the system is shown in
Figure 2(a), and the process is explained in Figure 2(b).
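A CIUEM proxy backed by the more frequently updated CSIR mirror can be modelled as a chain of caches in which a miss at CIUEM is forwarded to the CSIR before anything goes overseas. This parent-child arrangement is an assumption made for illustration and does not reproduce the configuration actually used; the class name, the log format and the example URL are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CacheTier:
    """A caching proxy that forwards misses to `parent` (None = go to origin)."""
    name: str
    parent: Optional["CacheTier"] = None
    store: dict = field(default_factory=dict)

    def fetch(self, url: str, log: list) -> str:
        if url in self.store:
            log.append(f"{self.name}: HIT {url}")
            return self.store[url]
        log.append(f"{self.name}: MISS {url}")
        if self.parent is not None:
            body = self.parent.fetch(url, log)    # ask the next tier first
        else:
            log.append(f"origin: FETCH {url}")    # crosses the SA-USA link
            body = f"<copy of {url}>"
        self.store[url] = body                    # keep a copy for next time
        return body


csir = CacheTier("CSIR")
ciuem = CacheTier("CIUEM", parent=csir)

url = "http://www.example.org/"                  # hypothetical site of interest
csir.store[url] = "<copy prefetched at CSIR>"    # CSIR mirror run already fetched it

log: list = []
ciuem.fetch(url, log)   # CIUEM miss, CSIR hit: no overseas request
ciuem.fetch(url, log)   # now served directly from CIUEM
for entry in log:
    print(entry)
```

The second international link is crossed only when both tiers miss, which is the effect the more frequently updated CSIR mirror is meant to produce.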
3 Enhancements
Several enhancements are possible. These include better selection
of mirrored material through analysis of the log files, enforced
use of the proxy through firewalling, an alternative connection
used solely for the cache, and more neighboring local caches.
Alternatives such as the TeleWeb system [4], search site proxies²
and intelligent agents might also be investigated.
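Selecting mirrored material from the log files could start from a simple count of requests per origin host. The sketch below assumes Squid's native access.log format, in which the requested URL is the seventh whitespace-separated field; the function name, the cut-off `n` and the sample log lines are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_sites(log_lines, n: int = 10):
    """Rank origin hosts by request count in a native-format Squid access.log.

    Assumed field order: time elapsed client code/status bytes method URL ...
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue                              # skip malformed or truncated lines
        host = urlsplit(fields[6]).hostname       # field 7 holds the requested URL
        if host:
            counts[host] += 1
    return counts.most_common(n)


# Illustrative lines only, not data from the CIUEM proxy.
sample = [
    "862495821.123 1200 10.0.0.5 TCP_MISS/200 5120 GET http://www.example.org/a.html - DIRECT/www.example.org text/html",
    "862495830.456 40 10.0.0.7 TCP_HIT/200 5120 GET http://www.example.org/a.html - NONE/- text/html",
    "862495900.789 2300 10.0.0.5 TCP_MISS/200 900 GET http://www.example.net/ - DIRECT/www.example.net text/html",
]
print(top_sites(sample, 2))  # [('www.example.org', 2), ('www.example.net', 1)]
```

Hosts that rank highly but are not yet on the mirror list would be natural candidates to add.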
4 Conclusion
This paper has outlined a dual mirroring/caching system implemented to improve access to relevant World Wide Web information from Mozambique. The system's primary goal is to alleviate the problems caused by limited bandwidth.
[1] N. G. Smith, "The UK National Web Cache: A State of the Art Report," in Proceedings of the Fifth International World Wide Web Conference, vol. 28 of Computer Networks and ISDN Systems, pp. 1407-1414, Elsevier, May 1996.
[2] D. Neal, "The Harvest Object Cache in New Zealand," in Proceedings of the Fifth International World Wide Web Conference, vol. 28 of Computer Networks and ISDN Systems, pp. 1415-1430, Elsevier, May 1996.
[3] A. Bestavros, "WWW Traffic Reduction and Load Balancing through Server-Based Caching," IEEE Concurrency, vol. 5, pp. 56-67, January-March 1997.
[4] B. N. Schilit, F. Douglis, D. M. Kristol, P. Krzyzanowski, J. Sienicki, and J. A. Trotter, "TeleWeb: Loosely Connected Access to the World Wide Web," in Proceedings of the Fifth International World Wide Web Conference, vol. 28 of Computer Networks and ISDN Systems, pp. 1431-1444, Elsevier, May 1996.
¹ This is done on an ad-hoc basis until meaningful results can be obtained from the log files.
² A proxy that downloads the top search matches in the background.