Prefilling a Cache: A Satellite Overview

Ivan LOVRIC <ivan.lovric@cnet.francetelecom.fr>
Eric MASCHIO-ESPOSITO <eric.maschioesposito@cnet.francetelecom.fr>
Didier GUÉRIN <didier.guerin@cnet.francetelecom.fr>
France Telecom - CNET
France

Pierrick FILLON <pierrick.fillon@cec.eu.int>
European Commission
Belgium

Abstract

Today, satellites are becoming major vectors of information diffusion on the Internet. They are well suited to cache prefilling because they allow large volumes of data to be transferred at high speed (up to 45 Mb/s) and distributed simultaneously to several reception dishes. Once this prefilled content is in the cache, users benefit from better access times to the stored pages. In this context, the satellite improves the quality of service for the end user by optimizing satellite links and transferring large volumes of data only when traffic on the network is low.

1. Introduction

The topics detailed here follow the first experiments related to cache prefilling in a satellite context, conducted in 1998 [1]. This work has already been presented at the 44th IETF meeting to the UniDirectional Link Routing (UDLR) and Web Replication and Caching (WREC) working groups.

This document describes the multicast cache content delivery service designed and currently under test at France Telecom. This service runs on an intranet and uses a network infrastructure that now combines terrestrial and satellite nodes. It also uses the cache prefilling module previously developed.

The first goal of our work was to extend the previously created satellite platform by associating it with a terrestrial Internet Protocol (IP) multicast network. The multicast diffusion of predefined content to be prefilled into a cache was then tested on this platform over a heterogeneous network, using the reliable multicast protocol MFTP [2].

Afterwards, our work consisted of designing and testing the components of the multicast cache content delivery service, which are described in the following sections.

For complete efficiency, this service needs to be easily integrated into a company's information data system. For this reason, we also performed integration tests with the Tivoli software distribution framework.

2. Design of the new multicast diffusion platform

We based our new diffusion platform on the platform on which we began our research on cache prefilling by satellite, as described in figure 1.

In this context, an Integrated Services Digital Network (ISDN) connection needed to be initialized by the client before any transfer; it was the only way to identify the IP address of the client-side proxy in order to route the data packets toward the satellite link. It was then possible to activate the cache prefilling process, which uses the ICP protocol [3]. This resulted in faster access to the predefined Internet content.

We will now detail the different requirements of the extended platform.

To meet these requirements, we have built an open diffusion architecture over satellite that is compliant with network topologies that already exist in companies. Indeed, it makes it possible to preserve the original infrastructure (classic unicast routing devices) while benefiting from multicast technologies and the broadcast services intrinsic to satellites. Open architecture means that this platform can also use a standard satellite resource to offer both the traditional services (television, for example) and intranet services, which was one of our goals.

Figure 2 describes the solution under consideration. Note that the architecture remains compatible with our previous experiment. We define a parallel multicast network (left side of the figure) that is linked to the corporate network through the dishes installed at all downlink points. The return link is now available through the terrestrial unicast network.

3. Multicast cache content delivery experiment with predefined content

We need a solution that provides simultaneous, reliable content delivery to hundreds of receivers in the same time it would take to distribute to only one receiver with a unicast-based method. For this experiment, the content distribution solution had to require no change to the network infrastructure. It also had to give us control over bandwidth usage.

For these reasons, we chose the Starburst Omnicast software, based on the MFTP transfer protocol, which can simultaneously distribute content to any receivers over any IP network, regardless of network type, receiver operating system, number of receivers, or content size.

As shown in figure 2, the main MFTP server is connected to both networks, multicast and unicast. The first network card is configured for the multicast and the satellite link. The second is plugged into the corporate network. This main server maintains the Omnicast database, which stores all the hosts, the groups (associated with hosts), and the public addresses (multicast addresses) to which both groups and end-user clients must listen. We have identified and defined several downlink points, each corresponding to a standalone user (small office) or to a distant corporate site (i.e., a distant business unit).

At the downlink points, we have installed a specific gateway named FanOut that relays the multicast packets to the final clients. Each FanOut gateway maintains a local database that contains all the host names and unicast IP addresses it serves. All acknowledgements are returned over the terrestrial unicast network (i.e., the corporate network) to the main Omnicast server.

Finally, the process consists of identifying a target multicast group, identifying the content to deliver, creating the associated Omnicast outbox job, and placing the content in the outbox.
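
As an illustration, the following minimal Python sketch models these four steps. All names here (the MulticastGroup class, directory paths, host names) are hypothetical; on the real platform, groups and jobs are defined through the Omnicast administration tools rather than through code.

    import os
    import shutil
    from dataclasses import dataclass, field

    @dataclass
    class MulticastGroup:
        name: str
        public_address: str                # multicast address the group listens to
        hosts: list[str] = field(default_factory=list)   # unicast hosts served

    # 1. Identify the target multicast group (normally stored in the Omnicast database).
    group = MulticastGroup("branch-offices", "239.1.2.3",
                           hosts=["cache-siteA", "cache-siteB"])

    # 2. Identify the content to deliver (here, a pre-built archive of Web pages).
    content = "/data/prefill/top100.zip"

    # 3. Create the outbox job associated with the group (illustrative directory)...
    outbox = f"/omnicast/outbox/{group.name}"
    os.makedirs(outbox, exist_ok=True)

    # 4. ...and place the content in the outbox so that MFTP picks it up and sends it.
    shutil.copy(content, outbox)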

After the content is received on the targeted cache, the prefilling process must be activated.

We have demonstrated that, with this architecture, we could deliver a predefined content to

The process described is fully compatible with unicast networks, with low-cost adaptation for satellite and terrestrial access. It can be installed regardless of company size.

4. TOP 100 component

The purpose of the "Top100" service is, first, to determine the 100 most popular Web pages specific to one or more communities of surfers. Second, the application refreshes the history, then aggregates, compresses, and diffuses this Top 100 close to the end users. It thereby optimizes Web page access by prefilling the cache to which the client browser is connected.

Hypothesis: All users are located at two or more distant sites of a company and share the same interests.

  1. A process analyzes the log files of both the Site A and Site B caches, then establishes the Top 100 of the most popular Web pages stored on these caches.
  2. These Uniform Resource Locators (URLs) are updated or downloaded (as well as the first link level of each URL).
  3. A second process compresses all these URLs into a zip file, to which the URL descriptor file is added [1] (a sketch of steps 1 through 3 follows this list).
  4. This zip file is transferred via satellite using the MFTP transfer protocol.
  5. The MFTP listening clients at each site simultaneously receive this file and deposit it on a Squid cache.
  6. The Squid cache is then prefilled with the decompressed content of this zip file, so that all the Top 100 URLs are now locally available.
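
A minimal sketch of steps 1 through 3 is given below, assuming Squid-format access logs (where the requested URL is the seventh field of each line); file names and paths are illustrative, and the real Top 100 application may differ in its details.

    import zipfile
    from collections import Counter

    def top_urls(log_paths, n=100):
        """Step 1: count URL hits in the Site A and Site B cache logs, keep the top n."""
        hits = Counter()
        for path in log_paths:
            with open(path) as log:
                for line in log:
                    fields = line.split()
                    if len(fields) > 6:
                        hits[fields[6]] += 1   # URL field of a Squid access.log line
        return [url for url, _ in hits.most_common(n)]

    urls = top_urls(["siteA/access.log", "siteB/access.log"])

    # Step 2 (updating or downloading the pages) is omitted here.
    # Step 3: package everything with a URL descriptor file into a zip archive.
    with zipfile.ZipFile("top100.zip", "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("urls.txt", "\n".join(urls))  # stand-in for the descriptor file
        # the downloaded pages themselves would be added with archive.write(...)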

In conclusion, we noticed during the experiment that access to the Internet through the corporate network required less bandwidth, that the latency for the updated URLs was reduced, and that overall network performance was better.

Remark: With this system, we are laying the groundwork for an automatic community generation service; with the Top 100, users benefit from the experience and browsing content of all other users at the corporate site. This reduces the time it takes them to access the information they need.

5. Cache content delivery to users with mobility constraints

All the experiments we previously conducted work with the notion of diffusion groups, where a group identifies only the IP addresses of the targeted machines.

In an intranet environment, users must be able to access every application they need, regardless of the workstation to which they are logged on. We are now more concerned with portals than with IP addresses or computers. Consequently, the personalization of content delivery must take into account user data rather than machine data. We therefore need to dynamically locate the user in order to deliver the personalized cache content to him, independent of the computer to which he is logged on and of the site where his laptop is plugged in. The user is then able to surf offline, accessing the pages stored on his local hard disk through a specific Web portal based on the URL descriptor file [1]. In addition to the local cache prefilling service, we can also send video (not yet streaming), learning files, and other content.

To solve this problem, we use both the company's Lightweight Directory Access Protocol (LDAP) directory server and a Microsoft NetMeeting client associated with an ILS server. The directory is used to handle persons and groups of persons through their names and searchable attributes, and to obtain their e-mail addresses. The connection to an ILS server allows the application to obtain the IP address associated with an e-mail address.

There are several requirements to fulfill before the process can work properly.

The process follows these steps (a minimal sketch of steps 2 through 4 appears after the list):

  1. Through a specially designed Web application, we can access the corporate directory to manipulate groups and/or individual users and create several diffusion groups based on user names.

  2. The intranet Web server executes a script that uses the LDAP protocol [4] to access the corporate directory server. This directory server gives the Unique Identifier (UID) of all the members of the diffusion group.

  3. A specific function named uid2adrIp dynamically converts any user ID to the corresponding IP address. To do so, the function contacts the ILS server to get the IP address of the computer where the user is logged on (when a new NetMeeting session is opened, the ILS server adds the user name and e-mail address to the directory database, gets the host IP address, and changes the client state to Active). This function uses a Hypertext Transfer Protocol (HTTP) access to the ILS Web server. It can also be achieved through an LDAP access to SiteServer if SiteServer is used as the ILS server (also implemented).

    If a user moves to another workstation or another site and opens a new NetMeeting session, a new call to the function is necessary to return the correct IP address.

    Remark: This feature also allows the diffusion of slides or documents during a conference to the end users rather than to machines, which is particularly useful if attendees have mobility constraints.

  4. At this point, it is possible to take advantage of all these elements to correlate the user name with the computer name. We create either a new multicast group or update a predefined group in the Omnicast database.

  5. We then complete the process by creating and submitting a new job to the MFTP server. The personalization is now effective: the user can be reached anywhere, and the system delivers the updated cache content to him.
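
The following minimal sketch illustrates steps 2 through 4 using the ldap3 Python library. The server names, directory layout, attribute names, and ILS schema are assumptions; the uid2adrIp function described above may also use an HTTP query to the ILS Web server instead of LDAP.

    from ldap3 import Server, Connection

    CORP_DIR = Server("ldap.corp.example")            # corporate LDAP directory (assumed)
    ILS_DIR = Server("ils.corp.example", port=1002)   # ILS/SiteServer reached over LDAP (assumed)

    def diffusion_group_uids(group_cn):
        """Step 2: return the UIDs of all members of a diffusion group."""
        conn = Connection(CORP_DIR, auto_bind=True)
        conn.search("ou=groups,dc=corp,dc=example", f"(cn={group_cn})",
                    attributes=["memberUid"])
        return list(conn.entries[0].memberUid.values) if conn.entries else []

    def uid2adrIp(uid):
        """Step 3: map a user ID to the IP address of the host on which the
        user's NetMeeting session is registered with the ILS server."""
        conn = Connection(ILS_DIR, auto_bind=True)
        conn.search("ou=dynamic", f"(cn={uid})", attributes=["ipAddress"])
        # "ipAddress" stands in for whatever attribute the ILS schema really uses.
        return str(conn.entries[0].ipAddress) if conn.entries else None

    # Step 4: correlate users with hosts; the result is then written into the
    # Omnicast multicast group (see the dynamic group creation in section 8.2).
    targets = {uid: uid2adrIp(uid) for uid in diffusion_group_uids("mobile-users")}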

The mobile user can now surf offline on his laptop.

We have successfully tested all of these stages on our corporate LAN with static IP addresses, except the local portal, which is still being developed. Also, the process of dynamic creation of the multicast groups within Omnicast raised an issue, and a specific development is described later in the Issues section.

Moreover, if the user's laptop is connected to the network via DHCP [5], things become more complex. It will then be necessary to dynamically reconfigure all the Starburst Omnicast FanOut gateways with a dynamic table of dependent hosts, which is currently not implemented.

6. Security management

The data transmitted through the satellite are intended to be used only by selected receivers. Unfortunately, the multicast satellite transmission protocol does not take security into account. Consequently, the data (M) must be sent in such a way that only authorized receivers are able to use them. A way to achieve this is to use cryptographic techniques [6, 7]: the data are sent enciphered, so only the receivers that know the appropriate deciphering key are able to recover the clear content.

To minimize the volume of transmitted data, M is enciphered once with a single key for all the receivers. This session key (Ksession) changes every time new data M are sent. In addition, each receiver has a personal key (Ki), shared with the transmitting station, which is used to carry the session key to that receiver. This allows us to select the receivers (only the hosts we have defined in our Starburst Omnicast database) among all the potential receivers (all dish-equipped computers), and moreover among all the authorized receivers (our Starburst Omnicast defined groups).

Since the uplink server knows all the potential receivers, a one-pass symmetric key transport mechanism is preferred over one based on public key cryptography. If there are N receivers, there are N additional fields carrying the session key; these fields are small compared with the data M. The receiver's key is stored in a secured memory, for example a smart card. The transmitting station uses a diversification mechanism to derive the keys of all the receivers from a single key called the master key.
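
As an illustration, here is a minimal sketch of this scheme, assuming AES-GCM for the enciphering and an HMAC-based diversification of the master key; the algorithms and message layout actually used on the platform are not specified here and may differ.

    import hashlib
    import hmac
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    MASTER_KEY = os.urandom(16)   # held only by the transmitting station

    def receiver_key(receiver_id: bytes) -> bytes:
        """Diversify the master key into the personal key Ki of one receiver
        (the same Ki is stored in the receiver's smart card)."""
        return hmac.new(MASTER_KEY, receiver_id, hashlib.sha256).digest()[:16]

    def build_transmission(data: bytes, receiver_ids: list) -> tuple:
        """Encipher the data M once with a fresh session key, then wrap that
        session key once per authorized receiver (N small additional fields)."""
        k_session = AESGCM.generate_key(bit_length=128)
        nonce = os.urandom(12)
        payload = AESGCM(k_session).encrypt(nonce, data, None)
        key_fields = {rid: AESGCM(receiver_key(rid)).encrypt(nonce, k_session, None)
                      for rid in receiver_ids}
        return nonce, payload, key_fields

    def recover(nonce, payload, key_fields, receiver_id: bytes, ki: bytes) -> bytes:
        """A receiver unwraps the session key with its own Ki, then deciphers M."""
        k_session = AESGCM(ki).decrypt(nonce, key_fields[receiver_id], None)
        return AESGCM(k_session).decrypt(nonce, payload, None)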

This key management method allows us to address a Starburst Omnicast group in one pass, with the data M enciphered only once, saving computation time and bandwidth. More sophisticated key establishment techniques have been proposed for key sharing within a group, but it seems that none of them is designed for one-pass transmission with low bandwidth and individual key management.

This security management component will soon be tested on the platform.

7. Integration tests with Tivoli software distribution

This experiment aims to combine the benefits of both systems: the management capabilities of Tivoli software distribution and the multicast delivery capabilities of the satellite platform.

When applied to a cache content delivery process, the first step (see figure, stage 1) is to build a Tivoli file package block, based on the cache content to be delivered, using the wcrtfpblock command [8].

Then, in stage 2, it is necessary to dynamically create an appropriate Starburst multicast group of targeted gateways, based on the list of subscribers to the distribution, obtained with the wgetsub command, and on the gateways to which these end-points are connected, obtained with the wep command. The fpblock is then copied into the Omnicast outbox directory and sent over the satellite link to the multicast group of gateways using MFTP (stage 3).

If this process fails, an analysis of the Starburst logs returns an event to the TMR; on success, the final installation of the fpblock is operated remotely from the TMR using the wdistfpblock command (stage 4). If the end-points are caches, the cache prefilling process is then activated automatically. Finally, in stage 5, each Tivoli user agent returns an installation status message to the TMR.
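
The following sketch outlines these five stages in Python. Only the wcrtfpblock invocation follows the syntax given in section 8.3; the arguments of the other Tivoli commands, the file package name @CacheContent, and all paths are placeholders, not actual syntax.

    import shutil
    import subprocess

    STAGING = "/var/tmp/fpblocks"            # temporary build area (see section 8.3)
    OUTBOX = "/omnicast/outbox/cache-job"    # assumed active Omnicast outbox directory

    def run(cmd):
        """Run a Tivoli or Starburst command line and return its output."""
        return subprocess.run(cmd, shell=True, check=True,
                              capture_output=True, text=True).stdout

    # Stage 1: build the file package block from the cache content to deliver.
    run(f"wcrtfpblock -a @CacheContent {STAGING} @SendingMachine")

    # Stage 2: list the distribution subscribers and their gateways (placeholder
    # arguments), then create or update the Starburst multicast group of gateways
    # as described in section 8.2.
    endpoints = run("wgetsub @CacheContent").split()
    gateways = {run(f"wep {ep}").strip() for ep in endpoints}

    # Stage 3: copy the complete fpblock into the Omnicast outbox; MFTP then
    # sends it over the satellite link to the multicast group of gateways.
    shutil.copy(f"{STAGING}/CacheContent.fpblock", OUTBOX)

    # Stage 4: on successful transfer, install the fpblock remotely from the TMR
    # (command name from the text above, arguments are placeholders).
    run("wdistfpblock @CacheContent")
    # Stage 5: each Tivoli user agent then reports its installation status to the TMR.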

This content delivery system has been tested at France Telecom and works correctly. To achieve this, the multicast platform was extended with a Tivoli framework consisting of one TMR, one gateway, and two end-points.

The main purpose of this experiment was to show the ability of the cache content delivery service to be completely integrated within a global multicast software distribution system.

The content delivered to caches is processed in exactly the same way as any other software from the TMR point of view.

Some improvements are still necessary to completely automate the process of extracting the list of target gateways with the wep command from the list of distribution subscribers. Also, the Starburst log analyzer still needs to be developed to generate events in case of transmission failures over the satellite link. We must also verify the ability of Tivoli to work properly with users having mobility constraints and with the security management component.

8. Issues raised during the experiments

When testing the platform and activating the different components of the cache content delivery service, several issues were raised. We have tried to find a proper solution to each specific problem.

8.1. Routing of IP-multicast packets through the satellite link

Because of the asymmetry of the network when transmitting packets over the satellite link, the routing of the multicast packets from the last IP-multicast router to the OpenMux (IP/DVB gateway) and then to the satellite was impossible. The OpenMux was not a member of the multicast group, even though the address of the host over the satellite link belonged to the same address class as the OpenMux. It was therefore not possible for the router to get the MAC address of the destination host through an ARP request.

To solve this problem, we installed another MFTP client between the last multicast router and the OpenMux. After including this client in the multicast group, the multicast packets could then reach the OpenMux and finally the destination host.

There were other ways to solve the problem.

8.2. Dynamic multicast groups

The Starburst Omnicast product has an Application Programming Interface (API) that allows commands to be issued to the MFTP engine. This interface was written for interoperability purposes. Every command that can be issued by the SBCLI application can also be issued through this API.

Therefore, it is possible to create jobs and to run or stop them from this interface.

However, this API does not give access to group and host features. Thus, it is still not possible to create or modify hosts and groups dynamically using the API.

To develop these features, it was necessary to access and modify the Starburst dispatcher database by creating a specific application that modifies the HOST and GRP_MEM (Group Members) tables. This application reads the host records stored in the group file mgrp.hst and writes them into the dispatcher database in a dedicated multicast group named MGRP.

An example of host group file mgrp.hst follows:

    192.190.248.1 c-spi-il.caen.cnet.fr
    192.190.248.2 c-spi-eme.caen.cnet.fr
    192.190.248.3 c-spi-pa.caen.cnet.fr
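
A minimal sketch of such an application is shown below, using sqlite3 as a stand-in for the actual dispatcher database engine; the column names of the HOST and GRP_MEM tables are assumptions.

    import sqlite3

    GROUP_NAME = "MGRP"   # dedicated multicast group used by the current version

    def load_group_file(path="mgrp.hst"):
        """Read the '<ip-address> <host-name>' records from the host group file."""
        with open(path) as f:
            return [line.split() for line in f if line.strip()]

    def register_hosts(db_path, records):
        """Insert each host into HOST and attach it to MGRP in GRP_MEM."""
        db = sqlite3.connect(db_path)
        for ip, name in records:
            db.execute("INSERT INTO HOST (NAME, IP_ADDR) VALUES (?, ?)", (name, ip))
            db.execute("INSERT INTO GRP_MEM (GRP_NAME, HOST_NAME) VALUES (?, ?)",
                       (GROUP_NAME, name))
        db.commit()

    register_hosts("dispatcher.db", load_group_file())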

Remark: Further work on this topic will consist of creating or modifying a group in the GROUP table rather than always using the dedicated MGRP group. It will then be possible to create different groups with different multicast addresses and to send particular content to each of these groups simultaneously by launching a specific job per group.

8.3. Transmission of huge fpblocks

The Tivoli wcrtfpblock command creates a file package block based on a file package previously defined in Tivoli using the desktop.

This creation process takes some time, depending on the file size and on Tivoli's internal processing.

If we consider that the creation of the huge block is done directly within the directory of the active outbox job of Starburst by running the following command:

    wcrtfpblock -a @FilePackageName StarburstOutboxDirectory @SendingMachine

then part of the fpblock is sent by Starburst automatically before the whole file has been written to the outbox directory. Moreover, the MFTP clients send normal completion messages after receiving the truncated file, so it is not possible to detect the truncation by checking the completion messages or the log files.

To solve this problem, we simply created the fpblock in a temporary directory and ran a copy command from the temporary folder to the outbox directory. This works fine for files up to 50 Mbytes, but the problem remains for bigger files (hundreds of Mbytes).

It is also possible to launch the outbox job only after the file has been completely written to the outbox directory, rather than keeping the job permanently active.
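
The following minimal sketch combines both workarounds: the block is created in a temporary directory, the process waits until the file has stopped growing, and the file is then moved (rather than copied) into the outbox so that it appears there in one step when both directories are on the same file system. Paths, the assumed output file name, and the quiet period are illustrative.

    import os
    import shutil
    import subprocess
    import time

    TMPDIR = "/var/tmp/fpblocks"             # temporary build area (illustrative)
    OUTBOX = "/StarburstOutboxDirectory"     # active Omnicast outbox (illustrative)

    def wait_until_complete(path, quiet_seconds=10):
        """Return once the file size has stopped changing for quiet_seconds."""
        last = -1
        while True:
            size = os.path.getsize(path)
            if size == last:
                return
            last = size
            time.sleep(quiet_seconds)

    # Build the block in the temporary directory (command syntax from the text above).
    subprocess.run(f"wcrtfpblock -a @FilePackageName {TMPDIR} @SendingMachine",
                   shell=True, check=True)

    block = os.path.join(TMPDIR, "FilePackageName.fpblock")   # assumed output name
    wait_until_complete(block)
    shutil.move(block, OUTBOX)   # the complete file only now becomes visible to MFTP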

9. Conclusions and further work

The experiments have shown the ability of the platform to test the components of the cache content delivery service designed at France Telecom, as well as its high interoperability with other complex systems such as Tivoli software distribution. The platform will soon be updated to be compliant with the UDLR protocol [9].

Notwithstanding several issues that have already been discussed with Starburst, the MFTP protocol works properly and performs well in a heterogeneous network using satellite links.

With this architecture, it was easy to demonstrate that cache prefilling by satellite works correctly in a multicast context for all the tested components.

Further work will first consist of extending the multicast cache content delivery service by improving the Top 100 component. Rather than using log analysis, we will use cache content analysis based on artificial intelligence [10, 11]. The results will be more accurate, and it will then be possible to automatically determine user profiles and aggregate them into communities of interest. The cache content sent over the satellite link will then be specifically targeted for each community.

Then, the cache content delivery service will be tested in an operational context in the health care domain. As presented in the state of the art conducted in the EU "THEN" project (http://www.nt.spr.it/projects.html), satellite connections can be beneficial for telemedicine applications. This is typically useful when medical staff require second opinions from remote specialists for an unknown or complex pathology. In such a case, large medical datasets, composed of pictures in various formats (CT, MRI, biosignals), video, text (patient record files), or sounds (echography), need to be transmitted quickly, under secured protocols, to different medical centers. In that context, the cache content delivery service over a satellite link allows the datasets to be transmitted and stored in the cache of each remote hospital. The remote specialists can thus access complete medical information quickly and locally and provide an appropriate diagnosis.

Such a facility will be included in the European Union funding programs as an extension of the existing Mobile Medical Data (MOMEDA) architecture (http://momeda.cpr.it), which aims at sharing diagnostic information between specialists with roaming constraints.

10. References

[1] C. Goutard, I. Lovric, and E. Maschio-Esposito, "Prefilling a cache, a satellite overview", draft-lovric-francetelecom-satellites-00.txt, France Telecom, February 1999.

[2] K. Miller, K. Robertson, A. Tweedly, and M. White, "Starburst Multicast File Transfer Protocol Specification", draft-miller-mftp-spec-03.txt, April 1998.

[3] D. Wessels, and K. Claffy, "Internet Cache Protocol (ICP), version 2", RFC 2186, National Laboratory for Applied Network Research/UCSD, September 1997.

[4] M. Wahl, T. Howes, and S. Kille, "Lightweight Directory Access Protocol (v3)", RFC 2251, Critical Angle, Netscape, ISODE, December 1997.

[5] R. Droms, "Dynamic Host Configuration Protocol", RFC 1531, Bucknell University, October 1993.

[6] B. Schneier, "Applied Cryptography", 2nd edition, 1997.

[7] D. Boneh and M. Franklin, "An Efficient Public Key Traitor Tracing Scheme", CRYPTO '99.

[8] "TME 10, Software distribution reference manual version 3.6", Tivoli Systems, September 1998.

[9] E. Duros, W. Dabbous, H. Izumiyama, N. Fujii, and Y. Zhang, "A Link Layer Tunneling Mechanism for Unidirectional Links", draft-ietf-udlr-lltunnel-02.txt, INRIA, WIDE, HRL, June 1999.

[10] L. Lancieri, "Automated organization of caches architecture", WebNet 98 AACE conference (Orlando) 1998.

[11] L. Lancieri, "Distributed Multimedia Document Modeling", IEEE Joint Conference on Neural Networks (Anchorage) 1998.

Disclaimer

The opinions expressed in this article are those of the authors and do not necessarily reflect the view of the European Commission.