Web Caching Meshes: Hit or Miss?
Ton VERSCHUREN <firstname.lastname@example.org>
André de JONG <email@example.com>
Ingrid MELVE <firstname.lastname@example.org>
This paper presents a cost-benefit analysis of the SURFnet Web-caching mesh, which consists of around 20 "children" that connect to a single "parent." The parent itself has "neighbor" relationships with several other Dutch Internet service providers (ISPs) and several foreign ISPs. The analysis shows that on every level of the mesh -- institutional cache, top-level cache, and the whole mesh -- the benefits of caching exceed the costs in an economic sense. Furthermore, a significant reduction in latency can be achieved.
This document describes a comparison between the costs and benefits of operating a World Wide Web caching service. It is part of the SURFnet Web Caching project carried out within the framework of the DESIRE project (Development of a European Service for Information on Research and Education, Ref. 1). The DESIRE project is part of the Telematics for Research Sector of the Fourth Framework Programme funded by the European Union. The main goal of the caching project is to investigate caching as a tool to reduce the growing demand on network bandwidth. Earlier results of the DESIRE caching project include a survey of caching requirements (Ref. 2) and a report on Web caching architectures (Ref. 3), where more information on caching in general can be found.
The comparison between costs and benefits is based on an analysis of the statistics collected from the top-level cache in the Netherlands, maintained by SURFnet, the Dutch Internet service provider (ISP) for the academic and research community, and from the first-level cache of Utrecht University (UU).
The first section gives a short introduction of caching, a description of the SURFnet mesh, and an overview of the costs and benefits involved. Then, the savings for the top-level cache, its neighbors, the first-level cache, and the whole mesh are presented. In section 3 the reduction in latency is calculated, followed by an estimate of the reduced load on the origin servers. Section 5 lists the costs of the various components of the model, and section 6 compares the costs and the benefits and summarizes the results.
Web traffic has grown enormously and, without any action, will continue to grow at the same rate. The network will clog up and the load on origin servers will become unacceptably high. Both effects cause increasing latency: people all over the world will experience long waits to get requested objects from the Net. One way to decrease the network load is to install a Web caching service. Caching migrates copies of requested objects from origin Web servers to a place closer to the clients. Essentially, once the object pointed to by a URL has been cached, subsequent requests for the URL will result in the cached copy being returned, and little or no extra network traffic will be generated. Caching proxies can be arranged hierarchically, so that caching proxies cache URLs from other caching proxies. In this case the caches can be identified as first-level caches, second-level caches, and so on.
Web caching servers may cooperate in a mesh where they share load and distribute requests. A Web caching server in a mesh will pass an unfulfilled request on to another Web caching server. The simplest mesh consists of two servers chained one after the other: if a request cannot be answered by the first cache server, it is passed on to the second and then on to the origin Web server. Another way is to build a network of servers that have different relations to one another; this may be done with the help of the Internet Cache Protocol (ICP, Ref. 5), which takes care of the communication between Web cache servers.
Figure 1 gives a schematic model of a simple hierarchical architecture of a caching mesh: Clients request objects from a first-level cache. That first-level cache has a sibling relation with another first-level cache and has a top-level cache as parent. In the figure, the several data streams which play a part in the caching mesh are shown.
Figure 1. A schematic model of a hierarchical architecture of a caching mesh.
In principle it functions as follows: Requests for Web objects by clients are sent to a first-level cache server. If the requested object is present on that cache and is neither out-of-date nor recently modified on the origin server, the object is sent to the client from the cache, so traffic for these objects is restricted to the local or metropolitan network between the cache server and the specific client group. If the object is not present on the first-level cache, that cache asks its neighbors and the top-level cache whether the requested object is present on one of those caches. These requests follow the ICP, which uses small UDP packets to minimize the traffic between caches. The software used (Squid) supports ICP.
The neighbors and top-level cache inform the first-level cache, again following ICP, whether the requested object is present or not. If the object is present at one of the neighbors or the top-level cache, the first-level cache requests the object from that cache over TCP. In some cases, if the object is small enough, it is enclosed in the answering UDP packets. One difference between a parent and a sibling is that a parent can retrieve documents from the network at the request of a hierarchically lower caching server (a child) and will send them to the child. A sibling, however, will send documents to the child only if they are present in its own cache, without retrieving them from the network. In this case traffic is restricted to the (local or national) backbone. If neither the neighbors nor the top-level cache has the requested object at its disposal, the top-level cache follows the same scenario as described above: it asks its neighbors whether the object is present and, if so, retrieves it. If not, the top-level cache will get the object from the origin Web server, and, usually, more expensive (e.g., international) bandwidth will be necessary.
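The parent/sibling distinction described above can be sketched in a few lines of code. This is a simplified illustration only, not the actual Squid or ICP implementation; the class and function names are invented for this sketch, and the ICP exchange is reduced to a local store lookup:

```python
# Simplified sketch of the parent/sibling resolution logic described above.
# Illustration only -- not the real Squid/ICP implementation.

class Cache:
    def __init__(self, name, parent=None, siblings=()):
        self.name = name
        self.store = {}                 # url -> cached object
        self.parent = parent            # may fetch from the network for us
        self.siblings = list(siblings)  # answer only from their own store

    def lookup_local(self, url):
        """ICP-style check: is the object in this cache's own store?"""
        return self.store.get(url)

    def request(self, url):
        # 1. Serve from the local store if the object is present.
        obj = self.lookup_local(url)
        if obj is not None:
            return obj
        # 2. Ask siblings; a sibling answers only from its own store
        #    and never fetches from the network on our behalf.
        for sib in self.siblings:
            obj = sib.lookup_local(url)
            if obj is not None:
                self.store[url] = obj
                return obj
        # 3. A parent will retrieve the object from the origin server
        #    if necessary -- the most expensive path.
        obj = self.parent.request(url) if self.parent else fetch_from_origin(url)
        self.store[url] = obj
        return obj

def fetch_from_origin(url):
    # Stand-in for a retrieval over (international) bandwidth.
    return f"<object at {url}>"
```

A child configured with a parent and a sibling will thus try its own store, then the sibling's store, and only then escalate to the parent, which mirrors the traffic hierarchy (local net, national backbone, international link) described in the text.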
Figure 2 gives a schematic view of the Web caching mesh architecture of SURFnet on July 8, 1997. The lower part of the figure shows the relations inside SURFnet. A top-level Web caching server is used as a parent cache by Web caching servers of SURFnet customers. Clients from the different institutes are connected to the institutional first-level cache server. Some of the first-level cache servers are engaged in a sibling relation to other first-level cache servers.
The top-level cache has a sibling relation to cache servers of other Dutch ISPs, shown in the upper left part of the figure. The upper right part shows the dual parent relations with UNINETT, where the UNINETT top-level cache is parent for all documents in the .NO domain, and the SURFnet top-level cache is parent for all documents in the .NL domain. The other top-level neighbor relations are installed for caching documents in the .COM domain (as part of the so-called COM-Mesh experiment, Ref. 6), which was expected to decrease Web traffic from and to the US, where the .COM domain mainly resides.
Figure 2. The SURFnet Web caching mesh
The benefits of a caching service are obvious: The different network segments to which the first-level caches belong handle only a fraction of the traffic, and part of the traffic is handled by the national backbone instead. This means an important reduction of international traffic on the Internet and an important reduction of the load on the origin Web servers, which in turn reduces latency for objects from those servers. Besides the reduction in external traffic, another effect of caching is that objects from the caches are delivered in a much shorter time than objects received from the origin Web server, because the objects are stored closer to the clients. This means a much shorter waiting time for these objects.
These benefits can be expressed in reduced costs and higher quality of service by the provider: savings on (international) bandwidth, reduced latency for the users, and a reduced load on the origin servers.
A less measurable benefit, but one that should not be underestimated, is the improved quality of service experienced by users and providers, as they will see an improved Web performance.
The costs of realizing a caching service are the hardware for the cache servers, the personnel needed to maintain them, and the caching software (which, in the case of Squid, is free).
In this document an analysis is made of these costs and benefits, based on an analysis of the top-level cache from SURFnet and the first-level cache of Utrecht University, as well as of the difference measured in elapsed time between the delivery of objects from the caches and from the origin servers.
A key element in a cost/benefit analysis of caching is the price of the network bandwidth. Indeed, the volume of the hits on a cache can be directly translated into bits per second saved, and bits per second means money! The different tariff models of the various ISPs, however, vary substantially. For example, a commercial ISP in the Netherlands does not charge for traffic on its own network, but charges only for "international" traffic. UNINETT, the Norwegian R&E ISP, on the other hand, charges for the use of national and international bandwidth depending on the nominal access capacity. On top of that, customers have to pay the costs of an access line to the nearest Point of Presence (PoP), a function of distance and bandwidth.
In this paper we take the SURFnet cost model, where PTT Telecom, a Dutch telecom operator, charges a fixed price for the total backbone of 34 Mbps (14 PoPs). The main customers, i.e., universities, sit on the backbone and hence do not pay costs for access lines. Therefore, we will only take into account the variable costs of the 34 Mbps connections of these customers, which means a fairly low price for national bandwidth. The customers with access lines (from 64 kbps to 2 Mbps) to the backbone are left out of consideration in this paper.
Another important element of the cost model is the existence or absence of a close-by Internet Exchange Point (IXE). A well-functioning and well-connected national IXE means that traffic for destinations outside the R&E ISP's network, but within the country's top-level domain (e.g., .nl), can be regarded as national traffic. In the absence of such an IXE, that type of traffic will in most cases have to be considered as "international" because it will cross the national border. As SURFnet is connected to the Amsterdam Internet Exchange (AMS-IX) and so are most other Dutch ISPs, all traffic for the .nl domain is considered as national in this report.
The effect of caching on consumed bandwidth can be obtained from an analysis of the traffic on the caching servers. The traffic on the cache is measured in terms of counted UDP packets, TCP packets, and the objects delivered by the cache expressed in Mbytes. A distinction is made between the part of the traffic that is delivered by the cache itself -- hit objects -- and the part that was not present on the cache -- misses. The part of traffic that is delivered by the cache means a reduction in traffic upstream of the cache and its clients and thus less need for expensive higher bandwidth, or put otherwise: The current bandwidth can be used over a longer period before an upgrade is necessary.
In this section, the traffic saved is calculated for different caches in the mesh: the top-level cache, the neighbors of the top-level cache, the first-level cache of UU, all first-level caches in the mesh, and finally the whole mesh.
Figure 3 shows the total amount of Mbytes delivered from the top-level cache to its clients, measured over a day, during the first half of 1997, and the part which is delivered by the cache itself, i.e., the hits in Mbytes. The represented data apply only to working days, i.e., Monday through Friday.
Figure 3. Daily total volume and hit volume of the top-level cache.
The figure shows an increase of the daily Web traffic from January to June by about a factor of 3, from 1 Gbyte a day in January to about 3.4 Gbytes a day in June. The gaps in daily traffic at the end of March and in the beginning of April are attributed to the holidays in those months. In June the increase in traffic on the top-level cache seems to saturate somewhat, which may be due to the start of the holidays.
The volume of hits on the top-level cache increases in that period from 100 Mbytes a day to 500 Mbytes a day, i.e., by a factor of 5 over the first half of 1997. The fraction of the total traffic delivered by the top-level cache (the "hit-rate") increases in that period from 10% to 15% of the total volume. Figure 4 shows the hit-rate measured over a day. The spread in the daily measured hit-rate is about 10%.
Figure 4. Volume of the hit-rate (%) of the top-level cache.
Table 1 gives the monthly totals in Gbytes on the top-level cache:
Table 1. The total volume, the volume of the hits, and the hit-rate of the top-level cache over a month.
If we take June as representative for the present situation, we see in that month a total traffic flow of 73.3 Gbytes through the top-level cache, of which 12.6 Gbytes are hits. This means a saving of 17% on non-SURFnet ("international") traffic.
The saving of 12.6 Gbytes on international traffic in a month means that, over the course of a month, 12.6 Gbytes are delivered by the top-level cache and no international bandwidth is needed for this part. To transport 12.6 Gbytes over a month, a bandwidth of 8 * 12.6 Gbytes / 2.6 Msec = 40.8 kbit/sec is necessary; this bandwidth is now saved by the top-level cache.
This is not a realistic measure, however. The traffic is mostly generated on working days during working hours, i.e., about 20 days of 8 hours each in a month, or 576 ksec, and the bandwidth should be able to transport the 12.6 Gbytes in that time. This means a bandwidth of at least 184 kbit/sec is needed. This calculation is still based on a full, i.e., 100%, occupation of the Net. To avoid congestion, the capacity of the connection should be over-dimensioned by, for example, 2.5 times the nominal bandwidth (i.e., 40% load). The realistic bandwidth needed for the transportation of 12.6 Gbytes in a month will then be about 460 kbit/sec.
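The conversion above can be reproduced as follows. The units are an assumption that matches the paper's figures: 1 Gbyte = 2^30 bytes, 1 kbit = 1024 bits, and a month of 30 days:

```python
# Reproduce the bandwidth conversion above (assuming binary units:
# 1 Gbyte = 2**30 bytes, 1 kbit = 1024 bits, a month = 30 days).
hit_volume_bits = 12.6 * 2**30 * 8          # hits served in June, in bits

month_sec = 30 * 24 * 3600                  # ~2.6 Msec
flat_kbit = hit_volume_bits / month_sec / 1024
print(round(flat_kbit, 1))                  # 40.8 kbit/sec

work_sec = 20 * 8 * 3600                    # 20 working days of 8 hours = 576 ksec
work_kbit = hit_volume_bits / work_sec / 1024
print(round(work_kbit))                     # 184 kbit/sec

realistic_kbit = 2.5 * work_kbit            # over-dimension to 40% load
print(round(realistic_kbit))                # 459 kbit/sec, "about 460"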
To express the bandwidth saved in money saved, we should distinguish different charges. On the SURFnet backbone the variable costs of a 34 Mbit/sec connection are ECU 1800 per month (see also section 1.4). Within Europe, 2 Mbit/sec costs ECU 16500 per month and outside Europe (i.e., to the US) the same bandwidth costs ECU 26500 per month.
To apply these different charges, the traffic from the cache is divided into domain groups based on the domain names in the addresses. Traffic to domains outside Europe and the Netherlands (for instance, .us, .ca, .au, .com, and .gov) is aggregated in a group "outside Europe." Traffic to countries in Europe (for instance, .no, .be, and .uk) is aggregated in a domain group "Europe," and the .nl domain forms the group "The Netherlands." Because the majority of the hosts in the .nl domain are connected to SURFnet, we assume that all .nl traffic is SURFnet traffic to simplify the calculations. Moreover, thanks to the Amsterdam Internet Exchange, there is no additional cost involved in reaching non-SURFnet .nl domains (refer to section 1.3).
Table 2 shows the distribution over these different domain groups of the saved 12.6 Gbytes on the top-level cache. Conversion to bandwidth is done following the method described above, and the different charges are applied. The total savings on the top-level cache due to savings in bandwidth are about ECU 3700 a month.
Table 2. Volume of hits on top-level cache, converted to bandwidth and costs
The savings on traffic caused by the neighbor relations with the top-level cache are less easy to analyze. The log-files on the top-level cache give only the ICP requests from the neighbors and the size of the delivered objects in Mbytes. Not shown is the delivery to the top-level cache by its neighbors.
An indication of the efficiency can be given as follows: In June, the top-level cache delivered about 7.9 Gbytes to its neighbors. On the other hand, the neighbor relations generate traffic, too, in the form of UDP requests and answers. In a month, the top-level cache gets 17.7 million incoming UDP requests from its neighbors. These requests are answered, generating another 17.7 million UDP answers. In 0.6 million of these answers the requested object was already enclosed, leaving a net overhead of about 35 million UDP packets for requests and answers between neighbors. Based on the standard for UDP, the UDP packet size is estimated to be about 64 bytes. This is confirmed by measuring, for different neighbors, the number of UDP requests in a week and the UDP traffic involved in bytes.
If a UDP packet size of 64 bytes is used, the overhead on UDP traffic is about 2.3 Gbytes. Compared to the objects delivered, a net result of about 5.6 Gbytes remains.
If we assume the same proportion of the traffic (in fact: the hits) applies to outside Europe as was measured on the top-level cache in June, i.e., 74%, the traffic saved outside Europe by the neighbors is about 4.1 Gbytes, corresponding to a bandwidth of 149 kbit/sec. This equals a savings of ECU 1925 in a month. We will assume that the volume delivered from the top-level cache to its neighbors is approximately the same as that delivered from the neighbors to the top-level cache. We will therefore use the figure of ECU 1925 to calculate the savings of the total mesh in section 2.6.
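The neighbor bookkeeping above can be written out as follows. All inputs are the paper's own figures; the 64-byte packet size and the 74% outside-Europe fraction are the estimates stated in the text, and the final cost comes out slightly above the quoted ECU 1925 because of intermediate rounding:

```python
# Reproduce the neighbor-traffic bookkeeping above, using the paper's
# estimates (64-byte UDP packets, 74% of hits going outside Europe).
delivered_gb = 7.9                       # objects sent to neighbors in June

udp_packets = 2 * 17.7e6 - 0.6e6         # requests + answers, minus answers
                                         # that already carried the object
overhead_gb = udp_packets * 64 / 1e9     # ~2.2 Gbytes (paper rounds to 2.3)
net_gb = delivered_gb - 2.3              # ~5.6 Gbytes net benefit

outside_europe_gb = round(0.74 * net_gb, 1)   # ~4.1 Gbytes outside Europe

# Convert to bandwidth as before: 576 ksec of working time, 40% load.
bits = outside_europe_gb * 2**30 * 8
kbit = 2.5 * bits / (20 * 8 * 3600) / 1024
print(round(kbit))                       # 149 kbit/sec

# A 2 Mbit/sec transatlantic line costs ECU 26500 per month.
ecu = kbit * 26500 / 2048
print(round(ecu))                        # ~ECU 1930; the paper reports 1925
```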
The same analysis as was done for the top-level cache could also be done for a first-level cache, to get an idea of the savings of bandwidth on a first-level cache.
Figure 5 gives the daily totals in Mbytes delivered by the first-level cache of Utrecht University to its clients, and the volume of the hits. It shows a trend similar to, but less pronounced than, that measured on the top-level cache: an increase of about a factor of 2, from 450 to 850 Mbytes a day, in the first half of 1997.
Figure 5. Daily traffic volume and hit volume of the first-level cache UU
The volume of hits on the first-level cache increased in that period from about 100 Mbytes a day in January to about 200 Mbytes a day in June, resulting in a hit-rate of about 25%.
The hit-rate is higher than was measured on the top-level cache, as expected: the first-level cache serves requests from a more homogeneous user group. Moreover, the top-level cache provides the first-level cache with objects that were not yet available there; the next time the same object is requested by a client, it is delivered by, and registered as a hit for, the first-level cache.
Figure 6 gives the hit-rate measured over a day in the first half of 1997.
Figure 6. Hit-rate (%) of the volume of the first-level cache UU
Table 3 gives the monthly totals of traffic on the first-level cache and the hit-rate.
Table 3. The total volume, the volume of the hits, and the hit-rate of the first-level cache per month
Taking June again as representative for the present situation gives a saving of 4.8 Gbytes on the UU first-level cache. The distribution of the hit part of the traffic over the domain areas is shown in table 4. The same calculations are applied as were done for the top-level cache.
Table 4. Volume of hits on first-level cache, converted to bandwidth and costs
Part of the savings in bandwidth to the neighbors of this first-level cache can be calculated as follows:
In June, 0.5 Gbytes was delivered from the UU first-level cache to its neighbors. The number of UDP requests in June was 1.8 million. Including the UDP answers, this gives traffic of 3.6 million UDP packets. If we again assume a mean packet size of 64 bytes, the overhead in UDP packets is 0.2 Gbytes, so a net result of 0.3 Gbytes is transferred from the first-level cache to its neighbors. Expressed in bandwidth and costs for the part of this traffic outside Europe: 8.1 kbit/sec, i.e., ECU 104 a month.
As figures are not known yet from the other first-level caches, an estimate must be made based on the numbers from the top-level cache and the UU first-level cache.
As was shown in table 1, the top-level cache delivered a total of 73.3 Gbytes in June to the first-level caches below it. The UU first-level cache delivered 17.9 Gbytes to its clients, of which 4.8 Gbytes were delivered from the cache itself, about 27%. The UU first-level cache thus received 13.1 (17.9 - 4.8) Gbytes from the top-level cache, so the top-level cache delivered 60.2 Gbytes to the other first-level caches. Assuming the hit-rate on those caches is also about 27%, a total of 82.5 Gbytes was delivered to the clients of those first-level caches, including 22.3 Gbytes from the caches themselves. If we assume the same distribution over the domain areas as before, the savings in bandwidth can be estimated as in table 5.
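The extrapolation above can be written out as follows; all figures come from the text, and the 27% hit-rate for the other first-level caches is the stated assumption:

```python
# Extrapolate the savings of the other first-level caches (figures from
# the text; the 27% hit-rate for the other caches is the assumption made).
top_total_gb = 73.3        # delivered by the top-level cache in June
uu_total_gb = 17.9         # delivered by the UU cache to its clients
uu_hits_gb = 4.8           # served from the UU cache itself

uu_from_top = uu_total_gb - uu_hits_gb          # 13.1 Gbytes
others_from_top = top_total_gb - uu_from_top    # 60.2 Gbytes

hit_rate = uu_hits_gb / uu_total_gb             # ~0.27
# If 27% of client traffic is served locally, the 60.2 Gbytes received
# from the top-level cache is the remaining 73% of the client traffic:
others_total = others_from_top / (1 - 0.27)     # ~82.5 Gbytes
others_hits = others_total - others_from_top    # ~22.3 Gbytes
print(round(others_total, 1), round(others_hits, 1))   # 82.5 22.3
```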
Table 5. Estimated hit volume, bandwidth saved, and corresponding costs saved for the complete collection of cache children of the top-level cache
A remark must be made on the hit-rate of 27% based on the result of the first-level cache UU. This cache is known to perform more poorly than most other university first-level caches, where hit-rates of about 50% are normal. This is caused by the more diverse disciplines of the users at the UU; a technical university, for instance, has a more homogeneous group of users. Such a group will fetch the same documents more often and therefore shows a higher caching hit-rate. Since 27% was assumed for all first-level cache servers, the calculated savings in bandwidth and costs are a lower limit; the actual savings can be about a factor of 2 higher.
The total savings on bandwidth by caching based on the June 1997 figures are approximately as follows:
Table 6. Savings for the total SURFnet mesh
Note that only the bandwidth for US connections is listed, as this is the major component. The volumes and costs saved nevertheless cover the whole of each mesh component.
From the table we learn that the caching mesh as implemented in SURFnet saves 1 Mbps on transatlantic bandwidth or the equivalent of kECU 14 per month.
The reduction in latency can be distilled from the measured elapsed-time statistics. The total number of TCP requests for Web objects on the cache is traced, as are the number of TCP hits (i.e., the requested object is present in the cache) and the number of misses (i.e., the object is not in the cache and must be fetched from the origin server). If the object is in the cache, the time to deliver it to the client is much shorter than when the object has to be fetched from the origin server. The difference in time between these two situations is measured as well and can be used for an estimate of the reduced latency.
For instance, in the month of June, 1.19 million hits were counted on the top-level cache, with an average elapsed time of 1.56 sec per request (on a heavily loaded caching server). This gives a total waiting time of 513 hours. If the requested object was not on the top-level cache (a miss), the average elapsed time per request was 9.90 sec. Had the 1.19 million hits been misses, with an average elapsed time of 9.90 sec, the total wait time would have been 3266 hours. Compared to the 513 hours of wait time for hits, this results in a time saving of about 2750 hours over a month. Strictly, this time has to be corrected for the time spent using the cache itself. Frequent measurements (every 10 minutes) on the connections to the UU first-level cache and the SURFnet top-level cache give an average round-trip time of about 1 sec; for the 1.19 million requests this amounts to about 330 hours. So the net saving in time per month is about 2400 hours.
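The wait-time bookkeeping above can be reproduced as follows, using the rounded figures from the text; the small differences against the paper's 513 and 3266 hours come from rounding in the inputs:

```python
# Reproduce the wait-time bookkeeping for the top-level cache in June
# (rounded inputs from the text, so totals differ slightly from the paper).
hits = 1.19e6              # TCP hits counted in June
t_hit = 1.56               # average elapsed time per hit (sec)
t_miss = 9.90              # average elapsed time per miss (sec)
t_cache = 1.0              # measured round-trip time to the cache (sec)

wait_hits = hits * t_hit / 3600        # ~516 h (paper: 513)
wait_if_miss = hits * t_miss / 3600    # ~3272 h (paper: 3266)
overhead = hits * t_cache / 3600       # ~330 h spent asking the cache
net_saving = wait_if_miss - wait_hits - overhead
print(round(net_saving))               # ~2426 h, "about 2400 hours"
```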
This is an amazing result; over a year, 28800 hours is equivalent to 17 full-time equivalents (FTEs), saving kECU 785 for the clients of the top-level cache.
The same calculation for the Utrecht University first-level cache gives a gain of 900 hours in a month, which means a saving of 10800 hours or 6.5 full-time equivalents per year: kECU 290. Measurements on the other 18 first-level caches are not available yet, but if their savings in wait time are of the same order, the total time saved, expressed in salary, is about MECU 6.
It is perhaps not realistic to count the total time saved. One must realize that the total time is divided over many clients and many requests. Every request accounts for a very small part of the total time saved. For the first-level cache UU, the 900 hours time saved should be divided over 750 unique clients, meaning a time saved of 1.2 hours per client per month.
Rather than being expressed in total hours and salary, the time saved may serve better as a measure of the improvement in quality of service as perceived by the users. For the top-level cache, the 513 hours of wait time in a month for hits, compared to the corrected 2400 hours of wait time had they been misses, gives a reduction in wait time of 79%. The clients who use the cache should notice this. The reduction in wait times gives the most important increase in perceived quality, because waiting for requested objects does irritate people. Without the cache, the waiting time would on average be almost five times as long!
The use of a caching service can remarkably reduce the load on the origin Web server, especially for Web servers with very popular pages, as these often-requested pages are placed on the cache servers. The load on the origin Web server is then restricted to the requests from the connected caching machines, rather than from masses of users asking for the same documents.
The reduction of the load on the origin Web servers can be calculated as a percentage of the traffic reduction, the requests, and the hit-rate. For example, in April 1997, one of the busiest Web servers in the world, the Netscape Web site, showed a reduction of 75% in requests and a reduction of 60% in traffic on a total of 80000 requests and a volume of 330 Mbytes on the Utrecht University cache. The top-level cache showed a reduction of 45% in requests and a reduction of 20% in traffic on a total of 75000 requests and a volume of 560 Mbytes. The (volume-weighted) average saving to the Netscape Web server is 60% in requests and 35% in traffic, as seen from these two caching servers.
The costs for the exploitation of a Web caching service depend largely on the size of the site. The different scale in use of a cache server requires different system dimensions in disk size, I/O, and processor speed. Small sites, for example, could use an existing Web server for caching service for approximately kECU 2.2. Large sites need a more powerful dedicated system for approximately kECU 7 (complete with installation, configuration, software, etc.) and 0.1 full-time equivalents for maintenance. With a write-off period of 3 years, the yearly costs will be about kECU 7.8. The top-level cache will need one or more big computer systems, with adequately educated system managers. This might count up to kECU 35 and 0.25 full-time equivalents. Including the write-off costs, the yearly costs are about kECU 24.
The total SURFnet cache mesh consists of 10 small sites, 8 large sites, and one top-level cache. The initial cost of the cache mesh is kECU 113, and yearly personnel costs 1.6 full-time equivalents, i.e., kECU 78 a year. Including write-off, in three years the annual costs are approximately kECU 115.
The purchase of caching software is not taken into account. The software used and recommended is Squid, which is free.
Table 7. Overview of costs for different systems
As stated before, the introduction of the World Wide Web has had a major impact on the use of networked information systems and on the demand on the underlying resource of network bandwidth. This has led to an enormous growth of traffic on the local, national, and international backbone. Without any action the network will be clogged up and clients will perceive a very low quality of the Net.
One option is to increase the bandwidth, but (international) bandwidth is expensive. One benefit of caching is that the clients perceive a better quality of the Net without upgrading the bandwidth. The question is whether caching is indeed cheaper.
If we take as an example the first-level cache of the UU, a large first-level cache, it was shown in section 2.3 that the volume of the hits served by the UU cache has a cost equivalence of approximately ECU 1400 per month, or kECU 16 a year. The annual costs of such a cache server are kECU 7 (see section 5), which means that caching costs kECU 9 less per year than upgrading the bandwidth. The initial costs of kECU 7 are earned back within a year (return on investment (ROI): 129%). Caching on this level is an excellent investment.
For the SURFnet top-level cache, the savings on bandwidth were around ECU 4500 a month, or kECU 55 per year (see section 2.1). The costs for that top-level cache are kECU 24 per year, so the saved costs are kECU 31 a year and the initial costs of kECU 35 are earned back in just over one year (ROI 89%). The savings for the whole mesh are around kECU 172 per year; the costs are around kECU 115 per year, which, given an initial investment of kECU 113, means an ROI of 52%. The savings on transatlantic traffic for the whole mesh amount to the equivalent of 1 Mbps.
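The ROI figures can be reproduced as follows, taking ROI as annual net savings divided by the initial investment; the small deviation for the whole mesh comes from rounding in the intermediate kECU figures:

```python
# Return on investment = annual net savings / initial investment,
# using the kECU figures quoted in the text.
def roi(annual_savings, annual_costs, initial_investment):
    return (annual_savings - annual_costs) / initial_investment

print(round(100 * roi(16, 7, 7)))       # UU first-level cache: 129 (%)
print(round(100 * roi(55, 24, 35)))     # top-level cache: 89 (%)
print(round(100 * roi(172, 115, 113)))  # whole mesh: ~50 (paper: 52%)
```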
These calculations are based on the figures measured in June 1997, which give only a snapshot of the situation, and they are "worst case": actual savings and benefits will be higher. Figure 3 shows an increase from 100 Mbytes saved a day to 500 Mbytes a day in the first half of 1997 for the top-level server. If this trend continues in the next half year with only half the growth of the first half year, the return on investment for the top-level cache will rise above 200%, and the initial investment will be earned back within half a year. Of course, such rapid growth calls for new investments in hardware, but the knowledge that those investments are earned back within a year is very reassuring.
Another benefit is the reduced latency: On average, objects are delivered to the users of the cache in 79% less time than without caching. This means a much better perceived quality of service for the users of the caching service.
Finally, the figures presented here were obtained without forcing the users to use the caching service. With caching enforced, e.g., through transparent caching, results would be spectacular (refer to the appendix of Ref. 7).
Part of this work is funded by the Telematics Applications Programme of the European Union under contract nr. RE1004 and by the SURF-ACE Programme of the SURF Foundation in the Netherlands.