Monitoring the Performance of a Cache for Broadband Customers
Bob WARFIELD <email@example.com>
Replication, or caching, of Web content offers benefits to both ISPs and users. Much of the traffic coming from the Web into an ISP's network is redundant in the sense that exactly the same content is being requested by a number of customers. The redundant traffic can be reduced by caching, hence saving costs for the ISP. Files served from the cache are served, on average, faster than from the Web. This improved speed can save customers time and money. Balanced against these two benefits are the problems of optimizing freshness of the cache contents and managing expansion of the cache as traffic demand grows.
This paper reports on work done to monitor the performance of a cache serving a group of users with broadband access via a hybrid fiber-coax cable system in Australia. Monitoring the cache performance concentrated on the following four major dimensions:
The choice of parameters to monitor was tied closely to our understanding of quality of service for this group of customers. In particular, reducing the time required to download large files is an important aspect of using broadband access to the Internet.
A management system with browser interface was developed to examine cache performance on a daily basis, with archival data and drill-down facilities. Actual performance is discussed in the paper, including improvements that were achieved.
The conclusion of the study reported in the paper is that cache performance can be improved by monitoring performance and fine-tuning caching parameters.
The objective of this study is to support the management of a cache. The primary objective is to ensure that the best possible Quality of Service is delivered to customers. The second objective is the management of the cache, including monitoring savings in WAN bandwidth, and determining when the cache needs to be expanded.
The cache described here is part of a server complex that provides authentication, the Domain Name System (DNS), e-mail, news, and so on. Overall performance metrics relating to the various other functions are monitored routinely. This paper concentrates only on the metrics chosen to examine the performance of the cache.
Hybrid Fiber-Coax (HFC) access to the Internet provides a preview, in a way, of a future broadband Internet. However, providing high performance globally for the minority of users who want it, and are willing to pay a premium, is not yet practical or economic. Hence the emphasis on caching to provide high performance.
Customers of the present HFC access service may work in telecommunications, computer or media-related industry or have some other reason for wanting high performance. Broadband access allows them to download graphics, audio, and video at high speed. Hence, speed of delivery of pages from the Web is an important indicator of Quality of Service for these customers.
Another aspect of Quality of Service is freshness. If an object is fetched from the Web and is designated as cacheable, then it will be stored in the cache and assumed to be fresh for some period of time. However, if the object is updated on the Web before its expiry date, or if it has no explicit expiry date, then it is possible that the proxy server will serve a copy of the object that is not fresh. Measuring freshness is not straightforward -- the solution presented below is based on processing the proxy logs daily.
In addition to monitoring Quality of Service indicators related to speed and freshness, we also monitor the savings being achieved and the spare capacity of the resources used. These measurements allow for the management of the cache.
In the U.S., the NLANR study has taken caching research further by considering the performance of the caching hardware platforms. Parameters such as CPU utilization, page faults and disk throughput are recorded and analyzed in order to determine future cache expansion.
The JANET caching project in the U.K. has taken an alternative perspective on caching performance by considering economic and user satisfaction factors; for example, cost savings attributed to reduced traffic and costs incurred by help desk facilities were considered. The time saved by users accessing the Internet through caches was also calculated.
A study by Wooster from Virginia Tech considered the impact of various cache-purging strategies on the retrieval times of documents.
Retrieving material from a cache over an HFC network and Cable Modem is extremely rapid. Downloading a multi-Mbyte file in a few seconds is quite typical. Streaming audio and video, served from the cache, operate at very high quality. The question is how to monitor and report on this in a way that effectively describes Quality of Service in customer terms.
Our aim is to measure, as well as possible, the experience of customers retrieving material from the Web. Demonstrating that a big improvement can be achieved is very easy, but quantifying the improvement that caching delivers, across all customers and over a period of time, is more complex.
Our first approach is to ask the hypothetical question: how much faster is the service as a result of caching? Getting a meaningful answer to this question is not straightforward, as explained below.
Individual objects are served much faster, on average, by the use of caching, but the reduction in the time taken to download an entire page is a complex quantity. Estimating the improvement in page latency requires the examination of all the files that make up the page and a comparison with the hypothetical case of no caching.
Each page may consist of many parts:
Content owners may require that some components (such as hit-counters and advertising banners) not be cached. Furthermore, each component part may be served from either RAM or disk at the remote server pointed to by the URL, or may be served from any other server on the Internet. The structure of a sample page is illustrated in Figure 1, below. The illustration shows a page with URL: www.server.com/front_page.htm. Parts of the page are cacheable, parts are non-cacheable, and one part (in this illustration, the background) is served by another host at www.other_site.com.
The experience of the customers depends on their own client cache, the access network, the proxy server cache, the WAN, and one or more remote servers. Determining how much the proxy server cache improves the customer experience is difficult. The complexities of proving the improvement in speed of downloading pages due to caching are avoided by using a simple model that allows us to summarize the improvement due to the use of a proxy server cache. We monitor the average transfer rate of objects from the cache and from the Internet. The simplifying assumption is used that all bytes transferred from cache would otherwise have to come from the Internet at the measured average transfer rate. Using this simple model, Quality of Service to customers is monitored and reported in terms of the time saved per customer per month, on average. For users with broadband access, a significant number of hours can be saved each month through caching because the service from the cache can be much faster than service from the Web. Example results are shown in the next section.
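The simple model above can be sketched in a few lines of Python. This is a minimal illustration under the stated simplifying assumption (every byte served from cache would otherwise have arrived at the measured average Web rate); the function name and the sample figures are hypothetical and are not measurements from the study.

```python
def hours_saved(cache_bytes, cache_rate, web_rate):
    """Estimate hours saved by serving cache_bytes from the cache.

    cache_rate and web_rate are average transfer rates in bytes per second.
    Assumes all bytes served from cache would otherwise have come from the
    Web at the measured average Web rate.
    """
    time_from_web = cache_bytes / web_rate      # hypothetical case: no cache
    time_from_cache = cache_bytes / cache_rate  # observed case: served from cache
    return (time_from_web - time_from_cache) / 3600.0


# Illustrative numbers only (not taken from the study):
# 360 Mbytes served from cache at 500 kbytes/s versus 50 kbytes/s from the Web.
saved = hours_saved(cache_bytes=360e6, cache_rate=500e3, web_rate=50e3)
```

Dividing the result by the number of customers in the observation period gives the "time saved per customer" figure in the form reported here.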
Another approach to quantifying speed improvement is to compare the speed with other access technologies. A 28.8 kbit/s modem is a convenient baseline at present because the technology is widespread and runs at a fairly predictable rate (as compared with ISDN, which is less widespread, or a 56k modem, which typically connects at a rate from 40 to 50 kbit/s). The maximum speed measured in transferring pages over the HFC access is over 100 times greater than that of a 28.8 kbit/s modem. However, the speed that users actually experience can be affected by limitations of the WAN or the remote server. Hence, a Quality of Service measurement must include the variation in speed experienced. As well as plotting total time saved, the management system also reports on the spread of transfer rates.
Additional techniques for measuring "page performance" include the logging and analysis of error messages, and monitoring the performance of sample pages. A configurable list of URLs is fetched at regular times throughout the day. Performance variations can indicate specific problems or merely a variation in the condition of the Internet.
If an object were known to be stale at the time of receiving a request, it would not be served from the cache. Instead a fresh copy would be retrieved using a GET command. If an object is suspected to be stale, a CONDITIONAL GET command can be used, which asks for a new copy if it has been modified since the date associated with the object in cache.
Issuing GET and CONDITIONAL GET commands too frequently can cancel the performance benefit from caching. The GET command will result in a new copy being fetched -- with the delay implied by transferring the object over the Internet. Even a CONDITIONAL GET that results in confirmation that the object is fresh will add some time to the serving of that object from cache. If a page is made up of a number of small objects, and a CONDITIONAL GET is issued for each one, several seconds can be added to the transfer time.
The expiry policy of the proxy server determines the balance between speed and freshness. Objects are given an expiry time based on:
If the content provider stipulates an expiry time, and does not update the object before that time, then the object will never be served from cache if it is "stale." However, if the object is updated before its stated (or assumed) expiry time, then it will be served "stale" until the next GET or CONDITIONAL GET command is issued.
To estimate freshness, either special-purpose "freshness sampling" processes can be run or, if the estimate is to be based on logs only, a very loose upper bound is estimated. Looking at the logs retrospectively, it can be concluded that some objects served from the cache were fresh; sometimes they were not served because they were known to be stale; and sometimes it cannot be said for certain whether they were fresh or stale when served. The measurement ambiguities and some sample results are shown in the next section.
Techniques for managing freshness aim to maximize the probability of serving a fresh object from the cache, or limit the time by which an object may be out of date when served, while retaining most of the cost savings from reducing redundant traffic.
Differentiating on type is important. In the absence of any other information about expiry time of a file or its current age, we could assume an expiry time of one day for text and one week for images. Changing those limits to 0.1 days for text and one day for images (for example) would ensure that no text object would be served if it were more than 0.1 days out of date. Although this reduces the hit ratio somewhat, it gives a substantial improvement in freshness that can be guaranteed.
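A type-based default expiry rule of this kind might look like the following sketch. The limits are the example values from the paragraph above (0.1 days for text, one day for images); the fallback for other types and all names are assumptions made for illustration, not the proxy server's actual configuration.

```python
# Example limits from the text: 0.1 days for text, 1 day for images.
DEFAULT_TTL_DAYS = {"text": 0.1, "image": 1.0}
FALLBACK_TTL_DAYS = 0.1  # assumption: treat unknown types as conservatively as text


def is_assumed_fresh(content_type, age_days, explicit_ttl_days=None):
    """Return True if a cached object may be served without revalidation.

    content_type is a MIME type such as "text/html" or "image/gif".
    An explicit expiry from the content provider, if any, takes precedence.
    """
    major_type = content_type.split("/")[0].lower()
    if explicit_ttl_days is not None:
        ttl = explicit_ttl_days
    else:
        ttl = DEFAULT_TTL_DAYS.get(major_type, FALLBACK_TTL_DAYS)
    return age_days < ttl
```

With these limits, a text object that has been in the cache for half a day is no longer served without a check, while an image of the same age still is.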
Although a CONDITIONAL GET can be used to check contents, it is not infallible. For example, a content provider may update content on January 1, then again on January 11. If the object is cached on January 12, then the date last modified will be January 11. If the content provider is not satisfied with the new version and reverts to the version of January 1, the object in cache will be more recent than the version on the content provider's site. A CONDITIONAL GET based on the "If-Modified-Since" test will decide that the object in the cache is more recent and continue to serve it.
The only sure way to freshen the cached object is by using an unconditional GET, but using this command every time would be equivalent to removing the cache.
If a new type of CONDITIONAL GET with a version number could be used, that would give an unambiguous test of whether the version of the object in cache is the one that the content provider intends to serve to the users. In practice, the best solution would be for content providers to use the "Date Last Modified" as an effective version number. When reverting to an old version, the date should be updated to the current date.
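The reverted-version problem, and the re-dating remedy suggested above, can be reproduced with a small simulation of the date comparison. This is a sketch of the test only, not of a real server; the year is arbitrary and the function name is hypothetical.

```python
from datetime import datetime


def conditional_get_status(server_last_modified, if_modified_since):
    """Simulate the origin server's date test for a CONDITIONAL GET.

    Returns 200 (send a new copy) if the server's copy was modified after
    the date supplied by the cache, otherwise 304 (not modified).
    """
    return 200 if server_last_modified > if_modified_since else 304


# The cache holds the January 11 version (the year is arbitrary).
cached_last_modified = datetime(1998, 1, 11)

# The provider reverts to the January 1 file and keeps its old date:
# the test reports "not modified", so the cache keeps serving the wrong version.
reverted_status = conditional_get_status(datetime(1998, 1, 1), cached_last_modified)

# The provider reverts but stamps the file with the current date (say January 13):
# the test now reports a change and the cache fetches the intended version.
redated_status = conditional_get_status(datetime(1998, 1, 13), cached_last_modified)
```

Here `reverted_status` is 304 and `redated_status` is 200, which is why treating the modification date as a version number only works if the date always moves forward.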
Redundant traffic can be reduced by caching, and the effect can be easily measured. However, some additional traffic may be generated by prefetching and other functions that draw traffic from the Web into the cache without a user requesting it. Also, charges for material served from cache may be less (for example, due to shorter connect time). Cost savings are estimated using a simple model.
Savings are relatively easy to measure. Figure 2 below shows the basic byte flows into and out of the cache. The flow model depicted in the figure includes the following flows in Mbytes (totaled over a suitable time period, such as one hour):
In general an ISP may attribute costs to all bytes fetched from the Web, either as a direct per-Mbyte charge, or an equivalent cost per Mbyte. Revenue will come from (any or all of) the Mbytes served, the time each customer is connected to the ISP, a flat monthly rental per customer, or some other formula.
The simplest model says that the Mbytes served to the customers equals A+B+C, while the Mbytes fetched from the Web equals A+B+F. Hence, the cache has saved (C-F) Mbytes over the period of observation. The savings are given by:
Savings = (Cost per Mbyte) x (C-F)
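As a worked instance of this formula, the sketch below computes the saving from the four flows named in the model. The letters A, B, C and F are used exactly as in the two sums above (A+B+C served, A+B+F fetched); the cost and the flow values are invented for illustration only.

```python
def cache_savings(cost_per_mbyte, a, b, c, f):
    """Return (mbytes_saved, savings) for one observation period.

    Mbytes served to customers are A+B+C and Mbytes fetched from the Web
    are A+B+F, so the cache saves (C-F) Mbytes.
    """
    served = a + b + c
    fetched = a + b + f
    mbytes_saved = served - fetched  # equals c - f
    return mbytes_saved, cost_per_mbyte * mbytes_saved


# Illustrative values only: 0.25 cost units per Mbyte.
mbytes_saved, savings = cache_savings(cost_per_mbyte=0.25, a=100, b=50, c=80, f=20)
```

If F exceeds C in a period, the result is negative, which corresponds to the case where traffic drawn in without a user request outweighs the redundant traffic avoided.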
Clearly, where WAN resources consist of international transmission links, plus all the other national and international resources needed to fetch an object from the Web, the savings can be considerable.
As a second-order effect, we also should consider possible changes in customers' habits. Using the cache, customers experience quicker downloads than otherwise. This may mean that they finish their work more quickly and therefore may spend less time on the Web, or they may find the service more useful and hence spend more time on the Web every month as a result of caching. Similarly, the volume of material that they choose to download each month may be less than, the same as, or greater than it would have been without the cache.
By monitoring spare capacity in terms of storage and CPU resources, and using trend extrapolation from historical records, it is straightforward to estimate when the cache will need to be expanded. Of course, hot-spots in time and in particular components of the caching system have to be allowed for in the estimates.
A cache requires processing and storage resources, which must be engineered to provide capacity for the peak demand. Engineering rules may require a forecast of the peak demand during the busiest half hour of the coming six months (for example).
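One simple form of trend extrapolation is a least-squares straight line through historical peak usage. The sketch below assumes one sample per day and a fixed capacity limit; it is an illustration of the idea rather than the engineering rule actually used, and it makes no allowance for hot-spots.

```python
def days_until_full(daily_peak_usage, capacity):
    """Fit a straight line to daily peak usage and extrapolate to capacity.

    Returns the number of days after the last sample at which the fitted
    line reaches capacity, or None if usage is not trending upward.
    """
    n = len(daily_peak_usage)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2.0
    mean_y = sum(daily_peak_usage) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_peak_usage))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no growth trend, so no predicted exhaustion date
    intercept = mean_y - slope * mean_x
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - (n - 1))


# Illustrative samples only: peak disk usage in Gbytes on five consecutive days.
remaining = days_until_full([50, 52, 54, 56, 58], capacity=100)
```

For these samples the fitted growth is 2 Gbytes per day, so the line reaches 100 Gbytes 21 days after the last sample.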
Proxy log files contain detailed information about every request which customers have issued over a period of time. This includes date, time, URL, HTTP status code(s), cache status and transfer time. Subject to strict privacy controls, this information can be aggregated to analyze the interactions that users have had with the cache over a period of time. This analysis is essential for contrasting the performance of the cache with the performance of the global Web. In particular, it is trivial to determine whether a requested resource was cacheable, newly created in the cache, refreshed in the cache or validated at the origin server.
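The aggregation step might be sketched as follows. The record layout (a dictionary with `bytes`, `seconds` and `from_cache` fields) is a hypothetical, already-parsed and anonymized form chosen for this example; the actual log format of the proxy server is not reproduced here.

```python
def summarize(records):
    """Aggregate parsed log records into byte hit rate and average transfer rates.

    Each record is a dict with keys "bytes", "seconds" and "from_cache".
    Returns (byte_hit_rate, cache_rate, web_rate), with rates in bytes/second.
    """
    cache_bytes = cache_secs = web_bytes = web_secs = 0.0
    for rec in records:
        if rec["from_cache"]:
            cache_bytes += rec["bytes"]
            cache_secs += rec["seconds"]
        else:
            web_bytes += rec["bytes"]
            web_secs += rec["seconds"]
    total_bytes = cache_bytes + web_bytes
    byte_hit_rate = cache_bytes / total_bytes if total_bytes else 0.0
    cache_rate = cache_bytes / cache_secs if cache_secs else 0.0
    web_rate = web_bytes / web_secs if web_secs else 0.0
    return byte_hit_rate, cache_rate, web_rate


# Illustrative records only.
sample = [
    {"bytes": 1000, "seconds": 1.0, "from_cache": True},
    {"bytes": 3000, "seconds": 1.0, "from_cache": True},
    {"bytes": 6000, "seconds": 12.0, "from_cache": False},
]
hit_rate, cache_rate, web_rate = summarize(sample)
```

Because only totals are kept, no per-customer or per-URL detail survives the aggregation, which fits the privacy constraint mentioned above.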
Providing a management system for a cache is a relatively simple task. The proxy log files must be processed periodically to obtain the required metrics, which are then made accessible via a Web server. The Common Gateway Interface (CGI) is used to provide an interactive interface, with standard HTML (Hypertext Markup Language) and GIF images as the presentation format. System managers can select a report and define the required interval of time. This allows both short- and long-term trends to be discovered. Example reports are given in the following section.
When measuring latency, what is of most interest to customers is the amount of time saved due to the presence of the cache. This figure is hypothetical since it is not practicable to observe the performance of the cache in parallel with a proxy server with no cache. Instead, one can calculate the number and volume of requests, and then extrapolate to the two extreme cases where there is no cache (zero byte hit rate) and when there is an ideal cache (100% byte hit rate). A linear extrapolation is assumed in order to gain an understanding of the effect of the byte hit rate on latency. Live results are illustrated in Figure 3.
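One reading of this linear extrapolation is a straight line between the two extreme cases, with total download time falling linearly as the byte hit rate rises. The sketch below follows that reading; the function names and the numbers are assumptions for illustration and do not reproduce the data in Figure 3.

```python
def download_time(total_bytes, byte_hit_rate, cache_rate, web_rate):
    """Total transfer time in seconds at a given byte hit rate (0.0 to 1.0).

    Linear in byte_hit_rate between the no-cache case (0.0) and the
    ideal-cache case (1.0). Rates are in bytes per second.
    """
    time_no_cache = total_bytes / web_rate       # left extreme: zero byte hit rate
    time_ideal_cache = total_bytes / cache_rate  # right extreme: 100% byte hit rate
    return time_no_cache + byte_hit_rate * (time_ideal_cache - time_no_cache)


def fraction_saved(byte_hit_rate, cache_rate, web_rate):
    """Fraction of the no-cache download time saved at the given byte hit rate."""
    with_cache = download_time(1.0, byte_hit_rate, cache_rate, web_rate)
    without_cache = download_time(1.0, 0.0, cache_rate, web_rate)
    return 1.0 - with_cache / without_cache


# Illustrative rates only: cache ten times faster than the Web, 20% byte hit rate.
t = download_time(total_bytes=1000, byte_hit_rate=0.2, cache_rate=100, web_rate=10)
saving = fraction_saved(byte_hit_rate=0.2, cache_rate=100, web_rate=10)
```

Raising `cache_rate` lowers the right-hand end of the line, so each additional point of byte hit rate saves more time.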
The current cache saves about 15% to 25% of the download time that would have been required without a cache. Potentially, this can be improved further by increasing the byte hit rate from its current value. Alternatively, if the data transfer rate from the cache can be increased, then the right extreme point will be lowered. Thus, the extrapolated curve will become steeper and the incremental gain per increase in byte hit rate is improved. Work is in progress to achieve this benefit.
Hypothetically, we can draw a comparison between the amount of time required per customer per month using broadband access and a cache with the total logged-in time that would be needed using a modem. Clearly, this is a hypothetical comparison only, and the ratio of the two times depends on assumptions made about transfer rates that could be obtained with a 28.8k or 56k modem. In general terms, the average data transfer per customer would require hundreds of hours per month using a modem.
It is not practicable using existing protocols to determine precisely the freshness of each and every response that has been cached. The reason for this is that the proxy server log files record the state of the origin server at a finite number of instants in time - when and only when a user issues a request. Some of these requests involve interactions with the origin server from which freshness-related information can be inferred, for example, a resource has been modified since it was last stored in the cache. For many responses, however, the freshness cannot be ascertained, since the proxy server simply does not know when each and every resource modification occurred at the origin server. This situation is illustrated in Figure 4.
If only successful responses to the client are considered, that is, HTTP status code 200, then there are three likely remote-side scenarios that may occur. First, if the proxy server provides a response from the cache without validating it with the origin server, then such a response may be stale. However, until further information is received, the freshness of this particular response is unresolved. Then, if a validation of the same resource occurs (with an HTTP status code of 304 from the origin server), then it is now certain that the previously unresolved responses were actually fresh. Alternatively, if the cached copy of the resource is then refreshed (with an HTTP status code of 200 from the origin server), the previously unresolved responses have an indeterminate freshness, because it is not known whether there were zero, one or many modifications of the resource since the last retrieval. The consequence of this limitation is that it is only possible to determine a lower bound of freshness or an estimate.
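The three scenarios can be turned into a retrospective pass over the log, sketched below. The event labels (`"hit"`, `"validated"`, `"refreshed"`) are hypothetical names for responses served from cache without validation, validations answered with 304, and refreshes answered with 200; this is an illustration of the logic, not the processing code used in the study.

```python
from collections import defaultdict


def classify_hits(events):
    """Classify unvalidated cache hits by looking at what happened next.

    events is an iterable of (url, kind) pairs in time order, where kind is
    "hit" (served from cache, not validated), "validated" (304 from origin)
    or "refreshed" (200 from origin).
    """
    counts = {"fresh": 0, "indeterminate": 0, "unresolved": 0}
    pending = defaultdict(int)  # hits per URL whose freshness is not yet known
    for url, kind in events:
        if kind == "hit":
            pending[url] += 1
        elif kind == "validated":
            # A later 304 shows the earlier hits were fresh.
            counts["fresh"] += pending.pop(url, 0)
        elif kind == "refreshed":
            # A later 200 cannot say when, or how often, the resource changed.
            counts["indeterminate"] += pending.pop(url, 0)
        else:
            raise ValueError("unknown event kind: %r" % (kind,))
    # Hits never followed by any contact with the origin server.
    counts["unresolved"] = sum(pending.values())
    return counts


# Illustrative event sequence only.
result = classify_hits([
    ("a", "hit"), ("a", "hit"), ("a", "validated"),
    ("a", "hit"), ("a", "refreshed"),
    ("b", "hit"),
])
```

In this sequence two hits are confirmed fresh, one is indeterminate, and the single hit on `b` stays unresolved because the origin server was never contacted again.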
As a sample, 14 days of continuous data was processed using this technique. The data collected included only non-CGI resources, since these files may be cacheable and are of concern with respect to freshness. Only successful responses (HTTP status code 200) were treated, which is a subset of the defined cacheable responses. Figure 5 illustrates the results.
Figure 5 illustrates an issue with processing proxy server logs as a means of determining freshness. Many resources are requested once, re-requested several times for a brief period and then never requested again. This leads to an excessive percentage of unresolved resources. Of the resolved requests, 87% of cached content is known to be definitely fresh and the remaining 13% may be fresh or stale. It is possible to refine this further using dates which the proxy server receives in the HTTP headers, but there will always be an indeterminate component. Estimation of freshness based on dates may then be applicable.
Caching clearly offers speed improvements that are particularly noticeable for high-speed access over an HFC network. These performance improvements can be monitored and quantified in terms of hours saved per month. The performance improvement increases as the byte hit ratio improves.
Monitoring freshness accurately is inherently difficult. Managing freshness involves a compromise. For example, putting a longer expiry time on objects could improve hit-ratio, but is not recommended due to problems of serving stale objects. Instead, frequent CONDITIONAL GET commands can be used to limit the maximum possible time between an object being updated on the Web and in the cache.
In general, monitoring a proxy server cache and fine-tuning the caching parameters can improve the speed and freshness delivered, while still retaining savings in WAN bandwidth.
The authors gratefully acknowledge the permission of Telstra to publish this paper.
Many thanks to Mani GunJur who assisted with the processing of the proxy logs for this project.