Eiji KAWAI <firstname.lastname@example.org>
Ken-ichi CHINEN <email@example.com>
Suguru YAMAGUCHI <firstname.lastname@example.org>
Hideki SUNAHARA <email@example.com>
Nara Institute of Science and Technology
In this paper, we discuss the scalability issues arising in distributed WWW caching systems that exchange meta information among caching proxies using the Internet Cache Protocol (ICP). One of the main objectives of WWW caching systems is to reduce WWW traffic on the Internet. Because the traffic generated by ICP does not usually convey WWW objects, it might be considered a waste of network resources. However, ICP is expected to improve the hit rate so that the system can in turn reduce WWW traffic on the Internet. Clearly, a system analysis based only on the hit rate is inadequate to evaluate the effect of a distributed WWW caching system. Therefore, it is indispensable to examine quantitatively how much traffic is reduced or, conversely, increased by the system. For this quantitative analysis, we evaluate the amount of overall traffic generated by ICP and HTTP in a system with siblings as well as in a system without siblings. From the results, we derive the requirements the system must meet for substantial traffic reduction. Considering the fact that the remote hit rate of a single sibling proxy is usually very low, we conclude that distributed WWW caching systems using ICP can hardly contribute to the reduction of WWW traffic.
A distributed WWW caching system consists of caching proxies that share their caches. Squid is one of the caching proxies widely used to compose a distributed WWW caching system. Squid does not manage meta data such as information about objects kept in other proxies; instead, when it receives an HTTP request for a document not stored in its local cache, it queries the other proxies using the Internet Cache Protocol (ICP).
One of the main goals of a WWW caching system is to reduce WWW traffic on the Internet. Since a local cache hit at a caching proxy does not produce any further traffic between the proxy and the origin server, a local cache hit reduces the amount of traffic generated on the Internet. On the other hand, if the proxy does not have the requested object in its local cache, ICP messages (i.e., ICP queries and ICP replies) are exchanged over the Internet. The ICP messages usually do not convey any WWW objects, so from the viewpoint of traffic volume they are pure overhead. Furthermore, the remote hit rate produced by ICP messages is very low in many cases. The number of accesses to a WWW page as a function of its access ranking is well known to follow the Zipf distribution [3, 4], which indicates that most objects are accessed only once and never again. In other words, most objects cannot be shared among the proxies. Consequently, most ICP messages can be considered a waste of network resources. To make matters worse, remote hits do not decrease traffic on the Internet: for each remote hit, the caching proxy still has to forward the HTTP request to the remote sibling proxy and obtain the object from it.
Given these circumstances, we conclude that the hit rate alone is not adequate for evaluating distributed WWW caching systems. In fact, even if the total hit rate increases due to remote hits, it is likely that ICP generates more traffic than the remote hits save.
In this paper, we focus on the total amount of traffic over the Internet generated by a distributed WWW caching system. In general, communication cost is evaluated by means of traffic volume or number of packets. For the evaluation of total traffic on the Internet, however, we define new indices that take the hop count into account. By considering the hop count, we can evaluate the total communication cost quantitatively. The goal of this paper is to make clear that distributed WWW caching systems using ICP can hardly decrease network resource utilization on the Internet.
In Section 2, we review related work. In Section 3, we introduce the Packet-Hop Index (PHI) and the Byte-Hop Index (BHI) as measures of communication cost. In Section 4, we describe a model of the communication environment. The characteristics of WWW traffic on the Internet are summarized in Section 5. Finally, we evaluate the distributed WWW caching system and discuss several design issues arising there.
Several studies closely related to the performance evaluation of the WWW system, including caching proxy services, have been carried out. However, few reports cover the evaluation of distributed proxy caching systems themselves.
Some researchers have argued that ICP degrades system performance [5, 6]. However, they only pointed out that ICP adds an extra RTT to the object retrieval time in the case of a remote cache miss. Since WWW users care about the service delay of WWW access, reducing that delay benefits them. The extra RTT, however, is not significant for WWW users, because it is only on the order of hundreds of milliseconds; users are normally irritated by service delays of seconds or more. Therefore, the performance evaluation in [5, 6] is not sufficient to reveal why the service delay becomes large. Furthermore, the benefits and drawbacks of caching proxy servers remain unclear.
In general, network performance degrades drastically when the network is highly loaded. Today, the Internet can be considered a highly loaded network because of the rapid growth of WWW services. Therefore, we can suppose that reducing WWW traffic helps decrease the service delay. Originally, the caching proxy system as well as ICP was designed as a simple, small, and efficient mechanism for reducing WWW traffic over the Internet. However, there are few reports that analyze precisely how much a caching proxy system reduces WWW traffic. These are the reasons why our research focuses on both the quantitative analysis of ICP traffic and the total traffic reduction achieved by caching proxy systems.
In this section, we introduce two indices: Packet-Hop Index (PHI) and Byte-Hop Index (BHI). Figure 1 gives a basic model of a communication between two entities: a client and a server. In this model, H denotes the hop count between the client and the server. The hop count is defined as the number of network segments in the route from the client to the server. In a single communication, we assume Tp packets are exchanged between the client and the server, and the total amount of exchanged data is Tb bytes. With this model, we can define PHI and BHI of the communication between the client and the server as follows:
PHI = Tp H
BHI = Tb H
Figure 1: Basic model of a communication
Next, we define a task. For example, a single access to a WWW object consists of ICP message exchanges and several HTTP sessions. With this observation, we can model a task (a WWW object access) as a set of communications (ICP message exchanges and HTTP sessions). We extend the definitions of PHI and BHI from a single communication to a task: the PHI (BHI) of a task is the sum of the PHI (BHI) values of its constituent communications.
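To make the definitions concrete, the task-level indices can be sketched in Python (our own illustration; the example communication values are hypothetical, not measured):

```python
# Each communication is a tuple (Tp packets, Tb bytes, H hop count).
def phi(comms):
    """Packet-Hop Index of a task: the sum of Tp * H over its communications."""
    return sum(tp * h for tp, _, h in comms)

def bhi(comms):
    """Byte-Hop Index of a task: the sum of Tb * H over its communications."""
    return sum(tb * h for _, tb, h in comms)

# Hypothetical task: an ICP query/reply exchange (2 packets, 140 bytes)
# with a sibling 3 hops away, followed by an HTTP session (17 packets,
# 6958 bytes) with the same sibling.
task = [(2, 140, 3), (17, 6958, 3)]
print(phi(task))  # (2 + 17) * 3 = 57
print(bhi(task))  # (140 + 6958) * 3 = 21294
```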
In this section, we describe the communication model of a distributed WWW caching system. Our simple model depicted in Figure 2 consists of three components: a caching proxy server, a sibling proxy server, and an origin server. When the caching proxy receives an HTTP request for an object that is not kept in the local cache, the caching proxy sends an ICP query to the sibling proxy. If the sibling proxy has the requested object in its cache, it returns an ICP reply indicating that it has the object. In this case, the caching proxy forwards the HTTP request originally sent by the client to the sibling proxy to obtain the object. When no sibling proxy has the requested object, however, the HTTP request is forwarded to the origin server.
Figure 2: Communication environment of a distributed WWW caching system
Here, we let Ho be the hop count from the caching proxy to the origin server. Also, Hs denotes the hop count from the caching proxy to the sibling proxy. We suppose p and q are the local hit rate and the remote hit rate of the caching proxy, respectively. More precisely, q is the ratio of the HTTP requests for the objects that make cache hits at the (remote) sibling proxy to all the HTTP requests received by the caching proxy.
With the definitions mentioned above, we define a task of a caching proxy as the series of communications necessary to retrieve a requested object. There are three cases of WWW object retrieval: (1) a local hit in the caching proxy, (2) a remote hit at the sibling proxy, and (3) a remote miss at the sibling proxy. In case (1), the caching proxy returns the requested object locally; therefore, no further communications are required. In case (2), two communications between the caching proxy and the sibling proxy are required: one for an exchange of ICP messages (a query and a reply), and the other to obtain the object from the sibling proxy through an HTTP connection. In case (3), two communications are also required: one for ICP with the sibling proxy, and the other for HTTP with the origin server. Note that we do not discuss the case where multiple sibling proxy servers are configured for the caching proxy server. Although using multiple sibling proxies is popular, each additional sibling only adds overhead to the ICP message handling: the communications corresponding to each sibling proxy are simply added cumulatively. Hence, the PHI and BHI are larger than in the case where only a single sibling proxy is used.
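The per-task cost of the three cases above can be sketched as follows (a Python illustration of our own; the parameter names and example values are ours, with the 2-packet ICP exchange taken from the query/reply pair described above):

```python
def task_phi(case, hs, ho, p_rh, p_am):
    """PHI of a single task under the three retrieval cases.
    hs/ho: hop counts to sibling/origin; p_rh/p_am: average HTTP
    packet counts for a remote hit and an all miss."""
    if case == "local_hit":
        return 0                       # case (1): served locally, no Internet traffic
    if case == "remote_hit":
        # case (2): ICP query + reply (2 packets) and HTTP, both with the sibling
        return 2 * hs + p_rh * hs
    if case == "all_miss":
        # case (3): ICP with the sibling, HTTP with the origin server
        return 2 * hs + p_am * ho
    raise ValueError(case)

# E.g., a remote hit with the sibling 3 hops away and P_RH = 16.92 packets:
print(round(task_phi("remote_hit", hs=3, ho=15, p_rh=16.92, p_am=18.35), 2))  # 56.76
```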
As a basis for the quantitative analysis of the caching proxy system, we examined the statistical characteristics of the WWW traffic. The statistical analysis shown in this section is based on the log data generated by the Squid caching proxy system that was in operation at our institute for a month in May 1998. During this period, the local hit rate of our Squid was 34% and the total remote hit rate was 6% with the configuration where the number of siblings was 8. The details of this analysis are also shown in the Appendix.
For the estimation of Tp and Tb, we carefully examined the size of WWW objects and the size of both ICP messages and HTTP requests. We observed that the average size of WWW objects handled by our Squid was 7356 bytes and the median was 1460 bytes. For further analysis, we classified the WWW objects into four categories based on their cache status and examined the average size for each category. In the case of a local hit, the average size of the objects obtained from the local proxy was 4606 bytes. In the case of a local miss, which means the local proxy does not have the requested object in its cache, the average was 8790 bytes. In the case of a remote hit, which means the local proxy does not have the requested object but the sibling proxy has it, the average was 6958 bytes. In the case of an all miss, which means neither the local proxy nor the sibling proxy has the requested object, the average was 8993 bytes. In the following, BLH, BLM, BRH, and BAM denote these average sizes, respectively.
The other important number we have to know is the average size of ICP messages. Because Squid's log data contains no information on the size of ICP messages, we had to estimate it from the retrieved URLs. In brief, an ICP message consists of a fixed header and a payload in which the retrieved URL is stored as ASCII text. With this observation, we derive that the average size of ICP messages is about 70 bytes.
The data exchanged between a client and a server is divided into several packets. The number of packets, however, depends on many factors: (1) the implementation of protocols such as ICP, HTTP, TCP, UDP, and IP; (2) the underlying datalink technologies; and (3) network conditions such as congestion. We made a rough estimate of the number of packets, P, exchanged for an object of size B; the derivation is given in the Appendix.
Using this estimation, we can calculate the average number of packets exchanged for the retrieval of a single WWW object over HTTP. In the cases of local hit, local miss, remote hit, and all miss, the average number of packets is 14.84, 18.21, 16.92, and 18.35, respectively. In this paper, PLH, PLM, PRH, and PAM denote these average numbers of packets. Table 1 summarizes the results.
Table 1: Summary of cache statistics

| Result     | Rate  | Object Size | Number of Packets |
|------------|-------|-------------|-------------------|
| local hit  | 34.3% | 4606 bytes  | 14.84             |
| local miss | 65.7% | 8790 bytes  | 18.21             |
| remote hit | 6.5%  | 6958 bytes  | 16.92             |
| all miss   | 59.2% | 8993 bytes  | 18.35             |
Figure 3 shows the distribution of the hop counts from our institute to origin servers, measured with the traceroute program. In this measurement, we listed the origin servers found in Squid's log data and ran traceroute against each of them. Because several WWW servers do not permit ICMP probes for security reasons, our measurements failed for more than 20% of the origin servers. Figure 3 does not include measurements for these servers.
One could argue that the hop count to an origin server may vary, because there are several alternative routes from our institute to the server. However, the hop counts we measured can be considered accurate enough for our analysis: the Internet is operated stably, and routes are unlikely to change frequently. Note that the distribution of hop counts to origin servers from other sites may differ substantially. In spite of a certain level of inaccuracy, a clear tendency appears in the distribution: most hop counts range roughly from 5 to 20, and the mode is 10.
Figure 3: Distribution of hop counts from our proxy to origin servers
The goals of our performance evaluation of the distributed WWW caching system are to examine whether the WWW caching proxy system actually reduces the WWW traffic on the Internet and to estimate the threshold where benefit produced by the system exceeds the communication overhead for the WWW cache system itself. For these goals, we apply both PHI and BHI to tasks we observed.
From our observation of our Squid, we can assume that there is no correlation between the hop count to the origin server Ho and the distribution of the packet count P or the object size B. In other words, the distributions of both P and B do not depend on Ho. With this assumption, we use the average PHI and BHI of tasks whose objects are retrieved from origin servers with hop count Ho as the performance index of the WWW caching proxy system.
In this section, we discuss our quantitative analysis on the cost associated with communications among a caching proxy, a sibling proxy, and an origin server in the distributed WWW cache architecture. Our approach is to use both PHI and BHI. Furthermore, we derive the boundary conditions where the benefit produced by the system exceeds the communication overhead for the WWW cache system. Based on these analyses, we point out the drawbacks of the distributed WWW cache system using ICP.
As this article is focused on the benefits/drawbacks of the distributed WWW caching proxy system, we apply the PHI to two cases: a caching proxy with a sibling proxy and one without a sibling proxy.
In the case with no sibling proxy, all the HTTP requests that miss the local cache are forwarded to their origin servers. Therefore, there is no overhead for handling ICP messages. We define PLM as the average number of packets exchanged between the caching proxy and the origin server. The PHI can be formulated as follows:

PHI = (1 - p) PLM Ho    (2)
A caching proxy with a sibling proxy behaves in a more complicated way. When an HTTP request misses the local cache at the caching proxy, an ICP query is sent to the sibling proxy. Since the average size of ICP queries is about 70 bytes, as mentioned in Section 5, a single UDP packet can carry each ICP query. The ICP query hits at the sibling proxy with probability q; in this case, the HTTP request is forwarded to the sibling proxy. If the ICP query misses, the HTTP request is forwarded to the origin server instead. The average number of packets exchanged between the caching proxy and the origin server is defined as PAM. Therefore, the PHI is derived as follows:

PHI = (2(1 - p) + q PRH) Hs + (1 - p - q) PAM Ho    (3)
The relative weight of the ICP overhead in the total communication cost decreases as the hop count to the origin server increases. The underlying idea of the sibling proxy is to obtain WWW objects from a proxy closer than the origin server in order to reduce the data transfer overhead. If the origin server is closer than the sibling proxy, there is no advantage to using the sibling proxy; getting WWW objects directly from the origin server is more efficient in terms of communication overhead. The remote hit rate is also an important factor: the higher the remote hit rate, the more benefit the caching proxy gains. Therefore, from Equation 2 and Equation 3, we can derive the boundary condition where the ICP overhead is equal to the benefit of the sibling proxy:
(1 - p) PLM Ho = (2(1 - p) + q PRH) Hs + (1 - p - q) PAM Ho

We can obtain the minimum required remote hit rate q by solving this equation for q.
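Solving the boundary condition for q can be sketched as below. The closed form is our own algebraic rearrangement of the equation above; the default constants are the measured packet counts from Table 1:

```python
def min_remote_hit_rate(p, ho, hs, p_lm=18.21, p_rh=16.92, p_am=18.35):
    """Minimum remote hit rate q satisfying the PHI break-even condition
    (1-p)*P_LM*Ho = (2*(1-p) + q*P_RH)*Hs + (1-p-q)*P_AM*Ho,
    solved for q."""
    numerator = (1 - p) * ((p_lm - p_am) * ho - 2 * hs)
    denominator = p_rh * hs - p_am * ho
    return numerator / denominator

# Origin server 15 hops away, sibling 3 hops away, local hit rate 0.34:
q = min_remote_hit_rate(p=0.34, ho=15, hs=3)
print(round(q, 4))  # 0.0238, i.e., a required remote hit rate near 2.4%
```

With the sibling moved to within 2 hops (hs=2), the required q drops below 2%, matching the break-even behavior discussed around Figure 4.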
As shown in Table 1, the values of p, PLM, PRH, and PAM are 0.34, 18.21, 16.92, and 18.35, respectively. With these numbers, we examine the relation between Ho and q, varying Hs from 1 to 8 as a parameter. Figure 4 depicts the relationship.
Figure 4: Threshold (PHI)
Note that a total remote hit rate of more than 10% cannot be achieved in an ordinary environment, even with a multiple sibling proxy configuration. In our evaluation of Squid log data from many sites, few sibling proxies individually achieved a remote hit rate of more than 2%.
Figure 4 shows the break-even condition between q and Ho in terms of communication cost based on the PHI. For example, considering WWW servers within 15 hops of the caching proxy, we can conclude that a sibling proxy whose individual remote hit rate is less than 2% has no advantage unless it is within 2 hops of the caching proxy.
Next, we examine the impact of the local hit rate p on the performance by varying p. The results are shown in Figure 5. Here we set Hs to 3 (the minimum hop count in the model shown in Figure 2). In general, the higher the local hit rate a caching proxy achieves, the lower the remote hit rate becomes: if an object cannot be found in the local cache, it is unlikely to be found in the sibling proxy either, since the distribution of accesses to WWW objects follows the Zipf distribution, as mentioned in Section 1. Therefore, even when we improve the local hit rate, it becomes harder to fulfill the requirement on the remote hit rate.
In conclusion, it is hard to achieve traffic reduction for WWW accesses with an architecture of caching proxies and sibling proxies. In particular, it is impractical to use siblings in other organizations, whose networks are typically more than three hops away.
Figure 5: Threshold on Hs = 3 (PHI)
Most of the BHI evaluation can be done in the same way as that of the PHI.
The average number of bytes exchanged between the caching proxy and the origin server in the case of a local miss is denoted as BLM. The BHI is formulated as shown below:

BHI = (1 - p) BLM Ho    (4)
The average sizes of both ICP queries and replies are about 70 bytes, as mentioned in Section 5; to simplify the calculation, we fix them at 70 bytes. BRH denotes the average number of bytes transferred between the caching proxy and the sibling proxy in the case of a remote hit. The BHI is formulated as below:

BHI = (140(1 - p) + q BRH) Hs + (1 - p - q) BAM Ho    (5)
Similar to the analysis on PHI, we derive the boundary condition. From Equation 4 and Equation 5, the boundary condition is expressed as follows:
(1 - p) BLM Ho = (140(1 - p) + q BRH) Hs + (1 - p - q) BAM Ho
Now, we can obtain the minimum requirement of the remote hit rate q by solving this equation for q.
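The BHI boundary condition can be solved for q in the same way (again our own rearrangement; the default constants are the byte averages from Table 1):

```python
def min_remote_hit_rate_bhi(p, ho, hs, b_lm=8790.0, b_rh=6958.0, b_am=8993.0):
    """Minimum remote hit rate q satisfying the BHI break-even condition
    (1-p)*B_LM*Ho = (140*(1-p) + q*B_RH)*Hs + (1-p-q)*B_AM*Ho,
    solved for q."""
    numerator = (1 - p) * ((b_lm - b_am) * ho - 140 * hs)
    denominator = b_rh * hs - b_am * ho
    return numerator / denominator

# Sibling on the same network segment (Hs = 1), origin server 15 hops away:
q = min_remote_hit_rate_bhi(p=0.34, ho=15, hs=1)
print(round(q, 4))  # 0.0164 -- already near the ~2% ceiling siblings reach
```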
As shown in Table 1, we set the values of p, BLM, BRH, and BAM at 0.34, 8790 bytes, 6958 bytes, and 8993 bytes, respectively. Hs is a parameter varied from 1 to 8. Figure 6 shows the boundary conditions.
Figure 6: Threshold (BHI)
In comparison with the case of PHI, the required remote hit rate q decreases rapidly as Ho increases. Even so, this result shows that a sibling proxy whose remote hit rate is less than 2% has almost no advantage in reducing WWW traffic, even if the sibling proxy is attached to the same network segment as the caching proxy. As mentioned in the previous subsection, sibling proxies rarely achieve a remote hit rate higher than 2%. Therefore, this result indicates that most sibling proxies cannot provide any benefit for a caching proxy in terms of WWW traffic reduction.
Figure 7: Threshold on Hs = 3 (BHI)
As in the PHI analysis, we vary the local hit rate p and examine how the boundary condition between Ho and q changes (Figure 7). In this analysis, Hs is fixed at 3. Even when the local hit rate is 50%, the sibling proxy still has to achieve a remote hit rate of more than 2%. For the same reasons described in the PHI case, we conclude again that most siblings are useless and that ICP messages can be considered an outright waste of network resources.
The essential drawback of ICP is that ICP queries are issued every time an HTTP request misses the local cache at the caching proxy. Although ICP uses UDP as its transport layer service and its message size is fairly small, it is too expensive to provide any improvement in terms of WWW traffic reduction. RFC 2187 asserts several advantages of ICP.
Some of these may be correct; however, our analysis confirms that the performance advantages described in RFC 2187, especially the traffic and delay reductions, cannot be achieved by almost any sibling proxy in an ordinary Internet environment.
Other approaches can be taken to improve the distributed WWW caching proxy system. One of them is to use a fixed configuration for forwarding HTTP requests, such as a hierarchical "tree" configuration among caching proxies. The purpose of ICP is to identify the hosts to which HTTP requests should be forwarded in case of a local miss at the caching proxy. Since the "on-demand" cache query scheme implemented by ICP can be considered a waste of network resources, a fixed forwarding configuration might be a solution because it eliminates the use of ICP entirely. This approach reduces the ICP traffic; however, it is still hard to achieve a remote hit rate higher than that obtained with ICP. Consequently, the forwarded HTTP requests are still likely to miss at the remote proxy. Therefore, this approach cannot make much of a difference compared with ICP.
The other approach is to use a "transparent" caching system. Recently, several networking equipment vendors have released WWW cache products that use a "transparent" caching scheme. The basic idea is to intercept and cache WWW transactions at a router. Such a system resides alongside a router and monitors all HTTP requests and replies crossing it. If an HTTP reply includes a WWW object, the system copies the object into its cache. If the system finds an HTTP request for an object stored in its cache, it intercepts the request and returns the object to the requesting client. This approach can localize traffic and may reduce the overall WWW traffic on the Internet; however, it raises other issues, such as security problems and wasted cache storage.
The demand for backbone network capacity has been increasing, and it is no exaggeration to say that this demand is driven by the growth of WWW use. However, the actual backbone capacity is not sufficient for all the demand on the Internet. Accordingly, developing schemes to reduce WWW traffic becomes increasingly important. As discussed in this article, caching proxy systems using ICP cannot contribute to the reduction of WWW traffic. Therefore, it is necessary to develop a practical, scalable, and efficient cache mechanism for this goal.
We would like to express our gratitude to Prof. Yuji Oie of Kyushu Institute of Technology, Japan, for his advice and suggestions covering various areas related to this work.
We performed a statistical analysis of the log data generated by the Squid operated at our institute for a month in May 1998. Many studies on the characteristics of WWW traffic have been carried out, and many of them report similar results (e.g., [9, 10]). Here we show the details of our analysis to aid understanding of the communication cost evaluation discussed in this article.
Our Squid caching proxy system runs on a Sun Microsystems Ultra Enterprise 3000 server (four 168 MHz UltraSPARC CPUs, 256 MB of memory). Squid is configured to use 60 MB of memory and 2 GB of disk for its operation. We also configured Squid with eight sibling proxies on the Internet; this sibling configuration exists only for the purpose of evaluating ICP performance. More than three million HTTP requests were processed by this Squid server during the month. We divided the results into the four categories described in Section 5.1. Table 2 shows the summary of our statistical analysis, and the distribution of the object size for each category is depicted in Figure 8.
Table 2: Summary of statistical analysis (May 1998)

| Result     | Number of Requests | Rate    | Average Object Size |
|------------|--------------------|---------|---------------------|
| LOCAL HIT  | 1132961            | 0.34274 | 4606 bytes          |
| LOCAL MISS | 2172607            | 0.65726 | 8790 bytes          |
| REMOTE HIT | 215376             | 0.06516 | 6958 bytes          |
| ALL MISS   | 1957231            | 0.59210 | 8993 bytes          |
Figure 8: Distribution of object size (May 1998; one panel each for LOCAL HIT, LOCAL MISS, REMOTE HIT, and ALL MISS)
In order to estimate the average size of ICP messages, we investigated the average size of URLs in HTTP requests. An ICP message consists of a 20-byte header and a URL in ASCII text. The distribution of URL sizes is shown in Figure 9. The average URL size we observed is about 50 bytes. Consequently, about 70 bytes is a reasonable estimate of the average size of ICP messages.
Figure 9: Distribution of size of URLs
Based on actual observations of the number of packets during our operation of Squid, we plot the distribution in Figure 10. Note that this distribution has an interesting characteristic: the logarithm of the number of requests whose objects are divided into P packets is almost proportional to .
Figure 10: Distribution of number of packets (May 1998; one panel each for LOCAL HIT, LOCAL MISS, REMOTE HIT, and ALL MISS)
We estimated the number of packets exchanged over HTTP under two assumptions: (1) the maximum segment size (MSS) of TCP is 1460 bytes, and (2) packets retransmitted due to network congestion are ignored. In the TCP connection establishment phase, three packets are transferred between the server and the client. The client then sends the HTTP request in a single packet, and the server sends an ACK packet back to the client. Let B be the size of the WWW object the client requests. The object is divided into data packets of at most 1460 bytes each, and ACK packets are sent from the client back to the server. From our analysis of the HTTP implementation, the number of ACK packets can be estimated as described in Table 3. Consequently, we can formulate the total number of packets exchanged between the client and the server as Equation 1 in Section 5.2.
Table 3: Estimated number of ACK packets

| Tb (bytes)  | Number of ACK Packets                |
|-------------|--------------------------------------|
| 0 - 1460    | 1                                    |
| 1461 - 4380 | 2                                    |
| 4381 - 8760 | 3                                    |
| over 8760   | an additional ACK per 4 data packets |
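The packet-count estimate of Equation 1 can be sketched as follows, under the Appendix's assumptions (an MSS of 1460 bytes, no retransmissions). The handshake and ACK bookkeeping follows the description above; our reading of the "over 8760" row and the omission of connection teardown packets are assumptions of this sketch:

```python
import math

MSS = 1460  # assumed TCP maximum segment size (Appendix assumption 1)

def ack_packets(b):
    """ACK count for a b-byte object, following Table 3.  The 'over 8760'
    rule is read here as one extra ACK per additional 4 data packets."""
    if b <= 1460:
        return 1
    if b <= 4380:
        return 2
    if b <= 8760:
        return 3
    return 3 + math.ceil((b - 8760) / (4 * MSS))

def packets_for_object(b):
    """Estimated packets for one HTTP retrieval of a b-byte object:
    3 (TCP handshake) + 1 (HTTP request) + 1 (ACK of the request)
    + data packets + ACK packets.  Teardown packets are not counted,
    since the text does not specify them."""
    n_data = max(1, math.ceil(b / MSS))
    return 3 + 1 + 1 + n_data + ack_packets(b)

# A median-sized (1460-byte) object needs an estimated 7 packets;
# the average local-hit object (4606 bytes) needs an estimated 12.
print(packets_for_object(1460), packets_for_object(4606))  # 7 12
```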