Yutaka NAKAMURA <firstname.lastname@example.org>
Ken-ichi CHINEN <email@example.com>
Hideki SUNAHARA <firstname.lastname@example.org>
Suguru YAMAGUCHI <email@example.com>
Yuji OIE <firstname.lastname@example.org>
Nara Institute of Science and Technology
The World Wide Web (WWW) can be considered a vital service for all users of the Internet. Its use has expanded to include many different areas of activity: providing information and commerce services through the WWW is popular among both organizations and individuals. As a result, users of the WWW care about the quality of WWW services, such as the quantity and quality of contents, the quality of images, and the service delay for both connection establishment and data transfer. To improve user satisfaction, the administrators of WWW servers have to manage their servers to provide better service. From the viewpoint of technical management, server operators need to know how their servers are providing the service and what technical issues should be considered. However, this task still depends on their intuition and experience, because of the lack of tools and utilities for managing WWW servers, especially large WWW systems. Several approaches can be taken to help WWW server operators. The approach discussed in this article is to provide a powerful monitoring tool for performance measurement of WWW systems. Our system, called ENMA, aims to reveal all the behavior that can be observed from outside a WWW server, and allows us to examine performance measures. ENMA enables WWW operators to measure WWW performance easily. In this article, we discuss the design and implementation of the ENMA system, and show an example in which we applied the system to actual WWW servers operated on the Internet.
The World Wide Web (WWW) has emerged as a vital service on the Internet. Currently, the number of WWW hosts is estimated to be more than 35,000,000. WWW traffic occupies approximately 80% of the total traffic carried on the Internet.
A variety of services using the WWW have been developed. A typical use is to publish various kinds of information through home pages. A large number of companies, individuals, universities, and organizations now provide their own home pages through the WWW. Composing WWW pages as technical/cultural showcases is also widely accepted in many areas, for example the Internet 1996 World Exposition (IWE96). In 1998, the WWW sites for the Nagano Winter Olympic Games and FIFA's football World Cup particularly attracted people around the world. For these large-scale WWW services, millions of accesses per day were observed. In recent years, electronic commerce has been growing in popularity. Currently, we can make purchases, make travel reservations, and subscribe to many services through the WWW on a 24-hours-a-day, 7-days-a-week basis.
The quality of WWW service is a top priority. Service administrators have to know how their services are working, and maintain them to fulfill subscribers' requests. To improve the technical quality of WWW servers, operators should be aware of any changes in performance indices such as the average and peak numbers of HTTP requests, the usage of both physical memory and disk storage, and the total amount of traffic for the services. By watching these performance indices, operators can decide how to improve their WWW servers. However, measuring the performance of the servers is difficult for the following reasons:
As a result, administrators have to manage their servers by intuition and experience. This can cause several difficulties in performance tuning. Therefore, administrators want performance indices that show how well the WWW server processes users' requests.
The goal of the work discussed in this article is to develop a powerful performance measurement tool, called ENMA(1), for WWW servers. As mentioned above, performance measurement through benchmarks or through mechanisms installed inside the server itself is not practical for "running" WWW servers. Our approach, therefore, is to measure performance through packet monitoring. The fundamental idea is to observe the behavior of WWW services from the outside and estimate several performance indices.
In Section 2, we review several possible ways to measure the performance of WWW servers and show the advantages of our approach. In Sections 3 and 4, we explain the details of the design and implementation of our system. We applied this system to several WWW servers and proxy servers. In Section 5, we show the results of the performance measurements and discuss how well this system suits performance evaluation of WWW servers.
Several ways to measure the performance of WWW servers have been developed so far: (1) log data analysis, (2) kernel-level monitoring, (3) benchmarks, and (4) customer survey (see Figure 1).
Figure 1: Several ways to measure the performance of WWW servers
In method (1), statistical analysis is applied to the activity logs generated by WWW servers. This is a popular way to learn the performance of WWW servers. Almost all servers can generate various types of activity data as their log files. Statistical analysis of these log files allows us to understand the behavior of WWW servers, such as the number of accesses, clients' access patterns, and the request processing time. However, this kind of analysis cannot provide any hints on behaviors that the WWW server invokes inside the OS kernel, such as its memory consumption, the number of disk input/output (I/O) operations, or the frequency of network accesses.
To address this limitation of method (1), kernel-level monitoring, as in method (2), is frequently used. One popular way on many Unix operating systems is to monitor the WWW server with the ktrace utility. With this utility, most kernel-level behavior, including system calls, I/O, signal processing, and file system operations, can be revealed. With other utilities, such as kernel debugging utilities, a kernel-level process profiler can be used. However, this method has serious drawbacks: its disk storage consumption and its performance interference. These kernel-level monitoring utilities generate large files in which the kernel-level monitoring data are stored. Therefore, this method can be applied only for a period short enough that the disk storage does not run out. Even if huge disks are prepared, the performance interference caused by these utilities is not negligible: because these utilities invoke many disk I/O operations to record a log entry at each system call, the performance degradation is quite large. To minimize the performance interference, monitoring the kernel-level behavior of WWW servers with a "modified" Unix kernel could be a possible solution. However, any approach involving kernel modification requires the source code of the OS. Even if the source code is available, modifying the kernel is hard because it requires skill and expertise.
Benchmarking is another performance measurement method (method (3)). For example, SPECweb96 by the Standard Performance Evaluation Corporation and WebStone are well-known benchmarks for WWW servers. Benchmarks can provide various indices of the performance of WWW servers, and are especially helpful in determining the maximum performance of WWW servers under various configurations. However, this method also has several drawbacks. Benchmarking requires both special benchmark software and a dedicated environment, which are costly. Especially for large-scale WWW servers in operation, it is almost impossible to suspend their services just for benchmarks. Even if we can prepare such an environment, the benchmark results indicate the performance of the WWW server only in the environment specially prepared for the benchmark. In other words, it is hard to apply benchmarks to WWW servers in actual operation.
A customer survey (method (4)) could be another approach. Many WWW services prepare "customer feedback" questionnaires. This method is appropriate for determining the level of customer satisfaction; however, it cannot help improve the server technically.
Consequently, the methods discussed above are not sufficient for the performance measurement of the "running" servers. It is obvious that a new method for this purpose should be developed.
The new method we propose in this article is the performance measurement of WWW servers through packet monitoring. The idea of this method is to know the behaviors of the WWW servers through observation of the packets generated by both a WWW server and the WWW clients hooked up to the server. This method has the following advantages:
However, our approach also has a drawback. Because our system cannot see inside the WWW server, it cannot obtain performance indices that are available only inside the OS. In cases where we need to examine the WWW server internals, method (1) or (2) can be used as a complement.
In the following sections, we discuss the details of the design and implementation of our system. Moreover, we present case studies in which we applied our system to several WWW servers.
In this section, we describe the design of our system and show how we approach the goal of our system.
For the WWW server operators, it is quite important to know if server performance is degrading. From our experiences on WWW server operations, we developed a simple model of the performance degradation of the WWW server. This model is a state transition model, shown in Figure 2.
Figure 2: The performance degradation model
State I, the initial state, is one in which a WWW server can serve all requests without any overhead. Operators want the WWW server to stay in this state as long as possible.
If the performance of the server is degraded, the state of the server moves to either state II or III. In state II, the server cannot accept all the requests and some of the requests are rejected. If the server is in this state, we can observe the following:
In state III, on the other hand, data delivery from the server is sacrificed. The server can accept all requests from clients, but its data delivery becomes slower. This situation is frequently observed when we access busy WWW servers on the Internet: the HTTP connection is established, but we have to wait a long time to obtain all the data in a WWW page.
The server may shift between states II and III. However, if the situation gets worse, the server finally moves to state IV. This state means the server is saturated: it rejects some accesses at random, and data delivery to clients is also slow.
Our system provides several performance indices in order to help the server operators know these state transitions of WWW servers.
Through the packet monitoring, we can derive several performance indices. Based on the model discussed in the previous subsection, our system is designed to measure several performance indices listed below.
Normally, a WWW server can handle several HTTP connections in parallel. In the typical implementation of a WWW server on Unix platforms, a single daemon process resides in memory and always waits for HTTP request arrivals. The Number of Concurrent Connections is the number of HTTP connections handled at the same time. If the server is in state II or IV, the number of concurrent connections is capped at a certain value or simply decreases.
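As an illustration of this index, the peak number of concurrent connections can be computed by a sweep over the observed (start, end) intervals of each connection. This is only a sketch of the idea, not the ENMA implementation (which is in ANSI C); the function name and the use of plain numeric timestamps are assumptions.

```python
# Sketch: counting concurrent HTTP connections from per-connection
# (start, end) time intervals, as derived from observed SYN/FIN packets.

def peak_concurrent(intervals):
    """Return the peak number of simultaneously open connections."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # connection opens
        events.append((end, -1))    # connection closes
    # Sorting by (time, delta) processes closes before opens at equal times.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

# Three connections; the first two overlap between t=2 and t=3.
print(peak_concurrent([(0, 3), (2, 5), (6, 7)]))  # -> 2
```

A drop in this value under load, or a hard ceiling across many samples, would correspond to the state II/IV behavior described above.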
Connection Continuation Time is defined as the period from TCP connection establishment to its shutdown. More precisely, it is the period from the time when the first SYN packet (packet #1 in Figure 3) transmitted to the server by a client is observed to the time when the server's ACK packet (packet #6 in Figure 3) corresponding to the last FIN packet generated by the client is observed. The Connection Continuation Time in state III or IV is much longer than that in state I.
Figure 3: Performance indices
Note that this Connection Continuation Time is much longer than the connection time derived from analysis of WWW server log files. In almost all WWW server implementations, the time when the server calls the accept() system call is recorded as the connection start time, and the connection end time corresponds to the time when the close() system call is processed. However, after the close() system call, the actual data delivery in the TCP layer is still going on, because the data are stored in the socket-layer queue and the close() system call returns immediately (see Figure 3).
Response Time is defined as the period between an HTTP request packet from a client and its HTTP response packet from the server. Note that this definition is slightly different from the generic definition of RTT (round-trip time). Because we discuss the behavior of WWW services through packet monitoring, we use period A in Figure 3; the Response Time is therefore normally shorter than the RTT. An increase in this index can be considered a sign of a transition to state III or IV.
Data Transfer Time is defined as the period from the time when the first HTTP data packet transmitted by the server is observed to the time when the last HTTP data packet generated by the server is detected (period B in Figure 3). Note that the Data Transfer Time includes the period for the exchange of FIN packets and their ACKs, because these packets sometimes carry actual WWW data. This index can also be considered a sign of a transition to state III or IV.
We define Connection Setup Time (Ts) as the period between when a server receives the first SYN packet (packet #1 in Figure 3) and when the server sends a SYN+ACK packet back to the client (packet #2 in Figure 3). In this period, the server allocates memory for the socket and puts the SYN packet into the SYN-RCVD queue in the kernel. Ts indicates the performance of the TCP/IP implementation and its platform.
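The four timing indices above can all be derived from the timestamps of a handful of events within one connection. The following sketch shows the arithmetic; the event names and the sample trace are invented for illustration and simply mirror the packets of Figure 3.

```python
# Sketch: deriving ENMA's timing indices from one connection's
# timestamped events. Event names here are hypothetical labels for the
# packets shown in Figure 3.

def timing_indices(packets):
    """packets: list of (timestamp_seconds, event_name) for one connection."""
    t = {event: ts for ts, event in packets}
    return {
        # Tc: first client SYN to the server's ACK of the client's last FIN
        "Tc": t["server_ack_of_fin"] - t["client_syn"],
        # Ts: first SYN received to the server's SYN+ACK
        "Ts": t["server_synack"] - t["client_syn"],
        # Tr: HTTP request packet to first HTTP response packet (period A)
        "Tr": t["first_data"] - t["http_request"],
        # Td: first HTTP data packet to last HTTP data packet (period B)
        "Td": t["last_data"] - t["first_data"],
    }

trace = [
    (0.000, "client_syn"),
    (0.002, "server_synack"),
    (0.010, "http_request"),
    (0.015, "first_data"),
    (0.080, "last_data"),
    (0.085, "server_ack_of_fin"),
]
print(timing_indices(trace))
```

Note that Tc here spans the whole TCP-layer lifetime, which is why it exceeds the application-level connection time discussed later in Section 5.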
ENMA consists of two components: ENMA Daemon and Performance Analysis Workbench (see Figure 4).
Figure 4: ENMA components
The ENMA daemon is in charge of the "real time" packet monitoring. It is a single program that consists of two modules: a packet monitor and a connection analyzer.
The packet monitor captures all the packets of any HTTP connections observed on the network to which the ENMA system is attached. Because each captured packet is a data-link frame, the packet monitor strips the data-link header from the captured frame and obtains the IP datagram in the frame. The obtained IP datagram is passed to the connection analyzer.
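The header-stripping step can be sketched as follows for the common Ethernet II case. The offsets follow the standard Ethernet II layout; the toy frame bytes are fabricated for illustration, and a real monitor would also handle other link types.

```python
# Sketch: stripping a data-link (Ethernet II) header to obtain the IP
# datagram, as the packet monitor does for each captured frame.

import struct

ETH_HEADER_LEN = 14      # dst MAC (6) + src MAC (6) + EtherType (2)
ETHERTYPE_IP = 0x0800    # EtherType value for IPv4

def strip_ethernet(frame):
    """Return the IP datagram carried in an Ethernet II frame, or None."""
    if len(frame) < ETH_HEADER_LEN:
        return None
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != ETHERTYPE_IP:
        return None            # not an IP datagram (e.g., ARP)
    return frame[ETH_HEADER_LEN:]

# A toy frame: 12 bytes of MAC addresses, EtherType 0x0800, fake payload.
frame = b"\xaa" * 12 + b"\x08\x00" + b"IP-DATAGRAM"
print(strip_ethernet(frame))  # -> b'IP-DATAGRAM'
```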
The connection analyzer is the core module of ENMA. Its most important function is to record the time when each packet is observed. Furthermore, the connection analyzer provides the following functions:
The connection analyzer writes this information to a single data file, which is used for further statistical analysis by the Performance Analysis Workbench. Note that the size of the data file is much smaller than the files generated by either kernel-level monitoring or tcpdump, because a single data entry for a single TCP connection is around 150 bytes. Therefore, ENMA can monitor a WWW server over much longer periods.
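To make the ~150-byte figure concrete, a fixed-size per-connection record might be laid out as below. The exact field layout is an assumption for illustration; the article specifies only the approximate entry size, not the fields.

```python
# Sketch: a hypothetical fixed-size per-connection record of roughly the
# size the article reports (~150 bytes per TCP connection).

import struct

# 4-tuple (src/dst IPv4 as 4 raw bytes each, src/dst port), six event
# timestamps as doubles, packet and byte counters, and a fixed-size
# request-line excerpt.
RECORD_FMT = "!4s4sHH6d2I80s"

def pack_record(src, dst, sport, dport, times, pkts, nbytes, request):
    return struct.pack(RECORD_FMT, src, dst, sport, dport,
                       *times, pkts, nbytes, request[:80])

record = pack_record(b"\x0a\x00\x00\x01", b"\x0a\x00\x00\x02", 1025, 80,
                     (0.0, 0.1, 0.2, 0.3, 0.4, 0.5), 12, 4096,
                     b"GET / HTTP/1.0")
print(len(record))  # -> 148
```

At roughly 150 bytes per connection, even millions of connections per day fit in a few hundred megabytes, which is what makes long-running monitoring practical.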
The connection analyzer also provides shared memory for the other programs in the Performance Analysis Workbench, enabling them to perform real-time data visualization and analysis.
Performance Analysis Workbench is a set of programs that enable us to make various analyses on data that the ENMA Daemon provides. There are two kinds of programs in this workbench:
We tried to implement our system as a portable system for several Unix platforms. The ENMA Daemon is currently implemented in ANSI C using the GNU compiler (gcc) on FreeBSD 2.2.7; however, it can be made "platform-independent" by using LBL's packet capture library. The programs in the Performance Analysis Workbench are implemented as shell scripts, AWK programs, and C programs, and are also portable. In this section, we provide several technical notes on our implementation of the ENMA system.
The packet monitor module in the current implementation uses LBL's packet capture library (libpcap) to capture packets on the network. The library provides a common API for both the Berkeley Packet Filter (BPF) on many BSD variants and the network monitoring facilities (the packet snooper) on Sun's Solaris and SGI's IRIX.
In the connection analyzer, we had to implement a module to monitor the TCP state transitions of each TCP connection. This module uses the sequence number, acknowledgment number, and flags in the TCP header to track all the state transitions. The algorithm used in the connection analyzer is shown in Figure 5.
Figure 5: The connection analysis algorithm
In this algorithm, the connection analyzer allocates a block of memory for each connection when its SYN packet is observed. This memory block is used as a memo recording the state of each TCP connection. When the TCP connection is shut down normally, the memory block is released and the connection analyzer writes the performance measures listed in Section 3 to the data file. However, there are several cases in which memory blocks are not released. These can happen in the following situations:
To handle the cases listed above, the connection analyzer has a garbage collection mechanism for these "unreleased" memory blocks. In the current implementation, the connection analyzer collects any memory blocks that have not been updated in the last 24 hours.
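The allocate-on-SYN / release-on-close / collect-when-stale life cycle described above can be sketched as follows. The dict-based table, the method names, and the memo fields are assumptions made for illustration; the original is an ANSI C implementation.

```python
# Sketch: the connection analyzer's per-connection "memo" table with a
# garbage collector for entries whose connections never closed normally.

GC_TIMEOUT = 24 * 3600   # memos untouched for 24 hours are collected

class ConnectionTable:
    def __init__(self):
        self.table = {}   # (src_ip, sport, dst_ip, dport) -> memo dict

    def on_syn(self, key, now):
        # Allocate a memo block when the first SYN is observed.
        self.table[key] = {"state": "SYN_SEEN", "last_update": now}

    def on_packet(self, key, now):
        # Every observed packet refreshes the memo's timestamp.
        memo = self.table.get(key)
        if memo is not None:
            memo["last_update"] = now

    def on_close(self, key):
        # Normal shutdown: release the memo (and write the data-file entry).
        return self.table.pop(key, None)

    def collect_garbage(self, now):
        """Drop memos not updated within GC_TIMEOUT; return how many."""
        stale = [k for k, m in self.table.items()
                 if now - m["last_update"] > GC_TIMEOUT]
        for k in stale:
            del self.table[k]
        return len(stale)

tbl = ConnectionTable()
tbl.on_syn(("10.0.0.1", 1025, "10.0.0.2", 80), now=0)
tbl.on_syn(("10.0.0.3", 2000, "10.0.0.2", 80), now=1000)
tbl.on_close(("10.0.0.3", 2000, "10.0.0.2", 80))
print(tbl.collect_garbage(now=100000))  # the first memo is now stale -> 1
```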
The Performance Analysis Workbench consists of two modules: the statistical data collecting module and the visualization module. All the modules are implemented in ANSI C.
The statistical data collecting module obtains statistical data through shared memory. It serves as the interface for the other modules for "real-time" analysis: all other modules obtain data through this module. The module is quite simple and lightweight.
The visualization module is an X Window application program for viewing the data in various forms. The data are provided by the statistical data collecting module through BSD's socket interface. Since the ENMA daemon consumes a large amount of system resources, running both the ENMA daemon and the visualization program on the same machine may cause performance interference. In the worst case, the ENMA daemon cannot capture all the packets from the network, making the statistical analysis less accurate. To avoid this situation, we implemented the Performance Analysis Workbench as separate modules.
In order to verify our implementation, we applied our system to several "running" systems.
We measured the WWW server that provided various information and "live" video streams for the 80th National High-School Baseball Games of Japan in August 1998. This event is very famous and popular in Japan, and the WWW server received over 32 million hits per day. We applied our system to this server for 16 days.
The target WWW server host was a Sun Enterprise 450 with dual CPUs (300-MHz UltraSPARC processors) and 512 MB of memory. The WWW server program was Apache 1.3.1 running on Solaris 2.6. The server was installed on the service segment (100BaseT), and our system was hooked up to the same segment to monitor the server.
The system on which the ENMA daemon ran was an Intel platform (Pentium II 300-MHz processor) with 64 MB of memory.
Figure 6 shows the frequency distribution of Connection Continuation Time (Tc). The analysis based on the log file generated by the WWW server reveals that Tc for most connections is 1 millisecond, while with our ENMA system it is 20 milliseconds. As mentioned in Section 3.2, the result from the WWW log file is quite different from the one from our ENMA system.
Figure 6: The frequency distribution of Connection Continuation Time
The log file generated by the WWW server is the activity log of the WWW server as an application program on the system; in other words, it is application-level logging. The buffer size in the socket layer of Solaris 2.6 was configured as 8 Kbytes. Because almost all of the WWW objects that the server handled were under 8 Kbytes, as shown in Figure 7, a single write() system call in the WWW server could put all the data of each WWW object into the socket buffer. Therefore, the server could process each HTTP request in around 1 millisecond; the sequence of system calls(2) for request handling is processed and terminated immediately. Tc obtained from the WWW server's log file is thus the connection continuation time of each HTTP connection in the application layer.
Figure 7: The cumulative distribution function of the WWW objects
On the other hand, Tc measured by ENMA includes the HTTP request processing plus the connection establishment and shutdown procedures in the TCP layer, which require at least 3 RTTs. In other words, Tc derived by ENMA is the connection continuation time in the TCP layer. This explains the difference between the two results.
The analysis of the number of concurrent connections (Nc) is shown in Figures 8 and 9. Because Tc from the WWW log file differs from that measured by ENMA, the analysis of Nc also differs: the WWW server log file shows a peak of 550 connections, whereas ENMA observed 13,000 connections.
Figure 8: The number of concurrent connections by ENMA log
Figure 9: The number of concurrent connections by WWW server log
As our results show, the analysis through ENMA is more accurate than performance analysis based on the WWW server's log files. Our results are more helpful for designing and/or improving the network where the WWW server is located.
We conducted another experiment to test the performance measurement of the WWW server. As mentioned in Section 3, both the Response Time (Tr) and the Connection Setup Time (Ts) are expected to reflect the performance of the WWW server. In this case study, we tried to confirm that these values can be used as performance indices of the WWW server.
In this case study, we set up two WWW servers in our laboratory: the Apache server running on a Pentium II 200-MHz processor and on an 80486DX2 66-MHz processor. The operating system for both servers was FreeBSD 2.2.7. The benchmark software we developed ran on another system connected to the same network segment as the WWW servers. The benchmark software is quite simple: the program accesses several WWW objects on the WWW server at random. In this case, we measured Tc, Ts, and Tr over 10,000 accesses.
Figures 10, 11, and 12 show the results of our measurements. From these three graphs, we can easily see the difference in performance between the WWW server on the Pentium II and the one on the 80486DX2 66-MHz processor. Our ENMA system can reveal differences in WWW server performance easily.
Figure 10: The frequency of the connection continuation time
Figure 11: The frequency of the connection setup time
Figure 12: The frequency of the response time
We designed and implemented the ENMA system as discussed in the previous sections. However, our system is still an "alpha version." There are several limitations in our system, as well as some possible extensions to improve it.
Packet drops during monitoring are a significant technical issue. Because of the design of the ENMA system, dropping packets during monitoring may distort the performance analysis. Currently, ENMA tries to grab as many packets as possible; however, some packets may still be dropped. There are several solutions for decreasing the number of dropped packets:
Tracking the sequence numbers in TCP headers is another technical issue. In the current implementation, ENMA does not handle sequence numbers. Since packet reordering can cause a serious penalty on TCP performance, it would be better to track the sequence number in the TCP header.
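A minimal version of such sequence-number tracking could flag segments that do not start at the expected next sequence number. This is only a sketch of the kind of check ENMA currently omits; it ignores sequence-number wraparound and SACK, and the function name is invented.

```python
# Sketch: flagging out-of-order or retransmitted TCP segments by tracking
# the expected next sequence number within one connection.

def find_out_of_order(segments):
    """segments: list of (seq, length) in arrival order. Returns indices
    of segments that do not start at the expected next sequence number."""
    expected = None
    anomalies = []
    for i, (seq, length) in enumerate(segments):
        if expected is not None and seq != expected:
            anomalies.append(i)
            if seq < expected:
                continue          # retransmission: keep the old expectation
        expected = seq + length
    return anomalies

# The second segment arrives before the bytes at seq 100 have been seen,
# so both the early segment and the late one are flagged.
print(find_out_of_order([(0, 100), (200, 100), (100, 100)]))  # -> [1, 2]
```

Counting such anomalies per connection would let ENMA report how much reordering and retransmission the server's clients experience.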
Measuring the number of requests rejected in the TCP layer is another technical issue. On a heavily loaded WWW server, some SYN packets are received by the server system but rejected at the TCP layer. It would be better for ENMA to monitor this phenomenon, so that system managers can know whether the system is saturated.
Monitoring the WWW server from the inside is an open issue. If the information available inside the WWW server can be used for the performance analysis, we can derive other kinds of results about other aspects of WWW server performance. For a more detailed analysis of both the behavior of WWW servers and their performance, information from inside the WWW server would help. On the other hand, an additional mechanism to monitor the WWW server from the inside may cause performance interference with the server, as mentioned in Section 2. Therefore, the design and implementation of a WWW server monitoring system co-located with the WWW server on the same machine is a quite interesting issue.
In this paper, we described several reasons why a new method for measuring the performance of WWW servers is required. Our proposed method is based on packet monitoring, which reveals the behavior of the WWW server and derives several performance indices from the monitoring. The method has been implemented as our ENMA system. ENMA can measure performance indices such as Connection Setup Time, Connection Continuation Time, and the Number of Concurrent Connections. As described in Section 5, we applied the ENMA system to several WWW servers and confirmed the effectiveness of its implementation.
1. Hobbes' Internet Timeline v.4.0. http://www.isoc.org/guest/zakon/Internet/History/HIT.html
2. Kevin Thompson, Gregory J. Miller, and Rick Wilder. "Wide-Area Internet Traffic Patterns and Characteristics," IEEE Network, pp. 10 - 23, November/December 1997.
3. Carl Malamud. "A World's Fair for the Global Village," MIT Press, 1997.
4. R. Fielding, J. Gettys, J. Mogul, H. Frystyk, and T. Berners-Lee. "Hypertext Transfer Protocol - HTTP/1.1," Internet Engineering Task Force, January 1997.
5. SPEC. An explanation of the SPECweb96 benchmark, December 1996. http://www.spec.org/osg/web96/
6. Gene Trent and Mark Sake. "WebStone: The First Generation in HTTP Server Benchmarking," February 1995.
7. Richard W. Stevens. "TCP/IP Illustrated: The protocols, volume 1." Addison-Wesley, Reading, MA, 1994.
8. Gaurav Banga and Peter Druschel. "Measuring the Capacity of a Web Server." In USENIX Symposium on Internet Technologies and Systems, pp. 61-71, Monterey, CA, December 1997.
9. LBNL's Network Research Group. http://ee.lbl.gov/
10. Steven McCanne and Van Jacobson. "The BSD Packet Filter: A New Architecture for User-level Packet Capture." USENIX Winter Conference, January 25-29, 1993, San Diego, CA.
11. Apache HTTP Server Project. http://www.apache.org/
12. Joel Apisdorf, K. Claffy, Kevin Thompson, and Rick Wilder. "OC3MON: Flexible, Affordable, High-Performance Statistics Collection," INET'97 Conference, June 25-27, 1997, Kuala Lumpur.
1. The name of this system, ENMA, stands for "Enhanced Network Measurement Agent" for the WWW system. ENMA in Japanese is the name of the King of Hell ("Yama" in Sanskrit), who reveals everything people did in their lives when judging them at the entrance of Hell. In choosing this name for the agent, we have been trying to develop it as an ultimate observer that reveals all the communication behaviors in WWW server systems.
2. In the Apache server, each HTTP request is processed through a sequence of system calls: accept() to establish the HTTP connection, read() to read the HTTP request from the socket, write() to send the WWW object, and close() to shut down the connection.