An Extensively Instrumented Web Server


Debra Tang (dtang@nist.gov) 1

Jihg-Hong Lin (jihghong@snad.ncsl.nist.gov) 2


February 19, 1999

1. National Institute of Standards and Technology
100 Bureau Drive Stop 8920
Gaithersburg, MD 20899-8920
2. Telecommunication Laboratories
Chunghwa Telecom Co., Ltd.,
Taiwan

Table of contents

Abstract
1. Introduction
2. Overview of the ALMT Architecture
2.1 The hardware and software
2.1.1 The Hardware - MultiKron_II Performance Chip
2.1.2 The Software - Instrumented Apache and Linux
2.2 Test Suites
2.3 Measurement Procedures
2.4 System Utilities
3. When the MultiKron_II Chip is Absent
4. Future Study
5. Conclusion
6. Acknowledgement
7. Reference

Abstract

This paper describes the development of the Apache/Linux Measurement Toolkit (ALMT), a high-accuracy measurement tool for evaluating the performance of a Web server. This research creates a measurement environment consisting of a fully instrumented Apache Web server running on a fully instrumented Linux operating system. NIST-developed MultiKron_II performance measurement hardware was used to minimize the perturbation induced while collecting measurement data during our experiments. Our system combines hardware and software to provide fine-grained measurement data on the performance of both the kernel and server. The toolkit can also be used without the MultiKron_II hardware by replacing it with an emulation module; however, this substitution increases system perturbation and decreases the precision of the measurement data.

1. Introduction

The growing use of the Internet for business communication has increased the demand for reliable, predictable Internet service. Perhaps the most visible components of business on the Internet are the World Wide Web sites, upon which businesses advertise, sell, and support their products. Measurement tools are essential for evaluating and improving the quality of Internet services, such as Web surfing. Our research focuses on the development of high-accuracy measurement tools for evaluating the quality of service delivered by a Web server.

The Apache server is one of the most widely used Internet Web servers, and thus the performance of Apache [5] is a crucial issue for users, developers, ISPs, and researchers. Additionally, the fact that Apache and Linux [9] are source code-accessible makes them an ideal platform for HTTP [6][15] measurement research. Our research created a measurement environment consisting of a fully instrumented Apache Web server running on a fully instrumented Linux operating system. Our system combines hardware and software to provide fine-grained measurement data on the performance of both the kernel and server.

This research uses NIST-developed MultiKron_II performance measurement hardware to minimize the perturbation induced by collecting and analyzing measurement data [2] [4]. This is a hybrid method in which hardware supports software instrumentation [3][13]. Specifically, the method uses memory mapping of event samples that are generated by probes inserted into the Apache and Linux source code. The event and resource samples are stored directly on the MultiKron_II interface board. Measurement data can be collected on a local machine or a remote machine and can be analyzed and displayed on-the-fly. Using this hybrid method, our system obtains accurate, fine-grained measurement data at both the application and system level.

Our toolkit, called the Apache/Linux Measurement Toolkit (ALMT), can be used as a "software-only" system when the MultiKron_II hardware is not available, but this increases system interference and decreases measurement data precision. This paper describes the architecture of ALMT, our measurement strategy, and our conclusions on the value of fine-grained measurement data.

2. Overview of the ALMT Architecture

The NIST ALMT is a Web measurement toolkit. The home page (http://dagger.antd.nist.gov) contains the software package, an overview of the project, the operational procedures, the download information, and the contact points. ALMT allows users to prepare test suites, conduct tests and experiments, and collect, report, and analyze measurement data. The toolkit is designed for use in a Web environment. By visiting our Web site, users can get an overview of the whole toolkit, follow the instructions to download and install it, and build their own Web-based instrument for measuring performance with respect to their own application.

Figure 1 depicts the system architecture, which consists of four major components: Hardware & Software, Test Suites, Measurement Procedures, and System Utilities. Subsequent subsections will explain each one of the components.

Figure 1 The ALMT Architecture

2.1 The hardware and software

The hardware used in this research is the MultiKron_II performance chip [1] and its associated MultiKron Interface Printed Circuit Board. The software components include the Apache Web server (version 1.2.29), Linux (version 2.0.29), and the MultiKron utilities. A key utility is the auto-probe utility, which automatically parses the user's source code and lists all of its function names. Users then select which functions are the Points of Control and Measurement (PCMs). The source code is instrumented at the selected PCMs.

2.1.1 The Hardware - MultiKron_II Performance Chip

Figure 2 depicts the block diagram of the MultiKron_II interface PCI board. MultiKron_II [1] [2] is a VLSI performance measurement chip designed to be memory-mapped into a local processor's memory via the PCI bus. The MultiKron board stores more than 800,000 20-byte trace samples, allowing us to record a large sample of user data and timestamps. The chip provides 16 readable and writable resource counters, each with its own shadow register. The resource counters allow an experimenter to trace and tally events at a high rate, while minimizing the perturbation to the system being measured.

Figure 2 The Block Diagram of PCI Toolkit Board

The NIST MultiKron_II Interface Printed Circuit Board [2] connects the MultiKron_II chip to a local I/O bus and allows an experimenter to control, access, and test the chip from a running processor. The MultiKron utilities for Linux systems with a PCI bus consist of a device driver, test programs, sample applications, and sample programs to retrieve the data from the MultiKron interface board. More detailed information is available at http://cmr.ncsl.nist.gov/multikron/multikron.html. Our test system also includes utilities for monitoring and collecting the MultiKron_II measurement data through the Web.

2.1.2 The Software - Instrumented Apache and Linux

The MultiKron chip supports both hardware and software measurement probes. In this research, we used only software probes. A software probe is an assignment statement to a memory-mapped MultiKron address. The trace probes must be inserted in the source code of Apache and Linux. Each trace probe causes an event sample to be recorded during program execution. The 20-byte event sample consists of timestamp, processID, and probeID fields. The trace probes were inserted into the source code of both Apache and Linux based on pre-identified Points of Control and Measurement (PCMs). The basic PCM scheme, based on the nature of the software structure, is organized in modules, which are visible or tangible tasks or functions. A subset of these modules should be the user's control, monitor, and observation points. After inserting the probes, we recompiled the source code and rebuilt both systems.
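The idea that a software probe is a single store to a mapped address can be sketched in a few lines of C. This is only an illustration under stated assumptions: the register pointer, the helper probe_bind(), the macro TRACE_PROBE, and the probe IDs 101 and 102 are hypothetical names, and the pointer is bound to ordinary memory rather than to the MultiKron board so that the sketch can run anywhere.

```c
#include <stdint.h>

/* Hypothetical stand-in for the memory-mapped MultiKron sample register.
 * On real hardware this pointer would hold an address obtained by mapping
 * the PCI board; here it can point at ordinary memory for illustration. */
static volatile uint32_t *probe_reg;

/* Bind the probe register to an address (assumed helper, not an ALMT API). */
void probe_bind(volatile uint32_t *addr) { probe_reg = addr; }

/* A software probe: one store of the probe ID to the mapped address.
 * 'volatile' keeps the compiler from removing or reordering the store. */
#define TRACE_PROBE(id) (*probe_reg = (uint32_t)(id))

/* Example of an instrumented function: probes at entry and at exit. */
int instrumented_task(int x) {
    TRACE_PROBE(101);      /* hypothetical "begin" probe ID */
    int result = x * 2;    /* the work being measured */
    TRACE_PROBE(102);      /* hypothetical "end" probe ID */
    return result;
}
```

Because each probe compiles to a single store, the instrumented code path stays short, which is consistent with the low perturbation the hardware approach aims for.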

Apache with Event Probes

Apache, a HyperText Transfer Protocol (HTTP) implementation, is freely available software based on the NCSA HTTP server, and it is still under development. Apache is layered on top of TCP/IP and runs on a Linux operating system. The contents of the requested documents are generally formatted using HyperText Markup Language (HTML). HTML allows linking of documents stored on the same computer or on a computer located in a different part of the Web. The network connectivity between Apache and TCP/IP, the workload, the number of files, the file sizes, and the file locations are major factors to measure in determining Apache Web server performance [8][10][11][12][14].

In order to measure and monitor the performance of Apache, we analyzed the Apache source code. Figure 3 depicts the depth of Apache function levels. The Apache nested functions, level 1 through level 4, were identified to serve as the Points of Control and Measurement (PCMs). Then, a set of event probes was inserted at the locations of the PCMs. PCMs are normally located at the beginning and end of each Apache function. After the insertion of event probes, the source code requires recompilation. Probes can also be inserted directly into the executable code via a binary patch. However, this is not necessary with Apache because the source code is available. Each duty cycle time is approximately the sum of the duty cycle times at the next lower level. For example, a level 1 duty cycle time is approximately equal to the sum of the duty cycle times of its subordinate level 2 functions.
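The relation between begin/end PCMs and duty cycle times can be illustrated with a small C sketch. It assumes a simplified sample layout (only a timestamp and a probe ID), and the struct and function names are hypothetical; it is not the toolkit's own analysis code.

```c
#include <stddef.h>
#include <stdint.h>

/* One trace sample, reduced to the fields used here (assumed layout). */
struct event {
    uint64_t timestamp;  /* in timer ticks */
    int      probe_id;   /* which PCM fired */
};

/* Return the elapsed ticks between the first occurrence of begin_id and
 * the next occurrence of end_id, or 0 if no such pair is found. */
uint64_t duty_cycle(const struct event *ev, size_t n, int begin_id, int end_id) {
    for (size_t i = 0; i < n; i++) {
        if (ev[i].probe_id != begin_id) continue;
        for (size_t j = i + 1; j < n; j++) {
            if (ev[j].probe_id == end_id)
                return ev[j].timestamp - ev[i].timestamp;
        }
        return 0;  /* begin seen but never closed */
    }
    return 0;
}
```

For a level 1 function whose begin and end PCMs enclose two level 2 functions, calling duty_cycle() once for the parent and once per child shows how the child times add up to slightly less than the parent time; the difference is the time the parent spends outside its probed children.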

Figure 3 The Four Levels of Apache Functions/PCMs

Figure 4 depicts a sample of the service-level PCMs and the functions in which they are inserted. The trace probes are inserted at the beginning of a function and at one or more of its end/exit points. Each oval represents an Apache function; the left gray circle represents the beginning PCM, and the right circle represents the exit PCM(s). The connections between ovals define the depth of Apache function levels depicted in Figure 3.

Figure 4 The Service Level PCMs

Linux with Event Probes

Linux is a freely available Unix-like PC operating system. It can run on 80386, 80486, or Pentium PCs. Linux supports a wide range of software, and can provide full power on a PC Intel platform. Linux is still being developed by a group of volunteers on the Internet from all over the world.

In order to measure and monitor the detailed performance of the Linux kernel when each Apache service primitive is executed, the Linux source code was scanned to identify PCMs. To keep things simple, the Linux nested functions were analyzed only to level 1, the system call entry points. Then, a set of event probes was inserted in the selected system calls to create the PCMs. The currently identified Linux event probes are:

  1. The process probes (FORK, EXIT)
  2. The TCP/IP probes (ACCEPT, CONNECT, LISTEN, SEND, RECV, DISCON)
  3. The file I/O probes (OPEN, READ, WRITE, CLOSE).

We also modified the Linux kernel to add a system call for monitoring the MultiKron_II chip. After modifying the system and inserting the event probes, we recompiled the source code.

Figure 5 depicts the kernel-level PCMs, which are located at the beginning and at one or more end points of each function. The ovals represent Linux functions; the left gray circles represent the beginning PCMs, and the right circles represent the exit PCM(s).

Figure 5 The Kernel Level PCMs

A Sample of Embedded Kernel Probes

The following is a segment of Linux source code. The first listing is the original Linux source code, and the second listing is the same code with the embedded trace probes. The trace probes were inserted at the beginning and at each exit point of the function.

int do_fork() {
    ...
    if(xxx) return -1;
    ...
    return 0;
}

will be probed as

int do_fork() {
    FISPprobe(DO_FORK_BEGIN);
    ...
    if(xxx) { FISPprobe(DO_FORK_END); return -1; }
    ...
    FISPprobe(DO_FORK_END); return 0;
}
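When a function has many return paths, it is easy to miss one of the END probes. One possible alternative, shown here only as a sketch and not as the method used in ALMT, is to leave the original body untouched and probe a thin wrapper around it. FISPprobe() is replaced by a user-space stub that merely counts calls, and DEMO_BEGIN, DEMO_END, demo_body(), demo_probed(), and probes_balanced() are hypothetical names.

```c
/* Stub standing in for the kernel's FISPprobe(); it only counts calls so
 * the example can run in user space. */
static int begin_count, end_count;
enum { DEMO_BEGIN = 1, DEMO_END = 2 };   /* hypothetical probe IDs */

static void FISPprobe(long id) {
    if (id == DEMO_BEGIN) begin_count++;
    else if (id == DEMO_END) end_count++;
}

/* Original, unprobed body with two exit points. */
static int demo_body(int fail) {
    if (fail) return -1;
    return 0;
}

/* Probed wrapper: one BEGIN and exactly one END regardless of which
 * return path the body takes. */
int demo_probed(int fail) {
    FISPprobe(DEMO_BEGIN);
    int rc = demo_body(fail);
    FISPprobe(DEMO_END);
    return rc;
}

/* True when every BEGIN sample has a matching END sample. */
int probes_balanced(void) { return begin_count == end_count; }
```

The trade-off is one extra function call per invocation, which adds a small amount of perturbation of its own; inserting the END probe at every exit point, as in the listing above, avoids that cost.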

2.2 Test Suites

The NIST ALMT provides three categories of test suites: built-in test suites, add-on test suites, and web download pages. Each category can include text, image, audio, video, or multimedia data files. The built-in test suites define a set of test parameters for measuring and monitoring the static and dynamic behavior of the Apache server. For the add-on test suites, the toolkit provides a Web form to allow users to upload their specific test suites and apply the toolkit. The Web-page test suites allow the users to test, measure, and monitor any Web server behavior using its real home pages.

Existing research describing the characteristics of Apache [7] indicates that a Web server spends 90% of its time in kernel and network processing. Based on this result, we defined a set of test suites for testing, stressing, and monitoring the Apache server's internal resource consumption and external processing time. We also used the NIST NET (Network Emulation Tool) software to measure Apache network performance under emulated network characteristics. The test parameters can be file sizes, network conditions, number of files, number of pages, number of clients, and workload of pages. For a given file size, the system also provides a simple random generator utility for text and image files.

2.3 Measurement Procedures

We defined five test operations that allow users to conduct the measurements in a Web environment. These are: Add-On Test Suites, Select Test Suites, Collect the Measurement Data, Graphical Test Report, and Table Report. The Add-On Test Suites procedure allows users to upload their own test suites to the toolkit to meet their particular requirements. The system provides an HTML form for this purpose. Once a test suite is uploaded, the user can perform the same test operations as for the built-in test suites. The Select Test Suites procedure allows users to select any one or a combination of the built-in, add-on, or Web page test suites to perform a test session. Users can also design test suites (using any existing tools) to define the test environment and conditions, such as the network conditions, Apache configurations, and various workloads, to test Apache's integrated and dynamic behavior.

The Collect the Measurement Data procedure allows the measurement data to be collected from the toolkit either with or without the MultiKron_II chip. With the MultiKron_II chip, users can directly access 16 MB of DRAM on the toolkit board. Alternatively, measurement data may be collected on an external machine through an external cable, S16D, which is a commercially available Sbus interface. In either case the collected measurement data can be processed on-the-fly. Users can enable and disable the MultiKron_II chip through a Web page. Each trace sample consists of three fields: timestamp, processID, and probeID. The timestamp is automatically generated by the MultiKron_II chip, and the time resolution is 100 nanoseconds. ALMT provides a set of Web utilities allowing users to control the MultiKron chip remotely through any Web browser.
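The figures quoted above (100-nanosecond resolution, 20-byte samples, 16 MB of DRAM) can be related with a short C sketch. It assumes that the whole 16 MB is available for samples and that "MB" means 2^20 bytes; neither assumption is stated in the text, and the function names are hypothetical.

```c
#include <stdint.h>

#define TICK_NS        100u                  /* timestamp resolution from the text */
#define SAMPLE_BYTES   20u                   /* size of one trace sample */
#define BOARD_BYTES    (16u * 1024u * 1024u) /* 16 MB, assuming binary megabytes */

/* Convert a tick count to microseconds (10 ticks per microsecond). */
double ticks_to_usec(uint64_t ticks) {
    return (double)ticks * TICK_NS / 1000.0;
}

/* Number of whole samples that fit in the board's DRAM. */
uint32_t board_capacity(void) {
    return BOARD_BYTES / SAMPLE_BYTES;
}
```

Under these assumptions the board holds 838,860 samples, which agrees with the capacity of more than 800,000 samples given in Section 2.1.1.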

The Graphical Test Report procedure provides three visualization functions for viewing the measurement data. The Overview function provides a brief snapshot of a test session. The Hierarchical function depicts the detailed report for each task. The Percentage function depicts the proportional elapsed time for each subtask within a task. Finally, the Table Test Report procedure displays the elapsed time for each process and probe event during a test session, in both clock cycles and microseconds.

The ALMT provides a Web visualization environment with buttons that allow users to conduct the test operations intuitively and directly. Users may use the arrow buttons to step sequentially through all five test operations, or they may select buttons one by one to conduct the test procedures manually.

2.4 System Utilities

Most of the developed utilities are provided to help users conduct measurements via a Web browser. The auto-probe utility parses the user's source code, lists the functions it finds, and allows users to select the specific functions or locations of interest. A set of CGI programs and Java applets allows the instrument to be operated through a user-friendly Web-based interface.

3. When the MultiKron_II Chip is Absent

By using a simulation module, the toolkit can also be used without a MultiKron_II chip. However, this substitution increases system perturbation, which decreases measurement data precision. The simulation module uses a new system call FISPprobe(long x) in the Linux kernel that calls a handling routine through the function pointer FISPcallout(long x). The simulation module redirects the pointer to its own handling routine when the MultiKron_II chip is not available.

To simulate MultiKron's hardware timestamp, the handling routine uses the system call sys_gettimeofday(). For storage of samples, the module calls kmalloc() and builds a linked list internally. The resolution of sys_gettimeofday() is one microsecond, which is 10 times coarser than MultiKron's hardware-generated timestamp (resolution = 0.1 microsecond). The overhead of the calls that obtain the timestamp and allocate storage in the simulation module increases system perturbation when collecting data.
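The dispatch and storage scheme described in this section can be approximated in user space as follows. This is a sketch under stated assumptions, not the kernel module itself: gettimeofday() and malloc() stand in for sys_gettimeofday() and kmalloc(), and struct sample, emulated_handler(), use_emulation(), and samples_recorded() are hypothetical names. Only FISPprobe and FISPcallout come from the text.

```c
#include <stdlib.h>
#include <sys/time.h>

/* One emulated trace sample kept in a singly linked list. */
struct sample {
    long           probe_id;
    struct timeval stamp;     /* microsecond-resolution software timestamp */
    struct sample *next;
};

static struct sample *head, *tail;
static long sample_count;

/* Emulation handler: timestamp in software and append to the list. */
static void emulated_handler(long id) {
    struct sample *s = malloc(sizeof *s);
    if (s == NULL) return;            /* drop the sample if out of memory */
    s->probe_id = id;
    gettimeofday(&s->stamp, NULL);
    s->next = NULL;
    if (tail) tail->next = s; else head = s;
    tail = s;
    sample_count++;
}

/* Function pointer through which every probe is dispatched. */
static void (*FISPcallout)(long) = NULL;

/* Install the emulation handler, as when no chip is available. */
void use_emulation(void) { FISPcallout = emulated_handler; }

/* Probe entry point: forwards to whichever handler is installed. */
void FISPprobe(long x) { if (FISPcallout) FISPcallout(x); }

/* Number of samples stored so far. */
long samples_recorded(void) { return sample_count; }
```

Each emulated probe pays for a time query and a memory allocation, whereas the hardware probe is a single store; this is the source of the additional perturbation discussed above.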

To test the overhead of FISPprobe both with and without the MultiKron module, we designed the following program

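#include <FISP/probe.h>

int main(void) {
    int i;
    /* Emit 1000 back-to-back probe pairs; the interval from probe 501
       to probe 502 is the overhead of a single probe. */
    for (i = 0; i < 1000; i++) {
        FISPprobe(501);
        FISPprobe(502);
    }
    return 0;
}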

and calculated the average and standard deviation of the intervals between FISPprobe(501) and FISPprobe(502). To get more accurate results, we booted Linux in single-user mode.

(Unit = 0.1 usec)

                      MultiKron    Non-MultiKron
Count                 1000         1000
Average               26.281       47.740
Standard Deviation    7.752        19.978

From the results, we found that the probe overhead without the MultiKron is almost twice as large as when the MultiKron is used. Even so, it is still on the order of 5 microseconds, which is relatively small for most measurements. Although we can provide a substitute for MultiKron here, other MultiKron features, such as time synchronization using satellites, fine-grained data, on-the-fly off-loading, and analyzing data on another machine, will be difficult to replace through emulation.
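The summary statistics for such an experiment can be computed with a few lines of C. The sketch below is illustrative only: the names are hypothetical, and it uses the population variance (dividing by n), since the text does not say which estimator was used. The standard deviation is the square root of the returned variance.

```c
#include <stddef.h>

struct stats { double mean; double variance; };

/* Mean and population variance of n interval lengths (in ticks). */
struct stats interval_stats(const double *x, size_t n) {
    struct stats s = {0.0, 0.0};
    if (n == 0) return s;
    for (size_t i = 0; i < n; i++) s.mean += x[i];
    s.mean /= (double)n;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - s.mean;
        s.variance += d * d;
    }
    s.variance /= (double)n;
    return s;
}

/* Convert a value in 0.1-microsecond units to microseconds. */
double tenths_to_usec(double ticks) { return ticks / 10.0; }
```

Applied to the reported averages, the conversion gives about 2.6 microseconds per probe with the MultiKron and about 4.8 microseconds without it, a ratio of roughly 1.8.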

4. Future Study

The current toolkit measures only Web server performance. In the future, it will be able to measure Web browser performance as well. In addition to the MultiKron_II performance chip, NIST has developed a time synchronizer between different systems. The time synchronizer can be incorporated into the toolkit to measure Web client-server performance. The knowledge and observations gained from this research will then be employed to build a general, dynamic, and scalable instrument for measuring a system either in a stand-alone mode or in a collaborative, distributed, multi-party environment.

5. Conclusion

The NIST ALMT is a useful toolkit that is available as free software. It is easy to download and install the toolkit from our home page. Additionally, because Apache and Linux are source code-accessible, they make an ideal platform for HTTP measurement research. The ALMT toolkit can provide in-depth information about both the Apache Web server and internal Linux behavior. The Apache server is one of the most widely used Internet Web servers, and thus our toolkit should serve as a useful instrument for users, developers, ISPs, and researchers.

Through a number of sample experiments, we conclude that ALMT provides not only fine-grained measurement data but also user-friendly Web GUI utilities. Specific experimental results will be reported in a separate paper. Since our measurements are fine-grained, the resulting measurement data provides a high level of measurement confidence and can be used to meet a variety of needs. The data can serve as the basis for system resource consumption enforcement, system policy enforcement, internal system security/safety checking, system performance measurement, data analysis, system maintenance, system tuning, system capacity planning, and protocol development/improvement.

6. Acknowledgement

This research has benefited greatly from the help of our colleagues. In particular, we wish to acknowledge Craig Hunt for initiating this project and for his valuable guidance, comments, and editing. We also thank Alan Mink and Wayne Salamon for their expertise on the MultiKron_II performance chip.

7. Reference

[1] Alan Mink, "Operating Principles of MultiKron_II Performance Instrumentation for MIMD Computers", ITL, NIST, DOC (December 1994)

[2] Alan Mink, Wayne Salamon, "Operating Principles of the PCI Bus MultiKron Interface Board", ITL, NIST, DOC (March 1997)

[3] A. Mink, Wayne Salamon, Jeffrey K. Hollingsworth and Ramu Arunachalam, "Performance Measurement using Low Perturbation and High Precision Hardware Assists", Proc IEEE Real-Time Systems Symposium, Madrid, Spain, pp 379-388 (December 1998)

[4] A. Mink, Y. Fouquet and S. Wakid, "Hardware Measurement Techniques for High-Speed Networks", Journal of High Speed Networks, Vol. 3, No. 2, pp 187-207 (1994)

[5] Apache HTTP Server, Apache 1.2.6/HTTP1.1, http://www.apache.org/.

[6] Hypertext Transfer Protocol (HTTP), http://www.w3c.org/

[7] J. Almeida, V. Almeida and D. Yates, "Measuring the behavior of a world-wide web server", IFIP TC6 Seventh International Conference on High Performance Networks (HPN '97)

[8] James C. Hu, Sumedh Mungee and Douglas C. Schmidt, "Principles for Developing and Measuring High-performance Web Servers over ATM", Technical Report WUCS-97-09, Dept. of Computer Science, Washington Univ., St. Louis (Sep 1997)

[9] Linux -- 2.0.29, UNIX-type operating system, http://www.linux.org/.

[10] Louis P. Slothouber, "A Model of Web Server Performance", http://louvx.biap.com/webperformance/modelpaper.html

[11] Robert E. McGrath, "Measuring the Performance of HTTP Daemons", Computing & Communications, NCSA, http://www.ncsa.uiuc.edu/InformationServers/Performance/Benchmarking/bench.html

[12] Robert E. McGrath, "Performance of Several Web Server Platforms", Computing & Communications, NCSA, http://www.ncsa.uiuc.edu/InformationServers/Performance/Platforms/report.html

[13] Y. Fouquet, Richard D. Schneeman, David E. Cypher and A. Mink, "ATM Performance Measurement", Proc of International Conf on Telecommunication, Distribution, Parallelism (TDP'96), La Londe Les Maures, France, pp 63-75 (June 1996)

[14] James Hu, "JAWS: Understanding High Performance Web Systems", http://www.cs.wustl.edu/~jxh/research/research.html

[15] W3C, "Network Performance Effects of HTTP/1.1, CSS1, and PNG", http://www.w3c.org/Protocols/HTTP/Performance/Pipeline.html