Akira KATO <firstname.lastname@example.org>
University of Tokyo
Jun MURAI <email@example.com>
Satoshi KATSUNO <firstname.lastname@example.org>
Tohru ASAMI <email@example.com>
KDD R&D Laboratories Inc.
This paper presents a method of creating a database for actual traffic data and having sufficient information to analyze the end-to-end characteristics of Internet or intranet traffic without disclosing information about the end users. After discussing several inherent problems associated with modeling and simulations on Internet or intranet traffic, the required data that must be captured in sufficient amounts will be described as well as the format of the database and capturing program.
Currently, many issues regarding the quality of services or Internet traffic models such as RSVP and DiffServ are under evaluation. However, it is difficult to verify these traffic models and service models against commonly accessible actual network traffic data, and to analyze these models in sufficient detail. After taking into account the current issues involved in traffic data capturing, this paper suggests a method for supplying actual traffic data in order to evaluate these models.
The first commonly used traffic data were captured on 3 October 1989 at the Bellcore Morristown Research and Engineering Facility. The data files were in ASCII format, consisting of one 20-byte line for each arriving Ethernet packet. Each of these lines contains a floating point timestamp and an integer length representing the Ethernet data length in bytes. The time, expressed to six places after the decimal point, gives the appearance of microsecond resolution. The timestamps, however, are only accurate to milliseconds because of the actual resolution (four microseconds) of the hardware clock, jitter in the inner code loop, and bus contention.
The next generation of traffic data collections was based on the tcpdump program. Here, V. Paxson and S. Floyd gathered two to four hours of such traffic at Digital's primary Internet access point in March 1995. The raw traces were made using tcpdump on a DEC Alpha running Digital's OSF/1 operating system, which includes a kernel filter with capabilities similar to those of BPF. A sanitize script was used, which renumbers hosts and strips out packet contents in order to address security and privacy concerns. Timestamps have millisecond precision even though they are reported using six digits past the decimal point. Several independent data collections have since been made using almost the same method. Some of these data addressed specific protocols such as HTTP.
The ever-increasing amount of Internet traffic is forcing each Internet service provider (ISP) to analyze whether its network resources are sufficient to maintain customers' satisfaction. The previous traffic data addressed specific protocols or applications and did not reflect all traffic through each ISP. The data should, however, give us information on any application or protocol at any time. The need to capture all packets also arises from the requirements of other communities, such as academic researchers studying mathematical models of Internet traffic.
We introduce a new method to create a database that contains actual traffic data holding sufficient amounts of information to allow analysis of end-to-end characteristics of Internet or intranet traffic without exposing the user's privacy. The required data to be captured will be described as well as the format of the resulting database. Section 2 shows several inherent problems associated with the modeling and simulations of the Internet or intranet traffic as well as requirements for traffic data to be examined. Section 3 shows a format for captured data. The subsequent Sections 4 and 5 show a new capturing program and our experimental data for evaluation. Some future issues are discussed in Section 6.
Under the current operating environment of the Internet, monitoring at intermediate nodes is usually done with Simple Network Management Protocol (SNMP)-related query tools that report a simple set of metrics for transmission links, such as utilization rates and packet discard rates. These kinds of parameters have the advantage of being monitored without placing any stress on the networks. The traffic data repository, which is also built without placing any stress on the networks, is intended to provide operators with more information on the networks so that they can investigate their services.
Currently accessible tools and data possess several inherent problems associated with modeling and simulations of Internet or intranet traffic. Publicly available tools and public databases with sufficient information, based on impartial evaluation measures, might serve to encourage this type of research.
There are two issues to be considered for a traffic data repository. The first problem is the location of measurement. Current fields of research, such as management of quality of service on the Internet, require end-to-end flow analysis. This means that measuring at an intermediate node will not be sufficient for analyzing end-to-end traffic. Although monitoring at an intermediate node is inadequate in this sense, it is still the only way for an Internet service provider to determine service quality, and is therefore an essential task.
The second problem is the extent of the measurements to be made at intermediate nodes so that the traffic can be monitored or studied later with as much information as possible. Factors that must be considered are the time precision, the amount of data to be recorded for each packet, and the hiding of private information in the measured traffic data before these data are made available for public use.
In summary, the following requirements must be satisfied:
Time precision will be critical for performing analysis in the future on the Internet. The current speed of the major Internet backbones is increasing from 156Mbit/s or 622Mbit/s to 2.4Gbit/s. It is further assumed that backbone speeds in the terabit per second range will be attained within a few years. Because the shortest Internet Protocol (IP) packet is an Internet Group Management Protocol (IGMP) packet 28 bytes long, the timestamp resolution must be fine enough to distinguish this packet from others. This resolution is on the order of 28*8/S seconds, where S stands for the packet transmission speed in bit/s over the IP network layer. Figure 1 shows that the current millisecond precision of timestamps will only be adequate for 100kbit/s IP transmission lines. Future terabit per second line monitoring will require 3.68E-10 seconds of precision.
Figure 1. Required Timestamp Precision vs. Packet Transmission Speed of IP Layer
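The relation 28*8/S can be evaluated directly. The following is a minimal sketch, not part of the capturing program; the function name and the list of link speeds are chosen here for illustration, and the 28-byte minimum packet is the value stated in the text.

```python
# Sketch: timestamp resolution needed to distinguish the shortest IP packet.
# MIN_IP_PACKET_BYTES follows the text (20-byte IP header + 8-byte IGMP message).
MIN_IP_PACKET_BYTES = 28


def required_precision(link_bps: float,
                       packet_bytes: int = MIN_IP_PACKET_BYTES) -> float:
    """Return the time in seconds needed to transmit one packet at link_bps."""
    if link_bps <= 0:
        raise ValueError("link speed must be positive")
    return packet_bytes * 8 / link_bps


if __name__ == "__main__":
    # Link speeds mentioned in the text, in bit/s.
    for label, speed in [("100 kbit/s", 100e3), ("156 Mbit/s", 156e6),
                         ("622 Mbit/s", 622e6), ("2.4 Gbit/s", 2.4e9)]:
        print(f"{label:>11}: {required_precision(speed):.3e} s")
```

At 100 kbit/s the function returns about 2.24 ms, which agrees with the statement that millisecond timestamps are adequate only for links of that speed. A different assumed minimum packet size scales the result proportionally.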
We categorize traffic capturing into two methods or modes, namely stateless and stateful. Stateless capturing means that the record of a packet does not depend on previously captured traffic data or packets. The record is produced only from the information provided by the current packet. Most of the currently available tools, such as tcpdump, are based on this packet-by-packet scheme. Stateful capturing, on the other hand, records the information of a packet based on previously captured packet records. Any capturing based on identifying flows is considered stateful capturing. Stateful capturing will be required to compress the captured data with Transmission Control Protocol (TCP) header compression, Real-time Transport Protocol (RTP) header compression, etc.
Two advantages are known for stateless capturing. First, it is simple and does not require a heavy processing overhead on a capturing machine. Second, there is no need to predetermine what higher layer protocols are running through the observed node since capturing information is uniform for every IP packet. A disadvantage is that it is theoretically necessary to record entire packets to analyze the traffic. This requires a large amount of disk space and also a high-speed recording facility for capturing to follow the link speed, which is assumed to be Gbit/s or higher.
On the other hand, stateful capturing is based on flow analysis to decide which part of the IP payload to record and to compress the amount of data. The advantage is that it requires less disk space and only a moderately fast recording facility. The disadvantage is a high processing overhead for flow identification, tracing state transitions of higher layer protocols, data compression, etc. In addition, it requires prior knowledge of the captured protocols. Because the evolution of Internet protocols and applications is very fast, it is very difficult to predetermine all the protocols used in the traffic through the observed node.
Our current implementation is based on the stateless approach. Two data formats to record traffic traces are supported in the system. One is a 24-byte fixed format for a captured packet, as shown in Figure 2. This includes a 64-bit timestamp attached by the BPF driver, followed by the IP source and destination addresses of 32 bits each, the 16-bit total length, and the 8-bit protocol copied from the IP header. Other fields in the IP header are ignored. There are also one flag byte and two 16-bit fields. The flag byte is copied from the TCP flags when appropriate. The two 16-bit fields are used to represent the source and destination port numbers for TCP or User Datagram Protocol (UDP), and the type and code for Internet Control Message Protocol (ICMP). Some types of ICMP messages include the IP header of the target IP datagram as well as the first 64 bits of its payload. In this case, another 24-byte record, with its timestamp field cleared, is used to represent the target IP datagram. This format is intended to keep the record small so that monitoring on a higher speed network is possible.
Figure 2. Captured Data in Compressed Format
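The 24-byte fixed record can be illustrated with a short sketch. The field sizes come from the text; the field order, the split of the 64-bit timestamp into seconds and microseconds, the network byte order, and all names below are assumptions made for illustration and are not taken from the actual ntap format.

```python
import ipaddress
import struct

# Assumed layout (big-endian, no padding):
#   sec(32) usec(32) src(32) dst(32) total_len(16) proto(8) flags(8) f1(16) f2(16)
RECORD = struct.Struct("!IIIIHBBHH")


def pack_record(sec, usec, src, dst, total_len, proto, flags=0, f1=0, f2=0):
    """Pack one packet summary into a 24-byte record.

    f1/f2 hold source/destination ports for TCP and UDP,
    or type/code for ICMP, as described in the text.
    """
    return RECORD.pack(sec, usec,
                       int(ipaddress.IPv4Address(src)),
                       int(ipaddress.IPv4Address(dst)),
                       total_len, proto, flags, f1, f2)


def unpack_record(buf):
    """Unpack a 24-byte record into a dict with dotted-quad addresses."""
    sec, usec, src, dst, total_len, proto, flags, f1, f2 = RECORD.unpack(buf)
    return {"sec": sec, "usec": usec,
            "src": str(ipaddress.IPv4Address(src)),
            "dst": str(ipaddress.IPv4Address(dst)),
            "total_len": total_len, "proto": proto,
            "flags": flags, "f1": f1, "f2": f2}


if __name__ == "__main__":
    # Hypothetical TCP SYN packet (protocol 6, flags 0x02), documentation addresses.
    rec = pack_record(919681200, 123456, "192.0.2.1", "198.51.100.7",
                      40, 6, 0x02, 1025, 80)
    print(len(rec), unpack_record(rec))
```

A fixed-size record of this kind can be written and read sequentially without any per-record length field, which is what keeps the format compact.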
The other format supported in the system is shown in Figure 3 and is compatible with the PCAP library. This format is not as compact, but several well-known tools such as tcpdump can be used with it. To reduce the amount of data and to avoid recording user data, only IP headers (with options if any), TCP headers (with options if any), UDP headers, and ICMP headers (with the initial 64 bits of the target datagram payload where applicable) are recorded. Records in this format are not of fixed length. As TCP options are recorded, this format is suitable for TCP-oriented analysis.
As for UDP packets, RTP is now widely used over UDP. RTP packets carry data reflecting end-to-end throughput for UDP applications, so capturing RTP as well as TCP packets is very important for understanding end-to-end service quality. Because it is very difficult to identify an RTP packet based on the UDP payload of a single packet, stateful capturing is required to identify RTP packets. When specified in an option, the program records the first 20 bytes of the UDP payload for this purpose.
Figure 3. Captured Data in PCAP Format
The paper then considers the architecture and the design policy of a new traffic data repository tool to solve the above problems, describing what parts of IP headers and data must be captured, what timestamp is required for what data, etc.
A data compression method for these captured data is also investigated to reduce the total disk space and to increase the capture speed. The end user's privacy resides in IP address and port number. This paper further presents an effective address scrambler based on the network interface and the address classes and also shows a naive address scrambling scheme having a security hole.
As the system is intended to capture traffic data over a long period with limited storage resources, it writes captured data to the file system while a background process copies the files to cartridge tape. Provided that the tape media are changed, either by a mechanical autochanger or with timely operator assistance, the system can capture traffic data for a virtually unlimited period.
The system consists of two processes, as shown in Figure 4. The ntap process captures the traffic through a BPF, converts the data to an appropriate format, and writes the data to a file. ntap is configured with a list of file systems. When ntap periodically switches its output file, for example, once an hour, it checks the remaining space of the current file system. If necessary, it switches to the next available file system.
Another process, tapemon, checks each file system periodically. When the growth of file system utilization stops, tapemon assumes that the ntap program is writing to another file system. After a few minutes, tapemon dumps the files in the nonworking file system to the tape. Before the tape is exhausted, tapemon rewinds the tape and sends a notification to the operator via e-mail.
Figure 4. Structure of Capturing Software
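The way tapemon decides that a file system is no longer being written can be sketched as a comparison of two utilization samples. This is a minimal illustration under stated assumptions, not the actual tapemon logic: the function names, the mount points, and the use of `shutil.disk_usage` are choices made here, and the real tool additionally waits a few minutes before dumping to tape.

```python
import shutil


def sample_usage(mount_points):
    """Return {mount_point: used_bytes} for each configured file system."""
    return {mp: shutil.disk_usage(mp).used for mp in mount_points}


def idle_file_systems(previous, current):
    """Return mount points whose used space did not grow between two samples.

    A file system that stopped growing is assumed to be no longer
    written by the capture process and is a candidate for dumping.
    """
    return [mp for mp, used in current.items()
            if mp in previous and used <= previous[mp]]


if __name__ == "__main__":
    # Hypothetical samples taken a few minutes apart.
    before = {"/cap0": 4_000_000_000, "/cap1": 1_200_000_000}
    after = {"/cap0": 4_000_000_000, "/cap1": 1_350_000_000}
    print(idle_file_systems(before, after))
```

Separating the sampling from the decision keeps the decision testable without touching real disks.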
The software runs on BSD/OS operating system versions 3.1 and 4.0. Because the only system-dependent feature the software assumes is the BPF interface, it can easily be ported to other BSD-flavored Unix systems. In order to avoid access collisions, it is important to distribute the SCSI-attached devices (hard disks and tape drives) over different SCSI buses. In order to eliminate BPF buffer overruns, it is necessary to make the BPF buffers larger. The default maximum BPF buffer size in BSD/OS was 32 kB. This number is compiled into the BSD/OS kernel, so the kernel must be recompiled after changing the value to at least 256 kB.
As the Unix operating system is not a realtime operating system, a non-blocking close() system call is not supported. We observed that BPF buffer overruns may happen when we switch the output file. This is because there may be a large amount of unflushed data in the kernel buffer, and close() requires the data to be flushed first. A technique to avoid this is to delay the file close operation for a few minutes so that the update daemon can perform the sync() system call and flush the data to the disk asynchronously.
When the captured data are exported, we need to encrypt any field related to the user's privacy. When randomizing the IP addresses in the captured data, the following rules may apply:
The rest of the paper is devoted to issues regarding implementation of this repository tool for BSD Unix. The data format of captured traffic is also described. Field experiments with this tool were performed at a network operation center of WIDE, a Japanese academic network, for two months. As the WIDE Internet has two international links, we ran a capturing system for each international link.
The configurations of the capturing systems are shown in Figure 5. The gateway system to each international link consists of two routers connected by a shared Ethernet. A capturing system is attached to the Ethernet to monitor the international traffic. The system for the T1 link is equipped with a 333-MHz Pentium II, two 9-GB Wide SCSI disks, and a DDS3 tape drive (capacity is about 15 GB for the captured data) on another SCSI bus. For the other link, of 1.9-Mbit/s bandwidth, a system with a 400-MHz Pentium II, two 18-GB Wide SCSI disks, and an AIT drive (capacity is about 45 GB) is assigned.
Figure 5. Configuration of Experimental Network for Traffic Capturing
In order to check the ability of the system, a similar system with a single 9-GB Wide SCSI disk and an FDDI interface was tested on a backbone FDDI network of the University of Tokyo. Although the peak traffic exceeded 20 Mbps, the system reported no dropped packets at the BPF layer over more than 10 days, even under the worse condition in which ntap and tapemon access file systems on a single disk.
The cumulative distribution of the packet length for the data captured from 11:00:00 to 12:00:00 on 22 February 1999 is shown in Figure 6. According to these data, the average packet length is 361 bytes, and the two most frequent lengths, namely 40 and 1500 bytes, account for 41% and 14% of the total number of packets. Packets 1500 bytes long are notable in our traces. This may reflect the popularity of applications like the Web or FTP in the current Internet. Of course, the characteristics of our traces are very different from the Bellcore results, but further investigations will be necessary to compare ours with other traces.
Figure 6. Cumulative Distribution of the Packet Length for the Captured Data
In the middle of this experiment, the bandwidth of the latter international link was increased from 1.9 Mbps to 10 Mbps. Hence the resulting data contain a history of the network behavior from a heavily congested state to a normal state. Some data analyses of the captured traffic are also presented.
Although the current version is a stateless capturing system, we present a block diagram of our future stateful version in Figure 7 to show that stateful capturing for a gigabit network will be possible without any disk access bottleneck, using a conventional SCSI disk drive. The three modules, flow identifier, host address scrambler, and header compression, need not be a single process nor run on a single processor.
The first module, flow identifier, must follow a state diagram of each protocol to identify each flow. At a gateway router, a large number of flows must be processed, and some of them may be terminated abnormally. The latter increase the effective number of flows to be tracked. These factors make flow identification an expensive task.
Figure 7. Extension of ntap to Stateful Capturing
After the flow identification, host address scrambling is applied to each packet, which renumbers hosts in order to address security and privacy concerns. This scrambling can be omitted if the operator wishes to work with the actual host addresses.
Then, at the next stage, after making a secure IP header, TCP and RTP header compression can be applied to all the TCP and RTP packets, which reduces 40 bytes of IP and TCP header to 3 bytes, and 40 bytes of IP, UDP, and RTP header to 2-4 bytes on average. This means that only 8 bytes of timestamp, 2 bytes of flow ID, 2 bytes of data length, and 2-4 bytes of header data need to be recorded for each packet. A record of 14-16 bytes per packet, with an average packet length of 361 bytes, solves the disk access bottleneck.
Assuming 90% of the Internet traffic is from TCP-based applications, it may be possible to capture IP packets through a datalink of up to 2.3 Gbps with disks attached to Ultra2 SCSI or Wide Ultra2 SCSI, assuming that the average sustained transfer rate is around 12 MB/s. Thus the disk access speed is not the bottleneck in capturing packets on gigabit networks. There is, however, a significant processing overhead for flow identification and header compression. Specialized hardware devices or parallel processors will be required for this purpose.
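The capacity estimate above can be reproduced as simple arithmetic. The sketch below is an illustration only: the function name is hypothetical, 1 MB is taken as 10^6 bytes, and the 90% TCP share mentioned in the text is not modeled, so the result depends only on record size, average packet length, and sustained disk rate.

```python
def max_link_bps(disk_bytes_per_s: float, record_bytes: float,
                 avg_packet_bytes: float) -> float:
    """Return the link speed (bit/s) whose packet rate the disk can absorb.

    The disk can store disk_bytes_per_s / record_bytes records per second,
    and each record stands for one packet of avg_packet_bytes on the wire.
    """
    packets_per_s = disk_bytes_per_s / record_bytes
    return packets_per_s * avg_packet_bytes * 8


if __name__ == "__main__":
    # Values from the text: 12 MB/s sustained rate, 361-byte average packet,
    # 14 to 16 bytes recorded per packet.
    for record in (14, 15, 16):
        gbps = max_link_bps(12e6, record, 361) / 1e9
        print(f"{record}-byte record: {gbps:.2f} Gbit/s")
```

With a 15-byte record, the midpoint of the 14-16 byte range, the function gives about 2.31 Gbit/s, in line with the 2.3 Gbps figure quoted in the text.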
All output lines are preceded by a timestamp. The 64-bit timestamp means that the maximum observation period will be 213.5 days, measured from the start of capturing, even if the time precision of one bit is as small as 1.0E-12 second. This is sufficient for monitoring IP traffic over future links at terabit/s speed. There are, however, limitations in our current implementation. First, the timestamp attached by the BPF driver in the Unix kernel is a struct timeval. This time value does not start from the beginning of the capturing but covers the period from 1 January 1970 to 2038 with a precision of one microsecond. Although the clock value is 64 bits long, the timestamp is only as accurate as the kernel's clock, that is, on the order of microseconds. Furthermore, the timestamp reflects the time the kernel first saw the packet. There is some time lag between when the network interface received the packet from the wire and when the kernel serviced the "new packet" interrupt. Because of these factors, the accuracy of the timestamp is restricted to the order of several milliseconds.
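The 213.5-day figure follows from the size of a 64-bit counter. As a minimal check, assuming an unsigned counter that starts at zero when capturing begins (the function name is hypothetical):

```python
SECONDS_PER_DAY = 86400


def max_observation_days(bits: int = 64, tick_seconds: float = 1e-12) -> float:
    """Return how many days a counter of the given width can run
    before wrapping, when it advances once every tick_seconds."""
    return (2 ** bits) * tick_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{max_observation_days():.1f} days at 1 ps resolution")
    print(f"{max_observation_days(tick_seconds=1e-9):.0f} days at 1 ns resolution")
```

A coarser tick extends the period proportionally, so a nanosecond counter of the same width would not wrap within any practical capturing period.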
To overcome this problem, specially designed network interfaces must also be developed for timestamps. These interfaces must be able to attach a timestamp by themselves to each received packet or frame, with nanosecond or picosecond precision, counted from the beginning of capturing.
A notable use of the resulting traces is to estimate the amount of information exchanged through the target node. There has been much evidence of asymmetric traffic between the United States and Japan, based on the inbound and outbound packet data on the gateway routers, but there has not been a more detailed analysis of this fact. Assuming that the side that initiates the connection receives the valuable information, which is usually the case for TCP-based applications like FTP, TELNET, Web accesses, etc., we can estimate how much information is transferred in which direction during the observed period. The procedure is very simple. We just have to identify which side first sends a TCP packet with a SYN flag.
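The procedure can be sketched as a single pass over trace records in time order. This is an illustrative sketch rather than the authors' analysis code: the tuple layout of a record and the function name are assumptions, the initiator is taken to be the sender of the first SYN without ACK, and the byte counts are IP total lengths, so they include headers.

```python
from collections import defaultdict

TCP = 6       # IP protocol number for TCP
SYN = 0x02    # TCP SYN flag bit
ACK = 0x10    # TCP ACK flag bit


def bytes_toward_initiators(records):
    """Sum the bytes sent toward the side that opened each TCP connection.

    records: iterable of (src, dst, sport, dport, proto, flags, length)
    tuples in time order. Returns {initiator_address: bytes_received}.
    """
    initiator = {}              # connection key -> (address, port) of opener
    totals = defaultdict(int)   # initiator address -> bytes received
    for src, dst, sport, dport, proto, flags, length in records:
        if proto != TCP:
            continue
        key = frozenset(((src, sport), (dst, dport)))
        # A SYN without ACK marks the first packet of a connection.
        if flags & SYN and not flags & ACK:
            initiator.setdefault(key, (src, sport))
        opener = initiator.get(key)
        if opener is not None and (dst, dport) == opener:
            totals[opener[0]] += length
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical connection opened by 192.0.2.1 toward a Web server.
    trace = [
        ("192.0.2.1", "198.51.100.7", 1025, 80, TCP, SYN, 40),
        ("198.51.100.7", "192.0.2.1", 80, 1025, TCP, SYN | ACK, 40),
        ("192.0.2.1", "198.51.100.7", 1025, 80, TCP, ACK, 40),
        ("198.51.100.7", "192.0.2.1", 80, 1025, TCP, ACK, 1500),
    ]
    print(bytes_toward_initiators(trace))
```

Connections whose SYN was sent before the trace began are ignored by this sketch, since their initiator cannot be determined from the records alone.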
We present a method to create a database containing real traffic data, which has sufficient information to analyze the end-to-end characteristics of Internet or intranet traffic without disclosing private end user information. Our current version of a program running on Unix implements these features. As we mentioned in the earlier discussions, some improvements for capturing performance and functionality will be required for our program to be used for links faster than our experimental links. Although far from an ideal tool in terms of capture speed, it has been used to make traces of WIDE Project traffic since 23 January 1999 at Internet gates. All the processed traces are intended to be public and will be uploaded to our FTP server after completing the current experiment.