Tracie Monk <tmonk@nlanr.net>
k claffy <kc@nlanr.net>
National Laboratory for Applied Network Research
USA
Most large Internet providers collect basic statistics on the performance of their infrastructure, typically including measurements of utilization, availability, and possibly rudimentary assessments of delay and throughput. In today's commercial Internet, the only baseline against which these networks can evaluate performance is their past performance metrics. No data or even standard formats are available against which to compare performance with other networks or against some baseline. Nor are there reliable performance data for users to assess providers. Data characterization and traffic flow analysis are also virtually nonexistent, yet they remain essential to understanding the internal dynamics of the Internet infrastructure.
Increasingly, both users and providers need information on end-to-end performance and traffic flows, beyond the realm of what is realistically controllable by individual networks or users. Path performance measurement tools enable users and providers to better evaluate and compare providers and to monitor service quality. Many of these tools treat the Internet as a black box, measuring end-to-end characteristics, for example, response time and packet loss (ping) and reachability (traceroute), from points originating and terminating outside individual networks. Traffic flow characterization tools focus on the internal dynamics of individual networks and cross-provider traffic flows, enabling network architects to better engineer and operate networks, better understand global traffic trends and behavior, and better adopt or respond to new technologies and protocols as they are introduced into the infrastructure.
This paper has three goals. We first provide background on the current Internet architecture and describe why measurements are a key element in the development of a robust and financially successful commercial Internet. We then discuss the current state of Internet metrics analysis and the steps underway within various forums to encourage the development and deployment of Internet performance monitoring and workload characterization tools. Finally, we describe the rationale and near-term plans for the Cooperative Association for Internet Data Analysis (CAIDA).
Keywords: measurement, statistics, metrics, performance, flow, tools, cooperation, ISP, CAIDA.
The challenges inherent in Internet operational support, particularly given its underlying best-effort protocol, fully consume the attention of Internet service providers (ISPs). Given its absence from the list of critical ISP priorities, data collection across individual backbones and at peering points continues to languish, both for end-to-end data (which require measurement across IP clouds) and for characterization of actual data flows, such as by application (Web, E-mail, RealAudio, FTP), origin/destination, packet size, and duration of flows.
Yet it is detailed traffic and performance measurement and analysis that have heretofore been essential to identifying and ameliorating network problems. Trend analysis and accurate network system monitoring permit network managers to identify hot spots (overloaded paths), predict problems before they occur, and avoid congestion and outages by efficient deployment of resources and optimization of network configurations. As the nation and world become increasingly dependent on the Internet, it is critical that we develop mechanisms to enable infrastructure-wide planning and analysis and to promote continued efficient scaling of the Internet. User communities are also exerting pressure on providers for verifiable service guarantees that are not readily available under the current Internet. This is particularly true for users that view the Internet as mission critical. Most notable are the higher education/research community (via Internet-2) and the Automotive Industry Action Group (AIAG).[1]
In October 1996, members of the higher education community announced Internet-2.[2] With more than 70 of the larger U.S. universities participating, Internet-2 aims to:
AIAG is taking action to address similar service requirements of the major automobile manufacturers and their suppliers. In January 1997, they tasked Bellcore to develop a strategy for:
Another important initiative, the U.S. government's Next Generation Internet (NGI), will similarly have specific metric goals against which to measure and evaluate services. The NGI aims to connect research institutions with high-speed networks that are 100 to 1,000 times faster than today's Internet, promote experimentation with the next generation of networking technologies, and demonstrate new applications. Details on the NGI are available on the National Coordination Office (NCO) for Computing, Information, and Communications (CIC) home page at http://www.hpcc.gov.
In order to achieve measurements that are readily comparable and relevant, a first step is the development of common definitions of IP metrics. The Internet Engineering Task Force (IETF) IPPM working group is working to provide a more rigorous theoretical framework and guidelines for designing measurement tools to deal with the wide variety of signal sources in the frictionful Internet.[3] In late 1996, draft requests for comments (RFCs) were issued delineating metrics for connectivity [Mahdavi and Paxson], one-way delay [Almes and Kalidindi], and empirical bulk transfer capacity [Mathis]. A charter for continued development of the community's understanding of performance metrics is also available at the IPPM home page, http://www.advanced.org/IPPM.
However, although several efforts are underway, the community is still only at a rudimentary stage with respect to actual tools that can isolate traffic bottlenecks, routing anomalies, and congestion points, and visualize traffic flows. With support from the National Science Foundation (NSF) [4], Merit [Labovitz], for example, has taken extensive measurements of routing instabilities and path performance problems at and between the original NSF-chartered network access points (NAPs). These measurements are not yet well understood by the community and are still in their early stages of development, which has hindered widespread acceptance.
Also NSF-supported, the National Laboratory for Applied Network Research (NLANR) [Mathis/Mahdavi, PSC] is collaborating with the Department of Energy's Lawrence Berkeley Laboratories (DOE/LBL) [Paxson] to define a scalable tool set for IPPM-style metrics that could be run on a national (or international) Internet measurement infrastructure (NIMI).
Another group interested in deployment of measurement infrastructure is the Common Solutions Group (CSG), a consortium of 23 universities that began to station probes at major universities and exchange points in mid-1997. This infrastructure will utilize both active and passive tests to collect data on path-specific metrics such as one-way delay, packet loss, and throughput.
Finally, in April 1997, the Trans-European Research and Education Networking Association (Terena) Performance Working Group kicked off a two-year measurement initiative, supporting cross-European measurements of total traffic through specific links, mapping of reachable destinations covered by a route, delay measurements, flow capacity, continental averages of hops, and packet loss rates.
Well-defined metrics of delay, packet loss, flow capacity, and availability are fundamental to measurement and comparison of path and network performance. Tools that measure in a statistically realistic way are disturbingly slow to emerge, but we are starting to see reasonable prototypes for measuring TCP throughput (treno, Mathis and Mahdavi/PSC), dynamics indicative of misbehaving TCP implementations (tcpanaly, Paxson/LBL), and end-to-end delay distributions, such as NetNow (Labovitz/Merit), the Imeter (Intel), and detailed ping and traceroute analysis (Cottrell/SLAC) [5]. Tools to isolate traffic bottlenecks and congestion points are still generally not available. Merit is developing prototype tools measuring routing instabilities, but they are still deployed only at select exchange points. The network time protocol (NTP) addresses many of the time issues associated with measurement tools, but some metrics (e.g., one-way delay) will require synchronized infrastructure measurement probes and beacons that use global positioning system (GPS) and similar timing devices.
It is hoped that many of the emerging path performance tools will serve users quite well, both for self-diagnosis of problems and through benefiting from lessons learned by others conducting measurements over the shared infrastructure. Further, potential customers would ideally be able to evaluate and compare alternative providers and monitor service qualities. Most of these performance tools measure end-to-end characteristics from points originating and terminating outside individual networks, for example, response time and packet loss (using ping) and reachability (using traceroute).
In general, users are most interested in metrics that indicate the likelihood that packets will get to their destination in a timely manner. Therefore, estimates of past and expected performance for traffic across specific Internet paths, not simply measures of current performance, are important. Users are also increasingly concerned about path availability information, particularly as it affects Internet applications requiring higher bandwidth and lower latency/jitter, such as Internet Phone and videoconferencing. Availability of such data could assist in scheduling online events, such as Internet-based distance education seminars, and also influence user willingness to purchase higher service quality and associated service guarantees.
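As a minimal sketch of why one-way delay is harder to measure than round-trip time, the Python fragment below models a single probe exchange between two hosts whose clocks disagree. The function name `probe_exchange` and all values are hypothetical, and the far host is assumed to reply instantly. The clock offset contaminates each one-way figure but cancels out of the round trip.

```python
def probe_exchange(fwd_ms, rev_ms, offset_ms, t0_ms=0):
    """Simulate the four timestamps of one probe/reply exchange.

    fwd_ms, rev_ms: true one-way delays in each direction.
    offset_ms: how far host B's clock runs ahead of host A's clock.
    Returns the delays an observer would compute from the raw timestamps.
    """
    t1 = t0_ms                         # A sends the probe (A's clock)
    t2 = t0_ms + fwd_ms + offset_ms    # B receives it (B's clock)
    t3 = t2                            # B replies at once (B's clock)
    t4 = t0_ms + fwd_ms + rev_ms       # A receives the reply (A's clock)
    return {
        "forward": t2 - t1,            # true forward delay plus the offset
        "reverse": t4 - t3,            # true reverse delay minus the offset
        "round_trip": (t4 - t1) - (t3 - t2),  # the offset cancels here
    }


if __name__ == "__main__":
    # True delays of 30 ms and 50 ms, with B's clock 10 ms ahead of A's.
    print(probe_exchange(fwd_ms=30, rev_ms=50, offset_ms=10))
```

With unsynchronized clocks the two directions look symmetric in this example even though they are not, which is the kind of error that GPS-disciplined probes are meant to remove.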
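The ping-style metrics mentioned above can be illustrated with a short Python sketch. It assumes the probe results have already been collected as a list of round-trip times in milliseconds, with `None` standing for a probe that received no reply. The function name `summarize_probes` is hypothetical and does not correspond to any tool named in this paper.

```python
import statistics


def summarize_probes(rtts_ms):
    """Summarize one batch of echo probes.

    rtts_ms holds one entry per probe sent: a round-trip time in
    milliseconds, or None when no reply came back.
    """
    sent = len(rtts_ms)
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    summary = {
        "sent": sent,
        "received": len(replies),
        # Loss rate is undefined when nothing was sent.
        "loss_rate": (sent - len(replies)) / sent if sent else None,
    }
    if replies:
        summary["min_ms"] = min(replies)
        summary["median_ms"] = statistics.median(replies)
        summary["max_ms"] = max(replies)
    return summary


if __name__ == "__main__":
    print(summarize_probes([10.0, None, 30.0, 20.0]))
```

Reporting the median alongside the extremes is a design choice for this sketch: delay samples tend to be skewed, so a single mean can hide the behavior most users experience.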
The NLANR Web site, http://www.nlanr.net/INFO, maintains a repository of links to key sites containing information on performance tools. NLANR, through the Cooperative Association for Internet Data Analysis (CAIDA, described below), is working on a taxonomy of measurement tools. This analysis will be complete in mid-1997 and will be available on both the NLANR and CAIDA Web sites. Tools being reviewed in this taxonomy are listed in Table 1 below.
Name / contact | Object measured / summary |
---|---|
Internet measurement | |
TReno by Matt Mathis, Jamshid Mahdavi | TCP bandwidth / User-level TCP implementation |
bing by Pierre Beyssac | Bottleneck bandwidth / Measures w/o filling link |
{b\|c}probe by Bob Carter | Bottleneck bandwidth / Measures w/o filling link |
Internet availability and latency | |
ping by BRL (now ARL) | Availability / latency / pkt loss / The original ping |
Nikhef ping by Eric Wassenaar | Availability / latency / pkt loss / Many minor differences from orig ping |
traceroute by LBL | Routes / Each hop in path with per-hop latency |
Nikhef traceroute by Eric Wassenaar | Routes / Many minor differences from orig traceroute |
MTrace by Bill Fenner | Multicast routes / Does for multicasts what traceroute does for unicasts |
traceroute web servers | Traceroutes from odd places / Reverse traceroute from all over globe |
wwping by Jonathon Fletcher | Web server availability / Tries single html query, returns server info |
User-oriented Internet measurement efforts | |
timeit by Jeff Sedayao, Kotaro Akita, Cindy Bickerstaff | Web performance / benchmark of HTTP query performance across Internet |
Montreal Internet service providers by Peter Burke Consulting | Latency/packet loss to many Montreal ISPs / attempt to systematically rate Montreal ISPs |
Internet measurement efforts | |
MIDS Internet Weather Report by John S. Quarterman, Smoot Carl-Mitchell, Gretchen Phillips | Global end-to-end latency / latency distributions from Austin, TX to all over world |
Internet Weather Report by Clear Ink | Large ISP latency / pkt loss / latency/loss from Bay Area to large ISPs |
Network Probe Daemon (NPD) by Vern Paxson | Route behavior / Traceroute data from various end hosts |
NetNow by Craig Labovitz | ISP backbone delay / packet loss / Taken from each NAP |
IPMA by Merit IPMA Project | Backbone routing behavior / A study of backbone routing behavior and problems |
Routing Arbiter stats tracking | Defunct / Replaced by IPMA |
SLAC WAN Monitoring by Les Cottrell, Connie Logg | Latency/availability information to assorted sites / systematic pinging, sophisticated recording of results |
MFS MAE Information by MFS | Link utilization and current connections / MAE status reports |
looking glass by Ed Kern | Router stats / router debugging stats query web interface |
High-performance measurement tools | |
netperf by Rick Jones | Min latency & max throughput / Very thorough high performance benchmark, includes results db |
ttcp by BRL (now ARL) | Max throughput / This archive includes most versions |
nettest | Max throughput / Cray throughput measurement program |
netspec by Roel Jonkman | Max throughput / throughput test scripting language |
High-performance measurement efforts | |
vBNS Perf Sampling by Von Welch | vBNS max throughput / vBNS max throughput between sample host pairs |
Packet trace collectors | |
tcpdump by LBL | Unix / Most common portable packet dump program |
snoop | Sun, SGI / bundled with Solaris, Irix |
etherfind by Sun | SunOS / packet dumper bundled with SunOS |
Packetman by Netman Group | LAN packet dumper |
CflowD by Daniel W. McRobb, John Hawkinson | Program to analyze Cisco flow-export packet dumps |
OC3mon by MCI/NLANR | PC with 144 Mbit ATM card / Fast ATM flow dumper |
fs2flows by NLANR | Unix / Extracts flows from packet dumps |
NeTraMet by Nevil Brownlee | DOS, Unix / flow monitoring / analysis for accounting |
Statistics collection | |
NetSCARF by Merit NetSCARF Team | Collects, manages, and displays SNMP stats |
Data available from traffic flow tools include flow type (e.g., web, E-mail, FTP, RealAudio, and CU-SeeMe), sources/destinations of traffic, and distributions of packet sizes and duration. These measurement tools must be deployed within networks, particularly at border routers and peering points. Traffic flow characterization measurement therefore requires a higher degree of cooperation and involvement by service providers than do end-to-end performance-oriented measurements. End users or large institutional sites can also use these tools to monitor traffic; for example, MCI has placed OC3mon flowmeters at vBNS nodes at each of the NSF-supported supercomputing centers. These devices provide detailed information on traffic flows and assist in analyzing usage and flagging anomalies.
Today's infrastructure is unprepared to deal with large aggregation of flows, particularly flows that are several orders of magnitude higher in volume than the rest, such as videoconferencing. Providers and users need mechanisms and tools to support more accurate accounting for resources/bandwidth consumed.
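To make the notion of a flow concrete, the following Python sketch groups packet records into unidirectional flows and reports the per-flow quantities listed above. It is an illustration under stated assumptions, not a description of how OC3mon or any other tool named here works: the `Packet` record layout, the 64-second idle timeout, and the small port-to-application map are all choices made for this example.

```python
from collections import namedtuple

# Hypothetical packet record: timestamp in seconds, addresses, protocol,
# ports, and packet size in bytes.
Packet = namedtuple("Packet", "time src dst proto sport dport size")

# Well-known ports used here only to label flows by application.
PORT_APPS = {80: "web", 25: "e-mail", 21: "ftp"}


def build_flows(packets, idle_timeout=64.0):
    """Group packets into flows keyed by (src, dst, proto, sport, dport).

    A flow is closed when no packet with the same key arrives within
    idle_timeout seconds (an assumed value for this sketch).
    """
    active, finished = {}, []
    for pkt in sorted(packets, key=lambda p: p.time):
        key = (pkt.src, pkt.dst, pkt.proto, pkt.sport, pkt.dport)
        flow = active.get(key)
        if flow is not None and pkt.time - flow["last"] > idle_timeout:
            finished.append(flow)      # the old flow timed out
            flow = None
        if flow is None:
            app = PORT_APPS.get(pkt.dport) or PORT_APPS.get(pkt.sport) or "other"
            flow = {"key": key, "app": app, "first": pkt.time,
                    "last": pkt.time, "packets": 0, "bytes": 0}
            active[key] = flow
        flow["last"] = pkt.time
        flow["packets"] += 1
        flow["bytes"] += pkt.size
    finished.extend(active.values())
    for flow in finished:
        flow["duration"] = flow["last"] - flow["first"]
    return sorted(finished, key=lambda f: f["first"])


if __name__ == "__main__":
    trace = [
        Packet(0.0, "10.0.0.1", "10.0.0.2", "tcp", 1025, 80, 40),
        Packet(1.0, "10.0.0.1", "10.0.0.2", "tcp", 1025, 80, 552),
        Packet(0.5, "10.0.0.3", "10.0.0.4", "tcp", 1030, 25, 120),
        Packet(100.0, "10.0.0.1", "10.0.0.2", "tcp", 1025, 80, 40),
    ]
    for flow in build_flows(trace):
        print(flow["app"], flow["packets"], flow["bytes"], flow["duration"])
```

Distributions of packet sizes, flow volume, and flow duration can then be computed directly from the returned records.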
Flow characterization tools include the OC3mon traffic monitor (providing real-time monitoring of traffic at 155 megabits per second [Mbps] speeds), developed by Joel Apisdorf and others within MCI's vBNS research team. MCI makes detailed flow data graphics publicly available through the vBNS Web site http://www.vbns.net. Figure 1 below represents a time series plot of flows across the vBNS node at the National Center for Supercomputing Applications (NCSA) from 24-28 January 1997. Other data on autonomous systems, country-specific flows, and distributions of packet sizes, flow volume, and flow duration are available according to user-defined flow characteristics.[6]
Figure 1. Time series plot of packets across vBNS at NCSA
Nevil Brownlee of the University of Auckland, New Zealand, has also been working with the IETF Real Time Flow Meter (RTFM) working group to develop tools for accounting and related flow measurement [7]. He has developed the NeTraMet / Nifty tool, most notably to support resource accounting in New Zealand. John Hawkinson (BBN Planet) and Daniel McRobb (ANS) are developing the Cflowd tool to augment and further analyze data provided by the NetFlow switching capability of Cisco routers. They presented preliminary results of their analyses at the February 1997 meeting of the North American Network Operators Group (NANOG).
Given the dynamic nature of the Internet environment, collected traffic data will be primarily of historical interest unless tangible improvements occur in our ability to analyze and predict network behavior. Without the necessary and fundamental understanding that internetworking traffic modeling and simulation offers, practitioners will continue their skepticism about the utility of empirical measurement studies.
Yet, there is little consensus on how to accomplish IP traffic modeling or how to incorporate real-time statistics into such analyses. Telephony models developed at Bell Labs and elsewhere rely on queuing theories and other techniques that do not transfer readily to Internet-style packet-switched networks. In particular, Erlang distributions, Poisson arrivals, and other tools for predicting call-blocking probabilities and other vital telephony service characteristics typically do not apply to wide area internetworking technologies.
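For readers unfamiliar with the telephony models referred to here, the Python sketch below computes the classical Erlang B call-blocking probability using its standard recurrence. It is included only to show the kind of closed-form prediction available in telephony. The formula assumes Poisson call arrivals, which is exactly the assumption that typically fails for Internet traffic. The function name `erlang_b` is chosen for this example.

```python
def erlang_b(offered_erlangs, circuits):
    """Blocking probability for Poisson call arrivals on a trunk group.

    Uses the recurrence B(0) = 1, B(n) = a*B(n-1) / (n + a*B(n-1)),
    where a is the offered load in erlangs and n the number of circuits.
    """
    blocking = 1.0
    for n in range(1, circuits + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


if __name__ == "__main__":
    # Two erlangs of offered load on two circuits.
    print(erlang_b(2.0, 2))
```

No comparably simple expression is known to predict loss or delay on a wide area IP path from offered load alone, which is part of why empirical measurement matters.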
Internet measurement researchers face growing skepticism from practitioners (ISP engineers) who question the utility and relevance of traffic studies vis-a-vis the realities of instrumenting large Internet backbones. While gaps persist, the mutual interdependence of these communities and the growing requirements to assess Internet dynamics suggest a strong need for identifying common ground.
Visually depicting Internet traffic dynamics is the goal of collaborations between NLANR researchers at Stanford University, the University of California at San Diego, and Xerox PARC, which are described below.
Traffic visualizations: In late 1995, K. Claffy, Eric Hoffman, Ipsilon (now NLANR/UCSD), Tamara Munzner of Stanford University, and Bill Fenner of Xerox PARC began work on visually depicting Internet traffic components. Rather than tackle the Internet topology as a whole, they chose to experiment with visualization techniques using the smaller MBone infrastructure [8].
Figure 2. European MBone Traffic: illustrates European MBone topology, characterized by a relatively more efficient star topology than seen in the United States MBone structure, largely because of bandwidth scarcity that provides stronger incentive for more efficient configurations. Data from 17 March 1997
To depict this traffic, Munzner et al. used the mrwatch utility developed by Atanu Ghosh at University College London to generate MBone data. They then translated these data into a geographic representation of the tunnel structure as arcs on a globe by resolving the latitude and longitude of the MBone routers. The resulting visualizations provide a macro-level view of the textual data and permit a level of understanding of the global MBone traffic structure that is unavailable from the data in their original form: lines of text with only hostnames and IP addresses. The representations are interactive and three-dimensional, and permit analysts to define groupings and thresholds in order to isolate aspects of the MBone topology.
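The geometric step described above, turning two router locations into an arc on a globe, can be sketched in Python as follows. This is an illustration only and not the code used for the NLANR visualizations. It assumes a unit sphere, takes latitude and longitude in degrees, and interpolates along the great circle between the two endpoints. The function names are hypothetical.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def great_circle_arc(start, end, segments=16):
    """Return points along the great-circle arc between two (lat, lon) pairs.

    Identical or antipodal endpoints have no unique arc; in that case
    only the two endpoints are returned.
    """
    a, b = to_unit_vector(*start), to_unit_vector(*end)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)             # angle between the two endpoints
    if math.sin(omega) < 1e-9:
        return [a, b]
    points = []
    for i in range(segments + 1):
        t = i / segments
        wa = math.sin((1 - t) * omega) / math.sin(omega)
        wb = math.sin(t * omega) / math.sin(omega)
        points.append(tuple(wa * x + wb * y for x, y in zip(a, b)))
    return points


if __name__ == "__main__":
    # Arc between two made-up tunnel endpoints on the equator.
    for point in great_circle_arc((0.0, 0.0), (0.0, 90.0), segments=4):
        print(point)
```

A renderer would typically lift the intermediate points slightly above the sphere so that long tunnels remain visible over the surface.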
NLANR makes these maps publicly available as both still images and VRML objects, the latter for use with a VRML (virtual reality modeling language) browser.
Figure 3. Global MBone traffic: illustrates the concentration of MBone traffic in the Northern Hemisphere (US & Europe). Data from 17 March 1997
NLANR staff is developing a network visualization tool (Anemone) that has already been used for tasks such as delineating relationships among autonomous systems (AS). Figure 4 depicts AS peering adjacencies, sampled from a BGP session on 1 May 1996. Node sizes are proportional to the total number of BGP peering relationships in which the AS participates; line sizes are proportional to the number of routes advertised across the corresponding adjacency.
Figure 4. AS Peering Relationships
Figure 5 provides a 3D VRML view of AS peering versus the earlier planar view. This image depicts BGP peering relationships for all AS that peer with at least seven other AS.
Figure 5. 3D View of AS Peering Relationships
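The quantities that drive node and line sizes in Figures 4 and 5 can be derived from routing data along the following lines. This Python sketch is not Anemone's implementation. It assumes each route's AS path is available as a list of AS numbers and infers an adjacency between any two ASes that appear consecutively in a path. The function names are hypothetical, and the AS numbers in the example come from the range reserved for documentation.

```python
from collections import Counter, defaultdict


def peering_graph(as_paths):
    """Derive per-AS peer counts and per-adjacency route counts.

    Returns (degree, routes_per_adjacency): degree maps each AS to the
    number of distinct neighbors seen; routes_per_adjacency maps each
    unordered AS pair to the number of routes whose path crosses it.
    """
    routes_per_adjacency = Counter()
    for path in as_paths:
        # Collapse repeated ASes so path prepending adds no self-loops.
        hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        edges = {frozenset(pair) for pair in zip(hops, hops[1:])}
        for edge in edges:
            routes_per_adjacency[edge] += 1
    neighbors = defaultdict(set)
    for edge in routes_per_adjacency:
        left, right = tuple(edge)
        neighbors[left].add(right)
        neighbors[right].add(left)
    degree = {asn: len(peers) for asn, peers in neighbors.items()}
    return degree, routes_per_adjacency


def well_connected(degree, min_peers):
    """Select the ASes with at least min_peers observed neighbors."""
    return {asn for asn, count in degree.items() if count >= min_peers}


if __name__ == "__main__":
    paths = [
        [64496, 64497, 64498],
        [64496, 64497],
        [64499, 64497, 64497, 64498],
    ]
    degree, routes = peering_graph(paths)
    print(degree)
    print(well_connected(degree, min_peers=2))
```

Calling `well_connected(degree, 7)` would apply the same kind of threshold used for the view in Figure 5.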
Development of a prototype global Web caching hierarchy is another focus area of NLANR. Under the direction of K. Claffy and Duane Wessels of UCSD/NCAR, NSF and Digital are sponsoring the deployment of root Web caches at each of the NSF-supported supercomputing centers (SCCs). The SCCs, and hundreds of the caches that tie into these root caches, run the NLANR-developed Squid caching software, a publicly available package supported by community volunteers led by Duane Wessels. Details on this project are available at http://www.nlanr.net/Cache.
Figure 6. Cache traffic in Asia
As part of this global caching project, NLANR has developed a tool to visualize global caching traffic flows. Figure 6 shows a snapshot of Asian caching traffic patterns on 19 January 1997. The red flows indicate a high volume of traffic between the caches. NLANR software automatically updates these images daily, and the images have already proven useful in optimizing caching topologies. In particular, mid-1996 analysis and visualization of caching logs helped to support the NLANR decision to implement access controls for root caches in the United States, in order to force coherence to a sounder hierarchical global structure.
Various tools used to depict and visualize Internet traffic flows are identified in Table 2 below. Those used to develop NLANR/CAIDA visualizations are described at http://www.nlanr.net/Caidants/.
Name / Contact | Summary |
---|---|
Link congestion visualization by NLANR | Plot of latency variance on routes to various hosts |
MBone visualization by NLANR | MBone geographic visualization, updated daily |
Web cache visualization by NLANR | Squid cache hierarchy geographic visualization, updated daily |
A Map of the MBone by Elan Amir | A map of the MBone |
ASExplorer by Merit IPMA Project | NAP route map |
pubnetmap by Dave Jevans | Visualization of all Internet links and latencies |
Etherman by Netman Group | LAN traffic monitor |
Interman by Netman Group | LAN connectivity monitor |
Geotraceman by Netman Group | Geographical traceroute |
Hostname to Lat/Long | Useful subroutine for Internet mappers |
xplot by Tim Shepard | Making tcp plots |
MIDS by John Quarterman & Co. | Internet cartographers extraordinaire |
mview by Thaler | MBone status visualization |
Simulation tools/models: In addition to a lack of data on Internet traffic flows and performance, a similar dearth exists in quality analysis, modeling, and simulation tools, particularly those capable of addressing Internet scalability. The absence of these tools hinders the ability of networking engineers and architects to reasonably plan and execute the introduction of new technologies and protocols. Developing and improving the quality of these tools necessitates that forums be established to improve cooperation between researchers and commercial firms. Illustrative tools include:
Toward this end, NLANR/UCSD is creating the Cooperative Association for Internet Data Analysis (CAIDA). CAIDA is meant to be inclusive, building upon existing NLANR measurement collaborations with supercomputing centers, ISPs, universities, vendors, and government. CAIDA is also responsive and is designed to meet evolving and future needs of the Internet by encouraging continued innovation by the R&E community, tempered by the realities of the commercial Internet infrastructure.
CAIDA is a collaborative undertaking to promote greater cooperation in the engineering and maintenance of a robust, scalable global Internet infrastructure. It will address problems of Internet traffic measurement and performance and of interprovider communication and cooperation within the Internet service industry. It will provide a neutral framework to support these cooperative endeavors. Tasks are defined in conjunction with participating CAIDA organizations, and are either
CAIDA's initial goals include:
The goal is to have both government and industry participate in CAIDA, for the benefit of the larger Internet environment.
Inherent in CAIDA's creation are fundamental precepts covering the acquisition and use of data and the public availability of CAIDA's tools. Specifically, CAIDA participants will determine
Currently, CAIDA is a project of the National Laboratory for Applied Network Research (NLANR) within the University of California, San Diego. In May 1997, NLANR/CAIDA will host the second in its series of Internet Statistics and Metrics Analysis (ISMA) workshops. During 1997, UCSD/NLANR personnel will work with participating companies to define the goals, priorities, and desired membership of CAIDA. By late 1997, CAIDA will be registered as a nonprofit organization.
UCSD has submitted a proposal to the National Science Foundation to help seed the CAIDA effort. Complementing an industrywide effort with government support will promote balance among the needs of the various communities (private, research, government, and users) and facilitate the near-term development and deployment of critical measurement technology and techniques.
For more information on the status of CAIDA and information on its tools and associated analyses, visit CAIDA's Web site (http://www.caida.org).