Measurements of the Internet Topology in the Asia-Pacific Region

Bradley HUFFAKER <bhuffake@sdsc.edu>
Marina FOMENKOV <mfomenkova@ucsd.edu>
David MOORE <dmoore@caida.org>
Evi NEMETH <evi@caida.org>
Cooperative Association for Internet Data Analysis
USA

Abstract

CAIDA, the Cooperative Association for Internet Data Analysis, has conducted a study of network connectivity in the Asia-Pacific region. The study focuses on network latency and performance, autonomous system (AS) and country peering, and third-party transit. Using ICMP packets from nine geographically diverse monitors, we collect data that includes the forward IP path and the round-trip delay to about 2000 destinations (mostly Web servers) in the Asia-Pacific region. From these IP paths and delay values we derive statistics about transit providers and peering, and we assess how well our metrics capture Internet connectivity.


1. Introduction

As the Internet grows relentlessly, so do the difficulties in monitoring and understanding its complexity. Studies are available that rely upon theoretical models or simplified assumptions in order to emulate certain features of Internet behavior. However, the actual macroscopic dynamic characteristics of the global Internet have not been studied in any detail. Our work seeks to narrow this gap. By monitoring the performance of a large set (many tens of thousands) of end-to-end paths over a period of time, we obtain empirically-based insight into macroscopic Internet topology dynamics.

We apply active probe measurements to capture and track relevant cross-sections of global Internet topology. CAIDA developed the skitter tool to continuously measure the forward IP path and round trip time (RTT) from a source to many thousands of destinations. In this paper we focus specifically on Internet connectivity and topology measurements of the Asia-Pacific region, using data obtained from nine skitter hosts located in North America (4), Asia (3), Europe (1), and New Zealand (1). Each source queries the same database of more than 21,000 destinations from all over the world, including about 2000 destinations in the Pacific Rim countries. By augmenting this data with both routing tables and geographical data, we can see large components of Internet topology at multiple logical and geopolitical levels: from individual hops, to Autonomous Systems (AS), to countries traversed.

The paper is organized as follows. We review related work in Section 2. Section 3 briefly describes the design and implementation of the skitter measurement tool, lists the sites that host skitter sources, and gives an overview of the raw data. We discuss the data analysis methodology and present results in Section 4. Section 5 concludes the paper.

2. Related work

Internet mapping studies typically focus on characterizing and delineating Internet topology and/or performance. Characterizing topology requires collecting data on Internet nodes and links to create a graph-like map of parts of the Internet. Characterizing performance typically involves measuring RTTs between pairs of hosts and studying how the RTT varies depending on the path, time of day, traffic type, and other parameters.

One of the most extensive attempts to map the Internet is the Mercator project [1], which, like traceroute, uses UDP packets to discover Internet topology at the router level. Unlike skitter, Mercator maps the Internet starting from a single host, and does not use any external database to guide selection of its probe destinations. Instead, Mercator selects probe targets using a heuristic called "informed random address probing" to guess which portions of the IPv4 address space are likely to contain addressable nodes. The algorithm builds a graph rooted at the Mercator source host, and uses source-routing to try to discover cross-links. Mercator also has heuristics for resolving aliases, i.e., identifying multiple interfaces (IP addresses) that belong to the same router. After running for three weeks in the summer of 1999, Mercator had discovered nearly 150,000 interfaces and nearly 200,000 links. In five days of data collected in November 1999, skitter running on 18 source hosts saw about the same number of nodes and nearly 270,000 links.

There are other Internet mapping projects [2, 3, 4] that attempt to obtain router and/or AS level maps. From skitter data we can derive both router maps and AS maps.

In terms of macroscopic performance studies, Paxson [5] analyzed 40,000 end-to-end path measurements taken with repeated traceroutes among 37 Internet sites in November-December of 1994 and 1995. His goal was to examine routing behavior including pathologies (e.g., loops, rapid changes, errors), stability and path symmetry. The data suggested that at least at that time, Internet paths tended to be heavily dominated by a single prevalent route, and about two-thirds of the source/destination pairs used paths that persisted for days or weeks. Approximately half of the paths measured were asymmetric.

Another large end-to-end performance monitoring project is PingER [6], which involves laboratories and research universities collaborating in High Energy Physics experiments (HEPnet). Similar to skitter, PingER sites perform active measurements using ICMP echo requests at a relatively low frequency. They use five performance metrics and aggregate results by day and month on their Web site. PingER differs from skitter in its focus on a small community of mission-specific nodes (about 500) on a single virtual infrastructure (HEPnet), whereas skitter monitors tens of thousands of destinations distributed throughout the commodity Internet. Also, PingER only collects data on RTT and packet loss, while skitter records RTT and forward path information.

Surveyor [7], like PingER, continuously monitors end-to-end performance, in this case among 70 participating sites deployed in the research community, but it uses different metrics. This project measures unidirectional delay, loss, and routing for a total of 2641 paths. Its goal is not to learn about the Internet as a whole, but rather to help participating pairs understand the connectivity between them.

Another community-specific monitoring project is NLANR's (National Laboratory for Applied Network Research) Active Measurement Program (AMP) [8], which has deployed 85 monitors, primarily at sites on NSF-supported high-speed research networks. The number of participating hosts is increasing by about 10 per month. AMP collects data on RTT, loss, topology, and throughput among these sites. NLANR aggregates the results and posts them on the Web [9].

A key difference between skitter and other ongoing projects described above is that skitter collects, on a large scale, both topology and performance data, which allows us to study the correlation between them. Also, skitter measurements deal with the general Internet as a whole rather than with just specific infrastructures.

3. Measurements and data

Skitter (www.caida.org/Tools/Skitter/) is an Internet monitoring tool that measures the forward IP path and round trip time (RTT) from a source to many destinations. Its methodology is similar to that of traceroute, but it uses ICMP echo request packets rather than UDP packets and supports accurate kernel time stamps for the RTT measurements. Skitter increments the time-to-live (TTL) of outgoing packets until a TTL sufficient to reach the destination is attained, and records the IP addresses of intermediate routers that reply. If an intermediate router does not reply to three ICMP packets, skitter increases the TTL to probe the next hop. When skitter receives an ICMP echo reply from the ultimate destination, it terminates probing that destination and records the RTT. Skitter abandons probing a destination if the TTL reaches 30, if it receives an "ICMP unreachable" reply, or if it detects a routing loop (i.e., a duplicate IP address in the path).
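The probing rules above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not skitter's code: the `probe` callback is a hypothetical stand-in for sending one ICMP echo request with a given TTL and classifying the reply, and the simulated network in the example is invented.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical probe callback: send one ICMP echo request to `dest` with the
# given TTL and return (responder_ip, kind), where kind is "time_exceeded",
# "echo_reply", or "unreachable"; return None if no reply arrives.
ProbeFn = Callable[[str, int], Optional[Tuple[str, str]]]

MAX_TTL = 30          # give up after probing with TTL 30
RETRIES_PER_HOP = 3   # unanswered probes before moving on to the next TTL


def trace(dest: str, probe: ProbeFn) -> Tuple[List[Optional[str]], str]:
    """Return the hop list (None for silent hops) and the reason probing stopped."""
    path: List[Optional[str]] = []
    seen = set()
    for ttl in range(1, MAX_TTL + 1):
        reply = None
        for _ in range(RETRIES_PER_HOP):
            reply = probe(dest, ttl)
            if reply is not None:
                break
        if reply is None:
            path.append(None)  # router stayed silent; try the next TTL
            continue
        ip, kind = reply
        path.append(ip)
        if kind == "echo_reply":
            return path, "reached"      # skitter would record the RTT here
        if kind == "unreachable":
            return path, "unreachable"
        if ip in seen:
            return path, "loop"         # duplicate address means a routing loop
        seen.add(ip)
    return path, "max_ttl"


# Simulated network: hop 2 never answers; beyond hop 3 the target replies.
def fake_probe(dest: str, ttl: int) -> Optional[Tuple[str, str]]:
    hops = {1: "10.0.0.1", 2: None, 3: "10.0.2.1"}
    if ttl in hops:
        ip = hops[ttl]
        return None if ip is None else (ip, "time_exceeded")
    return (dest, "echo_reply")


result = trace("192.0.2.7", fake_probe)
looping = trace("192.0.2.8", lambda dest, ttl: ("10.0.0.1", "time_exceeded"))
```

A real prober would also time each echo reply and handle replies arriving out of order; this sketch only captures the stopping rules.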

A primary design goal of skitter was to minimize the load its measurements impose on the network. Skitter uses 52-byte packets and probes at low frequency. The interval between packets to final destinations varies depending on the size of the destination set. For our current set of about 21,000 destinations, skitter can probe each path about 20 times a day, and fewer times during unfavorable network conditions, since packet loss requires repeating probes to intermediate routers (up to three times). A rate below one probe per hour is too low to study intra-day variability of paths, much less performance variations on shorter timescales. Skitter focuses instead on the discovery and depiction of macroscopic Internet topology and the characterization of long-term performance and connectivity trends.
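The figures above imply a probe schedule that is easy to check. The sketch below is back-of-the-envelope arithmetic using the approximate values from the text (21,000 destinations, 20 probes per path per day, 52-byte packets); it is not a description of skitter's actual scheduler.

```python
SECONDS_PER_DAY = 24 * 60 * 60

destinations = 21_000          # approximate size of the destination list
probes_per_path_per_day = 20   # approximate per-path probing rate
packet_bytes = 52              # skitter probe size

traces_per_day = destinations * probes_per_path_per_day
interval_minutes = SECONDS_PER_DAY / probes_per_path_per_day / 60
traces_per_second = traces_per_day / SECONDS_PER_DAY
# Lower bound only: counts one 52-byte packet per trace and ignores the
# per-hop probes and retries that a full trace also sends.
min_bytes_per_second = traces_per_second * packet_bytes

print(f"{traces_per_day} traces/day, one per path every {interval_minutes:.0f} min")
print(f"~{traces_per_second:.1f} traces/s, at least {min_bytes_per_second:.0f} B/s")
```

The 72-minute per-path interval is why per-path sampling falls short of one probe per hour, and roughly 420,000 traces per day sits within the range of daily data set sizes reported in Section 4.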

3.1 Destination selection

Most skitter probe destinations are Web or other content servers. We did not randomly scan the IPv4 address space to gather a destination list, because we wanted our measurements to represent typical Internet traffic patterns. We used a variety of ad hoc, out-of-band mechanisms to gather our target set of destinations, such as Web cache logs and mapping backwards from an IP network to its nameserver and DNS domain, and then to the hostname www in that domain.

We did not explicitly seek permission to probe destinations, since the load presented is trivial (a few ICMP packets a day) even for the most resource-constrained servers. We compiled the original list in June 1998; by October 1999, 8,000 of the original destinations were no longer reachable. Although we have not done an in-depth analysis of this erosion, we suspect that it derives largely from IP renumbering or the installation of firewalls rather than from genuine disappearance of infrastructure (e.g., companies going out of business). In addition to periodically removing unreachable destinations, we immediately delete any site that asks to be deleted from the database.

For the present study on Asia-Pacific connectivity, we chose a subset of skitter destinations that were in member countries of the Asia-Pacific Economic Cooperation forum (APEC):

3.2 Determining geographic location of hosts

Determining the geographic location of a host from its IP address is currently non-trivial and necessarily imprecise. Some host names do indicate geographic location, but host names are not a reliable source of location information.

To mitigate this problem, CAIDA has developed NetGeo (www.caida.org/Tools/NetGeo/), a tool for establishing correspondence between IP addresses, domain names, autonomous systems, and geographical locations [10]. NetGeo leverages several whois databases and other heuristics. Relying on whois information for a domain has the disadvantage of mapping all hosts in the domain to the whois-registered headquarters. Although sufficient for single-site organizations, this approach becomes a significant source of error for many ISPs with equipment deployed all over the world. Because every currently known geographical mapping of IP addresses is inaccurate to some degree, any study of the physical characteristics of the global Internet is challenging. As NetGeo evolves, it will provide more precise geographical and political mappings for IP addresses. In the meantime, NetGeo mappings are quite useful for this study, since they offer reasonably accurate administrative mappings, i.e., which countries are administratively responsible for a path regardless of where the hops in the path physically reside.

3.3 Augmenting destination list

With NetGeo, we determined that destinations in the United States heavily dominated the original probe database. We used a few techniques to try to get at least 50 reachable IP destination addresses per country in our region of interest. First, we searched the APNIC database to determine blocks of IP addresses assigned to countries on our list. We then used nslookup and traceroute to find active reachable servers in those blocks. Our Asian collaborators provided us with some addresses of interest to them. We also used Web searches to get destinations representing major government, educational, or mass-media Web servers in the Asia-Pacific region.
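Matching candidate addresses against country-assigned blocks, the first step above, might look like the following sketch using Python's standard `ipaddress` module. The allocation table is hypothetical: it uses reserved documentation prefixes (RFC 5737) in place of real APNIC records, and the country assignments are invented for illustration.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical allocation table: reserved documentation prefixes (RFC 5737),
# NOT real APNIC assignments; the country codes are illustrative only.
ALLOCATIONS: Dict[str, List[str]] = {
    "NZ": ["192.0.2.0/24"],
    "TW": ["198.51.100.0/25"],
    "KR": ["198.51.100.128/25", "203.0.113.0/24"],
}

NETWORKS = [
    (ipaddress.ip_network(prefix), country)
    for country, prefixes in ALLOCATIONS.items()
    for prefix in prefixes
]


def country_of(addr: str) -> Optional[str]:
    """Return the country whose allocated block contains addr, if any."""
    ip = ipaddress.ip_address(addr)
    # Prefer the most specific (longest) matching prefix.
    matches = [(net.prefixlen, country) for net, country in NETWORKS if ip in net]
    return max(matches)[1] if matches else None


candidates = ["192.0.2.10", "198.51.100.200", "203.0.113.5", "10.1.2.3"]
by_country = {addr: country_of(addr) for addr in candidates}
```

Addresses that fall in a block of interest would then be checked for reachability (the text uses nslookup and traceroute for that step).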

Figure 1 shows the composition of the subset of destination hosts used for this study, selected from the full database with the following restrictions.

  1. Adjacent or nearby countries with fewer than 50 destinations each were aggregated. Thus we have three groups:
  2. Of 197 Russian destinations in our database, most are in Moscow and Central European provinces. Only four hosts are near the Pacific coast, too small a sample to be meaningful. We excluded Russian destinations from this study.
  3. For Canada, we included only destinations in the western, Pacific-facing part of the country: those mapped to British Columbia (including Vancouver and Lulu Island) and to Yukon.
  4. We applied similar criteria to U.S. destinations, limiting ourselves to hosts in Alaska, California, Hawaii, Oregon, and Washington. The resulting subset was still too large relative to the other countries in our list (more than 2,500 destinations in California alone!). To avoid bias, we capped the number of U.S. destinations at 500. We retained all hosts in Alaska (22) and Hawaii (38), and selected randomly from our database 95 hosts in Oregon, 95 in Washington, and 250 in California.
  5. We did not find any destinations in Malaysia.
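The capped U.S. sample in item 4 amounts to stratified sampling with fixed per-state quotas. A minimal sketch, assuming a hypothetical pool of host names per state: the quotas and the Alaska and Hawaii counts come from the text, while the Oregon, Washington, and California pool sizes are invented.

```python
import random
from typing import Dict, List, Optional

# Per-state quotas from the text; None means "keep every host".
QUOTAS: Dict[str, Optional[int]] = {
    "Alaska": None, "Hawaii": None,
    "Oregon": 95, "Washington": 95, "California": 250,
}


def capped_sample(pool: Dict[str, List[str]],
                  quotas: Dict[str, Optional[int]],
                  seed: int = 0) -> Dict[str, List[str]]:
    """Keep all hosts where the quota is None, else draw a random subset."""
    rng = random.Random(seed)  # seeded so the selection is reproducible
    sample: Dict[str, List[str]] = {}
    for state, hosts in pool.items():
        quota = quotas[state]
        if quota is None or quota >= len(hosts):
            sample[state] = list(hosts)
        else:
            sample[state] = rng.sample(hosts, quota)
    return sample


# Hypothetical pool of destination names per state.
sizes = {"Alaska": 22, "Hawaii": 38, "Oregon": 300, "Washington": 400, "California": 2600}
pool = {s: [f"{s.lower()}-host{i}" for i in range(n)] for s, n in sizes.items()}

us_sample = capped_sample(pool, QUOTAS)
total = sum(len(hosts) for hosts in us_sample.values())
```

Seeding the generator makes a given selection repeatable, which matters if the same destination subset must be reused across several days of analysis.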


Figure 1. The sample of hosts in Pacific Rim countries.

The nine skitter source hosts used in this study are in New Zealand, Singapore, Tokyo, Korea, Canada, London, and the United States (three hosts: Washington D.C., San Diego and San Jose, California). We used sources on each side of the Pacific to get a bi-directional view of Pacific Rim connectivity. Each skitter host continuously queries the full database of about 21,000 destinations and archives its data daily to a central storage machine.

4. Analysis of data

A typical data set obtained from each skitter host per day contains from 300,000 to 500,000 traces reaching about 19,000 destinations in 24 hours. Unfortunately, our database continues to erode: destinations become unreachable at the rate of about 30 per day. We plan to assess this phenomenon more carefully and derive automated techniques for keeping the list stably populated.

There are, on average, 16 to 26 traces per destination per day. Each trace consists of an RTT to the ultimate destination, as well as the addresses of intermediate routers that responded. In the future, we plan to store the RTTs obtained at each hop as well. Summaries are prepared for each individual skitter host and posted daily at http://www.caida.org/Tools/Skitter/Summary. They include graphs of the hop count distribution, the RTT distribution, RTT versus longitude of destinations, and other path information at AS and country granularity. We use the subset of about 2000 Asia-Pacific destinations described in Section 3.3 for the analyses presented in this paper.

We consider two granularities of paths in our analysis: IP path and AS path. An IP path is the sequence of IP addresses in a trace; an AS path is the sequence of Autonomous Systems traversed. IP paths will always exhibit at least as much variability as AS paths, often much more: load-sharing designs can yield multiple paths through the same AS, which changes the IP path but not the AS path between a given source-destination pair. The IP paths seen by the skitter monitoring hosts located in Korea and New Zealand change more than paths from any other skitter source, and the unstable links in these paths are closer to the source than to the destination. Between 60 and 70 percent of the IP paths from the other seven skitter monitors are stable throughout the day; the figure is closer to 40 percent for the Korea and New Zealand boxes. In contrast, about 90 percent of the destinations from each of the nine skitter hosts have a single stable AS path throughout the day.
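The difference between IP-path and AS-path stability can be illustrated with a small sketch. It assumes a hypothetical prefix-to-AS table (documentation prefixes and private-use AS numbers standing in for a real BGP table) and treats a destination as stable if all of its traces in the period produced the same path. Real IP-to-AS mapping has complications this sketch ignores, such as exchange-point addresses and unannounced prefixes.

```python
import ipaddress
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical prefix-to-origin-AS table standing in for a BGP routing table;
# documentation prefixes and private-use AS numbers only.
PREFIX_TO_AS = {
    "192.0.2.0/25": 64500,
    "192.0.2.128/25": 64500,
    "198.51.100.0/24": 64501,
    "203.0.113.0/24": 64502,
}
TABLE = [(ipaddress.ip_network(p), asn) for p, asn in PREFIX_TO_AS.items()]


def origin_as(addr: str) -> Optional[int]:
    """Longest-prefix match of addr against TABLE."""
    ip = ipaddress.ip_address(addr)
    matches = [(net.prefixlen, asn) for net, asn in TABLE if ip in net]
    return max(matches)[1] if matches else None


def as_path(ip_path: Sequence[Optional[str]]) -> Tuple[int, ...]:
    """Abstract an IP path: skip silent/unmapped hops, collapse repeated ASes."""
    path: List[int] = []
    for hop in ip_path:
        asn = origin_as(hop) if hop is not None else None
        if asn is not None and (not path or path[-1] != asn):
            path.append(asn)
    return tuple(path)


def stable_fraction(paths_by_dest: Dict[str, List[tuple]]) -> float:
    """Share of destinations that showed exactly one distinct path."""
    stable = sum(1 for paths in paths_by_dest.values() if len(set(paths)) == 1)
    return stable / len(paths_by_dest)


# Two traces per destination over one (hypothetical) day.
ip_paths = {
    "A": [("192.0.2.1", "198.51.100.1", "203.0.113.9"),      # load sharing:
          ("192.0.2.130", "198.51.100.2", "203.0.113.9")],   # same ASes
    "B": [("192.0.2.1", "203.0.113.20")] * 2,
    "C": [("192.0.2.1", "198.51.100.1", "203.0.113.30"),
          ("192.0.2.1", "203.0.113.30")],                    # AS change
}
as_paths = {d: [as_path(p) for p in traces] for d, traces in ip_paths.items()}

ip_stable = stable_fraction(ip_paths)   # only B is stable
as_stable = stable_fraction(as_paths)   # A and B are stable
```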

4.1. Analysis of hop counts and RTTs

We consider two quantitative characteristics of a source-destination pair: the IP path hop count and the RTT. While hop count seems like a natural connectivity metric, it often has little correlation to the underlying transit infrastructure. Figure 2 (a, b, c) illustrates this point, depicting the average hop count to a destination versus the physical distance to that destination.
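A check of the kind Figure 2a supports needs two pieces: a great-circle distance between source and destination, and a correlation coefficient between distance and hop count. The sketch below uses the haversine formula on a spherical Earth (mean radius 6371 km) and a plain Pearson coefficient; the coordinates are approximate city centers, and the hop counts are hypothetical, not skitter measurements.

```python
import math
from typing import Sequence

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth approximation


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Approximate coordinates; about 5,900 km, the figure quoted in Section 4.2.
london_dc_km = great_circle_km(51.51, -0.13, 38.91, -77.04)

# Hypothetical (distance, average hop count) pairs for four destinations.
distances = [500.0, 2000.0, 8000.0, 10000.0]
hops = [18.0, 12.0, 15.0, 14.0]
r = pearson(distances, hops)  # weak, slightly negative correlation here
```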


Figure 2a. Average hop count as a function of geographic distance from the San Jose skitter host to Pacific Rim destinations. Typical of skitter sources, there is little correlation between the distributions.


Figure 2b. The distribution of hop counts for the San Jose and San Diego skitter hosts.

Although the two U.S. sources are relatively close (both in California), they have drastically different hop counts to the same set of destinations. The San Diego source has universally higher hop counts, mostly because of its topological location behind several routers inside SDSC. Figure 2b shows the complete hop count distributions for the two sources: the San Diego distribution is translated along the x-axis, as expected from its location, but it also has a somewhat different shape.


Figure 2c. The average hop counts from the San Jose and San Diego skitter hosts to Pacific Rim countries.

More significant is the high variance in the relative differences in hop counts by geographic category. The location of the source (specifically, how many hops separate it from a core router) will shift the distribution by a roughly constant amount while retaining its shape. If that were the only factor, the two sets of bars in Figure 2c would exhibit the same relative heights. In fact, the two sets of bars do not move in tandem at all, so the differences derive from more than just the location of the source.
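The argument can be made concrete: if source location only added a constant number of hops, the per-category differences between the two sources would all be equal, so their spread would be zero. A minimal sketch, using hypothetical per-category averages rather than values read from Figure 2c:

```python
from statistics import pstdev
from typing import Dict


def offset_spread(avg_a: Dict[str, float], avg_b: Dict[str, float]) -> float:
    """Population std. dev. of per-category differences avg_a - avg_b.

    Zero means source B differs from source A by a pure constant hop offset.
    """
    shared = sorted(set(avg_a) & set(avg_b))
    return pstdev([avg_a[c] - avg_b[c] for c in shared])


# Hypothetical average hop counts per destination category.
source_a = {"JP": 17.0, "KR": 19.0, "NZ": 16.0}
pure_shift = {c: v - 4.0 for c, v in source_a.items()}   # constant offset
source_b = {"JP": 13.0, "KR": 12.0, "NZ": 15.0}          # uneven differences

spread_shift = offset_spread(source_a, pure_shift)
spread_b = offset_spread(source_a, source_b)
```

A spread well above zero, as for `source_b`, is the numerical counterpart of bars that do not move in tandem.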

We conclude that hop count is not a representative metric for expressing Internet connectivity geographically.

However, our data do suggest a correlation between geographic distance and the minimum obtainable RTT between a source-destination pair. Figures 3a and 3b depict this relationship, bounded below by the speed of light in the physical medium carrying the signal, which sets a lower bound on the RTT for a given distance. Variable delays in routers at each hop (due to forwarding lookups, queuing, and other processing) result in heavy-tailed RTT distributions, but minimum (and to some extent median) RTTs are still correlated with the physical distance between two hosts (Figure 3a). The upper tail (90th percentile) loses this correlation, since delays much larger than the median transmission time are due to unpredictable network anomalies.


Figure 3a. Median RTT from the San Jose skitter host to Asia-Pacific hosts. The line denotes the minimum "speed of light" cone.

The symbol and color used for points in Figure 3a represent the country where the destination host was located. A few points that fall below the speed-of-light line do not represent violations of causality, but rather inaccurate geographical mappings of the corresponding IP addresses (cf. Section 3.2). Those hosts are in reality simply closer to the skitter host than our geographical database placed them.
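Flagging such points is a simple comparison against the propagation bound, RTT_min = 2d/v. The sketch below assumes v = 230 km/ms (the fiber propagation speed used in Section 4.2) and uses hypothetical (destination, distance, median RTT) samples; any sample below the bound points to a geolocation error rather than to anything the network did.

```python
from typing import List, Sequence, Tuple

FIBER_KM_PER_MS = 230.0  # assumed propagation speed in fiber


def min_rtt_ms(distance_km: float, speed_km_per_ms: float = FIBER_KM_PER_MS) -> float:
    """Round-trip propagation delay over distance_km at the given speed."""
    return 2.0 * distance_km / speed_km_per_ms


def below_light_cone(samples: Sequence[Tuple[str, float, float]]) -> List[str]:
    """Names of samples whose observed RTT is below the physical lower bound."""
    return [name for name, dist_km, rtt_ms in samples if rtt_ms < min_rtt_ms(dist_km)]


# Hypothetical (destination, mapped distance in km, median RTT in ms).
samples = [
    ("host-a", 1150.0, 15.0),   # bound 10 ms: plausible
    ("host-b", 9200.0, 60.0),   # bound 80 ms: impossible, so the location is wrong
    ("host-c", 4600.0, 45.0),   # bound 40 ms: plausible
]
suspect = below_light_cone(samples)
```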

Figure 3b mirrors Figure 2c but depicts median RTT instead of average hop count.


Figure 3b. Median RTT from the San Jose and San Diego skitter hosts to destinations in the Pacific Rim countries.

Although the San Diego and San Jose skitter sources have rather different connectivity at the logical level (as demonstrated by their hop counts in Figure 2b), the median RTTs from these hosts to other parts of the world are quite similar.

4.2 Analysis of AS's and routing

We map IP addresses to individual origin AS's using Border Gateway Protocol (BGP) routing tables collected by the University of Oregon's Route Views project [11]. By abstracting our IP paths into AS paths, we can study cross-sections of AS topology.

Figure 4 shows the first three hops of three "AS dispersion" graphs. For each of three skitter source hosts (in D.C., San Jose, and London), these graphs show the first three AS's seen in all paths from the source. The height of a bar represents the percentage of traces that passed through a particular AS at a given hop. Gray areas represent dispersion into too many AS's to draw clearly in the figure: since the data are sorted from the bottom by the percentage of paths through each AS, once the percentage becomes too small to see, the region is gray to the top of the display.

Black indicates that the trace had already ended at the previous hop (so black at the third hop means the AS path was only two AS hops long). For example, looking at the traces from the Washington, D.C., skitter host, all start in the same first hop (AboveNet's AS); then about 10 percent of the traces go to AS 701, and from 701 about 2 percent go on to AS 702 and another 2 percent to AS 9010.
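The per-hop percentages behind an AS dispersion graph can be computed directly from a list of AS paths. A minimal sketch, assuming the AS paths are already available; the paths below are hypothetical (private AS numbers mixed with AS numbers mentioned above), not skitter data.

```python
from collections import Counter
from typing import Dict, List, Sequence, Union

ENDED = "ended"  # trace finished at an earlier hop (drawn black in Figure 4)
Key = Union[int, str]


def dispersion(as_paths: Sequence[Sequence[int]], hops: int = 3) -> List[Dict[Key, float]]:
    """For AS hop positions 1..hops, percent of all traces in each AS (or ENDED)."""
    total = len(as_paths)
    columns: List[Dict[Key, float]] = []
    for k in range(hops):
        counts = Counter(p[k] if k < len(p) else ENDED for p in as_paths)
        # most_common() orders entries by share, like the stacking in the figure
        columns.append({key: 100.0 * n / total for key, n in counts.most_common()})
    return columns


paths = [
    (64500, 701, 702),
    (64500, 701),
    (64500, 64510, 9010),
    (64500, 64510, 64520),
]
cols = dispersion(paths)
```

Each column sums to 100 percent because traces that have already ended are kept in the denominator and reported under `ENDED`.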


Figure 4. Distribution of AS paths from skitter sources deployed by a backbone ISP.

Note that although these source hosts are across the globe from each other, the AS paths in Figure 4 show exactly the same AS-based routing to all destinations regardless of which router in the network sees the packet (all three panels in Figure 4 look practically identical). These results clearly depict this ISP's policy of maintaining a consistent picture of the outside world across its entire backbone.

However, although the next hop AS (usually a peer ISP) is consistent from anywhere within this ISP's network, the packet may exit that network at any of several places where it peers with that next hop AS. The subsequent ISP carries the packet from there. Thus, although the routing views are consistent from each of these three locations, we do not necessarily expect the RTT distribution from each source to remain the same.


Figure 5. Median RTT from the London and Washington, D.C., skitter hosts to destinations in Pacific Rim countries.

Figure 5 illustrates this fact and also suggests a strong correlation between the RTTs observed from the Washington, D.C., and London sites. Indeed, RTTs from London exceed those from Washington by a nearly constant 70 ms. Many traces originating in London are likely routed to Washington first, and from there to the next hop AS. The distance between London and Washington is about 5,900 km, and the speed of light in fiber is about 230 km/ms, which yields a minimum round-trip delay of about 50 ms, reasonably consistent with our RTT measurements.
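The lower bound quoted above is the round-trip propagation delay over the great-circle distance d at the assumed fiber speed v:

```latex
\[
  \mathrm{RTT}_{\min} = \frac{2d}{v}
  = \frac{2 \times 5900\ \mathrm{km}}{230\ \mathrm{km/ms}}
  \approx 51\ \mathrm{ms}
\]
```

Real cables do not follow the great circle, and routers add processing and queuing delay, so observed differences are expected to exceed this bound, as the roughly 70 ms offset does.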

4.3. Transit of the Internet traffic

Which AS's and countries are the most prevalent transit carriers for Internet hosts in the Asia-Pacific region? To answer this question, we examined which AS's and countries are crossed by skitter paths. An AS (country) provides transit if it appears in the path but is neither the source nor the destination AS (country). We grouped traces by the localities of their destinations (as described in Section 3.3) and determined the percentage of traces to each country or group that used a given AS or country for transit. The results for AS's are shown in Figure 6 and for countries in Table 1.

Note that these percentages do not sum to 100 percent since more than one country can provide transit for a single path. For example, if a trace goes from Japan to New Zealand as JP -> US -> CA -> NZ, both the US and CA will count this trace in their transit percentage for NZ. In most cases, traces cross through multiple intermediate AS's, but usually through only one third party country.
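The transit definition, and the reason the percentages can sum to more than 100 percent, can be seen in a small sketch. It assumes each trace has already been reduced to a sequence of countries (or AS's) from source to destination; the paths are hypothetical and include the JP -> US -> CA -> NZ example above. A transit entity is counted at most once per path.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Sequence


def transit_share(paths: Sequence[Sequence[str]],
                  group_of: Callable[[str], str] = lambda c: c) -> Dict[str, Dict[str, float]]:
    """share[group][entity] = percent of paths to `group` that `entity` transits.

    An entity provides transit if it appears in a path but is neither the
    path's source nor its destination.
    """
    totals: Counter = Counter()
    hits: Dict[str, Counter] = defaultdict(Counter)
    for path in paths:
        src, dst = path[0], path[-1]
        group = group_of(dst)
        totals[group] += 1
        for entity in set(path[1:-1]) - {src, dst}:
            hits[group][entity] += 1
    return {g: {e: 100.0 * n / totals[g] for e, n in hits[g].items()} for g in totals}


paths = [
    ("JP", "US", "CA", "NZ"),
    ("SG", "US", "NZ"),
    ("KR", "AU", "NZ"),
    ("JP", "US", "AU"),
]
share = transit_share(paths)
```

For NZ, US transit covers two of three paths while CA and AU each cover one, so the column sums to about 133 percent. The optional `group_of` argument is a hypothetical hook for aggregating destination countries into regional groups.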


Figure 6. Percentages of traces handled by the top four transit ASes.

Four AS's appear most frequently as providing transit in all the skitter paths to the Pacific Rim. Figure 6 depicts the percentage of paths transiting each of these four ISPs.

AS's were included in Figure 6 if they played a transit role for more than 5 percent of paths for at least three skitter sources. For this analysis we considered traces from the San Jose skitter source, but not from the ones in London and Washington, D.C.; using the data from all three hosts belonging to the same ISP would over-represent their (globally identical) policies in our sample. Also, AS's are not included in Figure 6 if they dominate only one or two sources. A good example is NAP.NET (5646), which provides transit for 72 percent and 82 percent of all traces from Japan and Singapore, respectively, but for less than 1 percent of traces from all other sources. NAP.NET is the first U.S. hop that the Japan and Singapore skitter hosts use for transit to the U.S. Its dominance is a local phenomenon and was therefore excluded from consideration of the region as a whole.
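The inclusion rule for Figure 6 is a simple filter over per-source transit percentages. A minimal sketch, assuming the input already excludes the duplicate-policy sources (London and Washington, D.C.) and using illustrative numbers: only the NAP.NET values of 72 and 82 percent come from the text; AS 64600, AS 64601, and all other percentages are hypothetical.

```python
from typing import Dict, List


def frequent_transit_ases(share_by_source: Dict[str, Dict[int, float]],
                          threshold_pct: float = 5.0,
                          min_sources: int = 3) -> List[int]:
    """ASes carrying more than threshold_pct of paths for at least min_sources sources."""
    qualifying: Dict[int, int] = {}
    for shares in share_by_source.values():
        for asn, pct in shares.items():
            if pct > threshold_pct:
                qualifying[asn] = qualifying.get(asn, 0) + 1
    return sorted(asn for asn, n in qualifying.items() if n >= min_sources)


share_by_source = {
    "Tokyo":     {5646: 72.0, 64600: 30.0, 64601: 12.0},
    "Singapore": {5646: 82.0, 64600: 20.0, 64601: 8.0},
    "San Jose":  {5646: 0.5,  64600: 25.0, 64601: 6.0},
    "Korea":     {5646: 0.2,  64600: 10.0, 64601: 3.0},
}
selected = frequent_transit_ases(share_by_source)
```

AS 5646 clears the threshold for only two sources and is dropped, mirroring the NAP.NET case described above.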

Altogether, the four major ISPs appear in 52 percent of all traces. Interestingly, only one of them is not registered in the U.S.: TELEGLOBE (6453), which is registered in Canada.

Table 1 reflects transit relationships at the country level. A numerical value in the table indicates that the country in the first column provided transit for the specified percentage of the paths to each of the other countries. For example, the value 46.1 in the row labeled AU and the column labeled NZ means that Australia provided transit for 46.1 percent of the paths from skitter hosts to destinations in New Zealand. Again, we did not use the data from the Washington, D.C., and London skitter hosts in this analysis, in order to keep our sample statistically diverse.

Table 1. Transit Countries
all AU CA CH JP KR MX NZ SEA SWA TW US
US 71.5 77.8 82.0 90.3 49.5 61.6 100.0 79.6 63.0 97.8 83.5
CA 13.3 8.3 4.9 37.5 2.1 27.5 22.3 1.3 0.2
AU 2.8 18.4 46.1 1.6 0.4
JP 1.2 1.4 7.4 10.5 12.0 0.3
NZ 0.9 3.7
EUR 0.7 2.1 1.7 4.2 27.0
UK 0.7 0.0 0.0 0.1 0.0 5.8 21.1 0.2
SEA 0.3 0.7 5.6
AR 0.1 5.2
AE 0.1 1.9
CH 0.1 2.8
MM 0.1 1.6

An empty cell means that there are no traces in that category, while 0.0 means less than 0.1 percent. In the row labels, EUR stands for all European countries except the United Kingdom, which is shown separately; AR is Argentina, AE is the United Arab Emirates, and MM is Myanmar.

U.S. networks do seem to dominate global Internet topology -- they provide transit for 71.4 percent of the total skitter paths that neither originate nor end in the United States. U.S. networks appear to be especially significant for other countries in the Americas: all traffic to Mexico and 97.8 percent of traffic to Peru and Chile (SWA) crosses the United States on its way. Our sample also shows a large transit role played by U.S. networks for traffic to China-Hong Kong (90.3 percent), Taiwan (83.5 percent) and Oceania (77.8 percent of traffic to Australia and 79.6 percent of traffic to New Zealand).

The second most prevalent transit country in our sample is Canada, but its share of transit traffic is much smaller than that of the United States.

5. Conclusions

We present results of skitter measurements to destinations in the Asia-Pacific region. Our data show the relationship between performance and geographical location of the destinations and provide insights into the connectivity patterns in this area of the world. We used the data from nine skitter hosts to obtain a bi-directional view of connectivity. One must be careful, however, to distinguish between results that are source-biased and those that are representative of genuine regional trends.

Our main conclusions are:

  1. There is little correlation between RTT and IP hop count for most Internet connectivity.
  2. A relatively small number of ISPs play an important role in the topology of Internet connections in the Asia-Pacific region. However, none of these ISPs alone dominates the whole network for any country considered.
  3. The United States is the major Internet transit intermediary for the rest of the world: 71 percent of traces that neither start nor end in the U.S. still pass through it. In most connections between different countries, the U.S. is the only third party country that also appears in the path.

6. Acknowledgments

We are grateful to CAIDA and sponsoring ISPs for help installing and maintaining skitter boxes. We thank K. Claffy, A. Broido and T. Monk for their comments on the manuscript and their suggestions for data analysis. This work was supported by DARPA cooperative agreement N66001-98-2-8922.

7. References

[1] Govindan R., Tangmunarunkit H., Heuristics for Internet Map Discovery, ftp://ftp.usc.edu/pub/csinfo/tech-reports/papers/99-717.ps.Z

[2] Govindan R., Reddy A., An analysis of Internet Inter-Domain Topology and Route Stability. In Proc. IEEE INFOCOM '97, Kobe, Japan, Apr 1997.

[3] Braun H.-W., Claffy K., Global ISP Interconnectivity by AS number, http://moat.nlanr.net/AS/

[4] Cheswick W., Burch H., Internet Mapping Project, http://cm.bell-labs.com/who/ches/map/index.html

[5] Paxson V., End-to-End Routing Behavior in the Internet, IEEE/ACM Transactions on Networking 5(5), 601-615, 1997.

[6] Matthews W., Cottrell L., The PingER Project: Active Internet performance Monitoring for the HENP Community, submitted to IEEE Communications Magazine on Network Traffic Measurements and Experiments.

[7] Kalidindi S., Zekauskas M., Surveyor: an infrastructure for Internet performance measurements, INET'99 http://telesto.advanced.org/~kalidindi/papers/INET/inet99.html

[8] McGregor A.J., Braun H.-W., Brown J.A., The NLANR Network Analysis Infrastructure, http://moat.nlanr.net/Papers/TonyM-IEEE-comms.ps

[10] Moore D., Periakaruppan R., Donohoe J., Where in the World is netgeo.caida.org?, http://www.caida.org/Papers/inet_netgeo.html, INET 2000 proceedings (this volume).

[11] Meyer D., University of Oregon Route Views project, http://www.antc.uoregon.edu/route-views/