IPv6 New Feature: Hop-by-Hop Based Real-Time Efficient Connection/Link Status Investigation Mechanism

Hiroshi KITAMURA <kitamura@ccm.cl.nec.co.jp>
NEC Corporation
Japan

1 Introduction

In the current best-effort networks, communication quality degradations are caused by traffic congestion, inappropriate network designs, or the like. There is a need to avoid encountering degradations and to have a mechanism to locate bottlenecks and problems in the communication path efficiently.

In order to meet the needs, a mechanism to collect the status information of the nodes along a path is needed. Protocol features for hop-by-hop based data acquisition could be used to achieve the mechanism; however, the current Internet Protocol Version 4 (IPv4) specification does not have such functions.

Since IPv4 does not provide features for hop-by-hop status investigation, current tools perform hop-by-hop investigations by using existing protocol features for purposes other than their intended ones (e.g., the "traceroute" tool relies on TTL-expiration notification messages). As a result, these tools have two restrictions. One is that they issue many probe/reply packets for an investigation; the other is that they can investigate only the "outgoing" path. Since users have a greater need to investigate the "incoming" (download) path, the latter is a critical problem.

In order to solve these problems completely and provide an efficient generic framework to collect status information, a new hop-by-hop based real-time status investigation mechanism that is called "Connection/Link Status Investigation (CSI)" is proposed. It comprises Internet Protocol Version 6 (IPv6) protocol extensions and a user tool called "tracestatus." Its design, implementation, and evaluation results are discussed in the following sections.

2 Problems with current status investigation tools (ping and traceroute)

Currently, "ping" and "traceroute" can be considered the representative real-time status investigation tools of IPv4 networks.

The "ping" tool enables End-to-End reachability status to be investigated; its mechanism is based on Internet Control Message Protocol (ICMP) Echo Request/Reply messages. When the source sends an Echo Request message to the destination, the destination sends back an Echo Reply message to the source. In this way, the communication reachability is investigated. The "ping" tool is a good one; it is simple and can collect elementary End-to-End status information. However, it cannot be used for hop-by-hop based status investigation, and it cannot locate bottlenecks and problems in a communication path.

The "traceroute" [5] tool enables a hop-by-hop based investigation to be performed; it finds IP address information of nodes along the communication path. Fig. 1 shows the "traceroute" mechanism.


Figure 1: "traceroute" mechanism

The "traceroute" mechanism is based on User Datagram Protocol (UDP) probe packets and ICMP reply messages. At first, the source prepares UDP probe packets whose TTL field values in the IPv4 header are controlled and sends them to the destination. The TTL values of probes are started from 1 and are increased one by one. If the TTL value is smaller than the number of hops to the destination, it expires on an intermediate node on the way to the destination and the probe does not reach the destination. From such a node, an ICMP message is issued to the sender of the probe message to notify the sender that the TTL has expired, and the ICMP message becomes a reply to the probe. The source IP addresses of the replied ICMP messages provide IP address information of nodes along the communication path. By using this mechanism, reachability to each replied intermediate node is verified.
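The TTL-stepping procedure described above can be illustrated with a small simulation. The following Python sketch sends no packets; the function name `simulate_traceroute` and the node labels are hypothetical, and it assumes a fixed path and a single probe per TTL value.

```python
def simulate_traceroute(path):
    """Simulate TTL stepping over a fixed outgoing path (no packets are sent).

    `path` lists the nodes after the source, ending with the destination.
    Returns the nodes that replied and the total packet count.
    """
    discovered = []
    packets = 0
    for ttl in range(1, len(path) + 1):
        packets += 1                  # one UDP probe sent with this TTL
        replier = path[ttl - 1]       # node where the TTL runs out (or the destination)
        packets += 1                  # one ICMP message returned to the source
        discovered.append(replier)
    return discovered, packets


nodes, packets = simulate_traceroute(["R1", "R2", "R3", "D"])
print(nodes, packets)
```

Under these assumptions a path of n hops costs 2n packets, and only nodes on the outgoing path are ever discovered.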

The "pathchar" [6] tool enables the bandwidth of each link on a communication path to be inferred; it is an advanced form of "traceroute," whose basic mechanism is the same as that of the latter. By producing a large quantity of probes and replies, the queue status and the consumed bandwidth of each intermediate link are inferred.

These tools are effective; however, their mechanisms are not optimized because their investigation is achieved by applying an existing protocol feature (TTL-expiration notification messages) that is not designed for an investigation purpose. Therefore, they have two major restrictions that are observed as the following problems.

One is that they can investigate only the "outgoing" path, i.e., the path from the source to the destination. As Fig. 1 shows, there are "outgoing" and "incoming" paths between the source and the destination. In large-scale networks, the nodes and links that packets pass through on the outgoing path generally differ from those on the incoming path. Since most users use networks for Web browsing or data downloading, they want to know the status of the "incoming" path. However, the "traceroute" and "pathchar" tools can investigate only the "outgoing" path, which does not match the needs of most users. With the basic "traceroute" mechanism alone, it is theoretically impossible to investigate the "incoming" path.

The other problem is that they issue too many probe/reply packets to make investigations. This is inefficient because the packets may cause traffic congestion and it takes a long time to reach a result. From this viewpoint, "pathchar" performance is worse than that of "traceroute."

The use of multiple probes causes another type of problem. It may degrade the reliability of acquired information, because the second probe and those that follow may not pass through the same path as the first probe did in the dynamic routing network environment.

3 IPv6 new feature CSI is proposed as a solution

The main reason "traceroute" cannot investigate the "incoming" path and uses an inefficient method (issuing multiple packets) is that the idea of incorporating a hop-by-hop based real-time status investigation mechanism into the IPv4 specification did not exist when IPv4 was designed, and the "traceroute" mechanism was designed without introducing any new IPv4 specifications.

Since the design of the IPv6 [1] specification has not been completely finished yet, it is possible to incorporate an efficient hop-by-hop-based real-time status investigation mechanism into the IPv6 specification.

The CSI mechanism [7] is proposed as a solution to the problems of the current status investigation mechanisms, and it is designed as a new feature of IPv6.

3.1 Requirements for the CSI mechanism

In order to create the CSI mechanism as an efficient hop-by-hop-based investigation mechanism, the following requirements must be satisfied.

  1. Investigate both "outgoing" and "incoming" paths.
  2. Minimize the number of issued packets for investigation.
  3. Avoid the problem of investigation path variance caused by dynamic routing.
  4. Be a sufficiently simple mechanism.
  5. Enable CSI messages to pass through CSI feature disabled nodes.
  6. Avoid serious problems occurring when CSI messages are lost.
  7. Be easily expandable to collect various types of data.
  8. Be applicable to networks of any scale and any type.
  9. Be able to run on network environments not having reachability.

4 Design of the CSI mechanism

The CSI mechanism is composed of one new IPv6 [1] Hop-by-Hop option (CSI option) and three new ICMPv6 [2] messages (Status Request/Reply, and Status Report).

4.1 CSI option (IPv6 Hop-by-Hop option)

The mechanism incorporates a new IPv6 Hop-by-Hop option, called the CSI option, to investigate and acquire status information of nodes along the communication path. The two highest-order bits of the Option Type of the CSI option must be 00, so that nodes on which the CSI feature is disabled skip the option instead of discarding the packet. The third-highest-order bit must be set to 1, because the option data changes en route as acquired data is inserted into the pre-allocated data space area in the option. Fig. 2 shows the CSI option format (which includes a collected data example for the Record Route operation).


Figure 2: CSI Option format (Record Route example)
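The Option Type constraints can be checked mechanically. The following Python sketch decodes the three highest-order bits of an IPv6 option type as defined in the IPv6 specification [1]; the helper names and the example value `0x2A` are hypothetical and are not the code point assigned to the CSI option.

```python
# Meaning of the two highest-order bits of an IPv6 Option Type (RFC 2460):
# what a node does when it does not recognize the option.
ACTIONS = {
    0b00: "skip over this option and continue processing",
    0b01: "discard the packet",
    0b10: "discard and send ICMP Parameter Problem",
    0b11: "discard and send ICMP Parameter Problem unless multicast",
}


def decode_option_type(option_type):
    """Return (action for unrecognized option, option data may change en route)."""
    option_type &= 0xFF
    action = (option_type >> 6) & 0b11
    may_change = bool((option_type >> 5) & 0b1)
    return ACTIONS[action], may_change


def satisfies_csi_constraints(option_type):
    """True if the three highest-order bits are 001, as the CSI option requires."""
    return ((option_type & 0xFF) >> 5) == 0b001


EXAMPLE_TYPE = 0x2A  # hypothetical value (binary 001 01010), for illustration only
print(decode_option_type(EXAMPLE_TYPE), satisfies_csi_constraints(EXAMPLE_TYPE))
```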

A packet in which the CSI option is set investigates and acquires status information when it passes through the nodes along the path. This mechanism helps to minimize the traffic and to avoid the varying path problem.

4.2 Status Request and Status Reply (ICMPv6 messages)

In addition to the CSI option, a pair of messages that makes a round-trip probing loop is provided by introducing new ICMPv6 messages (Status Request and Status Reply). The basic behaviors of these messages are similar to those of Echo Request and Echo Reply messages.


Figure 3: Basic CSI mechanism

Fig. 3 shows the basic CSI mechanism. A pair of messages makes a round-trip loop, much like the flight of a boomerang. The Status Request message is transferred from the source (initiator) node to the destination node along the outgoing path, and the Status Reply message is transferred back from the destination node to the source node along the incoming path. The IPv6 Hop-by-Hop CSI option is set in both the Status Request and the Status Reply messages.

These Status Request/Reply messages have two roles. One is to trigger the status information acquisition by the CSI option operation routines on each node along the paths. The other is to carry the acquired data by attaching them to the messages. To ensure that the length of each message packet does not increase as it passes through the nodes, the attaching procedure is done by inserting the acquired data into a pre-allocated data space area in the CSI option of the messages.

Each acquired data item is called a "record." One record basically holds the data of one interface. When both the incoming and outgoing interfaces are investigated, two record spaces are consumed in the data space area per node.

In most cases, only one pair of Status Request/Reply messages can collect all of the status information of nodes along the paths.

4.3 Status Report (ICMPv6 message)

The data-carrying capacity (the size of the pre-allocated data space area) of one pair of Status Request/Reply messages is limited. In cases where the number of nodes along the path is large and the total record length exceeds the data-carrying capacity, it is impossible to collect status information data with one pair of messages alone.

In order to solve this problem, the mechanism incorporates another new ICMPv6 message (Status Report). When the data space of the pair of the Status Request/Reply probing messages is full, all of the collected data records are transferred to the source (initiator) node by using this Status Report message.

When the issuing condition is satisfied (typically, when the data space has been completely filled with collected data), all of the data collected up to that point is copied to a newly prepared Status Report message, and the message is transmitted to the source node that initiated the Status Request message.

After the Status Report message is issued, the data space area of the Status Request/Reply probing messages is reset (emptied) and the probing messages can continue the data collection further.


Figure 4: Status Report message example

Fig. 4 shows an example that explains how Status Report messages are issued. Nodes 2 and 4 satisfy the issuing condition.
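The fill, report, and reset cycle can be sketched as a small simulation. In the following Python sketch, the function name `collect_records`, the node labels, the one-record-per-node simplification, and the capacity of two records are all assumptions chosen for illustration.

```python
def collect_records(path, capacity):
    """Simulate record collection with a pre-allocated data space.

    `path` lists the nodes passed by the probing messages, in order.
    `capacity` is the number of records the data space can hold.
    Returns (status_reports, records_left_in_probe).
    """
    space = []      # pre-allocated data space, modelled as a list
    reports = []    # (issuing node, records copied into the Status Report)
    for node in path:
        space.append(f"record@{node}")      # one record inserted per node (assumption)
        if len(space) == capacity:          # issuing condition: data space is full
            reports.append((node, list(space)))
            space.clear()                   # reset so collection can continue
    return reports, space


reports, remaining = collect_records(["N1", "N2", "N3", "N4", "N5"], capacity=2)
print(reports, remaining)
```

With a capacity of two records, the second and fourth nodes issue Status Report messages, and the probing message carries only the last record to the end of the path.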

Introducing the Status Report removes any restriction on the number of nodes along the paths. Thus, the CSI mechanism can be applied to network environments of any size and scale.

The Status Report is introduced not only to make the CSI mechanism scalable but also to enable it to locate problems on problematic networks (see Section 4.4).

4.4 Operation Modes

The CSI mechanism must operate on any network environment. Basically, it is designed to operate efficiently on the normal network environments in which the source can receive reply packets from the destination.

In addition, it is required to operate and locate problems on the problematic network environments whose connection reachability has been lost.

In order to enable the mechanism to operate on the latter environments, the concept of an "Operation Mode" is introduced. There are two modes: SPSR and SPMR.

4.4.1 SPSR mode

The SPSR (Single-Probe/Single-Reply) mode is used to support normal network environments in which the source can receive reply packets from the destination. The discussion up to this point has described the SPSR mode, and Fig. 3 showed a typical procedural example of it. The CSI mechanism usually runs in the SPSR mode.

The policy of the SPSR mode is to minimize the number of packets for the investigation. In order to minimize the number of times the Status Report is issued, a maximum size is pre-allocated to the data space area of the Status Request/Reply messages.

In most cases, the initiator node sends one probe packet (Status Request) and receives one reply packet (Status Reply) for the investigation, because the CSI mechanism usually operates on network environments in which the number of nodes along the path is not large and the total size of the collected record data is smaller than that of the pre-allocated data space.

The initiator receives several Status Report messages only in rather rare cases when the number of nodes along the path is large.

4.4.2 SPMR mode

The SPMR (Single-Probe/Multiple-Reply) mode is used to support problematic network environments whose connection reachability has been lost.

In problematic network environments, the source node cannot receive the Status Reply message. The reply message is lost somewhere on the path, and the source cannot obtain the collected status information that is carried by the Status Reply.

The SPMR mode solves this problem, and the location where the problem occurred on the path is found.

In the SPMR mode, all of the nodes along the path must issue Status Report messages when the Status Request/Reply messages pass through them. Fig. 5 shows the procedure of the SPMR mode.


Figure 5: SPMR Mode

By using this mode, the location where the problem occurred on the path can be found. The problem is usually located at the link or node that follows the node that issued the last Status Report message.
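This rule for locating a fault can be sketched in a few lines. The following Python sketch assumes that every node on the probing loop, including the destination, has a unique label and issues a Status Report; the function name `locate_fault` and the labels are hypothetical.

```python
def locate_fault(loop_path, reporters):
    """Bracket the suspected fault from the Status Reports that arrived.

    `loop_path` lists the nodes in probing order (outgoing path, destination,
    incoming path). `reporters` holds the nodes whose Status Report was received.
    Returns (last reporting node, next node), or None if every node reported.
    """
    seen = set(reporters)
    last = None
    for index, node in enumerate(loop_path):
        if node in seen:
            last = index                    # furthest node known to be reached
    if last is None:
        return (None, loop_path[0])         # nothing came back at all
    if last == len(loop_path) - 1:
        return None                         # the whole loop was traversed
    return (loop_path[last], loop_path[last + 1])


loop = ["R1", "R2", "D", "R3", "R4"]
print(locate_fault(loop, ["R1", "R2", "D"]))
```

In this example the reports stop at the destination, so the suspected fault lies on the first link or node of the incoming path.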

From the viewpoint of the number of issued packets, the operation in the SPMR mode is more efficient than the "traceroute" operation (Multiple-Probe/Multiple-Reply), but it is less efficient than the operation in the SPSR mode. Therefore, the SPMR mode should be used only for the problematic network environment investigation.

5 Considerations

5.1 Coexistence with Route option (source routing)

In this section, the relationship between the CSI mechanism (CSI option) and the source routing mechanism (Route option) is described. Basically, the CSI mechanism can coexist with the source routing mechanism, but the following issues must be considered.

It is easy to specify the source-routing path for the "outgoing" path, because packets passing through this path are issued from the source node. On the other hand, it is almost impossible to specify the source-routing path for the "incoming" path, because packets passing through this path are issued from the destination node. There is no convenient way to transfer the source path specification information for the "incoming" path from the source node (user applications) to the destination node.

Even if there were some method of executing such information transfers, another problem would occur. The CSI option operation routines, which process the "incoming" CSI message (Status Reply) in which the Route option is set, cannot issue proper Status Report messages. This is because the return (destination) address for the Status Report message cannot easily be obtained from the Status Reply message that invoked the routines. Fig. 6 shows this situation.


Figure 6: Return Address Problem for Source-Routed Packets

The return (destination) address for the Status Report message must be the address of the initiator node. For an "incoming" path, the return address is taken from the "destination" address of the Status Reply message that invoked the routines. Since the Route option modifies the "destination" address of the message, that address no longer indicates the initiator node, and a problem occurs.

Introducing a special usage of the source routing solves these problems completely. It is possible to specify a source-routing path as a loop path (the source node and the final destination node are the same) by specifying all the nodes on the CSI probing loop path. With this method, nodes on the loop path can be investigated by means of the Status Request message alone. Since the loop path is composed of an "outgoing" path only and the "incoming" path has been eliminated, these problems are solved. Fig. 7 shows this solution.


Figure 7: Solution to Coexist with Route Option
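The loop path described above can be assembled from the addresses of the nodes on the outgoing and incoming paths. The following Python sketch is only an illustration: the function name `build_loop_route` is hypothetical, the addresses come from the IPv6 documentation prefix, and the exact way the list would be placed in a Route option is an assumption.

```python
import ipaddress


def build_loop_route(source, outgoing, destination, incoming):
    """Return the hop list for a loop path that ends back at the source.

    The probe visits the outgoing nodes, the original destination, and the
    incoming nodes, and its final destination is the source itself.
    """
    hops = list(outgoing) + [destination] + list(incoming) + [source]
    # Parsing validates each address and normalizes its text form.
    return [str(ipaddress.IPv6Address(hop)) for hop in hops]


route = build_loop_route(
    source="2001:db8::1",
    outgoing=["2001:db8:a::1"],
    destination="2001:db8:ffff::1",
    incoming=["2001:db8:b::1"],
)
print(route)
```

Because the final hop is the initiator itself, a single Status Request message covers every node on the loop.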

Since the CSI option can coexist with the Route option and it is possible to fix the path of the CSI messages, the CSI mechanism can easily avoid the varying path problem that is caused by the dynamic routing mechanism and can provide reliable information in repeated probing procedures.

In addition, this means that the CSI mechanism can be applied to mechanisms that are based on a source routing mechanism (e.g., mobile IP for IPv6).

5.2 Security and administrative issues

Since one of the features of the CSI mechanism is that it is simple and lightweight like the "ping" and "traceroute" mechanisms, it does not include cumbersome functions (e.g., authentication, authorization, accounting) for obtaining the status information. Since the CSI mechanism is therefore vulnerable from the viewpoint of security, it is advisable to combine it with other mechanisms, such as filtering procedures performed on intermediate nodes. This is an administrative issue, not a problem of the CSI mechanism itself, and thus it must be treated separately.

It is necessary to consider which network environments the CSI mechanism is applied to and which types of status information it collects. When the mechanism is applied to an intranet environment, no major problems will occur in collecting various types of status information, because the communication area is limited. When it is applied to the Internet, however, the situation is different, and an appropriate way to handle this issue is necessary. It is advisable to incorporate an administrative filtering mechanism into the investigated nodes and to allow investigation access only to status information that can be made open to the public.

6 Implementation and evaluation

The CSI mechanism has been implemented as an IPv6 new feature and evaluated, and it has been verified that it operates as effectively as it is designed to (its source code will be made open to the public soon). In this section, implementation issues and evaluation results of the CSI mechanism are described.

6.1 User tool "tracestatus" for the CSI mechanism

The "tracestatus" tool is provided to users to enable use of the CSI mechanism. The relationship between "tracestatus" and Status Request/Reply is similar to that between "ping" and Echo Request/Reply.

The "tracestatus" tool has several operation combinations, which can be specified by using the command line switch options.

In the typical operation of "tracestatus," probe packets are issued periodically, and the network status is investigated continuously. In this case, the initial probe collects static information of nodes along the path, and the following repeated probes collect dynamic information of the nodes. Since the CSI mechanism can coexist with the source routing mechanism (Route option), the source-routing path (the argument of the Route option) of the repeated probes can be set by using the IP address information collected by the initial probe, which makes it possible to fix the communication path under investigation.

6.1.1 Timestamp and real-time consumed bandwidth measurement

By using the CSI mechanism, timestamp and counter information of nodes is collected, and real-time consumed bandwidth of each link can be calculated.

By sending repeated probes periodically with a certain interval, timestamp value T(n) and counter value C(n) of an interface on a certain node are obtained. The following is the formula for calculating the real-time consumed bandwidth of the interface on the node.

Consumed Bandwidth: B(n) = (delta C) / (delta T) = (C(n+1) - C(n)) / (T(n+1) - T(n))

Since both (delta C) and (delta T) are taken from the counter and clock of the same node, they do not depend on network delay or on the clocks of other nodes, and they are accurate. Therefore, B(n) is accurate as well.

To use this formula, it is not necessary to synchronize the timestamp information with any external time reference. It is only necessary that the dynamic range of the timestamp be sufficiently larger than the round-trip time and the intervals between the probes.
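The calculation can be sketched as follows. This Python sketch makes several assumptions that the formula above does not fix: timestamps are in milliseconds and do not wrap within the sampling window, C(n) is an octet counter that wraps at 32 bits, and the result is converted to bits per second. The function name `consumed_bandwidth` is hypothetical.

```python
COUNTER_MODULUS = 2 ** 32   # assumption: 32-bit wrapping octet counter


def consumed_bandwidth(samples):
    """Compute B(n) for each pair of consecutive samples of one interface.

    `samples` is a list of (timestamp_ms, octet_count) pairs, one per probe.
    Returns the consumed bandwidth in bits per second.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta_t = (t1 - t0) / 1000.0              # milliseconds to seconds
        delta_c = (c1 - c0) % COUNTER_MODULUS     # tolerates a single counter wrap
        rates.append(8 * delta_c / delta_t)       # octets to bits
    return rates


samples = [(0, 1_000), (1_000, 126_000), (2_000, 376_000)]
print(consumed_bandwidth(samples))
```

Only differences between values taken on the same node are used, so no clock synchronization between nodes is needed.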

6.2 Characteristics and implementation on nodes (routers)

As previously described, the specification of the CSI mechanism is simple enough to implement on any IPv6 node.

The CSI mechanism design is based on the following policies:

  1. No requests for complex and intelligent operation implementations are made of intermediate nodes (routers), because the simple data acquisition is the main role of the CSI operations on the intermediate nodes.
  2. Intelligent operations are requested only of a utility application on the initiator node, because the collected data is analyzed only by that application.

Implementing the mechanism will not cause performance problems, because the reliability of the collected data does not depend on how long the data acquisition operations take. This means that high-speed processing of the data is not necessary. CSI operations can be performed as low-priority operations. Software implementation is sufficient, and no hardware acceleration mechanism is necessary. Therefore, the CSI mechanism can be implemented on most intermediate nodes (routers) with ease.

The "Basic investigation data type definitions set" defines the fundamental data collection set. Its components are timestamp, IP address, interface attributes, and packet and octet count information. No new data formats are defined: the timestamp is defined in ICMP [3], and the interface attributes and counter information are defined in MIB-II [4]. Since the ICMP (timestamp) and MIB-II features have been implemented properly on most nodes, it is easy to implement the CSI mechanism on such nodes.

6.3 Evaluation of implementation on intermediate nodes

Table 1 shows the environment where the CSI mechanism has been implemented and evaluated.

Table 1: Implementation and Evaluation Environment

  OS                 FreeBSD 2.2.8R + KAME
  Nodes              PCs (Pentium II 400MHz)
  Network            100BASE-TX/10BASE-T Ethernet
  Source code size   1.2 klines (on intermediate nodes)

Since effects of the CSI mechanism implementation on intermediate nodes (routers) are the most noteworthy issues, the performance (operation time) of the mechanism only on intermediate nodes has been evaluated.

6.3.1 Evaluation methods and results

Table 2 shows parameters that control operation mode combinations of the mechanism.

Table 2: Parameters for Operation Combinations

  Investigation Data Type Name   Address, Static, Compress, Dynamic, All
  Operation Mode                 SPSR, SPMR
  Investigated Interface(s)      Incoming, Outgoing, Both
  Number of passed nodes         from 20 to 32

Operations of all 168 (4*2*3*7) combinations were evaluated. Operation time of the mechanism on intermediate nodes was measured by comparing the round-trip time of the CSI probe packets and that of null operation packets of the same size.

Since it is difficult to prepare many (from 20 to 32) actual nodes, source-routed packets that passed repeatedly through three nodes were used in the evaluation. Fig. 8 shows the network configurations used for the evaluation. The configuration shown in the left figure was used in the main evaluation. The configuration shown in the right figure was used to obtain data for the correction.


Figure 8: Network Configurations

Fig. 9 shows all results of measured CSI operation time, and Table 3 shows the average operation time for the same investigation data type, operation mode, and investigated interfaces combination.


Figure 9: CSI Operation Time on Intermediate Nodes

Table 3: Average CSI Operation Time

One of the remarkable points of these results is that all operation times are less than 200 microseconds. This is short enough: for reference, merely transmitting a 300-byte packet over a 10 Mbps link takes 240 microseconds (300 B * 8 bit/B / 10 Mbps = 240 usec). These results show that the CSI mechanism is lightweight and does not cause a performance degradation problem.

These evaluations were performed under various levels of jamming traffic (from 0% to 90% load). The obtained results were almost the same, and no obvious differences were observed. This further indicates that the mechanism is lightweight and works effectively in various environments.

As these results show, the operation time depends mainly on the operation mode (SPSR or SPMR) and the data record length. For short data records, the SPMR operation takes at least twice as long as the SPSR operation. Operations on long data records take longer than those on short data records.

The effect of the number of passed nodes can be seen in Fig. 9. Comparing the operation time for a small number of passed nodes (20, 22) with that for a larger number (24, 26, ...) when collecting short data records (e.g., Address), the operation clearly takes longer for the larger numbers of passed nodes, because Status Report messages are issued in those cases.

7 Conclusion

The CSI mechanism is proposed as a means of solving two types of problems in the current hop-by-hop based real-time status investigation mechanisms (e.g., "traceroute"). It is designed as a new feature of IPv6, has been implemented and evaluated, and has been verified to work as designed.

Since the CSI mechanism can investigate the network status of both the outgoing and incoming paths between the source and the destination with a minimal number of packets (one pair of messages in most cases), it is an efficient mechanism that solves the problems of the current status investigation mechanisms.

The CSI mechanism is designed not only to replace mechanisms such as "traceroute" or "pathchar"; it is also designed as a generic status investigation framework mechanism. It has potential to provide various types of status investigation functions. Since the CSI mechanism is flexible enough to extend, it can be easily applied to various advanced usages by introducing new data type definitions to collect.

The mechanism enables hop-by-hop based real-time consumed bandwidth measurement of links of both the outgoing and incoming paths with only "Basic investigation data type definitions set" (timestamp, IP address, interface attributes, packet and octet count information that is defined by MIB-II). It can locate bottlenecks on the paths, and the obtained data can be utilized to re-design the network. When the CSI mechanism is applied to a network that has the capability to guarantee the quality of service (e.g., bandwidth), it works as a verification mechanism.

This mechanism can also be used to locate problems on the communication path by changing its operation mode. Since the mechanism is simple and does not request complex operations of routers, the CSI option is easy to implement and does not cause performance problems on intermediate nodes (routers).

It is expected that various new data type definitions will be proposed in the future, and that the applicable area of the CSI mechanism will consequently be extended.

References

[1] S. Deering, R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification," RFC 2460, December 1998.

[2] A. Conta, S. Deering, "Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification," RFC 2463, December 1998.

[3] J. Postel, "Internet Control Message Protocol," RFC 792, September 1981.

[4] K. McCloghrie, et al., "Management Information Base for Network Management of TCP/IP-based internets: MIB-II," RFC 1213, March 1991.

[5] V. Jacobson, "traceroute," ftp://ftp.ee.lbl.gov/traceroute.tar.Z

[6] V. Jacobson, "pathchar," ftp://ftp.ee.lbl.gov/pathchar/

[7] H. Kitamura, "Connection/Link Status Investigation (CSI) IPv6 Hop-by-Hop option and ICMPv6 messages Extension," <draft-kitamura-ipv6-hbh-ext-csi-01.txt>, work in progress, October 1999.