Deployment and Experiences of the WIDE 6bone
Kazuhiko YAMAMOTO <Kazu@Mew.org>
Akira KATO <email@example.com>
Munechika SUMIKAWA <firstname.lastname@example.org>
Jun MURAI <email@example.com>
WIDE 6bone is one of the largest IPv6 test-beds, deployed and operated by WIDE Project. As an activity report on WIDE 6bone, this paper first describes its history and characteristics. Then problems we experienced with IPv6, including source address selection, renumbering, and application dilemmas, are explained. Translators between IPv4 and IPv6 are also discussed for smooth migration.
Since the IPv4 address space is being exhausted by the rapid growth of the Internet, there is strong interest in developing IPv6 and migrating to IPv6 environments without great confusion. To address this problem, WIDE Project formed an IPv6 working group in the summer of 1995.
WIDE Project is a research consortium in Japan focused on the Internet and related technologies. A few hundred researchers at universities and companies are involved. WIDE Project has operated test-bed networks called the WIDE Internet since 1988, including a nationwide backbone as well as international connectivity.
In late 1995, the working group had four independent IPv6 implementations, and interoperability had been tested on local networks. To test the implementations against live traffic and to stimulate IPv6 development, it was highly desirable to deploy an IPv6 test-bed network interconnecting the sites involved in IPv6 development, using IPv6 over IPv4 tunnels and native data-links on top of the WIDE infrastructure.
We started deploying our IPv6 network, called WIDE 6bone, in June, 1996. Currently, WIDE 6bone is one of the largest IPv6 test-beds and plays the role of a core site in the worldwide 6bone. Various data-links, including serial links and ATM PVCs, are used, while other 6bone sites typically use tunnels only.
Through the operation of WIDE 6bone, we found some topics unique to IPv6, including a source address selection algorithm, a renumbering procedure, and application dilemmas in dual stack environments. We also developed three translators for smooth migration from IPv4 to IPv6.
This paper is an activity report of the IPv6 working group of WIDE Project, published in the hope that our experiences will provide useful information to those who take similar approaches. It is organized as follows: Section 2 describes the history and characteristics of WIDE 6bone. Experienced topics and translators are explained in Section 3 and Section 4, respectively. We mention our future plans in Section 5 and present our conclusions in Section 6.
During the 35th IETF in Los Angeles in March, 1996, Jon Postel, Bob Fink, Jun Murai, and others held a private meeting. They agreed to deploy a worldwide 6bone connecting America, Europe, and the Asia-Pacific rim. WIDE Project decided to deploy WIDE 6bone by the next IETF meeting.
At 15:22 on June 9 (JST), the first IPv6 packet made a round trip between the Nara Institute of Science and Technology (NAIST) and the WIDE Tokyo NOC. This moment was the birth of WIDE 6bone. The connectivity between them was not an IPv6 in IPv4 tunnel but a serial link.
We split a T1 circuit into two data-links of 1472 Kbps and 64 Kbps. The 1472 Kbps link was used for the regular IPv4 network, while the 64 Kbps link was assigned to the IPv6 network. This method eliminated the cost of an additional leased line and also prevented the IPv6 network from affecting IPv4 network operation.
The University of Tokyo was also connected to the Tokyo NOC, and packets from NAIST to the university were routed on the same day. Osaka University joined WIDE 6bone a week later. At this time, IPv4-compatible IPv6 addresses were used just for convenience, and routing was managed manually due to the lack of routing daemons.
At the 6bone BOF at the 36th IETF in Montreal in June, 1996, several activities, including that of WIDE Project, were introduced. Consensus was reached to establish the worldwide 6bone on July 15, and WIDE Project raised its hand for participation. Toward the promised day, WIDE Project renumbered its IPv4-compatible IPv6 addresses to the old IPv6 test addresses. Cisco Systems became the first peer of WIDE Project on July 16, and an IPv6 in IPv4 tunnel to G6 was deployed on July 18. Since then, WIDE 6bone has provided packet transit service for the worldwide 6bone as one of the core members.
As the number of participants gradually increased, static routing became unmanageable. We therefore developed a RIPng daemon and installed it on the routers in WIDE 6bone in February 1997. Half a year later, hops within WIDE 6bone exceeded 10, and the number of full routes in the worldwide 6bone reached 300.
To resolve the RIPng mess and to test the new address architecture, the ngtrans working group of the IETF, into which the 6bone BOF activities were absorbed, decided the following at the 39th IETF in Munich in August:
Since some parts of our IPv4 backbone switched to ATM PVCs, we needed to find an alternative to the multiplexor technique. ATM can be considered a logical multiplexor, but there was no specification for carrying IPv6 packets over ATM PVC links. So we wrote an Internet-Draft on this topic, then migrated to this approach.
As of this writing, about 10 independent IPv6 stack implementations are being developed and operated in WIDE 6bone, and about 20 organizations are interconnected by serial links, ATM PVCs, and IPv6 over IPv4 tunnels. Figure 1 illustrates the current topology of WIDE 6bone. Compared with other IPv6 networks, WIDE 6bone is characterized by the fact that many vendors are involved and heterogeneous data-link technologies are in use.
This section explains some topics that we learned through our operation of WIDE 6bone. Section 3.1 shows the source address selection algorithm. The smooth renumbering procedure and the application dilemmas are discussed in Section 3.2 and Section 3.3, respectively.
Through the operation of WIDE 6bone, we realized that one of the key mechanisms for IPv6 is selecting an IPv6 source address when initiating a communication. We simply call this "source selection." The following subsections discuss issues of scope rules, renumbering measures, preliminary policy routing, and interoperability, then present our proposed algorithm.
We first insist that source selection must be scope-oriented. As of this writing, we categorize IPv6 addresses into four scope classes: node-local, link-local, site-local, and global. The node-local class consists of the loopback address and node-local scope multicast addresses. The link-local class consists of the unspecified address, link-local unicast addresses, and link-local scope multicast addresses. The site-local class consists of site-local unicast addresses and site-local scope multicast addresses. The global class consists of global unicast addresses and multicast addresses of other scopes.
For convenience, we describe a scope class corresponding to an IPv6 address as SC(addr) and define the following total ordering:
node-local < link-local < site-local < global
To ensure communication, we adopt the following basic principle for a given IPv6 packet:
SC(src) = SC(r1) = SC(r2) = ... = SC(rn) = SC(dst)
where "src" is the source address, "dst" is the destination address, and r1, r2, ..., rn are addresses of intermediate nodes specified in the routing header if one exists.
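The scope classification and the principle above can be sketched in Python (an illustrative sketch, not the WIDE stack code; the class boundaries follow the four scope classes defined in this section):

```python
# Sketch of the four scope classes and the SC() principle described above.
# Uses Python's ipaddress module; the ordering
# node-local < link-local < site-local < global is modeled as integers.
import ipaddress

NODE_LOCAL, LINK_LOCAL, SITE_LOCAL, GLOBAL = range(4)

def scope_class(addr: str) -> int:
    """Return SC(addr) for an IPv6 address."""
    a = ipaddress.IPv6Address(addr)
    if a.is_loopback:
        return NODE_LOCAL              # ::1
    if a.is_unspecified or a.is_link_local:
        return LINK_LOCAL              # ::, fe80::/10
    if a.is_site_local:
        return SITE_LOCAL              # fec0::/10
    if a.is_multicast:
        scope = (int(a) >> 112) & 0xf  # multicast scope field (2nd byte, low 4 bits)
        return {1: NODE_LOCAL, 2: LINK_LOCAL, 5: SITE_LOCAL}.get(scope, GLOBAL)
    return GLOBAL

def principle_holds(src: str, dst: str, route=()) -> bool:
    """Check SC(src) = SC(r1) = ... = SC(rn) = SC(dst)."""
    classes = {scope_class(a) for a in (src, *route, dst)}
    return len(classes) == 1
```

For example, a packet from a link-local source to a global destination violates the principle, while link-local to link-local satisfies it.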
The background to introduce this principle is as follows:
Source selection must be robust against renumbering, which raises two big problems. One is communication within the site during renumbering, and the other is the aging of old addresses.
Since a site-local address is chosen as the source for a given destination whose scope class is site-local according to our scope rule, communication within a site is ensured during renumbering. (Even if site-local addresses are not used, new global addresses are selected thanks to the longest match mechanism described later.)
A mechanism to expire addresses is defined. When the preferred lifetime has expired, an address becomes deprecated until its valid lifetime also expires. Deprecated addresses must not be candidates for source selection.
Source selection should provide a policy routing mechanism, if possible. One candidate for accomplishing preliminary policy routing is longest match (this idea was originally proposed by Quaizar Vohra on the IPng mailing list). Consider the multi-homed example in Figure 2.
Site "S" has two connections: one is to provider "P," the other is to provider "Q." Site "S" also has two global address blocks: "ps" and "qs" which are assigned by provider P and Q, respectively.
A typical requirement of site S is effective routing: incoming packets should come back through the link that carried the outgoing packets. If we use longest match against the destination for source selection, this requirement is achieved in some cases.
When a host in site S initiates communication with a host in provider P, the outbound packets are routed to the link P-S if routing is well managed at the border of site S. The reply packets are routed to the link P-S again if the initiating host chose a source address from "ps" rather than "qs."
Lastly, source selection must provide high interoperability. The discussion above covers cases where the source address is not yet bound. But there are cases where it is already bound, typically in a TCP connection established by a remote host. It is very likely that the remote host violates our scope rule. Experience tells us that the best strategy for interoperability is to be liberal in what you receive and conservative in what you send. So, use the bound source in reply packets rather than treating it as an error.
Here is a source selection algorithm integrating the functions above; it is already implemented and well tested in our IPv6 stack.
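One way the pieces above could fit together is sketched below in Python (an illustrative assembly of the scope rule, the deprecation rule, and longest match; the actual algorithm in the WIDE stack may differ in detail):

```python
# Sketch: filter candidate sources by scope class, drop deprecated
# addresses, then break ties by longest prefix match against the
# destination (the preliminary policy routing mechanism above).
import ipaddress

def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv6 addresses."""
    x = int(ipaddress.IPv6Address(a)) ^ int(ipaddress.IPv6Address(b))
    return 128 - x.bit_length()

def select_source(candidates, dst):
    """candidates: list of (address, scope_class, deprecated) tuples.
    dst: an (address, scope_class) pair. Returns a source address or None."""
    dst_addr, dst_scope = dst
    usable = [(a, s) for a, s, deprecated in candidates
              if not deprecated and s == dst_scope]  # scope rule + aging rule
    if not usable:
        return None
    # longest match against the destination, as in the multi-homed example
    return max(usable, key=lambda c: common_prefix_len(c[0], dst_addr))[0]
```

In the Figure 2 setting, a destination inside provider P shares a longer prefix with an address from "ps" than with one from "qs," so the "ps" source wins.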
Two renumbering events let us discuss this topic in detail. Renumbering in an IPv6 environment is much smoother than in IPv4, thanks to router advertisements and stateless address configuration. Here is a procedure to carry out gentle renumbering.
More research on this topic is still necessary to automate this procedure as much as possible. The key is probably automatic router renumbering, which is being discussed in the IETF IPng working group.
In dual-stack environments, it is desirable that differences in the network layer be invisible to applications and users. Thanks to the protocol-independent name-to-address function (i.e., getaddrinfo()), it is straightforward to make clients protocol independent. For example, a user specifies a host name to telnet without any concern about whether it is an IPv6 host or an IPv4 host. The getaddrinfo() function resolves A and/or AAAA records, making use of the AF_UNSPEC magic, and gives the results to telnet in socket address structures, which are protocol independent from the application's point of view.
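The client pattern described above can be sketched in Python, whose socket.getaddrinfo wraps the getaddrinfo() function (an illustrative sketch, not the telnet code):

```python
# Protocol-independent client sketch: with AF_UNSPEC, getaddrinfo may
# return both IPv6 and IPv4 results; the client just tries each address
# in order and never inspects the family itself.
import socket

def connect_any(host, port):
    """Connect to (host, port) over whichever family resolves and works."""
    err = None
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        s = socket.socket(family, socktype, proto)
        try:
            s.connect(sockaddr)   # works for both IPv4 and IPv6 sockaddrs
            return s
        except OSError as e:
            err = e
            s.close()
    raise err or OSError("no addresses for %r" % host)
```

A telnet-like client built on this function needs no AF_INET/AF_INET6 branching at all.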
If servers are executed by the Internet super daemon (i.e., inetd), they can also be protocol independent. But if they are implemented as stand-alone daemons, it is necessary to tell them which sockets should be opened, AF_INET or AF_INET6 or both, perhaps by command line arguments.
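The stand-alone server case might look like the following sketch, where the families to listen on come from a command-line-style argument ("4", "6", or "46"); this is an illustration of the idea, not any particular daemon's code:

```python
# Sketch of a stand-alone dual-stack server opening AF_INET and/or
# AF_INET6 listening sockets as selected by the caller. The IPv6 socket
# is hedged with try/except since the kernel may lack IPv6 support.
import socket

def open_listeners(families="46", port=0):
    socks = []
    if "4" in families:
        s4 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s4.bind(("127.0.0.1", port))
        s4.listen(5)
        socks.append(s4)
    if "6" in families:
        try:
            s6 = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
            s6.bind(("::1", port))
            s6.listen(5)
            socks.append(s6)
        except OSError:
            pass          # no IPv6 on this node; serve IPv4 only
    return socks
```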
Through our experience, we found two ambiguities in the dual stack environment. One is a problem common to all applications, while the other is related to MX records.
The getaddrinfo() function has the former ambiguity if AF_UNSPEC is specified. Consider this simple scenario: "host.foo" and "host.bar" are different hosts; the former has an IPv6 address, say A6, and the latter has an IPv4 address, say A4. Suppose that the search list to complete the domain name is configured as "foo bar."
Let a user try to log in by executing "telnet host." If an implementation of the getaddrinfo() function tries to resolve with AF_INET first, then AF_INET6, telnet connects to A4 (i.e., host.bar). If another implementation tries in the reverse order, telnet connects to A6 (i.e., host.foo).
The problem is that there is no consensus on the preference between the address family and the search list. We believe that the search list should be preferred to the address family. In this case, the result must be a list of A6 and A4 (i.e., host.foo then host.bar, according to the search list), independently of how the getaddrinfo() function is implemented. We should reach a consensus on this.
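The ordering argued for above can be sketched with a stub DNS table standing in for real A/AAAA lookups (an illustration of the rule, not resolver code): the search list is walked first, and both families are queried within each completed name, so the result order no longer depends on resolver internals.

```python
# Stub DNS table for the scenario in the text: host.foo has only an
# AAAA record (A6), host.bar has only an A record (A4).
DNS = {
    ("host.foo", "AAAA"): ["A6"],
    ("host.bar", "A"):    ["A4"],
}

def resolve(name, search_list=("foo", "bar")):
    """Search list outranks address family: for each completed name,
    collect results of both families before moving to the next suffix."""
    results = []
    for suffix in search_list:
        fqdn = "%s.%s" % (name, suffix)
        for rr in ("AAAA", "A"):
            results += DNS.get((fqdn, rr), [])
    return results
```

With this rule, "telnet host" always gets [A6, A4], matching the order of the search list regardless of which family the resolver tries first internally.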
E-mail delivery depends heavily on MX records in DNS. In a dual stack environment, an MX record is associated with a list of FQDNs; each of them is in turn associated with a set of IPv4 or IPv6 addresses. Given an MX record, a dual stack host can deliver messages to the most preferred MX host, regardless of the capability of the MX host. An IPv4 host, however, is able to send messages only to another IPv4 MX host, and likewise for an IPv6 host.
What happens within the same MX domain? If there is no reachability to the most preferred MX host in the domain, messages cannot reach the final destination. So, in the dual stack environment, operators must be careful about reachability within their MX domain. One possible solution is to make the most preferred MX host dual stack.
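The MX ambiguity can be made concrete with a small sketch (hypothetical host names and preference values): a sender can only deliver to MX hosts whose address families it speaks, so an IPv4-only sender may fall back to a less preferred host, or fail entirely.

```python
# Sketch of MX host selection constrained by address family.
# mx_records: list of (host, preference, families); lower preference wins.
def pick_mx(mx_records, my_families):
    """Return the most preferred MX host reachable from a sender that
    speaks my_families, or None if delivery is impossible."""
    for host, _pref, families in sorted(mx_records, key=lambda r: r[1]):
        if families & my_families:
            return host
    return None   # no MX host shares a family with the sender
```

Making the most preferred host dual stack ({"v4", "v6"}) guarantees every sender reaches it, which is the solution proposed above.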
The NS record has the same problem. Unfortunately, the solution above is not available for the NS record, since it has no preference field.
In the early stage of the migration from IPv4 to IPv6, it is expected that IPv6 islands will be connected to the IPv4 ocean. In the late stage of the migration, by contrast, IPv4 islands will be connected to the IPv6 ocean. IPv4 hosts will remain for a long time after the exhaustion of the IPv4 address space. So, it is necessary to develop translators to enable direct communication between IPv4 hosts and IPv6 hosts. For convenience, let us categorize IPv4/IPv6 translators into four types (see Figure 3).
Practical assumptions here are:
A typical translator consists of two components. One is interpretation between IPv4 packets and IPv6 packets, discussed in Section 4.1. The other is address mapping between IPv4 and IPv6, explained in Section 4.2. Section 4.3 describes implementation status.
For interpretation between IPv4 and IPv6, we can use two technologies. One is a traditional header conversion router, like NAT. The other is an upper layer gateway that bridges an IPv4 connection and an IPv6 connection, like SOCKS.
Header conversion is fast enough but shares dilemmas with NAT. A good example is the difficulty of translating network level addresses embedded in application layer protocols, typically found in FTP and FOOBAR. Header conversion also suffers from problems not found in NAT: a large IPv4 packet is fragmented into IPv6 packets because the IPv6 header is typically 20 bytes larger than the IPv4 header, and the semantics of ICMPv4 and ICMPv6 are not interchangeable.
The upper layer approach is free from the dilemmas above, since each connection is closed within its own network protocol. We have implemented two translators of this kind, based on the SOCKS scheme and on TCP relay.
SOCKS is a framework of remote procedure calls for socket operations. Since SOCKS maps port numbers on multiple hosts onto port numbers on a single node, only outgoing connections can be established. Thus, the SOCKS scheme can be used for Translators A and C.
A TCP relay server acts as a man-in-the-middle. When a TCP packet reaches a relay server, the network layer tosses it up to the TCP layer even if the destination is not the server's address. The server accepts this TCP packet and establishes a TCP connection with the source host. One more TCP connection is also made from the server to the real destination. The server then reads data from one of the two connections and writes the data to the other.
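The data path of the relay can be sketched at user level in Python (the kernel modification that accepts packets destined to other hosts is not shown; this is an illustrative sketch, not the faithd source): once the server holds one connection toward the source and one toward the destination, it just shuttles bytes in both directions until either side closes.

```python
# Sketch of a bidirectional TCP relay: one thread per direction copies
# data until EOF, then propagates a half close to the other connection.
import socket
import threading

def _pump(src, dst):
    while True:
        data = src.recv(4096)
        if not data:          # peer closed its sending side
            break
        dst.sendall(data)
    try:
        dst.shutdown(socket.SHUT_WR)   # emulate TCP half close
    except OSError:
        pass

def relay(conn_a, conn_b):
    """Bridge two established sockets, e.g. one IPv4 and one IPv6."""
    t = threading.Thread(target=_pump, args=(conn_a, conn_b))
    t.start()
    _pump(conn_b, conn_a)
    t.join()
```

The half-close propagation in _pump is what lets the relay emulate TCP half close, a property the text notes faithd supports.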
Address mapping should be invisible to both IPv6 applications and IPv4 applications. Typical approaches are library extensions on local hosts and DNS extensions.
SOCKS is an example of a library extension. To install the remote procedure call functions for socket operations, SOCKS clients are linked with the SOCKS libraries. SOCKS protocol version 5 can use an FQDN to specify a target host to a SOCKS server. Since an FQDN is protocol independent, the address mapping, more precisely the connection mapping, is closed within the SOCKS server host.
Consider Translator C, for example. A SOCKS client can make a TCP connection over IPv4 to a SOCKS server. Then an IPv6 target host is specified by FQDN from the client to the server. The server resolves an IPv6 address corresponding to the FQDN, then connects to the IPv6 host. Of course, this mechanism also applies to Translator A.
Another approach is to extend DNS. For Translator A, for instance, an extended DNS server returns a temporary IPv6 address for a host name associated with IPv4 only. With this approach, we must ensure reachability of the assigned addresses in each protocol and resolve a DNS cache problem. If we use TCP relay for interpretation, address mapping is necessary only for destination addresses, since source address mapping is closed within the relay server. The discussions below assume TCP relay, so we focus on destination address mapping.
For Translator A, DNS extensions are straightforward, since the address space of IPv6 is much larger than that of IPv4. An extended DNS server receives a query to resolve AAAA records for an IPv4 host and can assign a temporary IPv6 address that embeds the IPv4 address. This mapping can also be implemented by extending name-to-address libraries, as described later in Section 4.3, because the mapping rule is static.
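The embedding rule can be sketched as follows (the /96 prefix here is a documentation example, not an assigned translator prefix): the IPv4 address occupies the low 32 bits of the IPv6 address, so the mapping is static and reversible with no per-host state.

```python
# Sketch of the static Translator A mapping: embed an IPv4 address in
# the low 32 bits of a translator prefix, and recover it by masking.
import ipaddress

# Example prefix for illustration only; a real translator would use
# a prefix routed to it.
PREFIX = ipaddress.IPv6Network("2001:db8:ffff::/96")

def map4to6(v4: str) -> str:
    """IPv4 address -> embedded IPv6 address under PREFIX."""
    v6 = int(PREFIX.network_address) | int(ipaddress.IPv4Address(v4))
    return str(ipaddress.IPv6Address(v6))

def map6to4(v6: str) -> str:
    """Recover the embedded IPv4 address (low 32 bits)."""
    a = ipaddress.IPv6Address(v6)
    assert a in PREFIX, "not a translator-mapped address"
    return str(ipaddress.IPv4Address(int(a) & 0xffffffff))
```

Because the rule is purely arithmetic, it can live in a name-to-address library on each host as easily as in a DNS server, which is the point made above.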
For Translator B, this approach is hard to implement. We have to map all the IPv6 hosts inside to a very small set of IPv4 addresses assigned to the site, so the IPv4 address assignment must be dynamic. Worse, we must take care of the time to live of such dynamically assigned entries.
The DNS cache problem is closed within the site for Translator C. Moreover, we can use the fairly large space of IPv4 private addresses. Mapping an IPv6 address to an IPv4 private address is dynamic, but we can set a long time to live.
Translator D is straightforward. We can statically assign an IPv6 address out of the assigned IPv6 space to each IPv4 host inside, in advance. Since this is a static assignment, no cache problem occurs.
We have one implementation of a header conversion router. It is designed to cover all types of translators with extended DNS servers, but the DNS cache problem for Translator B has not been resolved yet. Currently, we are testing it with the time to live of DNS entries set to zero.
Translator C based on SOCKS has been developed using NEC's SOCKS package. This approach is very useful for SOCKS-ready environments. When SOCKS clients are IPv6-ready, this translator can also act as Translator A, thanks to the protocol independence of SOCKS protocol version 5.
We have implemented a TCP relay server for Translator A as a user level daemon, called faithd. We also modified our IPv6 stack implemented on FreeBSD so that it tosses up TCP segments destined to another host if a manager configures it to do so. For address mapping, we extended the getaddrinfo() function to map an IPv4 address into a specific IPv6 address block specified by an environment variable.
We linked many clients, including telnet, rlogin, rsh, ftp, and ssh, to this function so that they become faithd-ready. Since we implemented an original name server which can handle UDP/IPv6 queries as well as UDP/IPv4 queries, the clients are completely IPv4 free. Faithd can relay both normal and out-of-band TCP data, and can also emulate TCP half close. It is also able to translate network level addresses in both the PORT and PASV FTP commands. Faithd is installed in WIDE 6bone to test its stability.
We are planning several advanced experiments using WIDE 6bone.
We have started assigning the fourth octet, a pseudo NLA1, to non-WIDE member sites in Japan on a request basis. This will provide us with environments where various experiments, including NLA1 assignment policy and multi-homing beyond a TLA, are possible. In Figure 1, the three boxes filled in gray are the NLA1 applicants.
Our original registry is now operational. This registry collects information on organizations not only under the pseudo NLA1 of WIDE Project but also under other NLA1s in Japan. This information is automatically registered and updated in the 6bone registry. We need more research on this topic when security and other aspects are taken into consideration.
It is necessary to implement Translator B immediately, since some organizations will start using IPv6 equipment for their business, presumably this year. As mentioned above, DNS extensions for this kind of translator pose the dilemmas described earlier, but we have some ideas for resolving them.
Several things cannot be known until we implement and run them.
Through the operational experience of WIDE 6bone, one of the biggest IPv6 test-beds, we learned the importance of source address selection and renumbering, and noticed ambiguities in dual stack environments.
Our proposed source selection algorithm is scope-oriented and robust against renumbering, and we found that it carries out preliminary policy routing for multi-homed networks. We explained a gentle renumbering procedure making use of stateless address configuration. In dual stack environments, it is necessary to reach consensus on the preference between protocol families and the domain search list. Also, operators of e-mail delivery systems must ensure reachability within the same MX domain.
To achieve smooth migration from IPv4 to IPv6, the translators needed in the early stage must be implemented quickly. Outgoing connections from a site can already be established with our translator, which is running on WIDE 6bone. More research is necessary for incoming connections.
Our deep gratitude goes to other members of the IPv6 working group, WIDE Project, for their support over the past years. We thank operators of worldwide 6bone for their cooperation.