AYAME: A Design and Implementation of the CoS-Capable MPLS Layer for BSD Network Stack

Yojiro UO <yuo@nui.org>
Satoshi UDA <zin@jaist.ac.jp>
Nobuo OGASHIWA <n-ogashi@jaist.ac.jp>
Satoshi OHTA <osatoshi@jaist.ac.jp>
Yoichi SHINODA <shinoda@jaist.ac.jp>
Japan Advanced Institute of Science and Technology
Japan

Abstract

We are developing an experimental implementation of a Multi Protocol Label Switching (MPLS) based Label Switching Router (LSR), named "AYAME." The system is built on the Berkeley Software Distribution (BSD) system and extends the original semantics where needed to support the MPLS architecture. AYAME also enhances Class of Service (CoS) related functions to support Differentiated Services (Diff-Serv) capabilities. This paper describes AYAME's design issues, our implementation experience, and new mechanisms such as efficient label processing.

Introduction

As the Internet has grown, there has been lively discussion about adding service classes to traffic. The Differentiated Services (Diff-Serv) architecture [1] is one of the most common technologies for adding these functions to today's Internet.

Since the Internet will become a higher-performance network in the near future, improvements are required in the following areas:

When ISPs wish to provide differentiated services, they need additional functions that support more efficient and better-operated traffic engineering, better-controlled QoS management, and VPN services in their networks. The "label switching forwarding paradigm with network layer routing" satisfies these requirements. Multi Protocol Label Switching (MPLS) [2] [3] is one of the technologies being studied in the IETF MPLS working group [4] to realize that paradigm. It is attracting a great deal of attention as a way to add these functions to networks that need Differentiated Services capabilities.

MPLS is a technology for flexible transfer of Layer 3 packets. It uses short, fixed-length labels, created from the Layer 3 address information or from other information, to constrain the route to a specific path. In an MPLS domain, when a data stream traverses a common path, a Label Switched Path (LSP) can be established using MPLS signaling protocols. The ingress Label Switching Router (LSR) assigns a label to each packet and transmits it downstream. Each LSR along the LSP decides the next hop according to the label in each packet.

Because the label approach of MPLS allows flexible routing control, each node can decide the next hop from the label(s) alone rather than from the network layer information. New routing services can therefore be introduced easily, independent of the existing routing mechanism and without changing the forwarding paradigm [5].

If the LSRs have the capabilities to define a QoS/CoS characteristic, and to map Diff-Serv Per Hop Behaviors (PHBs) to each LSP on it, then Diff-Serv enabled MPLS networks can be constructed [6].

Because MPLS-related research is still developing, we need platforms that serve as a research environment for it. In other words, we need an implementation that can be changed and extended freely for research, verification, and experimental operation.

To address this situation, we designed the architecture of an MPLS router system that is well suited to CoS functions, and implemented it in the BSD UNIX network stack (NetBSD [7], derived from the BSD system). This system is named "AYAME."

This paper mainly discusses the AYAME architecture and how the MPLS functions are fitted into the BSD network stack.

What is MPLS?

In the Internet, a packet is transmitted with Internet Protocol (IP) encapsulation. When the packet is received by each node in its path, the node determines the next hop using the IP layer (Layer 3) header information in an incoming packet and the Layer 3 routing information held by the node.

Each node en route makes this decision for every packet passing through it. If these repeated operations could be reduced, a faster network would become possible.

Figure 1: Layer 3 forwarding and Layer 2 label swapping

MPLS meets this requirement by separating the analysis of a packet's "forwarding behavior" from the "forwarding operations" that act on the result (Figure 1). In MPLS terms, forwarding behavior is expressed as a "Forwarding Equivalence Class (FEC)," the class of packets that are handled in the same way. Within an MPLS domain, only the ingress node analyzes the forwarding behavior of a packet; it records the result as a short, fixed-size label, bound to the corresponding FEC, in the Layer 2 packet. Because the MPLS architecture [3] allows several labels to be added to a single packet as a stack, the label(s) are called the label stack. A packet carrying such label(s) is called a labeled packet, and one without them an unlabeled packet. A label distribution protocol distributes the binding between an FEC and its label(s) to the nodes along which the packets travel. When a transit node receives a labeled packet, it resolves the next hop node from the label alone, since the label is bound to the result of the ingress analysis.
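
The division of labor described above can be illustrated with a minimal sketch. It is not AYAME code: the table names (INGRESS_BINDINGS, SWAP_TABLE), the FEC names, the label values and the node names are all hypothetical, and the Layer 3 analysis at the ingress is reduced to a lookup by FEC name. The point is only that the transit nodes consult nothing but the label.

```python
# Hypothetical FEC-to-label bindings installed at the ingress node:
# FEC name -> (outgoing label, next hop node).
INGRESS_BINDINGS = {"fec-a": (100, "lsr-b"), "fec-b": (200, "lsr-b")}

# Hypothetical per-node swap tables: incoming label -> (outgoing label, next hop).
# An outgoing label of None means "remove the label and deliver unlabeled".
SWAP_TABLE = {
    "lsr-b": {100: (101, "lsr-c"), 200: (201, "lsr-c")},
    "lsr-c": {101: (None, "egress"), 201: (None, "egress")},
}

def forward_along_lsp(fec):
    """Return the (node, incoming label) hops a packet of this FEC traverses."""
    if fec not in INGRESS_BINDINGS:
        return []  # no binding: the packet stays an ordinary unlabeled packet
    label, node = INGRESS_BINDINGS[fec]
    hops = []
    while label is not None:
        hops.append((node, label))
        # Transit nodes look only at the label, never at the Layer 3 header.
        label, node = SWAP_TABLE[node][label]
    return hops
```

For example, forward_along_lsp("fec-a") visits lsr-b with label 100 and lsr-c with label 101 before the label is removed.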

As described above, MPLS makes intra-network forwarding decisions using labels at Layer 2, so it can route independently of the Layer 3 routing information. This feature is regarded as extremely useful for realizing network traffic engineering. MPLS is also a technique still under development, and it is well suited to the operation of entities such as backbone networks.

AYAME: a new-generation network layer on BSD network stack

This section describes the architecture of AYAME, its basic characteristics, design issues, and some advanced features such as its modular design, hierarchical Label Switching Engine support, and exception processing.

Overview of AYAME

Figure 2: AYAME system overview

AYAME is an implementation of a Multi Protocol Label Switching (MPLS) based Label Switching Router (LSR) on a NetBSD system, with CoS related functions to support Diff-Serv capabilities. We implemented AYAME so that it stays faithful to the policy of the original BSD network code: the behavior of "packet forwarding" is the same before and after the MPLS functions are inserted into the system.

Most of the essential MPLS functions, such as the label swapping engine (LSE), the label information structures, and any other mechanisms needed when each packet is processed, are located in the kernel (details are given below). Some entities, however, such as the label distribution protocol(s), the mechanism for interaction with Layer 3 routing modules, and the configuration utilities, are located in userland rather than in the kernel. Figure 2 shows an outline of the AYAME architecture.

The following are the main features and characteristics of AYAME:

Inside of AYAME kernel

This section describes the design policy of the AYAME kernel and some remarkable characteristics of its implementation.

Design policy

The functions provided by MPLS sit between Layer 2 and Layer 3 of the network. Adding the MPLS extensions to the existing network stack therefore requires changes in both layers.

MPLS was not considered in the design of the existing network stack, so we must take great care to introduce it without violating the current semantics. Careless changes to the existing network code could affect other parts that depend on that code and destroy overall consistency. In such cases it would also be difficult to guarantee the current semantics of network processing.

The AYAME kernel implementation uses "complete modularization" by function to minimize such impact. We classified the function blocks that make up MPLS as follows:

MPLS specific module (MPLS-CORE,MPLS-LDPs):

Core module of MPLS label processing

Layer 3 related module(s) (MPLS-NETWORK):

Interface(s) between a particular Layer 3 and MPLS-CORE module, (e.g. MPLS-IPv4, MPLS-IPv6) to process the Layer 3 specific functions

Layer 2 related module(s) (MPLS-DATALINK):

Interface(s) between a particular Layer 2 and MPLS-CORE module, (e.g. MPLS-Ethernet, MPLS-ATM) to process the Layer 2 specific functions

Because this approach separates the MPLS functions from the existing network code as far as possible, it requires only minimal alteration of that code. Furthermore, the layering follows the existing BSD UNIX network stack layering: we inserted the MPLS layer between the Datalink layer (Layer 2) and the Network layer (Layer 3). Most MPLS-related functions are integrated into this layer.

In this layering architecture, the network layer can treat the MPLS layer as a kind of "intelligent" network interface. While the MPLS layer introduces some new functions, it does not destroy the current network semantics. We also extended some APIs to manipulate the MPLS specific configuration and data structures (the routing table, and so forth). This design remains familiar to current network applications and makes it easy for the upper layers to use the new functions offered by the MPLS layer.

This modularization makes the division of, and the relations between, the functions clearer, and as a result extensibility is greatly improved. Because MPLS is still in the standardization process, more specifications are expected to be proposed in the future. The AYAME design policy will make it easy to incorporate them.

Module structure

This section describes the module structure of AYAME.

Figure 3 illustrates the data flow in the network stack extended with AYAME. The rest of this section summarizes each module.

Figure 3: Data flow of AYAME kernel

MPLS-CORE/MPLS-LDPs

Only MPLS specific functions are provided by the MPLS-CORE and MPLS-LDPs modules, while all functions that need interaction with any other layer are located in other modules.

The MPLS-CORE mainly provides MPLS style packet forwarding, that is, "label swapping." Because this module is designed symmetrically with respect to input from and output to other modules, it treats MPLS-NETWORK(s) and MPLS-DATALINK(s) in the same manner. The module consists of two sub-modules: the Label Swapping Engine (LSE), which performs the label swapping and the label stack processing for each labeled packet, and the Label Information Database (LID), which maintains the system-wide information related to MPLS label swap processing, such as the Label-FEC binding.
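
The relationship between the two sub-modules can be sketched as follows. This is a simplified illustration under our own assumptions, not the AYAME data structures: the Packet class, the LID dictionary layout, the operation names and the interface names are hypothetical. The LID holds one entry per incoming top label, and the LSE applies the listed operation to the label stack.

```python
from dataclasses import dataclass, field

@dataclass
class Packet:
    payload: bytes
    labels: list = field(default_factory=list)  # labels[-1] is the top of the stack

# Hypothetical LID contents: incoming top label -> (operation, argument, outgoing interface).
LID = {
    100: ("swap", 110, "if0"),
    110: ("push", 500, "if1"),  # enter a nested LSP
    500: ("pop", None, "if2"),  # leave the nested LSP; the inner label is exposed again
}

def lse_process(pkt, lid=LID):
    """Apply one label operation to pkt; return the outgoing interface, or None."""
    if not pkt.labels:
        return None  # unlabeled packet: nothing for the LSE to do
    entry = lid.get(pkt.labels[-1])
    if entry is None:
        return None  # unknown label: the caller must drop the packet or raise an exception
    op, arg, out_if = entry
    if op == "swap":
        pkt.labels[-1] = arg
    elif op == "push":
        pkt.labels.append(arg)
    elif op == "pop":
        pkt.labels.pop()
    return out_if
```

Keeping the table (LID) separate from the engine (LSE) means that a label distribution protocol only has to update table entries; the per-packet code path does not change.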

The MPLS-LDPs module provides functions that support Label Distribution Protocols, such as functions to allocate labels and to preserve the consistency of the label space. Because the MPLS-LDPs module marshals the information from the LDPs, the LID in MPLS-CORE keeps only summarized information, which is sufficient to process a received packet. Figure 4 shows the details of MPLS-CORE. The internal structure of the MPLS-LDPs module is described below.

Figure 4: MPLS-CORE and MPLS-LDPs

MPLS-NETWORK

AYAME provides one module for each network layer present in the system. These modules are generically called MPLS-NETWORK. (E.g., MPLS-IPv4 handles IPv4 specific operations, and MPLS-IPv6 handles IPv6 specific ones.) Each module provides a kind of buffer function that canonicalizes the data structures exchanged between a network layer and MPLS-CORE; it also provides the network layer specific functions needed for MPLS label swapping, such as FEC analysis of each packet.

Figure 5: MPLS-IPv4 (an example of MPLS-NETWORK)

Figure 5 shows the details of MPLS-IPv4, which is one of the MPLS-NETWORK modules and supports the IPv4 network layer.

As an example, consider a packet, either originated by an upper layer or handled by IP forwarding, that is to be forwarded to a next hop node. In this operation, the ip_output() function sends the IP packet out of the interface connected to the next hop node. When the node is an MPLS ingress node, it must decide whether the packet should be sent into the MPLS domain as a labeled packet. To do so, it hooks into the middle of the ip_output() function and performs the following sequence:

  1. Analyze the network layer information of the packet and decide the FEC that corresponds to it.
  2. Look up the FEC-to-NHLFE (Next Hop Label Forwarding Entry) mapping in the LID. If a corresponding entry is found, the packet should be passed to the MPLS module; otherwise it should be sent back to the original IP stack.
  3. Return the result.

If the packet does not have to be handled as a labeled one, processing continues without any change to the existing semantics. The processing units that actually execute the MPLS-related functions are located in MPLS-IPv4. For this reason, alteration of the existing IP code is kept to a minimum: the function requires no change to the code other than the addition of a single hook.
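
The three-step sequence and the single hook can be sketched as below. This is an illustrative model only: ROUTES, FEC_TO_NHLFE, analyze_fec and mpls_ipv4_output_hook are hypothetical names, the FEC is simplified to the longest matching destination prefix, and the Python ip_output here is a stand-in for the output path, not the kernel function.

```python
import ipaddress

# Hypothetical destination prefixes known to the node (used for FEC analysis).
ROUTES = [ipaddress.ip_network("192.0.2.0/24"), ipaddress.ip_network("203.0.113.0/24")]

# Hypothetical LID view: FEC -> NHLFE. Only one prefix has a label binding.
FEC_TO_NHLFE = {
    ipaddress.ip_network("192.0.2.0/24"): {"out_label": 300, "next_hop": "198.51.100.1"},
}

def analyze_fec(dst):
    """Step 1: derive an FEC from network layer information (longest matching prefix)."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    return max(matches, key=lambda net: net.prefixlen) if matches else None

def mpls_ipv4_output_hook(dst):
    """Steps 2 and 3: look up the FEC-to-NHLFE mapping and return the verdict."""
    nhlfe = FEC_TO_NHLFE.get(analyze_fec(dst))
    return ("mpls", nhlfe) if nhlfe is not None else ("ip", None)

def ip_output(dst, payload):
    """Stand-in for the existing output path; the hook call is the only addition."""
    verdict, nhlfe = mpls_ipv4_output_hook(dst)
    if verdict == "mpls":
        return f"labeled({nhlfe['out_label']}) -> {nhlfe['next_hop']}"
    return f"plain IP -> {dst}"  # unchanged behavior for packets without a binding
```

A destination with a matching FEC but no NHLFE (203.0.113.0/24 in this sketch) falls through to the ordinary IP path, exactly as a destination with no FEC at all.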

MPLS-DATALINK

AYAME also provides one module for each datalink layer present in the system, in the same way as for the network layers. These modules are generically called MPLS-DATALINK. (E.g., MPLS-Ethernet handles Ethernet specific operations, and MPLS-ATM handles ATM specific ones.) Each module provides a kind of buffer function that canonicalizes the data structures exchanged between a datalink layer and MPLS-CORE; it also provides the datalink layer specific functions needed for MPLS label swapping, such as composing a packet from a packet payload and an MPLS label stack.

Figure 6: MPLS-Ethernet (an example of MPLS-DATALINK)

Figure 6 shows the details of MPLS-Ethernet, which is one of the MPLS-DATALINK modules and supports Ethernet interfaces. This module provides Ethernet specific label processing, such as label stack encoding and decoding for Ethernet type media as defined in [3], and queue control of output interfaces. It also provides some of the queue management support functions described below.
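
As an illustration of label stack encoding and decoding, the sketch below packs and unpacks shim header entries using the 32-bit layout of the label stack encoding [9]: a 20-bit label, a 3-bit experimental field, a 1-bit bottom-of-stack flag, and an 8-bit TTL. The function names are our own and the code is not taken from AYAME.

```python
import struct

def encode_label_stack(entries):
    """entries: list of (label, exp, ttl) tuples, top of stack first."""
    out = b""
    for i, (label, exp, ttl) in enumerate(entries):
        if not (0 <= label < 2**20 and 0 <= exp < 8 and 0 <= ttl < 256):
            raise ValueError("field out of range")
        bottom = 1 if i == len(entries) - 1 else 0  # set only on the last entry
        word = (label << 12) | (exp << 9) | (bottom << 8) | ttl
        out += struct.pack("!I", word)  # network byte order, 4 bytes per entry
    return out

def decode_label_stack(data):
    """Return (entries, remaining payload); stops at the bottom-of-stack flag."""
    entries, offset = [], 0
    while True:
        (word,) = struct.unpack_from("!I", data, offset)
        offset += 4
        entries.append((word >> 12, (word >> 9) & 0x7, word & 0xFF))
        if word & 0x100:
            return entries, data[offset:]
```

Decoding relies on the bottom-of-stack flag rather than on a length field, so the payload that follows the stack needs no separate delimiter.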

Remarkable features of AYAME

This section describes remarkable features of the AYAME kernel.

FEC/Label support mechanism

The MPLS architecture must use either a destination network address with a prefix length or an explicit destination address as a basic FEC, because these are the minimum constraints that can provide the same semantics as current Layer 3 forwarding.

The binding between this FEC and its corresponding label is distributed within an MPLS domain by LDP [10]. CR-LDP [11] is also defined, as a label distribution protocol that can handle more detailed FECs.

When MPLS is used for traffic engineering, that is, to manage network characteristics flexibly, each ingress LSR needs an efficient mechanism to map each packet to a particular FEC. And since our goal is a research environment for MPLS, it is also important to provide the support mechanisms that would be needed to develop and operate yet another label distribution protocol that propagates some specific constraint within an MPLS domain.

Generally, an FEC in the broad sense results from the combination of (1) a specification of how a node processes the packet internally, such as packet transfer; and (2) a representation of the characteristics of the processed packet. Flexible packet handling therefore needs the following features:

AYAME introduces the following structures to meet these requirements:

Packet classifier mechanism

To treat a particular data flow on the Internet flexibly, for example to select an FEC according to the behavior of a TCP session or of an individual protocol, it is sometimes desirable to use the following optional information when selecting an FEC:

We call such an FEC an "Extended FEC" to distinguish it from the existing FEC, which consists only of Layer 3 information.

Figure 7: Packet Classifier Mechanism

AYAME implements a packet classification mechanism that uses such conditions, defined across several layers (Figure 7). The mechanism consists of several classifier units arranged in a cascade. Each classifier unit has a rule base for identifying packets, and outputs the corresponding FEC identifier (FEC-id) if the rule base has an entry that matches the incoming packet. Once an FEC identifier has been decided, it is never overwritten by later units; the first FEC identifier takes priority over any later value.
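
The first-match-wins cascade can be sketched as follows. The rule contents, the dictionary representation of a packet, and the names make_unit, L4_UNIT and L3_UNIT are hypothetical and chosen only to show the control flow; the actual rule base format of AYAME is not described here.

```python
def make_unit(rules):
    """Build a classifier unit from a rule base of (predicate, fec_id) pairs."""
    def unit(pkt):
        for predicate, fec_id in rules:
            if predicate(pkt):
                return fec_id
        return None  # no entry in this unit's rule base matches
    return unit

def classify(pkt, units):
    """Run the cascade; the first FEC-id that is decided is never overwritten."""
    for unit in units:
        fec_id = unit(pkt)
        if fec_id is not None:
            return fec_id
    return None

# A transport-layer unit placed ahead of a network-layer unit (illustrative rules).
L4_UNIT = make_unit([
    (lambda p: p.get("proto") == "tcp" and p.get("dport") == 80, "fec-web"),
])
L3_UNIT = make_unit([
    (lambda p: p.get("dst", "").startswith("10."), "fec-net10"),
])
```

With the units ordered as [L4_UNIT, L3_UNIT], a TCP packet to port 80 at 10.0.0.1 is assigned "fec-web" even though the Layer 3 rule would also match, because the earlier unit decides first.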

We also intend to provide paths that accept FECs explicitly from user space, although the implementation is not finished yet. This feature will allow FECs to be manipulated more explicitly.

Multiple Label Distribution Protocols support

The MPLS architecture allows multiple entities that handle label distribution protocols to coexist on a single system, provided that uniqueness within the label space is maintained. Independent development and operation are then possible, but only under the guarantee that each entity need not exchange information with the others and works without being aware of their existence. AYAME is designed on the assumption that multiple label distribution protocol entities, such as daemons, run in user space. The support mechanism consists of two modules, the "LSR Capability Negotiator" and the "Label Allocator." These mechanisms have not been implemented yet; we plan to implement them in the near future.

Figure 8: Label Distribution Protocol support

LDPs need to know the capabilities of the LSRs in order to distribute Extended FECs, because each LSR's capabilities determine the range that an FEC in the broad sense can represent. This is not required when an FEC is expressed by standard LDP or CR-LDP, as that is the MPLS default behavior. In AYAME, the mechanism by which an LDP that handles Extended FECs obtains the capabilities of the LSR on which it runs is called the "LSR Capability Negotiator." If the LSR supports more than the basic capabilities, it keeps a record of its own capabilities for the following functions:

It reports those capabilities on request. Each LDP then makes only requests that the LSR is able to carry out.

Also, when multiple LDPs exist in a single LSR, the label space must be kept consistent. The "Label Allocator" manages the available labels in the kernel and allocates a label that is unique within the system in response to a request from an LDP.
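
Since the Label Allocator is not yet implemented, the following is only a sketch of the idea under our own assumptions: the class and method names are hypothetical, and ownership tracking by protocol name is one possible policy. The sketch skips label values 0 to 15, which the label stack encoding [9] reserves, and stays within the 20-bit label range.

```python
class LabelAllocator:
    """Hands out system-wide unique labels to several label distribution protocols."""

    FIRST_UNRESERVED = 16  # label values 0-15 are reserved by the label stack encoding
    LAST = 2**20 - 1       # labels are 20 bits wide

    def __init__(self):
        self._next = self.FIRST_UNRESERVED
        self._free = []    # released labels available for reuse
        self._owner = {}   # label -> name of the protocol that holds it

    def allocate(self, owner):
        """Return a label that no other requester currently holds."""
        if self._free:
            label = self._free.pop()
        elif self._next <= self.LAST:
            label = self._next
            self._next += 1
        else:
            raise RuntimeError("label space exhausted")
        self._owner[label] = owner
        return label

    def release(self, owner, label):
        """Give a label back; only the protocol that holds it may release it."""
        if self._owner.get(label) != owner:
            raise ValueError("label not held by this owner")
        del self._owner[label]
        self._free.append(label)
```

Because every request goes through one allocator, two protocol daemons never need to know about each other in order to avoid label collisions.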

CoS processing Support

In the MPLS architecture, transfer related behavior is specified within the range that an FEC can represent, and an LSP is established according to the content of the FEC. The transfer behavior in a particular MPLS domain is therefore fulfilled as long as the packet follows the LSP corresponding to the FEC, even though the actual label transfer is performed by nothing more than a simple label swapping action. (Of course, each LSR must be capable of satisfying the CoS and/or QoS constraint.)

As a way to represent CoS constraints in MPLS, the current proposal is to reflect the constraints defined in Diff-Serv [1] at the point of forwarding, that is, a method that allows an MPLS domain to take part in a Diff-Serv domain [6].

However, an LSR that performs MPLS label swapping cannot refer directly to the Diff-Serv CoS specification within the scope of MPLS processing, because that specification is encoded in the Diff-Serv code point (DSCP) field of the IPv4/IPv6 header. A process is therefore required that reflects the DSCP in the IP header of a Diff-Serv packet in an MPLS label (or in a label with additional information). (E.g., on Ethernet, the Experimental (EXP) field in the top label of the label stack in the shim header is used.) [6] defines an architecture in which this process is performed at a Diff-Serv capable MPLS ingress LSR, and the binding of a label and an FEC is propagated using CR-LDP [11], to satisfy the CoS constraint in the MPLS cloud.

As a result, each transit LSR can forward in conformance with the Diff-Serv constraints without referring directly to the DSCP in the IP header.
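
The ingress mapping and the transit lookup can be sketched as below. The DSCP codepoints used as keys (46 for EF, 10 for AF11, 0 for the default) are the standard values, but the EXP values assigned to them and the function names are illustrative assumptions; the actual mapping is a matter of configuration and signaling.

```python
# Hypothetical DSCP-to-EXP map: keys are standard codepoints, values are illustrative.
DSCP_TO_EXP = {46: 5, 10: 1, 0: 0}  # EF, AF11, default

def dscp_from_tos(tos_byte):
    """The DSCP is the upper six bits of the IPv4 TOS / IPv6 Traffic Class octet."""
    return (tos_byte >> 2) & 0x3F

def ingress_exp(tos_byte, default_exp=0):
    """At the ingress LSR: choose the EXP value to write into the top shim entry."""
    return DSCP_TO_EXP.get(dscp_from_tos(tos_byte), default_exp)

def transit_exp(shim_word):
    """At a transit LSR: read the 3-bit EXP field of a 32-bit shim entry."""
    return (shim_word >> 9) & 0x7
```

The transit function never touches the IP header: the three EXP bits carried in the shim entry are all it needs to select the per hop behavior.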

To realize the capability defined in [6], AYAME integrates the queue management function of ALTQ [13] in a form suited to MPLS processing.

ALTQ extension/modification for MPLS processing

In the original ALTQ distribution, a packet is analyzed after it arrives at the Datalink layer, just before it is forwarded, because a mechanism located at this position can look up the information of every upper layer. The classified packet is enqueued into a particular transmit queue for each interface.

Meanwhile, in MPLS, the result of classifying a packet is unified as an FEC, which is mapped to a particular label. There is therefore no need for ALTQ to analyze the packet when forwarding a labeled packet, so we added a "Label Only" mode to the ALTQ classifier that classifies using MPLS labels alone. In this mode, a packet label is explicitly mapped to a CoS entity, and the packet is then processed according to that CoS.
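
A "Label Only" classification step can be modeled as a single table lookup followed by an enqueue. The sketch below is not ALTQ code and does not use the ALTQ API: the LABEL_TO_CLASS table, the Interface class, and the strict-priority dequeue that stands in for a queueing discipline are all hypothetical.

```python
from collections import deque

# Hypothetical binding of labels to CoS classes.
LABEL_TO_CLASS = {300: "expedited", 301: "best-effort"}

class Interface:
    """An output interface with one transmit queue per CoS class."""

    def __init__(self, classes):
        self.queues = {name: deque() for name in classes}

    def enqueue_label_only(self, top_label, packet, default="best-effort"):
        """Pick the queue from the label alone; no upper-layer header is parsed."""
        cls = LABEL_TO_CLASS.get(top_label, default)
        self.queues[cls].append(packet)
        return cls

    def dequeue(self, priority_order):
        """Strict-priority service, standing in for a real queueing discipline."""
        for name in priority_order:
            if self.queues[name]:
                return self.queues[name].popleft()
        return None
```

Compared with classification at the Datalink layer on full header contents, the per-packet cost here is one dictionary lookup on a value that the label swapping step has already extracted.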

Figure 9: CoS related processing with ALTQ

AYAME must absorb the differences between label stack encodings because it supports various types of Datalink layers. These ALTQ extensions and alterations are implemented in a way that requires the smallest modification of the existing ALTQ code (Figure 9).

Exception handling on MPLS label processing

Examining the MPLS process flow reveals several possible exceptions, which arise from label operations during the MPLS label swap or other processing. Some of these exceptions have to be resolved using non-MPLS layer information or operations; examples are packet fragmentation when a packet exceeds the MTU somewhere along an LSP, and the processing of special signaling-purpose labels such as the router alert label [9]. AYAME implements a mechanism to handle these exceptions.
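
The two example exceptions can be detected with simple checks before a packet is handed to the output path. The sketch below is an assumption about how such a check might look, not the AYAME mechanism: the function name and the returned strings are hypothetical. The router alert label value (1) and the 4-byte size of a label stack entry follow the label stack encoding [9].

```python
ROUTER_ALERT = 1       # reserved label value for the router alert label
SHIM_ENTRY_BYTES = 4   # each label stack entry occupies 32 bits

def check_exceptions(labels, payload_len, mtu):
    """Return the name of an exception for a non-MPLS handler, or None.

    labels[-1] is the top of the label stack; payload_len excludes the stack.
    """
    if labels and labels[-1] == ROUTER_ALERT:
        return "router-alert"         # must be examined by local software
    if payload_len + SHIM_ENTRY_BYTES * len(labels) > mtu:
        return "needs-fragmentation"  # Layer 3 must fragment or report the error
    return None
```

Note that a payload that fits the MTU exactly as an unlabeled packet can still raise the fragmentation exception once a label is pushed, because each label adds four bytes.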

Conclusion

This paper discussed the issues that we considered when integrating AYAME, a layer that supports MPLS functions, into the BSD network stack. Because BSD derived network stacks were designed with only the current Internet routing model in mind, which depends on Layer 3 information, the MPLS architecture does not fit the existing network stacks well.

To insert the MPLS mechanism into the network code while keeping the semantics of the existing network operations, the step-by-step method used in AYAME is effective: (1) partition the MPLS functions into sub-modules according to their categories and relationships, and (2) provide a unified interface to connect these modules. In this way, AYAME classifies all of its functions into three types of modules: Layer 2 oriented, Layer 3 oriented, and MPLS specific. It thereby integrates MPLS functions into the existing network code while preserving the original semantics and with minimal alteration.

Since we regard AYAME as a research platform, a further advantage is that its modular architecture can be extended to include new functions. This modularization is what allowed us to add the functions described in the latter part of this paper.

We have obtained an MPLS-based platform for research use that supports CoS operations and is flexible enough to be extended with new functions. We plan to continue the project with special emphasis on implementing more advanced technologies. We will study the following issues using AYAME as a base platform:

Availability of AYAME package

We plan to provide the AYAME package with full source under a BSD style AS-IS license. It has not been released yet, because we are refining it to make the source more portable. We are going to release the AYAME kernel patch for NetBSD and a variety of userland module sources in the near future. Please contact us if you are interested in them.

For more information about AYAME, please see the AYAME project web page [15].

References

  1. M. Carlson, E. Davies, Z. Wang, W. Weiss. An Architecture for Differentiated Services, RFC 2475, 1998.
  2. R. Callon, P. Doolan, N. Feldman, A. Fredette, G. Swallow, A. Viswanathan. A Framework for Multiprotocol Label Switching, Internet-Draft, <draft-ietf-mpls-frameworks-05.txt>, 1999, work in progress.
  3. E. Rosen, A. Viswanathan, R. Callon. Multiprotocol Label Switching Architecture, Internet-Draft, <draft-ietf-mpls-arch-07.txt>, 1999, work in progress.
  4. IETF Multi Protocol Label Switching Working Group.
  5. D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus. Requirements for Traffic Engineering Over MPLS, RFC 2702, 1999.
  6. F. Faucheur, L. Wu, B. Davie, S. Davari, P. Vaananen, R. Krishnan, P. Cheval. MPLS Support of Differentiated Services, Internet-Draft, <draft-ietf-mpls-diff-ext-02.txt>, 1999, work in progress.
  7. The NetBSD Foundation. NetBSD homepage, http://www.netbsd.org/
  8. B. Davie, J. Lawrence, K. McCloghrie, Y. Rekhter, E. Rosen, G. Swallow, P. Doolan. MPLS using ATM VC Switching, Internet-Draft, <draft-ietf-mpls-atm-02.txt>, 1999, work in progress.
  9. E. Rosen, Y. Rekhter, D. Tappen, D. Farinacci, G. Fedorkow, T. Li, A. Conta. MPLS Label Stack Encoding, Internet-Draft, <draft-ietf-mpls-label-encaps-07.txt>, 1999, work in progress.
  10. L. Andersson, P. Doolan, N. Feldman, A. Fredette, B. Thomas. LDP Specification, Internet-Draft, <draft-ietf-mpls-ldp-06.txt>, 1999, work in progress.
  11. B. Jamoussi, editor. Constraint-Based LSP Setup using LDP, Internet-Draft, <draft-ietf-mpls-cr-ldp-03.txt>, 1999, work in progress.
  12. Y. Rekhter, E. Rosen. Carrying Label Information in BGP-4, Internet-Draft, <draft-ietf-mpls-bgp4-mpls-03.txt>, 1999, work in progress.
  13. K. Cho. A Framework for Alternate Queueing: Towards Traffic Management by PC-UNIX Based Routers, In Proceedings of USENIX 1998 Annual Technical Conference, New Orleans LA, 1998.
  14. K. Cho. ALTQ homepage, http://www.sonycsl.co.jp/~kjc/kjc/software.html#ALTQ
  15. AYAME Project web page, http://shinoda-www.jaist.ac.jp/Projects/AYAME/