Xipeng XIAO <firstname.lastname@example.org>
Lionel M. NI <email@example.com>
Michigan State University
In a link-state protocol, every router has complete topology information about the network. In this paper we argue that with such information, divide-and-conquer schemes can be used to make the protocol processing more efficient. As an example, we present a simple approach for a router running the open shortest path first (OSPF) protocol to automatically detect if its interfaces lead to multiple disjoint regions within an OSPF area. If such disjoint regions exist, this approach can make the routing table computation more efficient. Compared with OSPF, this approach can reduce the computation work by more than n times in ideal cases, where n is the number of disjoint within area routing regions (WARRs). Furthermore, route computations in different WARRs are independent and can be done in parallel to give an additional speedup of min(n-1, m), where m is the number of central processing units in the router. The correctness of the algorithm is illustrated. The implementation of this approach is very simple and requires no change to OSPF. This approach is especially suitable for routers in corporate networks and can be immediately applied to current routers.
The emergence of multimedia applications and e-commerce on the Web has made it necessary for the Internet infrastructure to provide Quality of Service (QoS). In order to provide QoS, routers must be able to route packets based on resource (bandwidth, etc.) availability information as well as reachability information. Such QoS Routing schemes [3, 5, 12, 13] in turn make it necessary for routers to know the complete network topology and resource information. Only link state protocols such as the Open Shortest Path First (OSPF)  and the Intermediate System-to-Intermediate System (IS-IS)  distribute topology information throughout the network. They can also be extended to distribute resource information. Therefore, link state protocols are becoming more and more widely used.
However, link state protocols are widely believed not to scale well to large networks because of their high computational complexity. For example, in OSPF, an intra-area link state change will cause the whole routing table to be re-computed. Today an Internet Service Provider (ISP) core router typically has more than 45,000 routing table entries. It was reported that there are more than 40 "route flaps" per second on the Internet backbone . Consequently, computation load on routers is extremely high. It was observed that Cisco's 7500 series routers could be 98% central processing unit (CPU)-busy. QoS Routing makes computation demand even higher because, even without topology changes, resource availability changes may reach the point that previously computed routes are not valid and must be recomputed [3, 5, 12]. Therefore, it is critical to reduce the computational complexity of link state protocols.
Today, high performance routers such as Ascend's GRF IP Switches , Cisco's GSR Routers (12000 series) , and Juniper's M40 routers  typically consist of multiple forwarding cards interconnected by one or more high-speed switch fabrics. A control card is used to do all the control processing like routing protocol processing, Signaling Network Management Protocol (SNMP), etc. It then downloads the forwarding table and other control information to the forwarding cards. With so many control functions and forwarding cards (and thus many protocol peers), the control card does not have much processing power left to meet future challenge. Brute-force improvements, such as using 400 MHz processors on the control card, will soon reach the physical limit and cannot be counted on. Any approach that can reduce the computation load of the routers can improve the performance and stability of the Internet.
In this paper we present a novel point: in link state protocols, because the complete topology of the network is known, divide-and-conquer schemes can be used to make protocol and other control processing more efficient. We present a simple approach for a router running the Open Shortest Path First (OSPF) protocol to automatically detect if its interfaces lead to multiple disjoint regions within an OSPF Area. If such disjoint regions exist, this approach can make the routing table computation more efficient. This approach requires no change in OSPF, and can be immediately applied to current routers.
The organization of the rest of this paper is as follows. We give a brief introduction to OSPF in Section 2. In Section 3, we present an efficient routing table computation approach for OSPF and illustrate its correctness. The parallelism in this approach is explored in Section 4. The performance enhancement is shown in Section 5. Finally, we summarize the paper in Section 6.
For routing purposes, the Internet is divided into multiple Autonomous Systems (ASs). An AS is a collection of networks and routers under a single administrative authority. Between ASs, the Border Gateway Protocol (BGP) is used to compute the routing table. Within an AS, OSPF or IS-IS is used. We focus on OSPF in this paper. A similar approach can be applied to IS-IS.
In OSPF, an AS is divided into multiple Areas. An AS with four Areas is shown in Figure 1 (Figure 1 is taken from the OSPF Request for Comment [RFC]), where the cost of each link is 1 unless specified differently. OSPF distributes the attribute (i.e., state) of every link to all routers inside that Area. Such topology information is stored in the Link State Database (LSDB) of each router. To compute routes to other networks and routers, the computing router treats every network/router as a vertex of a graph, and every link as an edge. It then computes the shortest paths (called routes) to all the vertices in the graph using Dijkstra's Algorithm. After the initial computation, if a link in an Area changes, every router inside that Area must re-compute routes to all the destinations, inside and outside the Area. For example in Figure 1, if the link between router RT1 and network n1 goes down, RT1 must re-compute all routes including those to networks n3, n6, and n9. If a link outside the Area changes, a router only updates the routes affected by this change. There is no need to re-compute the whole routing table .
The idea of our approach is based on the following observation. In OSPF, when there is an intra-area link change, it is necessary for every router inside that Area to re-compute all routes in its routing table. However, if the topology of the network is somewhat special, then some of the work may be saved. For example in Figure 1, if link RT1-n1 goes down, RT1 does not have to re-compute routes to networks n3, n6, and n9. We define the Routing Region (RR) of an interface on router R as all the networks and routers reachable from that interface without going through other interfaces of R. For RT1, the RR of interface if2 consists solely of network n1; the RR of if1 consists of all the routers and networks except n1 and RT1 itself. When link RT1-n1 goes down, the reasons why RT1 does not have to re-compute routes to networks n3, n6, and n9 are: 1) link RT1-n1 is in the RR of if2; 2) networks n3, n6, and n9 are in the RR of if1; and 3) the RRs of if1 and if2 are disjoint. The point in this example is, if a router can detect that its interfaces have multiple disjoint RRs (Fig. 2a), and can locate the RR where a change occurs, then it can re-compute only routes in the changed RR. We show that this is possible with a link state protocol like OSPF.
An intuitive way to do this would be to add some mechanisms to a router so that it can: 1) automatically discover if some or all of its interfaces have disjoint RRs; and 2) determine in which RR a change is. For such an enhanced router, when there is a link change in one RR, if there are some disjoint RRs, computation work can be reduced because routes through other disjoint RRs need not be re-computed (Fig. 2a). However, if all the RRs are overlapped, the whole routing table must be re-computed (Fig. 2b). But even in the second case, this approach is still as efficient as OSPF, because both approaches re-compute the whole routing table. The more disjoint RRs, the more efficient this approach is. It seems that as long as the enhancing mechanisms are simple enough, this intuitive approach would work well.
But an important question is how often different interfaces of a router will have disjoint RRs. If we look at the whole AS, a destination can usually be reached via different interfaces of a router. In other words, the RRs of these interfaces will overlap somewhere in the AS. Examining the RRs of the routers R3, R4, R5, and R6 in Figure 1 clearly shows this. Therefore, the intuitive approach does not work because it will always be reduced to the OSPF approach. To limit the scope of impact of link changes, we must find a new dividing scheme other than dividing by RRs.
The key observation in solving the problem is that, although the RRs of two interfaces are likely to overlap somewhere in the AS, they are unlikely to overlap within the Area. We call the part of an RR that is within the Area as Within-Area Routing Region (WARR). It is desirable that we divide the computation work according to disjoint WARRs instead of disjoint RRs. We present such an algorithm in this section. As long as the WARRs of two interfaces are disjoint, no matter whether the RRs are overlapped outside the Area or not, the algorithm can still limit the scope of impact of a link change to mostly within the changed WARR.
Formally, the WARR of an interface of router R consists of all the routers and networks within the Area that are reachable from that interface without going through other interfaces of router R. Networks and routers outside the Area are completely ignored in defining WARR. For router RT1 in Figure 1, the WARR of interface 1 (if1) consists of networks n2, n3, and n4 and routers RT2, RT3, and RT4; the WARR of if2 consists solely of network n1. The WARRs of the three interfaces of RT6 are shown in Figure 1. Note that an Area is divided into WARRs differently from different routers' perspective. Under this definition, WARRs of all interfaces on all routers in Figure 1 are disjoint. This is a strong indication that the WARR approach is a good dividing scheme. Maps of real world networks also show that disjoint WARRs are common for routers in corporate networks, and for the edge routers in ISPs .
An important property of WARR is that the WARRs of two interfaces are either disjoint or completely overlapped (i.e., identical). It will never happen that two interfaces have different WARRs but these two WARRs are partially overlapped. This is because WARR is defined on reachability. If two WARRs are overlapped, one WARR is reachable from the other. By definition, the two WARRs would be considered as a single one (Similarly, RR 1 and RR 2 in Figure 2b would actually be considered as a single RR). We call this the equivalence property of WARRs. If the WARRs of two interfaces are disjoint, routes through one WARR have nothing to do with the other WARR, therefore each interface can independently compute its part of the routing table through its own WARR. If the two WARRs are identical, just one interface needs to compute the routes. Since all interfaces share a single routing table, the other interface does not have to compute the routes. It simply uses the routes computed by the other interface.
Assuming that a router has mechanisms to automatically detect whether its interfaces have disjoint WARRs, and to locate the WARR where a change occurs, our approach can be summarized conceptually below. The mechanisms themselves are described in Section 3.3.
Compared with OSPF, Step 2 only uses topology information of the changed WARR to compute the routing table while OSPF uses topology information of the whole Area. Since RRs can overlap outside the Area for disjoint WARRs, special care must be taken to ensure that the computed routes are indeed shortest. This is done in Step 3. Before detailing Step 3, we illustrate the algorithm with two examples.
Case 1: disjoint WARRs without common outside-area destinations. The two interfaces of router RT1 have disjoint WARRs, and these two WARRs do not lead to any common destinations outside the Area. In Step 2, the algorithm computes the route to network n1 through the WARR of if2, and routes to other destinations through the WARR of if1. If link RT1-n1 changes later, the algorithm only recomputes the route to n1 in Step 2. Routes to other destinations are not affected at all. Similarly, if there are any changes in the WARR of if1, the route to n1 does not need to be reserve-computed. In this case, using only topology information of the changed WARR to recompute the routing table does not affect the correctness of the algorithm. Compared with the OSPF approach, the computation work is reduced.
Case 2: disjoint WARRs with common outside-area destinations. In Figure 1, if1 and if2 of router RT6 have disjoint WARR1 and WARR2, respectively. But both WARRs lead to network n3 in Area 1. In Step 2, the algorithm will find two routes to network n3, through WARR1 and WARR2, with cost 3 and 2, respectively. Only the route through WARR2 with cost 2 should be installed into the routing table. Later, if the cost of link RT6-RT3 increases, say from 1 to 10, the algorithm will again be triggered. After the recomputation in Step 2, the agent of WARR2 will find out that the cost of the route to n3 (only through WARR2) becomes 11. Obviously, this is not the shortest path from RT6 to n3. Although WARR2 previously provided the shortest path to n3, after the change, other WARRs may have shorter paths to n3. Therefore, the algorithm must try to recompute shorter paths through other WARRs to n3. If there are indeed any shorter routes, as is true in this case (the route through WARR1 with cost 3 is shorter), the shortest one of them should be installed as the route to n3. This is what Step 3 must do.
Step 3: After Step 2, if the new route is
The rationale of this algorithm will be illustrated in the next subsection.
We use RT6 as the computing router to illustrate the idea of Step 3. The current route from RT6 to n3 consists of link RT6-RT3 and RT3-n3 with cost 2. If the cost of link RT6-RT3 increasing from 1 to 10, it will be handled by Step 188.8.131.52. The example in case 2 clearly illustrates why a recomputation is needed. Step 184.108.40.206 handles cases such as the cost of link RT6-RT5 increasing from 1 to 10. The new route of cost 12 through WARR1 should simply be discarded since it is worse than the current route. Similarly, in Step 220.127.116.11, the fact that the new route and the current route are through the same WARR means that the current route has disappeared. Therefore the new route should replace the current route. If the new route is in a different WARR, then the current route still exists. The new route should be appended.
In Step 18.104.22.168, changes in one WARR can trigger computations in other WARRs. But much work is still eliminated compared with OSPF. This is because the algorithm only recomputes routes for those destinations outside the Area whose cost has increased. The number of such routes is far less than the total number of routes in the routing table. For example, when the cost of link RT6-RT3 increases from 1 to 10, only routes to networks in Area 1 will be affected. The algorithm does not have to recompute routes to any destinations in Area 2 and Area 3. OSPF does.
In essence, the introduction of WARRs adds another hierarchy to OSPF, just as the introduction of Areas does. The difference is that Areas have to be manually configured by the network administrators while WARRs are automatically discovered by the algorithm.
The correctness of the algorithm is illustrated below. The numbers here correspond to the step numbers in the algorithm except #1.
1. For an outside-area link change, the algorithm works exactly the same as OSPF and is thus correct.
2. For an intra-area link change, the agent for the changed WARR is guaranteed to receive this link state advertisement (LSA). Compared with OSPF, Step 2 of the algorithm only uses topology information (LSAs) of the changed WARR to compute the routing table, while OSPF uses topology information of all WARRs (i.e., the whole Area). But all problems introduced by this difference are fixed in Step 3.
3.1. For destinations within the Area, by the equivalence property of WARRs, these destinations are unreachable from other disjoint WARRs. Therefore, ignoring LSAs of other WARRs will not affect the correctness of the algorithm.
3.2. For destinations outside the Area:
In OSPF, routing table computations can only be triggered by either outside-area link changes or intra-area link changes. For intra-area link changes, a changed route can only be for a destination inside (handled by Step 3.1) or outside the Area (handled by Step 3.2). For destinations outside the Area, the route can only become shorter (handled by Step 3.2.1), or become longer (handled by Step 3.2.2), or remain the same (handled by Step 3.2.3). For a new route of higher or equal cost, it can only be in the same WARR of the current route or in a different WARR (handled by Steps 3.2.2 and 3.2.3, respectively). We have shown that the algorithm can handle all possible cases correctly. Its correctness is therefore guaranteed.
In this sub-section, we discuss how to represent WARR in the data structure, how to determine in which WARR a change is, how to detect if the WARRs of two interfaces are overlapped or not, and how to handle WARR merging and splitting. These turn out to be surprisingly easy.
In OSPF, topology information of an Area is stored in the router's Link State Database (LSDB), which is composed of Link State Advertisements (LSAs). By adding the Internet Protocol (IP) address of the receiving interface to each LSA, we can distinguish LSAs of one interface from LSAs of other interfaces. All LSAs of an interface belong to the interface's WARR. If multiple interfaces share the same WARR (i.e., these interfaces' WARRs are overlapped), LSAs of all these interfaces belong to that WARR. Since changes are reflected by LSAs, we can easily determine in which WARR a change is by looking at the tagged interface address of the corresponding LSA. When there is a change in a particular WARR, only LSAs of that WARR are used to recompute the routing table. Therefore, routes in other WARRs will not be affected.
Note that, although we mention detecting the WARR of an interface and locating the WARR of a link state change many times in this paper, detecting the WARR of an interface and locating the WARR of a change are only logical concept. The mechanisms described in the previous paragraph are the actual implementation mechanisms for those purposes. More specifically, the computing agent of an interface actually does not need to know explicitly which networks/routers belong to its WARR. What the agent needs to do is to only use LSAs of its WARR in recomputing routes. This makes the implementation of the algorithm very simple.
Two overlapped WARRs will be detected when the WARR agent for one interface tries to enter a route for an intra-area destination into the routing table, but finds out that the WARR agent of the other interface has already entered a route for that destination. The fact that both interfaces can reach an intra-area destination implies that their WARRs are overlapped. By the equivalence property of WARRs, these two WARRs are identical. Therefore we can let one WARR agent (say, the one for the interface with smaller IP address) continue to compute the routes, using LSAs of both interfaces. The WARR agent of the other interface is terminated so that redundant work will not be done. After discovering that two interfaces share the same WARR, the continuing agent must immediately recompute the routes using LSAs of both interfaces. The previous works of both agents are wasted. Therefore, strictly speaking, the algorithm will do a little more work than OSPF if all interfaces share the same WARR. But this can only happen when computing the routing table for the very first time because, after that, overlapped WARRs would have been discovered. Since WARRs of different interfaces are likely to be disjoint, this should be acceptable.
There are still two subtle cases to consider: 1) the WARRs of two interfaces may merge when a link is added to an Area; and 2) a merged WARR may split into two disjoint WARRs when a link is removed. For example, if we add a link in Area 0 between RT3 and RT4, the WARRs of if1 and if2 of RT6 will merge. After the merging, two interfaces share the same WARR. This will be detected by the mechanism described above. One WARR agent will continue and the other is terminated. Note that in the case of merging, the algorithm does less work than OSPF. OSPF has to recompute the whole routing table while the algorithm only recomputes routes through the merged WARR.
The handling of WARR splitting is a bit more complex. Imagine that after the merging, link RT3-RT4 is removed again. The merged WARR will split into two disjoint WARRs. When this happens, the WARR agent will find out that the other interface is unreachable from outside the router. The algorithm intentionally ignores it for a period of no less than 30 seconds. During this period, it still treats the two WARRs as a single big WARR, as OSPF does. In fact, OSPF always treats all WARRs as a single big one. The WARR agent will use LSAs of both WARR1 and WARR2 in recomputing the routes. There is nothing wrong with this except that some unnecessary work may be done. If the two WARRs do not merge back at the end of this period, another WARR agent will be created for the new WARR. From now on, each WARR agent computes its own routes using only LSAs of its own interface. Unnecessary work is then avoided. The reasons why the algorithm ignores the splitting for a period of no less than 30 seconds are: 1) LSAs of the WARRs are guaranteed to be received by the corresponding interfaces after 30 seconds ; 2) the two WARRs may merge again during this period if the link failure is temporary; and 3) the algorithm performs no worse than OSPF when the splitting is ignored. A simpler implementation is to ignore WARR splitting totally. The algorithm performs no worse than OSPF anyway.
The implementation of our approach is summarized below:
Dividing an Area into disjoint WARRs has more advantages than just reducing the computation work. Because routes through different WARRs have nothing to do with each other, they can be computed in parallel, as described in Section 4.
Plenty of parallelism exists in this routing table computation algorithm. First, since routes through different WARRs are mutually independent, if multiple link changes happen at about the same time in different WARRs, they can be processed in parallel. For example, for router RT1, if links RT1-n1 and RT1-n3 change at about the same time, they can be processed by two agents in RT1 simultaneously. Second, in Step 22.214.171.124, after replacing an existing route with a new one with higher cost, agents for other disjoint WARRs will be triggered to search for shorter paths. These computations can be done in parallel. For example, for router RT6, when the cost of link RT6-RT3 increases from 1 to 10, the new route to network n3 through WARR2 will have a higher cost than the current route. The agents for WARR1 and WARR3 of RT6 can be triggered to search for shorter paths through WARR1 and WARR3 simultaneously (Step 126.96.36.199). If the router has multiple CPUs on the control card, multiple WARR agents can work in parallel.
In a multi-threaded Operating System (OS), a thread can be used for each WARR agent. Overhead to create a thread is low. Different threads share the same address space where the routing table and the LSDB are stored. In this algorithm, threads of different WARRs do not have to communicate with each other. Therefore, a parallel implementation of the algorithm will be simple, have high speedup and very low overhead.
Performance improvement of this approach comes from two sources: the efficiency and the parallelism.
Efficiency comes from avoiding unnecessary work. By dividing an Area into multiple disjoint WARRs, we limit the scope of impact of an intra-area link change to only within the WARR and to outside-Area destinations that are reachable from that WARR. If an Area is divided into n disjoint WARRs, the size (defined as the total number of networks and routers) of each WARR will be approximately 1/n the size of the Area (an ideal case in our algorithm). For intra-area link changes, the Dijkstra's Algorithm used in OSPF has a time complexity of O(s*logs), where s is the size of the Area. The speedup of our approach, gained from reducing the number of networks and routers involved (from an Area-size down to a WARR-size), can be more than n. Although disjoint WARRs can lead to common destinations outside the Area, which makes it necessary to compute routes through other WARRs for those destinations whose cost increased (Step 188.8.131.52), the number of changed routes is usually far less than the total number of routes. Even less is the number of routes with increased cost. Therefore, the performance improvement is still significant in this case.
Plenty of parallelism exists in our approach: 1) multiple intra-area link changes in different WARRs can be processed simultaneously; and 2) when a route with higher cost replaces the current route (Step 184.108.40.206), different agents can simultaneously search for shorter paths through other WARRs for that destination. The number of agents running simultaneously can be as high as n in the first case, and n-1 in the second case. If the router has m CPUs, the speedup of parallel processing can be as high as min(n-1, m).
Combining improvements from both efficiency and parallelism, the best case performance improvement of our approach is n*min(n-1, m).
In real world, the plethora of redundant routes for ISP core routers makes disjoint WARRs less likely. However, disjoint WARRs are common in corporate networks, as shown by their network maps . Therefore, the approach presented in this paper is more useful for corporate networks than for ISPs. The approach can also be useful for ISP edge routers. Given that the implementation of this approach is simple, and that the improvement can be significant, this approach is useful, even if it is only used for corporate networks.
To the best of the authors' knowledge, there is no previous work in using the complete topology information of link state protocols to reduce their computation overhead. Some remotely related works include the Zebra Project  in Japan and the Multi-threaded Routing Toolkit (MRT) Project  of Merit Inc. In the Zebra Project, routing daemon Gated is implemented with multiple processes, one per protocol. The MRT project is similar except that a thread is used for each protocol. These approaches do not intend to provide much performance improvement over the monolithic Gated. They are mainly for software modularity and scalability.
This paper presents a novel point (i.e., in a link state protocol), because the topology of the network is known, divide-and-conquer schemes can be applied to make the processing more efficient. As an example, we present a simple approach for a router running OSPF to automatically detect if its interfaces have multiple disjoint WARRs. If such disjoint WARRs exist, this approach can limit the scope of impact of an intra-area link state change to mostly within the changed WARR. More specifically, only routes through that WARR, not all the routes in the routing table, need to be re-computed. Compared with OSPF, which recomputes the whole routing table, this approach can reduce the computation work by more than n times in ideal cases, where n is the number of disjoint WARRs. Furthermore, route computations in different WARRs are independent and can be done in parallel to give an additional speedup of min(n-1, m), where m is the number of CPUs in the router. The correctness of the algorithm is illustrated.
The implementation of our approach is very simple, as summarized at the end of Section 3.3. It requires no change to the OSPF Protocol itself. Routers using this approach can immediately improve their performance. The improvement does not have to wait until other routers also use this approach. This approach is especially suitable for routers in corporate networks.
Divide-and-conquer approaches can be applied to other link state protocols such as IS-IS and QOSPF to reduce the computation overhead. With the coming of QoS Routing, link state protocols will become widely used to provide the much needed topology information. Approaches that can reduce protocol-processing overhead, such as the one presented in this paper, will be highly demanded.
The authors would like to thank John Moy and Ted Stockwell of Ascend Communications for their help and comments.