A Proposal to Apply ECN to Wireless and Mobile Networks

Fei PENG <fpeng@bupt.edu.cn>
Shiduan CHENG <chsd@bupt.edu.cn>
Beijing University of Posts and Telecommunications
China

Jian MA <jian.janne.ma@nokia.com>
Nokia Research Center
Finland

Abstract

This paper describes a proposed modification of Explicit Congestion Notification (ECN) for wireless and mobile networks. Transmission Control Protocol (TCP) is currently the dominant transport protocol used in the Internet. We begin by describing the problem of TCP's use of packet drops as an indication of congestion when packet losses are not due to congestion. Next, we describe the traditional ECN mechanism used in the wired environment and its limits when applied to wireless networks. Finally, we describe what modifications would be needed in the Internet Protocol (IP) and TCP of the ECN mechanism to give it wireless capability.

Contents

1. Introduction

TCP congestion control has been developed on the assumption that network congestion is the only cause for packet loss. Thus, it drops its transmit window upon detecting a packet loss. In the presence of high error rates and the intermittent connective characteristics of wireless links, this results in an unnecessary reduction in link bandwidth utilization for packet losses that are not mainly due to congestion. With the envisaged growth in the Internet of wireless and mobile networks, it becomes crucial to recommend a proposal to improve TCP congestion control protocol over systems with imperfect link properties.

Some of the research is concerned with improving transport underlying protocol stacks. For example, there have been several proposals for reliable link-layer protocols as Forward Error Correction (FEC) and retransmission of lost packets in response to Automatic Repeat Request (ARQ) messages. A network layer protocol, called Snoop protocol, is proposed to cache packets at the base station and perform local retransmissions across the wireless link. However, the main concern about these underlying protocols is that they might not be able to completely shield the sender from wireless losses. In practice, with the enhancement of TCP underlying protocols, the link error rate will still remain 10-6 bits/sec. So it is essential to provide a solution in TCP protocol stack.

Several schemes modified at the transport layer have been proposed to alleviate the effects of noncongestion-related losses on transport performance. The indirect TCP is one of the first protocols to achieve a separation of flow and congestion control of the wireless link from that of the fixed network. It does this by splitting a TCP connection between a fixed and mobile host into two separate connections at the base station. However, there are some drawbacks to this approach, such as loss of semantics, application relinking, and software overhead, etc.

As lost packets can be simply divided into congestion-related losses and noncongestion-related losses, an ELN protocol can be used to differentiate the packet loss by adding an explicit loss notification option to TCP acknowledgments when a packet is dropped on the wireless link. In practice, this algorithm is a burden to the implementation nodes because a judgment is required for each dropped packet of transmission errors. Additionally, it may be hard to determine the connection that a corrupted packet belongs to, for example, the header could itself be corrupted.

Here, we propose a new flow control mechanism that can remedy these problems. The basic idea is to provide a simple way to endow TCP with the ability to differentiate different source of packet losses.

2. Motivation

ECN proposed in [4] provides a lightweight mechanism for routers to send direct congestion notification to the source. It makes use of two experimental bits in the IP header and two experimental bits in the TCP header. When the queue length exceeds a threshold, the incoming packet is marked. When the marked packet is received, the receiver marks the acknowledgment (called an ECN-Echo) to send congestion notification back to the source. Upon receiving the ECN-Echo, the source halves its congestion window to alleviate the congestion. The window reduction is done only once in a window of packets. In the next RTT period, the window will not be increased in response to acknowledgment.

The ECN proposal obtains a part separation of congestion control and packet losses to prevent unnecessary packet drops due to buffer overflow. Because of complex condition of networks, packet drops due to buffer overflow cannot be completely prevented and ECN itself will be lost by congestion or corruption. In addition, the participating routers that are capable of RED or some other active queue management mechanism have a probability of dropping packets where this probability is dependent on average queue size.

Therefore, ECN will have to couple with a congestion control mechanism in TCP. In networks over imperfect links where packet losses are not mainly due to congestion, the unnecessary reduction of throughput will occur for the congestion window that has been halved before ECN could return. In addition, the packet sent by ECN has to transmit to the receiver first, then return to the sender, which implies that ECN has the same control loop as TCP.

All gateways must contain code for sending ICMP Source Quench (ISQ) messages when they are forced to drop IP datagram due to congestion. Source Quench is considered to be too valuable to omit from production gateways.

There are two problems for a gateway sending Source Quench: (1) the consumption of bandwidth on the reverse path, and (2) the use of gateway Central Processing Unit (CPU) time. To ameliorate these problems, a gateway should be prepared to limit the frequency with which it sends Source Quench messages. This may be on the basis of a counter (e.g., only send a Source Quench for every N dropped datagrams overall or per given source host), or on the basis of a timer (e.g., send a Source Quench to a given source host or overall at most once per T milliseconds). How to give a suitable value of N or T in practice is also a problem.

In another proposal [7], an ISQ message is generated only by the intermediate RED router when it captures a packet with the ECT bit set by active queue management. If the ECT bit is not set, the packet will be dropped whether RED chooses it or the average queue size goes above the maximum threshold. The purpose of this approach is to reduce the reaction time to congestion in the network and provide multilevel congestion feedback. However, the above two problems also exist for ISQ because a message is generated whenever ECN is set in the networks.

This paper provides an effective way to improve TCP performance in wireless and mobile networks and incorporate ECN into such an environment. When a packet is dropped due to buffer overflow or the RED routers, an ISQ (ICMP Source Quench) is generated and returned to the source of the dropped packet to initiate congestion control. The approach reduces the reaction time to congestion and retransmission of packet losses due to congestion in the network. With the ECN control, network congestion is alleviated. When the CE bit can be marked, no ISQ is required to be generated, and bandwidth will not be wasted. To make the system more secure when ECN or ISQ messages are lost, a window threshold is calculated to allow packet losses to initiate window reduction when ISQ and ECN messages are not returned. It is important to note that this method is simple to implement because it requires minimal modification of current systems.

The entire mechanism is described in section 3. Section 4 illustrates implementation issues using OPNET 5.0/5.1 modeler tool. Analysis and simulation results are presented in section 5. We give our conclusions in section 6.

3. ICMP Source Quench messages and window adaptive algorithm

3.1 The role of routers

While there are some applications/environments where it might be highly advantageous for the sender to receive some indication of congestion without having to wait a roundtrip time, this is not common. Source Quench packets add traffic in the reverse direction on what might be a congested path. Even with multilevel functioning of ISQs, the congestion window and the slow start threshold value are only halved at the TCP source. Without the corresponding reaction of the source behavior, the multilevel ISQ loss is significant. And as we see, if the threshold is set appropriately lower, ECN can also be considered effective to alleviate congestion in time. This section proposes that ISQ messages are not required to be generated if only ECN can be set by the RED or other queue mandates. ISQ messages are returned to the corresponding sources only when packets are dropped due to buffer overflow or queue management without ECN. The algorithm in the router can be described as follows:

    If the incoming message causes average queue size go above maximum
    threshold or causes buffer overflow, the packet is dropped and an ISQ 
    then sent back to the source of the incoming packet.

    If the incoming message causes the average queue to go between the
    minimum and maximum thresholds then:

        If the RED probability picks this packet then:

            If the ECT bit is set and the CE bit is not already marked
            then:

                Mark the packet (CE bit)

            Else if RED chooses this packet and ECT bit is not set then:

                Send an ISQ back and drop the packet.

Therefore, ISQ messages are not generated by the intermediate congested RED router if only that router decides to mark the CE bit. ISQs are usually not generated for a packet that has already been marked by another router. With the ECN mechanism in networks, the frequency of the generation of ISQ messages is reduced, which results in saved bandwidth and avoids implementation complexity although CPU time is no longer a constrained resource. However, when the router queue level mandates the dropped packet, an ISQ is sent back to the source regardless of whether the packet was marked previously or not. So, it represents congestion loss to initiate congestion control when ECN is disabled or when ECN is lost. This function of ISQ has been supported in the routers [RFC 1009] and because each router is required to provide a disable parameter, only configuration operation is taken to enable its function.

3.2 Support at the TCP layer

Currently, the requirements for the end host's reaction to ISQ are [RFC 1122]. The source reacts at the transport protocol level by lowering its data throughput into the network. In TCP, upon identifying the flow causing congestion, the sender reacts by halving both the congestion window and the slow start threshold value for that flow. It does not react to an ISQ message more than once per window. When multiple ECN and ISQ are returned from the networks, the source only reacts to the first one in a window. Upon receipt of the first ISQ or ECN at time t, it notes the packets that are outstanding at that time (sent but not yet acked) and waits until a time u when they have all been acknowledged before responding to a new ISQ or ECN message.

To prevent some deterioration in networks caused by the failure to initiate congestion control because of losses of ISQ or ECN, the measurement of maximum window (called Wmax) is experienced on a given connection. We expect this can change over time, and TCP should track these changes and modify its timeout accordingly. First, TCP must measure the Wmax whenever it begins to shut down the window by ISQ or ECN. We will use Mw to denote the measured Wmax. Then TCP updates a smoothed Wmax estimator using a low-pass filter

Wmax <- a Wmax + (1-a) Mw

(1)

where a is a smoothing factor with a recommended value of 0.9. This smoothed Wmax is updated every time a new measurement is made. Ninety percent of each new estimate is from the previous estimate and ten percent is from the new measurement by the first receipt of ISQ or ECN in a widow cycle.

Given this smoothed estimator, which changes as Wmax changes, we recommended the threshold Tl to determine when packet losses should call the congestion control. It is set to

Tl = B Wmax,

(2)

where B is a delay variance factor. The ISQ or ECN gives the initialized value of Wmax after the measure of the first reduction of the window by either of them.

If the initialization of the window size is caused by packet loss controlled by the threshold, the Wmax and Tl are calculated as

Wmax = a Wmax + (1-a) Tl

(3)

Tl = B Wmax

(4)

The SSTHRESH set for the window is generally half of Tl. Because the sender does not reduce the window more than once per window, the following ISQ or ECN message does not change the transmission rate until the outstanding data before the sender initiates congestion control by a packet loss when the reach of the threshold has been acknowledged.

4. Implementation issues

In this section, we describe the implementation of TCP response to this mechanism in our simulator. The earliest Tahoe TCP is used without incorporating Fast Retransmit and Fast Recovery congestion control. For the simulation described in this paper, the gateway set ECN bit in the packet as an indication of congestion when buffer size exceeds a threshold. For the implementation of ISQ, packets are dropped when buffer occupancy exceeds the prefixed threshold and ISQ messages are generated and returned to the corresponding source.

First, we briefly describe the Slow-Start, Congestion Avoidance, and Retransmit Timeout algorithm in TCP. There are two phases to the window adjustment algorithm. The connection begins in the Slow-Start phase, and the current congestion window CWND is doubled each roundtrip time until the congestion window reaches the Slow-Start threshold, SSTHRESH. Then the congestion avoidance phase is entered, and the congestion window is increased by roughly one packet each roundtrip time. In the Tahoe version of TCP, the source reacts to a packet loss by setting SSTHRESH to half the congestion window, decreasing the congestion window to one packet, and entering the Slow-Start phase. In contrast to the Fast Retransmit algorithm in Reno, the source uses retransmit timers to detect lost packets. All transmitted data are acknowledged by the receiver, and a lost packet is signaled by the expiration of a timer at the transmitter with the arrival of duplicate acknowledgments. Multiple lost packets in a congestion window are recovered by multiple backoff of Retransmission Timeout.

Instead of modifying the TCP behavior of detecting packet losses based on timeout, ISQ messages accelerate the recovery phase of lost packets by reducing the backoff of the timeout value. That is, if an ECN or ISQ message is received in a congestion window, the Retransmission Timeout set for the following packet losses in the same window will not be doubled.

When there is no ISQ and ECN coming from the networks and Tl is reached by the adaptive window algorithm according to (1), (2), (3), (4), the TCP source shields the congestion window and recovers the following lost packets without multiple backoff of Retransmission Timeout. In our implementation, we briefly considered dropped ECN due to congestion, so more consecutive window adaptive algorithm is required.

First, we used 0.7 as the coefficient of B until the calculated Tl reaches the minimum size of one packet. Then, B is increased by 0.1 for each calculation of Tl. When Tl reaches maximum window size, B returns to the value of 0.7.

Second, when the current window size is shut down by the calculated maximum window size (Tl), we set SSTHRESH to the same value of CWND, and only the Congestion Avoidance phase is included in the roundtrip time.

5. Analysis of simulation results

5.1 Simulation configuration


Figure 1. Simulation network

This section discusses the results of simulations of TCP in networks with imperfect links. The simulation network in figure 1 consists of five FTP connections with an unlimited amount of data to transfer. The five clients connect to the network with transmission rate and propagation delay of connection 1 as 2 Mbits/sec and 1 msec; connection 2 as 2 Mbits/sec and 40 msec; connection 3 as 3 Mbits/sec and 2 msec; connection 4 as 3 Mbits/sec and 50 msec, and connection 5 as 4 Mbits/sec and 2 msec. The other links between the intermediate two routers and from the router to the server have the rate of 10 Mbps and 100 Mbps separately and their propagation delays are 40 msec and 1 msec each. The link with a transmission error of 1.0 x 10-6 is set between two routers. The congested router is configured at router1 with an IP forwarding rate of 3800 packets/sec.

We compare several sets of simulations to traditional TCP configurations. In the ECN+loss configuration, ECN cooperates with packet losses to initiate window reduction. When router 1 is configured as thresh of ECN 62 packets but with small buffer capacity (70) which causes multiple drops due to congestion, we denote it as loss-ECN. At this time, we generate ISQ messages at the router or use the window adaptive algorithm described in section 4.2 to improve its performance, denoted as the ICMP+ECN and ECN+algorithms separately. In another configuration we use large buffer size (123) with the same ECN or ISQ thresh 62 packets. For ECN, this ensures that no congestion loss occurs. Also, only ECN is used in such an environment to initiate congestion control. Packets are dropped when the buffer occupancy exceeds the threshold; simultaneously, ISQ messages are generated and returned to the corresponding sources.

5.2 Analysis of simulation results


Figure 2. Throughput versus time for multiple connections

The graph in figure 2 shows throughput of three sets of simulations as Loss-ECN, ECN+ISQ, and ECN+algorithm. The throughput of TCP, ECN+loss, and ECN are shown in figure 3. The average throughput is calculated as sent sequence number / simulation time. We perform the simulation for 300 seconds, and investigate that ECN with packet losses to initiate congestion control does not improve TCP throughput in wireless/mobile networks. However, ECN with no congestion losses greatly enhances performance because it shields transmission errors from reducing the sending rate at the source (figure 3).


Figure 3. Throughput versus time for multiple connections

In figure 2, the throughput of ECN with small buffer capacity (70) deteriorates significantly although it has the same thresh (62) as ECN with large buffer size (123). Because of the burst nature of the TCP congestion control protocol, especially in wireless and mobile networks, the recovery of one loss due to transmission errors will acknowledge all following successfully transmitted data and thus cause a burst of packets to be sent to the network. In the Tahoe TCP, packet losses are determined by retransmission timeout, and the burst of losses needs a long recovery time which thus empties the buffer before all the losses are recovered. Because buffer occupancy returns to zero after each burst of overflow, network congestion is mainly due to each burst size in a window, which greatly depends on window size. If the burst size is larger than the buffer capacity, then buffer overflow will occur. Therefore, the next burst occurs when the window increases until link errors occur. If the window size is allowed to be large enough, the lower buffer capacity with a very high ECN thresh will result in a burst of packet losses. Recovering multiple packet losses caused by buffer overflow enlarges the roundtrip time of the network and influences TCP performance greatly. ECN itself may lose by congestion, which induces failure of window reduction and even heavier congestion in the networks (compare figures 2 and 3).

Because buffer overflow is mainly due to burst of losses as described, and because of the limited processing time of the gateway CPU, generally ISQ cannot be generated when losses occur so quickly. Therefore, even with the loss of ECN, ISQ added to the ECN mechanism has approximately the same number of congestion losses as ECN (see figure 4). Because of this, the average total throughput obtained by ISQ added to ECN is not improved. (See figure 2. The total throughput obtained by ICMP+ECN is 358.1688 and by loss+ECN is 360.232.) We obtain this result by configuring one congested node in the network. And because of the random occurrence of transmission errors, the burst of buffer overflow of ICMP + ECN does not happen simultaneously as ECN (figure 4) in the same node.

However, with our window adaptive algorithm, network congestion is greatly alleviated and throughput of loss of ECN is improved significantly. Figure 4 shows that at the beginning of the simulation, only the Slow Start phase was included, with more aggressive window increasing the algorithm until it reached maximum window size or encounters congestion. The window increase resulted in heavy congestion losses. Because of the heavy loss of ECN at this moment, the window adaptive algorithm is initiated with the reduction of maximum window size Th by the coefficient of 0.7 in each window epoch. ECN is not returned until the window Th reaches minimal size. It then increases with a very slow rate of the coefficient of 1.1. Therefore, no buffer overflow occurs at this phase. After this, TCP enters into the Congestion Avoidance phase, and because the burst size also relates to the window increasing algorithm, ECN loss does not occur as frequently in this phase as in the Slow Start phase. Because the window cannot be reduced by the adaptive algorithm with ECN returned, buffer overflow may occur because of the large window and low buffer capacity. However, even in such a case, the burst of losses is reduced because some losses of ECN cause the readjustment of the maximum window size by the window adaptive algorithm (see figure 4). The burst loss of ECN + algorithm does not exceed 25 during the Congestion Avoidance phase; however, ECN or ECN + ICMP often exceeds 25.


Figure 4. Buffer overflow versus time at router 1

Unlike ECN, we use the ECN threshold to determine when a packet should be dropped and generating ISQs. To avoid burst of losses due to low buffer capacity, we also set a high buffer size (123) with a low threshold (62) as ECN. Because the packets losses selected generally belong to the connections that gain the most network resources, it shuts down all connections simultaneously, so the throughput of links with higher propagation delays will not be reduced significantly [11]. Therefore, the fairness performance is improved compared with ECN. Because it also reduces the reaction time to congestion and quickens the retransmission of lost packets, it reaches nearly the same total throughput as ECN without congestion losses (as shown in figure 3, the total throughput for ISQ is 625.242 packets/sec; for ECN with no loss it is 627.121).

6. Conclusions and future work

Disable packet loss to invoke window reduction by ECN without congestion losses could improve TCP performance significantly in wireless and mobile networks. When ECN losses are heavy because of buffer overflow, the additional threshold set for initiation of congestion control in time prevents multiple packet losses of congestion. It simultaneously enhances the throughput performance by shielding packet losses from invoking congestion control. In wireless and mobile networks, buffer congestion is mainly caused by bursts of data. It is difficult to generate ISQ messages on these lost packets because of the CPU time limits. ISQ added to ECN to represent lost ECN or data to initiate congestion control could not reach its aim. However, ISQ messages in systems where ECN is not supported are proposed. This would not only prevent packet losses from initiating congestion control, but could also avoid sending congestion signals caused by transient traffic. It could also desynchronize sender windows to reduce the queue length, prevent buffer overflow, and improve fairness performance through active queue management as well. Because all gateways must contain code for sending ISQ messages when they are forced to drop IP datagrams because of congestion [RFC 1009], it is only required to configure the operation at each intermediate router in the networks.

To further analyze the benefits of the whole system, continuous simulations must executed in the future. These simulations would contain multiple congested gateways and two-way traffic with either support ECN or nonsupport ECN systems. Without modification of the ECN mechanism, the addition of ISQ in nonsupport ECN systems would invariably improve TCP performance in wireless and mobile networks by preventing packet losses from initiating window reduction. We also plan to investigate the extent to which it would improve TCP performance through more optimal active queue management and a better window adaptive algorithm to suit large, wide network configurations.

References

[1] Cho, K.J. ALTQ/RED Performance,http://www.csl.csl.sony.co.jp/person/kjc/red/perf.html.

[2] FEI, P., JIAN, M. "Overload control method for a packet-switched network", Patent Application NC 18254, June 1999.

[3] FEI, P., Jian, M., etc. "TCP Performance Enhancements in Wireless and Mobile Networks", International Conference on Communication Technology (ICCT'98), Oct,1998, pp.S46-07-1:S46-07-5.

[4] K. K. Ramakrishnan, Sally Floyd, "A proposal to add Explicit Congestion Notification (ECN) to IP", Internet Draft-kksjf-ecn-03.txt, Oct, 1998.

[5] Sally Floyd, "TCP and Explicit Congestion Notification", Computer Communication Review, V.24 N.5, October 1994, p.10-23.

[6] R. Braden, J. Postel, Requirements for Internet Gateways, RFC 1009.

[7] Hadi Salim,J.,etal, A proposal for Backward ECN for the Internet Protocol (Ipv4/Ipv6), June, 1998.

[8] Fei Peng, Jian Ma, "An Effective to Improve TCP Performance in Wireless/mobile Networks", draft-fpeng-wecn-01.txt, January 2000.

[9] Fei Peng, "A proposal to add Fast Congestion Notification to IP and Improve TCP Performance in Wireless and Mobile networks", draft-fpeng-fcn-01.txt, January, 2000.

[10] Fei Peng, Jian Ma, "A proposal to apply ECN into Wireless and Mobile Networks", draft-fpeng-ecn-01.txt, January, 2000.

[11] T.V.Lakshman, etal., "The Performance of TCP/IP for Networks with High Bandwidth-Delay Products and Random Loss", IEEE/ACM Transactions on Networking. Vol. 5, June, 1997.