QoS provision to QoS-unaware applications on IntServ networks

Authors

Nicola Ciulli (n.ciulli@cpr.it)
Consorzio Pisa Ricerche - META Center
Italy

Stefano Giordano (giordano@iet.unipi.it)
Telecommunications Networks Research Group, Dept. of Information Engineering - University of Pisa
Italy

Gianluca Insolvibile (g.insolvibile@cpr.it)
Consorzio Pisa Ricerche - META Center
Italy

Maurizio Molina (maurizio.molina@cselt.it)
CSELT
Italy

Abstract

A number of network models have been proposed to bring QoS capabilities into IP networks. The success of such solutions is strongly tied to the deployment of new tools that let legacy (QoS-unaware) IP-based applications exploit the new quality services. This article describes the studies, implementations and experimental results of the University of Pisa's work in this framework.

Introduction

    In recent years there has been increasing interest in the provision of Quality of Service (QoS) at the IP level, and a number of architectural models have been proposed (and in some cases implemented) for that purpose; the most discussed and complete (in terms of standardization) are the Integrated Services over the Internet (IntServ) and Differentiated Services (DiffServ) models, developed by the IntServ and DiffServ IETF Working Groups, respectively. The ultimate aim of such efforts is to allow real-time multimedia applications (e.g. video conferencing, on-line video streaming) to work over the Internet, since the standard behavior of current IP networks is the simple shipping of data with no end-to-end delay or loss guarantees (Best Effort service).
    The proposed models cover many aspects of IP-QoS provision, but some facets are still open; one of these is a complete interface between the new, QoS-capable IP networks and the large pool of non-QoS-aware applications. The absence of such an interface is a big obstacle to the successful deployment of the above-mentioned models. Each of these models has its own features and needs an interface to the applications tailored to it.
    We chose to concentrate our efforts on the problem of interfacing legacy applications to IntServ networks, since this model is the most complete and standardized, and is likely to play a role in DiffServ solutions as well (which promote the usage of IntServ “stub islands” as access networks to the DiffServ-based core).

Elements of a QoS-capable IP network

University of Pisa IntServ-over-ATM trial network

    At the University of Pisa, an implementation activity in the framework of the ACTS projects PETERPAN [ 1 ] and MAESTRO [ 2 ] led to the development of a prototype IntServ-over-ATM network (that is, an IntServ trial whose routers have RSVP-to-ATM interoperability functions: e.g., RSVP signalling mapped onto the ATM signalling).
    This trial allowed us to tackle a wide range of problems related to building enhanced IP services (i.e. the Controlled Load Service (CLS) [ 3 ] and the Guaranteed Service (GS) [ 4 ]), from the very low-level, core elements (the single RSVP router with its enhanced Traffic Control [ 5 ] components) to the high-level (edge) parts (the user's access to the service). The IntServ-over-ATM trial is depicted in Figure 1.


Figure 1 - IntServ-over-ATM trial layout

    The basic "bricks" of such enhanced services are the Traffic Control components (Admission Control module, packet classifier, packet scheduler, policer), whose configuration is achieved, according to the IntServ specifications, through a signalling protocol for IP. The best-known such protocol (and the best fit for the IntServ model) is the Resource reSerVation Protocol (RSVP) [ 5 ].
    In our trial, these components are implemented in the so-called RSVP-ATM Hybrid Edge Device (HED) [ 6 ]. This device is both an RSVP router (with IntServ-compliant Traffic Control) and an RSVP-over-ATM network access router. The two HEDs make up the IntServ QoS network of our trial. The rest of the trial consists of a number of IP subnets (based on Ethernet or ATM network technologies; in the latter case the IP-ATM interworking protocol is Classical IP over ATM - RFC 1577 [ 7 ]).

Basic steps to the QoS-provision for QoS-unaware applications

    In order to make the QoS-unaware applications (running on peripheral hosts) able to exploit the QoS offered by IntServ networks, we identified some fundamental steps:
  1. Provide the legacy applications with a user-friendly interface to the RSVP signalling.
  2. The IntServ model requires the sender application to provide the network with a traffic characterization (a traffic specifier, or TSpec), which will possibly be used by the receiver(s) to define their QoS requirements; thus the sender host needs a tool which produces an automatic traffic characterization.
  3. The IP protocol's unreliability and lack of congestion-avoidance capabilities led to the definition of TCP, which ensures sequenced, reliable data transfer and congestion control through a closed loop between the sender and the receiver. This approach conflicts with the IntServ model, where the network performs its own congestion avoidance and control. Real-time applications (which suffer less from packet loss than from untimely delivery or rate reduction) should use the (open-loop) UDP on QoS networks.
    A very basic assumption is that we dealt with unicast flows only, since most of the problems which arise with the specification of the connection (see "RSVP signalling interface" below) or with the use of TCP connections (see "QoS for TCP connections" below) are not present in the framework of multicasting (which is based on UDP).

RSVP signalling interface

    The best way to bring a quality service to a given application is to upgrade it with a built-in interface to RSVP (using the RSVP API [ 8 ]), since the RSVP manager would then have complete knowledge of the ongoing flows. However, most of the applications which would benefit from QoS support do not have this RSVP interface and are not likely to be upgraded in such a direction.
    Thus, the end host must be equipped with a separate tool to manage the RSVP signalling. Since this tool is external to the supported applications, it lacks much of the information needed to manage the RSVP reservations; the user has to provide the missing pieces of information, and these steps may be troublesome:
  1. Specification of the connection: when opening a unicast RSVP session (either as sender or as receiver), the user is required to specify the TCP/UDP connection to provide QoS for; the connection identifier is made up of the L4 protocol identifier and the pair of peer IP addresses and TCP/UDP ports. The user may have problems in dealing with such items.
  2. Knowledge of the RSVP signalling mechanisms: the user should be minimally involved in their dynamics.
  3. Specification of the traffic description and reservation parameters: in the IntServ model, the sender is required to provide a traffic description, whereas the receiver has to declare a set of parameters (FlowSpec) which describes the desired level of QoS; a common user may not know what the application needs in terms of network resources.
The remainder of this section will describe a tool which represents our proposed solution.

The RSVP Manager Module (RMM)

    This application has two working modes: the "raw testing mode" allows the experienced user to have complete control over the RSVP signalling, whereas the "support mode" helps the inexperienced user with a partial, semi-automatic management of RSVP.
    The RMM software architecture (see Figure 2) is made up of the "RSVP QoS module" and of the "Automatic TSpec Retrieval" (ATR) module (described later on). The RSVP QoS module communicates with the RSVP daemon by means of the standard RAPI interface [ 8 ], and with the ATR module through a socket. The user-kernel boundary is crossed through the TC interface [ 5 ] and the PCAP library (described below).

Figure 2 - RMM software architecture

The remainder of this section presents the main features of RMM's "support mode".

User-friendly management of the RSVP signalling

The user-friendly management is made possible by an intuitive Graphical User Interface (GUI) and by a reduced set of operations on the RSVP sessions: open / close a session as sender or receiver, manage the session parameters and switch the session between an inactive state (not sending PATH or RESV messages) and the active one.

Figure 3 - RMM main window

    There are four windows where a session can be placed, depending on whether it is a sender (TX) or receiver (RX) session, and whether it is active or inactive.
    An active sender session is refreshing the PATH signalling (i.e. a rapi_sender() call has been issued for it), and an active receiver session is refreshing the RESV signalling (rapi_reserve()). Inactive sessions are registered sessions currently not participating in the signalling. When an active session is turned inactive, a PATH_TEAR (if sender) or RESV_TEAR message (if receiver) is sent.
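    For illustration only, this session bookkeeping can be sketched as a tiny state machine; the sketch below is our own model (not the RMM code), and the RAPI calls, which are C functions, are represented as placeholders.

    from enum import Enum

    class Role(Enum):
        SENDER = "TX"
        RECEIVER = "RX"

    class RsvpSession:
        """Toy model of the RMM session states (not the real RAPI code)."""

        def __init__(self, role, session_id):
            self.role = role
            self.session_id = session_id
            self.active = False

        def activate(self):
            # An active TX session refreshes PATH (rapi_sender() in the RAPI);
            # an active RX session refreshes RESV (rapi_reserve()).
            self.active = True
            call = "rapi_sender" if self.role is Role.SENDER else "rapi_reserve"
            print(f"{self.session_id}: {call}() issued, signalling refresh started")

        def deactivate(self):
            # Turning an active session inactive tears its signalling down.
            if self.active:
                tear = "PATH_TEAR" if self.role is Role.SENDER else "RESV_TEAR"
                print(f"{self.session_id}: {tear} sent, session now inactive")
            self.active = False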

Semi-automatic specification of the TCP/UDP connection


Figure 4 - Opening a new session

    The "New session" window requires the specification of the destination peer address, the protocol used for the connection and the remote port number. These data can be provided using the "Connections" window, where the user can get the list of the TCP/UDP connections used by the supported application, or the list of connections with a given remote peer on the other end. Alternatively, both the application and the peer can be specified, or neither of them.
    This module is semi-automatic in the sense that, if the search produces two or more connections, the final choice is up to the user. We noticed that (with the most common applications), if both fields are filled in, the list of available connections is very short (often made up of just one item) and the user's intervention is very limited.
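    As a rough, modern illustration of this lookup (the original RMM used platform-specific kernel interfaces instead), the same filtering could be sketched with the psutil Python library; the names below are ours.

    import psutil  # assumption: psutil is available (may need privileges)

    def find_connections(app_name=None, peer_ip=None):
        """List TCP/UDP connections with a remote end, optionally filtered
        by process name and/or remote peer address, as the RMM
        "Connections" window does."""
        matches = []
        for conn in psutil.net_connections(kind="inet"):
            if not conn.raddr:                        # skip listening sockets
                continue
            if peer_ip and conn.raddr.ip != peer_ip:
                continue
            if app_name:
                if conn.pid is None:
                    continue
                try:
                    if psutil.Process(conn.pid).name() != app_name:
                        continue
                except (psutil.NoSuchProcess, psutil.AccessDenied):
                    continue
            matches.append((conn.laddr, conn.raddr, conn.type))
        return matches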

Figure 5 - The connection window

Automatic specification of traffic descriptions

    After opening an RSVP session for a connection, the Sender Template is still missing the Sender TSpec (the flow's average rate r, its burst size b, its peak rate p, its minimum policed unit m and the flow's maximum packet size M). These parameters can be set for both active and inactive sender sessions (in the active case, the change causes the asynchronous sending of a refreshing PATH message).
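    These five parameters map naturally onto a small record; the following sketch (our own naming) is reused by the later examples.

    from dataclasses import dataclass

    @dataclass
    class SenderTSpec:
        r: float  # average (token) rate, bytes/s
        b: float  # bucket (burst) size, bytes
        p: float  # peak rate, bytes/s
        m: int    # minimum policed unit, bytes
        M: int    # maximum packet size, bytes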

Figure 6 - Sender session management

    When specifying the Sender TSpec parameters, the user can be aided by the ATR module, which monitors the traffic of each RSVP session using a packet filter on the host's outgoing interface. The interface between the kernel-level filters and the ATR module is implemented by the PCAP (Packet CAPture) library: it helps manage the system-dependent kernel-level structures in order to create / destroy packet filters and receive upcalls from them (e.g. under the Solaris operating system, the PCAP library provides access to the bufmod STREAMS module [ 9 ]).
    The traffic trace is processed on-line in order to estimate the set of Sender TSpec parameters ("On-line LBAP characterization"), with a refresh interval longer than 30 seconds (the RMM periodically asks the ATR process for a new TSpec estimate).
    In case the new TSpec is "different" from its previous estimate, a new PATH message is immediately sent in order to advertise the new traffic description. The new TSpec is considered different from the old one if:

The second and third conditions are due to the fact that a "bigger" flow should be re-declared soon in order not to be affected by the enforcement points, whereas, if the flow is "smaller", the advertising of the new description can wait some time, in order to limit the RSVP signalling traffic.

Semi-automatic specification of the reservation

    RESV messages are filled in using the following windows. FilterSpec(s) and Receiver TSpec(s) are automatically loaded with the information contained in PATH messages. A session can start sending RESV messages only after a PATH event has been upcalled for it.


Figures 7 and 8 - Session management as receiver in the "support" and "raw testing" modes.

    The default reservation style is the Wildcard Filter (WF) in case of a single sender, and the Fixed Filter (FF) in case of multiple senders. When using the FF or Shared Explicit (SE) styles, the user can specify each FilterSpec separately (and each FlowSpec, too, in case of FF). The Specifiers of best-effort (BE) flows can be excluded from the list.
    The IntServ class can be CLS or GS; if GS is selected, two more parameters (RSpec: R and S) have to be specified.
    In the "support mode" the whole FlowSpec is represented by two parameters: "bandwidth" and "delay".
    When using the GS class, there may be two possible meanings for the term "bandwidth": the flow's average service rate on the routers along the path ("RSpec R") or the flow's average rate expected through the network ("receiver r").
    In the first interpretation, the user specifies R and the r value can be obtained from the following "delay-bandwidth" relationship [ 4 ]:

    W = (b − M)/R · (p − R)/(p − r) + (M + Ctot)/R + Dtot    (for p > R ≥ r)
    W = (M + Ctot)/R + Dtot    (for R ≥ p ≥ r)

where W (the "delay" parameter) is the maximum end-to-end delay allowed by the user (according to [ 4 ], only the contribution of the outgoing interfaces' queuing delays is considered), and Ctot and Dtot are the sums - over all the IntServ nodes - of the AdSpec exported parameters C and D (which take into account the non-ideality of the packet schedulers in the routers along the way, and other rate-dependent and rate-independent contributions [ 4 ]); the other parameters come from the FlowSpec.
    If the second interpretation is applied, the user specifies r and the R value can be obtained from the same formula. Of course, the most intuitive interpretation of the "bandwidth" term is the first one but, in that case, we should consider that:

    Thus, we chose to let the user set the r value as the "bandwidth" parameter (the flow's rate expected through the network), and the R term is obtained from the delay-bandwidth formula; R is larger than r in order to reduce the end-to-end delay.
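    As a sketch of this computation (our own code, assuming the Guaranteed Service delay bound of RFC 2212 [ 4 ] and the SenderTSpec record introduced earlier), R can be derived from the user's "delay" target W by bisection, since the bound decreases monotonically with R.

    def gs_delay_bound(R, tspec, Ctot, Dtot):
        """End-to-end queuing delay bound of the Guaranteed Service
        (RFC 2212) for a flow served at rate R >= tspec.r."""
        r, b, p, M = tspec.r, tspec.b, tspec.p, tspec.M
        if p > R:
            return (b - M) / R * (p - R) / (p - r) + (M + Ctot) / R + Dtot
        return (M + Ctot) / R + Dtot

    def rspec_rate_for_delay(W, tspec, Ctot, Dtot, R_max=1e9):
        """Smallest service rate R meeting the delay target W (bisection)."""
        lo, hi = tspec.r, R_max
        if gs_delay_bound(hi, tspec, Ctot, Dtot) > W:
            raise ValueError("delay target unreachable even at R_max")
        for _ in range(60):
            mid = (lo + hi) / 2
            if gs_delay_bound(mid, tspec, Ctot, Dtot) <= W:
                hi = mid
            else:
                lo = mid
        return hi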
    Furthermore, for each Specifier, the user can choose to set the "Automatic Load from PATH" option, which will allow the session (active or inactive) to have its Receiver TSpec parameters filled automatically whenever a corresponding Sender TSpec "changes".
    The "AutoRESV" option makes an active RX session reply automatically and asynchronously with RESV messages to incoming PATH messages (this allows fully automatic reservations to be placed).
    This set of features allows the receiver host to specify its reservation with minimal knowledge of the application's needs; if necessary, the reservation can be adjusted by moving the "bandwidth" and "delay" sliders.
    The following picture gives a snapshot of the RMM workings.


Figure 9 - RMM overall view

Automatic on-line traffic LBAP characterization

The traffic description provided by the flow's sender can be used by the network in some Admission Control techniques, and is considered by the network as the set of reference parameters to enforce. Thus, the flow's packets may be discarded if they do not conform to the declared Token Bucket (TB) parameters. The sender should also take into account that any packet smaller than m bytes will be considered m bytes long by the network as far as the policing action is concerned.
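    A minimal policer sketch (our own code, not the HED implementation) makes the enforcement rule concrete: each arriving packet is charged max(size, m) tokens, and packets finding insufficient tokens are non-conformant.

    class TokenBucketPolicer:
        """Minimal (r, b) token bucket policer with a minimum policed
        unit m; rates in bytes/s, sizes in bytes, times in seconds."""

        def __init__(self, r, b, m, t0=0.0):
            self.r, self.b, self.m = r, b, m
            self.tokens = b          # the bucket starts full
            self.last = t0

        def conformant(self, t, size):
            # refill tokens at rate r, capped at the bucket size b
            self.tokens = min(self.b, self.tokens + self.r * (t - self.last))
            self.last = t
            charged = max(size, self.m)   # packets shorter than m count as m
            if charged <= self.tokens:
                self.tokens -= charged
                return True
            return False                  # non-conformant: may be discarded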
    This section will discuss the matter of obtaining the TB parameters for a given flow; this problem is also known as "Linear Bounded Arrival Process (LBAP) traffic characterization".

Overview of the LBAP Characterization

    Given a traffic profile d(t), the goal of an LBAP (Linear Bounded Arrival Process) characterization [ 10 ] is to find a couple of (r, b) parameters (average sustained rate and burst size) such that the amount of data sent over any time period [t1, t2] ⊆ [tS, tE] (where [tS, tE] is the traffic lifetime) is less than r·(t2 − t1) + b:

    d(t2) − d(t1) ≤ r·(t2 − t1) + b    for every [t1, t2] ⊆ [tS, tE]

In the case of off-line LBAP characterization, the profile d(t) is assumed to be known a posteriori.
    The example in Figure 10 shows two possible bounding straight lines over a given interval [0, tx]. Of course, the LBAP definition requires that the condition above is satisfied over any time interval along the traffic lifetime. Thus, we may find a sub-interval [ta, tb] ⊆ [0, tx] where the (r, b) couple found for [0, tx] may not be large enough to take into account the traffic burstiness over [ta, tb].


Figure 10 - Example linear bounds for a traffic profile

    Generally, the couples (r, b) which satisfy the previous bound are points of an r-b plot which resembles a hyperbola, whose shape is strongly dependent on d(t).
    The LBAP characterization produces an (r, b) plot, whose points may be seen as the dimensions of a Token Bucket (TB) to which the traffic d(t) is conformant; that is, the traffic d(t) may be seen as shaped by an (r0, b0) TB regulator, with (r0, b0) chosen on the above-mentioned plot.
    Once a rate r has been chosen, the corresponding bucket size b can be found as the greatest difference between a relative maximum and its following minimum of q(t) − d(t), where q(t) = r·t is the token generation line. Given a relative maximum M (assumed at t = tM), we define as "following minimum" the relative minimum on [tM, tMnext], where tMnext is:

    tMnext = min { t > tM : q(t) − d(t) has a relative maximum in t },  or tMnext = tE if no such maximum exists

That is, tMnext is either the abscissa of the next relative maximum after M, or the last point of the q(t) − d(t) function.

An alternative way to compute b for a given rate r is suggested by the TB enforcement mechanisms: let us use an infinite "negative-sized" bucket (a bucket which tracks the token debt instead of the credit); the "debt" is reduced at rate r and no credit is allowed (i.e. the number of tokens is never positive). The whole traffic passes through the TB, and b is the maximum token debt reached along the traffic lifetime.
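    A minimal off-line version of this "negative bucket" computation, under our own naming, takes a packet trace and a candidate rate r and returns the smallest conformant bucket size.

    def lbap_burst_size(packets, r):
        """Off-line LBAP fit via the "negative bucket" method: packets is
        a list of (timestamp, size) pairs sorted by time; returns the
        minimum b such that the trace conforms to an (r, b) token bucket."""
        debt = max_debt = 0.0
        last_t = packets[0][0] if packets else 0.0
        for t, size in packets:
            # the debt is repaid at rate r, and no credit is allowed
            debt = max(0.0, debt - r * (t - last_t)) + size
            max_debt = max(max_debt, debt)
            last_t = t
        return max_debt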
    Here is an example of LBAP characterization on video-conferencing traffic.


Figure 11 - An example (r,b) plot.

It should be noted that b is lower-bounded by the maximum packet size of the traffic; this bound is reached, and kept, for r greater than or equal to a given threshold rate.
    A common feature of (r, b) plots is the presence of a knee point, which is useful to choose the (r, b) couple describing the traffic: there, a variation of r produces a b variation of the same order, whereas, in other regions of the plot, a small error on r results in a huge b variation (and vice versa).
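    Sweeping r with the sketch above yields the (r, b) curve, from which the knee can be picked; the trace below is synthetic and for illustration only.

    def rb_plot(packets, rates):
        """Return the (r, b) curve obtained by sweeping candidate rates."""
        return [(r, lbap_burst_size(packets, r)) for r in rates]

    # Example: 1500-byte packets sent for 0.3 s out of every second
    # (average rate 45 kB/s); the curve flattens past the knee.
    trace = [(i * 0.01, 1500) for i in range(3000) if (i % 100) < 30]
    curve = rb_plot(trace, [25_000 * k for k in range(1, 41)])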

On-line LBAP characterization

    The on-line LBAP characterization must produce a traffic description large enough to let the traffic pass the enforcement points, but not oversized. Furthermore, the description should possibly be renewed along the traffic lifetime, exploiting the soft-state feature of RSVP.
    In our solution, we used the "negative-sized bucket" approach to evaluate the traffic burstiness at a given time. The token generation rate may be provided by the user, if known (e.g. for a video streaming application), or may be variable (a continuously updated average rate). The maximum burst is remembered for a pre-configured time and then replaced by smaller burst sizes (updated in parallel): this makes it possible to account for the decrease of burstiness over sufficiently long time periods (the "burst ageing time" must be properly tuned, e.g. to some minutes).
    The algorithm that we implemented is summarized in the following flow-chart.


Figure 12 - An on-line LBAP characterization algorithm flow-chart.

    A list of the lowest bucket levels (i.e. the highest bursts) experienced is maintained; in order to keep some distance between two levels, a new level is recorded only if it is higher than a given percentage of the previous one (remember that these levels are negative). The average rate is evaluated either on [tS, tnow] or on a moving window.
    This algorithm does not ensure that the resulting burst size is higher than the real one corresponding to the average rate, nor does it guarantee a bound on how much the real value can be exceeded. Nevertheless, it proved to work fairly well in many experiments, and it has a light impact on the source host's overall computational load.
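    A compact sketch of the estimator (thresholds, timers and naming are illustrative, not the RMM's actual values) combines the running average rate, the negative bucket and the burst ageing.

    class OnlineLBAP:
        """On-line LBAP estimator sketch: running average rate plus a
        "negative bucket" whose past maxima are aged out, so that old
        bursts stop inflating the advertised b."""

        def __init__(self, burst_ageing=180.0, gap=0.9):
            self.burst_ageing = burst_ageing  # seconds before a burst is forgotten
            self.gap = gap                    # record a level only if > gap * previous
            self.t0 = self.last = None
            self.total = 0.0                  # bytes seen so far
            self.debt = 0.0                   # current token debt
            self.levels = []                  # [(time, debt)] candidate maxima

        def packet(self, t, size):
            if self.t0 is None:
                self.t0 = self.last = t
            self.total += size
            self.debt = max(0.0, self.debt - self.rate(t) * (t - self.last)) + size
            self.last = t
            # age out old maxima, then record the current debt if notable
            self.levels = [(ts, d) for ts, d in self.levels
                           if t - ts < self.burst_ageing]
            if not self.levels or self.debt > self.gap * self.levels[-1][1]:
                self.levels.append((t, self.debt))

        def rate(self, t):
            return self.total / max(t - self.t0, 1e-6)

        def tspec(self, t):
            """Current (r, b) estimate to be advertised in the Sender TSpec."""
            if self.t0 is None:
                return 0.0, 0.0
            return self.rate(t), max((d for _, d in self.levels), default=0.0)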

QoS for TCP connections

    The use of TCP on a QoS-capable network may lead to a conflict between the two congestion control mechanisms. Depending on the nature of the traffic conveyed, two approaches can be taken for TCP connections: tuning the reservation parameters so that the source TCP can exploit the reservation (described in the next section), or breaking the TCP control loop at the network edge ("TCP Cheating"). We should highlight that the first approach is a good solution for both kinds of traffic, but the second one is necessary when the TCP source behaviour must be independent of the network conditions, e.g. because:
  1. the ACK flow in the backward direction experiences congestion (this case is discussed later on);
  2. the first approach results in a good source behaviour in the stationary period, but the source may not be compliant with the reservation or the description during the transient one, thus resulting in packet losses (or discards) and end-to-end delays greater than the committed bound. These effects are tolerable for non-real-time traffic (it is just a short lengthening of the download time), but not for real-time traffic.
    On the other hand, the "TCP Cheating" approach cannot be used with file transfers, since it turns off TCP's data transfer reliability.

Specification of reservation parameters for TCP connections carrying non-real-time traffic

    The problem of setting the Sender TSpec and FlowSpec parameters for TCP connections carrying traffic from non-real-time applications can be solved by considering that the QoS perceived by such applications is basically the "transfer time": downloading a file of size S from host A to host B in less than T seconds; thus, the flow's average rate should be at least r = S / T.
    The source TCP has to keep a rate greater than r in order to exploit the reservation and possible amounts of spare bandwidth redistributed to the connection.
    The first point to be noticed is that the sender host (A) is not able to characterize its generated traffic: in fact, it has no information on the QoS expectations of the receiver B (which determine the reserved average rate) and, consequently, it cannot declare the r and b values.
    In our proposal, the chosen class is the CLS. The sender host reads the FlowSpec r and b values carried in RESV messages and reacts accordingly (sending a new PATH message with the right Sender TSpec values).
    The receiver host is able to force the source TCP rate; in fact, the source TCP sends W (= min{WC, WR}) bytes every RTT seconds (W is the transmission window, WC and WR are, respectively, the congestion and receiver windows, and the RTT – Round Trip Time – is assumed to be constant and measured in absence of congestion). Two basic assumptions are made: the congestion window WC stays larger than WR (so that W = WR), and the RTT is constant. Since WR / RTT is then the average rate (over intervals not much longer than RTT), the source TCP can be forced to keep the "right pace" r by setting WR to a value in [RTT · r, RTT · MPC] (where MPC is the minimum path capacity); WR ← 2 · RTT · r seems to be a good trade-off (considering that the flow's burst size depends linearly on WR).
    Two conditions must be met for WR to be constant. First, the receiver TCP buffer must be emptied by the receiving application at a rate greater than or equal to r; this assumption is reasonable, since the application expects to receive the data at such a rate. Second, the traffic should reach the receiver TCP buffer with very limited bursts; this holds as well, since any burst should have been cleared upstream along the path, where the bottleneck r has been encountered by the flow.
    Then, the receiver should set the right b value in the Receiver TSpec: not much larger than needed (since the reservation might be rejected by the Admission Control mechanisms), and not smaller than the real flow's burst size (since it will be used in the traffic description and enforced at the entrance of the QoS network).
    Thus, the exact flow's burst size in the stationary period must be found. Let us suppose a stationary situation where the TCP source sends WR bytes every RTT seconds, at a rate rS (> r) (which is the sender's access rate to the network). The shape of the amount of data sent versus time is reported in the picture below.


Figure 13 - Example shape of the data-versus-time function

    According to what is explained in the section on LBAP characterization, the burst size can be easily calculated to be:

    b = WR · (1 − r / rS) = WR · (rS − r) / rS

This is the value to be advertised in the Sender TSpec.
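    Putting the pieces together, a sketch under the stated assumptions (the function name is ours, and the burst formula is the one reconstructed above):

    def tcp_reservation(S, T, RTT, r_s):
        """Reservation sketch for a non-real-time TCP transfer of S bytes
        in at most T seconds, over a path with congestion-free round trip
        time RTT and sender access rate r_s (> r)."""
        r = S / T                 # minimum average rate perceived as QoS
        W_R = 2 * RTT * r         # suggested receiver window (trade-off above)
        b = W_R * (1 - r / r_s)   # stationary burst size for the Sender TSpec
        return r, W_R, b

    # e.g. 100 MB in 60 s, RTT = 50 ms, 100 Mbit/s sender access rate
    r, W_R, b = tcp_reservation(100e6, 60.0, 0.05, 100e6 / 8)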

Implementation of a mechanism for the provision of QoS to TCP connections carrying real-time traffic on IntServ networks

    It may happen that a real-time data flow (that is, one with strict requirements in terms of end-to-end transfer delay and bandwidth) is sent over a TCP connection. One reason why this may happen is that UDP-based applications can have a "malicious" behaviour, since the network has no means to control their resource utilization.
The biggest problem experienced by a TCP flow arises when the backward ACK flow finds bottlenecks along the reverse path and suffers from losses or delay. In both cases the TCP source reduces its transmit rate, and the resulting flow is not "large" enough to exploit the reserved resources along the forward path (this may happen, for instance, because no reservation is in place for the ACK flow on the reverse path). In the "TCP Cheating" solution, the TCP source is made independent of the network conditions by breaking the TCP closed loop.

The “TCP Cheating” mechanism


Figure 14 - The TCP Cheating mechanism

    The TCP Cheating can be applied separately to each single TCP connection for which a reservation is in place.
    The QoS network ingress router (A) cheats the TCP source by sending a "fake" ACK for each “ACKable segment” received on the selected TCP connection ("sender TCP Cheating"); an “ACKable segment” is a TCP segment which:

The WR size written in the fake ACK is chosen in order to leave the source TCP free to send at the Sender TSpec peak rate p (if declared) or at the MPC between the sender host and router A, i.e. WR = PRTT · p or WR = PRTT · MPC (where PRTT is the partial round trip time between the sender host and A, evaluated with an ICMP Echo Request).
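    Purely for illustration (the actual mechanism runs inside the ingress router; the scapy library and all names below are our assumptions), a fake ACK for a captured segment could be built as follows.

    from scapy.all import IP, TCP   # assumption: scapy is available

    def fake_ack(segment, prtt, p_rate):
        """Build a "fake" ACK for an ACKable TCP segment, advertising a
        receiver window that lets the source send at the TSpec peak rate
        p (WR = PRTT * p). The state keeping (latest ACK and IP id
        numbers, see below) is omitted from this sketch."""
        ip, tcp = segment[IP], segment[TCP]
        win = min(int(prtt * p_rate), 65535)   # TCP window field is 16 bits
        return (IP(src=ip.dst, dst=ip.src) /
                TCP(sport=tcp.dport, dport=tcp.sport, flags="A",
                    seq=tcp.ack,                      # echo the expected seq
                    ack=tcp.seq + len(tcp.payload),   # acknowledge the segment
                    window=win))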
    Then, at router A, any real ACK message (coming from the receiver host) is intercepted and used to update the router's state, without being relayed to the sender. These operations ensure that the TCP source is "blind" with respect to the network conditions, and does not back off if some congestion occurs in the reverse path.
    The described mechanism relies on a number of assumptions, all included in the IntServ model. The second half of the "TCP Cheating" ("receiver TCP Cheating") prevents the receiver TCP from stalling when the received byte stream has a "hole" in it (if one or more IP packets have been lost, the receiver TCP waits for the lost data before ACKing new segments and relaying the received data to the application).
    When the egress router B finds that the ACKs from the receiver keep repeating the same ACK number N, it sends the receiver a "fake" TCP segment whose sequence number is N. The payload of the fake segment is not relevant, and it contains as many bytes as the difference between the lowest sequence number acknowledged by the receiver and N.

    It must be highlighted that the generation of "fake ACKs" requires the ingress router to keep some state: the latest ACK number passed (or generated) and the latest IP identification number passed (or generated). These fields must be consistently updated in every passing packet.
    Thus, the TCP Cheating approach may introduce severe scalability limits, since this state must be kept updated for each "reserved" TCP connection and – if the IP stack on the receiver host checks the IP identification number – for each packet exchanged between the two hosts.
    Nevertheless, TCP Cheating is implemented at the edge of the network, where scalability issues have a weaker impact, and it may be a good solution to the problems exposed above.
    So far, we have implemented the "sender TCP Cheating" part of the whole mechanism, and it proved to work correctly: the source TCP keeps sending data at its own pace (determined by the application's source speed, the congestion window and the fake receiver window sizes), independently of the bandwidth of the reverse path.

Conclusions

    This article has presented some key problems related to the provision of QoS to legacy applications, proposed some solutions and outlined their implementation. In order to exploit the new QoS features of IntServ(-over-DiffServ) networks, QoS-unaware applications need to be complemented by a tool (the RSVP Manager Module) which eases the management of the RSVP signalling, the traffic description and the specification of the QoS parameters.
    The traffic description is performed on-line and automatically, based on an LBAP parameter-fitting process, whereas the specification of the QoS parameters is semi-automated using some network calculus results.
    A further problem concerns TCP connections, since the TCP congestion control conflicts with that of the QoS-capable network. A way to make data-transfer TCP connections exploit the whole reservation has been presented, but in some cases (backward congestion and / or real-time flows) it may not be enough, and a second approach ("TCP Cheating") should be used, which "turns off" the TCP congestion control loop.
    These features, on the whole, help a large set of existing applications benefit from QoS.

References

  1. ACTS PETERPAN Contract AC307 (http://peterpan.lancs.ac.uk), Technical Annex, Jan. 1998
  2. ACTS MAESTRO Contract AC233, Technical Annex, Mar. 1996
  3. J. Wroclawski, Specification of the Controlled-Load Network Element Service, RFC 2211, Sep. 1997
  4. S. Shenker et al., Specification of Guaranteed Quality of Service, RFC 2212, Sep. 1997
  5. R. Braden et al., Resource ReSerVation Protocol (RSVP) – Version 1 Functional Specification, RFC 2205, Sep. 1997
  6. N. Ciulli, S. Giordano, A. Casaca, P. Silva, M. Dunmore, N. Race, The Hybrid Edge Device Concept and Implementation in the PETERPAN Network Architecture: a proposal for IP QoS provisioning on ATM networks, IEEE ATM Workshop '99 procs
  7. M. Laubach, Classical IP and ARP over ATM, RFC 1577, Jan. 1994
  8. R. Braden, D. Hoffman, RSVP Application Programming Interface (RAPI), The Open Group Technical Standard C809, Dec. 1998
  9. Sun Microsystems, STREAMS Programming, Mar. 1990
  10. R.L. Cruz, A Calculus for Network Delay and a Note on Topologies of Interconnection Networks, Ph.D. Thesis, Univ. of Illinois, Report UILU-ENG-87-2246, Jul. 1987
