Satisfying High Quality Requirements of Videoconferencing on a Packet-Switched Network

Mauro Draoli <draoli@iasi.rm.cnr.it>
Carlo Gaibisso <gaibisso@iasi.rm.cnr.it>
Maurizio Lancia <lancia@iasi.rm.cnr.it>
Emiliano Antonio Mastromartino <mastro@iasi.rm.cnr.it>
Istituto di Analisi dei Sistemi ed Informatica
Italy

Abstract

This paper is mainly concerned with the design, dimensioning and tuning of network systems, with the main goal of offering a high quality desktop videoconference service to a large community of users dispersed in a metropolitan area. We refer to a network topology in which a set of Ethernet-based LANs is interconnected by a DQDB backbone, trying both to exploit the bandwidth savings offered by best-effort techniques and to preserve the investment already made by a large community of users.

We identified three main subnetwork systems: the single Ethernet segment, the pair of switched Ethernet segments, and the MAN subnetwork. A simulation model was designed for each of them and run separately. Intermediate results were used to dimension and tune the whole system. Finally, a single simulation model was defined and run, showing that the available protocols and technologies can support high quality multimedia real time services on packet switched networks at the price of careful planning and tuning. In more detail, we show that it is possible to reach a 70% bandwidth utilization while maintaining a satisfactory user quality perception, and that Ethernet could be the system bottleneck, due to sudden and unpredictable peaks in the LAN access delay. The use of RTP proved to be mandatory to improve the user quality perception.

1. Introduction

This paper is the result of a long period of research activity aimed at offering a high quality desktop videoconference service to a large community of users dispersed in a metropolitan area. It builds on experience already gained by testing videoconference applications [1] and by modeling and evaluating, via simulation, the performance of heterogeneous network systems [2].

The wide diffusion of Internet technologies has caused most current multimedia applications to be based on the IP connectionless network layer protocol. Accordingly, we refer to a particular network topology in which a set of Ethernet-based LANs is interconnected by a DQDB backbone. Ethernet was chosen, with awareness of the performance constraints introduced by this technology [3], because of its efficiency in offering low cost bandwidth and the need to preserve users' investment.

We already proved [2] that, in this context, a DQDB subnetwork can support the transmission of multiple high quality video streams with a satisfactory QoS in terms of access delay with a bandwidth utilization higher than the one obtained by bandwidth reservation techniques.

In addition, the behavior of the interconnected system was simulated in order to determine the optimal topological configuration with respect to the number of served users [4]. In that simulation, delays other than those generated by the MAC layer, and the mutual influence of different sources of information, were deliberately neglected.

The present work completes the analysis of the whole system since

The considered protocols and technologies have been shown to effectively support high quality multimedia real time services on packet switched networks at the price of a careful planning and tuning activity.

Among the results we obtained, we proved that a 70% bandwidth utilization is reachable while maintaining a satisfactory user quality perception and that, due to sudden and unpredictable peaks in the LAN access delay, Ethernet could represent the system bottleneck.

In addition, the adoption of RTP proved to be mandatory to improve the user quality perception, even in the presence of a properly dimensioned network platform.

Before going into deeper details let us introduce just a few notations. In what follows we will denote by:

Figure 1 shows the temporal relation among the just introduced events.


Figure 1: timing diagram in transmitting video streams.

2. Videoconferencing service and user requirements

In this section we identify and quantify the parameters which could reasonably lead to a good quality level of perception on widely available and not particularly expensive computing resources. Some of them are easily identifiable, such as the image size, the frame rate and the number of displayed colors. It is much more difficult to identify and quantify the impact of network delay and loss on the user satisfaction level.

In this regard it is reasonable to consider two additional parameters: the video frame end-to-end delay, i.e. Vn-Pn, and the number of discontinuities due to frames that are played multiple times at the receiver. Obviously discontinuities may have a very severe effect on the quality of perception.

Due to the adopted computing power restrictions, we limit ourselves to considering a full motion service for full color images of medium size. This choice guarantees a reproduction quality level higher than the one required in analogous evaluations [6], [7], [8], and makes possible the transmission of detailed still images.

For the same reason only 2-way conference sessions are considered. This choice does not compromise the generality of the treatment, since it has been shown [8] that the popularity of multiple way video-calls is low if the community of users is not large.

Finally, the maximum desirable end-to-end video frame delay is fixed at 300 msec in order to guarantee "lip sync", and the discontinuity rate is constrained to a desirable value of 0.25 discontinuities per second.
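The two user requirements above can be expressed as a simple check. The following is a minimal sketch, not part of the simulation models: the function names are hypothetical, the 25 frames per second value is the full motion rate given in section 4.1, and we assume that every consecutive replay of the same frame counts as one discontinuity.

```python
MAX_DELAY_MS = 300             # maximum desirable end-to-end frame delay
MAX_DISCONTINUITY_RATE = 0.25  # discontinuities per second
FRAME_RATE_FPS = 25            # full motion service (see section 4.1)


def discontinuity_rate(played_frame_ids, fps=FRAME_RATE_FPS):
    """Discontinuities per second in a sequence of played frame ids.

    Assumption: each consecutive replay of the same frame is one
    discontinuity, and one frame is played per frame period.
    """
    if not played_frame_ids:
        return 0.0
    repeats = sum(
        1
        for prev, cur in zip(played_frame_ids, played_frame_ids[1:])
        if cur == prev
    )
    return repeats / (len(played_frame_ids) / fps)


def meets_user_requirements(end_to_end_delays_ms, played_frame_ids):
    """True when both the delay bound and the discontinuity bound hold."""
    return (
        max(end_to_end_delays_ms, default=0) <= MAX_DELAY_MS
        and discontinuity_rate(played_frame_ids) <= MAX_DISCONTINUITY_RATE
    )
```

With this definition, one replayed frame in four seconds of video (100 frames) sits exactly on the 0.25 limit.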

3. The simulation scenario

The whole system is composed of a looped DQDB MAN backbone (based on the DS3 transmission system at 45 Mbps) which can be accessed through seven nodes. Figure 2 shows only two of them; each is a multiport high performance switch allowing the interconnection of multiple LAN segments.


Figure 2: the network topology.

The network spans a metropolitan area. The distance between two adjacent access nodes is assumed to be bounded by 5 kilometers, while the stations connected to the same LAN are assumed to be very close. Two of the segments connected to the same switch are preferably dedicated to videoconference machines, and at most two of these machines can be simultaneously active on the same segment.

Three different kinds of videoconference sessions have been identified: local sessions, between two users on the same LAN; local switched sessions, between two users on different LANs connected to the same switching device; remote sessions, between two users on different LANs connected to different switches.

Remote sessions are the most critical ones: our purpose is to reach a satisfactory quality of service in the worst case, when all the videoconference sessions are simultaneously running and all the sessions are remote (local sessions are obviously improbable).

As far as the choice of a particular protocol stack is concerned, the UDP/IP protocol suite has been adopted, extended by the RTP/RTCP protocol [9] (see figure 3), which guarantees a transport and delivery service with useful characteristics for multimedia real time applications.


Figure 3: the Protocol stack.

In this way the whole system can be considered as an IP subnetwork, thus not requiring the introduction of routers. The need for a fast forwarding and for a high throughput at the backbone access nodes suggests the adoption of self-learning, store and forward switches.

4. Modeling and simulation activities

In this section we describe the most relevant steps of the modeling and simulation activities. First we tested some videoconference tools and chose the one guaranteeing the highest quality level, rather than the best efficiency. The media stream generated by the application was then captured by a LAN monitor, analyzed in depth, and finally played back in the simulation model.

Three main subnetwork systems have been identified: the single Ethernet segment, the pair of switched Ethernet segments, and the MAN subnetwork. We designed a simulation model for each of them and ran them separately. Intermediate results on the behavior of each subsystem were used to dimension and tune the whole system. Finally, a single simulation model was defined in order to evaluate the performance of the interconnected system.

4.1 Video stream characterization

Our tests have been run on a Sun Sparcstation equipped with a 24 bits per pixel SunVideo Board. On this particular hardware platform, ShowMe [5] proved to be the tool best fitting our needs.

ShowMe makes it possible for the user to fix the frame rate between 1 and 30 frames per second and to select the image size among 176x144, 384x288 and 640x480 pixels in 256 colors. As already stated, we restricted ourselves to considering a full motion service (25 frames per second) with medium-sized pictures (384x288).

The compression algorithm adopted by ShowMe is a complex mechanism which implements both intraframe and interframe compression techniques, thus producing a flow of compressed video frames of variable length. We concentrated our attention on the streams generated by rather still images and we experimentally investigated their characteristics. Both the average bit rate, 1.7 Mbps, and the average size of compressed frames, 8,634 bytes with a standard deviation of 225 bytes, have been determined.
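The reported average bit rate is consistent with the reported average frame size at the full motion rate of 25 frames per second. The short check below is only an illustrative sketch; the function name is hypothetical.

```python
FRAME_RATE_FPS = 25     # full motion service
AVG_FRAME_BYTES = 8634  # average compressed frame size reported above


def average_bit_rate_mbps(frame_bytes, fps):
    """Average bit rate in Mbps of a stream with the given mean frame size."""
    return frame_bytes * 8 * fps / 1e6


rate = average_bit_rate_mbps(AVG_FRAME_BYTES, FRAME_RATE_FPS)
print(f"{rate:.2f} Mbps")  # 1.73 Mbps, i.e. about 1.7 Mbps
```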

As far as the end-to-end frame delay is concerned, the SunVideo Board does not start compressing a video frame until it has been entirely buffered. As a consequence, a delay of one frame period, i.e. 40 ms, is accumulated from the time a scene is captured to the time the application makes the corresponding video frame available to RTP. At the receiver, the delay introduced by the decoder is also 40 ms, since it too operates on a frame by frame basis. Therefore, in order not to exceed the maximum tolerable end-to-end frame delay, fixed in our context at 300 ms, the time elapsed from the moment the RTP source entity is invoked to the moment the RTP destination entity makes the frame available to the application, i.e. pn-tn, must not exceed 220 ms.
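The 220 ms bound is simply what remains of the end-to-end budget once the capture and decoding delays are subtracted. A minimal sketch of this arithmetic, with hypothetical names:

```python
MAX_END_TO_END_MS = 300  # maximum tolerable end-to-end frame delay
CAPTURE_DELAY_MS = 40    # one frame period spent buffering before compression
DECODE_DELAY_MS = 40     # the decoder also works frame by frame


def transport_budget_ms(total_ms, capture_ms, decode_ms):
    """Upper bound on pn-tn left after the capture and decoding delays."""
    return total_ms - capture_ms - decode_ms


budget = transport_budget_ms(MAX_END_TO_END_MS, CAPTURE_DELAY_MS, DECODE_DELAY_MS)
print(budget)  # 220
```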

4.2 The RTP/RTCP and UDP/IP models

The RTP/RTCP protocol suite guarantees a transport and delivery service with useful characteristics for multimedia real time applications.

RTP maintains the temporal structure of transmitted data regardless of the variability of the end-to-end network delay, dn in what follows, while RTCP, by reporting on the network load conditions, makes it possible for the RTP source entity to adapt its emission rate in order to improve the quality of the service or to save bandwidth. Obviously there is a price to be paid: the introduction of a buffering delay, pn-an, in reproducing the multimedia stream, and the loss of a certain amount of information. Clearly, the lower the accepted delay, the larger the amount of lost information may be.

Our RTP/RTCP model schedules the playout times in such a way that the introduced delay is optimized with respect to a guaranteed 1% rate of frame loss. The adopted playout mechanism derives directly from the blind delay method described in [10]; it simply increases the buffering delay by 40 ms, the video frame period, when the frame loss exceeds the 1% threshold.
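The adaptation rule just described can be sketched as follows. This is an illustration and not the model used in our simulations: the initial delay, the measurement of the loss rate over fixed windows of frames, and all names are assumptions introduced here, and the details of the method in [10] are not reproduced.

```python
FRAME_PERIOD_MS = 40   # video frame period, used as the delay increment
LOSS_THRESHOLD = 0.01  # 1% frame loss


def adapt_playout_delay(network_delays_ms, initial_delay_ms=40, window=100):
    """Return the playout delay reached after processing the given frames.

    A frame counts as lost when its network delay exceeds the current
    playout delay. After each full window of frames, the playout delay
    grows by one frame period if the loss rate measured over that window
    exceeded the threshold (the windowing is an assumption).
    """
    delay = initial_delay_ms
    late = 0
    for i, d in enumerate(network_delays_ms, start=1):
        if d > delay:
            late += 1
        if i % window == 0:
            if late / window > LOSS_THRESHOLD:
                delay += FRAME_PERIOD_MS
            late = 0
    return delay
```

For example, with a constant network delay of 100 ms and an initial playout delay of 40 ms, the delay is raised twice (to 80 ms and then to 120 ms) and then remains stable.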

Finally, our model does not exploit the facilities introduced by RTCP, but takes into account its bandwidth requirements.

The characteristics of the UDP/IP models will be only briefly sketched, since they implement all the main functionalities of IP and UDP and we assume the reader to be already familiar with their behavior. It is worth noticing, however, that any UDP entity is a "blind" source delivering datagrams to the network with no regard to bandwidth availability. If the network is already overloaded, the entity keeps on trying to occupy the available bandwidth, aggravating queue occupancy problems or, in the worst case, causing queue congestion and frame loss. Consequently, in order to assure the desired end-to-end delay and loss performance, it is mandatory to carefully dimension the network resources. This aspect will be addressed in the next sections.

As far as the IP entities are concerned, they are modeled by a FIFO queue system with a fixed service rate of 1,000 packets per second. Such a rate is high enough to have a minor influence on the whole system performance.
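A FIFO queue with a fixed service rate can be sketched in a few lines. The code below is a minimal illustration under our own assumptions (deterministic service time, arrival times given in increasing order, hypothetical function name), not the simulation model itself.

```python
IP_SERVICE_RATE_PPS = 1000  # fixed service rate of the modeled IP entity


def fifo_departures_ms(arrivals_ms, service_rate_pps=IP_SERVICE_RATE_PPS):
    """Departure times (ms) of packets served in FIFO order.

    Assumes arrivals_ms is sorted in increasing order and that every
    packet takes exactly 1000 / service_rate_pps milliseconds to serve.
    """
    service_ms = 1000.0 / service_rate_pps
    departures = []
    server_free_at = 0.0
    for t in arrivals_ms:
        start = max(t, server_free_at)  # wait if the server is still busy
        server_free_at = start + service_ms
        departures.append(server_free_at)
    return departures


# A burst of six packets arriving together is cleared in 6 ms.
print(fifo_departures_ms([0, 0, 0, 0, 0, 0]))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```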

4.3 The switch model

The modeled switching devices support a subset of the functionalities described by the IEEE 802.1D standard [11]: MAC frame filtering and forwarding, and MAC frame discarding, by which it is possible to bound the switch delay.


Figure 4: the network queue switch model.

Each device implements a store and forward switching technique: each valid frame is submitted to the "Learning Process," which updates and maintains the filtering database. Each frame is then processed by the "Filtering Process," which discards frames whose destination is on the same LAN they come from. Every other frame is associated with the port indicated by the filtering database, and processed by the "Forwarding Process," which demultiplexes the input flow and forwards the frames to the output ports.
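The three processes can be illustrated by the following sketch. It is not the queueing model of figure 4: the class and method names are hypothetical, service rates and delays are ignored, and the flooding of frames with an unknown destination is the usual transparent bridge behavior, assumed here rather than taken from the model description.

```python
class LearningSwitch:
    """Minimal self-learning switch: learning, filtering, forwarding."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.filtering_db = {}  # MAC address -> port where it was last seen

    def handle_frame(self, src, dst, in_port):
        """Return the list of output ports for a frame (empty if discarded)."""
        # Learning Process: remember on which port the source is reachable.
        self.filtering_db[src] = in_port

        known_port = self.filtering_db.get(dst)
        if known_port is None:
            # Unknown destination: flood on every other port (assumption).
            return [p for p in self.ports if p != in_port]
        if known_port == in_port:
            # Filtering Process: destination is on the LAN the frame came from.
            return []
        # Forwarding Process: send to the port given by the filtering database.
        return [known_port]


switch = LearningSwitch(ports=[1, 2])
print(switch.handle_frame("A", "B", in_port=1))  # [2]  (B unknown, flooded)
print(switch.handle_frame("B", "A", in_port=2))  # [1]  (A learned on port 1)
print(switch.handle_frame("C", "A", in_port=1))  # []   (same LAN, discarded)
```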

The model is mainly characterized by three parameters: the maximum switch transit delay, the filtering service rate and the forwarding service rate. The values we fixed for these parameters, in line with common industrial practice, are respectively 10 ms, 100,000 MAC frames per second, and 50,000 MAC frames per second.

4.4 Ethernet and DQDB models

The Ethernet model fully conforms to the IEEE 802.3 [14] recommendation for 10Base2 Ethernet segments. Propagation delays are computed assuming that the signal propagates at 0.66 times the speed of light.
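As an illustration of the propagation delay computation, the sketch below evaluates the delay over 185 m, the maximum length of a 10Base2 segment. The function name is hypothetical and the segment length is used only as an example; it is not a parameter reported for our model.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
VELOCITY_FACTOR = 0.66  # signal speed as a fraction of the speed of light


def propagation_delay_us(distance_m, velocity_factor=VELOCITY_FACTOR):
    """Propagation delay in microseconds over the given cable length."""
    return distance_m / (velocity_factor * SPEED_OF_LIGHT_M_PER_S) * 1e6


# One-way delay over a maximum-length 10Base2 segment (just under 1 us).
print(f"{propagation_delay_us(185):.2f} us")
```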

The DQDB subnet model conforms to the IEEE 802.6 [15] recommendation and provides a significant subset of the functionalities required for the DQDB Layer Service to LLC. The MAC connectionless service is fully implemented: its MAC sublayer fragments the received service data units into small fixed size slots, called cells. The loss of any cell causes the unit to be lost and, as a consequence, the loss of the whole video frame. We carefully model the influence of some management operations, like the MID assigning mechanism. The physical layer convergence function to the DS3 transmission system, usually neglected in analogous simulation models, has been implemented, and the transmission system overhead taken into account.
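Because a single lost cell invalidates the whole video frame, the frame loss probability grows quickly with the number of cells per frame. The following sketch illustrates this effect under assumptions that are ours and not part of the model: independent cell losses, 44 bytes of user data per cell, and no protocol overhead other than segmentation. The function names are hypothetical.

```python
import math

CELL_PAYLOAD_BYTES = 44  # assumed user data carried by each cell
AVG_FRAME_BYTES = 8634   # average compressed frame size (section 4.1)


def cells_per_frame(frame_bytes, payload_per_cell=CELL_PAYLOAD_BYTES):
    """Number of cells needed to carry a frame, ignoring header overhead."""
    return math.ceil(frame_bytes / payload_per_cell)


def frame_loss_probability(cell_loss_prob, n_cells):
    """Probability that at least one of n_cells independent cells is lost."""
    return 1.0 - (1.0 - cell_loss_prob) ** n_cells


n = cells_per_frame(AVG_FRAME_BYTES)
print(n)  # 197
print(frame_loss_probability(1e-5, n))  # roughly 197 times the cell loss rate
```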

As already illustrated, each DQDB access node is connected to two unidirectional buses and, in order to optimize the use of the available resources, has to route each instance of communication on just one of them. Unfortunately the 802.6 recommendation does not contain mandatory indications in this respect; consequently the technical solution adopted has been that of implementing a self-learning procedure, similar to that used by switches.

5. Performance evaluation criteria and designing goals

RTP/RTCP obviously helps the system in satisfying the established QoS requirements. Nevertheless, it is not difficult to realize that this capability is greatly influenced by the value of the network end-to-end delay dn.

Unfortunately, there is no way to directly control the delay introduced by the network; as a consequence an effective network dimensioning and tuning process is absolutely critical in order to obtain low values of dn. Already known results can be exploited in order to optimize the network performance; e.g. the adoption of a 1500-byte MAC frame length makes it possible to reach optimal LAN access delays [12].

The dimensioning and tuning processes have been carried out by modeling the whole system as a network of queues, as shown in figure 5, and have been aimed at:


Figure 5: the network delay stages.

As far as the network behavior is concerned, the main delay sources turned out to be the MAC servers, the IP receivers, due to the need to reassemble long UDP datagrams, and the RTP buffering delay. Delay components other than those just mentioned have been taken into account, but turned out to be negligible.

Since devices and network access delays depend in a non-linear manner on their utilization level, in order to fairly distribute loss and delays among devices and protocol entities, it is mandatory to maintain each device utilization level under its critical threshold. We monitored the load condition of each network device, in order to prevent one of them from being the bottleneck of the whole system.

The frame jitter is the variation in the network video frame interarrival time, i.e. the variation of dn. We have no means to control the dn jitter, since it is mainly due to the variation of the MAN and LAN access times and to the time of permanence in the network queues. The effect of the network jitter is smoothed by the RTP destination entity, as previously described.

A video frame can be lost either during its transmission through the network or once it has already reached its destination. In such a case the frame can be discarded:

We neglect the possibility of discarding due to checksum fault.

Obviously we would like to avoid discarding frames at the destination node, since a frame discarded there has already consumed bandwidth. With this aim we dimensioned the MAN access queue to 1,500 cells, a value small enough to reach congestion and discard frames when the delay in a longer queue would exceed the dn threshold.

Fairness is traditionally understood as the capability of the network to assure a service rate independent of device location and load. In our context, a system is said to be fair if the time spent in the access queues to the same shared medium is independent of device location and load. In other words, the bigger the load a device has to carry, the faster its access to the medium must be.

We evaluated the network capability to assure the same performance level to different remote sessions and improved the MAN subnetwork fairness by adopting the bandwidth balancing technique as defined in [15].

6. Simulation results

We planned four series of simulations, one for each defined simulation model. Each simulation inside the same series is mainly characterized by the temporal sequence of activation of the different transmissions involved. As will be clear later, this aspect greatly influences the simulation results.

6.1 Single Ethernet segments

The simulation activity in this particular context aimed to evaluate the maximum number of video streams supportable by the Ethernet technology. Due to the particular characteristics of the data flows generated by ShowMe, at most two local videoconference sessions have been assumed to be simultaneously active on the same Ethernet segment; i.e. at most four different video streams have been considered in our simulations. This assumption is certainly significant, since it corresponds to a 70 percent level of utilization of the available bandwidth while transmitting video information [12]. On the other hand, three local sessions are clearly not supportable by Ethernet.
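The utilization figures follow directly from the average stream rate of section 4.1 and the 10 Mbps capacity of the Ethernet segment. The sketch below is only a rough check with hypothetical names; it ignores protocol overhead and the RTCP traffic.

```python
ETHERNET_CAPACITY_MBPS = 10  # 10Base2 segment
STREAM_RATE_MBPS = 1.7       # average ShowMe stream rate (section 4.1)


def utilization(n_streams):
    """Fraction of the Ethernet capacity taken by n_streams video streams."""
    return n_streams * STREAM_RATE_MBPS / ETHERNET_CAPACITY_MBPS


print(f"{utilization(4):.0%}")  # 68%: two local sessions, four streams
print(f"{utilization(6):.0%}")  # 102%: three local sessions exceed capacity
```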

Several simulations have been performed, randomly generating different load distributions and evaluating, for each transmitted flow, the average value of dn, its standard deviation, the maximum and average MAC delay, and finally pn-tn, which has been evaluated once the frame loss rate experienced by the RTP presentation algorithm had stabilized at a value very close to the required 1%. From that time on, in fact, pn-tn has been experimentally shown to be stable. Table 1 illustrates the average results of our simulations together with the values obtained by the simulation producing the worst results.

We observe that:

Concluding:


Figure 6: RTP and network delays comparison

Workstation    pn-tn      dn                    MAC access delay
                          Avg       Std Dev     Max        Avg
1              114/178    14/18     10/16       123/140    10/12
2              134/130    18/19     15/19       123/198    9/13
3              154/218    17/21     16/25       160/218    9/12
4              151/211    15/18     15/22       148/195    11/14

Table 1: LAN sessions (Average Results/Worst Case)

6.2 Switched Ethernet segments

There are two main differences between local and local switched sessions. In local switched sessions:

Several simulations have been run, randomly generating different load distributions. In addition to the quantities measured in the previous series, the MAC delay suffered by the switch has been evaluated. The results we obtain in the absence of burst superposition phenomena do not substantially differ from those obtained for the local session case; thus the Ethernet technology seems to guarantee an excellent level of fairness also with respect to local switched sessions. These results are confirmed by those reported in [4]. Conversely, when burst superposition does occur, the required quality of service seems not to be achievable.

Workstation     pn-tn      dn                    MAC access delay
                           Avg       Std Dev     Max        Avg
1               151/222    21/25     19/34       149/207    9/11
2               153/230    19/22     20/37       138/211    10/13
3               161/290    19/22     22/44       101/142    9/10
4               135/217    17/20     15/27       152/342    12/18
Bridge port1                                     110/155    3/5
Bridge port2                                     81/178     3/5

Table 2: Switched LAN sessions (Average Results/Worst Case)

6.3 The MAN subnetwork

The problem of effectively dimensioning the DQDB subnetwork system has already been addressed and solved in [2], and we rely strictly on the results obtained in that paper. The dimensioning process is greatly influenced by the architectural incompleteness of such a system: all protocol layers higher than Ethernet are absent. As a consequence, the only performance index that can be considered in that context is the transfer delay for one frame. Unfortunately, the relation between the quality of service as perceived by the end user and the values assumed by a network performance index such as this delay is still not clear. In other words, the limit that must not be exceeded in order to guarantee a high quality videoconference service is still not known. As a consequence, in [2] the authors rely on the limit fixed by ETSI, which requires 95 percent of transmitted frames to be delivered in less than 20 msec, and consider acceptable a situation in which 95 percent of them do not wait in the access queue for more than 15 msec. That paper also shows experimentally that at most seven access nodes can be considered without exceeding this limit.
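The two 95 percent criteria can be checked on a set of measured delays as sketched below. The code is illustrative only; the function names and the way the samples are collected are assumptions, not the procedure followed in [2].

```python
def fraction_within(delays_ms, limit_ms, strict=False):
    """Fraction of samples below the limit (strictly below if strict)."""
    if strict:
        hits = sum(1 for d in delays_ms if d < limit_ms)
    else:
        hits = sum(1 for d in delays_ms if d <= limit_ms)
    return hits / len(delays_ms)


def meets_delay_limits(transfer_delays_ms, queue_waits_ms, quota=0.95):
    """True when 95% of frames are delivered in less than 20 ms and 95%
    of them wait in the access queue for no more than 15 ms."""
    return (
        fraction_within(transfer_delays_ms, 20, strict=True) >= quota
        and fraction_within(queue_waits_ms, 15) >= quota
    )
```

For example, 95 frames delivered in 10 ms and 5 frames delivered in 30 ms satisfy the first criterion exactly.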

In this paper, we deal with the problem of guaranteeing a good level of fairness. Several series of simulations have been planned, considering more and more realistic contexts and exploiting the results obtained in one simulation to define a more effective tuning of the MAN subsystem on which to rely for successive simulations. We focused our attention on the value to be assigned to the Bandwidth Balancing Module parameter [13] at each node, the BWBM parameter in what follows, in order to achieve the required level of fairness. Given this particular objective, we deliberately considered as significant also simulations in which the maximum tolerable frame delay has been exceeded.

Our attention will be focused on just one of the two unidirectional buses that realize the bidirectional connection adopted by DQDB. Access nodes to such a bus will be identified by an integer from 1 to 7, with node 1 the node closest to the head of the bus and node 7 the node closest to its end. Different occurrences of the same video stream will flow into the network through the access nodes, several streams possibly flowing through one single node.

Table 3 contains the results of the five series of simulations we performed. For each series, a pair of rows in the table contains the value of the BWBM and the number of media streams generated at each of the seven nodes. The higher the value of the BWBM for a node, the higher its service rate. We measure unfairness as the standard deviation of the average time spent in the access queue at each node. The lower this value, the fairer the system is.
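The unfairness index can be computed as sketched below. The use of the population standard deviation (rather than the sample one) is an assumption, as is the function name; the input is the list of average access queue times measured at the nodes.

```python
import statistics


def unfairness(avg_queue_times_ms):
    """Standard deviation of the per-node average access queue times.

    Assumption: the population standard deviation is used.
    """
    return statistics.pstdev(avg_queue_times_ms)


# Identical average queue times at all seven nodes give a perfectly
# fair system according to this index.
print(unfairness([5, 5, 5, 5, 5, 5, 5]))  # 0.0
```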

The first two series of simulations aim to evaluate the behavior of the system when the load is uniform. In practice, however, due to the particular type of connection used by DQDB, it is highly probable that the closer a node is to the head of the bus, the higher the load it has to carry. The results of the last series of simulations show that a substantially good level of fairness is obtained by using a BWBM value of 8 for the nodes loaded with four streams and of 4 for the nodes carrying two streams.

           Node 1   Node 2   Node 3   Node 4   Node 5   Node 6   Node 7   Ut%    Unfairness
Streams    1        1        1        1        1        1        1
BWBM       4        4        4        4        4        4        4        32%    0.11
Streams    2        2        2        2        2        2        2
BWBM       4        4        4        4        4        4        4        72%    0.1
Streams    4        4        2        2        2        2        0
BWBM       4        4        4        4        4        4        4        82%    4.63
Streams    4        4        2        2        2        2        0
BWBM       8        8        3        3        3        3        3        82%    2.62
Streams    4        4        2        2        2        2        0
BWBM       8        8        4        4        4        4        4        82%    0.21

Table 3: DQDB MAN Tuning

6.4 The overall system

The overall system considered in our simulations is composed of 28 workstations concurrently performing remote two-way videoconferencing sessions, connected to the network in such a way that exactly four streams flow into the MAN subnetwork through each node. We again focus our attention on just one bus (while taking into account the effect of both buses on the overall end-to-end delays), with node 1 the node closest to the head of the bus and node 7 the node closest to its end. With respect to this bus, we assume four streams flowing into the network through node 1, three through node 2, two through each of nodes 3, 4 and 5, one through node 6 and none through node 7. The details concerning the nodes to which each flow is addressed are deliberately omitted. This is not a relevant aspect, since each flow propagates all the way down the bus regardless of the destination node and in a negligible time.
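The stream distribution on the considered bus is consistent with four streams entering the MAN at each node if the opposite bus carries the mirrored distribution. The following sketch checks this arithmetic; the mirrored distribution on the second bus is our assumption for the purpose of the check, since only one bus is described explicitly.

```python
# Streams entering the considered bus at nodes 1..7, as described above.
BUS_A_STREAMS = [4, 3, 2, 2, 2, 1, 0]

# Assumption: the opposite bus carries the mirrored distribution.
BUS_B_STREAMS = list(reversed(BUS_A_STREAMS))

streams_per_node = [a + b for a, b in zip(BUS_A_STREAMS, BUS_B_STREAMS)]
total_streams = sum(streams_per_node)

print(streams_per_node)  # [4, 4, 4, 4, 4, 4, 4]
print(total_streams)     # 28, one stream per workstation
```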

Several series of simulations have been performed, monitoring the behavior of the whole system at nodes 1, 4 and 7, respectively the node at the head, in the middle and at the end of the bus.

At each considered access node the behavior of the system has been monitored with respect to the flows received by the workstations accessing the MAN through that node.

The analysis of the worst case simulation shows that:

LAN Site   Workstation     pn-tn      dn                    MAC access delay
                                      Avg       Std Dev     Max        Avg
LAN 1      Node 1          177/222    31/40     30/38       118/92     12/10
           Node 2          184/160    32/32     29/22       119/116    14/9
           Node 3          177/179    40/35     30/30       163/141    12/9
           Node 4          202/231    32/34     32/37       135/83     17/10
           Bridge port1                                     104/79     7/6
           Bridge port2                                     108/57     9/6
LAN 4      Node 1          200/200    50/27     36/30       119/148    10/10
           Node 2          156/181    21/32     18/33       170/229    19/25
           Node 3          186/226    25/33     25/28       106/122    10/12
           Node 4          241/260    33/41     36/50       202/170    20/17
           Bridge port1                                     74/70      7/6
           Bridge port2                                     113/139    10/13
LAN 7      Node 1          156/180    21/46     15/33       154/186    13/14
           Node 2          186/148    47/33     38/25       181/175    16/14
           Node 3          136/150    32/32     17/19       146/228    14/20
           Node 4          145/141    41/24     30/23       184/143    19/15
           Bridge port1                                     93/126     8/10
           Bridge port2                                     120/142    10/8

Table 4: Overall System (Average Results/Worst Case)

The analysis of the average results shows that:

7. Conclusion and future work

As already stated, the designed architecture can really support high quality desktop videoconferences, and the system is well dimensioned and tuned. Our future efforts will be addressed to the experimentation of more efficient playout mechanisms, in order to compensate for network delay variations and packet loss. We are evaluating the performance of similar protocol architectures using diverse network technologies, both for LANs and MANs.

References

[1] C. Gaibisso, G. Gambosi, M. Lancia, M. Vitale "Multimedia conferencing on packet switched network: testing and evaluation," Proc. Network Services Conference 1994 (NCS'94).

[2] M. Draoli, M. Lancia, A. Laureti-Palma "Video conferencing on a LAN/MAN interconnected system: QoS evaluation," Proc. International Conference on Computer Communication '95.

[3] K.M. Khalil, Y.S. Sun "The effect of bursty traffic on the performance of local area networks," Bellcore 1992.

[4] M. Draoli, G. Gambosi, M. Lancia "Videoconferencing on a LAN/MAN architecture: service evaluation and system dimensioning," 1996 International Conference on Communication Technology, Beijing 1996.

[5] SUN, SM, "ShowMe Documentation for ShowMe Video," Sun Solutions, 1993.

[6] A. Banerja, E. W. Knightly, F.L. Templin, Hui Zhang "Experiments with the Tenet Real-Time Protocol Suite on the Sequoia 2000 Wide Area Network," ACM Multimedia October 1994 San Francisco.

[7] K. Jeffay, D.L. Stone, F.D. Smith "Transport and display mechanism for multimedia conferencing across packet-switched networks," Computer Networks and ISDN systems 26 (1994) pp. 1281-1304.

[8] Andy Hopper "Communication at the desktop," Computer Networks and ISDN Systems 26 (1994) pp. 1253-1265.

[9] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson "RTP: A transport protocol for real-time applications," IETF Internet-DRAFT March 21, 1995.

[10] H. Schulzrinne, "Issues in designing a transport protocol for audio and video conference and other multiparticipant real-time applications," Audio Video Transport working group (May 1994).

[11] IEEE 802.1D "IEEE standards for local and metropolitan area networks: media access control (MAC) bridges."

[12] I. Dalgic, W. Chien, F.A. Tobagi "Evaluation of 10BaseT and 100BaseT Ethernets carrying video, audio and data traffic," Proc. INFOCOM '94, pp. 1094-1102.

[13] Summita, Fetterolf "Effect of Bandwidth Balancing Mechanism on Fairness and Performance of DQDB MANs," Proc. of INFOCOM 1992.

[14] IEEE 802.3 "CSMA/CD Access Methods for the Local Area Networks."

[15] IEEE 802.6 Working Group Editor "Distributed Queue Dual Bus Subnetwork of a Metropolitan Area Network."