INET Conferences


Adaptive Loss Concealment for Internet Telephony Applications

Henning SANNECK <sanneck@fokus.gmd.de>
GMD Fokus
Germany

Abstract

Today's Internet is increasingly used not only for e-mail, ftp, and the World Wide Web, but also for interactive audio and video services (MBone). However, the Internet as a datagram network offers only a "best effort" service, which can lead to excessive packet losses under congestion. Internet measurements have shown that the overall probability of losing one packet is high, whereas the probability of losing several consecutive packets is significantly lower.

In this paper we combine this Internet loss characteristic with the long-term correlation within a speech signal to mitigate the impact of packet losses. This is accomplished by an adaptive choice of the packetization interval of the voice stream at the sender. When a packet is lost, the receiver can use adjacent signal segments to conceal the loss from the user, because the adaptive packetization at the sender makes a high similarity between them likely. The subjective quality of the proposed scheme as well as its applicability within the current Internet environment (high loss rates, common audio tools, standard speech codecs) are discussed.

Background
AP/C
- Adaptive packetization
- Adaptive packetization of speech transitions
Concealment
- Concealment of speech transitions
Applicability to Internet telephony
Conclusions
Acknowledgments
References

Background

Packet-switched networks are increasingly used for audio and video transmission alongside "classical" services like electronic mail. However, datagram-oriented networks typically offer only a "best effort" service, which makes no commitment regarding a minimum bit-rate or a maximum delay. Consequently, when the network gets congested, real-time packets may arrive too late at the receiver, or they may be lost due to buffer overflow at routers or to bit errors (in wireless networks). In the case of the transmission of telephone-quality audio for conferencing applications, which we explore further in this paper, packet loss causes signal dropouts which are very annoying for the listener. The techniques proposed to tackle the loss problem can be divided as follows:

Loss avoidance (adaptation [4], reservation [7]);
Loss reconstruction (redundancy mechanisms [9], [11]); and
Loss alleviation (interleaving [12], [13], concealment [14]).

In this section we briefly describe these methods (especially with regard to the Internet environment) and then introduce our approach.

Bandwidth adaptation

Real bandwidth adaptation, i.e., varying the coder output bit-rate according to loss reports from receivers (RTCP, [19]), is currently not feasible for speech transmission, as no standardized scalable audio codec is available. Such codecs (e.g., wavelet codecs) are under development ([23]) but have not yet found wide deployment. Additionally, even with such a scalable codec (i.e., one with a continuous relation between quality and bit-rate), the bandwidth range of an audio codec is usually one order of magnitude lower than that of the video coders in use today. Thus the overhead of this scheme (RTCP control traffic) does not seem to justify the possible gain in available network bandwidth.

When using the constant (low) bit-rate codecs available in the current MBone tools (vat [24], rat [25], FreePhone [26], NeVoT [27]), no output bit-rate adaptation in response to temporary congestion is possible. However, [4] proposes switching between the available codecs (PCM, ADPCM, LPC, etc.) for noncontinuous bit-rate adaptation. We argue that this is problematic due to the nonlinear (or even noncontinuous) relation between the bandwidth and the subjective quality of the codecs: e.g., GSM (13 kBit/s, [17]) sounds subjectively better than G.723 (ADPCM, 32 kBit/s) and different from (though not necessarily inferior to) A-law PCM (64 kBit/s). Additionally, with respect to the service model, switching codecs takes the choice of codec, and thus of subjective quality, away from the user; one could then argue for always using the codec with the best quality/bit-rate relation (assuming sufficient computing power).

Resource reservation

Recently, much work has been devoted to the Internet Integrated Services (IIS, [7]) model and its resource reservation setup protocol RSVP. Assessing the effectiveness of reservations for speech traffic, we have identified two major drawbacks.

First, the IIS mechanisms have to be deployed in every IP router along the path from the source to the sink. Then, for each flow, state has to be installed in every participating node. Considering numerous low bandwidth voice flows, this results in a high per-flow state overhead to bandwidth ratio. Because the properties of voice flows (constant bit-rate, loss sensitive) are known in advance and could, for example, be identified by the RTP payload type, the IIS traffic characterization objects (Sender and Receiver TSpec) are largely redundant.

Secondly, a mismatch can be observed between the properties of the currently existing Internet service classes and the requirements of telephone-quality speech traffic: the IIS Guaranteed service is intended for nonadaptive flows which need a strict delay bound, while the Controlled Load service strictly requires the loss rate to be near 0%. However, subjective testing has shown that, to a certain extent, delay tolerance can be traded against loss tolerance (i.e., applications can repair isolated losses: [9], [14]). Additionally, all typical voice applications adapt fairly well to changing delay (jitter).

Redundancy schemes

These methods "piggyback" redundant information of earlier packets on the current packet to be sent ([21]). Two different methods are proposed:

Source-coded redundancy, i.e., transmitting the same signal several times (with an offset in time), possibly coded with different codecs [9], [10] and
Channel coding (Forward Error Correction: FEC, [11]) uses well-known methods of information theory; however, specific properties of the speech signal are not taken into account.

The overhead of these schemes is relatively high in terms of the additional data to be transmitted (to accommodate losses, the bit-rate has to be increased in proportion to the number of consecutive losses to be repaired). Nevertheless, these schemes are useful for reconstructing small bursts of lost packets as well as for larger packet sizes (where concealment cannot be applied), and (for source-coded redundancy) all codecs already present in the tools can be used.
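The piggybacking idea behind source-coded redundancy can be illustrated with a small sketch. It assumes one redundant copy per packet (packet n carries block n plus a coarse copy of block n-1); the "secondary codec" is a hypothetical stand-in that keeps every second sample, not any of the codecs used in [9] or [10], and all function names are illustrative only.

```python
from typing import List, Optional, Tuple

# Hypothetical packet layout: (sequence number, primary block, coarse copy of block seq-1)
Packet = Tuple[int, List[int], Optional[List[int]]]

def coarse_encode(block: List[int]) -> List[int]:
    """Stand-in for a low bit-rate secondary codec: keep every second sample."""
    return block[::2]

def coarse_decode(coarse: List[int], length: int) -> List[int]:
    """Crude decoder for the stand-in codec: repeat each kept sample twice."""
    return [s for s in coarse for _ in range(2)][:length]

def send(blocks: List[List[int]]) -> List[Packet]:
    """Piggyback a coarse copy of the previous block on every packet."""
    return [(seq, block, coarse_encode(blocks[seq - 1]) if seq > 0 else None)
            for seq, block in enumerate(blocks)]

def receive(packets: List[Optional[Packet]], block_len: int) -> List[Optional[List[int]]]:
    """Rebuild the block sequence; a lost block is taken from the next packet's copy."""
    out: List[Optional[List[int]]] = [p[1] if p is not None else None for p in packets]
    for i, p in enumerate(packets):
        nxt = packets[i + 1] if i + 1 < len(packets) else None
        if p is None and nxt is not None and nxt[2] is not None:
            out[i] = coarse_decode(nxt[2], block_len)
    return out
```

With a single redundant copy, only isolated losses are repaired; two consecutive losses leave the first of them unrecovered, which is why the bit-rate has to grow with the burst length to be repaired.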

Interleaving

A simple method to reduce the audibility of losses in a loss-distorted signal is interleaving, i.e., sending parts of the same signal segment in different packets, thus spreading the impact of a loss over a longer time period. The following schemes exist:

Unit interleaving ([12], chap. 4.3): The speech signal is partitioned into units of size L/I (L being the packet size, I the number of units per packet). Then every Ith unit is packed into the same packet until the packet is full and sent. When a packet is lost, I "gaps" of unit size are present in the signal, resulting in a less audible distortion.
Sample interleaving ([13]): Consecutive samples of a waveform coder are put into two different packets. Thus when only one packet is lost, at least every second sample is received and missing samples can be interpolated. However, the scheme works only with waveform coders and the loss of one packet has a significant impact on a speech segment of length 2L.

Interleaving always needs resequencing at the receiver, thus introducing higher latency (as I is also the number of packets needed to regenerate the entire signal segment).
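Both interleaving variants can be expressed with one unit interleaver, sketched below. The assumptions are ours: a segment of I packets is interleaved at a time, its length is a multiple of (number of packets x unit size), and unit k goes to packet k mod I. With unit size L/I this gives unit interleaving as in [12]; with unit size 1 and two packets it gives the odd-even sample interleaving of [13], where a missing sample is interpolated from its two neighbours. The function names are hypothetical.

```python
from typing import List, Optional

def interleave(segment: List[float], num_packets: int, unit: int) -> List[List[float]]:
    """Put unit k of the segment into packet k mod num_packets.

    Assumes len(segment) is a multiple of num_packets * unit.
    """
    packets: List[List[float]] = [[] for _ in range(num_packets)]
    for k, start in enumerate(range(0, len(segment), unit)):
        packets[k % num_packets].extend(segment[start:start + unit])
    return packets

def deinterleave(packets: List[Optional[List[float]]], unit: int,
                 length: int) -> List[Optional[float]]:
    """Resequence the received packets; samples of lost packets become None."""
    out: List[Optional[float]] = [None] * length
    num_packets = len(packets)
    for j, p in enumerate(packets):
        if p is None:
            continue
        for i, start in enumerate(range(j * unit, length, num_packets * unit)):
            out[start:start + unit] = p[i * unit:(i + 1) * unit]
    return out

def interpolate_gaps(samples: List[Optional[float]]) -> List[float]:
    """Odd-even case: replace each missing sample by the mean of its received neighbours."""
    out: List[float] = []
    for n, s in enumerate(samples):
        if s is not None:
            out.append(s)
            continue
        neighbours = [samples[m] for m in (n - 1, n + 1)
                      if 0 <= m < len(samples) and samples[m] is not None]
        out.append(sum(neighbours) / len(neighbours) if neighbours else 0.0)
    return out
```

The receiver has to wait for all packets of a segment before `deinterleave` can run, which is the resequencing latency mentioned above.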

Concealment

A speech signal can be (roughly) partitioned into voiced and unvoiced regions. Voiced signal segments show high periodicity (pitch period), so within voiced regions the contents of consecutive packets resemble each other. Concealment algorithms try to exploit this by processing the signal segments around the gap caused by a lost packet and then filling the gap appropriately. This can be done, for example, simply by repeating a signal segment of pitch-period length throughout the missing packet ("Pitch Waveform Replication": PWR, [14]), possibly supported by (per-subband) LPC analysis/synthesis ([22]).
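The basic pitch-repetition idea can be sketched as follows. This is a deliberately simplified illustration, not the published PWR algorithm of [14]: it estimates the pitch period of the last received samples with a normalised auto-correlation over lags p_min..p_max (bounds chosen here as an assumption) and repeats only the last period throughout the gap. The helper names are hypothetical.

```python
import math
import random
from typing import List

def pitch_period(history: List[float], p_min: int, p_max: int) -> int:
    """Lag in [p_min, p_max] maximising the normalised auto-correlation of the
    last p_max samples with the samples `lag` earlier (needs 2*p_max samples)."""
    tail = history[-p_max:]
    e_tail = sum(a * a for a in tail)
    best_lag, best_r = p_min, -math.inf
    for lag in range(p_min, p_max + 1):
        ref = history[-p_max - lag:-lag]
        num = sum(a * b for a, b in zip(tail, ref))
        den = math.sqrt(e_tail * sum(b * b for b in ref)) or 1.0
        if num / den > best_r:
            best_lag, best_r = lag, num / den
    return best_lag

def repeat_last_period(history: List[float], gap_len: int,
                       p_min: int = 30, p_max: int = 160) -> List[float]:
    """Fill a gap of gap_len samples by cycling through the last pitch period."""
    period = history[-pitch_period(history, p_min, p_max):]
    return [period[n % len(period)] for n in range(gap_len)]

# Usage: a synthetic signal repeating every 100 samples; the gap follows sample 400.
rng = random.Random(2)
cycle = [rng.uniform(-1.0, 1.0) for _ in range(100)]
signal = cycle * 7
filled = repeat_last_period(signal[:400], 250)
```

For a strictly periodic input the replacement equals the true continuation; real speech is only quasi-stationary, which is why such schemes degrade for long gaps.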

Conventional concealment schemes are receiver-only, i.e., they introduce no additional processing and data overhead at the transmitter and are well suited for heterogeneous multicast environments. This means that transmitters may use different audio tools than the receivers, and receivers can mitigate packet loss according to their specific quality requirements.

However, their applicability is limited to isolated losses of small- to medium-sized packets (the quasi-stationary property of the signal can be assumed with high probability only for speech segments shorter than 40ms). Concealing with high output speech quality requires several successfully received packets after the gap, resulting in additional playout delay. As the (fixed) packetization interval is unrelated to the "importance" of the packet content and to changes in the speech signal, some parts of the signal cannot be concealed properly because entire phonemes are lost irrecoverably.

AP/C

In this work, we facilitate concealment by processing the undistorted signal at the sender, which results in adaptive packetization. A very small amount of redundancy is added, not for reconstruction but to support a possible concealment operation. It is thus possible to exploit the long-term correlation properties of speech not only for coding but also for loss recovery. We therefore propose Adaptive Packetization and Concealment (AP/C, [15]) to enhance the loss resiliency of applications, and discuss its applicability in the Internet/MBone environment.

Adaptive packetization

The part of the sender algorithm interfacing to the audio device copies PCM samples from the audio input device to its input buffer, computes the auto-correlation function of an input segment of at least 2 p_max samples, and returns the lag p(c) at which this function has its maximum (p_max being the correlation window size and c the "chunk" number; evaluation of the auto-correlation function starts at lag p_min, which constitutes a lower bound on possible chunk/packet sizes). Then the input buffer pointer is advanced by p(c) samples (thus constituting a "chunk"), c is incremented, and, if necessary, new audio samples are fetched from the audio device.
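The sender loop just described can be sketched as follows. This is a minimal sketch under assumptions the text does not fix: the auto-correlation is normalised (a correlation coefficient between a window of p_max samples and its lagged copy), the whole signal is available as a list instead of being read from an audio device, and the names `chunk_lag`, `adaptive_chunks` and `pair_into_packets` are hypothetical. The defaults p_min = 30 and p_max = 160 are the values reported with Fig. 3.

```python
import math
import random
from typing import List

def chunk_lag(x: List[float], start: int, p_min: int, p_max: int) -> int:
    """Return p(c): the lag in [p_min, p_max] maximising the normalised
    auto-correlation of the 2*p_max-sample segment beginning at `start`."""
    win = x[start:start + p_max]
    e_win = sum(v * v for v in win)
    best_lag, best_r = p_min, -math.inf
    for lag in range(p_min, p_max + 1):
        seg = x[start + lag:start + lag + p_max]
        num = sum(a * b for a, b in zip(win, seg))
        den = math.sqrt(e_win * sum(v * v for v in seg)) or 1.0
        if num / den > best_r:
            best_lag, best_r = lag, num / den
    return best_lag

def adaptive_chunks(x: List[float], p_min: int = 30, p_max: int = 160) -> List[int]:
    """Advance through x by p(c) samples per chunk and return the chunk sizes."""
    sizes: List[int] = []
    pos = 0
    while pos + 2 * p_max <= len(x):  # the analysis needs 2*p_max samples
        p = chunk_lag(x, pos, p_min, p_max)
        sizes.append(p)
        pos += p
    return sizes

def pair_into_packets(sizes: List[int]) -> List[List[int]]:
    """Combine two consecutive chunks into one packet (the last may hold one)."""
    return [sizes[i:i + 2] for i in range(0, len(sizes), 2)]

# Usage: a synthetic "voiced" signal that repeats exactly every 100 samples.
rng = random.Random(1)
cycle = [rng.uniform(-1.0, 1.0) for _ in range(100)]
signal = cycle * 20
sizes = adaptive_chunks(signal)
packets = pair_into_packets(sizes)
```

For this periodic input every chunk is one period (100 samples) long, so a two-chunk packet spans twice the pitch period, in line with the packet size distributions discussed with Fig. 3.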

If no periodicity is found in the signal (i.e., the content of the "chunk" is unvoiced speech or noise), p(c) is close to p_max (figure 1). Thus, by applying a fixed bound p_u (minimal length of a chunk classified as "unvoiced") to p(c) and p(c-1), as well as another bound Δp to the first derivative of p(c), it is possible to detect speech transitions. The detection routine may run in parallel and can be combined with silence detection.
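One possible reading of this detection rule is sketched below. The combination logic is an assumption: a chunk counts as unvoiced when p(c) reaches p_u, and a transition is reported only when the voicing class changes between chunks c-1 and c and the first difference |p(c) - p(c-1)| exceeds Δp. p_u = 120 is the value given with Fig. 3; Δp = 40 is a hypothetical value, and the function names are illustrative.

```python
from typing import List, Optional, Tuple

def classify(p: int, p_u: int) -> str:
    """'u' (unvoiced) if the chunk reaches the bound p_u, else 'v' (voiced)."""
    return "u" if p >= p_u else "v"

def detect_transition(p_prev: int, p_cur: int, p_u: int = 120,
                      delta_p: int = 40) -> Optional[str]:
    """Return 'vu', 'uv' or None for the chunk pair (c-1, c).

    Assumed rule: the voicing class must change at the p_u bound AND
    |p(c) - p(c-1)| must exceed delta_p (delta_p is a hypothetical value).
    """
    if abs(p_cur - p_prev) <= delta_p:
        return None
    kinds = classify(p_prev, p_u) + classify(p_cur, p_u)
    return kinds if kinds in ("vu", "uv") else None

def find_transitions(sizes: List[int], p_u: int = 120,
                     delta_p: int = 40) -> List[Tuple[int, str]]:
    """List (c, kind) for every chunk c at which a transition is detected."""
    found = []
    for c in range(1, len(sizes)):
        kind = detect_transition(sizes[c - 1], sizes[c], p_u, delta_p)
        if kind is not None:
            found.append((c, kind))
    return found
```

Requiring both conditions keeps small lag fluctuations around p_u from being reported as transitions.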

To reduce the header overhead, which would be prohibitive for IP if every chunk were sent in its own packet, two consecutive chunks are combined into one packet (see figures 1 and 2; s(n): time-domain signal; n: sample number).

Figure 1. AP/C sender operation: transition voiced/unvoiced

Figure 2. AP/C sender operation: transition unvoiced/voiced

Adaptive packetization of speech transitions

If a voiced/unvoiced (vu) transition has been detected, the "transition chunk" is partitioned into two parts c_a and c_b (8a/b in figure 1), with p(c_a) set to p(c-1) and p(c_b) = p(c) - p(c_a) (p(c) being the original chunk size). Note that if c mod 2 = 0, the chunk c-1 (no. 7 in figure 1) is sent as a packet containing just one chunk.

When an unvoiced/voiced (uv) transition has taken place, the backward correlation of the current chunk with the previous one (no. 3 in figure 2) is tested, as the previous chunk may already contain voiced data (due to the forward auto-correlation calculation). If so, the previous chunk is again partitioned, with p(c_b-1) = p_backward(c-1) and p(c_a-1) = p(c-1) - p(c_b-1) (p_backward being the result of the backward correlation). Note that this procedure can be performed only if c mod 2 = 0; otherwise the previous chunk has already been sent in a packet. A solution would be to always retain two unvoiced chunks and check whether the third contains a transition; however, the gain in speech quality when concealing would not justify the additional delay incurred.
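The two partitioning rules reduce to simple length arithmetic, sketched below. Two details are assumptions: chunks are numbered from 1 and paired as (1,2), (3,4), ..., which matches the figures, where chunk 7 is sent alone when chunk 8 is split; and the `min` guards, which keep a part from exceeding the chunk it is cut from, are added only for safety. The backward correlation itself is not shown; its result p_backward is passed in.

```python
from typing import Tuple

def split_vu(p_prev: int, p_cur: int) -> Tuple[int, int]:
    """v/u transition: split chunk c into c_a (voiced) and c_b (unvoiced).

    p(c_a) = p(c-1) and p(c_b) = p(c) - p(c_a), as in the text.
    """
    p_a = min(p_prev, p_cur)  # guard (assumption): c_a cannot exceed chunk c
    return p_a, p_cur - p_a

def split_uv(p_prev: int, p_backward: int) -> Tuple[int, int]:
    """u/v transition: split chunk c-1 when backward correlation finds voiced data.

    p(c_b-1) = p_backward(c-1) and p(c_a-1) = p(c-1) - p(c_b-1).
    Returns (p(c_a-1), p(c_b-1)).
    """
    p_b = min(p_backward, p_prev)  # guard (assumption)
    return p_prev - p_b, p_b

def previous_chunk_unsent(c: int) -> bool:
    """With pairs (1,2), (3,4), ..., chunk c-1 is still buffered only if c is even."""
    return c % 2 == 0
```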

With the above algorithm, "more important" (voiced) speech is sent in smaller packets, and thus the resulting loss impact/distortion is less significant than with fixed-size packets of the same average length, even without concealment (assuming that the network's loss probability is independent of the packet size and that the mean number of packets sent remains the same). To enable concealment at the receiver, the intra-packet boundary between the two chunks (i.e., p(c) of the first chunk in the packet) must be transmitted as additional information in the packet itself and in the following packet.

Figure 3. Packet size frequency distribution l n(l) / L for four different speakers (n(l): number of packets of length l; L: length of the test signal)

With our scheme, the packet size is now adaptive to the measured pitch period. Frequency distributions of packet sizes (weighted by the packet size itself to show the contribution to the entire test signal) for four different speakers in Fig. 3 show that the parameter settings can accommodate a range of pitches, as their overall shapes are similar to each other (parameters were: p_min=30 samples (start offset point of the auto-correlation); p_u=120; p_max=160; note that p_min <= l <= 2 p_max). The most common packets contain two voiced chunks (vv packets), as distributions are centered around a value that is twice the mean pitch period (i.e., the mean of voiced chunks).
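The weighting used in Fig. 3 can be computed as below. The sketch assumes that the packets cover the whole test signal, so that L equals the sum of all packet lengths; the weighted values then sum to one, and each value is the fraction of the signal carried in packets of that length. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List

def weighted_size_distribution(packet_sizes: List[int]) -> Dict[int, float]:
    """Return l * n(l) / L for every packet length l (L: total signal length)."""
    total = sum(packet_sizes)  # assumption: packets cover the whole signal
    counts = Counter(packet_sizes)
    return {l: l * n / total for l, n in sorted(counts.items())}
```

For example, for packet lengths [160, 160, 200, 80] the result is {80: 80/600, 160: 320/600, 200: 200/600}.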

Concealment

When detecting a lost packet (by keeping track of RTP [19] sequence numbers), the receiver can assume that the chunks of the lost packet resemble the adjacent chunks, owing to the pre-processing at the sender. To avoid discontinuities in the concealed signal, the adjacent chunks are copied and resampled (using a linear interpolator) to exactly fit the lost chunk sizes, which are given by the packet length and the transmitted intra-packet boundaries. No time-scale adjustment ([14]) is necessary because the chunk sizes are small. Because the sizes of the lost and the adjacent chunks most probably differ only slightly (and so do their spectra), no significant audible impact of the operation can be observed. Fig. 4 shows the concealment operation in the time domain.
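The copy-and-resample step can be sketched with a plain linear interpolator. Which neighbour replaces which lost chunk is our assumption: the first lost chunk is rebuilt from the last chunk of the preceding packet and the second from the first chunk of the following packet. `lost_sizes` stands for the two chunk lengths derived from the packet length and the transmitted boundary; the function names are hypothetical.

```python
from typing import List, Tuple

def resample_linear(chunk: List[float], target_len: int) -> List[float]:
    """Stretch or compress chunk to target_len samples by linear interpolation."""
    if target_len <= 0:
        return []
    if target_len == 1 or len(chunk) == 1:
        return [chunk[0]] * target_len
    step = (len(chunk) - 1) / (target_len - 1)  # maps first and last samples onto each other
    out = []
    for n in range(target_len):
        i = min(int(n * step), len(chunk) - 2)
        frac = n * step - i
        out.append(chunk[i] * (1.0 - frac) + chunk[i + 1] * frac)
    return out

def conceal_packet(left_chunk: List[float], right_chunk: List[float],
                   lost_sizes: Tuple[int, int]) -> List[float]:
    """Replace both chunks of a lost packet by resampled copies of its neighbours."""
    first, second = lost_sizes
    return resample_linear(left_chunk, first) + resample_linear(right_chunk, second)
```

For example, `resample_linear([0.0, 1.0, 2.0], 5)` yields [0.0, 0.5, 1.0, 1.5, 2.0].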

Figure 4. Concealment of a distorted signal (50% loss)

Concealment of speech transitions

Transitions in the signal might lead to extreme expansion/compression operations, because the length of an unvoiced chunk of a transition packet (denoted v|u or u|v) will usually be significantly smaller than in u|u packets (two unvoiced chunks). This is due to the chunk partitioning described in section "Adaptive packetization of speech transitions".

Table 1. Concealment of/with packets containing speech transitions

Left packet    Lost packet    Right packet    Expansion/compression
v | u_a        u_L u          -               u_a << u_L: expansion
-              u u_L          u_a | v         u_a << u_L: expansion
u u_a          u_L | v        -               u_a >> u_L: compression
-              v | u_L        u_a u           u_a >> u_L: compression
-              u (u|v)_L      v_a v           v_a << (u|v)_L: expansion
u (u|v)_a      v_L v          -               (u|v)_a >> v_L: compression

Table 1 lists the possible cases. v_a, u_a are (the relevant) voiced/unvoiced available chunks, and v_L, u_L are (the relevant) voiced/unvoiced lost chunks. A u (u|v) packet is a packet where the second chunk contains an unvoiced/voiced transition that was not recognized by the sender algorithm. To avoid high compression, adjacent samples of the relevant length are taken and inserted in the gap. An audible discontinuity which might occur can be avoided by overlap-adding the concealment chunk with the adjacent ones. High expansions can be avoided by repeating a chunk until the necessary length is achieved and then again overlap-adding it.
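The three options described above (copying adjacent samples instead of strong compression, repeating a chunk instead of strong expansion, and ordinary resampling otherwise) can be sketched as follows. The switching threshold `max_ratio` and the cross-fade length `overlap` are hypothetical parameters, and the interpretation of "adjacent samples" as the samples of the available chunk nearest to the gap is an assumption. The cross-fade into the received neighbours at the gap boundaries is omitted for brevity; only the joints between repetitions are overlap-added.

```python
from typing import List

def resample_linear(chunk: List[float], target_len: int) -> List[float]:
    """Linear-interpolation resampling of chunk to target_len samples."""
    if target_len == 1 or len(chunk) == 1:
        return [chunk[0]] * target_len
    step = (len(chunk) - 1) / (target_len - 1)
    out = []
    for n in range(target_len):
        i = min(int(n * step), len(chunk) - 2)
        frac = n * step - i
        out.append(chunk[i] * (1.0 - frac) + chunk[i + 1] * frac)
    return out

def repeat_with_crossfade(chunk: List[float], target_len: int, overlap: int) -> List[float]:
    """Expand by repeating chunk, overlap-adding `overlap` samples at each joint."""
    ov = max(0, min(overlap, len(chunk) - 1))
    out = list(chunk)
    while len(out) < target_len:
        base = len(out) - ov
        for k in range(ov):
            w = (k + 1) / (ov + 1)  # weight of the new repetition rises linearly
            out[base + k] = (1.0 - w) * out[base + k] + w * chunk[k]
        out.extend(chunk[ov:])
    return out[:target_len]

def fill_gap(available: List[float], target_len: int, chunk_is_left_of_gap: bool,
             max_ratio: float = 2.0, overlap: int = 8) -> List[float]:
    """Build a replacement of target_len samples from one available chunk."""
    if target_len <= 0:
        return []
    ratio = target_len / len(available)
    if ratio <= 1.0 / max_ratio:
        # Avoid strong compression: copy the samples that border the gap.
        return available[-target_len:] if chunk_is_left_of_gap else available[:target_len]
    if ratio >= max_ratio:
        # Avoid strong expansion: repeat the chunk with cross-faded joints.
        return repeat_with_crossfade(available, target_len, overlap)
    return resample_linear(available, target_len)
```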

Applicability to Internet telephony

Support for frame-based codecs

Two properties of modern, frame-based speech coders do not allow a straightforward application of AP/C:

Synchronization of coder and decoder (synchronization is lost during a packet loss gap; thus the decoding is worse after the gap due to previous coder state loss, especially for backward-adaptive coders [22])
Operation on (small) fixed size speech frames (e.g., F=10ms for G.729 [16], F=30ms for GSM [17] and G.723.1 [18]).

The first problem can only be alleviated either by trading higher loss resiliency against a higher bit-rate (using a nonadaptive codec: PCM) or, as a compromise, by using a hybrid codec (waveform/parametric), where the impact of a packet loss on subsequently decoded speech is less severe (see [20] with regard to the G.729 codec).

The second issue should be tackled in the long term by a close integration of coding and packetization, as well as decoding and concealment (+FEC) functions. However, to allow operation together with existing codecs, we evaluate a simple fragmentation scheme.

Fig. 5 shows the packetization when the speech boundaries found by the AP algorithm are used to map frames of length F to the packets actually sent over the network. As AP packets overlap the frame boundaries, a significant amount of redundant data, as well as additional alignment information (s_i), needs to be transmitted (although the redundant data can be used in a possible concealment operation: e.g., by overlap-adding it to the replacement signal). To allow analysis, we assume a constant AP packet size of l=kF+n, k and n being positive integers.

Figure 5. Packetization of a framed signal

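The fragmentation data "overhead" associated with packet i (packet i covering samples i l to (i+1) l - 1, frames starting at multiples of F, so that the alignment offset of packet i within its first frame is s_i = (i n) mod F) can then be written as follows:

$$o_i = F \left\lceil \frac{s_i + n}{F} \right\rceil - n$$

For a sequence of N packets, this results in

$$O_f = \sum_{i=0}^{N-1} o_i = \sum_{i=0}^{N-1} \left( F \left\lceil \frac{s_i + n}{F} \right\rceil - n \right)$$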

With F mod n = 0 (0 < n < F), we have O_f = N(F-n). Assuming n << F, O_f' = O_f / (2 p_v N) gives an indication of the relative fragmentation overhead which can be expected for different speakers/ranges of packet sizes (p_v being the mean pitch period). Table 2 compares this value to measurements. The fragmentation scheme results in a bit-rate increase: e.g., for the G.729 codec, from 8 kBit/s to 12-14.4 kBit/s. Whether this increase is justified by improved speech quality, or whether, for example, the built-in concealment of G.729 should be used instead, needs to be evaluated in a separate subjective test for each codec.
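The per-packet accounting behind O_f = N(F-n) can be checked numerically. The sketch assumes that packet i covers samples [i l, (i+1) l), that frames start at multiples of F, and that every frame touched by a packet is sent whole; the overhead is then the number of transmitted samples minus l. At 8 kHz, F = 10ms corresponds to 80 samples, and F/(2 p_v) approximates O_f' for n << F. The function names are hypothetical.

```python
from typing import List

def frame_overheads(F: int, k: int, n: int, N: int) -> List[int]:
    """Extra samples per packet when AP packets of length l = k*F + n are
    carried as whole frames of length F (frames aligned to multiples of F)."""
    l = k * F + n
    out = []
    for i in range(N):
        first = (i * l) // F               # frame containing the packet start
        end = -(-((i + 1) * l) // F)       # ceil: first frame after the packet end
        out.append((end - first) * F - l)
    return out

def relative_overhead_estimate(F: int, p_v: float) -> float:
    """O_f' for n << F, i.e. approximately F / (2 p_v)."""
    return F / (2.0 * p_v)
```

With F = 80 and n = 20 (so F mod n = 0), every packet carries F - n = 60 extra samples; `relative_overhead_estimate(80, 79.20)` gives about 0.505, close to the 50.50% listed for the "male low" speaker in Table 2.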

Table 2. Relative fragmentation overhead for four different speakers (mean pitch period: p_v) for F=10ms

Speaker        p_v [samples]   O_f' [%]   Relative overhead, measured [%]
Male low       79.20           50.50      48.86
Male high      67.05           59.65      58.36
Female low     57.74           69.27      64.40
Female high    49.88           80.20      76.36

Table 2 also shows that the mean value p_v of the chunks classified as voiced can be used as an estimate for an adaptive packetization "equivalent" packet size (cf. Table 3).

Subjective test results

To evaluate the properties and performance of AP/C, a subjective test was carried out. Test signals were the four signals (with different speakers) of approximately 10 seconds each, also used for the objective analysis (PCM 16 bit linear, sampled at 8 kHz). The new technique was compared with silence substitution (i.e., an adaptive packetization without concealment) and the simple receiver-based concealment algorithm "Pitch Waveform Replication" (PWR), which is the only one able to operate under very high loss rates (isolated losses). For PWR we used the same algorithm and fixed packet size (160 samples) as in [14].

Thirteen nonexpert listeners evaluated the overall quality of 40 test conditions (4 speakers x [3 algorithms x 3 loss rates + original]) on a five-category scale (Mean Opinion Score). Tests took place in a quiet room with the subjects using headphones.

The same packet loss pattern was applied to all input signals for one speaker (note that the sample loss pattern is different due to PWR working on fixed packet sizes only). To allow complete concealment and thus a relative evaluation of the algorithms, only isolated losses were introduced. Therefore we used a drop function which satisfies the condition P_i(i|i-1) = 0 (P_i(i|i-1) is the conditional probability of packet i being lost when packet i-1 has been lost) and approximates at the same time an equally distributed loss behavior with a given sample loss rate ([15]).
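A drop function with P_i(i|i-1) = 0 can be sketched as a two-state process. This is an illustration at the packet level, not the drop function of [15], which targets a sample loss rate for variable-size packets: after a received packet the next one is dropped with probability q = r/(1-r), after a lost packet never, which yields a long-run loss rate r for r <= 0.5. The function name and seed parameter are hypothetical.

```python
import random
from typing import List

def isolated_loss_pattern(num_packets: int, loss_rate: float, seed: int = 0) -> List[bool]:
    """True marks a lost packet; no two consecutive packets are ever lost."""
    if not 0.0 <= loss_rate <= 0.5:
        raise ValueError("isolated losses allow loss rates of at most 50%")
    q = loss_rate / (1.0 - loss_rate)  # stationary loss rate q / (1 + q) == loss_rate
    rng = random.Random(seed)
    lost: List[bool] = []
    prev_lost = False
    for _ in range(num_packets):
        cur = (not prev_lost) and rng.random() < q
        lost.append(cur)
        prev_lost = cur
    return lost
```

At 50% loss the pattern degenerates to strict alternation, which is the worst case still fully concealable with isolated losses.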

Before testing started, an "Anchoring" procedure took place, where the quality range (Original = 5, "Worst Case" signal = 1) was introduced. For this test we used the unconcealed 50% loss signal (with AP) as the "Worst Case" signal.

Figures 6-8 show the mean MOS values for the three algorithms (Silence Substitution, Pitch Waveform Replication, and AP/C). Figure 9 gives the respective standard deviations of the MOS. As loss values we give the actual sample loss rate instead of the packet loss rate, as we deal with variable size packets. The pitch frequency axis refers to the measured mean of voiced chunks.

It can be seen that for all speakers, AP/C leads to a significant enhancement in speech quality compared to the "silence substitution" case, which is maintained also for higher loss rates. However, for speakers with high pitch frequencies, the relative performance (vertical distance between the surfaces if put in one graph) decreases. A reason for this is the chosen start offset point p_min (= 30 samples) of the auto-correlation computation, which constitutes a lower bound on the chunk/packet size to avoid excessive packet header overhead, but also limits the accuracy of the periodicity measurement (note the small distance between the peak of the packet size distribution and the lower bound in Fig. 3 for the speaker with the highest pitch frequency: "female high").

The PWR algorithm performs well for loss rates of about 20% (cf. [14]); however, speech quality drops significantly for higher loss rates, as the specific distortions introduced by that algorithm become increasingly audible.

Figure 6. MOS for "Silence Substitution"

Figure 7. MOS for "Pitch Waveform Replication"

Figure 8. MOS for "Adaptive Packetization/Concealment"

Figure 9. Standard deviations of MOS values

Subjective tests have been performed with PCM samples; this carries the implicit assumption that the speech immediately after the gap is decoded properly (see section "Support for frame-based codecs").

Objective measurements are clearly inappropriate for PWR (which does not aim at a mathematical approximation of the missing signal segments). AP/C is not a reconstruction scheme either; however, the adaptive packetization and subsequent resampling should perform better in terms of mathematical correctness. The calculated overall SNR values for PWR (for the examples presented in this paper) are always below those for the distorted signal. The SNR values for AP/C are always above those for the distorted signal and at least 4dB higher than those for PWR. This confirms our conjecture; yet conclusions about speech quality should be based only on the subjective test results.
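For reference, an overall SNR of the kind compared here can be computed as the ratio of total signal energy to total error energy over the whole signal. That the paper uses exactly this definition (rather than, for example, a segmental SNR) is an assumption; the function name is hypothetical.

```python
import math
from typing import Sequence

def overall_snr_db(original: Sequence[float], processed: Sequence[float]) -> float:
    """10*log10(signal energy / error energy), computed over the whole signal."""
    signal = sum(s * s for s in original)
    noise = sum((s - r) ** 2 for s, r in zip(original, processed))
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)
```

Comparing `overall_snr_db(clean, distorted)` with `overall_snr_db(clean, concealed)` shows whether a concealment scheme moves the signal closer to or further from the original in the mean-square sense.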

Data overhead

Table 3 gives the packet header overhead for different speakers, based on the sum of actually measured packet sizes. For a high average pitch period (i.e., a low-pitched voice), the overhead is comparable to a typical parameter setting in IP networks (160 bytes (=20ms) of G.711 PCM audio in an IP/UDP/RTP packet [20+8+12 bytes header], resulting in 20% packet header overhead). However, the overhead increases as the mean pitch period decreases. But even for higher-pitched voices, the additional packet header overhead stays below 10%, which is comparable to adding a very low bit-rate additional source coding to reconstruct isolated losses ([9]).

Table 3. Relative cumulative header overhead O for AP, assuming o=40 bytes of per-packet overhead, for four different speakers (mean pitch period: p_v)

Speaker        o/(o+2p_v) [%]   O [%]
Male low       20.16            20.14
Male high      22.97            22.83
Female low     25.72            24.84
Female high    28.62            27.98
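Both columns of Table 3 follow from the same ratio of header bytes to total bytes. The sketch assumes one byte per sample (8-bit G.711 PCM, as in the 160-byte example above), so that a typical vv packet of 2 p_v samples carries 2 p_v payload bytes; the function names are hypothetical.

```python
from typing import List

def header_overhead(payload_bytes: List[int], o: int = 40) -> float:
    """Cumulative header overhead O over a sequence of packets."""
    headers = o * len(payload_bytes)
    return headers / (headers + sum(payload_bytes))

def header_overhead_estimate(p_v: float, o: int = 40) -> float:
    """Estimate o / (o + 2 p_v) from the mean pitch period (1 byte per sample)."""
    return o / (o + 2.0 * p_v)
```

For fixed 160-byte packets `header_overhead` returns 0.2, the 20% reference value, and `header_overhead_estimate(79.20)` gives about 0.2016 for the "male low" speaker.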

To support a possible concealment operation it is necessary to transmit the intra-packet boundary between two chunks as additional information in the packet itself and the following packet. That amounts to two octets of "redundancy" for every packet, that could, for instance, be transmitted by the proposed redundant encoding scheme ([21]).
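A hypothetical layout for these two octets is sketched below: one octet for the boundary p(c) of the first chunk in the current packet and one for that of the previous packet. One octet per boundary suffices as long as p_max does not exceed 255 samples (p_max = 160 in the experiments reported here); using 0 to mean "no previous packet" is our assumption, not part of the scheme or of [21].

```python
import struct
from typing import Optional, Tuple

def pack_boundaries(own: int, previous: Optional[int]) -> bytes:
    """Hypothetical 2-octet field: boundary of this packet, then of the previous one."""
    return struct.pack("!BB", own, previous if previous is not None else 0)

def unpack_boundaries(field: bytes) -> Tuple[int, int]:
    """Inverse of pack_boundaries; 0 for the previous boundary means 'none'."""
    return struct.unpack("!BB", field)
```

If the packet carrying the boundary is lost, the copy in the following packet still tells the receiver where to split the lost packet into two chunks.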

When the frame length F is significantly smaller than the mean packet size (section "Support for frame-based codecs"), support for frame-based codecs can be assured with a reasonable amount of additional data.

Implementation

The maximum additional delay introduced in the current implementation consists of

Time interval corresponding to the length of the additional buffered speech segment needed to create a packet (d_S,max = 2 p_max-p_min);
Time interval corresponding to one packet length after the loss was detected at the receiver (d_R,max = 2 p_max); and
Time needed for computations d_C,max
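As a worked example, assume the parameter values reported with Fig. 3 (p_min = 30, p_max = 160 samples) and the 8 kHz sampling rate f_s of the test signals; the two buffering terms then amount to:

$$d_{S,\max} = \frac{2 p_{\max} - p_{\min}}{f_s} = \frac{2 \cdot 160 - 30}{8000\ \mathrm{Hz}} = 36.25\ \mathrm{ms}, \qquad d_{R,\max} = \frac{2 p_{\max}}{f_s} = \frac{320}{8000\ \mathrm{Hz}} = 40\ \mathrm{ms}$$

so that, under these assumptions, the buffering delay stays at about 76 ms plus the computation time d_C,max.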

The computational complexity is low at the sender and very low at the receiver, as only simple operations (auto-correlation, sample rate conversion) have to be performed (thus d_C,max << d_S,max + d_R,max). This makes the scheme well suited for multicast environments with low-end receivers.

Backward compatibility with existing audio tools is ensured, as most tools can properly receive variable-length PCM packets (and then mix them into their output buffer); however, their delay adaptation algorithms might need to be modified.

Conclusions

A technique for the concealment of lost speech packets has been presented. The core idea of preprocessing a speech signal at the sender to support possible concealment operations at the receiver has proven to be successful. It results in an inherent adaptation of the network to the speech signal, as predefined portions of the signal ("chunks" assembled to packets) are dropped under congestion.

The subjective quality when using AP/C in conjunction with existing frame-based codecs needs to be evaluated in further subjective tests. In the long term, however, a more efficient scheme integrating the coder and appropriate packetization should be devised. We also plan to test more sophisticated speech classification/processing algorithms, always taking into account the trade-off between quality and computational complexity.

From the perspective of the network, the presented application-level scheme could be complemented by influencing loss patterns at congested routers (queue management), thus also supporting more fairness between flows by avoiding bursty losses within one flow.

Acknowledgments

We are grateful to members of the GloNe (Global Networking) research group at GMD Fokus for discussions and participation in the subjective test.

This work was funded in part by the BMBF (German Ministry of Education and Research) and the DFN (German Research Network) and in part by the EEC within the ACTS project AC012 MULTICUBE.

References

1: V. Kumar,
"The MBONE FAQ,"
http://www.mbone.com/mbone/mbone.faq.html, January 1997.
2: J.-C. Bolot, H. Crépin, and A.V. Garcia,
"Analysis of audio packet loss in the Internet,"
in Proceedings of the 5th International Workshop on Network and Operating System Support for Digital Audio and Video, Durham, NH, April 1995, pp. 163-174.
3: M. Yajnik, J. Kurose and D. Towsley,
"Packet loss correlation in the MBone multicast network,"
in Proceedings IEEE Global Internet 1996 (Jon Crowcroft and Henning Schulzrinne, eds.), London, England, November 1996, pp. 94-99.
4: J.-C. Bolot and A.V. Garcia,
"Control mechanisms for packet audio in the Internet,"
in Proceedings IEEE Infocom '96, San Francisco, CA, April 1996, pp. 232-239.
5: T. Turletti, S. Fosse Parisis, and J.-C. Bolot,
"Experiments with a layered transmission scheme over the Internet,"
Research report 3296, INRIA, November 1997.
6: S. McCanne, V. Jacobson, and M. Vetterli,
"Receiver-driven layered multicast,"
in Proceedings ACM SIGCOMM '96, Stanford, CA, September 1996, pp. 117-130.
7: R. Braden, D. Clark and S. Shenker,
"Integrated services in the Internet architecture: an overview,"
RFC 1633, IETF, 1994,
ftp://ftp.nordu.net/rfc/rfc1633.txt.
8: N. Shacham and P. McKenney,
"Packet recovery in high-speed networks using coding and buffer management,"
in Proceedings ACM SIGCOMM '90, San Francisco, CA, June 1990, pp. 124-131.
9: M. Handley, V. Hardman, M. Sasse, and A. Watson,
"Reliable audio for use over the Internet,"
in Proceedings INET'95, http://info.isoc.org/HMP/PAPER/070/abst.html, 1995.
10: M. Podolsky, C. Romer, and S. McCanne,
"Simulation of FEC-based error control for packet audio on the Internet,"
in Proceedings IEEE Infocom, San Francisco, CA, March 1998, pp. 48-52.
11: J. Rosenberg and H. Schulzrinne,
"An RTP Payload Format for Generic Forward Error Correction,"
Internet Draft, IETF Audio-Video Transport Group, November 1997,
ftp://ftp.nordu.net/internet-drafts/draft-ietf-avt-fec-01.txt.
12: C. Perkins,
"Options for repair of streaming media,"
Internet Draft, IETF Audio-Video Transport Group, January 1998,
ftp://ftp.nordu.net/internet-drafts/draft-ietf-avt-info-repair-02.txt.
13: N.S. Jayant and S.W. Christensen,
"Effects of packet losses in waveform coded speech and improvements due to an odd-even sample-interpolation procedure,"
IEEE Transactions on Communications, vol. COM-29, no. 2, pp. 101-109, February 1981.
14: H. Sanneck, A. Stenger, K. Ben Younes, and B. Girod,
"A new technique for audio packet loss concealment,"
in Proceedings IEEE Global Internet 1996 (Jon Crowcroft and Henning Schulzrinne, eds.), London, England, November 1996, pp. 48-52.
15: H. Sanneck,
"Concealment of lost speech packets using adaptive packetization,"
Proceedings IEEE Multimedia Systems 1998, Austin, TX, June 1998.
16: International Telecommunications Union,
"Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP),"
ITU-T Recommendation G.729, March 1996.
17: J. Degener,
"GSM 06.10 lossy speech compression,"
Documentation, TU Berlin, KBS, October 1996,
http://kbs.cs.tu-berlin.de/~jutta/toast.html.
18: International Telecommunications Union,
"Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s,"
ITU-T Recommendation G.723.1, March 1996.
19: H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson,
"RTP: A transport protocol for real-time applications,"
RFC 1889, IETF, January 1996,
ftp://ftp.nordu.net/rfc/rfc1889.txt.
20: J. Rosenberg,
"G.729 error recovery for Internet Telephony,"
Project report, Columbia University, 1997.
21: C. Perkins et al.,
"RTP payload for redundant audio data,"
RFC 2198, IETF, September 1997,
ftp://ftp.nordu.net/rfc/rfc2198.txt.
22: K. Clüver,
"Rekonstruktion fehlender Signalblöcke bei blockorientierter Sprachübertragung" (Reconstruction of missing signal blocks for block-oriented voice transmission),
PhD thesis, Telecommunications Department, Technical University of Berlin, January 1998.
23: M. Isenburg,
"Transmission of multimedia data over lossy networks,"
Technical Report TR-96-048, ICSI, 1996,
http://www.icsi.berkeley.edu/~isenburg/studyA4.ps.gz.
24: "Visual Audio Tool (VAT),"
LBNL Network Research Group
http://www-nrg.ee.lbl.gov/vat/.
25: "Robust Audio Tool (RAT),"
UCL, Dept. of Computer Science
http://www-mice.cs.ucl.ac.uk/mice/rat/.
26: "Freephone,"
INRIA
http://zenon.inria.fr/rodeo/fphone/.
27: "Network Voice Terminal (NeVoT),"
Columbia University, Dept. of Computer Science
http://www.cs.columbia.edu/~hgs/nevot/.
