This work is aimed at multimedia conferencing over the Internet. It draws on experience from multi-way multimedia conferencing demonstrations in Project MICE (Multimedia Integrated Conferencing for Europe), using the popular tool `vat', from Project ReLaTe (Remote Language Teaching over SuperJANET), and from preliminary experiments.
Experience has shown that the major problem for audio on the MBONE is packet loss. Packets are lost for a number of reasons: routers become congested, packets arrive too late to be played back, or scheduling difficulties arise in a multi-tasking operating system.
Experience has also shown that packet loss is a persistent problem, and it can be expected to get worse. From a user's point of view, packet loss severely disrupts speech intelligibility, even at very low loss rates. Consequently, a solution is required that keeps speech intelligible at all loss rates. Intelligible speech in these situations can only be achieved using redundancy: a separate `stream' of speech is transmitted in addition to the primary information.
Audio for transmission over the MBONE/Internet has to be split into packets, which are then launched onto the network. Packets may be delivered to the receiver out of order, or not at all, and packet arrival times are unpredictable. The receiver must therefore keep a number of packets `in hand' during play-out, so that it has a chance to re-order packets and to smooth out the variation in arrival times. Whenever one or more packets are lost, silence is played, which leads to the familiar speech clipping effects currently heard over the Internet. Unfortunately, because packets on the Internet are large, each loss has a serious effect on the intelligibility of speech.
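The play-out behaviour described above can be illustrated with a minimal sketch. It is not taken from `vat' or any other tool: the function name `play_out`, the tick-based timing model, and the `SILENCE` marker are all assumptions made for illustration. One packet slot is played per tick, and packet `seq` is due at tick `seq + depth`, where `depth` is the number of packets kept in hand.

```python
SILENCE = "<silence>"  # hypothetical marker for a gap in the output


def play_out(arrivals, n_packets, depth):
    """Simulate a receiver play-out buffer (illustrative model only).

    arrivals  -- one entry per tick: None, or a (seq, payload) tuple
    n_packets -- number of packets the sender transmitted
    depth     -- play-out delay in ticks (packets kept 'in hand')
    """
    buffer = {}   # packets received but not yet played, keyed by seq
    output = []   # what the listener actually hears, in order
    tick = 0
    while len(output) < n_packets:
        # Accept whatever arrives on this tick, unless its slot
        # has already been played (it arrived too late).
        if tick < len(arrivals) and arrivals[tick] is not None:
            seq, payload = arrivals[tick]
            if seq >= len(output):
                buffer[seq] = payload
        # Once the initial delay has passed, play one slot per tick.
        if tick >= depth:
            due = tick - depth
            output.append(buffer.pop(due, SILENCE))
        tick += 1
    return output


# Packets 1 and 2 arrive swapped; packet 3 never arrives.
arrivals = [(0, "a"), (2, "c"), (1, "b"), None, (4, "e")]
with_buffer = play_out(arrivals, n_packets=5, depth=2)
without_buffer = play_out(arrivals, n_packets=5, depth=0)
```

With two packets in hand the re-ordered packet is recovered and only the genuinely lost packet is replaced by silence. With no buffering, the late packet is also discarded and a second gap appears.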
This paper introduces a method of using cheap redundancy within the packets sent from the transmitter. The redundancy is synthetic speech (Linear Predictive Coding, LPC) which, when split into packets, adds only a very small amount of overhead to each packet. The redundancy for a given piece of speech is sent later in the train of packets than the primary speech information. A receiver that loses the primary speech information can therefore substitute something sensible in the output stream of speech, provided that the redundancy is received.
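The piggy-backing scheme can be sketched as follows. This is an illustration under stated assumptions, not the implementation described in the paper: `lpc_stub` is a hypothetical stand-in for a real LPC coder, and the dictionary packet layout, the `offset` parameter (how many packets later the redundancy travels), and the function names are invented for the example.

```python
SILENCE = "<silence>"  # hypothetical marker for an unrepaired gap


def lpc_stub(frame):
    """Hypothetical stand-in for an LPC coder: tags the frame only."""
    return "lpc(" + frame + ")"


def packetize(frames, offset=1):
    """Build packets carrying primary frame n plus a cheap synthetic
    copy of frame n - offset (None while no earlier frame exists)."""
    packets = []
    for n, frame in enumerate(frames):
        redundant = lpc_stub(frames[n - offset]) if n >= offset else None
        packets.append({"seq": n, "primary": frame, "redundant": redundant})
    return packets


def reconstruct(received, n_frames, offset=1):
    """Rebuild the output stream from the packets that arrived."""
    by_seq = {p["seq"]: p for p in received}
    out = []
    for n in range(n_frames):
        if n in by_seq:
            # Primary information arrived: play it unchanged.
            out.append(by_seq[n]["primary"])
        elif n + offset in by_seq:
            # Primary lost, but a later packet carries its redundancy.
            out.append(by_seq[n + offset]["redundant"])
        else:
            # Both the primary and its redundancy were lost.
            out.append(SILENCE)
    return out


frames = ["f0", "f1", "f2", "f3"]
packets = packetize(frames)

# Isolated loss of packet 1: repaired from packet 2.
isolated = reconstruct([packets[0], packets[2], packets[3]], len(frames))

# Loss of packets 1 and 2 in a row: frame 1's redundancy is lost too.
burst = reconstruct([packets[0], packets[3]], len(frames))
```

In this sketch an isolated loss is filled with the synthetic copy, while a run of consecutive losses longer than `offset` still leaves a gap, which is why repair depends on the redundancy actually being received.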
Preliminary experiments have been carried out on the perception of speech repaired with a synthetic substitute. The experiments measured speech intelligibility objectively and also included subjective evaluations. The results show that this technique is very successful at repairing speech with large packet sizes and at very high loss rates (results were taken at loss rates of up to 40%).
The paper also identifies how this technique might be used in a multicast audio tool for the MBONE, and describes the work that has been done towards implementing such a tool to transfer speech reliably across the Internet.