Woohyong Choi <whchoi@cosmos.kaist.ac.kr>
Kilnam Chon <chon@cosmos.kaist.ac.kr>
Korea Advanced Institute of Science and Technology
Korea
The current model for lightweight multicast session-based teleconferencing applications provides a very primitive set of control mechanisms such as net mutes mic and mic mutes net. Commercial products based on Recommendation T.124 are being introduced, and it seems likely that similar products will be derived for Internet usage. However, the Internet Engineering Task Force's current emphasis on wide-area scalable multicast-based conferencing is desirable, and we shouldn't sacrifice the benefits of multicast-based sessions to conform to the tightly coupled model of T.124.
This paper proposes a conference control model for lightweight sessions in which media applications can collaborate with a coordination tool to provide control. This tool provides a generic base to manage conferencing states and find agreements among the participants, whose varying policies could be implemented without much change to existing applications. A prototype of the coordination tool has been built and is being used.
Keywords: conference, control, scalability, MBone, multicast, Internet.
There has been much work in recent years on multimedia teleconferencing applications based on desktop computers. The previous generation of conferencing tools, such as mmconf [2], Etherphone [21], and the Touring Machine [3], were based on centralized architectures, in which a central application on a central machine acted as the repository for all information relating to the conference. Although simple to understand and simple to implement, this model proved to have a number of disadvantages, the most important of which was its disregard for the failures that arise when conferencing over a wide area [9].
Since early 1992, a multicast virtual network has been constructed over the Internet [1]. This multicast backbone, or MBone [14], has been used for a number of applications, including multimedia (audio, video, and shared workspace) conferencing. These applications include Visual Audio Tool (VAT) [12], INRIA Video Conferencing System (IVS) [19], Network Video (NV) [4], VIC [15], and Whiteboard (WB) [11], among others, and have proven successful, especially in terms of scalability.
An alternative approach to the centralized model is the lightweight session model promoted by Van Jacobson [13]. In this model, communication is regarded as inherently unreliable and applications are loosely coupled cooperating instantiations distributed over the network.
Recently, centralized conference systems based on the International Telecommunication Union (ITU)'s Recommendation T.124 [20] have been introduced, and it seems likely that similar ones will be derived for Internet usage [6]. However, the current emphasis of the Internet Engineering Task Force (IETF) on wide-area scalable multicast-based conferencing is desirable, and we shouldn't sacrifice the benefits of multicast-based sessions to conform to the ITU's centralized model [5].
This paper presents a conference control model for lightweight sessions. The model relies on a coordinator running on each conferee's host to manage shared states among the participants and control media applications. Policies are not tied to any specific mechanisms and can be easily changed as needed. To manage shared states, the coordinator uses an agreement protocol derived from a similar protocol [18] proposed by the Multiparty Multimedia Session Control (MMUSIC) working group [17] of IETF. Coordination among media applications and the coordination tool is made possible by the conference bus abstraction, briefly introduced in the design of VIC [15]. The coordination tool is demonstrated in several scenarios, particularly negotiation of media encoding and floor management.
Multimedia applications' domains vary immensely. The same tools are used for small (say 20 participants), highly interactive conferences and for large (500 participants) seminars, and developers are working toward broadcasts for millions of receivers.
Observations of MBone show that people can cope with some inconsistency arising from partitioned networks and lost messages, as long as the distributed state converges in time. Users prefer relatively open conferences but do not have ways to support the range of policies in diverse social and electronic conventions [18].
Because the various media in a conference session are handled by separate applications, we need a mechanism to coordinate them. The conference bus [15] provides this mechanism. Each application can broadcast a typed message on the bus and all applications that are registered to receive that message type will get a copy. The bus is currently used to support voice-switched windows in VIC, with cues from VAT to focus on the current speaker. The bus can also be used to provide controls for media applications.
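The typed-message behavior described above can be illustrated with a minimal in-process sketch. It is not the actual conference bus implementation: the class `ConferenceBus`, its `register` and `broadcast` methods, and the example addresses are hypothetical names chosen for illustration. Only the idea that each registered application gets a copy of a message of its type comes from the text.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ConferenceBus:
    """Toy in-process stand-in for the conference bus (hypothetical API)."""

    def __init__(self) -> None:
        # message type -> handlers registered for that type
        self._handlers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def register(self, msg_type: str, handler: Callable[[str], None]) -> None:
        """Subscribe a handler to one message type."""
        self._handlers[msg_type].append(handler)

    def broadcast(self, msg_type: str, payload: str) -> int:
        """Give a copy of the message to every registered handler; return how many."""
        handlers = list(self._handlers.get(msg_type, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = ConferenceBus()
focused = []  # what a VIC-like video tool would switch its window to
muted = []    # what a VAT-like audio tool would mute

bus.register("focus", focused.append)
bus.register("mute", muted.append)

bus.broadcast("focus", "alice@example.org")  # audio tool reports the current speaker
bus.broadcast("mute", "bob@example.org")     # a control message for the audio tool
unheard = bus.broadcast("floor", "carol@example.org")  # nobody registered for this type
```

In this sketch a message whose type has no registered receiver is silently dropped, which is one plausible reading of the broadcast semantics rather than documented behavior.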
The Conference Control Channel Protocol (CCCP) [9] abstracts a messaging channel and provides reliable/unreliable semantics using a simple distributed interprocess communication system. The protocol defines a class hierarchy, with an application type as the parent class and subclasses of network manager, member, and floor manager. It consists of a generic protocol used to talk between these classes and the application class and an interapplication announcement protocol.
Figure 1: Conceptualization of CCCP
The CCCP implementation has the following requirements, which originate primarily from the MICE [8] project and from multicast Internet conferencing:
The conference control architecture of CCCP allows floor control and session control applications to be modularized and replaced at will, although all the applications have to be modified.
The task of conference control can be divided as follows [9].
The problem of meta-conference management is outside the bounds of the conference control architecture, and should be addressed using tools such as Lawrence Berkeley Laboratory (LBL)'s Session Directory [10], traditional directory services, or external mechanisms such as e-mail [9]. The conference control system is intended to maintain consistency of state among the participants as far as is practical, and not to address the social issues of how to bring people together or coordinate initial information.
Membership control involves limiting or modifying participation and is entirely a key distribution/revocation problem [13]. This same problem appears in many other areas of Internet architecture and we will not cover it. We will also leave the job of network management to each media application and focus on the remaining elements of conference control.
Until now, we have discussed what the tasks of conference control for lightweight sessions should be. There are also nonfunctional requirements we need to consider. The conference control model should work with existing applications in multicast conferencing, provide the same basic facilities, and have scaling properties that are no worse than the media applications themselves. Any conference control scheme should not restrict use of the applications it controls, and therefore should not impose any single control policy. A class of 10-year-olds might use very different floor control from a class of graduate students.
The basic idea behind the conference control model is straightforward: we establish a conference bus [15] to exchange messages between media applications and the coordination tool. The coordination tool dictates behaviors of media applications as defined in the session policy specification and interprets session policies in a series of procedural commands.
Figure 2: Initiating a Conference
Policy specifications and initial values of variables are made available to the conference session directory via the session description protocol [7]. When the session directory spawns the coordination tool as well as each media application, the policy specification is passed to the coordination tool.
Figure 3: The Conference Control Model
The coordination tool lies in the central part of the conference control model and keeps shared variables consistent or changes them in accordance with the policy description. Upon changes to certain variables, the coordination tool puts messages on the conference bus so that media applications can take appropriate actions.
Figure 4: The Coordination Tool
There are two kinds of communication in the coordination tool. Communication with media applications is implemented through the conference bus; communication with other coordination tools is implemented with multicast sockets bound to another transport address of the current conference. The agreement protocol runs over the latter interface.
The coordination tool can send messages to media applications through the conference bus to ensure that the application follows the conference's session policy; the tool can also send mute or unmute messages. Media applications can also send messages to the coordination tool; VAT(audio) may ask the coordination tool for "floor." Messages in the conference bus can be defined as needed.
Our conference control model uses the MMUSIC agreement protocol [18] as an integral part of its architecture. Before we go into further details, this protocol needs to be discussed briefly.
The MMUSIC agreement protocol [18] provides a framework for expressing a broad family of policies for joint control of ephemeral states. These policies describe who can propose changes to state and the degree of consensus needed to enact them. The policies also describe to what extent the views of state must be consistent when voting and when all changes to state have been executed. The communication model described in this paper is assumed to be an unreliable shared-bus model, inherited from MBone multicast conferencing.
Policies are specified along three dimensions: initiation, voting, and consistency. We use a repeated transmission of state message that announces the current value of one or more state variables. We chose to announce the resulting variable, rather than the operation, as that relieves the need for all members to receive exactly the same set of change operations in order to eventually agree. The mechanisms that are applicable to the MBone rely on the set of messages:
Figure 5: The Agreement Protocol Illustrated
If the proposed operation requires a vote, then the message exchange is poll, response, and then a sequence of state messages. If no vote is required, then there is merely the sequence of state messages. Exactly how this sequence of state messages is sent out will determine the overhead of the algorithm and its correctness.
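The message sequence and the reason for announcing values rather than operations can be sketched as follows. This is an illustrative model, not the protocol implementation: `message_sequence`, `Member`, the repeat count, and the deterministic loss pattern are all assumptions made for the example.

```python
def message_sequence(requires_vote: bool, repeats: int = 3) -> list:
    """Message types an initiator sends or awaits for one change operation."""
    prefix = ["poll", "response"] if requires_vote else []
    return prefix + ["state"] * repeats


class Member:
    """A conferee that keeps its own view of the shared variables."""

    def __init__(self) -> None:
        self.state = {}

    def on_state(self, announced: dict) -> None:
        # A state message carries resulting values, so applying it is
        # idempotent: receiving it once or many times gives the same view.
        self.state.update(announced)


members = [Member() for _ in range(4)]
for round_no in range(2):                 # two repeated state transmissions
    for index, member in enumerate(members):
        # Deterministic stand-in for packet loss: each member misses
        # exactly one of the two rounds.
        if (round_no + index) % 2 == 1:
            member.on_state({"speaker": "alice"})

converged = all(m.state == {"speaker": "alice"} for m in members)
```

Every member misses one transmission here, yet all views end up identical, because no member needs to see the same set of messages as the others.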
This section discusses a coordination control tool prototyped from the model described above. Before we go into the details, we will discuss programming language issues. The policy specification in the coordination tool should support the following features:
To meet these requirements, the policy specification is implemented as a Tcl/Tk script [16]. Tcl is a simple scripting language that was originally developed as a generic command language for integrated circuit design. Tcl provides generic programming facilities, such as variables, loops, and procedures, that are useful for a variety of applications. Furthermore, its interpreter is a library of C procedures that can easily be incorporated into applications, and each application can extend the core Tcl features with additional commands. One of the most useful extensions to Tcl is Tk, which is a toolkit for the X window system. Together with Tcl, Tk provides a programming system for developing and using graphical user interfaces. We used a distributed programming extension to Tcl/Tk called Tcl-DP for the development of the coordination tool.
The coordination tool is composed of the following parts:
When the coordination tool is initiated, the policy interpreter parses a policy specification and binds each policy with a user interface object.
Figure 6: User Interface of the Coordination Tool
Users initiate a vote by selecting a menu item under the policy menu button. The command on the menu item is executed and the initiator sends a poll message to the relevant conferees. Conferees participating in the vote are prompted with a dialog asking whether they support the proposal. Upon receiving the user's answer, each conferee's coordination tool replies with a response message. The initiator collects response messages for a limited time, which is a linear function of the time-to-live value of the conferencing session, and checks whether the poll has passed each time it receives a response message. If the poll passes, the initiator sends out state messages periodically. Figure 4 depicts the dynamics of the coordination tool.
There are three dimensions to policy: initiation, voting, and consistency. Any change operation that doesn't require a vote can be modeled as a vote with a pass-always condition, and any initiation policy can be folded into the casting of a vote. Consistency is always supported by the agreement protocol, and therefore any policy statement can be specified in the form of a vote. A policy statement can thus be specified as a tuple of the following variables.
Poll(label, initiator, pass-condition, query-dialogue, notify-dialogue, pass-code, reject-dialogue, fail-code, variable i, value value-i, variable j, value value-j, ...)
The following syntax is used to specify policies. The policy description is written in two parts; one declares shared variables and the other declares any operation upon those variables.
global V(variable) {value}
...
[ct_pack label initiator voting-condition query-dialog notify-dialog
pass-code reject-dialog fail-code num-var
{ variable1 value1 ... }]
Each of the items written in italics is explained below. The syntax used to describe the policies is that of Tcl. When there is no explicit notation given, the Tcl notation is assumed.
$myself, $creator are examples of such variables.
pass-always, majority, and unanimous are examples of predefined keywords.
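How a voting condition might be evaluated can be sketched as below. The keywords `pass-always`, `majority`, and `unanimous` and the vote values `yes`, `no`, and `abstain` come from this paper. The exact thresholds, the `electorate` parameter (the number of conferees currently known to be alive), and the treatment of any other condition as the name of a single conferee whose approval is required are assumptions made for illustration.

```python
def poll_passes(condition: str, votes: dict, electorate: int) -> bool:
    """Decide whether a poll passes, given responses keyed by conferee name.

    votes maps a conferee to "yes", "no", or "abstain"; conferees who have
    not responded yet are simply absent from the mapping.
    """
    yes = sum(1 for vote in votes.values() if vote == "yes")
    if condition == "pass-always":
        return True
    if condition == "majority":
        # Assumption: more than half of all live conferees must say yes.
        return yes > electorate / 2
    if condition == "unanimous":
        # Assumption: every live conferee must say yes; abstaining blocks.
        return yes == electorate
    # Assumption: any other condition names one conferee (for example the
    # chairperson) whose "yes" is required.
    return votes.get(condition) == "yes"


responses = {"a@example.org": "yes", "b@example.org": "yes", "c@example.org": "abstain"}
majority_ok = poll_passes("majority", responses, electorate=3)
unanimous_ok = poll_passes("unanimous", responses, electorate=3)
```

Counting against the electorate rather than against the responses received so far lets an initiator re-evaluate the poll after each incoming response without passing it prematurely.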
The agreement protocol used in the coordination tool works in a simple manner (Figure 7). The initiator sends a poll message to the multicast channel to which all coordination tools are subscribed. When other conferees receive the poll message, they send response messages to the initiator via the multicast channel. The initiator collects response messages for a certain period of time and updates the variable when the voting condition is passed. State messages are sent periodically to keep consistency among the conferees.
Figure 7: Agreement Protocol Messages
Alive messages are periodically sent by all coordination tools participating in the conference to identify conferees who are alive. The frequency with which alive and state messages are sent is determined by the number of participants, so that the total bandwidth they consume stays bounded by a constant. Because the overhead is kept constant by reducing the update frequency as membership increases, the delay in learning the new state also grows with the number of members.
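The trade-off between constant bandwidth and update delay can be made concrete with a small calculation. The function name, the parameters, and the minimum interval below are hypothetical; the paper does not give the formula the prototype uses. The sketch only assumes that every member sends messages of equal size and that the aggregate rate must stay within a fixed budget.

```python
def announce_interval(members: int, message_bytes: int,
                      budget_bytes_per_s: float,
                      min_interval_s: float = 1.0) -> float:
    """Seconds each member waits between its own alive/state messages.

    With `members` senders each emitting `message_bytes` once per interval,
    the aggregate rate is members * message_bytes / interval. Solving for
    the interval that meets the budget gives the expression below. The
    lower bound min_interval_s is an added assumption so that very small
    sessions do not send needlessly often.
    """
    return max(min_interval_s, members * message_bytes / budget_bytes_per_s)


small = announce_interval(members=10, message_bytes=100, budget_bytes_per_s=500.0)
large = announce_interval(members=100, message_bytes=100, budget_bytes_per_s=500.0)
aggregate_large = 100 * 100 / large  # bytes per second across the whole session
```

Going from 10 to 100 members stretches the per-member interval by a factor of ten while the aggregate rate stays at the budget, which is why the delay in learning new state grows with membership.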
This is similar to the algorithm used in the Internet teleconferencing tools VAT and NV to maintain lists of members of the conference. Each new member is apprised of the current state by the incoming state messages. The lack of an initial state status exchange allows this mechanism to efficiently support an open membership policy (anybody who wants to can join). Membership is announced merely by beginning to send state messages; the new member need not contact an old member to be initiated into the group.
The protocol messages are formatted as plain ASCII text. There are four message types defined in the agreement protocol.
Items written in italics above have the following meanings
yes, no, or abstain.
{ variable1 value1 ... }
Media applications such as VAT, VIC, and WB are already designed to support the conference bus. The user interface portions of these applications are built with Tcl so that new message types can be easily handled by the applications. The focus message supports the voice-switched window feature in VAT and VIC.
Whenever a new message type is defined, it can be declared as a cb_dispatch handle and the handler function can be bound with the dispatch handle. For example, the mute message in VAT can be implemented as follows.
# conference bus API
# $cb send "mute $cname"
set cb_dispatch(mute) mute_someone
proc mute_someone cname {
    audio $cname mute
}
Because many of the MBone applications currently use Tcl to control user interface parts, messages for the conference bus are written in Tcl.
The coordination tool described in the previous section can support various conference policies. We now show some uses of the coordination tool in action.
In an explicitly chaired conference, a chairperson decides when someone can send audio and video. There are three policy descriptions: Request Floor, Release Floor, and Revoke Floor.
global V(chair) whchoi@cosmos.kaist.ac.kr
global V(speaker) ""
[ct_pack "Request Floor" $V(myself) $V(chair)
    "Can I speak next time?" "You can speak now"
    { confbus "mute all"
      confbus "unmute $V(speaker)" }
    "You are not allowed to talk right now" { } 1
    { speaker $myself }]
[ct_pack "Release Floor" $V(speaker) pass-always
    "" ""
    { confbus "mute $V(speaker)" }
    "" { } 1
    { speaker "unknown" }]
[ct_pack "Revoke Floor" $V(chair) pass-always
    "" ""
    { confbus "mute $V(speaker)" }
    "" { } 1
    { speaker "unknown" }]
The variables used in the policy description are declared first. The first policy states that any conferee needs the explicit permission of the chairperson before speaking. The second states that the speaker returns the floor after finishing. The third states that the chairperson can revoke the floor.
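The effect of the three policies on the shared speaker variable can be traced with a small simulation. This is a sketch of the intended behavior, not the coordination tool itself: the class `ChairedFloor`, its method names, and the example addresses are hypothetical, and the order in which the variable update and the bus commands happen is an assumption. The bus commands are simply logged as strings.

```python
class ChairedFloor:
    """Toy model of Request Floor, Release Floor, and Revoke Floor."""

    NO_SPEAKER = "unknown"  # value the policies assign when the floor is free

    def __init__(self, chair: str) -> None:
        self.chair = chair
        self.speaker = self.NO_SPEAKER
        self.bus_log = []  # conference bus commands that would be sent

    def request(self, who: str, chair_approves: bool) -> bool:
        """Request Floor: passes only if the chairperson votes yes."""
        if not chair_approves:
            return False
        self.speaker = who
        self.bus_log += ["mute all", f"unmute {who}"]
        return True

    def _clear(self) -> None:
        self.bus_log.append(f"mute {self.speaker}")
        self.speaker = self.NO_SPEAKER

    def release(self, who: str) -> bool:
        """Release Floor: only the current speaker may initiate it."""
        if self.speaker == self.NO_SPEAKER or who != self.speaker:
            return False
        self._clear()
        return True

    def revoke(self, who: str) -> bool:
        """Revoke Floor: only the chairperson may initiate it."""
        if who != self.chair or self.speaker == self.NO_SPEAKER:
            return False
        self._clear()
        return True


floor = ChairedFloor("chair@example.org")
granted = floor.request("alice@example.org", chair_approves=True)
refused = floor.release("bob@example.org")  # bob is not the speaker
revoked = floor.revoke("chair@example.org")
```

The initiator checks in `release` and `revoke` correspond to the initiator field of each policy, while the approval flag in `request` stands in for the chairperson's vote.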
In the token-passing conference, the potential speaker asks the current token holder for the floor. This is very similar to the previous example, but there is no conference moderator. The policy description includes failure recovery code to handle the case in which no current speaker is defined.
global V(speaker)
[ct_pack "Request Floor" $V(myself) $V(speaker)
    "Can I speak next time?" "You can speak now"
    { confbus "mute all"
      confbus "unmute $V(speaker)" }
    "You are not allowed to talk right now"
    { if {$V(speaker) == "unknown"} {
          set V(speaker) $myself
          reenter
      } }
    1 { speaker $myself }]
Assume that unanimous agreement is required to change the audio format. The following example shows a policy specification to change the current audio format to a low-bandwidth format.
[ct_pack "Change Audio to Low" $V(myself) unanimous
    "Can I speak next time?" "Audio format changed to low quality GSM"
    { confbus "select_format gsm 4" }
    "Audio format could not be changed" { } 1
    { audio "gsm 4" }]
After investigating the prior work in conference control for lightweight sessions, we developed the following requirements for the conference control model:
To meet these requirements, we proposed our own conference control model. The model has a coordination tool in the central part of the architecture that uses an agreement protocol to manage shared states among the conferees. The agreement protocol assumes an unreliable shared-bus communication model, as in Internet multicast communications. The coordination tool can collaborate with media applications via the conference bus.
A prototype of the coordination tool has been implemented in about 1,000 lines of Tcl-DP code. This compactness is made possible by the high-level communication and string manipulation functions provided in Tcl-DP.
Two key changes have been made possible by the model. First, it does not rely on cooperation among all the remote participants in a session. Misbehaving participants cannot cause problems because they will be muted by all participants that follow the protocol. Another benefit is that the conference policy can be changed at will because the model has been designed to separate policies from the mechanisms that implement them.
The model can be integrated with session directory tools and session description protocols to keep directory information up to date so that latecomers can join the conference without any problems. The coordination tool is currently in an early stage of development. We hope to release it once it becomes more usable. Further updates on this work will be available from http://cosmos.kaist.ac.kr/~whchoi/ct.