A Conference Control Model for Lightweight Sessions

Woohyong Choi <whchoi@cosmos.kaist.ac.kr>
Kilnam Chon <chon@cosmos.kaist.ac.kr>
Korea Advanced Institute of Science and Technology
Korea

Abstract

The current model for lightweight multicast session-based teleconferencing applications provides only a very primitive set of control mechanisms, such as "net mutes mic" and "mic mutes net". Commercial products based on Recommendation T.124 are being introduced, and it seems likely that similar products will be derived for Internet usage. However, the Internet Engineering Task Force's current emphasis on wide-area scalable multicast-based conferencing is desirable, and we shouldn't sacrifice the benefits of multicast-based sessions to conform to the tightly coupled model of T.124.

This paper proposes a conference control model for lightweight sessions in which media applications can collaborate with a coordination tool to provide control. This tool provides a generic base to manage conferencing states and find agreements among the participants, whose varying policies could be implemented without much change to existing applications. A prototype of the coordination tool has been built and is being used.

Keywords: conference, control, scalability, MBone, multicast, Internet.

Introduction

There has been much work in recent years on multimedia teleconferencing applications based on desktop computers. The previous generation of conferencing tools, such as mmconf [2], Etherphone [21], and the Touring Machine [3], were based on centralized architectures, in which a central application on a central machine acted as the repository for all information relating to the conference. Although simple to understand and simple to implement, this model proved to have a number of disadvantages, the most important of which was the disregard for the failures arising from conferencing over a wide area [9].

Since early 1992, a multicast virtual network has been constructed over the Internet [1]. This multicast backbone, or MBone [14], has been used for a number of applications, including multimedia (audio, video, and shared workspace) conferencing. These applications include Visual Audio Tool (VAT) [12], INRIA Video Conferencing System (IVS) [19], Network Video (NV) [4], VIC [15], and Whiteboard (WB) [11], among others, and have proven successful, especially in terms of scalability.

An alternative approach to the centralized model is the lightweight session model promoted by Van Jacobson [13]. In this model, communication is regarded as inherently unreliable and applications are loosely coupled cooperating instantiations distributed over the network.

Recently, centralized conference systems based on the International Telecommunication Union (ITU)'s Recommendation T.124 [20] have been introduced, and it seems likely that similar ones will be derived for Internet usage [6]. However, the current emphasis of the Internet Engineering Task Force (IETF) on wide-area scalable multicast-based conferencing is desirable, and we shouldn't sacrifice the benefits of multicast-based sessions to conform to the ITU's centralized model [5].

This paper presents a conference control model for lightweight sessions. The model relies on a coordinator running on each conferee's host to manage shared states among the participants and control media applications. Policies are not tied to any specific mechanisms and can be easily changed as needed. To manage shared states, the coordinator uses an agreement protocol derived from a similar protocol [18] proposed by the Multiparty Multimedia Session Control (MMUSIC) working group [17] of IETF. Coordination among media applications and the coordination tool is made possible by the conference bus abstraction, briefly introduced in the design of VIC [15]. The coordination tool is demonstrated in several scenarios, particularly negotiation of media encoding and floor management.

Related work

MBone applications

Multimedia applications' domains vary immensely. The same tools are used for small (say 20 participants), highly interactive conferences and for large (500 participants) seminars, and developers are working toward broadcasts for millions of receivers.

Observations of MBone show that people can cope with some inconsistency arising from partitioned networks and lost messages, as long as the distributed state converges in time. Users prefer relatively open conferences but do not have ways to support the range of policies in diverse social and electronic conventions [18].

Because the various media in a conference session are handled by separate applications, we need a mechanism to coordinate them. The conference bus [15] provides this mechanism. Each application can broadcast a typed message on the bus and all applications that are registered to receive that message type will get a copy. The bus is currently used to support voice-switched windows in VIC, with cues from VAT to focus on the current speaker. The bus can also be used to provide controls for media applications.
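
The bus semantics described above can be illustrated with a minimal in-process sketch. It is written in Python rather than the Tcl used by the MBone tools, and the class name ConferenceBus and its register and send methods are hypothetical; the real bus carries messages between separate applications rather than between objects in one process.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ConferenceBus:
    """Local typed-message bus; a stand-in for the real conference bus."""

    def __init__(self) -> None:
        # message type -> handlers registered for that type
        self._handlers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def register(self, msg_type: str, handler: Callable[[str], None]) -> None:
        """Ask to receive a copy of every message of the given type."""
        self._handlers[msg_type].append(handler)

    def send(self, msg_type: str, payload: str) -> int:
        """Deliver payload to every handler registered for msg_type.

        Returns the number of handlers that received a copy.
        """
        handlers = self._handlers.get(msg_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


# A video tool registers for "focus"; an audio tool announces the speaker.
focused: List[str] = []
bus = ConferenceBus()
bus.register("focus", focused.append)
delivered = bus.send("focus", "alice@example.org")
ignored = bus.send("mute", "bob@example.org")  # nobody registered for "mute"
```

Because senders name only a message type, a new tool such as a coordinator can be added without changing the tools that already listen on the bus.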

Conference Control Channel Protocol

The Conference Control Channel Protocol (CCCP) [9] abstracts a messaging channel and provides reliable/unreliable semantics using a simple distributed interprocess communication system. The protocol defines a class hierarchy, with an application type as the parent class and subclasses of network manager, member, and floor manager. It consists of a generic protocol used to talk between these classes and the application class and an interapplication announcement protocol.

Figure 1: Conceptualization of CCCP

CCCP implementation has the following requirements, originated primarily from the MICE [8] project and from multicast Internet conferencing:

The conference control architecture of CCCP allows floor control and session control applications to be modularized and replaced at will, although all the applications have to be modified.

Requirements

The task of conference control can be divided as follows [9].

Application control
Applications need to be started with the correct initial state, and the knowledge of their existence must be propagated across all participating sites. Applications may need to cooperate (for example, to achieve audio and video synchronization).
Membership control
Who is currently in the conference and has access to what applications.
Floor management
Who or what has control over the input to particular applications.
Network management
Requests to set up and tear down media connections between end points (no matter whether they be analog through a video switch, a request to set up an ATM virtual circuit, or using RSVP [22] over the Internet) and requests from the network to change bandwidth usage because of congestion.
Meta-conference management
How to initiate and finish conferences, how to advertise their availability, and how to invite people to join.

The problem of meta-conference management is outside the bounds of the conference control architecture, and should be addressed using tools such as Lawrence Berkeley Laboratory (LBL)'s Session Directory [10], traditional directory services, or external mechanisms such as e-mail [9]. The conference control system is intended to maintain consistency of state among the participants as far as is practical, and not to address the social issues of how to bring people together or coordinate initial information.

Membership control involves limiting or modifying participation and is entirely a key distribution/revocation problem [13]. This same problem appears in many other areas of Internet architecture and we will not cover it. We will also leave the job of network management to each media application and focus on the remaining elements of conference control.

Until now, we have discussed what the tasks of conference control for lightweight sessions should be. There are also nonfunctional requirements we need to consider. The conference control model should work with existing applications in multicast conferencing, provide the same basic facilities, and have scaling properties that are no worse than the media applications themselves. Any conference control scheme should not restrict use of the applications it controls, and therefore should not impose any single control policy. A class of 10-year-olds might use very different floor control from a class of graduate students.

Model

The basic idea behind the conference control model is straightforward: we establish a conference bus [15] to exchange messages between media applications and the coordination tool. The coordination tool interprets session policies as a series of procedural commands and dictates the behavior of media applications as defined in the session policy specification.

Figure 2: Initiating a Conference

Policy specifications and initial values of variables are made available to the conference session directory via the session description protocol [7]. When the session directory spawns the coordination tool as well as each media application, the policy specification is passed to the coordination tool.

Figure 3: The Conference Control Model

The coordination tool lies in the central part of the conference control model and keeps shared variables consistent or changes them in accordance with the policy description. Upon changes to certain variables, the coordination tool puts messages on the conference bus so that media applications can take appropriate actions.

Figure 4: The Coordination Tool

There are two kinds of communication in the coordination tool. Communication with media applications is implemented through conference buses; communication with other coordination tools is implemented as multicast sockets bound to another transport address of the current conference. The agreement protocol is bound on the latter part of the communication interface.

The coordination tool can send messages to media applications through the conference bus to ensure that each application follows the conference's session policy; for example, the tool can send mute or unmute messages. Media applications can also send messages to the coordination tool; for example, VAT (audio) may ask the coordination tool for the floor. Messages on the conference bus can be defined as needed.

Our conference control model uses the MMUSIC agreement protocol [18] as an integral part of its architecture. Before we go into further details, this protocol needs to be discussed briefly.

MMUSIC agreement protocol

The MMUSIC agreement protocol [18] provides a framework for expressing a broad family of policies for joint control of ephemeral states. These policies describe who can propose changes to state and the degree of consensus needed to enact them. The policies also describe to what extent the views of state must be consistent when voting and when all changes to state have been executed. The communication model is assumed to be an unreliable shared bus, as inherited from MBone multicast conferencing.

Policies are specified along three dimensions: initiation, voting, and consistency. We use repeated transmission of a state message that announces the current value of one or more state variables. We chose to announce the resulting variable, rather than the operation, as that relieves the need for all members to receive exactly the same set of change operations in order to eventually agree. The mechanisms that are applicable to the MBone rely on the following set of messages:

Figure 5: The Agreement Protocol Illustrated

Poll(Id: id, Operation: op, Variable: i, Value: value-i, Variable: j, Value: value-j ...)
Asks for a vote on the proposed operation, with result as shown. The response is {YES, NO, ABSTAIN}.
Response(Id: id, Response: response)
Response to poll message.
State(Variable: i, Value: value-i, Variable: j, Value: value-j, ... , Time: timestamp)
This announces the content of state variable i (several state variables can be included in the same message).

If the proposed operation requires a vote, then the message exchange is poll, response, and then a sequence of state messages. If no vote is required, then there is merely the sequence of state messages. Exactly how this sequence of state messages is sent out will determine the overhead of the algorithm and its correctness.
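
The advantage of announcing values rather than operations can be seen in a small Python sketch. It assumes, as one plausible reading of the Time field in the State message, that a receiver keeps whichever value carries the newest timestamp; the class and method names are hypothetical and the paper does not prescribe this rule.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class State:
    """A state announcement: (variable, value) pairs plus a timestamp."""
    values: Tuple[Tuple[str, str], ...]
    timestamp: float


class SharedState:
    """One participant's view of the shared variables."""

    def __init__(self) -> None:
        # variable -> (value, timestamp of the announcement that set it)
        self._vars: Dict[str, Tuple[str, float]] = {}

    def apply(self, msg: State) -> None:
        """Adopt announced values unless a newer announcement was already seen."""
        for name, value in msg.values:
            current = self._vars.get(name)
            if current is None or msg.timestamp > current[1]:
                self._vars[name] = (value, msg.timestamp)

    def get(self, name: str) -> str:
        return self._vars[name][0]


old = State((("speaker", "alice"),), timestamp=1.0)
new = State((("speaker", "bob"),), timestamp=2.0)

a, b, c = SharedState(), SharedState(), SharedState()
a.apply(old)
a.apply(new)   # receives both announcements in order
b.apply(new)
b.apply(old)   # receives them reordered; the stale one is ignored
c.apply(new)   # lost the first announcement entirely
```

All three views end with the same speaker even though each saw a different message history, which would not hold if the messages carried operations that every member had to apply.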

Design

This section discusses a coordination control tool prototyped from the model described above. Before we go into the details, we will discuss programming language issues. The policy specification in the coordination tool should support the following features:

Graphical user interface
Conferencing is fundamentally a human endeavor and the policy specification involves interactions with users.
Messaging
To send messages over the conference bus and to the other coordination tools in the conference.
Standard procedural language features
It is hard to specify policies except procedurally [13].

To meet these requirements, the policy specification is implemented as a Tcl/Tk script [16]. Tcl is a simple scripting language that was originally developed as a generic command language for integrated circuit design. Tcl provides generic programming facilities, such as variables, loops, and procedures, that are useful for a variety of applications. Furthermore, its interpreter is a library of C procedures that can easily be incorporated into applications, and each application can extend the core Tcl features with additional commands. One of the most useful extensions to Tcl is Tk, which is a toolkit for the X window system. Together with Tcl, Tk provides a programming system for developing and using graphical user interfaces. We used a distributed programming extension to Tcl/Tk called Tcl-DP for the development of the coordination tool.

Coordination tool

The coordination tool is composed of the following parts:

Agreement protocol engine
Keeps the states in the coordination tool consistent with states in the other coordination tools. In the event of changes in variables, appropriate messages are sent to the conference bus to control media applications.
Policy interpreter
Interprets the policy specified in Tcl commands and hooks up policies with user interface glues.
User interface
Displays participants of the conference, states of the variables, and debug messages. Prompts dialog boxes when the coordination tool needs users' attention.

When the coordination tool is initiated, the policy interpreter parses a policy specification and binds each policy with a user interface object.

Figure 6: User Interface of the Coordination Tool

Runtime description

Users initiate a vote by selecting a menu item under the policy menu button. The command on the menu item is executed and the initiator sends a poll message to the relevant conferees. Conferees participating in the vote are prompted with a dialog asking whether they support the proposal. Upon receiving the user's response, each conferee's coordination tool replies with a response message. The initiator collects response messages for a limited period, which is a linear function of the time-to-live value of the conferencing session, and checks whether the poll has passed each time it receives a response message. If the poll passes, the initiator sends out state messages periodically. Figure 4 depicts the dynamics of the coordination tool.

Policy specification

There are three dimensions to policy: initiation, voting, and consistency. Any change operation that doesn't require a vote can be modeled as a vote with a pass-always condition, and any initiation policy can be folded into the description of a vote. Consistency is always supported by the agreement protocol, and therefore any policy statement can be specified in the form of a vote. A policy statement can now be specified as a tuple of the following variables.

Poll(label, initiator, pass-condition, query-dialogue, notify-dialogue, pass-code, reject-dialogue, fail-code, variable i, value value-i, variable j, value value-j, ...)

The following syntax is used to specify policies. The policy description is written in two parts; one declares shared variables and the other declares any operation upon those variables.

global V(variable) {value}
...

[ct_pack label initiator voting-condition query-dialog notify-dialog
pass-code reject-dialog fail-code num-var

{ variable1 value1 ... }]

Each of the items written in italics is explained below. The syntax used to describe the policies is that of Tcl. When there is no explicit notation given, the Tcl notation is assumed.

label
String to be bound on the menu item under the policy menu of the coordination tool.
initiator
Designates who can initiate the operation. A variable or a list of variables can be used, although the coordination tool prototype doesn't support a list yet. Predefined variables such as $myself and $creator can be used as the initiator string.
voting-condition
Predefined keywords or participants from which positive response should be collected to pass the vote. pass-always, majority, and unanimous are examples of predefined keywords.
query-dialogue
Query string to be used in the dialog asking for a confirmation.
notify-dialogue
Notify string to be used in the dialog when the vote is passed.
reject-dialogue
Notify string to be used in the dialog when the vote is rejected.
pass-code
Tcl code to be invoked when the poll passes. This is usually used to control media applications via messages over the conference bus.
fail-code
Tcl code to be invoked when the poll fails. This is usually used to handle failure recoveries.
num-var
Number of variables passed in the variable list.
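
How a voting-condition might be evaluated is sketched below in Python. The paper names the keywords but does not define their exact semantics, so the rules here are assumptions: majority means more than half of the known voters answered yes, unanimous means every known voter answered yes, and any other string is treated as the identifier of a participant whose approval is required. The function name poll_passes is hypothetical.

```python
from typing import Dict, Set


def poll_passes(condition: str, responses: Dict[str, str], voters: Set[str]) -> bool:
    """Return True once the collected responses satisfy the voting condition.

    condition -- "pass-always", "majority", "unanimous", or a participant id
    responses -- participant id -> "yes" | "no" | "abstain"
    voters    -- participants currently believed to be alive
    """
    # Only count yes votes from participants we believe are in the conference.
    yes = {who for who, answer in responses.items()
           if answer == "yes" and who in voters}
    if condition == "pass-always":
        return True
    if condition == "majority":
        return len(yes) * 2 > len(voters)
    if condition == "unanimous":
        return len(voters) > 0 and yes == voters
    # Otherwise the condition names a participant whose approval is required.
    return responses.get(condition) == "yes"


voters = {"a", "b", "c"}
```

Since the check depends only on the responses gathered so far, an initiator can call it after each incoming response and stop waiting as soon as it returns True.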

Agreement protocol

The agreement protocol used in the coordination tool works in a simple manner (Figure 7). The initiator sends a poll message to the multicast channel to which all coordination tools are subscribed. When other conferees receive the poll message, they send response messages to the initiator via the multicast channel. The initiator collects response messages for a certain period of time and updates the variable when the voting condition is met. State messages are sent periodically to maintain consistency among the conferees.

Figure 7: Agreement Protocol Messages

Alive messages are sent periodically by all coordination tools participating in the conference to identify conferees who are alive. The frequency with which alive and state messages are sent is scaled to the number of participants so that the total bandwidth stays bounded by a constant. As a consequence, the delay in learning new state grows with the number of members, because the overhead is kept constant by reducing the update frequency as membership increases.
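
The scaling rule can be made concrete with a short Python sketch. The function name, the byte budget, and the lower bound on the interval are all assumptions chosen for illustration; the paper states only that the aggregate bandwidth is held constant.

```python
def announce_interval(members: int, message_bytes: int,
                      budget_bytes_per_s: float,
                      min_interval_s: float = 1.0) -> float:
    """Seconds each member should wait between its own announcements.

    If every member sends one message of message_bytes per interval, the
    aggregate traffic is members * message_bytes / interval. Holding that
    at budget_bytes_per_s gives interval = members * message_bytes / budget.
    """
    if members < 1 or message_bytes < 1 or budget_bytes_per_s <= 0:
        raise ValueError("members, message_bytes and budget must be positive")
    return max(min_interval_s, members * message_bytes / budget_bytes_per_s)


small = announce_interval(members=10, message_bytes=100, budget_bytes_per_s=500)
large = announce_interval(members=100, message_bytes=100, budget_bytes_per_s=500)
tiny = announce_interval(members=2, message_bytes=100, budget_bytes_per_s=500)
```

With these example numbers, a tenfold increase in membership lengthens each member's interval tenfold, which is the growth in state-learning delay described above.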

This is similar to the algorithm used in the Internet teleconferencing tools VAT and NV to maintain lists of members of the conference. Each new member is apprised of the current state by the incoming state messages. The lack of an initial state status exchange allows this mechanism to efficiently support an open membership policy (anybody who wants to can join). Membership is announced merely by beginning to send state messages; the new member need not contact an old member to be initiated into the group.

The protocol messages are formatted as plain ASCII text. There are four message types defined in the agreement protocol.

poll
src poll id dst
response
src response id dst response
status
src status num-var var-list
alive
src alive name

Items written in italics above have the following meanings:

src
Identifier of the source
dst
Identifier of the destination
id
Unique identification number given to the poll message generated by the source. The tuple of src and id uniquely identifies a given poll.
response
Response to a poll. May be yes, no, or abstain.
num-var
Number of variables in the var-list.
var-list
List of variables and values. { variable1 value1 ... }
name
Human-readable form of the conferee's name
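
A parser for the four message types might look like the following Python sketch. It makes simplifying assumptions that the paper does not state: fields are separated by whitespace, the poll id is a decimal integer, dst is a single token, and variable values contain neither whitespace nor braces. The function name parse_message and the dictionary keys are hypothetical.

```python
from typing import Dict, Union

Message = Dict[str, Union[str, int, Dict[str, str]]]


def parse_message(line: str) -> Message:
    """Parse one agreement-protocol message line into a dictionary."""
    # Every message starts with the source identifier and the message type.
    src, kind, rest = line.split(None, 2)
    if kind == "poll":
        poll_id, dst = rest.split()
        return {"src": src, "type": kind, "id": int(poll_id), "dst": dst}
    if kind == "response":
        poll_id, dst, answer = rest.split()
        if answer not in ("yes", "no", "abstain"):
            raise ValueError(f"bad response value: {answer}")
        return {"src": src, "type": kind, "id": int(poll_id),
                "dst": dst, "response": answer}
    if kind == "status":
        num_var, var_list = rest.split(None, 1)
        # var-list looks like "{ variable1 value1 ... }"
        tokens = var_list.strip().strip("{}").split()
        if len(tokens) != 2 * int(num_var):
            raise ValueError("num-var does not match var-list")
        return {"src": src, "type": kind,
                "vars": dict(zip(tokens[0::2], tokens[1::2]))}
    if kind == "alive":
        # The human-readable name may contain spaces, so keep the remainder.
        return {"src": src, "type": kind, "name": rest}
    raise ValueError(f"unknown message type: {kind}")


poll = parse_message("alice poll 7 bob")
reply = parse_message("bob response 7 alice yes")
status = parse_message("alice status 2 { speaker bob audio gsm }")
alive = parse_message("alice alive Woohyong Choi")
```

The pair of src and id recovered from a poll or response is what lets an initiator match responses to the poll it issued.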

Conference bus messages

Media applications such as VAT, VIC, and WB are already designed to support conference buses. The user interface portions of these applications are built with Tcl so that new message types can be easily handled by the applications. The message focus supports the voice-switched window feature in VAT and VIC.

Whenever a new message type is defined, it can be declared as a cb_dispatch handle and the handler function can be bound with the dispatch handle. For example, the mute message in VAT can be implemented as follows.


# conference bus API
# $cb send "mute $cname"

set cb_dispatch(mute) mute_someone

proc mute_someone cname {
    audio $cname mute
}

Because many of the MBone applications currently use Tcl to control user interface parts, messages for the conference bus are written in Tcl.

Application

The coordination tool described in the previous section can support various conference policies. We now show some uses of the coordination tool in action.

Explicitly chaired conference

In an explicitly chaired conference, a chairperson decides when someone can send audio and video. There are three policy descriptions: Request Floor, Release Floor, and Revoke Floor.


global V(chair) whchoi@cosmos.kaist.ac.kr
global V(speaker) ""

[ct_pack "Request Floor" $V(myself) $V(chair) "Can I speak next time?"
"You can speak now" { confbus "mute all"; confbus "unmute $V(speaker)" }
"You are not allowed to talk right now" { } 1 { speaker $myself } ]

[ct_pack "Release Floor" $V(speaker) pass-always "" ""
{ confbus "mute $V(speaker)" } "" { } 1 { speaker "unknown" } ]

[ct_pack "Revoke Floor" $V(chair) pass-always "" ""
{ confbus "mute $V(speaker)" } "" { } 1 { speaker "unknown" } ]

The variables used in the policy description are declared first. The first policy states that any conferee needs the explicit permission of the chairperson before she or he can talk. The second states that the speaker returns the floor whenever she or he finishes talking. The third states that the floor can be revoked by the chairperson.
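
The effect of these three policies on the shared speaker variable can be traced with a small Python sketch. It is not the coordination tool's implementation: the chairperson's vote is passed in as a boolean, None stands in for the "unknown" speaker value, and bus_log merely records the strings that would be sent on the conference bus. All names are hypothetical.

```python
from typing import List, Optional


class ChairedFloor:
    """Floor state for a chaired conference, following the three policies."""

    def __init__(self, chair: str) -> None:
        self.chair = chair
        self.speaker: Optional[str] = None
        self.bus_log: List[str] = []  # messages that would go on the bus

    def request(self, who: str, chair_says_yes: bool) -> bool:
        """Request Floor: passes only if the chairperson answers yes."""
        if not chair_says_yes:
            return False
        self.speaker = who
        self.bus_log += ["mute all", f"unmute {who}"]
        return True

    def release(self, who: str) -> bool:
        """Release Floor: only the current speaker may initiate it."""
        if who != self.speaker:
            return False
        self._clear()
        return True

    def revoke(self, who: str) -> bool:
        """Revoke Floor: only the chairperson may initiate it."""
        if who != self.chair or self.speaker is None:
            return False
        self._clear()
        return True

    def _clear(self) -> None:
        self.bus_log.append(f"mute {self.speaker}")
        self.speaker = None


floor = ChairedFloor(chair="chair")
denied = floor.request("alice", chair_says_yes=False)
granted = floor.request("alice", chair_says_yes=True)
not_speaker = floor.release("bob")
revoked = floor.revoke("chair")
```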

Token-passing conference

In the token-passing conference, the potential speaker asks the current token holder for the floor. This is very similar to the previous example, but there is no conference moderator. The policy description includes failure recovery code to handle the case in which no current speaker is defined.


global V(speaker)

[ct_pack "Request Floor" $V(myself) $V(speaker) "Can I speak next time?"
"You can speak now" { confbus "mute all"; confbus "unmute $V(speaker)" }
"You are not allowed to talk right now"
{ if {$V(speaker) == "unknown"} { set V(speaker) $myself; reenter } }
1 { speaker $myself } ]

Changing the audio format

Assume that unanimous agreement is required to change the audio format. The following example shows a policy specification to change the current audio format to a low-bandwidth format.


[ct_pack "Change Audio to Low" $V(myself) unanimous
"Change audio format to low quality GSM?" "Audio format changed to low quality GSM"
{ confbus "select_format gsm 4" } "Audio format could not be changed"
{ } 1 { audio "gsm 4" } ]

Conclusion

After investigating the prior work in conference control for lightweight sessions, we developed the following requirements for the conference control model:

To meet these requirements, we proposed our own conference control model. The model has a coordination tool in the central part of the architecture that uses an agreement protocol to manage shared states among the conferees. The agreement protocol assumes an unreliable shared-bus communication model, as in Internet multicast communications. The coordination tool can collaborate with media applications via the conference bus.

A prototype of the coordination tool has been implemented in about 1,000 lines of Tcl-DP code. This compactness is made possible by the high-level communication and string manipulation functions provided in Tcl-DP.

The model offers two key benefits. First, it does not rely on cooperation among all the remote participants in a session. Misbehaving participants cannot cause problems because they will be muted by all participants that follow the protocol. Second, the conference policy can be changed at will because the model has been designed to separate policies from the mechanisms that implement them.

The model can be integrated with session directory tools and session description protocols to keep directory information up-to-date so that latecomers can join the conference without any problems. The coordination tool is currently in an early stage of development. We hope to release it when it becomes more usable. Further updates on this work will be available from http://cosmos.kaist.ac.kr/~whchoi/ct.

References

  1. S. Casner, First IETF Internet Audiocast, ACM SIGCOMM Computer Communications Review, July 1992.
  2. T. Crowley et al., MMConf: An infrastructure for building Shared Multimedia Applications, In Proceedings of CSCW'90, Los Angeles, USA, October 1990.
  3. M. Arango et al., The Touring Machine System, Communications of the ACM, 36(1), January 1993.
  4. R. Frederick, nv UNIX Manual Pages, Xerox Palo Alto Research Center, Palo Alto, USA.
  5. M. Handley, Minutes of the Multiparty Multimedia Session Control Working Group, The 31st Internet Engineering Task Force Meeting, July 1995.
  6. M. Handley, J. Crowcroft, and C. Bormann, The Internet Multimedia Conferencing Architecture, Internet Draft, February 1996.
  7. M. Handley and V. Jacobson, SDP: Session Description Protocol, Internet-Draft, November 1996.
  8. M. Handley, P. Kirstein, and A. Sasse, Multimedia Integrated Conferencing for European Researchers (MICE): Piloting Activities and the Conference Management and Multiplexing Center, Computer Networks and ISDN Systems, p. 26, November 1993.
  9. M. Handley, I. Wakeman, and J. Crowcroft, The Conference Control Channel Protocol (CCCP): A scalable base for building conference control applications, In Proceedings of ACM SIGCOMM'95, Boston, USA, August 1995.
  10. V. Jacobson and S. McCanne, sd UNIX Manual Pages, Lawrence Berkeley Laboratory, Berkeley, USA.
  11. V. Jacobson and S. McCanne, Using the LBL Network Whiteboard, Lawrence Berkeley Laboratory, Berkeley, USA.
  12. V. Jacobson and S. McCanne, vat UNIX Manual Pages, Lawrence Berkeley Laboratory, Berkeley, USA.
  13. V. Jacobson, S. McCanne, and S. Floyd, A Conferencing Architecture for Light-weight Sessions, Technical Report, Lawrence Berkeley Laboratory, November 1993.
  14. M. Macedonia and D. Brutzman, MBONE provides Audio and Video across the Internet, IEEE Computer, 27(4), April 1994.
  15. S. McCanne and V. Jacobson, vic: A Flexible Framework for Packet Video, In Proceedings of ACM Multimedia'95, San Francisco, USA, November 1995.
  16. J. Ousterhout, Tcl and the Tk Toolkit, Addison-Wesley, 1994.
  17. E. Schooler, R. Lang, and M. Handley, Charter of the Multiparty Multimedia Session Control Working Group, Internet Engineering Task Force.
  18. S. Shenker, A. Weinrib, and E. Schooler, Managing Shared Ephemeral Teleconferencing State: Policy and Mechanism, Internet Draft, July 1995.
  19. T. Turletti, INRIA Video Conferencing System (ivs), Institut National de Recherche en Informatique et en Automatique, France.
  20. International Telecommunication Union, ITU Recommendation T.124: Generic Conference Control.
  21. H. Vin et al., Multimedia Conferencing in the Etherphone Environment, IEEE Computer, 24(10), October 1991.
  22. L. Zhang, S. Deering, D. Estrin, S. Shenker, and D. Zappala, RSVP: A New Resource ReSerVation Protocol, IEEE Network, September 1993.