Re: [codec] #16: Multicast?

"Christian Hoene" <> Wed, 21 April 2010 17:12 UTC

From: Christian Hoene <>
To: 'stephen botzko' <>
Date: Wed, 21 Apr 2010 18:58:19 +0200

The teleconferencing issue is getting complex. I am trying to compile the different requirements that have been mentioned on this list:
- low complexity (with just one active speaker) vs. multiple-speaker mixing vs. spatial audio/stereo mixing
- centralized vs. distributed
- few participants vs. hundreds of listeners and talkers
- individual distribution of audio streams vs. IP multicast or RTP group communication
- efficient encoding of multiple streams having the same content (but different quality)
- I bet I missed some.
To make things easier, why not split the teleconferencing scenario into two: high quality and scalable?
The high-quality scenario, intended for a small number of users, could have features like
- Distributed processing and mixing
- High computational resources to support spatial audio mixing (at the receiver) and multiple encodings of the same audio stream at different qualities (at the sender)
- Enough bandwidth to allow direct N-to-N transmission of audio streams (no multicast or group communication). This would be good for latency, too.
The scalable scenario is the opposite:
- Central processing and mixing for many participants.
- N-to-1 and 1-to-N communication using efficient distribution mechanisms (RTP group communication and IP multicast).
- Low-complexity mixing of many streams using tricks like VAD, encoding at the lowest rate to support many receivers with different paths, you name it...
Then we would not have to compare apples with oranges all the time.
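
To get a rough feeling for why the two scenarios scale so differently, here is a small back-of-the-envelope sketch in Python. It only counts unidirectional audio streams in the network. The function names are made up for this illustration, and the multicast case is simplified: it assumes one shared downlink mix for everybody and ignores that active talkers would normally need a mix without their own voice.

```python
def mesh_streams(n):
    """Direct N-to-N: every participant sends one unicast stream to each other participant."""
    return n * (n - 1)


def bridge_unicast_streams(n):
    """Central mixer: N uplinks to the bridge plus N individual downlinks."""
    return 2 * n


def bridge_multicast_streams(n):
    """Central mixer with one shared multicast downlink (simplifying assumption)."""
    return n + 1


def compare(sizes):
    """Return one row (n, mesh, bridge unicast, bridge multicast) per conference size."""
    return [
        (n, mesh_streams(n), bridge_unicast_streams(n), bridge_multicast_streams(n))
        for n in sizes
    ]


if __name__ == "__main__":
    print(f"{'N':>5} {'mesh':>8} {'bridge':>8} {'mcast':>8}")
    for n, mesh, bridge, mcast in compare([4, 8, 100]):
        print(f"{n:>5} {mesh:>8} {bridge:>8} {mcast:>8}")
```

With 4 participants the mesh needs 12 streams, which is manageable. With 100 participants it needs 9900, while the central variants stay at 200 and 101. This is only a stream count; it says nothing about bit rate, latency or mixing cost.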
Dr.-Ing. Christian Hoene
Interactive Communication Systems (ICS), University of Tübingen 
Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532 
From: [] On Behalf Of stephen botzko
Sent: Wednesday, April 21, 2010 4:34 PM
To: Colin Perkins
Subject: Re: [codec] #16: Multicast?

Stephen Botzko
On Wed, Apr 21, 2010 at 8:17 AM, Colin Perkins <> wrote:
On 21 Apr 2010, at 12:20, Marshall Eubanks wrote:
On Apr 21, 2010, at 6:48 AM, Colin Perkins wrote:
On 21 Apr 2010, at 10:42, codec issue tracker wrote:
#16: Multicast?
 Reporter:  hoene@…             |       Owner:
     Type:  enhancement         |      Status:  new
 Priority:  trivial             |   Milestone:
Component:  requirements        |     Version:
 Severity:  Active WG Document  |    Keywords:
The question arose whether the interactive CODEC MUST support multicast in addition to teleconferencing.

On 04/13/2010 11:35 AM, Christian Hoene wrote:
P.S. On the same note, does anybody here care about using this CODEC with multicast? Is there a single commercial multicast voice
deployment? From what I've seen, all multicast does is make IETF voice standards harder to understand or implement.

I think that would be a mistake to ignore multicast - not because of multicast itself, but because of Xcast (RFC 5058) which is a
promising technology to replace centralized conference bridges.

Regarding multicast:

I think we should start with user requirements and scenarios. Teleconferencing (including mono or spatial audio) might be a good starting
point. Virtual environments like Second Life would require multicast communication, too. Once the requirements of these scenarios are
well understood, we can start to talk about potential solutions like IP multicast, Xcast or conference bridges.

RTP is inherently a group communication protocol, and any codec designed for use with RTP should consider operation in various
different types of group communication scenario (not just multicast). RFC 5117 is a good place to start when considering the
different types of topology in which RTP is used, and the possible placement of mixing and switching functions which the codec will
need to work with.

It is not clear to me what supporting multicast would entail here. If this is a codec over RTP, then what is to stop it from being
multicast?
Nothing. However, group conferences implemented using multicast require end-system mixing of potentially large numbers of active
audio streams, whereas those implemented using conference bridges do the mixing in a single central location, and generally suppress
all but one speaker. The differences in mixing and in the number of simultaneously active streams that might be received potentially
affect the design of the codec.

Conference bridges with central mixing almost always mix multiple speakers.  As you add more streams into the mix, you reduce the
chance of missing onset speech and interruptions, but raise the noise floor. So even if complexity is not a consideration, there is
value in gating the mixer (instead of always doing a full mix-minus).
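
As a rough illustration of what gating the mixer means, here is a minimal Python sketch. It is not taken from any particular bridge: the function names, the mean-square energy test standing in for a real VAD, the threshold and the limit of two active speakers are all assumptions made for the example. Frames are plain lists of 16-bit PCM sample values.

```python
def frame_energy(frame):
    """Mean-square energy of one frame; a crude stand-in for a real VAD."""
    return sum(s * s for s in frame) / len(frame)


def gated_mix_minus(frames, max_active=2, threshold=1000.0):
    """Mix only the loudest active speakers, and never a listener's own signal.

    frames: dict mapping participant name -> list of PCM samples (equal lengths).
    Returns a dict mapping each participant to the frame they should hear.
    """
    if not frames:
        return {}
    energies = {p: frame_energy(f) for p, f in frames.items()}
    # Gate: keep only speakers above the threshold, loudest first, at most max_active.
    candidates = [p for p in frames if energies[p] >= threshold]
    active = sorted(candidates, key=lambda p: energies[p], reverse=True)[:max_active]

    length = len(next(iter(frames.values())))
    out = {}
    for listener in frames:
        mix = [0] * length
        for speaker in active:
            if speaker != listener:  # mix-minus: leave out the listener's own voice
                mix = [a + b for a, b in zip(mix, frames[speaker])]
        # Clip to the 16-bit sample range.
        out[listener] = [max(-32768, min(32767, s)) for s in mix]
    return out


if __name__ == "__main__":
    frames = {
        "alice": [1000, -1000, 1000, -1000],
        "bob": [500, 500, -500, -500],
        "carol": [3, -2, 1, 0],  # background noise only
    }
    for name, mixed in gated_mix_minus(frames).items():
        print(name, mixed)
```

In this example carol's low-level noise never enters anybody's mix, so the noise floor does not grow with the number of silent participants. The price is the one described above: a threshold that is too high, or too few allowed speakers, clips speech onsets and interruptions.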

More on point, compressed domain mixing and easy detection of VAD have both been advocated on these lists, and both simplify the
large-scale mixing problem.

Colin Perkins
