Re: [codec] #16: Multicast?

"Christian Hoene" <> Wed, 21 April 2010 19:29 UTC

From: Christian Hoene <>
To: 'stephen botzko' <>
References: <> <> <> <> <> <000001cae173$dba012f0$92e038d0$@de> <> <001101cae177$e8aa6780$b9ff3680$@de> <>
In-Reply-To: <>
Date: Wed, 21 Apr 2010 21:27:03 +0200
Message-ID: <002d01cae188$a330b2c0$e9921840$@de>
Subject: Re: [codec] #16: Multicast?

If we take those two scenarios (high-quality and scalable teleconferencing), what, then, are the CODEC requirements?
High quality:
-          Much the same requirements as end-to-end audio transmission: high quality and low latency.
-          Maybe additionally: variable bit rate encoding to achieve a multiplexing gain at the receiver,
-          and thus a fast control loop to cope with variable bit rates on the transmission paths.
-          Maybe stereo/multichannel support to deliver spatial audio to headphones or loudspeakers.
Scalable:
-          Efficient encoding/transcoding for multiple different qualities (at the conference bridge).
-          The control loop need not react (quickly), because (multicast) group communication requires encoding at low quality anyway.
-          Low-complexity receiver-side activity detection for music and voice (for the conference bridge).
-          Efficient mixing of two to four(?) active flows (is this achievable without completely decoding and re-encoding?).
Are any teleconferencing requirements missing?
Dr.-Ing. Christian Hoene
Interactive Communication Systems (ICS), University of Tübingen 
Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532 
From: stephen botzko [] 
Sent: Wednesday, April 21, 2010 8:19 PM
To: Christian Hoene
Subject: Re: [codec] #16: Multicast?
On Wed, Apr 21, 2010 at 1:27 PM, Christian Hoene <> wrote:
Hi Stephen,
Not bad. You answered faster than the mailing list distributes…
Not sure how that happened! 
Comments inline:
From: stephen botzko [] 
Sent: Wednesday, April 21, 2010 7:10 PM
To: Christian Hoene

Subject: Re: [codec] #16: Multicast?
I agree there are lots of use cases.

Though I don't see why high quality has to be given up in order to be scalable.  
CH: This is just our experience from the lab. A spatial audio conference server, including the acoustic 3D sound rendering, needs a LOT
of processing power. In the end, we have to remain realistic: processing power is always limited, so if each client needs a lot of it,
we cannot serve many clients.
Also, I am not sure why you think central mixing is more scalable than multicast (or why you think it is lower quality either).
CH: With multicast, you need N separate 1:N multicast distribution trees (somewhat less than O(n²) in total). With central mixing you
need N times 2 transmission paths (O(n)). Also, with distributed mixing you need a mixer at each of the N clients. With centralized
mixing, you can live with one mix for all (plus some tricks for serving the talkers).
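A minimal Python sketch of the counting argument above, under stated assumptions: every site may talk, each multicast sender roots its own tree reaching the other N - 1 sites, and the bridge serves each active talker with its own mix-minus (one reading of "some tricks for serving the talkers"). It counts receiver deliveries, not physical link load; shared tree branches make real multicast link load lower. All function names are hypothetical.

```python
def multicast_deliveries(n_sites: int) -> int:
    """Stream deliveries when every site multicasts to all the others.

    Each of the N senders roots its own 1:N tree reaching the other N - 1
    sites, giving N * (N - 1) deliveries, i.e. O(N^2). Shared tree branches
    mean the actual link load is somewhat lower than this count.
    """
    return n_sites * (n_sites - 1)


def bridge_paths(n_sites: int) -> int:
    """Unicast paths with a central bridge: one up and one down per site, O(N)."""
    return 2 * n_sites


def mixes_needed(n_sites: int, active_talkers: int, centralized: bool) -> int:
    """Number of separate mixes computed per audio frame.

    Distributed: every client mixes for itself, so N mixes.
    Centralized (assumption): one common mix for all listeners plus one
    mix-minus per active talker, so talkers do not hear themselves.
    """
    return 1 + active_talkers if centralized else n_sites


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(f"N={n:3d}  multicast deliveries={multicast_deliveries(n):5d}  "
              f"bridge paths={bridge_paths(n):4d}  "
              f"mixes: distributed={mixes_needed(n, 2, False):3d}, "
              f"centralized={mixes_needed(n, 2, True)}")
```

At N = 64 this gives 4032 multicast deliveries against 128 bridge paths, which is the gap CH is pointing at; the reply below argues that activity-based stream selection keeps the per-client share of those deliveries small in practice.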
I agree you need more distribution trees for multicast if you allow every site to talk. There is a corresponding benefit, since
there is no central choke point and also less bandwidth on shared WAN links.

In the distributed case, you don't need an N-way mixer at each client, and you don't need to continuously receive payload on
all N streams at each client either. In practice you can cap N at a relatively small number (in the 3-8 range) no matter how large
the conference gets. In a large conference, you can even choose to drop your comfort noise if you are receiving two or more
streams, and just send enough to keep your firewall pinhole open. This all assumes a suitable voice activity measure in the RTP
packet. Of course, in the worst case you will receive all N streams.
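The receive-side cap and the comfort-noise shortcut described above can be sketched in Python as follows. This assumes, as the paragraph does, that each RTP packet carries some voice activity measure; here it is modeled as a hypothetical float (0.0 = silence), and the class, function names, cap, and threshold defaults are all illustrative rather than taken from any specification.

```python
from dataclasses import dataclass


@dataclass
class RemoteStream:
    ssrc: int         # RTP SSRC of the remote site
    activity: float   # hypothetical voice activity measure carried in each RTP packet (0.0 = silence)


def streams_to_decode(streams: list[RemoteStream], cap: int = 4,
                      threshold: float = 0.1) -> list[int]:
    """Pick the SSRCs this client actually decodes and mixes.

    Only streams above the activity threshold are considered, and at most
    `cap` of the most active are kept, so the client's mixing load stays
    bounded however large the conference grows.
    """
    active = [s for s in streams if s.activity > threshold]
    active.sort(key=lambda s: s.activity, reverse=True)
    return [s.ssrc for s in active[:cap]]


def outgoing_mode(local_active: bool, remote_active: int) -> str:
    """Decide what a site sends in the current frame.

    A talking site sends speech. A silent site skips comfort noise once it
    is receiving two or more active streams, and sends only enough to keep
    its firewall pinhole open.
    """
    if local_active:
        return "speech"
    if remote_active >= 2:
        return "keepalive"
    return "comfort_noise"
```

The worst case the paragraph mentions corresponds to every stream staying above the threshold; the cap still bounds decoding and mixing, but the packets themselves continue to arrive.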

Stephen Botzko 
On Wed, Apr 21, 2010 at 12:58 PM, Christian Hoene <> wrote:
The teleconferencing issue is getting complex. I am trying to compile the different requirements that have been mentioned on this list:
-          low complexity (with just one active speaker) vs. multiple-speaker mixing vs. spatial audio/stereo mixing
-          centralized vs. distributed
-          few participants vs. hundreds of listeners and talkers
-          individual distribution of audio streams vs. IP multicast or RTP group communication
-          efficient encoding of multiple streams having the same content (but different quality)
-          I bet I missed some.
To make things easier, why not split the teleconferencing scenario in two: high quality and scalable?
The high quality scenario, intended for a low number of users, could have features like
-          Distributed processing and mixing
-          High computational resources to support spatial audio mixing (at the receiver) and multiple encodings of the same audio
stream at different qualities (at the sender)
-          Enough bandwidth to allow direct N to N transmissions of audio streams (no multicast or group communication). This would
be good for the latency, too.
The scalable scenario is the opposite:
-          Central processing and mixing for many participants.
-          N to 1 and 1 to N communication using efficient distribution mechanisms (RTP group communication and IP multicast).
-          Low-complexity mixing of many streams using tricks like VAD, encoding at the lowest rate to support many receivers with
different paths, you name it...
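One of the tricks listed above, encoding a single multicast stream at the lowest rate so that receivers on different paths can all take it, amounts to picking the highest codec rate that still fits the slowest path. A minimal Python sketch, assuming per-receiver path capacities are known and the codec offers a discrete set of rates; the rate set and function name are hypothetical, since the thread specifies neither.

```python
def common_multicast_rate(path_capacities_kbps: list[float],
                          codec_rates_kbps: list[float]) -> float:
    """Choose the single encoding rate sent to a whole multicast group.

    One encoded stream must fit every receiver, so the slowest path is the
    bottleneck: return the highest codec rate not exceeding it.
    """
    if not path_capacities_kbps:
        raise ValueError("no receivers")
    bottleneck = min(path_capacities_kbps)
    usable = [r for r in codec_rates_kbps if r <= bottleneck]
    if not usable:
        raise ValueError("no codec rate fits the slowest receiver path")
    return max(usable)


# Hypothetical rate set; the thread does not specify one.
CODEC_RATES_KBPS = [8, 12, 16, 24, 32, 64]
```

For receivers on 100, 20 and 50 kbit/s paths, `common_multicast_rate([100, 20, 50], CODEC_RATES_KBPS)` returns 16: the 20 kbit/s receiver sets the quality for everyone, which is the quality cost of the scalable scenario.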
Then we need not compare apples with oranges all the time.
Dr.-Ing. Christian Hoene
Interactive Communication Systems (ICS), University of Tübingen 
Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532 
From: [] On Behalf Of stephen botzko
Sent: Wednesday, April 21, 2010 4:34 PM
To: Colin Perkins
Subject: Re: [codec] #16: Multicast?

Stephen Botzko
On Wed, Apr 21, 2010 at 8:17 AM, Colin Perkins <> wrote:
On 21 Apr 2010, at 12:20, Marshall Eubanks wrote:
On Apr 21, 2010, at 6:48 AM, Colin Perkins wrote:
On 21 Apr 2010, at 10:42, codec issue tracker wrote:
#16: Multicast?
 Reporter:  hoene@…               |      Owner:
     Type:  enhancement           |     Status:  new
 Priority:  trivial               |  Milestone:
Component:  requirements          |    Version:
 Severity:  Active WG Document    |   Keywords:
The question arose whether the interactive CODEC MUST support multicast in addition to teleconferencing.

On 04/13/2010 11:35 AM, Christian Hoene wrote:
P.S. On the same note, does anybody here care about using this CODEC with multicast? Is there a single commercial multicast voice
deployment? From what I've seen, all multicast does is make IETF voice standards harder to understand or implement.

I think it would be a mistake to ignore multicast - not because of multicast itself, but because of Xcast (RFC 5058), which is a
promising technology to replace centralized conference bridges.

Regarding multicast:

I think we should start with user requirements and scenarios. Teleconferencing (including mono or spatial audio) might be a good
starting point. Virtual environments like Second Life would require multicast communication, too. If the requirements of these
scenarios are well understood, we can start to talk about potential solutions like IP multicast, Xcast or conference bridges.

RTP is inherently a group communication protocol, and any codec designed for use with RTP should consider operation in various
types of group communication scenarios (not just multicast). RFC 5117 is a good place to start when considering the
different types of topology in which RTP is used, and the possible placement of mixing and switching functions which the codec will
need to work with.

It is not clear to me what supporting multicast would entail here. If this is a codec over RTP, then what is to stop it from being
multicast?
Nothing. However, group conferences implemented using multicast require end-system mixing of potentially large numbers of active
audio streams, whereas those implemented using conference bridges do the mixing in a single central location, and generally suppress
all but one speaker. The differences in mixing and the number of simultaneous active streams that might be received potentially
affect the design of the codec.
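To illustrate why end-system mixing cost scales with the number of simultaneously active streams, here is a minimal Python sketch of the mixing step alone. It assumes every received stream has already been decoded to 16-bit PCM frames of equal length (decoding is not shown, and is the larger cost); the function name is hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames: list[list[int]]) -> list[int]:
    """Sum decoded 16-bit PCM frames sample by sample, saturating to int16.

    Every sample of every active stream is touched once, so the cost grows
    linearly with the number of streams being mixed.
    """
    if not frames:
        return []
    length = len(frames[0])
    if any(len(f) != length for f in frames):
        raise ValueError("all frames must have the same number of samples")
    return [max(INT16_MIN, min(INT16_MAX, sum(samples)))
            for samples in zip(*frames)]
```

Saturating rather than wrapping avoids loud artifacts when several talkers overlap; a production mixer might instead scale or compress the sum, which is a design choice this sketch does not make.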

Conference bridges with central mixing almost always mix multiple speakers.  As you add more streams into the mix, you reduce the
chance of missing onset speech and interruptions, but raise the noise floor. So even if complexity is not a consideration, there is
value in gating the mixer (instead of always doing a full mix-minus).
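The gated mix-minus described above can be sketched in Python as follows. Participant IDs, the per-frame activity scores, and the `max_mixed`/`threshold` defaults are hypothetical, and frames are assumed to be already-decoded 16-bit PCM of equal length. Gating limits how many streams (and how much background noise) enter the mix; mix-minus gives each participant the gated mix without their own signal.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def _clamp(x: int) -> int:
    """Saturate a sample to the int16 range."""
    return max(INT16_MIN, min(INT16_MAX, x))


def gated_mix_minus(frames: dict[int, list[int]], activity: dict[int, float],
                    max_mixed: int = 3, threshold: float = 0.05) -> dict[int, list[int]]:
    """Compute every participant's output frame at a central bridge.

    Gating: only the `max_mixed` most active participants above `threshold`
    enter the mix, which limits how far the noise floor rises.
    Mix-minus: each participant receives the gated mix minus their own signal.
    """
    if not frames:
        return {}
    ranked = sorted(frames, key=lambda p: activity[p], reverse=True)
    gated = [p for p in ranked if activity[p] > threshold][:max_mixed]
    length = len(next(iter(frames.values())))
    # Unclamped sum of the gated streams; Python ints cannot overflow here.
    full = [sum(frames[p][i] for p in gated) for i in range(length)]
    out = {}
    for p in frames:
        own = frames[p] if p in gated else [0] * length
        out[p] = [_clamp(full[i] - own[i]) for i in range(length)]
    return out
```

Participants outside the gate all receive the same frame, so a bridge only needs one shared mix for them plus one per gated talker; raising `max_mixed` trades a higher noise floor for a lower chance of clipping speech onsets and interruptions.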

More to the point, compressed-domain mixing and easy voice activity detection (VAD) have both been advocated on these lists, and both simplify the
large-scale mixing problem.

Colin Perkins
