Re: [codec] #16: Multicast?

Hi Stephen,

not too bad. You answered faster than the mailing list distributes.

Comments inline:

From: stephen botzko [mailto:stephen.botzko@gmail.com] 
Sent: Wednesday, April 21, 2010 7:10 PM
To: Christian Hoene
Cc: codec@ietf.org
Subject: Re: [codec] #16: Multicast?

I agree there are lots of use cases.

Though I don't see why high quality has to be given up in order to be scalable.  
CH: These are just experiences from our lab. A spatial audio conference server including the acoustic 3D sound rendering needs a LOT
of processing power. In the end, we have to remain realistic. Processing power is always limited; thus, if we need a lot, we
cannot serve many clients.
Also, I am not sure why you think central mixing is more scalable than multicast (or why you think it is lower quality either).
CH: With multicast, you need N times 1:N multicast distribution trees (somewhat smaller than O(n²)). With central mixing you need N
times 2 transmission paths (O(n)). Also, with distributed mixing you need the mixing N times, once at each client. With centralized
mixing, you can live with one mixing for all (and some tricks for serving the talkers).
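The counting argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every participant may talk, each multicast tree reaches the other N - 1 participants, and the central server performs a single shared mix. The function names are hypothetical and not part of any proposal on this list.

```python
def multicast_costs(n: int) -> dict:
    """Rough cost counts for a distributed (multicast) conference of n participants."""
    return {
        "distribution_trees": n,              # one 1:N tree per potential sender
        "stream_deliveries": n * (n - 1),     # each tree reaches the n - 1 others
        "mixers": n,                          # every client mixes for itself
    }


def central_costs(n: int) -> dict:
    """Rough cost counts for a conference with one central mixing server."""
    return {
        "transmission_paths": 2 * n,          # one uplink and one downlink per client
        "mixers": 1,                          # assumption: one shared mix for all
    }


if __name__ == "__main__":
    for n in (3, 10, 100):
        print(n, multicast_costs(n), central_costs(n))
```

The delivered-stream count n * (n - 1) stays just below n², while the central case grows linearly, which is the asymmetry described above.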
Cheers,
 Christian

Stephen Botzko 
On Wed, Apr 21, 2010 at 12:58 PM, Christian Hoene <hoene@uni-tuebingen.de> wrote:
Hi, 

the teleconferencing issue gets complex. I am trying to compile the different requirements that have been mentioned on this list.
-          low complexity (with just one active speaker) vs. multiple speaker mixing vs. spatial audio/stereo mixing
-          centralized vs. distributed
-          few participants vs. hundreds of listeners and talkers
-          individual distribution of audio streams vs. IP multicast or RTP group communication
-          efficient encoding of multiple streams having the same content (but different quality).
-           I bet I missed some.

To make things easier, why not split the teleconferencing scenario into two: High quality and Scalable?

The high quality scenario, intended for a low number of users, could have features like
-          Distributed processing and mixing
-          High computational resources to support spatial audio mixing (at the receiver) and multiple encodings of the same audio
stream at different qualities (at the sender)
-          Enough bandwidth to allow direct N to N transmissions of audio streams (no multicast or group communication). This would
be good for the latency, too.

The scalable scenario is the opposite:
-          Central processing and mixing for many participants.
-          N to 1 and 1 to N communication using efficient distribution mechanisms (RTP group communication and IP multicast).
-          Low complexity mixing of many using tricks like VAD, encoding at lowest rate to support many receivers having different
paths, you name it...

Then we need not compare apples with oranges all the time.

Christian

---------------------------------------------------------------
Dr.-Ing. Christian Hoene
Interactive Communication Systems (ICS), University of Tübingen 
Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532 
http://www.net.uni-tuebingen.de/

From: codec-bounces@ietf.org [mailto:codec-bounces@ietf.org] On Behalf Of stephen botzko
Sent: Wednesday, April 21, 2010 4:34 PM
To: Colin Perkins
Cc: trac@tools.ietf.org; codec@ietf.org
Subject: Re: [codec] #16: Multicast?

in-line

Stephen Botzko
On Wed, Apr 21, 2010 at 8:17 AM, Colin Perkins <csp@csperkins.org> wrote:
On 21 Apr 2010, at 12:20, Marshall Eubanks wrote:
On Apr 21, 2010, at 6:48 AM, Colin Perkins wrote:
On 21 Apr 2010, at 10:42, codec issue tracker wrote:
#16: Multicast?
------------------------------------+----------------------------------
 Reporter:  hoene@                  |       Owner:
     Type:  enhancement             |      Status:  new
 Priority:  trivial                 |   Milestone:
Component:  requirements            |     Version:
 Severity:  Active WG Document      |    Keywords:
------------------------------------+----------------------------------
The question arose whether the interactive CODEC MUST support multicast in addition to teleconferencing.

On 04/13/2010 11:35 AM, Christian Hoene wrote:
P.S. On the same note, does anybody here care about using this CODEC with multicast? Is there a single commercial multicast voice
deployment? From what I've seen, all multicast does is make IETF voice standards harder to understand or implement.

I think it would be a mistake to ignore multicast - not because of multicast itself, but because of Xcast (RFC 5058) which is a
promising technology to replace centralized conference bridges.

Regarding multicast:

I think we should start from user requirements and scenarios. Teleconferencing (including mono or spatial audio) might be a good starting
point. Virtual environments like Second Life would require multicast communication, too. If the requirements of these scenarios are
well understood, we can start to talk about potential solutions like IP multicast, Xcast or conference bridges.

RTP is inherently a group communication protocol, and any codec designed for use with RTP should consider operation in various
different types of group communication scenario (not just multicast). RFC 5117 is a good place to start when considering the
different types of topology in which RTP is used, and the possible placement of mixing and switching functions which the codec will
need to work with.

It is not clear to me what supporting multicast would entail here. If this is a codec over RTP, then what is to stop it from being
multicast?

Nothing. However group conferences implemented using multicast require end system mixing of potentially large numbers of active
audio streams, whereas those implemented using conference bridges do the mixing in a single central location, and generally suppress
all but one speaker. The differences in mixing and the number of simultaneous active streams that might be received potentially
affect the design of the codec.

Conference bridges with central mixing almost always mix multiple speakers.  As you add more streams into the mix, you reduce the
chance of missing onset speech and interruptions, but raise the noise floor. So even if complexity is not a consideration, there is
value in gating the mixer (instead of always doing a full mix-minus).
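A minimal sketch of a gated mix-minus, for illustration only. It assumes uncompressed PCM frames of equal length, uses frame energy as the gating criterion, and keeps a fixed number of loudest inputs; the function name and the max_active parameter are hypothetical, and a real bridge would add smoothing and hangover to avoid clipping onset speech.

```python
from typing import Dict, List


def gated_mix_minus(frames: Dict[str, List[float]],
                    max_active: int = 3) -> Dict[str, List[float]]:
    """Return one output frame per participant.

    frames maps a participant id to one frame of PCM samples.
    Only the max_active highest-energy inputs enter the mix (the gate);
    each active talker receives the mix minus their own signal.
    """
    if not frames:
        return {}

    def energy(frame: List[float]) -> float:
        return sum(s * s for s in frame)

    # Gate: keep only the loudest inputs to limit the noise floor.
    active = sorted(frames, key=lambda p: energy(frames[p]), reverse=True)[:max_active]

    length = len(next(iter(frames.values())))
    full_mix = [sum(frames[p][i] for p in active) for i in range(length)]

    out: Dict[str, List[float]] = {}
    for p, frame in frames.items():
        if p in active:
            # Mix-minus: talkers do not hear themselves.
            out[p] = [m - s for m, s in zip(full_mix, frame)]
        else:
            # Pure listeners all share the same mix.
            out[p] = list(full_mix)
    return out
```

With max_active set to the number of participants this degenerates to a full mix-minus; lowering it trades a higher risk of missed interruptions for a lower noise floor, as described above.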

More on point, compressed domain mixing and easy detection of VAD have both been advocated on these lists, and both simplify the
large-scale mixing problem.

-- 
Colin Perkins
http://csperkins.org/

_______________________________________________
codec mailing list
codec@ietf.org
https://www.ietf.org/mailman/listinfo/codec