Re: [codec] #15: Efficiently combine pre-encoded audio

Excellent idea.  Been there, never really did it.  It's complex.
Effectively, you need a distributed adaptive threshold mechanism.

However, if you had it, the user experience in multispeaker environments
would get a win.
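
To make that concrete, here is a minimal Python sketch of the bridge side of
such a mechanism. It assumes each sender attaches a level estimate in dB to
every packet, so the bridge can rank streams without decoding them. That
reporting scheme, the SpeakerSelector and mix_plan names, and all the
constants are made up for illustration, not taken from any existing codec or
bridge. The threshold is adaptive in the sense that each stream is compared
against its own slowly tracked noise floor.

```python
from dataclasses import dataclass, field


@dataclass
class SpeakerSelector:
    """Ranks streams by sender-reported level; nothing is decoded here."""
    max_active: int = 3          # how many streams get mixed
    margin_db: float = 6.0       # required distance above the noise floor
    smoothing: float = 0.8       # weight given to the previous smoothed level
    floor_rise_db: float = 0.05  # how fast the floor may creep up per frame
    level: dict = field(default_factory=dict)  # stream id -> smoothed level (dB)
    floor: dict = field(default_factory=dict)  # stream id -> noise floor (dB)

    def update(self, stream_id, reported_db):
        """Feed one sender-reported frame level (hypothetical side information)."""
        prev = self.level.get(stream_id, reported_db)
        self.level[stream_id] = (self.smoothing * prev
                                 + (1.0 - self.smoothing) * reported_db)
        # Adaptive part: the floor follows drops at once but rises slowly,
        # so a burst of speech does not drag the threshold up with it.
        floor = self.floor.get(stream_id, reported_db)
        self.floor[stream_id] = min(reported_db, floor + self.floor_rise_db)

    def select(self):
        """Return up to max_active stream ids, loudest first."""
        talking = [s for s in self.level
                   if self.level[s] > self.floor[s] + self.margin_db]
        talking.sort(key=lambda s: self.level[s], reverse=True)
        return talking[:self.max_active]


def mix_plan(active):
    """Map each output mix to the input streams it needs."""
    plan = {s: [o for o in active if o != s] for s in active}
    plan["listeners"] = list(active)  # everyone who is not an active speaker
    return plan
```

With 7 streams and max_active=3, mix_plan yields 4 mixes: one per selected
speaker (minus their own audio) plus one for everyone else, which matches the
3-of-7 example in the quoted message. Only the selected streams would then
need a full decode.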

Brian

On 5/12/10 11:05 AM, "Cullen Jennings" <fluffy@cisco.com> wrote:

> 
> For conference bridges, it's probably more important to be able to decide who
> the active speakers are with low CPU complexity than the actual act of
> mixing the selected speakers. Consider a typical call with 7 people who
> might be speakers, where the 3 most active are selected and mixed. In many
> systems today, most of the MIPS go to decoding all 7 streams to do speaker
> detection before the resulting 4 streams are formed and encoded. If there were
> a cheap way to figure out who the active speakers are without doing a full
> decode of all 7 streams, that would be sort of nice for conference
> bridges.
> 
> On May 1, 2010, at 8:24 AM, codec issue tracker wrote:
> 
>> #15: Efficiently combine pre-encoded audio
>> ------------------------------------+---------------------------------------
>> Reporter:  hoene@                 |       Owner:
>>     Type:  enhancement             |      Status:  new
>> Priority:  minor                   |   Milestone:
>> Component:  requirements            |     Version:
>> Severity:  Active WG Document      |    Keywords:
>> ------------------------------------+---------------------------------------
>> 
>> Comment(by hoene@):
>> 
>> [Colin:]
>>> [...] conferences implemented using multicast require
>>> end system mixing of potentially large numbers of active audio
>>> streams, whereas those implemented using conference bridges do the
>>> mixing in a single central location, and generally suppress all but
>>> one speaker. The differences in mixing and the number of simultaneous
>>> active streams that might be received potentially affect the design of
>>> the codec.
>> 
>> [Raymond]: I would like to take this opportunity to express my view that
>> although codec complexity isn't much of an issue for PC-to-PC calls where
>> there are GHz of processing power available, the codec complexity is an
>> important issue in certain application scenarios.  The following are just
>> some examples.
>> 1) If a conference bridge has to decode a large number of voice channels,
>> mix, and re-encode, and if compressed-domain mixing cannot be done (which
>> is usually the case), then it is important to keep the decoder complexity
>> low.
>> 
>> [JM]: The decoder complexity is very important, not only because of the mixing
>> issue, but also because the decoder is generally not allowed to take
>> shortcuts to save on complexity (unlike the encoder). As for compressed-
>> domain mixing, as you say it is not always available, but *if* we can do
>> it (even if only partially), then that can result in a "free" reduction in
>> decoder complexity for mixing.
>> 
>> [Christian]: Scalable [conferencing]
>> -       Receiver side activity detection for music and voice having low
>> complexity (for the conference bridge)
>> -       Efficient mixing of two to four(?) active flows (is this
>> achievable without the complete process of decoding and encoding again?)
>> 
>> -- 
>> Ticket URL: <http://trac.tools.ietf.org/wg/codec/trac/ticket/15#comment:1>
>> codec <http://tools.ietf.org/codec/>
>> 
>> _______________________________________________
>> codec mailing list
>> codec@ietf.org
>> https://www.ietf.org/mailman/listinfo/codec
> 
> 
> Cullen Jennings
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/index.html