Return-Path: <stephen.botzko@gmail.com>
X-Original-To: codec@core3.amsl.com
Delivered-To: codec@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix)
 with ESMTP id 08AB53A6932 for <codec@core3.amsl.com>;
 Wed, 12 May 2010 08:58:53 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.141
X-Spam-Level: 
X-Spam-Status: No, score=-0.141 tagged_above=-999 required=5 tests=[AWL=-0.743,
 BAYES_50=0.001, HTML_MESSAGE=0.001, J_CHICKENPOX_72=0.6]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com
 [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id gLnjBoOGYW0c for
 <codec@core3.amsl.com>; Wed, 12 May 2010 08:58:51 -0700 (PDT)
Received: from mail-ww0-f44.google.com (mail-ww0-f44.google.com
 [74.125.82.44]) by core3.amsl.com (Postfix) with ESMTP id EE2143A690C for
 <codec@ietf.org>; Wed, 12 May 2010 08:46:03 -0700 (PDT)
Received: by wwb28 with SMTP id 28so108315wwb.31 for <codec@ietf.org>;
 Wed, 12 May 2010 08:45:50 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
 h=domainkey-signature:mime-version:received:received:in-reply-to
 :references:date:message-id:subject:from:to:cc:content-type;
 bh=Y9wnSe7Ll29KcGNuN6nBqTmg9regpb3bTwcTO0eoa7s=;
 b=n7ZZemVllNMLknCEzZPSQzEe46kXBMJT8DZyFtPAdz214L+9mLYH3aRXC0FcIYdj0H
 R0TxiMZ0KxN4JV5qO9cRqI0vaCiD+s4RayYC7rAtsaHANZt9Fp5z6VqTep0S9SctRldV
 5UsGjdD/C2eyWd7P8NvrTWqC20NZ/4Np9ChY0=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 b=inXWKReFtDnO6l5BDuz++L381OYHfn5mR6WzPeRky1oNIh9DNjYQVxwA2dhO5+rhHu
 13iI9rBj8PigD3pz7njHYAuhHK9+nOwjg3jt5Na+fnib6evhWVG+ECpnZZMVzjRs/dg6
 8d2PRnMvwN2s6sq36BxunuTv9GgDJ9KaGWIS8=
MIME-Version: 1.0
Received: by 10.216.87.132 with SMTP id y4mr4217495wee.174.1273679150529;
 Wed,  12 May 2010 08:45:50 -0700 (PDT)
Received: by 10.216.23.5 with HTTP; Wed, 12 May 2010 08:45:50 -0700 (PDT)
In-Reply-To: <C810409D.33E4E%br@brianrosen.net>
References: <1F68067D-33B9-4F0C-B31B-B3A56A72DBA4@cisco.com>
 <C810409D.33E4E%br@brianrosen.net>
Date: Wed, 12 May 2010 11:45:50 -0400
Message-ID: <AANLkTinKR72AgB4cTmrs1X9HCz1LpisHt-toz75wDSCT@mail.gmail.com>
From: stephen botzko <stephen.botzko@gmail.com>
To: Brian Rosen <br@brianrosen.net>
Content-Type: multipart/alternative; boundary=0016e6d99ba7533ca904866789ca
Cc: codec@ietf.org
Subject: Re: [codec] #15: Efficiently combine pre-encoded audio
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/codec>,
 <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>,
 <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 12 May 2010 15:58:53 -0000

--0016e6d99ba7533ca904866789ca
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

The adaptive threshold doesn't have to be distributed, as the conference
bridge is selecting the highest scores.

You do need a consistent way to compute the scores in the endpoints,
ideally using a method which is not simply energy.

I realize the bridge can alternatively generate scores from bitstream
information; I am thinking that is equivalent to including a metric in the
RTP payload.
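
A rough sketch of what the bridge-side selection could look like, assuming
each packet carries an endpoint-computed score. The Packet type, the score
field and its 0-127 range, and the tie-break rule are all hypothetical and
only for illustration; nothing here comes from an existing payload format.
It uses Cullen's numbers: 7 streams, 3 selected, 4 mixes to encode.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    ssrc: int        # stream identifier
    score: int       # hypothetical endpoint-computed activity score (0-127)
    payload: bytes   # still-encoded audio; not decoded during selection


def select_active(packets, max_speakers=3):
    """Return the SSRCs of the highest-scoring streams, best first.

    Ties are broken in favour of the lower SSRC (an arbitrary assumption).
    """
    best = heapq.nlargest(
        max_speakers, packets, key=lambda p: (p.score, -p.ssrc)
    )
    return [p.ssrc for p in best]


def mixes_needed(active):
    """List the sources that go into each output mix.

    Each selected speaker hears the other selected speakers; all remaining
    participants share one mix of every selected speaker.
    """
    plans = {ssrc: [s for s in active if s != ssrc] for ssrc in active}
    plans["listeners"] = list(active)
    return plans


if __name__ == "__main__":
    scores = [10, 90, 35, 90, 5, 60, 0]   # made-up scores for 7 streams
    packets = [
        Packet(ssrc=i + 1, score=s, payload=b"")
        for i, s in enumerate(scores)
    ]
    active = select_active(packets)
    print(active)                      # SSRCs chosen without any decode
    print(len(mixes_needed(active)))   # number of output mixes to encode
```

Only the streams returned by select_active would need to be decoded, and
the bridge never needs a threshold of its own because it simply ranks the
scores it receives.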

Stephen Botzko

On Wed, May 12, 2010 at 11:25 AM, Brian Rosen <br@brianrosen.net> wrote:

> Excellent idea.  Been there, never really did it.  It's complex.
> Effectively, you need a distributed adaptive threshold mechanism.
>
> However, if you had it, user experience in multispeaker environments
> gets a win.
>
> Brian
>
>
> On 5/12/10 11:05 AM, "Cullen Jennings" <fluffy@cisco.com> wrote:
>
> >
> > For conference bridges, it's probably more important to be able to
> > decide who the active speakers are with low CPU complexity than the
> > actual act of mixing the selected speakers. Consider a typical call
> > with 7 people who might be speakers, where the 3 most active are
> > selected and mixed. In many systems today, most of the MIPS go to
> > decoding all 7 streams to do speaker detection before the resulting
> > 4 streams are formed and encoded. If there were a cheap way to figure
> > out who the active speakers were without doing a full decode of all
> > 7 streams, that would be sort of nice for conference bridges.
> >
> >
> >
> >
> > On May 1, 2010, at 8:24 AM, codec issue tracker wrote:
> >
> >> #15: Efficiently combine pre-encoded audio
> >>
> >> ------------------------------------+----------------------------------
> >> Reporter:  hoene@=C5=A0                 |       Owner:
> >>     Type:  enhancement             |      Status:  new
> >> Priority:  minor                   |   Milestone:
> >> Component:  requirements            |     Version:
> >> Severity:  Active WG Document      |    Keywords:
> >> ------------------------------------+----------------------------------
> >>
> >> Comment(by hoene@=C5=A0):
> >>
> >> [Colin:]
> >>> [...] conferences implemented using multicast require
> >>> end system mixing of potentially large numbers of active audio
> >>> streams, whereas those implemented using conference bridges do the
> >>> mixing in a single central location, and generally suppress all but
> >>> one speaker. The differences in mixing and the number of simultaneous
> >>> active streams that might be received potentially affect the design
> >>> of the codec.
> >>
> >> [Raymond]: I would like to take this opportunity to express my view
> >> that although codec complexity isn't much of an issue for PC-to-PC
> >> calls where there are GHz of processing power available, the codec
> >> complexity is an important issue in certain application scenarios.
> >> The following are just some examples.
> >> 1) If a conference bridge has to decode a large number of voice
> >> channels, mix, and re-encode, and if compressed-domain mixing cannot
> >> be done (which is usually the case), then it is important to keep
> >> the decoder complexity low.
> >>
> >> [JM]: The decoder complexity is very important. Not only because of
> >> the mixing issue, but also because the decoder is generally not
> >> allowed to take shortcuts to save on complexity (unlike the encoder).
> >> As for compressed-domain mixing, as you say it is not always
> >> available, but *if* we can do it (even if only partially), then that
> >> can result in a "free" reduction in decoder complexity for mixing.
> >>
> >> [Christian]: Scalable [conferencing]
> >> -       Receiver side activity detection for music and voice having
> >>         low complexity (for the conference bridge)
> >> -       Efficient mixing of two to four(?) active flows (is this
> >>         achievable without the complete process of decoding and
> >>         encoding again?)
> >>
> >> --
> >> Ticket URL:
> >> <http://trac.tools.ietf.org/wg/codec/trac/ticket/15#comment:1>
> >> codec <http://tools.ietf.org/codec/>
> >>
> >> _______________________________________________
> >> codec mailing list
> >> codec@ietf.org
> >> https://www.ietf.org/mailman/listinfo/codec
> >
> >
> > Cullen Jennings
>

--0016e6d99ba7533ca904866789ca--
