Re: [codec] #15: Efficiently combine pre-encoded audio

stephen botzko <stephen.botzko@gmail.com> Wed, 12 May 2010 15:58 UTC

Date: Wed, 12 May 2010 11:45:50 -0400
Message-ID: <AANLkTinKR72AgB4cTmrs1X9HCz1LpisHt-toz75wDSCT@mail.gmail.com>
From: stephen botzko <stephen.botzko@gmail.com>
To: Brian Rosen <br@brianrosen.net>
Cc: codec@ietf.org
Subject: Re: [codec] #15: Efficiently combine pre-encoded audio

The adaptive threshold doesn't have to be distributed, as the conference
bridge is selecting the highest scores.
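
A minimal sketch of that selection step, assuming each endpoint reports one
numeric score per frame and that a larger value means more active. The
function name, the stream ids, and the scores are all made up for
illustration; nothing here comes from a codec or RTP specification.

```python
import heapq

def select_active_speakers(scores, max_speakers=3):
    """Return the ids of the streams with the highest scores, best first.

    `scores` maps a stream id to the activity score its endpoint reported
    for the current frame (hypothetical scale: larger means more active).
    """
    # Ranking by score replaces a per-stream threshold: the bridge keeps
    # the top few streams, whatever their absolute levels are.
    return heapq.nlargest(max_speakers, scores, key=scores.get)

# Seven participants, three selected for mixing (made-up scores).
frame_scores = {"A": 12, "B": 87, "C": 3, "D": 54, "E": 0, "F": 61, "G": 9}
print(select_active_speakers(frame_scores))
```

Because only the ranking matters, the scores need to be comparable across
endpoints, but no threshold has to be agreed on or signalled.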

You do need a consistent way to compute the scores in the endpoints, ideally
using a method which is not simply energy.

I realize the bridge can alternatively generate scores from bitstream
information; I am thinking that is equivalent to including a metric in the
RTP payload.
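
To illustrate what "a metric in the RTP payload" could look like from the
bridge's side, here is a rough sketch. The layout is purely an assumption
(one score byte, 0-255, ahead of the encoded audio) and is not taken from
any payload format; the point is only that the bridge reads the score
without decoding the audio.

```python
def read_activity_score(payload: bytes) -> int:
    """Return the endpoint-computed score from a hypothetical payload layout.

    Assumed layout (not from any specification): byte 0 is the score,
    0-255, and the encoded audio follows. The audio is never decoded here.
    """
    if not payload:
        raise ValueError("empty payload")
    return payload[0]

# Made-up payloads: one score byte followed by opaque encoded audio.
payloads = {
    "A": bytes([200]) + b"\x11\x22\x33",
    "B": bytes([15]) + b"\x44\x55",
}
scores = {sid: read_activity_score(p) for sid, p in payloads.items()}
print(scores)
```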

Stephen Botzko

On Wed, May 12, 2010 at 11:25 AM, Brian Rosen <br@brianrosen.net> wrote:

> Excellent idea.  Been there, never really did it.  It's complex.
> Effectively, you need a distributed adaptive threshold mechanism.
>
> However, if you had it, user experience in multispeaker environments gets a
> win.
>
> Brian
>
>
> On 5/12/10 11:05 AM, "Cullen Jennings" <fluffy@cisco.com> wrote:
>
> >
> > For conference bridges, it's probably more important to be able to decide
> > who the active speakers are with low CPU complexity than the actual act of
> > mixing the selected speakers. Consider a typical call with 7 people who
> > might be speakers and the 3 most active are selected and mixed. In many
> > systems today, most of the MIPS go to decoding all 7 streams to do speaker
> > detection before the resulting 4 streams are formed and encoded. If there
> > was a cheap way to figure out who the active speakers were without doing a
> > full decode of all 7 streams, that would be sort of nice for conference
> > bridges.
> >
> >
> >
> >
> > On May 1, 2010, at 8:24 AM, codec issue tracker wrote:
> >
> >> #15: Efficiently combine pre-encoded audio
> >>
> >> ------------------------------------+---------------------------------------
> >>  Reporter:  hoene@Š                 |       Owner:
> >>      Type:  enhancement             |      Status:  new
> >>  Priority:  minor                   |   Milestone:
> >> Component:  requirements            |     Version:
> >>  Severity:  Active WG Document      |    Keywords:
> >> ------------------------------------+---------------------------------------
> >>
> >> Comment(by hoene@Š):
> >>
> >> [Colin:]
> >>> [...] conferences implemented using multicast require
> >>> end system mixing of potentially large numbers of active audio
> >>> streams, whereas those implemented using conference bridges do the
> >>> mixing in a single central location, and generally suppress all but
> >>> one speaker. The differences in mixing and the number of simultaneous
> >>> active streams that might be received potentially affect the design of
> >>> the codec.
> >>
> >> [Raymond]: I would like to take this opportunity to express my view that
> >> although codec complexity isn't much of an issue for PC-to-PC calls where
> >> there are GHz of processing power available, the codec complexity is an
> >> important issue in certain application scenarios.  The following are just
> >> some examples.
> >> 1) If a conference bridge has to decode a large number of voice channels,
> >> mix, and re-encode, and if compressed-domain mixing cannot be done (which
> >> is usually the case), then it is important to keep the decoder complexity
> >> low.
> >>
> >> [JM]: The decoder complexity is very important. Not only because of the
> >> mixing issue, but also because the decoder is generally not allowed to
> >> take shortcuts to save on complexity (unlike the encoder). As for
> >> compressed-domain mixing, as you say it is not always available, but *if*
> >> we can do it (even if only partially), then that can result in a "free"
> >> reduction in decoder complexity for mixing.
> >>
> >> [Christian]: Scalable [conferencing]
> >> -       Receiver side activity detection for music and voice having low
> >> complexity (for the conference bridge)
> >> -       Efficient mixing of two to four(?) active flows (is this
> >> achievable without the complete process of decoding and encoding again?)
> >>
> >> --
> >> Ticket URL: <http://trac.tools.ietf.org/wg/codec/trac/ticket/15#comment:1>
> >> codec <http://tools.ietf.org/codec/>
> >>
> >> _______________________________________________
> >> codec mailing list
> >> codec@ietf.org
> >> https://www.ietf.org/mailman/listinfo/codec
> >
> >
> > Cullen Jennings
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/index.html