Re: [codec] #15: Efficiently combine pre-encoded audio

Cullen Jennings <fluffy@cisco.com> Wed, 12 May 2010 15:19 UTC


For conference bridges, it's probably more important to be able to decide who the active speakers are with low CPU complexity than the actual act of mixing the selected speakers. Consider a typical call with 7 people who might be speakers, where the 3 most active are selected and mixed. In many systems today, most of the MIPS go to decoding all 7 streams to do speaker detection before the resulting 4 streams are formed and encoded. If there were a cheap way to figure out who the active speakers were without doing a full decode of all 7 streams, that would be sort of nice for conference bridges.
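
To make the 7-person example concrete, here is a minimal sketch in Python. It assumes, hypothetically, that the bridge can read a per-stream activity level without a full decode; no such field exists in any codec discussed here, and the names `select_active_speakers`, `plan_mixes`, and the sample levels are made up for illustration. It also assumes one reading of the "4 streams": each of the 3 active speakers gets a mix without their own audio, and everyone else shares one mix of all 3.

```python
from typing import Dict, FrozenSet, List


def select_active_speakers(levels: Dict[str, float], max_active: int = 3) -> List[str]:
    """Return up to max_active stream ids, most active first.

    `levels` is a hypothetical per-stream activity value that is assumed
    to be readable without decoding the audio payload.
    """
    # Sort by descending level; break ties by id so the result is stable.
    ranked = sorted(levels, key=lambda sid: (-levels[sid], sid))
    return ranked[:max_active]


def plan_mixes(participants: List[str], active: List[str]) -> Dict[FrozenSet[str], List[str]]:
    """Group participants by the set of active streams they should hear.

    Each participant hears every active speaker except themselves, so
    each distinct source set is one stream the bridge must mix and encode.
    """
    plan: Dict[FrozenSet[str], List[str]] = {}
    for person in participants:
        sources = frozenset(s for s in active if s != person)
        plan.setdefault(sources, []).append(person)
    return plan


# Made-up activity levels for a 7-person call.
levels = {
    "alice": 0.82, "bob": 0.05, "carol": 0.40, "dave": 0.01,
    "erin": 0.67, "frank": 0.00, "grace": 0.03,
}

active = select_active_speakers(levels)
plan = plan_mixes(list(levels), active)

print("decode only:", active)          # 3 streams instead of 7
print("mixes to encode:", len(plan))   # 3 speaker-specific mixes + 1 shared mix
```

With something like this, the bridge would decode only the 3 selected streams and still produce the same 4 outputs; the expensive part that goes away is decoding the other 4 streams just to find out they are quiet.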




On May 1, 2010, at 8:24 AM, codec issue tracker wrote:

> #15: Efficiently combine pre-encoded audio
> ------------------------------------+---------------------------------------
> Reporter:  hoene@…                 |       Owner:     
>     Type:  enhancement             |      Status:  new
> Priority:  minor                   |   Milestone:     
> Component:  requirements            |     Version:     
> Severity:  Active WG Document      |    Keywords:     
> ------------------------------------+---------------------------------------
> 
> Comment(by hoene@…):
> 
> [Colin:]
>> [...] conferences implemented using multicast require
>> end system mixing of potentially large numbers of active audio
>> streams, whereas those implemented using conference bridges do the
>> mixing in a single central location, and generally suppress all but
>> one speaker. The differences in mixing and the number of simultaneous
>> active streams that might be received potentially affect the design of
>> the codec.
> 
> [Raymond]: I would like to take this opportunity to express my view that
> although codec complexity isn’t much of an issue for PC-to-PC calls where
> there are GHz of processing power available, the codec complexity is an
> important issue in certain application scenarios.  The following are just
> some examples.
> 1) If a conference bridge has to decode a large number of voice channels,
> mix, and re-encode, and if compressed-domain mixing cannot be done (which
> is usually the case), then it is important to keep the decoder complexity
> low.
> 
> [JM]: The decoder complexity is very important. Not only because of the mixing
> issue, but also because the decoder is generally not allowed to take
> shortcuts to save on complexity (unlike the encoder). As for compressed-
> domain mixing, as you say it is not always available, but *if* we can do
> it (even if only partially), then that can result in a "free" reduction in
> decoder complexity for mixing.
> 
> [Christian]: Scalable [conferencing]
> -       Receiver side activity detection for music and voice having low
> complexity (for the conference bridge)
> -       Efficient mixing of two to four(?) active flows (is this
> achievable without the complete process of decoding and encoding again?)
> 
> -- 
> Ticket URL: <http://trac.tools.ietf.org/wg/codec/trac/ticket/15#comment:1>
> codec <http://tools.ietf.org/codec/>


Cullen Jennings