Re: [codec] #15: Efficiently combine pre-encoded audio

"codec issue tracker" <> Mon, 24 May 2010 14:22 UTC


#15: Efficiently combine pre-encoded audio
 Reporter:  hoene@…                 |       Owner:     
     Type:  enhancement             |      Status:  new
 Priority:  minor                   |   Milestone:     
Component:  requirements            |     Version:     
 Severity:  Active WG Document      |    Keywords:     

Comment(by hoene@…):

 For conference bridges, it's probably more important to be able to decide
 who the active speakers are with low CPU complexity than the actual act
 of mixing the selected speakers. Consider a typical call with 7 people
 who might be speakers, where the 3 most active are selected and mixed. In
 many systems today, most of the MIPS go to decoding all 7 streams to do
 speaker detection before the resulting 4 streams are formed and encoded.
 If there were a cheap way to figure out who the active speakers are
 without doing a full decode of all 7 streams, that would be nice for
 conference bridges.
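
 To make the stream counts in this example concrete, here is a minimal
 sketch. It assumes the usual "mix-minus" arrangement, which the comment
 does not spell out: each active speaker hears the other active speakers
 but not themselves, and everyone else hears all active speakers. The
 function name and participant labels are made up for illustration.

```python
def plan_output_mixes(participants, active):
    """Map each listener to the tuple of speakers that listener should hear."""
    active = list(active)
    plan = {}
    for listener in participants:
        # Assumed mix-minus rule: never feed a speaker their own audio back.
        plan[listener] = tuple(s for s in active if s != listener)
    return plan


participants = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]
active = ["p2", "p5", "p6"]  # the 3 most active speakers

plan = plan_output_mixes(participants, active)

# Each distinct mix has to be formed and encoded once by the bridge.
distinct_mixes = set(plan.values())
print(len(participants), "inputs,", len(distinct_mixes), "mixes to encode")
```

 Under that assumption the bridge encodes 4 mixes (one per active speaker
 plus one shared by the 4 passive listeners), while speaker detection by
 full decode still touches all 7 inputs.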

 [Brian]: Excellent idea.  Been there, never really did it.  It's complex.
 Effectively, you need a distributed adaptive threshold mechanism.
 However, if you had it, user experience in multispeaker environments gets
 a win.

 [Benjamin]:  The cheapest solution, of course, is transmit-side activity
 detection. Maybe we need to specify a way for a receiver to request that
 the transmitter employ (or not employ) VAD.

 I think you can do better than an encoder VAD. All you need to do is make
 sure that the relevant information you need for a VAD can easily be
 decoded from the bit-stream without having to do a full decoding. For
 example, if you're able to easily extract the gain and spectral envelope,
 you can do a VAD based on that without even having to look at the other
 parameters in the bit-stream.
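
 As a rough illustration of that idea, the sketch below runs a VAD on
 nothing but a per-frame gain and a coarse spectral envelope. Everything
 in it is an assumption made for the example: the FrameParams layout, the
 dB units, the thresholds, and the decision rule itself. No particular
 codec's bit-stream format is implied.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameParams:
    """Hypothetical side information read from one frame without a full decode."""
    gain_db: float             # overall frame gain, in dB
    envelope_db: List[float]   # coarse spectral envelope, one level per band, in dB


class ParamVad:
    """Toy VAD driven only by gain and envelope (thresholds are illustrative)."""

    def __init__(self, margin_db=6.0, min_spread_db=10.0, floor_rise_db=0.05):
        self.margin_db = margin_db
        self.min_spread_db = min_spread_db
        self.floor_rise_db = floor_rise_db
        self.noise_floor_db = None

    def is_active(self, frame: FrameParams) -> bool:
        if self.noise_floor_db is None:
            self.noise_floor_db = frame.gain_db
        else:
            # Follow drops in level immediately, creep upward slowly.
            self.noise_floor_db = min(
                frame.gain_db, self.noise_floor_db + self.floor_rise_db
            )
        # Active means clearly louder than the tracked noise floor ...
        loud = frame.gain_db - self.noise_floor_db > self.margin_db
        # ... and with a non-flat envelope, since flat spectra look like noise.
        spread = max(frame.envelope_db) - min(frame.envelope_db)
        return loud and spread > self.min_spread_db


vad = ParamVad()
background = FrameParams(gain_db=-60.0, envelope_db=[-62.0, -60.0, -61.0, -63.0])
talking = FrameParams(gain_db=-30.0, envelope_db=[-25.0, -32.0, -45.0, -50.0])
decisions = [vad.is_active(f) for f in (background, talking, background)]
print(decisions)
```

 The point of the sketch is only that the per-frame work is a handful of
 comparisons, provided the gain and envelope can be pulled out of the
 bit-stream cheaply.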

 The adaptive threshold doesn't have to be distributed, as the conference
 bridge is selecting the highest scores.
 You do need a consistent way to compute the scores in the endpoints,
 ideally using a method which is not simply energy.
 I realize the bridge can alternatively generate scores from bitstream
 information; I am thinking that is equivalent to including a metric in the
 RTP payload.
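
 A minimal sketch of the bridge side of this, assuming each stream
 carries (or the bridge cheaply derives) one numeric activity score per
 frame, with higher meaning more active. The function name, stream
 labels, and score values are hypothetical.

```python
import heapq


def select_active_speakers(scores, max_speakers=3):
    """Return the ids of the highest-scoring streams, most active first.

    scores: dict mapping stream id to that stream's current activity score.
    """
    ranked = heapq.nlargest(max_speakers, scores.items(), key=lambda item: item[1])
    return [stream_id for stream_id, _ in ranked]


# Made-up scores for a 7-party call.
scores = {"p1": 0.10, "p2": 0.82, "p3": 0.05, "p4": 0.31,
          "p5": 0.77, "p6": 0.64, "p7": 0.02}

selected = select_active_speakers(scores)
print(selected)
```

 Because the bridge only ranks the scores against each other, it needs no
 threshold of its own. The ranking is only meaningful if all endpoints
 compute their scores the same way, which is the consistency requirement
 noted above.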
