Re: [codec] #15: Efficiently combine pre-encoded audio
Roman Shpount <roman@telurix.com> Mon, 24 May 2010 17:28 UTC
To: codec@ietf.org
I would like to remind everyone that this issue is not about efficient VAD, but about efficiently combining pre-encoded streams. There are two use cases that I can think of:

a. Conference servers, where the active speaker is determined by either transmitter-side or decoder-side VAD. As was mentioned before, it is possible to implement an efficient decoder-side VAD without implementing a complete decoder. If we can combine pre-encoded audio, we can combine multiple streams on the conference server without going through a decode/encode cycle, greatly decreasing both CPU requirements and mixer delay.

b. Announcement and IVR servers, where a small set of pre-encoded announcements is played to the user. Standard network or IVR announcements can be encoded once and efficiently inserted into, or combined with, the audio stream. If pre-encoded audio is supported and the client supports AVT tones, it is trivial to develop a very efficient IVR server that does not require any CODEC encoding or decoding.

_____________
Roman Shpount

On Mon, May 24, 2010 at 10:22 AM, codec issue tracker <trac@tools.ietf.org> wrote:

> #15: Efficiently combine pre-encoded audio
> ------------------------------------+---------------------------------------
>  Reporter:  hoene@…                 |       Owner:
>      Type:  enhancement             |      Status:  new
>  Priority:  minor                   |   Milestone:
> Component:  requirements            |     Version:
>  Severity:  Active WG Document      |    Keywords:
> ------------------------------------+---------------------------------------
>
> Comment(by hoene@…):
>
>  [Cullen]:
>  For conference bridges, it's probably more important to be able to decide
>  who the active speakers are with low CPU complexity than the actual act
>  of mixing the selected speakers. Consider a typical call with 7 people
>  who might be speakers, where the 3 most active are selected and mixed. In
>  many systems today, most of the MIPS go to decoding all 7 streams to do
>  speaker detection before the resulting 4 streams are formed and encoded.
>  If there was a cheap way to figure out who the active speakers were
>  without doing a full decode of all 7 streams, that would be sort of nice
>  for conference bridges.
>
>  [Brian]: Excellent idea. Been there, never really did it. It's complex.
>  Effectively, you need a distributed adaptive threshold mechanism.
>  However, if you had it, user experience in multispeaker environments gets
>  a win.
>
>  [Benjamin]: The cheapest solution, of course, is transmit-side activity
>  detection.
>  Maybe we need to specify a way for a receiver to request that the
>  transmitter employ (or not employ) VAD.
>
>  [JM]:
>  I think you can do better than an encoder VAD. All you need to do is make
>  sure that the relevant information you need for a VAD can easily be
>  decoded from the bit-stream without having to do a full decoding. For
>  example, if you're able to easily extract the gain and spectral envelope,
>  you can do a VAD based on that without even having to look at the other
>  parameters in the bit-stream.
>
>  [Brian]:
>  The adaptive threshold doesn't have to be distributed, as the conference
>  bridge is selecting the highest scores.
>  You do need a consistent way to compute the scores in the endpoints,
>  ideally using a method which is not simply energy.
>  I realize the bridge can alternatively generate scores from bitstream
>  information; I am thinking that is equivalent to including a metric in the
>  RTP payload.
>
> --
> Ticket URL: <http://trac.tools.ietf.org/wg/codec/trac/ticket/15#comment:2>
> codec <http://tools.ietf.org/codec/>
>
> _______________________________________________
> codec mailing list
> codec@ietf.org
> https://www.ietf.org/mailman/listinfo/codec
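As a rough illustration of the decoder-side VAD JM describes in the quoted discussion, here is a minimal sketch of an energy-style detector running over per-frame gains, with the adaptive threshold Brian mentions implemented as a slowly rising noise floor. It assumes the gain (in dB) can be read from each frame without a full decode; the `GainVad` name, the default thresholds and the smoothing constants are all hypothetical. It is deliberately the simplest energy-based stand-in, which Brian notes is not ideal, and a real detector would likely also use the spectral envelope.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GainVad:
    """Energy-style VAD over per-frame gains in dB. ASSUMPTION: the gain can
    be read from each frame without a full decode; the defaults are guesses."""
    noise_floor_db: float = -60.0  # initial noise-floor estimate (hypothetical)
    margin_db: float = 9.0         # dB above the floor that counts as speech
    rise: float = 0.05             # fraction by which the floor may rise per frame
    smooth: float = 0.2            # smoothing factor for the activity score
    score: float = 0.0             # smoothed dB above the floor, for ranking

    def update(self, gain_db: float) -> bool:
        """Feed one frame's gain; return True if the frame looks like speech."""
        # Adaptive threshold: the floor drops at once to quieter frames but
        # creeps upward slowly, so a talk spurt does not drag it up quickly.
        if gain_db < self.noise_floor_db:
            self.noise_floor_db = gain_db
        else:
            self.noise_floor_db += self.rise * (gain_db - self.noise_floor_db)
        active = gain_db > self.noise_floor_db + self.margin_db
        # Score: exponentially smoothed level above the floor (0 when inactive).
        excess = gain_db - self.noise_floor_db if active else 0.0
        self.score = (1.0 - self.smooth) * self.score + self.smooth * excess
        return active


if __name__ == "__main__":
    vad = GainVad()
    frames = [-60.0] * 10 + [-30.0] * 5  # ten quiet frames, then a talk spurt
    print([vad.update(g) for g in frames])
```

The `score` field is what a bridge could rank streams by; whether it is computed at the bridge or sent by the endpoints is the open question in Brian's last comment.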
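Cullen's example in the quoted discussion (7 possible speakers, the 3 most active mixed, 4 resulting streams) can be sketched as a bridge that ranks streams by an activity score and plans one mix per distinct group of listeners. The `plan_mixes` helper and the scores are hypothetical; the scores could come from a bridge-side VAD or from a metric sent by the endpoints. Each active speaker needs an N-1 mix that leaves out their own voice, while all passive listeners can share a single mix of the active speakers, which is why only k+1 streams need to be encoded.

```python
from __future__ import annotations


def plan_mixes(scores: dict[str, float], k: int = 3) -> dict[str, frozenset[str]]:
    """Pick the k highest-scoring streams and return, for every participant,
    the set of active speakers that should be mixed into what they hear."""
    # Rank by score; sorted() is stable, so ties keep insertion order.
    active = sorted(scores, key=scores.__getitem__, reverse=True)[:k]
    plan: dict[str, frozenset[str]] = {}
    for participant in scores:
        # Active speakers get an N-1 mix without their own voice; all
        # passive listeners end up with the same mix of every active speaker.
        plan[participant] = frozenset(a for a in active if a != participant)
    return plan


if __name__ == "__main__":
    # Hypothetical scores for Cullen's 7-participant example.
    scores = {"A": 12.0, "B": 3.5, "C": 9.1, "D": 0.4,
              "E": 7.7, "F": 1.2, "G": 0.0}
    plan = plan_mixes(scores, k=3)
    # Distinct mixes = streams to encode: 3 N-1 mixes + 1 shared mix.
    print(len(set(plan.values())))  # → 4
```

With these scores A, C and E are selected; A hears {C, E}, C hears {A, E}, E hears {A, C}, and the other four participants all share {A, C, E}. If the selected streams could be combined in the pre-encoded domain, as use case a asks, even those 4 encodes could be avoided.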
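For use case b, here is a sketch of how an IVR server might splice a pre-encoded announcement into an ongoing RTP stream without running the codec at all. The encoded frames are copied as-is, and only the RTP sequence numbers and timestamps are continued, with the marker bit set on the first packet. The `RtpPacket` type and `splice_announcement` helper are hypothetical, not part of any library. The sketch assumes each frame is independently decodable and has a fixed duration (160 ticks, i.e. 20 ms at an 8 kHz clock, is a placeholder); whether a codec can be spliced like this without artifacts is essentially what this issue asks for.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RtpPacket:
    """Hypothetical minimal view of the RTP header fields that change."""
    seq: int        # 16-bit RTP sequence number
    timestamp: int  # 32-bit RTP timestamp in codec clock ticks
    marker: bool    # marker bit, set on the first packet of a talk spurt
    payload: bytes  # one pre-encoded codec frame, copied without re-encoding


def splice_announcement(frames: list[bytes], next_seq: int, next_ts: int,
                        samples_per_frame: int = 160) -> list[RtpPacket]:
    """Packetize pre-encoded frames so they continue an existing RTP stream.
    ASSUMES every frame is independently decodable and lasts exactly
    samples_per_frame ticks (160 = 20 ms at an 8 kHz clock, a placeholder)."""
    return [
        RtpPacket(
            seq=(next_seq + i) % 2**16,  # sequence numbers wrap at 16 bits
            timestamp=(next_ts + i * samples_per_frame) % 2**32,  # 32-bit wrap
            marker=(i == 0),  # the announcement starts a new talk spurt
            payload=frame,
        )
        for i, frame in enumerate(frames)
    ]


if __name__ == "__main__":
    # Continue a stream whose sequence number is about to wrap.
    for p in splice_announcement([b"\x01", b"\x02", b"\x03"], 65535, 1000):
        print(p.seq, p.timestamp, p.marker)
```

The SSRC and payload type would stay those of the ongoing stream and are omitted here. DTMF input arriving as AVT tones can be handled at the RTP level as well, so neither direction needs a codec.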