Re: [codec] #3: 2.2. Conferencing: Support of binaural audio?

"Slava Borilin" <Borilin@spiritdsp.com> Tue, 23 March 2010 21:03 UTC

Date: Wed, 24 Mar 2010 00:02:22 +0300
Message-ID: <5A3D7E7076F5DF42990A8C164308F810547742@mail-srv.spiritcorp.com>
In-Reply-To: <BCB3F026FAC4C145A4A3330806FEFDA93A5C458BE7@EMBX01-HQ.jnpr.net>
References: <062.a837f2ff7647f7cb184f0c86b7e65747@tools.ietf.org><5A3D7E7076F5DF42990A8C164308F810547717@mail-srv.spiritcorp.com>, <003001cacab3$4e9ded90$ebd9c8b0$@de> <BCB3F026FAC4C145A4A3330806FEFDA93A5C458BE7@EMBX01-HQ.jnpr.net>
From: Slava Borilin <Borilin@spiritdsp.com>
To: Gregory Maxwell <gmaxwell@juniper.net>, Christian Hoene <hoene@uni-tuebingen.de>, codec@ietf.org
Subject: Re: [codec] #3: 2.2. Conferencing: Support of binaural audio?

Sure.

I see web-conferencing applications as different: the typical business case (take Webex as an example) is that the provider charges only ONE user (the host) for the conference, but then has to mix 10 people (participants) in that conference, consuming "10" resources while getting paid for only 1.

A system for 1,000 paid accounts like Webex, at 10 participants per conference, requires 10,000 ports in the conferencing server.
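The port arithmetic can be written out as a minimal sketch. It assumes the worst case where every paid host runs a conference at the same time with the same number of participants; the function name `required_ports` is hypothetical and used only for illustration.

```python
def required_ports(paid_accounts: int, participants_per_conference: int) -> int:
    """Worst-case server ports: every paid host has a full conference running.

    Assumption (not from any real provider's data): one port per participant,
    and all paid accounts are active simultaneously.
    """
    return paid_accounts * participants_per_conference


# The figures used in this message: 1000 paid hosts, 10 participants each.
print(required_ports(1000, 10))  # 10000
```

Revenue scales with the first argument only, while server load scales with the product, which is why the per-port cost matters more here than in a per-minute billing model.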

Therefore, the mixing (or muxing) efficiency of conferencing servers is nowadays MORE important than before.

So web conferencing requires much better scaling of conferencing than the old "all-per-minute" telco models.
This is important for both audio and video.

For video we finally have H.264 SVC, and for audio we (again) need a layered codec.

Best regards,
Вячеслав Борилин


-----Original Message-----
From: Gregory Maxwell [mailto:gmaxwell@juniper.net] 
Sent: Tuesday, March 23, 2010 9:59 PM
To: Christian Hoene; Slava Borilin; codec@ietf.org
Subject: RE: [codec] #3: 2.2. Conferencing: Support of binaural audio?

Christian Hoene [hoene@uni-tuebingen.de] wrote:
> (a) does not have any impact on the requirements, but in the case of (b) the codec requirements are support for stereo speech transmission and support for efficient mixing.

I agree that (b) requires support for stereo. 

Since you're coming from mono you're going to do the panning/auralization to virtually position participants at different locations. Physiologically/acoustically correct positioning is likely going to defeat any mixing short-cuts that a codec provides. E.g. with CELT (and presumably other codecs) you can use the independent frame mechanisms to eliminate or minimize conferencing server computation when hard switching between sources, but you can't do this if you're converting a mono stream to positioned stereo. So I don't think that it requires any more than support for stereo.
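To illustrate why positioning defeats switching short-cuts, here is a minimal sketch of server-side positioning. It assumes every mono stream has already been decoded to PCM samples, and it uses simple constant-power panning as a stand-in; that is not the physiologically correct (HRTF-style) positioning discussed above, which would cost more. The names `pan_mono_to_stereo` and `mix_positioned` are hypothetical.

```python
import math


def pan_mono_to_stereo(samples, azimuth):
    """Constant-power pan of decoded mono PCM.

    azimuth: -1.0 is hard left, 0.0 is center, 1.0 is hard right.
    Returns a list of (left, right) sample pairs.
    """
    angle = (azimuth + 1.0) * math.pi / 4.0  # maps [-1, 1] to [0, pi/2]
    left_gain = math.cos(angle)
    right_gain = math.sin(angle)
    return [(s * left_gain, s * right_gain) for s in samples]


def mix_positioned(streams):
    """Mix several (samples, azimuth) pairs into one stereo frame.

    Assumes all sample lists have the same length (one frame each).
    Every stream is touched sample by sample, so no stream can simply
    be forwarded in its encoded form.
    """
    if not streams:
        return []
    frame_len = len(streams[0][0])
    left = [0.0] * frame_len
    right = [0.0] * frame_len
    for samples, azimuth in streams:
        for i, (l, r) in enumerate(pan_mono_to_stereo(samples, azimuth)):
            left[i] += l
            right[i] += r
    return list(zip(left, right))


# Two participants, one placed hard left and one hard right.
frame = mix_positioned([([1.0, 0.0], -1.0), ([0.0, 1.0], 1.0)])
```

Even this cheap version needs a full decode, a per-sample gain and sum, and a stereo re-encode for each listener, whereas hard switching between sources can pass encoded frames through largely untouched.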

I suppose we could ask that codecs support a mode where a mono stream is sent with a positioning wrapper, instead of using stereo over the link... But for that case, I think that could be provided equally well or better by another protocol or wrapper also running on RTP in a codec-independent manner.

Of course, minimizing the codec computational burden is helpful for scaling conferencing systems.

Can anyone share some typical conference scaling numbers which they consider interesting? It's been my view that all comers are fast enough that conferencing at commercial scale isn't much of an issue: yes, it might take some serious processing grunt to handle 10,000 users on a conferencing system, but if you're running at that kind of scale you could afford the required hardware. I've not seen any numbers to suggest that this isn't the case.