Re: [codec] #3: 2.2. Conferencing: Support of binaural audio?

Gregory Maxwell <> Tue, 23 March 2010 19:01 UTC

From: Gregory Maxwell <>
To: Christian Hoene <>, 'Slava Borilin' <>, "" <>
Date: Tue, 23 Mar 2010 11:59:24 -0700

Christian Hoene [] wrote:
> (a) does not have any impact on the requirements but in case of (b) codec requirements are the support of stereo speech transmission and support for efficient mixing.

I agree that (b) requires support for stereo. 

Since you're coming from mono, you're going to do the panning/auralization to virtually position participants at different locations. Physiologically/acoustically correct positioning is likely to defeat any mixing shortcuts that a codec provides. E.g. with CELT (and presumably other codecs) you can use the independent frame mechanisms to eliminate or minimize conferencing server computation when hard switching between sources, but you can't do this if you're converting a mono stream to positioned stereo. So I don't think (b) requires anything more than support for stereo.
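
To illustrate why positioning pushes the server onto decoded audio, here is a rough sketch of panning mono sources into a stereo mix. It uses plain constant-power amplitude panning, which is much simpler than real binaural (HRTF-based) rendering, and the function names and the azimuth convention are my own assumptions for the example, not part of any codec API. The point is only that the per-source gains are applied to PCM samples, so each mono stream has to be decoded before it can be placed.

```python
import math

def pan_mono_to_stereo(samples, azimuth):
    """Constant-power pan of decoded mono PCM samples.

    azimuth is a hypothetical convention: -1.0 = hard left,
    0.0 = centre, +1.0 = hard right.
    Returns a list of (left, right) sample pairs.
    """
    if not -1.0 <= azimuth <= 1.0:
        raise ValueError("azimuth must be in [-1.0, 1.0]")
    # Map azimuth onto an angle in [0, pi/2]; cos/sin keep L^2 + R^2 constant.
    theta = (azimuth + 1.0) * math.pi / 4.0
    gain_left = math.cos(theta)
    gain_right = math.sin(theta)
    return [(s * gain_left, s * gain_right) for s in samples]

def mix_positioned(streams):
    """Sum several (samples, azimuth) mono sources into one stereo frame.

    All sources are truncated to the shortest one. No clipping or
    normalisation is applied in this sketch.
    """
    if not streams:
        return []
    length = min(len(samples) for samples, _ in streams)
    mix = [(0.0, 0.0)] * length
    for samples, azimuth in streams:
        panned = pan_mono_to_stereo(samples[:length], azimuth)
        mix = [(ml + pl, mr + pr) for (ml, mr), (pl, pr) in zip(mix, panned)]
    return mix
```

The stereo output of something like mix_positioned() would then have to be re-encoded for each listener, which is the work that a switch-only conferencing server gets to skip.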

I suppose we could ask that codecs support a mode where a mono stream is sent with a positioning wrapper, instead of using stereo over the link. But I think that case could be served equally well or better by another protocol or wrapper, also running on RTP, in a codec-independent manner.

Of course, minimizing the codec computational burden is helpful for scaling conferencing systems.

Can anyone share some typical conference scaling numbers that they consider interesting? It's been my view that all comers are fast enough that conferencing on a commercial scale isn't much of an issue: yes, it might take some serious processing grunt to handle 10,000 users on a conferencing system, but if you're running at that kind of scale you can afford the required hardware. I've not seen any numbers to suggest that this isn't the case.