Re: [codec] requirements #8 (new): Sample rates?

Jean-Marc Valin <jean-marc.valin@octasic.com> Thu, 27 January 2011 01:12 UTC

Message-id: <4D40C70F.4010200@octasic.com>
Date: Wed, 26 Jan 2011 20:14:55 -0500
From: Jean-Marc Valin <jean-marc.valin@octasic.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101208 Thunderbird/3.1.7
To: Koen Vos <koen.vos@skype.net>
References: <1995914071.1419511.1296086833495.JavaMail.root@lu2-zimbra>
In-reply-to: <1995914071.1419511.1296086833495.JavaMail.root@lu2-zimbra>
Cc: Pochol@WebfootGames.com, codec@ietf.org
Subject: Re: [codec] requirements #8 (new): Sample rates?

What I meant in my previous email was that these modes are potentially 
switchable. How exactly (or whether) we want to handle that is still an 
open question. It is even something we could handle in the RTP payload 
format. Overall, I don't have a strong opinion on switching of custom modes.

	Jean-Marc

On 11-01-26 07:07 PM, Koen Vos wrote:
> In your previous email you mentioned two custom modes with switchable frame sizes.  For those modes at least the frame size will have to be signaled in each packet, no?
>
> I could see a benefit in making all recommended custom modes "instantly" decodable (meaning without prior knowledge).  Although it does add another layer of "standard custom" vs "custom custom" which may confuse people.
>
> I'm fine either way.
> koen.
>
>
> ----- Original Message -----
> From: "Jean-Marc Valin"<jean-marc.valin@octasic.com>
> To: "Koen Vos"<koen.vos@skype.net>
> Cc: "Gregory Maxwell"<gmaxwell@juniper.net>, codec@ietf.org, Pochol@WebfootGames.com
> Sent: Wednesday, January 26, 2011 1:27:01 PM
> Subject: Re: [codec] requirements #8 (new): Sample rates?
>
> On 11-01-26 04:18 PM, Koen Vos wrote:
>> And of course: How to signal the custom modes within the packet?
>
> The custom modes will have to be negotiated at the SDP level. Unlike "Opus"
> packets, you will just not be able to decode an "Opus-custom" packet
> without knowing what was negotiated.
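To illustrate the negotiation point above, here is a minimal sketch (not from the thread). The class `CustomMode`, its fields, and the function `accept_packet` are hypothetical names, and nothing here is a real decoder or a real SDP parser; it only models the rule that an "Opus-custom" packet is unusable without the out-of-band parameters, while an "Opus" packet is not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CustomMode:
    """Parameters both ends agreed on out of band (hypothetical fields)."""
    sample_rate_hz: int
    frame_size_samples: int
    channels: int


class NotNegotiatedError(Exception):
    """Raised when a custom-mode packet arrives without negotiated parameters."""


def accept_packet(codec_name: str, payload: bytes,
                  negotiated: Optional[CustomMode] = None) -> str:
    """Say how a receiver could treat a packet; no actual decoding is done."""
    if codec_name == "Opus":
        # Standard packets need no prior knowledge in this model.
        return f"Opus: {len(payload)} bytes, decodable without negotiation"
    if codec_name == "Opus-custom":
        if negotiated is None:
            raise NotNegotiatedError("Opus-custom needs negotiated parameters")
        return (f"Opus-custom: {len(payload)} bytes at "
                f"{negotiated.sample_rate_hz} Hz, "
                f"{negotiated.frame_size_samples}-sample frames, "
                f"{negotiated.channels} channel(s)")
    raise ValueError(f"unknown codec name: {codec_name!r}")


if __name__ == "__main__":
    mode = CustomMode(sample_rate_hz=44_100, frame_size_samples=256, channels=2)
    print(accept_packet("Opus", b"\x00" * 40))
    print(accept_packet("Opus-custom", b"\x00" * 40, mode))
```

Calling `accept_packet("Opus-custom", data)` without a `CustomMode` raises `NotNegotiatedError`, which is the behaviour described in the paragraph above.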
>
>       Jean-Marc
>
>>
>> ----- Original Message -----
>> From: "Koen Vos"<koen.vos@skype.net>
>> To: "Gregory Maxwell"<gmaxwell@juniper.net>
>> Cc: Pochol@WebfootGames.com, codec@ietf.org
>> Sent: Wednesday, January 26, 2011 1:08:51 PM
>> Subject: Re: [codec] requirements #8 (new): Sample rates?
>>
>> So what needs to be decided is:
>>
>> 1. What to call these custom modes (for instance in the SDP descriptor).  As long as we agree that it is not "Opus", then we prevent compatibility issues and we keep our on-the-fly switching flexibility within the standard Opus format.  I'm fine with "Opus-custom".
>>
>> 2. How to enable the custom modes in the code.  I think there must be a small barrier so that users don't unintentionally start using them.  Maybe a well-commented #define somewhere, plus an API control flag to put the codec in custom mode?
>>
>> 3. Should there be a set of "official" custom modes, to aid interop even with Opus-custom?  Users could still pick and choose any modes they want, but having the list could make users gravitate towards a few common choices.
>>
>> koen.
>>
>>
>>
>> ----- Original Message -----
>> From: "Gregory Maxwell"<gmaxwell@juniper.net>
>> To: Pochol@WebfootGames.com, "Jean-Marc Valin"<jean-marc.valin@octasic.com>
>> Cc: codec@ietf.org
>> Sent: Wednesday, January 26, 2011 11:32:01 AM
>> Subject: Re: [codec] requirements #8 (new): Sample rates?
>>
>>
>>
>> I'm very concerned that some people may believe that 44.1k is just another checkbox that can be added
>> without cost or much consideration. This isn't the case. I think it's essential that we delineate realtime VoIP
>> style usage from other applications.
>>
>> I support Jean-Marc's option (1), which I think allows us to have our cake and eat it too. It's the closest thing to a
>> "cost free" option that I think we're going to get. This option basically separates the codec into two profiles,
>> one which imposes sampling rate restrictions in exchange for many important advantages, and one which
>> does not, but misses the advantages.
>>
>> JM's option (2) would also work. But I don't like the idea of the market confusion created by a totally
>> separate codec which has significant overlap with Opus, but isn't Opus.  Basically, I think it's silly
>> for part of the working group's output to compete with itself.  I'd prefer to just have "Opus" and
>> "Opus-custom" or whatever.
>>
>>
>> There are several reasons that supporting 44.1kHz in the primary Opus profile would be bad:
>>
>> One reason is that quite a bit of hardware (even on desktops) can only do 48 kHz (or closely related
>> rates), and even when the hardware can do multiple rates it can only ever do one rate at a time. So if
>> it's even possible that multiple applications may play sound at once, the only way to avoid resampling
>> is for them all to run at a common rate.
>>
>> For the 48 kHz-related rates we can do very computationally cheap handling of different rates purely
>> inside the codec. If a user's hardware supports any mode in the 48 kHz family, they'll need no costly
>> resampling at all. And if they do need to run at 44.1 kHz, they can resample in and out of the codec without imposing on (or
>> negotiating with) the far end.
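A small arithmetic sketch (not from the thread) of why the 48 kHz family is cheap to convert between while 44.1 kHz is not. The list `FAMILY` is an assumption: the message does not enumerate the "48 kHz related rates", so the values below are illustrative only.

```python
from fractions import Fraction

BASE_RATE = 48_000
# Assumed members of the "48 kHz family"; not listed in the message itself.
FAMILY = [8_000, 12_000, 16_000, 24_000, 48_000]


def resample_ratio(target_rate: int, base_rate: int = BASE_RATE) -> Fraction:
    """Return base_rate / target_rate as a reduced fraction."""
    return Fraction(base_rate, target_rate)


def is_integer_decimation(target_rate: int) -> bool:
    """True when the base rate is an exact integer multiple of target_rate."""
    return resample_ratio(target_rate).denominator == 1


if __name__ == "__main__":
    for rate in FAMILY + [44_100]:
        kind = "integer" if is_integer_decimation(rate) else "fractional"
        print(f"{rate:>6} Hz: 48000/{rate} = {resample_ratio(rate)} ({kind})")
```

Every assumed family member divides 48000 exactly, so rate changes reduce to integer decimation or interpolation. For 44.1 kHz the ratio reduces to 160/147, which requires a general rational resampler.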
>>
>> Another reason is that Opus (as described in the draft, without 44.1kHz) can switch between any of
>> its supported modes, all on the fly, without creating any surprising impositions on the clients.
>> This is possible only because of the closely related nature of the supported rates, and 44.1kHz
>> can't be accommodated in this scheme.
>>
>> The on-the-fly switching and the lack of any requirement to negotiate also mean that two Opus devices can
>> communicate without transcoding, even if they were spliced together long after negotiation (e.g. as part
>> of a conference gateway).
>>
>>
>> At the other end of the spectrum, the current CELT library and bitstream are extremely flexible.
>> They support a great many frame sizes and sample rates, and I've personally argued against
>> every limitation we've imposed on them, because I like the idea that CELT can fit into every
>> niche requirement (like the DAB frame sizes).
>>
>> But this flexibility has a serious price for interoperable implementations: they must carry substantially
>> more code (the limited rates/sample sizes mean that a simple table can replace several hundred
>> lines of tricky bit-exact initialization code), cope with increased peak CPU usage (e.g. if some device
>> only speaks 64-sample frames at 96 kHz, you might need 10x the CPU power to speak to it compared
>> to your preferred mode), and undertake more complicated negotiation and testing (CELT can support
>> far more unique mixtures of sample rate, frame size, and channel count than there are RTP payload
>> types).
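A rough sketch (not from the thread) of where the extra load in the 64-sample, 96 kHz example comes from. The "preferred mode" of 48 kHz with 20 ms frames is an assumption chosen for comparison. The code only counts frames and samples per second; it does not reproduce or verify the 10x CPU estimate quoted above.

```python
def frames_per_second(sample_rate_hz: int, frame_size_samples: int) -> float:
    """Number of frames that must be coded each second."""
    return sample_rate_hz / frame_size_samples


def frame_duration_ms(sample_rate_hz: int, frame_size_samples: int) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_size_samples / sample_rate_hz


# From the message: a device that only speaks 64-sample frames at 96 kHz.
ODDBALL = (96_000, 64)
# Assumption for comparison: 48 kHz with 20 ms (960-sample) frames.
PREFERRED = (48_000, 960)

if __name__ == "__main__":
    for name, (rate, size) in (("oddball", ODDBALL), ("preferred", PREFERRED)):
        print(f"{name:>9}: {frame_duration_ms(rate, size):.3f} ms/frame, "
              f"{frames_per_second(rate, size):.0f} frames/s, "
              f"{rate} samples/s")
    frame_ratio = frames_per_second(*ODDBALL) / frames_per_second(*PREFERRED)
    print(f"frame-rate ratio: {frame_ratio:.0f}x, "
          f"sample-rate ratio: {ODDBALL[0] / PREFERRED[0]:.0f}x")
```

Under these assumptions the oddball mode handles twice as many samples per second and many more frames per second, so both per-sample and per-frame overheads grow.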
>>
>> So basically, I am in favor of fully supporting oddball configurations as a well-specified standardized mode,
>> but I oppose subjecting the general VoIP/RTP users to the increased complexity and limitations of
>> the more rate/framesize agile configurations.
>>
>> I expect that most users who care about the 'custom' modes are doing other specialized things
>> and won't be expected to ever interop with a random Opus phone except (maybe) via a gateway,
>> and that most of them won't even speak RTP, so the separation shouldn't make
>> much difference to them at all.
>>
>>
>> Thoughts?
>>
>>
>>
>>
>> ________________________________________
>> From: codec-bounces@ietf.org [codec-bounces@ietf.org] On Behalf Of Pascal Pochol [Pochol@WebfootGames.com]
>> Sent: Wednesday, January 26, 2011 6:06 PM
>> To: Jean-Marc Valin
>> Cc: codec@ietf.org
>> Subject: Re: [codec] requirements #8 (new): Sample rates?
>>
>> Hello,
>>
>> I just wanted to give my 2 cents about native 44.1 kHz support.
>>
>> 99% of the audio we use CELT (and eventually Opus) for is encoded at
>> 44.1 kHz; it is provided to us that way. That means that without 44.1 kHz
>> support we'll have to up-sample 99% of our audio, most likely in a
>> preprocessing build step, making us maintain 2 sets of audibly identical files. We
>> had to do this before with Speex, where we converted from 44.1 kHz down to 32 kHz
>> to use its native ultra-wideband mode, and it really wasn't easy. We had
>> thousands of duplicated files, and every now and then a few of them got
>> updated from the source, forcing us to redo massive conversions each time
>> to make sure we didn't miss one somewhere.
>>
>> Also, about upsampling not costing much, I beg to differ. We had to work
>> with hardware that could decode Speex ultra-wideband at 32 kHz just fine, but the
>> decoding alone was eating up all our CPU, leaving not much for the
>> real work that we needed to do. We had to use 16 kHz instead to make it all
>> work. 48 kHz versus 44.1 kHz might not look like a big difference when working on a
>> desktop, but when you're counting bytes to see how you can reduce your memory
>> consumption it could mean the world.
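A back-of-the-envelope sketch (not from the thread) of the byte-counting argument above. The 16-bit mono PCM format is an assumption; the message does not state the sample format or channel count.

```python
def pcm_bytes(sample_rate_hz: int, seconds: int,
              channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Size in bytes of uncompressed PCM audio (assumed 16-bit mono by default)."""
    return sample_rate_hz * seconds * channels * bytes_per_sample


def overhead_percent(from_rate_hz: int, to_rate_hz: int) -> float:
    """Extra samples, in percent, when audio is held at to_rate instead of from_rate."""
    return 100.0 * (to_rate_hz - from_rate_hz) / from_rate_hz


if __name__ == "__main__":
    for rate in (44_100, 48_000):
        print(f"{rate} Hz: {pcm_bytes(rate, 1)} bytes per second of audio")
    print(f"44.1 kHz -> 48 kHz: +{overhead_percent(44_100, 48_000):.2f}% samples")
```

Under that assumption, holding 44.1 kHz material at 48 kHz costs a little under 9% more decoded buffer space and per-sample work, before counting the resampler itself.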
>>
>> So strictly from a codec user's point of view, native 44.1 kHz would certainly
>> make working with CELT/Opus a lot easier. Based on this thread, I'm guessing that I'm not the
>> only one in that situation. Easier would also lead to
>> faster adoption.
>>
>> Sorry, I didn't intend to write that much. In short, 44.1 kHz: great if you
>> can do it; if not, we'll just have to work around it.
>>
>> -Pascal
>>
>> _______________________________________________
>> codec mailing list
>> codec@ietf.org
>> https://www.ietf.org/mailman/listinfo/codec