Re: [codec] requirements #8 (new): Sample rates?

Koen Vos <koen.vos@skype.net> Wed, 26 January 2011 21:15 UTC

Date: Wed, 26 Jan 2011 22:18:56 +0100
From: Koen Vos <koen.vos@skype.net>
To: Gregory Maxwell <gmaxwell@juniper.net>
Cc: codec@ietf.org, Pochol@WebfootGames.com
Subject: Re: [codec] requirements #8 (new): Sample rates?

And of course: How to signal the custom modes within the packet?


----- Original Message -----
From: "Koen Vos" <koen.vos@skype.net>
To: "Gregory Maxwell" <gmaxwell@juniper.net>
Cc: Pochol@WebfootGames.com, codec@ietf.org
Sent: Wednesday, January 26, 2011 1:08:51 PM
Subject: Re: [codec] requirements #8 (new): Sample rates?

So what needs to be decided is:

1. What to call these custom modes (for instance in the SDP descriptor).  As long as we agree that they are not called "Opus", we avoid compatibility issues and keep our on-the-fly switching flexibility within the standard Opus format.  I'm fine with "Opus-custom".

2. How to enable the custom modes in the code.  I think there must be a small barrier so that users don't unintentionally start using them.  Maybe a well-commented #define somewhere, plus an API control flag to put the codec in custom mode?

3. Should there be a set of "official" custom modes, to aid interop even with Opus-custom?  Users could still pick and choose any modes they want, but having the list could make users gravitate towards a few common choices.
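As a rough illustration of point 2, here is a minimal C sketch of a two-step opt-in: custom rates are compiled out unless the integrator defines a macro, and even then they must be requested explicitly at initialization. All names (ENABLE_CUSTOM_MODES, codec_state, codec_init, codec_enable_custom) are hypothetical and not part of any actual Opus or CELT API. The list of standard rates (8, 12, 16, 24 and 48 kHz) is likewise an assumption standing in for the "48 kHz family" discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time barrier: leave undefined for standard Opus only.
   Uncomment (or pass -DENABLE_CUSTOM_MODES) to allow "Opus-custom" rates. */
/* #define ENABLE_CUSTOM_MODES */

typedef struct {
    int sample_rate; /* Hz; 0 while uninitialized or after a failed init */
    int custom_mode; /* 0 = standard profile, 1 = custom profile */
} codec_state;

/* Assumed set of standard-profile rates: 48 kHz and related rates. */
static int is_standard_rate(int rate) {
    static const int rates[] = {8000, 12000, 16000, 24000, 48000};
    for (size_t i = 0; i < sizeof rates / sizeof rates[0]; i++) {
        if (rates[i] == rate) return 1;
    }
    return 0;
}

/* Runtime barrier: succeeds only if custom modes were compiled in. */
static int codec_enable_custom(codec_state *st) {
#ifdef ENABLE_CUSTOM_MODES
    st->custom_mode = 1;
    return 0;
#else
    (void)st;
    return -1;
#endif
}

/* Returns 0 on success, -1 if the requested configuration is not allowed. */
static int codec_init(codec_state *st, int sample_rate, int want_custom) {
    assert(st != NULL && sample_rate > 0);
    st->sample_rate = 0;
    st->custom_mode = 0;
    if (!want_custom) {
        /* Standard profile: reject anything outside the 48 kHz family, e.g. 44100. */
        if (!is_standard_rate(sample_rate)) return -1;
        st->sample_rate = sample_rate;
        return 0;
    }
    /* Custom profile: only reachable through the explicit opt-in. */
    if (codec_enable_custom(st) != 0) return -1;
    st->sample_rate = sample_rate;
    return 0;
}
```

With the macro left undefined, codec_init(&st, 48000, 0) succeeds, while both codec_init(&st, 44100, 0) and codec_init(&st, 44100, 1) return -1, so a non-standard rate cannot be reached by accident.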

koen.



----- Original Message -----
From: "Gregory Maxwell" <gmaxwell@juniper.net>
To: Pochol@WebfootGames.com, "Jean-Marc Valin" <jean-marc.valin@octasic.com>
Cc: codec@ietf.org
Sent: Wednesday, January 26, 2011 11:32:01 AM
Subject: Re: [codec] requirements #8 (new): Sample rates?



I'm very concerned that some people may believe that 44.1kHz is just another checkbox that can be added
without cost or much consideration. This isn't the case. I think it's essential that we delineate real-time
VoIP-style usage from other applications.

I support Jean-Marc's option (1), which I think allows us to have our cake and eat it too. It's the closest thing to a
"cost-free" option that I think we're going to get. This option basically separates the codec into two profiles:
one which imposes sampling rate restrictions in exchange for many important advantages, and one which
does not, but misses out on those advantages.

JM's option (2) would also work. But I don't like the market confusion that would be created by a totally
separate codec that has significant overlap with Opus but isn't Opus. Basically, I think it's silly
for part of the working group's output to compete with itself. I'd prefer to just have "Opus" and
"Opus-custom" or whatever.


There are several reasons that supporting 44.1kHz in the primary Opus profile would be bad:

One reason is that quite a bit of hardware (even on desktops) can only do 48kHz (or closely related
rates), and even when the hardware can do multiple rates, it can only ever do one rate at a time. So if
it's even possible that multiple applications will play sound at once, the only way to avoid resampling
is for all of them to run at a common rate.

For the 48kHz-related rates we can handle the different rates very cheaply, purely
inside the codec. If their hardware supports any mode out of the 48kHz family, then they'll need no costly
resampling at all. And if they do need to run at 44.1kHz, they can resample in and out of the codec without imposing on (or
negotiating with) the far end.
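To make the difference concrete: a rational resampler converting between two rates works with the ratio of those rates reduced to lowest terms. The C sketch below (function names are illustrative only) computes that ratio. Going from 48 kHz to 16 kHz reduces to 1/3, a simple integer decimation, whereas 48 kHz to 44.1 kHz reduces to 147/160, which a polyphase implementation typically handles by interpolating by 147 and decimating by 160.

```c
#include <assert.h>

/* Greatest common divisor (Euclid's algorithm). */
static long gcd_long(long a, long b) {
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduce out_rate/in_rate to lowest terms: conceptually,
   upsample by *up, filter, then downsample by *down. */
static void resample_ratio(long in_rate, long out_rate, long *up, long *down) {
    assert(in_rate > 0 && out_rate > 0);
    long g = gcd_long(in_rate, out_rate);
    *up = out_rate / g;
    *down = in_rate / g;
}

/* 1 if 48 kHz can be brought down to `rate` by an integer decimation factor. */
static int divides_48k(long rate) {
    assert(rate > 0);
    return 48000 % rate == 0;
}
```

The reduced ratio only describes the structure of the resampler; actual CPU cost also depends on filter length and quality targets, which this sketch does not model.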

Another reason is that Opus (as described in the draft, without 44.1kHz) can switch between any of
its supported modes on the fly, without creating any surprising impositions on the clients.
This is possible only because the supported rates are so closely related, and 44.1kHz
can't be accommodated in this scheme.

The on-the-fly switching and the lack of any need to negotiate also mean that two Opus devices can
communicate without transcoding, even if they were spliced together long after negotiation (e.g., as part
of a conference gateway).


On the other end of the spectrum, the current CELT library and bitstream are extremely flexible.
They support a great many frame sizes and sample rates, and I've personally argued against
every limitation we've imposed on them, because I like the idea that CELT can fit into every
niche requirement (like the DAB frame sizes).

But this flexibility has a serious price for interoperable implementations. They must carry substantially
more code (limiting the set of rates and frame sizes means that a simple table can replace several hundred
lines of tricky bit-exact initialization code), cope with increased peak CPU usage (e.g., if some device
only speaks 64-sample frames at 96kHz, you might need 10x the CPU power to talk to it compared
to your preferred mode), and undertake more complicated negotiation and testing (CELT can support
far more unique combinations of sample rate, frame size, and channel count than there are RTP payload
types).
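Some of the arithmetic behind the paragraph above can be sketched in C. Any fixed per-frame cost is paid once per frame, so it scales with the number of frames per second, and the RTP dynamic payload type range defined in RFC 3551 (96 to 127) offers only 32 values. The helper names, and the configuration counts in the usage note, are hypothetical and only illustrate the combinatorics; they are not CELT's actual list of supported modes.

```c
#include <assert.h>

/* Frames per second for a given sample rate and frame size (in samples).
   Any fixed per-frame cost is paid this many times per second. */
static double frames_per_second(int sample_rate, int frame_size) {
    assert(sample_rate > 0 && frame_size > 0);
    return (double)sample_rate / (double)frame_size;
}

/* RTP dynamic payload types span 96..127 (RFC 3551). */
enum { RTP_PT_DYNAMIC_FIRST = 96, RTP_PT_DYNAMIC_LAST = 127 };

static int dynamic_payload_type_count(void) {
    return RTP_PT_DYNAMIC_LAST - RTP_PT_DYNAMIC_FIRST + 1;
}

/* Distinct (sample rate, frame size, channel count) combinations. */
static int config_count(int n_rates, int n_frame_sizes, int n_channel_counts) {
    assert(n_rates > 0 && n_frame_sizes > 0 && n_channel_counts > 0);
    return n_rates * n_frame_sizes * n_channel_counts;
}
```

For example, frames_per_second(96000, 64) is 1500, while frames_per_second(48000, 960) (20 ms frames) is 50, i.e. 30x as many frames per second. With a purely hypothetical 6 rates, 8 frame sizes and 2 channel counts, config_count(6, 8, 2) gives 96 combinations, three times the 32 dynamic payload types.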

So basically, I'm in favor of fully supporting oddball configurations as a well-specified standardized mode,
but I oppose subjecting general VoIP/RTP users to the increased complexity and limitations of
the more rate- and frame-size-agile configurations.

I expect that most users who care about the 'custom' modes are doing other specialized things
and won't be expected to ever interoperate with a random Opus phone except (maybe) via a gateway,
and that most of them won't even speak RTP, so the separation shouldn't make
much difference to them at all.


Thoughts?




________________________________________
From: codec-bounces@ietf.org [codec-bounces@ietf.org] On Behalf Of Pascal Pochol [Pochol@WebfootGames.com]
Sent: Wednesday, January 26, 2011 6:06 PM
To: Jean-Marc Valin
Cc: codec@ietf.org
Subject: Re: [codec] requirements #8 (new): Sample rates?

Hello,

I just wanted to give my 2 cents about native 44.1 kHz support.

99% of all the audio we use CELT (and eventually Opus) for is encoded at
44.1 kHz; it is provided to us that way. This means that without 44.1 kHz
support we'll have to upsample 99% of our audio, most likely in a
preprocessing build step, forcing us to maintain two sets of audibly identical
files. We had to do this before with Speex, where we converted from 44.1 kHz
down to 32 kHz to use its native ultra-wideband mode, and it really wasn't
easy. We had thousands of duplicated files, and every now and then a few of
them were updated at the source, forcing us to redo massive conversions each
time to make sure we didn't miss one somewhere.

Also, about upsampling not costing much, I beg to differ. We had to work
with hardware that could decode Speex ultra-wideband (32 kHz) just fine, but the
decoding alone was eating up all our CPU, leaving little for the
real work we needed to do. We had to use 16 kHz instead to make it all
work. 48 kHz vs. 44.1 kHz might not look like a big difference when working on a
desktop, but when you're counting bytes to see how you can reduce your memory
consumption, it could mean the world.

So, strictly from the point of view of a user of the codec, native 44.1 kHz
would certainly make working with CELT/Opus a lot easier. Based on this
thread, I'm guessing that I'm not the only one in that situation. Easier would
also lead to faster adoption.

Sorry, I didn't intend to write that much. In short, 44.1 kHz: great if you
can do it; if not, we'll just have to work around it.

-Pascal

_______________________________________________
codec mailing list
codec@ietf.org
https://www.ietf.org/mailman/listinfo/codec