Re: [codec] Suggestions for OggOpus draft

Gregory Maxwell <gmaxwell@juniper.net> Mon, 24 September 2012 18:36 UTC

From: Gregory Maxwell <gmaxwell@juniper.net>
To: Jean-Marc Valin <jmvalin@jmvalin.ca>, "codec@ietf.org" <codec@ietf.org>
Date: Mon, 24 Sep 2012 11:33:42 -0700

Jean-Marc Valin [jmvalin@jmvalin.ca]:
> I suggest adding these two sections to the OggOpus draft, one that
> describes how to handle chaining and one that has recommendations for
> encoders. I'm sure there's still more things an encoder can do that I
> haven't listed.

The recommendations here are quite complex in terms of container handling:
complex enough that implementing them alone would be comparable to writing
a whole basic encoder/decoder (not including the codec itself). This
complexity is hidden by the use of DSP terminology like 'LPC' in place of
what would be a dozen-step procedure.

It also doesn't address the multitude of corner cases that can arise. For
example, what should an encoder do if the minimum 2.5ms padding requirement
means two frames must be trimmed, so they need to be repacketized, but they
have different TOCs and so can't be repacketized? I expect many
implementations would end up outputting two packets that need to be trimmed,
which may break some decoders.
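
A rough sketch of the TOC mismatch described above, assuming the TOC byte
layout from RFC 6716 (5-bit config, 1-bit stereo flag, 2-bit frame count
code). The helper names `toc_fields` and `can_repacketize` are hypothetical.
The merge test mirrors what libopus's repacketizer is understood to check
(the top six bits of the TOC must match); it is an illustration, not the
draft's procedure.

```python
def toc_fields(packet: bytes) -> dict:
    """Split the first byte of an Opus packet into its TOC fields."""
    if not packet:
        raise ValueError("empty packet has no TOC byte")
    toc = packet[0]
    return {
        "config": toc >> 3,        # mode, bandwidth and frame size
        "stereo": (toc >> 2) & 1,  # 0 = mono, 1 = stereo
        "code": toc & 0x03,        # frame count code
    }


def can_repacketize(first: bytes, second: bytes) -> bool:
    """True if both packets share config and stereo flag (TOC & 0xFC).

    Assumption: only this compatibility test is modelled; the 120 ms
    limit on the merged packet duration is not checked here.
    """
    if not first or not second:
        raise ValueError("empty packet has no TOC byte")
    return (first[0] & 0xFC) == (second[0] & 0xFC)
```

Two trailing packets for which `can_repacketize` returns False are the case
in question: the encoder cannot merge them into the single packet that would
carry the trim.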

Opus-tools's encoder does LPC extrapolation for end-padding. It doesn't bother
with the beginning because doing it there added latency and was somewhat
messy, and I wasn't able to find any originally gapless streams that weren't
gapless without it (which surprised me).
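
To give a sense of what "LPC extrapolation" expands to, here is a minimal
generic sketch: autocorrelation, Levinson-Durbin recursion, then running the
predictor forward past the last real sample. It is not the opus-tools code;
the function names, the lack of windowing, and the choice of order are all
assumptions made for illustration.

```python
def lpc_coefficients(x, order):
    """Return a[0..order-1] so that x[n] ~= sum(a[k] * x[n-1-k])."""
    n = len(x)
    if n <= order:
        raise ValueError("need more samples than the LPC order")
    # Autocorrelation at lags 0..order (no window, for brevity).
    r = [sum(x[i] * x[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a = [0.0] * order
    err = r[0]
    if err <= 0.0:
        return a  # silent input: predict silence
    # Levinson-Durbin recursion.
    for i in range(order):
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:
            break
    return a


def lpc_extrapolate(x, order, count):
    """Predict `count` samples that continue the signal x."""
    a = lpc_coefficients(x, order)
    history = list(x[-order:])
    out = []
    for _ in range(count):
        pred = sum(a[k] * history[-1 - k] for k in range(order))
        out.append(pred)
        history.append(pred)  # feed predictions back into the predictor
    return out
```

An encoder would append the extrapolated samples after the final input
sample so that the last frame does not end on an abrupt step to silence; the
decoder then trims them again using the stream's end position.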

Since even with only a half-padding encoder and no decoder splice handling
I was unable to find any gaplessness issues for originally gapless audio, I
suspect the decoder part of this advice is just not needed. It's also unlikely
to be implemented: it's complicated, requires DSP coding (admittedly modest)
which will take some implementers out of their comfort zone, and can be
quite difficult to implement in some architectures (e.g. if separate links in a
chain are decoded independently, as is common in existing media
frameworks).

Recommending encoder extrapolation to prevent pre/post leakage of the
discontinuity to silence seems prudent to me. But I think the decoder
advice is unlikely to be followed, especially if written as a recommendation.
If it turns out to actually be important, it should probably be a requirement,
not a recommendation. If it's not important, then I think it should be omitted,
because reducing the diversity of implementation behavior is more important
than theoretical improvements.

Chaining itself is not widely supported by implementations of Vorbis. Only basic
audio-only tools commonly support it. I fear adding a bunch of complexity to
handle chains the recommended way will further encourage implementations
to opt out of supporting chaining at all.

(My opinion would be revised by examples which aren't gapless without it,
or by the existence of patches for gstreamer, vlc, and/or Firefox, all of
which will, I think, have problems implementing these recommendations.)