Re: [codec] Fwd: New Version Notification for draft-terriberry-oggopus-00.txt

"Timothy B. Terriberry" <tterribe@xiph.org> Thu, 05 July 2012 23:55 UTC

Return-Path: <tterribe@xiph.org>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 6A62811E80F6 for <codec@ietfa.amsl.com>; Thu, 5 Jul 2012 16:55:23 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.677
X-Spam-Level:
X-Spam-Status: No, score=-1.677 tagged_above=-999 required=5 tests=[AWL=0.000, BAYES_00=-2.599, HELO_MISMATCH_ORG=0.611, HOST_MISMATCH_COM=0.311]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jkPzZj8cbDQG for <codec@ietfa.amsl.com>; Thu, 5 Jul 2012 16:55:19 -0700 (PDT)
Received: from smtp.mozilla.org (mx2.corp.phx1.mozilla.com [63.245.216.70]) by ietfa.amsl.com (Postfix) with ESMTP id 42DD411E807F for <codec@ietf.org>; Thu, 5 Jul 2012 16:55:19 -0700 (PDT)
Received: from [172.17.0.5] (c-69-181-137-38.hsd1.ca.comcast.net [69.181.137.38]) (Authenticated sender: tterriberry@mozilla.com) by mx2.mail.corp.phx1.mozilla.com (Postfix) with ESMTPSA id 3F180F230E for <codec@ietf.org>; Thu, 5 Jul 2012 16:55:30 -0700 (PDT)
Message-ID: <4FF62970.5070704@xiph.org>
Date: Thu, 05 Jul 2012 16:55:28 -0700
From: "Timothy B. Terriberry" <tterribe@xiph.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120620 SeaMonkey/2.10.1
MIME-Version: 1.0
To: codec@ietf.org
References: <20120705150704.14085.7364.idtracker@ietfa.amsl.com> <4FF5AEED.8080300@xiph.org> <4FF61755.5060205@thaumas.net>
In-Reply-To: <4FF61755.5060205@thaumas.net>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Subject: Re: [codec] Fwd: New Version Notification for draft-terriberry-oggopus-00.txt
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Jul 2012 23:55:23 -0000

Ralph Giles wrote:
> Thanks for writing this up! As Tim says, this draft is already

Well, I had a bunch of help. This has been percolating for a while at 
http://wiki.xiph.org/OggOpus

> I think it would be better to swap sections 4 and 5. In particular, the

Of course, this means the reader has none of the concepts needed to 
understand the pre-skip field when it is introduced (in particular, the 
48 kHz decoding rate assumption, the difference between granule position 
and PCM sample position, etc.). We could forward-reference it, but it 
will remain completely mysterious until they get to the granule position 
section.

Anyone else have opinions on this?

> calculate identical timestamps moving either forward or backward. I
> think the draft should offer some guidance as to what a decoder MAY or
> SHOULD do if there *is* a hole in the data, for example if part of the
> file is corrupt.

Trying to specify what to do in the case of a corrupt file always looked 
like a pretty deep rathole to me (years ago I started writing a tool to 
make a best-effort attempt at repairing all kinds of corruption and 
invalid constructs in Ogg files, and with all the restrictions on 
beginning and end of stream, headers, ordering, etc., it becomes 
exceedingly complicated extremely quickly. And then you add chaining.

> Of course such a file is technically invalid, and the paragraph about
> granuleposition offsets being smaller than decoded data still stand, but
> in the case of corrupt Ogg pages, a decoder MAY wish to scan for the
> next valid page and trust its timestamp or the average bitrate to
> reconstruct what durations of packets were invalid.

Most of these restrictions are in place to make things easier to 
implement, but "what's easy" depends a good deal on your application. 
Just to illustrate: scanning forward for the next valid page is fine and 
good, but what do you do with the data once you get there? If you're not 
synchronizing with other streams and playing out to a soundcard, maybe 
you just keep playing (after skipping a suitable amount of samples for 
preroll and maybe cross-fading to avoid clicks... though sounding 
obviously broken in this case might be an advantage). If you are 
synchronizing with other streams or writing to a file, maybe you run a 
PLC algorithm or inject silence (in the file writing case, because you 
don't know what use it will be put to afterwards, so maintaining sync 
might be important there, too). Try not to think too hard about what 
happens if you're using a frame-based PLC like the one in libopus, and 
the gap is not a multiple of a valid frame size in the most recent mode. 
The more important question is, what happens when the next timestamp is 
2**63 samples in the future?

And of course, most importantly, I don't want to give people the 
impression that, however weakly we suggest it in the spec, what's 
written there is what all implementations will do, and that therefore 
encoders will _rely_ on that behavior. Then it becomes impossible to use 
the decoder flexibility gained by declaring such constructs invalid.

> pre-skip of at least 3,840 samples (80 ms) is RECOMMENDED." Do I
> understand correctly that, since pre-skip is subtracted from granulepos
> to obtain timestamps, cropping this way requires rewriting the
> granulepos field for every page in the logical stream, if
> synchronization with other logical streams is to be maintained? I
> suppose this is true of vorbis cropping too, but the need to preroll the
> encoder makes the issue much more noticeable.

Yes, I believe this is correct. This is still an improvement, in that 
you can get sample-accurate cutting without rewriting the granulepos 
field in every page so long as you _aren't_ trying to synchronize with 
another stream. That's impossible in Vorbis.

> I don't understand the motivation for the R128_TRACK_GAIN header.
> Applying this gain is optional, while there is already a manditory
> output_gain in the ID header. It seems like there are two ways to
> specify the same thing here.

I don't think these are the same thing. As the draft says, the field in 
the header should normally correspond to what REPLAYGAIN_ALBUM_GAIN used 
to represent. R128_TRACK_GAIN corresponds to REPLAYGAIN_TRACK_GAIN. The 
former is a single normalization calculated across an entire collection 
of files (for some definition of collection), while the latter is 
calculated from the apparent loudness of just that individual file.

> (a) It is important to describe the interaction between normalization
> metadata, user-controlled volume settings, and the output_gain header
> field to ensure that the later is correctly implemented as a manditory
> element.
>
> [Snip]
>
> All of which seems noble enough, but only (a) is specific to Opus, and
> addressing that issue doesn't require defining a new tag. Would it be
> better to document this in another draft, or a revision of the
> referenced vorbis-comment document?

I think it might be reasonable to document R128 versions of the 
ReplayGain tags elsewhere, but I think point (a) is the reason that this 
tag was discussed explicitly in this draft, and not any of the other issues.

There is also the whole proscriptive (output gain) vs. descriptive 
(R128_TRACK_GAIN) argument, but I'll let Greg Maxwell argue that one.

> I was surprised that the R128_TRACK_GAIN field specifies its value is
> the ascii representation of an integer, which is then interpreted as
> a fixed point number. Surely including the radix would be more natural
> for a human-readable field. The draft could still specify the same valid
> range.

Care to offer some text to make this more clear?

> I also think it would be better to clamp the allowed range rather than
> specifying that it MUST be valid. As a decoder author I wouldn't reject
> a file because an optional metadata value is incorrect.

As a decoder author, you of course could do whatever you wanted if an 
encoder fails to follow a MUST restriction. For example, you might want 
to just not apply the tag at all, because it's likely the result of a 
bogus calculation in the encoder (and the extremes of the representable 
range are _very_ extreme).

> It might be useful to define a metadata scheme for naming the output
> channels from Channel mapping 255. One use case for this mode is

I don't have a strong opinion on this one, so I'll defer to others here.

Thanks for the feedback!