Re: [codec] Fwd: New Version Notification for draft-terriberry-oggopus-00.txt
"Timothy B. Terriberry" <tterribe@xiph.org> Thu, 05 July 2012 23:55 UTC
Return-Path: <tterribe@xiph.org>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 6A62811E80F6 for <codec@ietfa.amsl.com>; Thu, 5 Jul 2012 16:55:23 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.677
X-Spam-Level:
X-Spam-Status: No, score=-1.677 tagged_above=-999 required=5 tests=[AWL=0.000, BAYES_00=-2.599, HELO_MISMATCH_ORG=0.611, HOST_MISMATCH_COM=0.311]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jkPzZj8cbDQG for <codec@ietfa.amsl.com>; Thu, 5 Jul 2012 16:55:19 -0700 (PDT)
Received: from smtp.mozilla.org (mx2.corp.phx1.mozilla.com [63.245.216.70]) by ietfa.amsl.com (Postfix) with ESMTP id 42DD411E807F for <codec@ietf.org>; Thu, 5 Jul 2012 16:55:19 -0700 (PDT)
Received: from [172.17.0.5] (c-69-181-137-38.hsd1.ca.comcast.net [69.181.137.38]) (Authenticated sender: tterriberry@mozilla.com) by mx2.mail.corp.phx1.mozilla.com (Postfix) with ESMTPSA id 3F180F230E for <codec@ietf.org>; Thu, 5 Jul 2012 16:55:30 -0700 (PDT)
Message-ID: <4FF62970.5070704@xiph.org>
Date: Thu, 05 Jul 2012 16:55:28 -0700
From: "Timothy B. Terriberry" <tterribe@xiph.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120620 SeaMonkey/2.10.1
MIME-Version: 1.0
To: codec@ietf.org
References: <20120705150704.14085.7364.idtracker@ietfa.amsl.com> <4FF5AEED.8080300@xiph.org> <4FF61755.5060205@thaumas.net>
In-Reply-To: <4FF61755.5060205@thaumas.net>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Subject: Re: [codec] Fwd: New Version Notification for draft-terriberry-oggopus-00.txt
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Jul 2012 23:55:23 -0000
Ralph Giles wrote: > Thanks for writing this up! As Tim says, this draft is already Well, I had a bunch of help. This has been percolating for a while at http://wiki.xiph.org/OggOpus > I think it would be better to swap sections 4 and 5. In particular, the Of course, this means the reader has none of the concepts needed to understand the pre-skip field when it is introduced (in particular, the 48 kHz decoding rate assumption, the difference between granule position and PCM sample position, etc.). We could forward-reference it, but it will remain completely mysterious until they get to the granule position section. Anyone else have opinions on this? > calculate identical timestamps moving either forward or backward. I > think the draft should offer some guidance as to what a decoder MAY or > SHOULD do if there *is* a hole in the data, for example if part of the > file is corrupt. Trying to specify what to do in the case of a corrupt file always looked like a pretty deep rathole to me (years ago I started writing a tool to make a best-effort attempt at repairing all kinds of corruption and invalid constructs in Ogg files, and with all the restrictions on beginning and end of stream, headers, ordering, etc., it becomes exceedingly complicated extremely quickly. And then you add chaining. > Of course such a file is technically invalid, and the paragraph about > granuleposition offsets being smaller than decoded data still stand, but > in the case of corrupt Ogg pages, a decoder MAY wish to scan for the > next valid page and trust its timestamp or the average bitrate to > reconstruct what durations of packets were invalid. Most of these restrictions are in place to make things easier to implement, but "what's easy" depends a good deal on your application. Just to illustrate: scanning forward for the next valid page is fine and good, but what do you do with the data once you get there? If you're not synchronizing with other streams and playing out to a soundcard, maybe you just keep playing (after skipping a suitable amount of samples for preroll and maybe cross-fading to avoid clicks... though sounding obviously broken in this case might be an advantage). If you are synchronizing with other streams or writing to a file, maybe you run a PLC algorithm or inject silence (in the file writing case, because you don't know what use it will be put to afterwards, so maintaining sync might be important there, too). Try not to think too hard about what happens if you're using a frame-based PLC like the one in libopus, and the gap is not a multiple of a valid frame size in the most recent mode. The more important question is, what happens when the next timestamp is 2**63 samples in the future? And of course, most importantly, I don't want to give people the impression that, however weakly we suggest it in the spec, what's written there is what all implementations will do, and that therefore encoders will _rely_ on that behavior. Then it becomes impossible to use the decoder flexibility gained by declaring such constructs invalid. > pre-skip of at least 3,840 samples (80 ms) is RECOMMENDED." Do I > understand correctly that, since pre-skip is subtracted from granulepos > to obtain timestamps, cropping this way requires rewriting the > granulepos field for every page in the logical stream, if > synchronization with other logical streams is to be maintained? I > suppose this is true of vorbis cropping too, but the need to preroll the > encoder makes the issue much more noticeable. Yes, I believe this is correct. This is still an improvement, in that you can get sample-accurate cutting without rewriting the granulepos field in every page so long as you _aren't_ trying to synchronize with another stream. That's impossible in Vorbis. > I don't understand the motivation for the R128_TRACK_GAIN header. > Applying this gain is optional, while there is already a manditory > output_gain in the ID header. It seems like there are two ways to > specify the same thing here. I don't think these are the same thing. As the draft says, the field in the header should normally correspond to what REPLAYGAIN_ALBUM_GAIN used to represent. R128_TRACK_GAIN corresponds to REPLAYGAIN_TRACK_GAIN. The former is a single normalization calculated across an entire collection of files (for some definition of collection), while the latter is calculated from the apparent loudness of just that individual file. > (a) It is important to describe the interaction between normalization > metadata, user-controlled volume settings, and the output_gain header > field to ensure that the later is correctly implemented as a manditory > element. > > [Snip] > > All of which seems noble enough, but only (a) is specific to Opus, and > addressing that issue doesn't require defining a new tag. Would it be > better to document this in another draft, or a revision of the > referenced vorbis-comment document? I think it might be reasonable to document R128 versions of the ReplayGain tags elsewhere, but I think point (a) is the reason that this tag was discussed explicitly in this draft, and not any of the other issues. There is also the whole proscriptive (output gain) vs. descriptive (R128_TRACK_GAIN) argument, but I'll let Greg Maxwell argue that one. > I was surprised that the R128_TRACK_GAIN field specifies its value is > the ascii representation of an integer, which is then interpreted as > a fixed point number. Surely including the radix would be more natural > for a human-readable field. The draft could still specify the same valid > range. Care to offer some text to make this more clear? > I also think it would be better to clamp the allowed range rather than > specifying that it MUST be valid. As a decoder author I wouldn't reject > a file because an optional metadata value is incorrect. As a decoder author, you of course could do whatever you wanted if an encoder fails to follow a MUST restriction. For example, you might want to just not apply the tag at all, because it's likely the result of a bogus calculation in the encoder (and the extremes of the representable range are _very_ extreme). > It might be useful to define a metadata scheme for naming the output > channels from Channel mapping 255. One use case for this mode is I don't have a strong opinion on this one, so I'll defer to others here. Thanks for the feedback!
- [codec] Fwd: New Version Notification for draft-t… Timothy B. Terriberry
- Re: [codec] Fwd: New Version Notification for dra… Kevin P. Fleming
- Re: [codec] Fwd: New Version Notification for dra… Timothy B. Terriberry
- Re: [codec] Fwd: New Version Notification for dra… Ralph Giles
- Re: [codec] Fwd: New Version Notification for dra… Timothy B. Terriberry
- Re: [codec] Fwd: New Version Notification for dra… Jean-Marc Valin
- Re: [codec] Fwd: New Version Notification for dra… Timothy B. Terriberry
- Re: [codec] Fwd: New Version Notification for dra… Jean-Marc Valin
- Re: [codec] Fwd: New Version Notification for dra… Ron
- Re: [codec] Fwd: New Version Notification for dra… Ron
- Re: [codec] Fwd: New Version Notification for dra… Gregory Maxwell
- Re: [codec] Fwd: New Version Notification for dra… Ralph Giles
- Re: [codec] Fwd: New Version Notification for dra… Timothy B. Terriberry
- Re: [codec] Fwd: New Version Notification for dra… Timothy B. Terriberry
- Re: [codec] Fwd: New Version Notification for dra… Ron
- Re: [codec] Fwd: New Version Notification for dra… Timothy B. Terriberry
- Re: [codec] Fwd: New Version Notification for dra… Gregory Maxwell
- Re: [codec] Fwd: New Version Notification for dra… Ron
- Re: [codec] Fwd: New Version Notification for dra… Ron
- Re: [codec] Fwd: New Version Notification for dra… Ralph Giles