Re: [codec] Fwd: New Version Notification for draft-terriberry-oggopus-00.txt

Ralph Giles <giles@thaumas.net> Thu, 05 July 2012 22:38 UTC

On 12-07-05 8:12 AM, Timothy B. Terriberry wrote:

> Now that the main codec draft has been approved by the IESG, I thought
> we should move on to standardizing a long-term storage file format.

Thanks for writing this up! As Tim says, this draft is already
implemented in Firefox, with support in the HTML <audio> element, hidden
behind a preference switch. You can try it with Aurora (alpha) and
Nightly builds. See http://people.xiph.org/~giles/2012/opus/ for details.

You can create test files with an experimental suite of command-line
utilities in the opus-tools package. See
http://opus-codec.org/downloads/ for how to obtain a copy.

Some comments on the draft:

I think it would be better to swap sections 4 and 5. In particular, the
header layout figures are of general interest to anyone writing file
identification and metadata extraction tools, as well as full
implementations, so having them near the beginning of the draft makes
for easier reference. The granulepos section is quite long, and fits
better with the seeking and other considerations in section 6.

Section 4 says the granulepos values MUST increment by the number of
decodable samples in the intervening packets, so that decoders can
calculate identical timestamps moving either forward or backward. I
think the draft should offer some guidance as to what a decoder MAY or
SHOULD do if there *is* a hole in the data, for example if part of the
file is corrupt.

Of course such a file is technically invalid, and the paragraph about
granulepos offsets being smaller than the decoded data still stands, but
in the case of corrupt Ogg pages, a decoder MAY wish to scan for the
next valid page and trust its timestamp, or the average bitrate, to
estimate the duration of the invalid packets.
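
A minimal sketch of the "trust the next valid page" idea, in Python. The
function name and arguments are hypothetical and not taken from the
draft; it assumes the decoder remembers the last granulepos it trusted
and can count the samples in the first valid page after the hole. The
48 kHz granule rate is inferred from the draft's "3,840 samples (80 ms)".

```python
GRANULE_RATE = 48000  # assumption: inferred from "3,840 samples (80 ms)"

def estimate_gap_samples(last_good_granulepos, next_valid_granulepos,
                         samples_in_next_page):
    """Estimate how many samples were lost between two valid pages.

    A page's granulepos marks the end of the last packet completed on
    that page, so the audio on the next valid page begins at
    next_valid_granulepos - samples_in_next_page.
    """
    start_of_next = next_valid_granulepos - samples_in_next_page
    gap = start_of_next - last_good_granulepos
    # A negative gap means the timestamps are inconsistent; report no hole.
    return max(gap, 0)

def gap_seconds(gap_samples):
    """Convert a gap measured in granule units to seconds."""
    return gap_samples / GRANULE_RATE
```

A decoder could fill the estimated gap with silence or concealment so
that later timestamps stay aligned; which is appropriate is exactly the
guidance the draft could give.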


In the description of the 'pre-skip' field in the granulepos discussion,
the draft says this may be used to exactly trim the start time of a
stream to perform lossless cropping. Then the description of the field
in the ID header section says, "When constructing cropped Ogg Opus
streams, a pre-skip of at least 3,840 samples (80 ms) is RECOMMENDED."
Do I understand correctly that, since pre-skip is subtracted from
granulepos to obtain timestamps, cropping this way requires rewriting
the granulepos field for every page in the logical stream, if
synchronization with other logical streams is to be maintained? I
suppose this is true of Vorbis cropping too, but the need to preroll the
encoder makes the issue much more noticeable.
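
The arithmetic behind that question can be sketched in Python. This is
an illustration under stated assumptions, not the draft's procedure: the
helper names are hypothetical, the 48 kHz rate is inferred from "3,840
samples (80 ms)", and the only rule taken from the draft is that the
timestamp is granulepos minus pre-skip.

```python
GRANULE_RATE = 48000  # assumption: inferred from "3,840 samples (80 ms)"

def timestamp_seconds(granulepos, pre_skip):
    """Timestamp of a page: pre-skip is subtracted from granulepos."""
    return (granulepos - pre_skip) / GRANULE_RATE

def shift_granulepos(page_granulepos, old_pre_skip, new_pre_skip):
    """Return new granulepos values that keep every timestamp unchanged.

    Raising pre-skip by delta would move every timestamp earlier by
    delta samples, so each page's granulepos is raised by the same delta.
    """
    delta = new_pre_skip - old_pre_skip
    return [g + delta for g in page_granulepos]
```

For example, moving from a pre-skip of 312 to 3840 adds 3528 to the
granulepos of every remaining page, which is why the whole logical
stream has to be rewritten.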


I don't understand the motivation for the R128_TRACK_GAIN header.
Applying this gain is optional, while there is already a mandatory
output_gain in the ID header. It seems like there are two ways to
specify the same thing here.

From informal discussions, I'm aware of the following arguments for
specifying this field. Perhaps someone more familiar with audio
normalization can respond if I've left something out.

(a) It is important to describe the interaction between normalization
metadata, user-controlled volume settings, and the output_gain header
field to ensure that the latter is correctly implemented as a mandatory
element.

(b) While the mandatory output_gain field can be used to fix recording
level problems in situations where re-encoding is not feasible, for
example files produced by a separate workflow, or a recording of a live
event, some producers and distributors want an *optional* way to mark
files to support the subset of applications which apply playback
normalization. This is effectively metadata for machine consumption, not
a change in the intended mix level, so a separate field is justified.

(c) It is useful to define a new tag based on the EBU R128 reference
level, rather than reusing the older 'replaygain' specification. The EBU
standard is technically superior and gaining broader support in audio
production tools and safety regulations.

All of which seems noble enough, but only (a) is specific to Opus, and
addressing that issue doesn't require defining a new tag. Would it be
better to document this in another draft, or a revision of the
referenced vorbis-comment document?

Contrariwise, why does the draft tackle this one metadata issue without
addressing other infelicities with vorbis-comment tag schemes, such as
non-Dublin Core keys, machine-parsable date formats, proper attribution
of Creative Commons licensed work, and so on?


I was surprised that the R128_TRACK_GAIN field specifies its value is
the ASCII representation of an integer, which is then interpreted as
a fixed point number. Surely including the radix would be more natural
for a human-readable field. The draft could still specify the same valid
range.

I also think it would be better to clamp the allowed range rather than
specifying that it MUST be valid. As a decoder author I wouldn't reject
a file because an optional metadata value is incorrect.
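
A rough Python sketch of the lenient handling suggested here: parse the
ASCII integer, clamp it instead of rejecting the file, and scale it to
dB. The function name is hypothetical. The signed 16-bit range and the 8
fractional bits (Q7.8) are assumptions about the draft's fixed-point
format, so check them against the draft text before relying on this.

```python
# Assumptions, not confirmed by this message: the gain is a signed
# 16-bit value with 8 fractional bits (Q7.8), expressed in dB.
GAIN_MIN = -32768
GAIN_MAX = 32767
FRACTION_BITS = 8

def parse_r128_track_gain(value):
    """Return the track gain in dB, or None if the tag is unparsable.

    Out-of-range values are clamped rather than treated as an error,
    since the tag is optional metadata.
    """
    try:
        raw = int(value.strip())
    except ValueError:
        return None  # ignore a malformed optional tag, keep the file
    raw = max(GAIN_MIN, min(GAIN_MAX, raw))
    return raw / (1 << FRACTION_BITS)
```

Under these assumptions the string "256" means +1 dB, which shows why a
radix point would be friendlier to a human reader.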


It might be useful to define a metadata scheme for naming the output
channels from Channel mapping 255. One use case for this mode is
multitrack recording for later offline mixing. Being able to propagate
channel labels between the input and output applications would be a nice
feature. For example:

CHANNEL_000=lead mic
CHANNEL_001=backing mic
CHANNEL_002=guitar
CHANNEL_003=bass
CHANNEL_010=piano left
CHANNEL_011=piano right
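
A consuming application might read such tags roughly as follows. This is
only a sketch of the proposal above: the function name is hypothetical,
and treating the three digits as a decimal channel index is a guess,
since the scheme is not specified anywhere. Keys are matched without
regard to case because vorbis-comment field names are case-insensitive.

```python
import re

# Hypothetical key pattern for the proposed CHANNEL_NNN tags.
CHANNEL_KEY = re.compile(r"^CHANNEL_(\d{3})$", re.IGNORECASE)

def channel_labels(comments):
    """Map channel index to label from a list of 'KEY=value' strings.

    Comments that are not CHANNEL_NNN tags are ignored.
    """
    labels = {}
    for comment in comments:
        key, sep, value = comment.partition("=")
        match = CHANNEL_KEY.match(key)
        if sep and match:
            # Assumption: the digits are a decimal channel index.
            labels[int(match.group(1))] = value
    return labels
```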

Cheers,
 -r