Re: [codec] draft-ietf-codec-oggopus: attached pictures

Ron <> Sun, 31 August 2014 16:06 UTC

On Sat, Aug 30, 2014 at 04:03:36PM -0700, Mark Harris wrote:
> Ron wrote:
> > Which is why I'm suggesting that one option if you want still
> > pictures, is to define a generic Ogg mapping for them - which could
> > then be used with any other format that also has an Ogg mapping,
> > including this one.
> I have no objection to adding the pictures in a separate multiplexed
> stream; in fact that is my proposal (3).  That is the only one that I
> did not specify with enough detail to create interoperable
> implementations, because it is a much larger change to the way that
> opus-tools currently attaches pictures, and I therefore did not expect
> that approach to be favored.  However if there is consensus that that
> is the way to go, I can propose something more concrete, and

I think it's a bit early, and there hasn't yet been enough thorough
investigation of the available backward-compatible options, to declare
this "the way to go", but I do think there's general recognition
that this is one of the options on the table that could be
investigated further, and could be made to work if it turns out to
be the best available option.

It also wouldn't require changes to this draft once there was a
real consensus among the various stakeholders about an extension
that met all of their various needs for something like this (not
just still picture data).

> in that
> case the only change that I think is needed in the Ogg Opus draft is
> to allow these streams in an "Ogg Opus file".  In particular, when
> attached pictures are not specifically mentioned, no requirements or
> recommendations (such as recommended file extension) should depend on
> whether attached pictures are present.

Most of this draft only talks about and defines the "Ogg Opus stream",
i.e. the Ogg mapping needed for a single stream of Opus audio, which is
the minimal amount of extra data needed to actually play such a stream.

The comment header allows for the inclusion of arbitrary other metadata,
but that data is strictly optional.  Tools can freely ignore it, or
strip it away, and the stream still remains valid and playable.  That
data is in a format that existing tools already know how to handle
(and to some extent, in some tools, expect for Ogg audio streams).

The only place we talk about an "Ogg Opus file" (with one exception,
which I think is a 'typo' that should instead say "stream", and which
I will address in a separate mail) is where we say you can mux one or
more Ogg Opus streams together to store them in a file.

Which I think is the correct interpretation, since it indicates they
contain *only* Ogg Opus streams.  A minimal tool which recognises only
the Opus stream mapping is all that is required to handle such files.

Obviously you can also create files which mux an Ogg Opus stream with
other streams, for video, or subtitles, or still pictures, or anything
else - and when you do that, they become a more complex media type,
as defined by RFC 5334, and they become some other kind of file, which
requires something more than a minimal Ogg player to render.

Also obviously, we can more specifically define certain combinations
of those things in the future too: "Opus with Daala", "Opus with still
pictures (possibly in a slide show)", "Opus with dancing holo-ponies",
whatever people care to create.  But you're going to need a more
specifically capable player to correctly render those, and it does
seem like they should be noted as distinct for the convenience of end
users as well in any case.  That doesn't make them "second class" or
otherwise frowned upon or illegal, they're just different.

The file extension is exactly such a convenience.  No player should
*ever* rely purely on it to know what the content of a file is.
It's just a hint to users that "A minimal Opus-only player should
be able to play this for you" and nothing more.
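
As a rough sketch of what looking at the content rather than the
extension can mean (this is only an illustration in Python, not code
from the draft or from any existing tool, and the function name is
made up): the first page of an Ogg Opus stream carries the
"beginning of stream" flag and its packet starts with the 8-byte magic
"OpusHead", so a player can test for that instead of trusting the file
name.  It assumes the Opus stream is the first one in the file, and it
does not verify the page CRC.

```python
def first_stream_is_opus(data: bytes) -> bool:
    """Guess whether `data` starts with an Ogg Opus stream.

    Hypothetical helper: inspects only the first Ogg page.
    """
    # An Ogg page header is 27 fixed bytes followed by a segment table.
    if len(data) < 27 or data[:4] != b"OggS":
        return False
    header_type = data[5]        # bit 0x02 marks beginning of stream
    nsegs = data[26]             # number of entries in the segment table
    body_start = 27 + nsegs      # page body follows the segment table
    is_bos = bool(header_type & 0x02)
    # The Opus identification header begins with the magic "OpusHead".
    return is_bos and data[body_start:body_start + 8] == b"OpusHead"
```

A file that muxes other streams ahead of the Opus one would need every
beginning-of-stream page checked, not just the first.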

I think trying to define (and include or exclude) such things here in
detail is putting the cart before the holo-pony.  Once we have an Ogg
mapping for Daala, and for other arbitrary binary data etc., then
a separate document can define the normative and informational rules
for how such things should be combined with Ogg Opus streams and what
they should be called when in the newly defined form.

I think we leave open the most scope for being able to do that well
if this draft just confines itself to the Opus mapping, and leaves
more complex combinations (including with things that don't exist
yet) to future documents to define.  We just need to make sure we
haven't accidentally made something impossible here, which I don't
think we have (or at least I can't yet see that we have).

> > There may be better ways to do that, but any optimal way should not
> > be Opus specific, it should be reusable for any Ogg stream.
> Oh I agree completely that this should ideally not be codec-specific.
> Not only attached pictures but also tags such as TITLE and ARTIST,
> channel mapping, output gain, and loudness/gain required for
> normalization would ideally all be maintained by the container and not
> be specific to any codec.  The reason that I proposed adding attached
> pictures in the codec-specific stream is because Ogg does not provide
> metadata at the container level, and all of the metadata I mentioned
> is currently being stored on a codec-specific page of the
> codec-specific stream.  I would love to see all of this moved out of
> the codec-specific stream and into an extensible codec-independent
> format elsewhere in the file, and the codec-specific stream reduced to
> only what is truly codec-specific, but that is not the direction that
> was taken for other metadata.

Ogg is purely a generic framing format.  Like TLV, it provides simple
rules that let a generic tool parse a stream, even if that stream
includes things it doesn't know anything about, and cherry-pick out
of it the things that it does know.  Unlike TLV, it strikes a balance
between being efficient for small data packets and being able to
contain arbitrarily huge ones in the same format.
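
To make the framing point concrete, here is a minimal sketch in Python
of walking Ogg pages and keeping only one logical stream while skipping
the rest unread.  The page header layout is the one in RFC 3533; the
function names are my own invention for illustration, the CRC is not
checked, and packets are not reassembled from the lacing values, so
treat it as a sketch rather than a usable demuxer.

```python
import struct

# Fixed 27-byte Ogg page header as laid out in RFC 3533:
# capture pattern, version, header type, granule position,
# stream serial number, page sequence number, CRC, segment count.
PAGE_HEADER = struct.Struct("<4sBBqIIIB")

def iter_pages(data: bytes):
    """Yield (serial, header_type, body) for each page in `data`."""
    pos = 0
    while pos < len(data):
        if len(data) - pos < PAGE_HEADER.size:
            raise ValueError("truncated page header")
        (capture, version, header_type, _granule,
         serial, _seq, _crc, nsegs) = PAGE_HEADER.unpack_from(data, pos)
        if capture != b"OggS" or version != 0:
            raise ValueError(f"no Ogg page at offset {pos}")
        table_end = pos + PAGE_HEADER.size + nsegs
        lacing = data[pos + PAGE_HEADER.size:table_end]
        # The body length is simply the sum of the lacing values, so a
        # parser can step over a page without understanding its payload.
        body_end = table_end + sum(lacing)
        if len(lacing) < nsegs or body_end > len(data):
            raise ValueError("truncated page")
        yield serial, header_type, data[table_end:body_end]
        pos = body_end

def pages_for_stream(data: bytes, wanted_serial: int):
    """Keep only the page bodies of one logical stream, skipping the rest."""
    return [body for serial, _flags, body in iter_pages(data)
            if serial == wanted_serial]
```

The only thing the walker needs from a stream it doesn't understand is
its serial number and its page lengths, which is what lets an
Opus-only tool step past video, subtitle or picture streams.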

And unlike most other media containers, it was designed with the
ability to be streamed right from the outset too.

Which really means that the best format for metadata in any given
use scenario *is* somewhat specific to that scenario.  (i.e. it's
totally useless to have all your metadata in container-defined
headers at the beginning of a file if people can join a stream
at any point during its playback, and it's pointless to have
something like "track gain" for a video stream).

For the first few codecs that were mapped to it, the value of
trying to guess and implement a metadata format that would be
properly future-proof was limited.  For the next few, the line
of least resistance was to "do what the other codecs did",
reusing what was good, fixing what turned out to be a mistake
as appropriate.  And much of that was done in an era when the
extra metadata was always deliberately minimal, since we only
had, like, 100MB disks, if we were the lucky ones!

So yes, the time might well be right to look back on all of the
things that we currently have using it, and devise some format
that can both be used for new mappings that are created, and
retro-fitted to older ones, by and for tools that support it.

Ogg Skeleton was an early attempt along those lines for adding
multi-stream metadata, but it never really seemed to get a lot
of traction, for reasons I don't profess to really know.

But all of that is why I think this draft should confine itself
to just specifying the mapping for Opus streams, and why adding
"enhanced" metadata should be done as an extension of RFC 3533
in close consultation with the people who'd actually have to
want it and implement it for it to become more generally useful.

I think you've made some good points about that, but I don't
think this group are the ones you need to convince, or that this
document is the place we need to specify it, in order for that
to really happen now.

Ideally I'd like to see this conversation move from "does this
belong in the Ogg Opus draft", to engagement of the various
interested parties in the merits and needs of the various
options for it (and associated things that haven't been raised
here at all yet), and if there is real momentum for that, to
the question of whether it should become a new milestone for
this group to create a separate document on that (or if not,
where that work should instead best be done).