Re: [codec] draft-ietf-codec-oggopus: attached pictures

Mark Harris <> Sat, 30 August 2014 05:38 UTC

Date: Fri, 29 Aug 2014 22:38:14 -0700
From: Mark Harris <>
To: "Timothy B. Terriberry" <>, Basil Mohamed Gohar <>
Subject: Re: [codec] draft-ietf-codec-oggopus: attached pictures

Timothy B. Terriberry wrote:
> Mark Harris wrote:
>> Attached pictures are not mentioned at all in the current Ogg Opus
>> draft (draft-ietf-codec-oggopus-04), which simply defers to the
>> vorbis-comment specification for the format of metadata, with a few
>> specific differences such as new gain tags.
> The theory was that it wasn't necessary to re-invent the wheel here.
> Applications which already support Vorbis-style album art can add support
> for Opus album art with as little as one line of code (because they can, and
> do, share comment-parsing routines among Vorbis/Theora/Speex/etc.). This is
> not just a matter of engineering simplicity, but also speed of adoption.

Use of vorbis-comment is fine for its intended purpose, and to be
clear, I do not suggest that something else be used for short
text comments such as TITLE, ALBUM, or ARTIST.  The referenced
vorbis-comment document explicitly states that it is meant for
short text comments, not arbitrary metadata.  You don't use a
hammer in place of a screwdriver just because you have a hammer
handy and have found that it can get the screw into the hole.  I
can understand the reasoning behind trying to shoehorn attached
pictures into comments for Ogg Vorbis, as the Ogg Vorbis encapsulation
format was designed before attached pictures were common and it had
already been finalized for a number of years; this was one of the
few options available.  However, it is clear that vorbis-comment was
not intended for this kind of data and that the efficiency of this
encoding for attached pictures is significantly worse than what is
available with other common audio file formats.

As for sharing code, native FLAC also uses Vorbis-style comments
for short text tags and a binary format for attached pictures; in
fact it is the same binary format that is being suggested.  I don't
see why an application supporting Ogg Opus and FLAC, or just Ogg
Opus, should need to include code for base64 encoding and decoding
attached pictures just because that was needed by a different older
format.  If an application supports both Ogg Vorbis and Ogg Opus
then it will need to support both the old and new styles of gain
tags as well as the old and new styles of picture tags.  The format
of the new picture data is the same as the old except that the
base64 encoding is no longer needed, so it is easy to support both,
just as it is easy to support both styles of gain tags.  It is also easy to
distinguish the old and new styles without having to know which to
expect.  So code sharing should not be an issue.
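
To illustrate how little code that takes, here is a minimal Python
sketch.  It assumes the picture data uses the FLAC picture block
layout (big-endian 32-bit fields), and it assumes one possible way of
telling the styles apart: a binary block begins with a 32-bit picture
type, a small integer, so its first byte is 0x00, which can never
occur in base64 text.  The function names are hypothetical and are not
taken from any draft or library.

```python
import base64
import struct

def make_picture_block(pic_type, mime, desc, data):
    """Build a FLAC-style picture block (dimension fields left as 0)."""
    mime_b, desc_b = mime.encode("ascii"), desc.encode("utf-8")
    return (struct.pack(">II", pic_type, len(mime_b)) + mime_b
            + struct.pack(">I", len(desc_b)) + desc_b
            + struct.pack(">5I", 0, 0, 0, 0, len(data)) + data)

def picture_block_bytes(value):
    """Return the raw picture block from a tag value in either style."""
    # Assumption: binary blocks start with byte 0x00 (high byte of the
    # picture type); base64 text never contains that byte.
    if value[:1] == b"\x00":
        return value
    return base64.b64decode(value, validate=True)

def parse_picture_header(block):
    """Return (picture type, MIME type, description, image data)."""
    pos = 0
    pic_type, mime_len = struct.unpack_from(">II", block, pos)
    pos += 8
    mime = block[pos:pos + mime_len].decode("ascii")
    pos += mime_len
    (desc_len,) = struct.unpack_from(">I", block, pos)
    pos += 4
    desc = block[pos:pos + desc_len].decode("utf-8")
    pos += desc_len
    # width, height, colour depth, colour count, then the data length
    _w, _h, _depth, _colors, data_len = struct.unpack_from(">5I", block, pos)
    pos += 20
    return pic_type, mime, desc, block[pos:pos + data_len]
```

With this, the same parse_picture_header() call serves both styles;
the only style-specific code is the one branch in picture_block_bytes().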

>>   (1) Specify that comments may contain either UTF-8 or binary data,
>> according to some rule.  For example, if the name of the tag begins
>> with "@" then its value is binary data and not intended to be
>> displayed as text, and is otherwise UTF-8.  There is no technical
> This has backwards-compatibility problems. Is this unique to Opus or would
> we expect other formats to adopt the same strategy? (see above about
> code-reuse).

Like the new gain tags, applications could decide to support the
new tags with older formats, although I wouldn't expect the new
tags to be recommended for older formats that already have another
well established standard.  On the other hand, if there are other
reasons to create a new incompatible version of the older standard
(e.g. TransOgg :)) then it would make sense to recommend use of the
new tags in the new version.

>              What would we do about existing tags that might already start
> with an '@'? How would we represent a field name that we want to start with
> an '@' (for whatever reason)?

The vorbis-comment specification explicitly allows new unregistered
"nonstandard" tag names, and does not offer any system for ensuring
unique tag names.  This is an issue with the design of the
vorbis-comment specification: it is always possible that
someone else has used the same tag name for another purpose.  If
this is considered to be a serious compatibility issue then the
design of vorbis-comment is seriously flawed, because no one would
then be able to introduce new tags since they may already be in use
with a different purpose or syntax.

If there is known existing usage of tag names beginning with @,
with text values, then a different indicator should be chosen for
binary tags.  However, even if such tags exist, I would not expect
it to be an issue, as this should have little effect other than on the
way in which these tags, if unrecognized, are displayed and edited
by default.  I would expect that most applications do not do anything
with unrecognized tags and would be completely unaffected.  Any
ability to copy and delete tags should also not be impacted.
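
As a rough sketch of what option (1) would mean for a parser, the
Python below reads a vorbis-comment style list (little-endian 32-bit
lengths, "NAME=value" entries) and leaves the value as raw bytes when
the tag name begins with "@".  The "@" rule is only the example from
this thread, not anything specified, and the input is assumed to start
at the vendor string length (that is, after any codec-specific magic
such as "OpusTags").  The function names are hypothetical.

```python
import struct

def build_comments(vendor, tags):
    """Serialize a vendor string and (name, value-bytes) pairs."""
    vendor_b = vendor.encode("utf-8")
    out = struct.pack("<I", len(vendor_b)) + vendor_b
    out += struct.pack("<I", len(tags))
    for name, value in tags:
        entry = name.encode("ascii") + b"=" + value
        out += struct.pack("<I", len(entry)) + entry
    return out

def parse_comments(buf):
    """Return (vendor, [(name, value)]); "@" tags keep a bytes value."""
    pos = 0
    (vendor_len,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    vendor = buf[pos:pos + vendor_len].decode("utf-8")
    pos += vendor_len
    (count,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    tags = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        name_b, _sep, value = buf[pos:pos + length].partition(b"=")
        pos += length
        name = name_b.decode("ascii")
        if name.startswith("@"):
            tags.append((name, value))                  # binary: do not decode
        else:
            tags.append((name, value.decode("utf-8")))  # ordinary text tag
    return vendor, tags
```

An application that ignores unrecognized tags never reaches the branch
at all, and one that copies or deletes tags can do so on the raw bytes
without caring which kind of value it is.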

> We could also simply use a more compact form of embedding binary data in
> UTF-8, but those have higher implementation complexity (and two encodings to
> choose from is higher still).

Even with the picture data written in binary form, attaching pictures
in Ogg still adds more overhead than in any non-Ogg format that
I tried (results below), primarily due to the Ogg lacing for the
picture data.  However, despite being dead last, the binary format
is close enough to be competitive.

  Overhead of attaching 300 KB PNG image:
    MP3/ID3:                 + 0.04% of image size
    Matroska:                + 0.06% of image size
    FLAC:                    + 0.09% of image size
    M4A/iTunes:              + 0.32% of image size
    Ogg Opus base64 comment: +33.95% of image size
    Ogg Opus binary comment: + 0.46% of image size
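
The two Ogg figures can be roughly reproduced from the container
framing alone.  The Python sketch below is a back-of-the-envelope
estimate, not the measurement procedure used for the table.  It
assumes that "300 KB" means 307200 bytes, that the picture is the only
content of its packet and starts on a fresh page, that each page costs
a 27-byte header plus one lacing byte per segment of up to 255 bytes
(at most 255 segments per page), and that base64 output is padded to a
multiple of 4 characters.  It ignores the picture block header fields
and the tag name.

```python
import math

PAGE_HEADER = 27        # fixed Ogg page header bytes, before the lacing table
MAX_SEGMENTS = 255      # lacing values per page
SEGMENT_SIZE = 255      # payload bytes covered by one full lacing value

def ogg_framing_overhead(packet_len):
    """Lacing bytes plus page headers needed to carry one packet."""
    segments = packet_len // SEGMENT_SIZE + 1   # final value is always < 255
    pages = math.ceil(segments / MAX_SEGMENTS)
    return segments + PAGE_HEADER * pages

def base64_len(n):
    """Length of padded base64 text for n input bytes."""
    return 4 * math.ceil(n / 3)

def overhead_percent(image_len, use_base64):
    """Extra bytes stored beyond the image itself, as a percentage."""
    payload = base64_len(image_len) if use_base64 else image_len
    extra = (payload - image_len) + ogg_framing_overhead(payload)
    return 100.0 * extra / image_len

if __name__ == "__main__":
    size = 300 * 1024
    print(f"binary: +{overhead_percent(size, False):.2f}%")
    print(f"base64: +{overhead_percent(size, True):.2f}%")
```

Under these assumptions the estimate comes out at about +0.44% for
binary and +33.92% for base64, close to the measured +0.46% and
+33.95%; the small remainder is plausibly the bytes the sketch leaves
out.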

If the format is not even competitive with other popular encapsulation
formats in common feature areas at the time of standardization,
what that tells me is that it is not ready for standardization yet,
as that will just lock in the bad decisions.  If we were talking
about a difficult problem that required serious tradeoffs in order
to address, then it might be worth just acknowledging that other
encapsulation formats are better suited to such uses.  However, that
would be unfortunate for something that is easily addressed with
no tradeoffs required.

What is really nice about the Opus codec is that it does not require
any compromise; it is freely available and offers the best quality
available for a wide variety of applications from low-latency
low-bitrate speech to transparent quality multi-channel music, which
previously required several non-free codecs and careful codec
selection in order to choose something that will work well for all
use cases.  A standardized encapsulation format that similarly
required no compromise for all of the common use cases would make
an appropriate match.  An encapsulation format with two to three
orders of magnitude more overhead than anything else for a commonly
used feature that can account for a significant portion of actual
files would be an especially poor choice.

>>   (2) Immediately following the comments, in the same packet, allow a
>> picture count followed by a length and binary data for each picture,
> This is more practical, though we probably want to leave room to define
> additional types of binary data later (e.g., use the FLAC
> METADATA_BLOCK_HEADER and/or BLOCK_TYPE values). Again, I'm interested in
> the issues of code re-use for other formats. Notably, this is hard to make
> compatible with Vorbis because it encodes one non-zero "framing bit" after
> the main comment data (and mandates it be checked).

A block type sounds great.  I was just trying to minimize the changes
by keeping the binary format identical to what Ogg Vorbis passed to the
base64 encoder.
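
For reference, a typed block along the lines suggested could look
like the following Python sketch.  It follows the FLAC
METADATA_BLOCK_HEADER layout as I understand it: one byte holding a
"last block" flag in the top bit and a 7-bit block type, followed by a
24-bit big-endian length.  PICTURE is block type 6 in FLAC.  Placing
such blocks after the comments in an Ogg Opus packet is purely
hypothetical here, and the function names are made up for
illustration.

```python
PICTURE = 6   # FLAC block type for a picture block

def pack_block(block_type, data, last=False):
    """Prefix data with a FLAC-style 4-byte metadata block header."""
    if not 0 <= block_type < 127:
        raise ValueError("block type must fit in 7 bits (127 is invalid)")
    if len(data) >= 1 << 24:
        raise ValueError("block data must fit in a 24-bit length")
    first = (0x80 if last else 0x00) | block_type
    return bytes([first]) + len(data).to_bytes(3, "big") + data

def iter_blocks(buf):
    """Yield (block_type, data) pairs until the block flagged as last."""
    pos = 0
    while pos + 4 <= len(buf):
        first = buf[pos]
        length = int.from_bytes(buf[pos + 1:pos + 4], "big")
        pos += 4
        yield first & 0x7F, buf[pos:pos + length]
        pos += length
        if first & 0x80:    # last-block flag set: stop reading
            break
```

A reader that does not recognize a block type can skip it using the
length alone, which is what leaves room for other kinds of binary data
later.  One consequence of the 24-bit length is that a single block is
limited to just under 16 MiB.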

Although something similar could be done for Ogg Vorbis, this is
probably not the place to discuss that.  Anyone who has implemented
the current Ogg Opus draft knew that they were implementing a draft
version of the standard that is subject to change, but that is not
the case for Ogg Vorbis.

>>   (3) Specify a way to store attached pictures in the file outside of
>> the Opus stream.  This is the way that containers such as Quicktime
>> and Matroska work, but to do that with Ogg would require another
>> stream that contains the pictures, since the Ogg container itself does
>> not provide metadata.
> This can already be done by specifying a relative URL instead of actual
> picture data (by setting the mime type appropriately). This has the
> advantage that a single file can be used for multiple tracks, which provides
> considerably more space savings than avoiding BASE64 encoding as soon as you
> have more than 1 track from the same album. Support for and usage of this
> approach is almost non-existent. Whether that's because the convenience to
> users of having everything embedded in one file is worth the extra space or
> because of the inconvenience to application authors of having to (securely)
> deal with external files, or the combination of both of those effects, I
> don't know.

While that may be useful in some cases, I think there is still a
strong need for the ability to keep everything related to the track
in a single file.  That makes it easier to copy to a phone or other
playback device, easier to make available for download or sharing,
easier to ensure security and privacy, and less likely to break when
files are moved or copied.  To be clear I was referring to some way of
attaching the pictures to the same file.

>>   (4) Do not address this.  Attached pictures must be base64-encoded
>> and written in a comment.  If users complain, recommend use of
>> Matroska or another container that does not have this issue.
> We could also choose not to address this _here_, and instead continue to
> defer to the existing vorbis-comment documentation. That wouldn't block
> publication of this draft, and give us time to get implementation feedback
> from the authors of media players and tools who would have to deal with this
> change. I think this is the approach that I personally prefer.

If this is not addressed prior to the standard being finalized, it
becomes very difficult to add later.  Ogg Vorbis is still much worse
at this than other formats even after so long because it was not
part of the design, and a kludge was needed in order to add it
afterwards without affecting compatibility.  Now there is an
opportunity to address this in a cleaner manner for Ogg Opus
encapsulation.  This opportunity should not be squandered and the
same mistakes repeated once again.

Basil Mohamed Gohar wrote:
> 2.  Isn't there already a specification for Kate that allows usage of
> images in Ogg?  Hackish, I know, but that might lend some guidance on a
> way (good or bad?) to encode binary image data.

It looks like Kate can support timed text and timed images, but
not attached pictures that are not associated with particular times.

> 3.  Does Ogg Skeleton have any mechanism or "spot" for this kind of
> binary image data?

I do not see anything related in Ogg Skeleton.

> I'd be interested to know if there are
> cases where the image data becomes large in comparison to the audio data

I just checked some audio files that I have locally on this computer.
Considering only commercially sold music, the ratio of attached
picture size (in binary form) to compressed audio size was as high
as 26.28%.  (This was on a 48 second MP3 interlude track that is
part of an album, with a single attached picture.)  The largest
attached picture I found was a 1.9 MB PNG, although that was on a
podcast episode, not commercially sold music.

If it is expected that Ogg Opus will be taken seriously as a
compressed audio format then it would do well to avoid adding 34%
to these sizes for no added benefit.

 - Mark