[codec] draft-ietf-codec-oggopus: attached pictures

Mark Harris <mark.hsj@gmail.com> Mon, 25 August 2014 07:01 UTC

Opus does a fantastic job at its primary goal of compressing audio,
including speech, offering a lower bitrate and latency than other
similar codecs.  This makes it an obvious choice for RTP and many
other applications.  However, when it comes to storing music,
podcasts, and audiobooks in a file, there are additional requirements
beyond just a great codec, such as good support for attached pictures
and other metadata.

Attached pictures (also known as album art or cover art) are images
(usually in JPEG or PNG format) that are related to the audio, and are
often shown by players while the audio is playing or to help the user
choose the file to be played.  In most audio formats, pictures can be
categorized into a number of picture types, with multiple pictures
allowed for most types.
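For reference, here is a minimal Python sketch of the binary picture layout that FLAC uses and that the Ogg Vorbis cover-art convention reuses: a 32-bit big-endian picture type (3 is the front cover), a length-prefixed MIME type, a length-prefixed UTF-8 description, width, height, colour depth, number of indexed colours, and finally the length-prefixed image data.  The function and constant names are illustrative only and are not taken from any existing tool.

```python
import struct

# Picture type codes shared by FLAC and ID3v2 APIC (a small subset).
PICTURE_TYPE_OTHER = 0
PICTURE_TYPE_FRONT_COVER = 3
PICTURE_TYPE_BACK_COVER = 4


def build_picture_block(data: bytes, mime: str,
                        picture_type: int = PICTURE_TYPE_FRONT_COVER,
                        description: str = "",
                        width: int = 0, height: int = 0,
                        depth: int = 0, colors: int = 0) -> bytes:
    """Pack one picture in the FLAC picture-block layout (hypothetical helper).

    Every integer field is 32-bit big-endian; the MIME type is ASCII and
    the description is UTF-8, each preceded by its byte length.
    """
    mime_b = mime.encode("ascii")
    desc_b = description.encode("utf-8")
    return (struct.pack(">II", picture_type, len(mime_b)) + mime_b
            + struct.pack(">I", len(desc_b)) + desc_b
            + struct.pack(">IIIII", width, height, depth, colors, len(data))
            + data)


block = build_picture_block(b"\xff\xd8\xff\xe0", "image/jpeg")
print(len(block))  # 32 bytes of fixed fields + 10 (MIME) + 4 (data) = 46
```

Apart from the MIME type and description, the framing costs a fixed 32 bytes per picture, which is why binary storage comes in at "just over 100%" of the picture size.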

Container formats like Matroska have their own way of storing
attached pictures and other metadata in the container; Ogg, however,
does not provide this and requires any such data to be encoded within
the packets of a media stream.

Attached pictures are not mentioned at all in the current Ogg Opus
draft (draft-ietf-codec-oggopus-04), which simply defers to the
vorbis-comment specification for the format of metadata, with a few
specific differences such as new gain tags.
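As a rough illustration of what deferring to the vorbis-comment format means in practice, the Ogg Opus comment header is an "OpusTags" packet holding a length-prefixed vendor string and a counted list of length-prefixed "NAME=value" strings, with all lengths 32-bit little-endian.  The Python sketch below builds and parses such a packet.  The function names are hypothetical, and the parser simply returns any bytes left after the comment list instead of interpreting them.

```python
import struct
from typing import List, Tuple


def build_opus_tags(vendor: str, comments: List[str]) -> bytes:
    """Build an OpusTags packet from a vendor string and NAME=value comments."""
    def lp(b: bytes) -> bytes:
        # 32-bit little-endian length prefix followed by the bytes themselves
        return struct.pack("<I", len(b)) + b
    body = b"".join(lp(c.encode("utf-8")) for c in comments)
    return (b"OpusTags" + lp(vendor.encode("utf-8"))
            + struct.pack("<I", len(comments)) + body)


def parse_opus_tags(packet: bytes) -> Tuple[str, List[str], bytes]:
    """Return (vendor, comments, trailing bytes after the comment list)."""
    if packet[:8] != b"OpusTags":
        raise ValueError("not an OpusTags packet")
    pos = 8

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(packet):
            raise ValueError("truncated OpusTags packet")
        chunk = packet[pos:pos + n]
        pos += n
        return chunk

    def take_u32() -> int:
        return struct.unpack("<I", take(4))[0]

    vendor = take(take_u32()).decode("utf-8")
    count = take_u32()
    comments = [take(take_u32()).decode("utf-8") for _ in range(count)]
    return vendor, comments, packet[pos:]


pkt = build_opus_tags("example encoder", ["TITLE=Song", "ARTIST=Someone"])
print(parse_opus_tags(pkt))
```

Every value here is decoded as UTF-8 text, which is exactly why binary pictures currently have to be turned into text before they fit.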

When Ogg Vorbis was originally specified, it provided a way of storing
a list of text comment tags for things like the title, artist,
license, and copyright of the work, but no way to attach pictures or
binary data.  Because other formats allow attached pictures and users
wanted to add them to Ogg Vorbis files as well, a method of encoding
pictures in a text comment tag using base64 encoding was devised,
which is documented at https://wiki.xiph.org/VorbisComment#Cover_art .
This allowed pictures to be attached to Ogg Vorbis files without
impacting compatibility, but the downside is that attaching pictures
to an Ogg Vorbis file adds about 134% of the picture size to the size
of the file, compared to just over 100% of the picture size for
competing formats including MP3/ID3, iTunes/QuickTime MPEG-4, FLAC,
and Matroska-based formats.  Also, because comments are always at the
beginning of the file and can appear in any order, it is not easy to
skip the pictures and get to the audio if there are multiple large
pictures, at least when there are some other comments that may be of
interest.
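The 134% figure follows from base64 arithmetic: every 3 bytes of the binary picture block become 4 characters of text.  The Python sketch below wraps a picture the way the wiki convention does (FLAC picture block, then base64, then a "METADATA_BLOCK_PICTURE=" comment) and compares sizes.  The helper name is hypothetical, a zero-filled 300 KB buffer stands in for a real JPEG, and the dimension fields are simply zero-filled as a simplification.

```python
import base64
import struct


def picture_comment(data: bytes, mime: str = "image/jpeg",
                    picture_type: int = 3) -> bytes:
    """Encode a picture as a Vorbis-style METADATA_BLOCK_PICTURE comment.

    The picture is first packed in the FLAC picture-block layout (empty
    description, zero-filled dimension fields) and then base64-encoded.
    """
    mime_b = mime.encode("ascii")
    block = (struct.pack(">II", picture_type, len(mime_b)) + mime_b
             + struct.pack(">I", 0)                          # description length
             + struct.pack(">IIIII", 0, 0, 0, 0, len(data))  # w, h, depth, colors, size
             + data)
    return b"METADATA_BLOCK_PICTURE=" + base64.b64encode(block)


picture = bytes(300_000)        # stand-in for a 300 KB JPEG
comment = picture_comment(picture)
print(len(comment), f"{len(comment) / len(picture):.1%}")
# prints: 400079 133.4%
```

Ogg page framing (a 27-byte header plus up to 255 lacing bytes per 65,025-byte page body) adds roughly another 0.4%, which brings the total to about 134%.  It also shows how a single 300 KB picture turns into roughly 400 KB of text at the start of the file.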

Unfortunately, the current Ogg Opus draft has the same issue: it has no
way of attaching pictures as binary data, thus requiring the use of
the same base64-encoded comment workaround that was used for Ogg
Vorbis.  This is unbefitting a compressed file format, whose primary
purpose is reducing file size, and it would be unfortunate if Ogg Opus
were standardized with the same, easily addressed limitation.
It is difficult to take a purportedly compressed file format seriously
when the first 400 KB or so of the file is a long base64 string of
text that takes significantly more space than the corresponding data
that was provided to the encoder.

I see a few different ways that this could be addressed:

 (1) Specify that comments may contain either UTF-8 or binary data,
according to some rule.  For example, if the name of the tag begins
with "@" then its value is binary data and not intended to be
displayed as text, and is otherwise UTF-8.  There is no technical
issue with storing binary data since each comment is already preceded
by a length field, but a tag name beginning with "@" would be a signal
to applications that the value should not be assumed to be UTF-8 or
displayed as text.  Pictures could then be attached using a newly
defined tag such as "@PICTURE", with a value that is the same as the
binary format used in FLAC (and the same as Ogg Vorbis before base64
encoding).

     Alternatively, ordinary tag names could be used, with the value
itself indicating that it is non-UTF-8 binary data, for example by
including a null character prefix before the binary data, or a 0xff
byte prefix (which is not valid UTF-8), or a null byte delimiter in
place of "=".  However, using distinct tag names seems preferable.

 (2) Immediately following the comments, in the same packet, allow a
picture count followed by a length and binary data for each picture,
similar to the way that comments are stored but separated from them.
Separating pictures from comments allows a more appropriate format to
be used to encode them, makes it less likely that tools will treat the
data as text, and also makes it easier to skip all attached pictures
entirely when they are not needed.  Tools based on older versions of
the draft would see anything after the original comment section as
comment padding and ignore any attached pictures.  The content of each
picture would be the same as it is for FLAC, and the same as Ogg
Vorbis without the added base64 encoding or "METADATA_BLOCK_PICTURE="
prefix.

 (3) Specify a way to store attached pictures in the file outside of
the Opus stream.  This is the way that containers such as QuickTime
and Matroska work, but to do that with Ogg would require another
stream that contains the pictures, since the Ogg container itself does
not provide metadata.

 (4) Do not address this.  Attached pictures must be base64-encoded
and written in a comment.  If users complain, recommend use of
Matroska or another container that does not have this issue.
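To make options (1) and (2) a little more concrete, here is a rough Python sketch of a reader that understands both.  Beyond what is described above, it assumes that the picture count and picture lengths in option (2) are 32-bit little-endian like the comment lengths.  The function name is hypothetical and the code is not taken from the linked opus-tools branches.

```python
import struct
from typing import List, Tuple


def parse_tags_with_pictures(packet: bytes):
    """Read an OpusTags packet under the proposed options (1) and (2).

    Returns (vendor, text_tags, binary_tags, pictures). Hypothetical format:
      - option (1): a tag whose name starts with "@" holds a binary value;
      - option (2): after the comment list, a picture count followed by
        length-prefixed pictures (little-endian, like comment lengths).
    """
    if packet[:8] != b"OpusTags":
        raise ValueError("not an OpusTags packet")
    pos = 8

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(packet):
            raise ValueError("truncated packet")
        chunk = packet[pos:pos + n]
        pos += n
        return chunk

    def take_u32() -> int:
        return struct.unpack("<I", take(4))[0]

    vendor = take(take_u32()).decode("utf-8")
    text_tags: List[Tuple[str, str]] = []
    binary_tags: List[Tuple[str, bytes]] = []
    for _ in range(take_u32()):
        # Split on the first "=" only; a binary value may contain "=" bytes.
        name, sep, value = take(take_u32()).partition(b"=")
        if not sep:
            raise ValueError("comment without '='")
        if name.startswith(b"@"):                       # option (1)
            binary_tags.append((name.decode("ascii"), value))
        else:
            text_tags.append((name.decode("ascii"), value.decode("utf-8")))

    pictures: List[bytes] = []
    if len(packet) - pos >= 4:                          # option (2)
        for _ in range(take_u32()):
            pictures.append(take(take_u32()))
    return vendor, text_tags, binary_tags, pictures
```

In option (1) only the tag name is inspected, so binary values pass through untouched and are never decoded as UTF-8.  This sketch treats any four or more trailing bytes as a picture section; a real specification would need some way to tell that section apart from ordinary padding written by existing tools.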

Any thoughts on these or other ideas?

I have created a proof-of-concept implementation of (1) and (2) based
on opus-tools, at:
https://github.com/mark4o/opus-tools/tree/binary_comment (1),
https://github.com/mark4o/opus-tools/tree/pic_section (2).

 - Mark