Re: [codec] Ambisonics in an Ogg Opus Container

Michael Graczyk <mgraczyk@google.com> Sat, 28 May 2016 00:16 UTC

In-Reply-To: <574887A3.30003@xiph.org>
References: <CABcu6-jN2gC0FRm5Vu6CmT5=yMdJr1WSaJJ_-FOXffw8bWjc_g@mail.gmail.com> <574887A3.30003@xiph.org>
From: Michael Graczyk <mgraczyk@google.com>
Date: Fri, 27 May 2016 17:16:39 -0700
Message-ID: <CABcu6-h5XXi3QnsPvGw=MREamf-O+x3RHGDeeN1NRo0EUJ=CoA@mail.gmail.com>
To: "Timothy B. Terriberry" <tterribe@xiph.org>, jb@videolan.org, jmvalin@mozilla.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <http://mailarchive.ietf.org/arch/msg/codec/AmTE1a1OkVokjHmJqw5yH5Ej09k>
Cc: codec@ietf.org
Subject: Re: [codec] Ambisonics in an Ogg Opus Container

Thanks Jean-Baptiste, Tim, and Jean-Marc for your questions and
comments. Also thanks for all the support! I've replied inline.

On Fri, May 27, 2016 at 2:02 AM, Jean-Baptiste Kempf <jb@videolan.org> wrote:
> How is this mapping related to the mp4 one from google spatial-media:
> https://github.com/google/spatial-media/blob/master/docs/spatial-audio-rfc.md
The RFC that you linked to describes a metadata box for spatial audio
(SA3D). This metadata is specific to mp4. We do not currently plan to
encapsulate Opus in mp4, but may be interested in the future if the
encapsulation described here becomes more widely supported:
wiki.xiph.org/Mp4Opus.

Since we have no plans to encapsulate Opus in mp4, I have been
operating under the assumption that all ambisonic metadata should be
contained in the Ogg stream itself (in this case implicitly by
defining only one channel ordering and normalization).



On Fri, May 27, 2016 at 10:45 AM, Timothy B. Terriberry
<tterribe@xiph.org> wrote:
> Ogg is a general purpose container, supporting audio, video, subtitles, etc. (though its most common use in current applications is for audio).

I reworded this to say
"Ogg is a general purpose container, supporting audio, video, and other media.
It can be used to encapsulate audio streams coded using the Opus codec"

> I think it may be useful to say that this channel mapping number will also likely impact other formats which use the same channel mapping families, e.g., Matroska, MP4, MPEG TS. Just adding a sentence to the introduction along the lines of, "This mapping can also be used in other contexts which make use of the channel mappings defined by the Opus Channel Mapping Families registry."

Great, added.
BTW, when either Matroska or WebM is used to encapsulate Opus, the
entire Ogg container is wrapped, including the OpusHead header.

> For n = 0, (1 + n)^2 == 1, but that's not on your explicit list. Was the intention here ...

I would like to include mono to minimize the number of special cases
that decoders have to deal with. For example, if a virtual reality
playback environment expects to always receive ambisonics with mapping
family 2, the encoder could simply send single channel, n=0 ambisonics
when spatial audio is unavailable.

I corrected the explicit list: "Explicitly 1, 4...".
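
To illustrate the channel-count rule discussed above, here is a minimal
sketch (not taken from the draft) that checks whether a channel count has
the form (1 + n)^2 and recovers the ambisonic order n. The function names
and the 255-channel ceiling (the largest value an 8-bit channel count can
hold) are assumptions made for this example.

```python
import math


def ambisonic_order(channel_count):
    """Return the order n if channel_count == (1 + n)^2, otherwise None."""
    if channel_count < 1:
        return None
    root = math.isqrt(channel_count)
    if root * root != channel_count:
        return None  # not a perfect square, so not a full-order stream
    return root - 1


def valid_channel_counts(max_channels=255):
    """List every (1 + n)^2 that fits within max_channels (assumed limit)."""
    return [(1 + n) ** 2 for n in range(math.isqrt(max_channels))]


if __name__ == "__main__":
    print(valid_channel_counts()[:4])   # the start of the explicit list
    print(ambisonic_order(1))           # mono is order 0
```

With mono included, a player can run the same check on every stream and
treat n = 0 as just another order instead of a special case.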

> I've read the [ambix] reference before, and it still took me a while to figure out what this normalization coefficient is actually referring to. ... The coefficient in general seems highly dependent on how you express the spherical harmonic basis functions...

Yes, unfortunately the interpretation of "normalization" depends on
how you define the spherical harmonic functions. I expect that Opus
encoders and decoders would not need to apply this normalization, but
the normalization is part of the semantic meaning of the audio stream.
For example, if the decoder is passing its decoded stream along to an
ambisonic player which expects PCM with SN3D normalization, then the
decoder does not need to deal with the exact definition of "SN3D". On
the other hand, if the ambisonic player does not support SN3D
playback, an intermediate scaling step would be necessary. I do not
want to require the Opus decoder to perform this scaling, but I want
the Ogg stream to carry enough metadata (implicitly in this case) to
make it clear when and what scaling must be done.
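
As a rough illustration of the intermediate scaling step described above,
the sketch below converts SN3D-normalized samples to N3D for a player that
expects N3D. It assumes ACN channel ordering (as in [ambix]), where channel
index acn belongs to degree l = floor(sqrt(acn)), and uses the standard
relation N3D = SN3D * sqrt(2l + 1). The function names are hypothetical, and
this step would live in the player or glue code, not in the Opus decoder.

```python
import math


def acn_degree(acn):
    """Degree l of the spherical harmonic at ACN index acn (assumes ACN order)."""
    return math.isqrt(acn)


def sn3d_to_n3d_gains(channel_count):
    """Per-channel gains that turn SN3D-normalized samples into N3D."""
    return [math.sqrt(2 * acn_degree(acn) + 1) for acn in range(channel_count)]


def scale_frame(frame, gains):
    """Apply one gain per channel to a single multichannel sample frame."""
    return [sample * gain for sample, gain in zip(frame, gains)]


if __name__ == "__main__":
    gains = sn3d_to_n3d_gains(4)  # first-order stream: W plus three degree-1 channels
    print(scale_frame([0.5, 0.1, -0.2, 0.3], gains))
```

The degree-0 channel has a gain of 1, so a 1-channel stream passes through
this step unchanged.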

> It may be better to simply refer to [ambix] Section 2.1, equation (1) for the definition of the spherical harmonics basis functions (the explicit section/equation makes it easy to find), and leave out this equation entirely.

I agree, that would be clearer.

> Conversely, it would be reasonable to spell out the full equation for those basis functions in this document (to make it more self-contained). But this kind of sits in-between, where it looks like it's trying to say something that you can understand on its own, but in fact it's impossible to interpret outside of the context of that reference (which already says it).

I agree. I do not think it would be worthwhile to include a
self-contained description of ambisonics, because the interpretation
of the encoded data is fairly complicated and depends on the playback
environment.  I removed this equation and referenced the section in
[ambix] instead.

> I doubt this will really cause confusion, but you don't define W and Y. You might want to say they correspond to the first two ambisonics channels, explicitly.

I added a definition in the paragraph above.

> Does this actually update RFC7845?...

Great, I removed the "updates" attribute from the rfc tag.



On Fri, May 27, 2016 at 11:00 AM, Jean-Marc Valin <jmvalin@mozilla.com> wrote:
> I personally don't think 1-channel ambisonics should be allowed
> (unless there's a good reason I missed). We might want to say that
> encoders MUST NOT create them, but decoders SHOULD handle them as if
> they were regular mono files.

As I described above, I believe there is a good reason to support
1-channel ambisonics. The playback of ambisonics from an Ogg stream
can be simplified if the playback system always receives ambisonics
with channel mapping family 2. With 1-channel ambisonics, this
simplification exists even when the streamed data originally had no
spatial content.

> I agree, I'm pretty sure we don't want the "Updates: 7845" line.
Great, removed.


Question for anyone: Do I submit an -01 draft with the changes
mentioned above, or do I send out an intermediate draft on this list?

-- 

Thanks,
Michael Graczyk