Re: [codec] Comments on draft-ietf-codec-ambisonics-01

Jean-Marc Valin <jmvalin@mozilla.com> Mon, 13 March 2017 22:12 UTC

To: Drew Allen <bitllama@google.com>, Mark Harris <mark.hsj@gmail.com>, Jan Skoglund <jks@google.com>, "codec@ietf.org" <codec@ietf.org>
References: <2f534e1b-b1af-266a-50ef-36f1739d878b@jmvalin.ca> <CAMdZqKGzdndiwpdXsYcHS7+r8Ega5LcQmAvcjiuHTHJgtTUwDg@mail.gmail.com> <CA+KMCSXhS2m4Dkous=4RkOibYWuoi+V_zBrhi1+anm-c+syQ1Q@mail.gmail.com> <CAMdZqKFDtD684HMkoO9bXi-c+g+8R+ay9kPdWSQOtHFDbC3ZLA@mail.gmail.com> <CABQ9DcuD+Et6+rBG-rCnWX-Dk-9STZMeYs-6fQWTk1kyjigRhw@mail.gmail.com> <52f5a570-e9f4-ea49-515e-498f0ed4f1bb@mozilla.com> <CABQ9Dcu0JVuAFvThSOgiBzxa+QOD4-1zpLzX6i-RKG7SRJnkNg@mail.gmail.com> <CABQ9Dct0d4id7wnzyu4sQHU=HZFVjCOXHCTO_F5RHcfE7HdH1Q@mail.gmail.com>
From: Jean-Marc Valin <jmvalin@mozilla.com>
Message-ID: <17622007-e5ce-0a08-67df-98c30a51e5a8@mozilla.com>
Date: Mon, 13 Mar 2017 18:12:47 -0400
In-Reply-To: <CABQ9Dct0d4id7wnzyu4sQHU=HZFVjCOXHCTO_F5RHcfE7HdH1Q@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/codec/6uIoDhMbwbccEtpnemJstNjvQ3Q>
Subject: Re: [codec] Comments on draft-ietf-codec-ambisonics-01

On 13/03/17 06:04 PM, Drew Allen wrote:
> so just to be clear, if a user, say, wants to encode some mixed order
> ambisonics using ch253, how does the decoder know what ambisonic
> channels it has received and know how to render them correctly?

Well, each line of the matrix would correspond to a channel in the
ambisonics channel order. If that channel isn't encoded, then the line
would have only zeros.
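
As an illustration only (a minimal Python sketch, not taken from the
draft): for a horizontal-only signal of order N, the 2N+1 coded
channels sit at ACN indices l*l + l + m with m = +/-l, and every other
one of the (N+1)^2 rows is all zeros. The helper names and the use of
identity weights (each decoded stream carrying exactly one ambisonic
channel) are assumptions made for the example.

```python
def acn(l, m):
    """Ambisonic Channel Number for degree l and index m."""
    return l * l + l + m


def horizontal_acns(order):
    """ACNs of the horizontal-only (m = +/-l) components up to `order`."""
    acns = set()
    for l in range(order + 1):
        acns.add(acn(l, -l))
        acns.add(acn(l, l))
    return sorted(acns)


def mixing_matrix(order, coded_acns):
    """Build a C x K matrix with one row per ACN, C = (order + 1)^2.

    K is the number of decoded channels. Rows whose ACN is not coded
    stay all zeros. Identity weights are an assumption of this sketch.
    """
    c = (order + 1) ** 2
    k = len(coded_acns)
    matrix = [[0.0] * k for _ in range(c)]
    for col, a in enumerate(coded_acns):
        matrix[a][col] = 1.0
    return matrix


order = 2
coded = horizontal_acns(order)
weights = mixing_matrix(order, coded)
# Rows that contain only zeros correspond to channels that were not encoded.
silent_rows = [r for r, row in enumerate(weights) if not any(row)]
print(coded, silent_rows)
```

A decoder reading such a matrix needs no side information: the row
index is the ACN, and an all-zero row means the channel is absent.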

The only way to avoid that situation would be to encode a separate D
value (D <= C) for the number of non-zero channels among the C
ambisonics channels possible. Then you'd store C values in the channel
mapping array (equivalent to a CxD permutation matrix), followed by a
Dx(M+N) weight matrix that would no longer have entire lines of zeros.
The result would be more compact in the case of sparse representation,
but IMO it'd be pretty ugly and prone to implementation errors. And if
you force D==C and don't code the D (which is what I'm proposing), then
the channel mapping permutation automatically becomes redundant.
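
To make the comparison concrete, here is a rough Python sketch (not
from the draft) of both cases. `expand` rebuilds the full matrix from
the compact form, and `permute_rows` shows that with D==C the mapping
only reorders rows of a matrix that could have been stored in that
order to begin with. The function names and the use of `None` as the
"not coded" marker are assumptions made for the example.

```python
def expand(mapping, compact):
    """Rebuild the full C x K matrix from the compact form.

    `mapping` has C entries; each is a row index into the D x K
    `compact` matrix, or None when that output channel is not coded.
    """
    k = len(compact[0])
    return [list(compact[i]) if i is not None else [0.0] * k
            for i in mapping]


def permute_rows(mapping, weights):
    """Apply a channel mapping that is a pure permutation (D == C)."""
    return [list(weights[i]) for i in mapping]


# Sparse case: C = 4 output channels, D = 2 coded rows, K = 3 streams.
compact = [[1.0, 0.5, 0.0],
           [0.0, 0.5, 1.0]]
full = expand([1, None, 0, None], compact)

# D == C case: the permuted result is just another C x K matrix, so the
# encoder could have written it directly and dropped the mapping array.
square = [[1.0, 0.0, 0.0],
          [0.0, 2.0, 0.0],
          [0.0, 0.0, 3.0]]
folded = permute_rows([2, 0, 1], square)
print(full)
print(folded)
```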

Cheers,

	Jean-Marc

> On Mon, Mar 13, 2017 at 3:00 PM Drew Allen <bitllama@google.com> wrote:
> 
>     Got it. In that case, it certainly seems reasonable if I understand
>     correctly. Thanks for clearing that up!
> 
>     On Mon, Mar 13, 2017 at 2:55 PM Jean-Marc Valin <jmvalin@mozilla.com> wrote:
> 
>         On 13/03/17 05:44 PM, Drew Allen wrote:
>         > I think the issue is that the number of total channels rises
>         > quadratically with respect to the ambisonic order: (N + 1)^2. If
>         > a user wants to use just the horizontal channels, it is only
>         > 2 * N + 1. If they wish to code very high-order (>10th order)
>         > horizontal channels, they would be artificially limited by all
>         > the zero channels being produced, no? Or can this be handled
>         > without actually creating all those empty channels?
> 
>         As far as I understand, the current draft already has all the
>         limitations you're describing. The channel mapping array is
>         basically equivalent to a CxC permutation matrix that multiplies
>         the Cx(N+M) weight matrix. The result is still a Cx(N+M) matrix,
>         so using the resulting matrix as weights can still do everything
>         without the need for the channel mapping to do the permutations.
> 
>         Cheers,
> 
>                 Jean-Marc
> 
>         > On Mon, Mar 13, 2017 at 2:41 PM Mark Harris <mark.hsj@gmail.com> wrote:
>         >
>         >     On Mon, Mar 13, 2017 at 10:38 AM, Jan Skoglund <jks@google.com> wrote:
>         >     > Hey,
>         >     >
>         >     > Thanks for your comments
>         >     >
>         >     > On Mon, Mar 13, 2017 at 10:08 AM Mark Harris <mark.hsj@gmail.com> wrote:
>         >     >>
>         >     >> On Fri, Feb 17, 2017 at 1:57 PM, Jean-Marc Valin <jmvalin@jmvalin.ca> wrote:
>         >     >> > 3.2.  Channel Mapping Family 3
>         >     >> >
>         >     >> > I would suggest removing the "Output Channel Numbering"
>         >     >> > field because it is fully equivalent to simply permuting
>         >     >> > lines of the matrix. Also, I believe that the size of the
>         >     >> > matrix was meant to be "32*(N+M)*C bits" rather than
>         >     >> > "32*N*C bits".
>         >     >>
>         >     >> To expand on this a bit, a mapping family maps M+N decoded
>         >     >> channels (corresponding to the actual order of the coupled
>         >     >> and uncoupled channels in the bitstream) to C output
>         >     >> channels (channels with a specific semantic meaning).  The
>         >     >> additional "Output Channel Numbering" table confuses things
>         >     >> by adding an additional mapping from the output channel
>         >     >> numbers to a different set of numbers with actual semantic
>         >     >> meaning, leaving the output channel numbers with no
>         >     >> apparent meaning.
>         >     >>
>         >     >> This does have a potential benefit as a matrix compression
>         >     >> technique, to reduce the size of the matrix when it would
>         >     >> contain rows that are all zero.  However considering that
>         >     >> the matrix occurs only once, and mapping family 2 already
>         >     >> offers a way to compress the matrix, this alone does not
>         >     >> seem worth the complexity of another level of indirection.
>         >     >> If matrix compression is desired it would probably be less
>         >     >> confusing to describe it in those terms and keep the
>         >     >> semantic meaning tied to the output channels.
>         >     >>
>         >     >>
>         >     >> The description of the Output Channel Numbering also does
>         >     >> not specify the intended behavior if the same value appears
>         >     >> in the table multiple times.
>         >     >>
>         >     >> Additionally, section 4.2 describes how to perform a stereo
>         >     >> downmix of mapping family 3, but makes assumptions about
>         >     >> the output channel numbering.  This seems harmful and
>         >     >> likely to promote implementations that make similar
>         >     >> assumptions.  If it is necessary to apply the output
>         >     >> channel numbering described in section 3.2 in order to
>         >     >> implement a correct stereo downmix, then it would be better
>         >     >> to simply use the output channels from section 3 as input
>         >     >> to the downmix, consolidating sections 4.1 and 4.2, rather
>         >     >> than specify new formulas that make assumptions about the
>         >     >> mapping.  That would also greatly simplify section 4.
>         >     >>
>         >     >> Eliminating the Output Channel Numbering table as Jean-Marc
>         >     >> suggests should resolve these concerns.
>         >     >
>         >     >
>         >     > The problem is that once we allow mixed orders there is no
>         >     > unique way for a receiver/decoder to resolve the mapping to
>         >     > ACNs from just a number of total output channels.
>         >
>         >
>         >     In mapping family 2, the channel count (C) is the number of
>         >     channels in the fully periphonic configuration, but it is not
>         >     necessary to encode them all.  The channel mapping table can
>         >     map each ACN to a specific decoded channel or to silence.  For
>         >     mixed order, some of the ACNs will be mapped to silence and
>         >     will not be encoded.
>         >
>         >     In mapping family 3, the matrix can do everything that the
>         >     channel mapping table can do and more.  Why not treat C in the
>         >     same manner, as the number of channels in the fully periphonic
>         >     configuration, even if some are silent?
>         >
>         >      - Mark
>         >
>         >     _______________________________________________
>         >     codec mailing list
>         >     codec@ietf.org
>         >     https://www.ietf.org/mailman/listinfo/codec
>         >
>         >
>         >
>         >
>