Re: [codec] Comments on draft-ietf-codec-ambisonics-01

Jean-Marc Valin <jmvalin@mozilla.com> Mon, 13 March 2017 22:35 UTC

To: Jan Skoglund <jks@google.com>, Jean-Marc Valin <jmvalin@jmvalin.ca>, Drew Allen <bitllama@google.com>, Mark Harris <mark.hsj@gmail.com>, "codec@ietf.org" <codec@ietf.org>
References: <2f534e1b-b1af-266a-50ef-36f1739d878b@jmvalin.ca> <CAMdZqKGzdndiwpdXsYcHS7+r8Ega5LcQmAvcjiuHTHJgtTUwDg@mail.gmail.com> <CA+KMCSXhS2m4Dkous=4RkOibYWuoi+V_zBrhi1+anm-c+syQ1Q@mail.gmail.com> <CAMdZqKFDtD684HMkoO9bXi-c+g+8R+ay9kPdWSQOtHFDbC3ZLA@mail.gmail.com> <CABQ9DcuD+Et6+rBG-rCnWX-Dk-9STZMeYs-6fQWTk1kyjigRhw@mail.gmail.com> <52f5a570-e9f4-ea49-515e-498f0ed4f1bb@mozilla.com> <CABQ9Dcu0JVuAFvThSOgiBzxa+QOD4-1zpLzX6i-RKG7SRJnkNg@mail.gmail.com> <CABQ9Dct0d4id7wnzyu4sQHU=HZFVjCOXHCTO_F5RHcfE7HdH1Q@mail.gmail.com> <17622007-e5ce-0a08-67df-98c30a51e5a8@mozilla.com> <CA+KMCSVPHoav7QzdnvV5_TB2wFidkML0Z2+kp4VpJCU16N1+5Q@mail.gmail.com> <ebae7987-a8c2-befc-0d95-9f2b131916a6@jmvalin.ca> <CA+KMCSVdAG=q_LDhFg7OcKqBbN+6Dbaz-1Bjv6tRji9+d_PktA@mail.gmail.com>
From: Jean-Marc Valin <jmvalin@mozilla.com>
Message-ID: <2480dd55-eaef-9376-c88a-d764d777dfad@mozilla.com>
Date: Mon, 13 Mar 2017 18:35:18 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0
MIME-Version: 1.0
In-Reply-To: <CA+KMCSVdAG=q_LDhFg7OcKqBbN+6Dbaz-1Bjv6tRji9+d_PktA@mail.gmail.com>
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="jcpt5ssN7hfslbFoA4o1DHSCCkpECvFb4"
Archived-At: <https://mailarchive.ietf.org/arch/msg/codec/PHjXwF3yYdjjOwMOJEbk9Qa8zS0>
Subject: Re: [codec] Comments on draft-ietf-codec-ambisonics-01

Well, for family 2 it makes sense to have just a mapping table (1-D
array of size C) and no matrix. Note that in previous comments, I was
using the term "permutation matrix" only in the sense that the 1-D array
is mathematically equivalent to a (sparse) permutation matrix.
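
A minimal sketch of that equivalence, under assumptions not stated in
this thread: the function names are hypothetical, and index 255 is
assumed to mark a silent output channel, following the channel mapping
convention of RFC 7845. Routing through the 1-D table and multiplying by
the expanded sparse matrix give the same output sample vector.

```python
SILENT = 255  # assumed "silence" index (RFC 7845 convention), not from the draft


def apply_mapping_table(mapping, decoded):
    """Route decoded channels to C outputs using a 1-D mapping table."""
    return [0.0 if idx == SILENT else decoded[idx] for idx in mapping]


def mapping_to_matrix(mapping, num_decoded):
    """Expand the table into the equivalent sparse C x num_decoded matrix.

    A silent entry becomes an all-zero row, because 255 never matches a column.
    """
    return [[1.0 if idx == col else 0.0 for col in range(num_decoded)]
            for idx in mapping]


def apply_matrix(matrix, decoded):
    """Multiply a matrix by one sample vector of decoded channels."""
    return [sum(w * x for w, x in zip(row, decoded)) for row in matrix]


mapping = [2, 0, SILENT, 1]   # C = 4 output channels, 3 decoded channels
decoded = [0.5, -1.0, 0.25]   # one sample per decoded channel

via_table = apply_mapping_table(mapping, decoded)
via_matrix = apply_matrix(mapping_to_matrix(mapping, len(decoded)), decoded)
```

Here via_table and via_matrix hold the same values; the table is simply
the compact form of the matrix.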

For family 3, if you already have a Cx(N+M) weight matrix, then the
mapping table becomes completely redundant.
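
As an illustration of the redundancy (a sketch only; the names and toy
numbers below are hypothetical and not taken from the draft): reordering
the outputs through a mapping table after applying the weights gives the
same result as permuting the rows of the weight matrix once, up front.

```python
def apply_weights(weights, decoded):
    """Compute C output samples from N+M decoded samples."""
    return [sum(w * x for w, x in zip(row, decoded)) for row in weights]


def fold_mapping_into_weights(mapping, weights):
    """Return weights whose row i is row mapping[i] of the original matrix."""
    return [list(weights[src]) for src in mapping]


weights = [[1.0, 0.0, 0.5],
           [0.0, 2.0, 0.0],
           [0.25, 0.25, 0.0]]   # toy C x (N+M) matrix with C = 3, N+M = 3
mapping = [2, 0, 1]             # toy output permutation
decoded = [4.0, 2.0, 8.0]       # one sample per decoded channel

# Two steps: apply the weights, then reorder the outputs via the table.
weighted = apply_weights(weights, decoded)
two_step = [weighted[src] for src in mapping]

# One step: the permutation is already folded into the weight matrix.
one_step = apply_weights(fold_mapping_into_weights(mapping, weights), decoded)
```

Both paths produce the same vector, so a stream that already carries
the weight matrix gains nothing from also carrying the table.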

	Jean-Marc

On 13/03/17 06:25 PM, Jan Skoglund wrote:
> Ha, sorry, wrong name! I meant avoiding permutation matrices.
> 
> Jan
> 
> On Mon, Mar 13, 2017 at 3:19 PM Jean-Marc Valin <jmvalin@jmvalin.ca> wrote:
> 
>     On 13/03/17 06:17 PM, Jan Skoglund wrote:
>     > Our idea was to avoid a mapping table, potentially sparse, completely
>     > for family 2, and replacing it with a channel numbering list for
>     > family 3.
> 
>     Can you explain what you mean here by "avoid a mapping table" for family
>     2 and "channel numbering list" for family 3?
> 
>             Jean-Marc
> 
>     > Cheers,
>     > Jan
>     >
>     > On Mon, Mar 13, 2017 at 3:12 PM Jean-Marc Valin
>     > <jmvalin@mozilla.com> wrote:
>     >
>     >     On 13/03/17 06:04 PM, Drew Allen wrote:
>     >     > so just to be clear, if a user, say, wants to encode some mixed order
>     >     > ambisonics using ch253, how does the decoder know what ambisonic
>     >     > channels it has received and know how to render them correctly?
>     >
>     >     Well, each line of the matrix would correspond to a channel in the
>     >     ambisonics channel order. If that channel isn't encoded, then the line
>     >     would have only zeros.
>     >
>     >     The only way to avoid that situation would be to encode a separate D
>     >     value (D <= C) for the number of non-zero channels among the C
>     >     ambisonics channels possible. Then you'd store C values in the channel
>     >     mapping array (equivalent to a CxD permutation matrix), followed by a
>     >     Dx(M+N) weight matrix that would no longer have entire lines of zeros.
>     >     The result would be more compact in the case of sparse representation,
>     >     but IMO it'd be pretty ugly and prone to implementation errors. And if
>     >     you force D==C and don't code the D (which is what I'm proposing), then
>     >     the channel mapping permutation automatically becomes redundant.
>     >
>     >     Cheers,
>     >
>     >             Jean-Marc
>     >
>     >     > On Mon, Mar 13, 2017 at 3:00 PM Drew Allen <bitllama@google.com> wrote:
>     >     >
>     >     >     Got it. In that case, it certainly seems reasonable if I understand
>     >     >     correctly. Thanks for clearing that up!
>     >     >
>     >     >     On Mon, Mar 13, 2017 at 2:55 PM Jean-Marc Valin
>     >     >     <jmvalin@mozilla.com> wrote:
>     >     >
>     >     >         On 13/03/17 05:44 PM, Drew Allen wrote:
>     >     >         > I think the issue is that the number of total channels rises
>     >     >         > quadratically with respect to the ambisonic order, (N + 1)^2. If a user
>     >     >         > wants to use just the horizontal channels, it is only 2 * N + 1. If they
>     >     >         > wish to code very high-order (>10th order) horizontal channels, they
>     >     >         > would be artificially limited by all the zero channels being produced,
>     >     >         > no? Or can this be handled without actually creating all those
>     >     >         > empty channels?
>     >     >
>     >     >         As far as I understand, the current draft already has all the
>     >     >         limitations you're describing. The channel mapping array is basically
>     >     >         equivalent to a CxC permutation matrix that multiplies the Cx(N+M)
>     >     >         weight matrix. The result is still a Cx(N+M) matrix, so using the
>     >     >         resulting matrix as weights can still do everything without the need for
>     >     >         the channel mapping to do the permutations.
>     >     >
>     >     >         Cheers,
>     >     >
>     >     >                 Jean-Marc
>     >     >
>     >     >         > On Mon, Mar 13, 2017 at 2:41 PM Mark Harris <mark.hsj@gmail.com> wrote:
>     >     >         >
>     >     >         >     On Mon, Mar 13, 2017 at 10:38 AM, Jan Skoglund <jks@google.com> wrote:
>     >     >         >     > Hey,
>     >     >         >     >
>     >     >         >     > Thanks for your comments
>     >     >         >     >
>     >     >         >     > On Mon, Mar 13, 2017 at 10:08 AM Mark Harris <mark.hsj@gmail.com> wrote:
>     >     >         >     >>
>     >     >         >     >> On Fri, Feb 17, 2017 at 1:57 PM, Jean-Marc Valin <jmvalin@jmvalin.ca> wrote:
>     >     >         >     >> > 3.2.  Channel Mapping Family 3
>     >     >         >     >> >
>     >     >         >     >> > I would suggest removing the "Output Channel Numbering" field because it
>     >     >         >     >> > is fully equivalent to simply permuting lines of the matrix. Also, I
>     >     >         >     >> > believe that the size of the matrix was meant to be "32*(N+M)*C bits"
>     >     >         >     >> > rather than "32*N*C bits".
>     >     >         >     >>
>     >     >         >     >> To expand on this a bit, a mapping family maps M+N decoded channels
>     >     >         >     >> (corresponding to the actual order of the coupled and uncoupled
>     >     >         >     >> channels in the bitstream) to C output channels (channels with a
>     >     >         >     >> specific semantic meaning).  The additional "Output Channel Numbering"
>     >     >         >     >> table confuses things by adding an additional mapping from the output
>     >     >         >     >> channel numbers to a different set of numbers with actual semantic
>     >     >         >     >> meaning, leaving the output channel numbers with no apparent meaning.
>     >     >         >     >>
>     >     >         >     >> This does have a potential benefit as a matrix compression technique,
>     >     >         >     >> to reduce the size of the matrix when it would contain rows that are
>     >     >         >     >> all zero.  However considering that the matrix occurs only once, and
>     >     >         >     >> mapping family 2 already offers a way to compress the matrix, this
>     >     >         >     >> alone does not seem worth the complexity of another level of
>     >     >         >     >> indirection.  If matrix compression is desired it would probably be
>     >     >         >     >> less confusing to describe it in those terms and keep the semantic
>     >     >         >     >> meaning tied to the output channels.
>     >     >         >     >>
>     >     >         >     >>
>     >     >         >     >> The description of the Output Channel Numbering also does not specify
>     >     >         >     >> the intended behavior if the same value appears in the table multiple
>     >     >         >     >> times.
>     >     >         >     >>
>     >     >         >     >> Additionally, section 4.2 describes how to perform a stereo downmix of
>     >     >         >     >> mapping family 3, but makes assumptions about the output channel
>     >     >         >     >> numbering.  This seems harmful and likely to promote implementations
>     >     >         >     >> that make similar assumptions.  If it is necessary to apply the output
>     >     >         >     >> channel numbering described in section 3.2 in order to implement a
>     >     >         >     >> correct stereo downmix, then it would be better to simply use the
>     >     >         >     >> output channels from section 3 as input to the downmix, consolidating
>     >     >         >     >> sections 4.1 and 4.2, rather than specify new formulas that make
>     >     >         >     >> assumptions about the mapping.  That would also greatly simplify
>     >     >         >     >> section 4.
>     >     >         >     >>
>     >     >         >     >> Eliminating the Output Channel Numbering table as Jean-Marc suggests
>     >     >         >     >> should resolve these concerns.
>     >     >         >     >
>     >     >         >     >
>     >     >         >     > The problem is that once we allow mixed orders there is no unique
>     >     >         >     > way for a receiver/decoder to resolve the mapping to ACNs from just
>     >     >         >     > a number of total output channels.
>     >     >         >
>     >     >         >
>     >     >         >     In mapping family 2, the channel count (C) is the number of channels
>     >     >         >     in the fully periphonic configuration, but it is not necessary to
>     >     >         >     encode them all.  The channel mapping table can map each ACN to a
>     >     >         >     specific decoded channel or to silence.  For mixed order, some of the
>     >     >         >     ACNs will be mapped to silence and will not be encoded.
>     >     >         >
>     >     >         >     In mapping family 3, the matrix can do everything that the channel
>     >     >         >     mapping table can do and more.  Why not treat C in the same manner, as
>     >     >         >     the number of channels in the fully periphonic configuration, even if
>     >     >         >     some are silent?
>     >     >         >
>     >     >         >      - Mark
>     >     >         >
>     >     >         >     _______________________________________________
>     >     >         >     codec mailing list
>     >     >         >     codec@ietf.org
>     >     >         >     https://www.ietf.org/mailman/listinfo/codec
>     >     >         >
>     >     >         >
>     >     >         >
>     >     >         >
>     >     >
>     >
>     >
>     >
>     >
>