Re: [codec] Comments on draft-ietf-codec-ambisonics-01

Jan Skoglund <jks@google.com> Mon, 13 March 2017 22:17 UTC

References: <2f534e1b-b1af-266a-50ef-36f1739d878b@jmvalin.ca> <CAMdZqKGzdndiwpdXsYcHS7+r8Ega5LcQmAvcjiuHTHJgtTUwDg@mail.gmail.com> <CA+KMCSXhS2m4Dkous=4RkOibYWuoi+V_zBrhi1+anm-c+syQ1Q@mail.gmail.com> <CAMdZqKFDtD684HMkoO9bXi-c+g+8R+ay9kPdWSQOtHFDbC3ZLA@mail.gmail.com> <CABQ9DcuD+Et6+rBG-rCnWX-Dk-9STZMeYs-6fQWTk1kyjigRhw@mail.gmail.com> <52f5a570-e9f4-ea49-515e-498f0ed4f1bb@mozilla.com> <CABQ9Dcu0JVuAFvThSOgiBzxa+QOD4-1zpLzX6i-RKG7SRJnkNg@mail.gmail.com> <CABQ9Dct0d4id7wnzyu4sQHU=HZFVjCOXHCTO_F5RHcfE7HdH1Q@mail.gmail.com> <17622007-e5ce-0a08-67df-98c30a51e5a8@mozilla.com>
In-Reply-To: <17622007-e5ce-0a08-67df-98c30a51e5a8@mozilla.com>
From: Jan Skoglund <jks@google.com>
Date: Mon, 13 Mar 2017 22:17:18 +0000
Message-ID: <CA+KMCSVPHoav7QzdnvV5_TB2wFidkML0Z2+kp4VpJCU16N1+5Q@mail.gmail.com>
To: Jean-Marc Valin <jmvalin@mozilla.com>, Drew Allen <bitllama@google.com>, Mark Harris <mark.hsj@gmail.com>, "codec@ietf.org" <codec@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/codec/_Adbpb6YGOxdzXqvs_1S0co1cSc>
Subject: Re: [codec] Comments on draft-ietf-codec-ambisonics-01

Hey,

Our idea was to avoid a potentially sparse mapping table altogether for
family 2, and to replace it with a channel numbering list for family 3.
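
As a rough illustration of what a receiver could do with such a channel numbering list, the sketch below assumes the list simply carries the ACN (Ambisonic Channel Number) of each transmitted channel and uses the standard relation ACN = n^2 + n + m to recover the order n and degree m. The function names are hypothetical and not taken from the draft.

```python
import math

def acn_to_order_degree(acn):
    """Return (order n, degree m) for an Ambisonic Channel Number."""
    n = math.isqrt(acn)      # order: largest n with n*n <= acn
    m = acn - n * (n + 1)    # degree, in the range -n..n
    return n, m

def describe_channel_list(acn_list):
    """Map a (hypothetical) channel numbering list to (order, degree) pairs."""
    return [acn_to_order_degree(acn) for acn in acn_list]

# Mixed-order example: full first order (ACN 0..3) plus the two
# second-order horizontal components (ACN 4 and 8, where |m| == n).
mixed = [0, 1, 2, 3, 4, 8]
print(describe_channel_list(mixed))
```

With a list like this the decoder knows exactly which components are present, without having to infer them from the total channel count.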

Cheers,
Jan

On Mon, Mar 13, 2017 at 3:12 PM Jean-Marc Valin <jmvalin@mozilla.com> wrote:

> On 13/03/17 06:04 PM, Drew Allen wrote:
> > so just to be clear, if a user, say, wants to encode some mixed order
> > ambisonics using ch253, how does the decoder know what ambisonic
> > channels it has received and know how to render them correctly?
>
> Well, each line of the matrix would correspond to a channel in the
> ambisonics channel order. If that channel isn't encoded, then the line
> would have only zeros.
>
> The only way to avoid that situation would be to encode a separate D
> value (D <= C) for the number of non-zero channels among the C
> ambisonics channels possible. Then you'd store C values in the channel
> mapping array (equivalent to a CxD permutation matrix), followed by a
> Dx(M+N) weight matrix that would no longer have entire lines of zeros.
> The result would be more compact in the case of sparse representation,
> but IMO it'd be pretty ugly and prone to implementation errors. And if
> you force D==C and don't code the D (which is what I'm proposing), then
> the channel mapping permutation automatically becomes redundant.
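
The two alternatives described above can be sketched as follows. This is only an illustration under stated assumptions: the `ABSENT` marker and the function names are hypothetical, and the matrices are plain nested lists rather than the coded format in the draft. The dense form keeps all C rows, including all-zero ones. The compact form stores C mapping entries plus a D x (M+N) matrix with no all-zero rows.

```python
ABSENT = -1  # hypothetical marker: this ambisonics channel is not coded

def compact(dense):
    """Split a C x (M+N) matrix into a C-entry mapping and a D x (M+N) matrix."""
    mapping, rows = [], []
    for row in dense:
        if any(w != 0 for w in row):
            mapping.append(len(rows))  # index of this channel's row among the D kept rows
            rows.append(list(row))
        else:
            mapping.append(ABSENT)     # all-zero row: drop it
    return mapping, rows

def expand(mapping, rows, width):
    """Rebuild the dense C x (M+N) matrix, with zero rows for absent channels."""
    return [list(rows[i]) if i != ABSENT else [0] * width for i in mapping]

# C = 4 ambisonics channels, M+N = 2 decoded channels, D = 2 non-zero rows.
dense = [
    [1, 0],
    [0, 0],
    [0, 1],
    [0, 0],
]
mapping, rows = compact(dense)
print(mapping, rows)
print(expand(mapping, rows, width=2) == dense)
```

When D == C, every mapping entry just points at its own row, which is why the mapping array carries no extra information in that case.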
>
> Cheers,
>
>         Jean-Marc
>
> > On Mon, Mar 13, 2017 at 3:00 PM Drew Allen <bitllama@google.com> wrote:
> >
> >     Got it. In that case, it certainly seems reasonable if I understand
> >     correctly. Thanks for clearing that up!
> >
> >     On Mon, Mar 13, 2017 at 2:55 PM Jean-Marc Valin <jmvalin@mozilla.com> wrote:
> >
> >         On 13/03/17 05:44 PM, Drew Allen wrote:
> >         > I think the issue is that the number of total channels rises
> >         > quadratically with respect to the ambisonic order: (N + 1)^2. If a user
> >         > wants to use just the horizontal channels, it is only 2 * N + 1. If they
> >         > wish to code very high-order (>10th order) horizontal channels, they
> >         > would be artificially limited by all the zero channels being produced,
> >         > no? Or can this be handled without actually creating all those empty
> >         > channels?
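
To put numbers on the growth described above, here is a small sketch that compares the full periphonic channel count (N + 1)^2 with the horizontal-only count 2 * N + 1. The function names are illustrative only.

```python
def full_channel_count(order):
    """Channels in a fully periphonic configuration of the given order."""
    return (order + 1) ** 2

def horizontal_channel_count(order):
    """Channels when only the horizontal components are used."""
    return 2 * order + 1

for order in (1, 3, 10):
    full = full_channel_count(order)
    horiz = horizontal_channel_count(order)
    # The difference is the number of channels that would be all zero.
    print(order, full, horiz, full - horiz)
```

At 10th order, for instance, 100 of the 121 channels would be empty in a horizontal-only stream.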
> >
> >         As far as I understand, the current draft already has all the
> >         limitations you're describing. The channel mapping array is basically
> >         equivalent to a CxC permutation matrix that multiplies the Cx(N+M)
> >         weight matrix. The result is still a Cx(N+M) matrix, so using the
> >         resulting matrix as weights can still do everything without the need for
> >         the channel mapping to do the permutations.
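
The equivalence claimed above can be checked with a small sketch: multiplying by a C x C permutation matrix gives the same result as reordering the rows of the weight matrix directly. The helper names and the convention that `mapping[i]` selects the source row for output row `i` are assumptions made for this example, not definitions from the draft.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def permutation_matrix(mapping):
    """C x C matrix P with P[i][mapping[i]] = 1 and zeros elsewhere."""
    c = len(mapping)
    return [[1 if mapping[i] == j else 0 for j in range(c)] for i in range(c)]

def permute_rows(weights, mapping):
    """Reorder the rows of the weight matrix directly."""
    return [list(weights[src]) for src in mapping]

# C = 3 output channels, N+M = 2 decoded channels.
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
mapping = [2, 0, 1]

via_matrix = matmul(permutation_matrix(mapping), weights)
via_rows = permute_rows(weights, mapping)
print(via_matrix == via_rows)
```

Since the product is again a C x (N+M) matrix, an encoder can write the already-permuted weights and skip the separate mapping.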
> >
> >         Cheers,
> >
> >                 Jean-Marc
> >
> >         > On Mon, Mar 13, 2017 at 2:41 PM Mark Harris <mark.hsj@gmail.com> wrote:
> >         >
> >         >     On Mon, Mar 13, 2017 at 10:38 AM, Jan Skoglund <jks@google.com> wrote:
> >         >     > Hey,
> >         >     >
> >         >     > Thanks for your comments
> >         >     >
> >         >     > On Mon, Mar 13, 2017 at 10:08 AM Mark Harris <mark.hsj@gmail.com> wrote:
> >         >     >>
> >         >     >> On Fri, Feb 17, 2017 at 1:57 PM, Jean-Marc Valin <jmvalin@jmvalin.ca> wrote:
> >         >     >> > 3.2.  Channel Mapping Family 3
> >         >     >> >
> >         >     >> > I would suggest removing the "Output Channel Numbering" field because it
> >         >     >> > is fully equivalent to simply permuting lines of the matrix. Also, I
> >         >     >> > believe that the size of the matrix was meant to be "32*(N+M)*C bits"
> >         >     >> > rather than "32*N*C bits".
> >         >     >>
> >         >     >> To expand on this a bit, a mapping family maps M+N decoded channels
> >         >     >> (corresponding to the actual order of the coupled and uncoupled
> >         >     >> channels in the bitstream) to C output channels (channels with a
> >         >     >> specific semantic meaning).  The additional "Output Channel Numbering"
> >         >     >> table confuses things by adding an additional mapping from the output
> >         >     >> channel numbers to a different set of numbers with actual semantic
> >         >     >> meaning, leaving the output channel numbers with no apparent meaning.
> >         >     >>
> >         >     >> This does have a potential benefit as a matrix compression technique,
> >         >     >> to reduce the size of the matrix when it would contain rows that are
> >         >     >> all zero.  However, considering that the matrix occurs only once, and
> >         >     >> mapping family 2 already offers a way to compress the matrix, this
> >         >     >> alone does not seem worth the complexity of another level of
> >         >     >> indirection.  If matrix compression is desired it would probably be
> >         >     >> less confusing to describe it in those terms and keep the semantic
> >         >     >> meaning tied to the output channels.
> >         >     >>
> >         >     >>
> >         >     >> The description of the Output Channel Numbering also does not specify
> >         >     >> the intended behavior if the same value appears in the table multiple
> >         >     >> times.
> >         >     >>
> >         >     >> Additionally, section 4.2 describes how to perform a stereo downmix of
> >         >     >> mapping family 3, but makes assumptions about the output channel
> >         >     >> numbering.  This seems harmful and likely to promote implementations
> >         >     >> that make similar assumptions.  If it is necessary to apply the output
> >         >     >> channel numbering described in section 3.2 in order to implement a
> >         >     >> correct stereo downmix, then it would be better to simply use the
> >         >     >> output channels from section 3 as input to the downmix, consolidating
> >         >     >> sections 4.1 and 4.2, rather than specify new formulas that make
> >         >     >> assumptions about the mapping.  That would also greatly simplify
> >         >     >> section 4.
> >         >     >>
> >         >     >> Eliminating the Output Channel Numbering table as Jean-Marc suggests
> >         >     >> should resolve these concerns.
> >         >     >
> >         >     >
> >         >     > The problem is that once we allow mixed orders there is no unique
> >         >     > way for a receiver/decoder to resolve the mapping to ACNs from just
> >         >     > a number of total output channels.
> >         >
> >         >
> >         >     In mapping family 2, the channel count (C) is the number of channels
> >         >     in the fully periphonic configuration, but it is not necessary to
> >         >     encode them all.  The channel mapping table can map each ACN to a
> >         >     specific decoded channel or to silence.  For mixed order, some of the
> >         >     ACNs will be mapped to silence and will not be encoded.
> >         >
> >         >     In mapping family 3, the matrix can do everything that the channel
> >         >     mapping table can do and more.  Why not treat C in the same manner, as
> >         >     the number of channels in the fully periphonic configuration, even if
> >         >     some are silent?
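
The family 2 behavior described above might look like the following minimal sketch. It assumes the Ogg Opus convention that a mapping index of 255 marks a silent output channel. The function name and the sample data are made up for illustration.

```python
SILENT = 255  # assumed: Ogg Opus mapping-table index meaning "silence"

def apply_mapping_table(mapping, decoded, frame_size):
    """Produce one output channel per ACN from the decoded channels."""
    out = []
    for index in mapping:
        if index == SILENT:
            out.append([0.0] * frame_size)    # ACN not encoded: emit silence
        else:
            out.append(list(decoded[index]))  # copy the selected decoded channel
    return out

# First order (C = 4) with the vertical component (ACN 2) left out,
# so only three channels are actually encoded.
decoded = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
mapping = [0, 1, SILENT, 2]
output = apply_mapping_table(mapping, decoded, frame_size=2)
print(output)
```

The same output could be produced in family 3 by a matrix whose row for ACN 2 is all zeros, which is the point made above about C.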
> >         >
> >         >      - Mark
> >         >
> >         >     _______________________________________________
> >         >     codec mailing list
> >         >     codec@ietf.org
> >         >     https://www.ietf.org/mailman/listinfo/codec