Re: [Cbor] Record proposal

Kris Zyp <kriszyp@gmail.com> Mon, 29 November 2021 13:35 UTC

From: Kris Zyp <kriszyp@gmail.com>
Date: Mon, 29 Nov 2021 06:35:00 -0700
Message-ID: <CAEs2a6tW7K71wKfK-EerdntmyTppqDrz=Fjb7BfADXAkH5N3gA@mail.gmail.com>
To: Carsten Bormann <cabo@tzi.org>
Cc: Christian Amsüss <christian@amsuess.com>, cbor@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/UexW5GMZUG7kwWYLzDY_EYx4Zqs>
Subject: Re: [Cbor] Record proposal

To follow up, I did update the proposal/spec to use the tag id range
of 57342-57599. Let me know if you would prefer different ids.
Thanks,
Kris
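Every tag number in the range 57342-57599 is between 256 and 65535, so it is encoded with a 1+2 head (one initial byte plus a two-byte argument). A minimal sketch of the RFC 8949 tag-head encoding for checking that; the function name `cbor_tag_head` is hypothetical and only for illustration:

```python
def cbor_tag_head(tag: int) -> bytes:
    """Encode the head of a CBOR tag (major type 6), shortest form, per RFC 8949."""
    if not 0 <= tag < 2**64:
        raise ValueError("CBOR tag numbers are unsigned 64-bit integers")
    major = 6 << 5  # 0xC0: major type 6 in the top three bits
    if tag < 24:
        return bytes([major | tag])                                # 1+0
    if tag < 2**8:
        return bytes([major | 24]) + tag.to_bytes(1, "big")        # 1+1
    if tag < 2**16:
        return bytes([major | 25]) + tag.to_bytes(2, "big")        # 1+2
    if tag < 2**32:
        return bytes([major | 26]) + tag.to_bytes(4, "big")        # 1+4
    return bytes([major | 27]) + tag.to_bytes(8, "big")            # 1+8


for t in (57342, 57343, 57344, 57599):
    print(t, cbor_tag_head(t).hex())
```

For example, 57344 (0xE000) encodes as `d9 e0 00`, and the last number in the range, 57599 (0xE0FF), as `d9 e0 ff`.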

On Wed, Nov 17, 2021 at 10:52 PM Kris Zyp <kriszyp@gmail.com> wrote:
>
> That sounds reasonable to me.
>
> To clarify a little bit of the rationale for this:
> The purpose of a global registry, as I understand it, is to define the
> global tag ids that need to be known by independent encoders and
> decoders prior to their encoding and decoding interaction. However,
> ids that are communicated within a message, for the scope of that
> message, don't require global allocation as long as they are only used
> within that specified scope. This is just like in a programming
> language: a variable in an inner block may temporarily shadow a
> variable from an outer block or the global scope without affecting the
> outer or global scope (and this doesn't require any coordination with
> a registry of globals). A decoder that actually implements this
> proposal would, by definition, implement this temporary tag
> reassignment/shadowing; a conformant record implementation would
> therefore not see gobbledygook.
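The block-scoping analogy can be made concrete with a decoder-side tag table that keeps a stack of scopes. This is a minimal sketch under stated assumptions, not the proposal's specified behavior: `TagScopes` and its methods are hypothetical names, and tag 1 (epoch-based date/time in RFC 8949) merely stands in for an already-allocated tag that an inner scope shadows.

```python
from typing import Any, Callable, Dict, List, Optional

Handler = Callable[[Any], Any]


class TagScopes:
    """Hypothetical tag table with nested scopes, like variables in nested blocks.

    An inner scope may shadow a tag number from an outer scope or from the
    global table; popping the scope restores the outer meaning.
    """

    def __init__(self, global_handlers: Dict[int, Handler]) -> None:
        self._global = dict(global_handlers)
        self._scopes: List[Dict[int, Handler]] = []

    def push(self) -> None:
        self._scopes.append({})

    def pop(self) -> None:
        self._scopes.pop()

    def define(self, tag: int, handler: Handler) -> None:
        # Assign (or shadow) a tag number in the innermost scope only;
        # raises IndexError if no scope has been pushed.
        self._scopes[-1][tag] = handler

    def lookup(self, tag: int) -> Optional[Handler]:
        # Innermost scope first, then outward, then the global table.
        for scope in reversed(self._scopes):
            if tag in scope:
                return scope[tag]
        return self._global.get(tag)


scopes = TagScopes({1: lambda v: ("epoch-time", v)})
scopes.push()
scopes.define(1, lambda v: ("record", v))
print(scopes.lookup(1)("x"))  # ('record', 'x')
scopes.pop()
print(scopes.lookup(1)("x"))  # ('epoch-time', 'x')
```

This only works if the decoder routes every tag through such a table. A decoder that handles tag 1 internally before any record layer sees it would not honor the shadowing.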
>
> Also, allowing dynamic/temporary allocation permits the use of
> shorter, more efficient tag ids, chosen based on which other tags are
> known to be in use. Record references may occur much more frequently
> in a data item/document than record definitions, so they are much more
> sensitive to size differences. Furthermore, this sidesteps the problem
> of figuring out the appropriate size of a range of ids to allocate
> globally. How many ids should be allocated? In terms of the percentage
> of available space, registering/using 256 ids from the 1+2 tag range
> seems effectively equivalent to registering 1 id from the 1+1 range
> (and 1+4 tags would definitely be undesirable, given how frequent and
> size-sensitive references are). And I have had users of my
> implementations say they need a few hundred structures to be defined
> and referenceable, so even 256 may be limiting for some users.
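The "effectively equivalent" comparison can be checked with a little arithmetic. This assumes the shortest encoding is always used, so a 1+1 head covers tags 24-255 and a 1+2 head covers tags 256-65535:

```python
# Count of tag numbers each head length can encode (shortest-form encoding).
one_plus_one = 256 - 24      # tags 24..255    -> 232 numbers
one_plus_two = 65536 - 256   # tags 256..65535 -> 65280 numbers

share_256_of_1p2 = 256 / one_plus_two
share_1_of_1p1 = 1 / one_plus_one

print(f"256 ids of 1+2 range: {share_256_of_1p2:.3%}")  # 0.392%
print(f"1 id of 1+1 range:    {share_1_of_1p1:.3%}")    # 0.431%
```

Both come out at roughly 0.4% of their respective ranges.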
>
> That being said, if you think it may be too onerous for (record)
> implementers to support this kind of potentially dynamic/temporary tag
> reassignment with possibly conflicting ids (as opposed to using a more
> stable table of tags or tag ranges), I understand that concern. What
> size chunk of tag ids (above 32768) do you think would be appropriate
> to allocate? Would a chunk of 256 be reasonable?
>
> Anyway, thank you for your help and feedback!
> Thanks,
> Kris
>
>
>
> On Wed, Nov 17, 2021 at 1:52 PM Carsten Bormann <cabo@tzi.org> wrote:
> >
> > Hi Kris,
> >
> > thank you for the update.
> > I still need to take a closer look, but I have one immediate reaction:
> > You probably shouldn’t induce implementations to “hijack” a tag number.
> > The handling of a tag number that has actually been allocated for something else may be deeply embedded into the CBOR decoder; a record implementation then would only see gobbledygook.
> > So I think we should allocate some space for tag-allocated tags — I don’t see a big problem with allocating a chunk above 32768 just for the record proposal.
> >
> > Grüße, Carsten
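The space "above 32768" is the First Come First Served part of the CBOR tags registry (RFC 8949, Section 9.2), and up to 65535 a tag there still fits a 1+2 head. A small sketch that classifies tag numbers by registration policy and checks the range adopted later in this thread (57342-57599); the helper name `registration_policy` is hypothetical:

```python
def registration_policy(tag: int) -> str:
    """IANA registration policy for a CBOR tag number (RFC 8949, Section 9.2)."""
    if not 0 <= tag < 2**64:
        raise ValueError("not a valid CBOR tag number")
    if tag <= 23:
        return "Standards Action"
    if tag <= 32767:
        return "Specification Required"
    return "First Come First Served"


chunk = range(57342, 57600)
print(len(chunk))  # 258
# The whole chunk is First Come First Served ...
assert all(registration_policy(t) == "First Come First Served" for t in chunk)
# ... and every number is below 65536, so each tag keeps a 1+2 head.
assert chunk[-1] < 2**16
```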
> >
> >
> > > On 2021-11-17, at 16:16, Kris Zyp <kriszyp@gmail.com> wrote:
> > >
> > > I have updated my tag registration proposal/submission at
> > > https://github.com/kriszyp/cbor-records to use 1+2 tag entries, which
> > > hopefully makes this a much less intrusive registry entry. I have also
> > > updated the proposed tag definitions to support up-front declaration
> > > of a set of record structure definitions for a data item ("around"
> > > that data item, like the packed approach, as you suggested), in
> > > addition to the current inline record definitions (which can be
> > > scoped with record definitions and follow a well-specified order
> > > determining when they can be referenced by subsequent references). I
> > > hope this offers flexibility both for encoders that know all
> > > structures a priori and for streaming encoders (that do not), while
> > > keeping nearly the same mechanics for decoders. Let me know if you
> > > think this looks reasonable.
> > > Thank you!
> > > Kris
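The claim that up-front and inline definitions share "nearly the same mechanics for decoders" can be illustrated with a single structure table that is either pre-filled (up-front declaration) or filled as definitions are encountered in document order (streaming). This is a sketch over already-parsed items, and the layout is HYPOTHETICAL, not the wire format at https://github.com/kriszyp/cbor-records: a definition is assumed to carry `[reference tag, keys, values]`, a reference carries `[values]`, and `Tagged`, `DEFINE`, and `resolve` are illustrative names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Tagged:
    """A pre-parsed CBOR tag: tag number plus content (hypothetical input model)."""
    tag: int
    value: Any


DEFINE = 57343  # HYPOTHETICAL "record definition" tag number for this sketch


def resolve(item: Any, structures: Dict[int, List[str]]) -> Any:
    """Expand record definitions/references into dicts, in document order.

    Assumed layout (hypothetical):
      Tagged(DEFINE, [ref_tag, [key, ...], [value, ...]])  defines ref_tag, yields a record
      Tagged(ref_tag, [value, ...])                         a later record of that shape
    """
    if isinstance(item, list):
        # Left-to-right, so a definition is known to the references after it.
        return [resolve(x, structures) for x in item]
    if isinstance(item, dict):
        return {k: resolve(v, structures) for k, v in item.items()}
    if isinstance(item, Tagged):
        if item.tag == DEFINE:
            ref_tag, keys, values = item.value
            structures[ref_tag] = keys
            return dict(zip(keys, resolve(values, structures)))
        if item.tag in structures:
            return dict(zip(structures[item.tag], resolve(item.value, structures)))
    return item  # anything else (including unknown tags) is left untouched


# Streaming style: the definition appears inline before its first reference.
doc = [Tagged(DEFINE, [57344, ["name", "age"], ["Ann", 31]]), Tagged(57344, ["Bob", 42])]
print(resolve(doc, {}))

# Up-front style: the structure table is declared before the data item.
print(resolve([Tagged(57344, ["Cy", 7])], {57344: ["name", "age"]}))
```

In this sketch the only difference between the two styles is whether `structures` starts empty or pre-filled.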
> > >
> > > On Thu, Nov 11, 2021 at 9:15 PM Kris Zyp <kriszyp@gmail.com> wrote:
> > >>
> > >>> Actually, stats would be very interesting.
> > >>> I was assuming that the 1+1 setup comes with a number of 1+2 referencing records, the hit from going to 1+2 there as well would be relatively insignificant.
> > >>> Numbers are better than assumptions!
> > >>
> > >> You are definitely right: using a 1+2 tag for defining records is
> > >> pretty insignificant (around a quarter of a percent in my tests). I
> > >> also put together some tests, using my library, comparing CBOR packed,
> > >> record structures, and combinations of the two on a couple of test
> > >> data structures, for the sake of further comparison:
> > >> https://gist.github.com/kriszyp/b623b85d2dc25ac9e3b07d8f39df9307
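The measurements themselves are in the gist and are not reproduced here. As a rough, self-contained illustration of where the savings come from, the sketch below encodes 100 small maps once as plain CBOR maps and once with record references, using a minimal hand-written encoder. The record layout (a definition tag carrying `[reference tag, keys, values]`) and the tag numbers `DEFINE`/`REF` are HYPOTHETICAL, not the proposal's wire format.

```python
from typing import Any, NamedTuple


class Tag(NamedTuple):
    number: int
    value: Any


def head(major: int, n: int) -> bytes:
    """Initial byte plus argument in shortest form (RFC 8949, Section 3)."""
    if n < 24:
        return bytes([(major << 5) | n])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + n.to_bytes(size, "big")
    raise ValueError("argument too large")


def encode(item: Any) -> bytes:
    """Minimal encoder: unsigned ints, text strings, arrays, maps, tags."""
    if isinstance(item, Tag):  # Tag is a tuple subclass, so handle it explicitly
        return head(6, item.number) + encode(item.value)
    if isinstance(item, int) and not isinstance(item, bool) and item >= 0:
        return head(0, item)
    if isinstance(item, str):
        data = item.encode("utf-8")
        return head(3, len(data)) + data
    if isinstance(item, list):
        return head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        return head(5, len(item)) + b"".join(encode(k) + encode(v) for k, v in item.items())
    raise TypeError(f"unsupported type: {type(item).__name__}")


rows = [{"name": f"user{i}", "age": i} for i in range(100)]
plain = encode(rows)

# HYPOTHETICAL layout: the first row is wrapped in a definition that names REF
# and its keys; every later row is just REF applied to an array of values.
DEFINE, REF = 57343, 57344
records = [Tag(DEFINE, [REF, ["name", "age"], ["user0", 0]])]
records += [Tag(REF, [r["name"], r["age"]]) for r in rows[1:]]

print(len(plain), len(encode(records)))
```

Each reference replaces the repeated keys (`"name"`, `"age"` and the map head) with a three-byte tag head plus an array head, which is why the per-record cost of the 1+2 reference tag matters more than that of the one-off definition.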
> > >>
> > >> Seeing this, I am happy to go ahead and update my proposal to use a
> > >> 1+2 tag for defining records. Thinking about it further, I don't
> > >> think my proposal even needs to mandate the tag ids used for
> > >> referencing records, since those are dynamically assigned and
> > >> explicitly specified by the encoder itself (the encoder obviously
> > >> must not pick tag ids that conflict with ones used for other
> > >> purposes), although it can encourage a certain range (presumably
> > >> from the first-come first-served range).
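Since the encoder chooses and declares the reference tags itself, it only has to skip numbers it (or its application) uses for other purposes. A minimal sketch of such an encoder-side allocator, assuming one reference tag per distinct key tuple; `RecordIdAllocator`, the range, and the reserved set are hypothetical and purely illustrative:

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


class RecordIdAllocator:
    """Hypothetical encoder-side allocator for record reference tags."""

    def __init__(self, id_range: range, reserved: Iterable[int] = ()) -> None:
        reserved_set: Set[int] = set(reserved)
        # Lazily walk the range, skipping numbers already used for other purposes.
        self._candidates: Iterator[int] = (t for t in id_range if t not in reserved_set)
        self._by_shape: Dict[Tuple[str, ...], int] = {}

    def tag_for(self, keys: Tuple[str, ...]) -> Tuple[int, bool]:
        """Return (tag, is_new); is_new means a definition must be emitted first."""
        if keys in self._by_shape:
            return self._by_shape[keys], False
        try:
            tag = next(self._candidates)
        except StopIteration:
            raise RuntimeError("record id range exhausted") from None
        self._by_shape[keys] = tag
        return tag, True


alloc = RecordIdAllocator(range(57344, 57600), reserved={57345})
print(alloc.tag_for(("name", "age")))  # (57344, True)
print(alloc.tag_for(("id",)))          # (57346, True): 57345 is skipped
print(alloc.tag_for(("name", "age")))  # (57344, False)
```

The `RuntimeError` path is where a fixed-size range shows its limit: an application with more distinct structures than the range holds would need a fallback, such as reusing ids or emitting plain maps.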
> > >>
> > >> Do you have any preference for which tag id to use? It looks like 279
> > >> is the next in the contiguous block, but it sounds like choosing
> > >> aesthetically meaningful characters is the new preference
> > >> (29299/"rs" perhaps).
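The "aesthetic" number is simply the two ASCII bytes read as a big-endian 16-bit integer ('r' is 0x72, 's' is 0x73), so the tag's argument spells the characters in a hex dump. A one-function sketch; `tag_from_ascii` is an illustrative name:

```python
def tag_from_ascii(chars: str) -> int:
    """Read a short ASCII string as a big-endian unsigned integer."""
    return int.from_bytes(chars.encode("ascii"), "big")


print(tag_from_ascii("rs"))       # 29299
print(hex(tag_from_ascii("rs")))  # 0x7273
```

Because 29299 is below 32768, it would fall in the Specification Required part of the registry rather than the First Come First Served part.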
> > >>
> > >> Anyway, thanks again for the helpful feedback, really appreciate it!
> > >>
> > >> Thanks,
> > >> Kris
> > >
> >