Re: [Cbor] Record proposal

Kris Zyp <kriszyp@gmail.com> Thu, 02 December 2021 05:34 UTC

From: Kris Zyp <kriszyp@gmail.com>
Date: Wed, 1 Dec 2021 22:34:37 -0700
Message-ID: <CAEs2a6s3=jSb2N7+JHntApW9PWgBCUxV5TP5ej7vLfuR4T6fug@mail.gmail.com>
To: Emile Cormier <emile.cormier.jr@gmail.com>
Cc: Carsten Bormann <cabo@tzi.org>, Christian Amsüss <christian@amsuess.com>, "cbor@ietf.org" <cbor@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/VKBA-7GXg6XjtbxvRQH_OuklT-U>
Subject: Re: [Cbor] Record proposal

I certainly empathize with the idea of keeping tags conceptually
distinct, as merely adding some extra non-transformational semantic
description to the underlying CBOR data structure. However, tags carry a
wide range of "meanings", and realistically, with many or even most tags,
you can't really "make sense" of a payload without the "semantic meaning";
the meaning is what tells you how to make sense of the data. For many
tags, something needs to understand the data beyond its raw CBOR
structure.


Without the semantic meaning, there isn't data loss; raw CBOR structures
can always be transcoded to JSON or anything else (with conventions for
tags and such). But those basic data structures can still be (and need to
be) interpreted by a decoder/app that understands how the tags encode
things like the distinction between a record and a map, in order to
translate them properly to various language constructs (failing to
distinguish these and conflating them all into a single JSON object or map
would be the real data loss). Likewise, many other tags carry significant
meaning for interpretation, such as the packed-CBOR tags, tags 1-5, 28/29,
100, 256/25, and I'm sure many more; their payloads do not make useful
sense without understanding the tags. Of course, a decoder can always
simply defer interpretation of tags, decode CBOR to a set of tags, arrays,
maps, etc., and leave the interpretation to a higher-level component.
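As a rough illustration of deferring tag interpretation, here is a minimal Python sketch. It assumes a generic decoder has already produced plain lists, dicts, scalars, and a hypothetical `Tagged(tag, value)` wrapper for every tag; a separate `interpret` pass then applies handlers only for the tags it knows (here tag 1, epoch date/time, and tag 100, days since 1970-01-01) and leaves everything else wrapped so nothing is lost. The handler table and all names are assumptions of this sketch, not any particular library's API.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tagged:
    """A CBOR tag left uninterpreted: the tag number plus its decoded content."""
    tag: int
    value: Any


# Handlers a higher-level component might register (an assumption of this sketch).
# Tag 1: epoch-based date/time (RFC 8949); tag 100: days since 1970-01-01 (RFC 8943).
HANDLERS: Dict[int, Callable[[Any], Any]] = {
    1: lambda secs: datetime.fromtimestamp(secs, tz=timezone.utc),
    100: lambda days: date(1970, 1, 1) + timedelta(days=days),
}


def interpret(item: Any) -> Any:
    """Apply known tag handlers to a generically decoded tree.

    Unknown tags are kept as Tagged so nothing is lost; map keys are left untouched.
    """
    if isinstance(item, Tagged):
        inner = interpret(item.value)
        handler = HANDLERS.get(item.tag)
        return handler(inner) if handler else Tagged(item.tag, inner)
    if isinstance(item, list):
        return [interpret(x) for x in item]
    if isinstance(item, dict):
        return {k: interpret(v) for k, v in item.items()}
    return item


# Tag 99999 stands in for any tag number this sketch has no handler for.
decoded = {"created": Tagged(1, 0), "due": Tagged(100, 1), "other": Tagged(99999, [1, 2])}
result = interpret(decoded)
```

The split keeps the decoder itself lossless: a component that knows more tags can simply register more handlers.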


Thanks,

Kris



*From: *Emile Cormier <emile.cormier.jr@gmail.com>
*Sent: *December 1, 2021 4:57 PM
*To: *Kris Zyp <kriszyp@gmail.com>
*Cc: *Carsten Bormann <cabo@tzi.org>; Christian Amsüss
<christian@amsuess.com>; cbor@ietf.org
*Subject: *Re: [Cbor] Record proposal



I'd like to point out that this proposal, as it's currently written, will
result in loss of information with CBOR decoders that don't understand the
tags, or when the data elements are transcoded to another format such as
JSON. This is due to the tags carrying information (record IDs) that the
application needs in order to make sense of the received payload. The
proposed tags do more than provide "semantic meaning"; they also carry
information.



This practice of bundling numeric information into the tags means that
these CBOR items are not interoperable when transcoded to JSON (think web
browser), or when a CBOR decoder doesn't understand the tags and doesn't
propagate them up to the application.
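To make the information-loss concern concrete, here is a small Python sketch. It assumes decoded tags are held in a hypothetical `Tagged(tag, value)` wrapper and that two different record structures are referenced by the placeholder tag numbers 57344 and 57345 (illustrative only). A transcoder that silently drops unknown tags produces identical JSON for both; a tag-preserving convention (the `{"tag": ..., "value": ...}` wrapping here is a hypothetical choice, not a standard) keeps them distinct.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Tagged:
    """A decoded CBOR tag: tag number plus content."""
    tag: int
    value: Any


def drop_tags(item: Any) -> Any:
    """Naive transcoding: tags the transcoder doesn't know are discarded."""
    if isinstance(item, Tagged):
        return drop_tags(item.value)
    if isinstance(item, list):
        return [drop_tags(x) for x in item]
    if isinstance(item, dict):
        return {str(k): drop_tags(v) for k, v in item.items()}
    return item


def keep_tags(item: Any) -> Any:
    """Hypothetical convention: wrap each tag as {"tag": n, "value": ...}."""
    if isinstance(item, Tagged):
        return {"tag": item.tag, "value": keep_tags(item.value)}
    if isinstance(item, list):
        return [keep_tags(x) for x in item]
    if isinstance(item, dict):
        return {str(k): keep_tags(v) for k, v in item.items()}
    return item


# References to two *different* record structures that happen to hold the same values.
rec_a = Tagged(57344, [1, "a"])  # hypothetical record structure A, e.g. {id, label}
rec_b = Tagged(57345, [1, "a"])  # hypothetical record structure B, e.g. {x, color}

print(json.dumps(drop_tags(rec_a)), json.dumps(drop_tags(rec_b)))
# [1, "a"] [1, "a"]   (indistinguishable after transcoding)
print(json.dumps(keep_tags(rec_a)))
# {"tag": 57344, "value": [1, "a"]}
```

The wrapping convention has its own cost: a receiver must know it, and it can collide with genuine maps that use the same keys.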





On Mon, Nov 29, 2021 at 9:35 AM Kris Zyp <kriszyp@gmail.com> wrote:

To follow up, I did update the proposal/spec to use the tag id range
of 57342-57599. Let me know if you would prefer different ids.
Thanks,
Kris

On Wed, Nov 17, 2021 at 10:52 PM Kris Zyp <kriszyp@gmail.com> wrote:
>
> That sounds reasonable to me.
>
> To clarify a little bit of the rationale for this:
> The purpose of a global registry, as I understand it, is to define the
> global tag ids that need to be known by independent encoders and
> decoders prior to their encoding and decoding interaction. However,
> ids that are communicated within a message and used only within a
> specified scope of that message don't require global allocation. This
> is just like in a programming language: a variable in an inner block
> may temporarily shadow a variable from an outer block or the global
> scope without affecting that outer or global scope (and this doesn't
> require any coordination with a registry of globals). A decoder that
> truly implements this proposal would, by definition, implement this
> temporary tag reassignment/shadowing (i.e. a conformant record
> implementation wouldn't see gobbledygook).
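The shadowing analogy can be sketched with Python's `collections.ChainMap`, which already models nested scopes: writes go to the innermost map and lookups search outward. The class and method names, and the idea that a reference tag id maps to a tuple of field names, are assumptions for illustration; the proposal's actual definition format is not reproduced here.

```python
from collections import ChainMap
from typing import Any, Dict, Sequence


class RecordScopes:
    """Maps reference tag ids to record field lists, with block-style shadowing."""

    def __init__(self) -> None:
        # maps[0] is the innermost (current) scope.
        self.defs: ChainMap = ChainMap()

    def enter(self) -> None:
        self.defs = self.defs.new_child()

    def exit(self) -> None:
        self.defs = self.defs.parents

    def define(self, tag_id: int, fields: Sequence[str]) -> None:
        # Writes go to the innermost scope only, so an outer binding is shadowed, not changed.
        self.defs[tag_id] = tuple(fields)

    def resolve(self, tag_id: int, values: Sequence[Any]) -> Dict[str, Any]:
        fields = self.defs[tag_id]  # innermost binding wins; KeyError if undefined
        if len(fields) != len(values):
            raise ValueError(f"record {tag_id} expects {len(fields)} values")
        return dict(zip(fields, values))


scopes = RecordScopes()
scopes.define(57344, ["x", "y"])
scopes.enter()
scopes.define(57344, ["name"])           # shadows the outer definition of 57344
print(scopes.resolve(57344, ["inner"]))  # {'name': 'inner'}
scopes.exit()
print(scopes.resolve(57344, [1, 2]))     # {'x': 1, 'y': 2}
```

Leaving the inner scope restores the outer meaning of the id without any global bookkeeping, which is the property the argument relies on.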
>
> Also, allowing dynamic/temporary allocation permits the use of
> shorter/more efficient tag ids based on known use of other tags. Record
> references in particular may occur much more frequently in a data
> item/document than record definitions, and may be much more sensitive
> to size differences. Furthermore, this sidesteps the question of how
> large a range of ids to allocate globally. How many ids should be
> allocated? In terms of percentage of available space, registering/using
> 256 ids from the tag 1+2 range seems effectively equivalent to
> registering 1 id from the tag 1+1 range (and the tag 1+4 range would
> definitely be undesirable due to the high frequency and sensitivity to
> size). And users of my implementations have told me they need a few
> hundred structures to be defined and referenceable, so even 256 may be
> somewhat limiting for some users.
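The "1+1" / "1+2" shorthand refers to the size of a CBOR tag head: one initial byte plus 1, 2, 4 or 8 following bytes. Under RFC 8949 preferred serialization, tag numbers 24-255 use 1+1 and 256-65535 use 1+2. The sketch below computes head sizes and checks the percentage comparison from the paragraph above (256 of 65280 values is exactly 1/255, about 0.39%, versus 1 of 232 values, about 0.43%). Function and variable names are illustrative.

```python
def tag_head_size(tag: int) -> int:
    """Bytes needed for a CBOR tag head (major type 6) in preferred serialization."""
    if tag < 0:
        raise ValueError("tag numbers are unsigned")
    if tag < 24:
        return 1  # 1+0: number packed into the initial byte
    if tag < 2**8:
        return 2  # 1+1
    if tag < 2**16:
        return 3  # 1+2
    if tag < 2**32:
        return 5  # 1+4
    if tag < 2**64:
        return 9  # 1+8
    raise ValueError("tag number exceeds 64 bits")


# How many tag numbers use each head size.
RANGE_1_1 = 256 - 24     # tags 24..255    -> 232 values
RANGE_1_2 = 65536 - 256  # tags 256..65535 -> 65280 values

share_256_of_1_2 = 256 / RANGE_1_2  # == 1/255
share_1_of_1_1 = 1 / RANGE_1_1

print(f"256 ids in the 1+2 range: {share_256_of_1_2:.3%}")  # 0.392%
print(f"1 id in the 1+1 range:    {share_1_of_1_1:.3%}")    # 0.431%
```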
>
> But that being said, if you think it may be too onerous for (record)
> implementers to support this type of potentially dynamic/temporary tag
> reassignment with possibly conflicting ids (as opposed to using a more
> stable table of tags or tag ranges), I can understand that concern.
> What size of tag id chunk (above 32768) do you think would be
> appropriate to allocate? Would a chunk of 256 be reasonable?
>
> Anyway, thank you for your help and feedback!
> Thanks,
> Kris
>
>
>
> On Wed, Nov 17, 2021 at 1:52 PM Carsten Bormann <cabo@tzi.org> wrote:
> >
> > Hi Kris,
> >
> > thank you for the update.
> > I still need to take a closer look, but I have one immediate reaction:
> > You probably shouldn’t induce implementations to “hijack” a tag number.
> > The handling of a tag number that has actually been allocated for
> > something else may be deeply embedded into the CBOR decoder; a record
> > implementation then would only see gobbledygook.
> > So I think we should allocate some space for tag-allocated tags; I
> > don’t see a big problem with allocating a chunk above 32768 just for the
> > record proposal.
> >
> > Grüße, Carsten
> >
> >
> > > On 2021-11-17, at 16:16, Kris Zyp <kriszyp@gmail.com> wrote:
> > >
> > > I have updated my tag registration proposal/submission at
> > > https://github.com/kriszyp/cbor-records to use 1+2 tag entries, which
> > > hopefully makes this a much less intrusive registry entry. I have also
> > > updated the proposed tag definitions to also support up-front
> > > declaration of a set of record structure definitions for a data item
> > > ("around" that data item, like the packed approach, as you suggested),
> > > in addition to the current inline record definitions (which can be
> > > scoped with record definitions and should follow a well-specified
> > > order for when they can be referenced by subsequent references). I
> > > hope this offers flexibility for encoders that have all structures
> > > known a priori and for streaming encoders (that do not), while still
> > > maintaining nearly the same mechanics for decoders. Let me know if you
> > > think this looks reasonable.
> > > Thank you!
> > > Kris
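One way to picture the streaming-encoder case is an encoder that defines a record structure inline the first time it sees a given set of keys and emits only a compact reference afterwards. The Python sketch below assumes a hypothetical layout, `Tagged(DEFINE_TAG, [ref_id, field_names, values])` for a definition and `Tagged(ref_id, values)` for a reference. The tag numbers are placeholders picked from the 57342-57599 range mentioned elsewhere in this thread, not the layout or numbers in the proposal's spec.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Tagged:
    tag: int
    value: Any


# Placeholder tag numbers for illustration only; not taken from the proposal's spec.
DEFINE_TAG = 57343
REF_BASE = 57344
REF_LIMIT = 57600  # exclusive upper bound: 256 reference ids


class StreamingRecordEncoder:
    """Defines a record structure on first sight of a key set, references it afterwards."""

    def __init__(self) -> None:
        self.ids: Dict[Tuple[str, ...], int] = {}

    def encode(self, item: Any) -> Any:
        if isinstance(item, dict) and all(isinstance(k, str) for k in item):
            keys = tuple(item)
            values = [self.encode(v) for v in item.values()]
            ref = self.ids.get(keys)
            if ref is not None:
                return Tagged(ref, values)  # compact reference to an earlier definition
            if REF_BASE + len(self.ids) >= REF_LIMIT:
                return dict(zip(keys, values))  # out of ids: fall back to a plain map
            ref = REF_BASE + len(self.ids)
            self.ids[keys] = ref
            return Tagged(DEFINE_TAG, [ref, list(keys), values])  # inline definition
        if isinstance(item, list):
            return [self.encode(x) for x in item]
        return item


enc = StreamingRecordEncoder()
out = enc.encode([{"x": 1, "y": 2}, {"x": 3, "y": 4}])
# [Tagged(57343, [57344, ['x', 'y'], [1, 2]]), Tagged(57344, [3, 4])]
```

An encoder that knows all structures a priori could instead emit every definition up front around the data item; the decoder-side lookup would stay essentially the same.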
> > >
> > > On Thu, Nov 11, 2021 at 9:15 PM Kris Zyp <kriszyp@gmail.com> wrote:
> > >>
> > >>> Actually, stats would be very interesting.
> > >>> I was assuming that the 1+1 setup comes with a number of 1+2
> > >>> referencing records, so the hit from going to 1+2 there as well
> > >>> would be relatively insignificant.
> > >>> Numbers are better than assumptions!
> > >>
> > >> You are definitely right: using a 1+2 tag for defining records is
> > >> pretty insignificant (around a quarter of a percent in my tests).
> > >> Anyway, I put together some tests with my library comparing CBOR
> > >> packed, record structures, and combinations on a couple of test data
> > >> structures, for the sake of further comparisons:
> > >> https://gist.github.com/kriszyp/b623b85d2dc25ac9e3b07d8f39df9307
> > >>
> > >> Anyway, seeing this, I am happy to go ahead and update my proposal to
> > >> use a 1+2 tag for defining records. And thinking about this, I don't
> > >> think my proposal necessarily even needs to mandate the tag ids used
> > >> for referencing records, since those are dynamically assigned and
> > >> explicitly specified by the encoder itself (the encoder obviously must
> > >> not use tag ids that conflict with ones used for other purposes),
> > >> although it can encourage a certain range (presumably from the
> > >> first-come, first-served range).
> > >>
> > >> Do you have any preference for a tag id to use? It looks like 279 is
> > >> the next in the contiguous block, but it sounds like choosing aesthetic
> > >> characters is the new preference (29299/"rs" perhaps).
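The "aesthetic characters" idea works because a tag number in the 1+2 range is written as the head byte 0xD9 followed by the number as two big-endian bytes. Reading "rs" as ASCII bytes 0x72 0x73 gives 0x7273 = 29299, and the encoded tag head then shows up as `rs` in a hex dump. A minimal sketch (function names are illustrative; only the 1+2 form is handled):

```python
def tag_from_ascii(chars: str) -> int:
    """Interpret ASCII characters as a big-endian tag number, e.g. 'rs' -> 0x7273."""
    return int.from_bytes(chars.encode("ascii"), "big")


def tag_head_1_2(tag: int) -> bytes:
    """CBOR tag head in the 1+2 form: 0xC0 | 25 = 0xD9, then the 2-byte big-endian number."""
    if not 256 <= tag <= 0xFFFF:
        raise ValueError("tag is outside the 1+2 range (256..65535)")
    return bytes([0xD9]) + tag.to_bytes(2, "big")


print(tag_from_ascii("rs"))                # 29299
print(tag_head_1_2(tag_from_ascii("rs")))  # b'\xd9rs'
```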
> > >>
> > >> Anyway, thanks again for the helpful feedback, really appreciate it!
> > >>
> > >> Thanks,
> > >> Kris
> > >
> > > _______________________________________________
> > > CBOR mailing list
> > > CBOR@ietf.org
> > > https://www.ietf.org/mailman/listinfo/cbor
> >
