Re: [Cbor] Record proposal

Kris Zyp <kriszyp@gmail.com> Thu, 18 November 2021 05:53 UTC

From: Kris Zyp <kriszyp@gmail.com>
Date: Wed, 17 Nov 2021 22:52:45 -0700
Message-ID: <CAEs2a6vVL9_wvrbwske80m5P5Y1xKw6_ecitDL9uybf2TsvWHw@mail.gmail.com>
To: Carsten Bormann <cabo@tzi.org>
Cc: =?UTF-8?Q?Christian_Ams=C3=BCss?= <christian@amsuess.com>, cbor@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/cKzPHg6fJa4sbHjXEdgu5cfgvoM>

That sounds reasonable to me.

To clarify a little bit of the rationale for this:
The purpose of a global registry, as I understand it, is to define the
global tag ids that need to be known by independent encoders and
decoders prior to their encoding and decoding interaction. However,
ids that can be communicated within a message, for the scope of that
message, don't require global allocation if they are only used within
a specified scope. This is just like in a programming language: a
variable from an inner block may shadow another variable from an outer
block or a global scope temporarily within its block without affecting
the outer or global scope (this doesn't require any coordination with
a registry of globals). And a decoder that really implements this
proposal would, by definition, be implementing this behavior of
temporary tag reassignment/shadowing, if it really is a true record
implementation (i.e. a real record implementation wouldn't see
gobbledygook if it is conformant).
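
A minimal sketch of the shadowing behavior described above, not the proposal's actual mechanics. The names GLOBAL_TAGS and open_scope, the tag number 40000, and the record field names are all hypothetical and chosen only for illustration. A record definition carried in the message overrides a tag's global meaning inside its scope, and the global table is untouched once the scope ends:

```python
from collections import ChainMap

# Hypothetical stand-in for the decoder's table of globally known tags.
# Tags 0 and 1 carry their standard meanings; 40000 is a made-up entry.
GLOBAL_TAGS = {
    0: "date/time string",
    1: "epoch-based date/time",
    40000: "hypothetical globally registered meaning",
}

def open_scope(global_tags, local_definitions):
    """Return the tag lookup table for one data item.

    Definitions carried in the message are consulted first, so they
    shadow any global entry with the same tag id. The global table is
    never modified.
    """
    return ChainMap(dict(local_definitions), global_tags)

# A message defines tag 40000 as a record with two fields (assumed names).
scope = open_scope(GLOBAL_TAGS, {40000: ("record", ["name", "age"])})

inside = scope[40000]         # the record definition wins inside the scope
unshadowed = scope[1]         # other tags still resolve globally
outside = GLOBAL_TAGS[40000]  # the global meaning is unaffected
```

A decoder that hard-wires tag handling ahead of any such lookup would not get this behavior, which is the concern raised in the quoted reply below.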

Also allowing dynamic/temporary allocation permits potential use of
shorter/more efficient tag ids based on known use of other tags, and
in the case of record references, these may occur much more frequently
in a data item/document compared to record definitions, and be much
more sensitive to size differences. Furthermore, this also sidesteps issues
with having to figure out the appropriate size of a range of ids to
globally allocate. How many ids should be allocated? In terms of
percentage of available space, registering/using 256 ids from the tag
1+2 range seems like it is effectively equivalent to registering 1 id
from the tag 1+1 range (and tag 1+4 would definitely be undesirable
due to the high frequency and sensitivity to size). And I have had
users of my implementations say they need a few hundred structures to
be defined and referenceable, so even 256 may be somewhat limiting to
some users.
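
The size comparison above can be checked with a short calculation. This is a sketch based on the standard CBOR head encoding (tag numbers 0 to 23 fit in the initial byte, 24 to 255 take 1+1 bytes, 256 to 65535 take 1+2 bytes, and larger values take 1+4 or 1+8). The function and variable names are mine, not from any library:

```python
def tag_head_size(tag: int) -> int:
    """Number of bytes CBOR needs to encode the head of a tag number."""
    if tag < 24:
        return 1          # value carried in the initial byte
    if tag < 2**8:
        return 2          # 1+1
    if tag < 2**16:
        return 3          # 1+2
    if tag < 2**32:
        return 5          # 1+4
    return 9              # 1+8

# How many tag numbers exist in each size class.
one_plus_one_space = 2**8 - 24       # tags 24..255
one_plus_two_space = 2**16 - 2**8    # tags 256..65535

# Share of each space consumed by the two allocations being compared.
share_256_of_one_plus_two = 256 / one_plus_two_space
share_1_of_one_plus_one = 1 / one_plus_one_space

print(f"256 ids from 1+2: {share_256_of_one_plus_two:.2%}")
print(f"1 id from 1+1:    {share_1_of_one_plus_one:.2%}")
```

Both shares come out at roughly 0.4% of their respective ranges, which is the sense in which the two allocations are comparable. Any chunk above 32768 still encodes in 3 bytes per reference, so long as it stays below 65536.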

But that being said, if you are thinking that it may be too onerous
for (record) implementers to support this type of potentially
dynamic/temporary tag reassignment with possibly conflicting ids (as
opposed to using a more stable table of tags or tag ranges), I can
understand that concern. What size of chunk of tag ids (above 32768)
do you think would be appropriate to allocate? Would a chunk of 256 be
reasonable?

Anyway, thank you for your help and feedback!
Thanks,
Kris



On Wed, Nov 17, 2021 at 1:52 PM Carsten Bormann <cabo@tzi.org> wrote:
>
> Hi Kris,
>
> thank you for the update.
> I still need to take a closer look, but I have one immediate reaction:
> You probably shouldn’t induce implementations to “hijack” a tag number.
> The handling of a tag number that has actually been allocated for something else may be deeply embedded into the CBOR decoder; a record implementation then would only see gobbledygook.
> So I think we should allocate some space for tag-allocated tags — I don’t see a big problem with allocating a chunk above 32768 just for the record proposal.
>
> Grüße, Carsten
>
>
> > On 2021-11-17, at 16:16, Kris Zyp <kriszyp@gmail.com> wrote:
> >
> > I have updated my tag registration proposal/submission at
> > https://github.com/kriszyp/cbor-records to use 1+2 tag entries, which
> > hopefully makes this a much less intrusive registry entry. I have also
> > updated the proposed tag definitions to also support up-front
> > declaration of a set of record structure definitions for a data item
> > ("around" that data item, like the packed approach, as you suggested),
> > in addition to the current inline record definitions (which can be
> > scoped with record definitions and should follow a well-specified
> > order for when they can be referenced by subsequent references). I
> > hope this offers flexibility for encoders that have all structures
> > known a priori and streaming encoders (that do not), while still
> > maintaining nearly the same mechanics for decoders. Let me know if you
> > think this looks reasonable.
> > Thank you!
> > Kris
> >
> > On Thu, Nov 11, 2021 at 9:15 PM Kris Zyp <kriszyp@gmail.com> wrote:
> >>
> >>> Actually, stats would be very interesting.
> >>> I was assuming that the 1+1 setup comes with a number of 1+2 referencing records, the hit from going to 1+2 there as well would be relatively insignificant.
> >>> Numbers are better than assumptions!
> >>
> >> You are definitely right, using 1+2 tag for defining records is pretty
> >> insignificant (around a quarter of a percent in my tests). Anyway, I put
> >> together some tests comparing CBOR packed, record structures, and
> >> combinations with a couple test data structures, with my library, for
> >> the sake of further comparisons:
> >> https://gist.github.com/kriszyp/b623b85d2dc25ac9e3b07d8f39df9307
> >>
> >> Anyway, seeing this, I am happy to go ahead and update my proposal to
> >> use a 1+2 tag for defining records. And thinking about this, I don't
> >> think my proposal necessarily even needs to mandate the tag ids used
> >> for referencing records since those are dynamically assigned and
> >> explicitly specified by the encoder itself (encoder obviously must not
> >> conflict and use tag ids that will be used for other purposes), albeit
> >> can encourage a certain range (presumably from the first-come,
> >> first-served range).
> >>
> >> Do you have any preference for a tag id to use? It looks like 279 is
> >> the next in the contiguous block, but sounds like choosing aesthetic
> >> characters is the new preference (29299/"rs" perhaps).
> >>
> >> Anyway, thanks again for the helpful feedback, really appreciate it!
> >>
> >> Thanks,
> >> Kris
> >
>