Re: [Cbor] Record proposal

Kris Zyp <kriszyp@gmail.com> Thu, 29 July 2021 00:40 UTC

References: <8421F43D-E9ED-444F-A915-415F3AE59FA0@tzi.org> <YJJ+oJZ5YF/c14sv@hephaistos.amsuess.com> <41C02CBE-E7EC-4E61-889B-779EE561C632@tzi.org> <31190CB3-2EE9-4B92-BBC6-C29F71A11162@hxcore.ol>
In-Reply-To: <31190CB3-2EE9-4B92-BBC6-C29F71A11162@hxcore.ol>
From: Kris Zyp <kriszyp@gmail.com>
Date: Wed, 28 Jul 2021 18:39:46 -0600
Message-ID: <CAEs2a6stR_0=eGP0Vx6z_9x7mf1pWJpRE8toTaSSmdWOQ+zsFA@mail.gmail.com>
To: Christian Amsüss <christian@amsuess.com>
Cc: "cbor@ietf.org" <cbor@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/Op-FbrWqjrZeBZAPj_qQ6c8w-II>
Subject: Re: [Cbor] Record proposal

I just wanted to check on the status of this: is this registry
entry still being considered?
Thank you,
Kris

On Wed, May 5, 2021 at 9:11 AM <kriszyp@gmail.com> wrote:
>
> To clarify my proposal a little: my intent with the record proposal was not primarily to minimize the size of the encodings. That is a nice auxiliary benefit, but it is not the primary purpose of the proposed tag. The primary purpose is to assign semantics indicating that a sequence of values should be understood as a structured record. I think this is complementary to tag 259, which indicates the opposite: a set of name-value pairs to be interpreted as a map structure. And I think applying this kind of semantics to a primitive data structure is the intent of tags, if I understand correctly.
>
> The benefit of this type of tag/semantics is that encoders can more explicitly align encodings with language structures and classes. This can have a much more significant impact on performance than on space. Serializers can “stream” the serialization of arrays or other sequences, which may be of indefinite length, without a priori knowledge of homogeneity, and still reuse structural information for any objects/elements that have the same structure, in a way that is easily optimized and can scale to arrays and other structures that may not be scannable ahead of time. For example, an encoder can keep a simple cache of the class types it has serialized, recognize that it has already serialized the record structure for a previous entity, and reference that rather than reserializing the structure. This type of class/type cache can be significantly faster than doing cache lookups for every primitive value. Likewise, on deserialization, a decoder can use a record reference to find and allocate/initialize exactly the correct class or structure, and then decode values directly into that structure.
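
To illustrate the class/type cache idea from the paragraph above, here is a minimal Python sketch. It is only a toy model: it emits Python tuples rather than CBOR bytes, and the "define"/"ref" item shapes, the integer record ids, and the names RecordEncoder, RecordDecoder, and Point are all assumptions made for illustration, not the wire format of the proposed tag.

```python
from dataclasses import dataclass, fields


class RecordEncoder:
    """Toy streaming encoder that remembers which record structures it has sent."""

    def __init__(self):
        self._ids = {}  # class -> hypothetical integer record id

    def encode(self, obj):
        cls = type(obj)
        names = [f.name for f in fields(obj)]
        values = [getattr(obj, name) for name in names]
        if cls not in self._ids:
            # First time this class is seen: send field names and values.
            self._ids[cls] = len(self._ids)
            return ("define", self._ids[cls], names, values)
        # Structure already sent: reference it and send only the values.
        return ("ref", self._ids[cls], values)


class RecordDecoder:
    """Toy decoder that maps record ids back to concrete classes."""

    def __init__(self, classes_by_fields):
        self._classes_by_fields = classes_by_fields  # tuple of names -> class
        self._by_id = {}

    def decode(self, item):
        if item[0] == "define":
            _, record_id, names, values = item
            self._by_id[record_id] = self._classes_by_fields[tuple(names)]
        else:
            _, record_id, values = item
        # Values arrive in field order, so they can go straight into the class.
        return self._by_id[record_id](*values)


@dataclass
class Point:
    x: int
    y: int


encoder = RecordEncoder()
stream = [encoder.encode(p) for p in (Point(1, 2), Point(3, 4))]
decoder = RecordDecoder({("x", "y"): Point})
decoded = [decoder.decode(item) for item in stream]
```

Note that the encoder does one dictionary lookup per object (keyed by class), not one per value, and never needs to look ahead in the sequence.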
>
> My intent in suggesting this as a tag proposal is that it can hopefully be a rather unobtrusive addition to the tag registry, keeping aligned with the intent and purpose of tag semantics, without trying to alter the direction of CBOR itself at all.
>
>
>
> Thanks,
>
> Kris
>
>
>
>
> From: Carsten Bormann
> Sent: May 5, 2021 6:50 AM
> To: Christian Amsüss
> Cc: cbor@ietf.org; Kris Zyp
> Subject: Re: [Cbor] Record proposal
>
>
>
> On 2021-05-05, at 13:16, Christian Amsüss <christian@amsuess.com> wrote:
> >
> > (And as much as I dislike being "the person to whom everything looks
> > like a nail", I'll probably ask about whether this fits in the general
> > model of packed CBOR, with the first entity setting up a single table
> > entry, and then the entries expanding a [] to a {}).
>
>
> There are two potential aspects to the proposed tag:
>
> * a more compact representation (which is all that cbor-packed is about)
> * semantic indication that a specific kind of record is being used
>
>
> Proposed Tag 105 currently does not have a place for further semantic indications, but one could be added.
>
>
>
> By the way, cbor-packed turns the example I gave in the referenced email into
>
>
>
> 51([["value", "name"], [], [],
>    [{simple(1): "one", simple(0): 1},
>     {simple(1): "two", simple(0): 2}, {simple(1): "three", simple(0): 3}]])
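
As a rough illustration of how the packed form above expands, the following Python sketch replaces each simple(n) with entry n of the first table. It assumes only what the example shows (a table ["value", "name"] and references via simple values); the Simple class and the unpack function are hypothetical names, and the other tables and reference forms of cbor-packed are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Simple:
    """Stand-in for a CBOR simple value such as simple(0)."""
    value: int


def unpack(item, table):
    """Recursively replace Simple(n) with table[n]; leave other items as they are."""
    if isinstance(item, Simple):
        return table[item.value]
    if isinstance(item, list):
        return [unpack(element, table) for element in item]
    if isinstance(item, dict):
        return {unpack(k, table): unpack(v, table) for k, v in item.items()}
    return item


table = ["value", "name"]
packed_rows = [
    {Simple(1): "one", Simple(0): 1},
    {Simple(1): "two", Simple(0): 2},
    {Simple(1): "three", Simple(0): 3},
]
expanded = unpack(packed_rows, table)
```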
>
>
>
> Encoding-wise, the last array looks like this:
>
>
>
>       83                  # array(3)
>          a2               # map(2)
>             e1            # primitive(1)
>             63            # text(3)
>                6f6e65     # "one"
>             e0            # primitive(0)
>             01            # unsigned(1)
>          a2               # map(2)
>             e1            # primitive(1)
>             63            # text(3)
>                74776f     # "two"
>             e0            # primitive(0)
>             02            # unsigned(2)
>          a2               # map(2)
>             e1            # primitive(1)
>             65            # text(5)
>                7468726565 # "three"
>             e0            # primitive(0)
>             03            # unsigned(3)
>
>
>
> So the overhead here is one map head and two simple values per row.
> (Of course, that assumes that one-byte simple values are still available in the greater context this is in.)
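
The byte counts above can be checked with a small Python sketch. It hand-encodes just the subset of CBOR used in the example (one-byte heads with arguments 0..23, short text strings, arrays, maps, simple values); the names head, encode, and Simple are illustrative assumptions, and this is not a general CBOR encoder.

```python
class Simple:
    """Stand-in for a CBOR simple value such as simple(0)."""

    def __init__(self, value):
        self.value = value


def head(major, argument):
    """One-byte CBOR head; only arguments 0..23 are handled in this sketch."""
    if not 0 <= argument < 24:
        raise ValueError("argument too large for this sketch")
    return bytes([(major << 5) | argument])


def encode(item):
    if isinstance(item, Simple):
        return head(7, item.value)            # major type 7: simple values
    if isinstance(item, int):
        return head(0, item)                  # major type 0: unsigned integer
    if isinstance(item, str):
        data = item.encode("utf-8")
        return head(3, len(data)) + data      # major type 3: text string
    if isinstance(item, list):
        return head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        body = b"".join(encode(k) + encode(v) for k, v in item.items())
        return head(5, len(item)) + body      # major type 5: map
    raise TypeError(f"unsupported item: {item!r}")


rows = [
    {Simple(1): "one", Simple(0): 1},
    {Simple(1): "two", Simple(0): 2},
    {Simple(1): "three", Simple(0): 3},
]
encoded = encode(rows)

# Per-row overhead: row size minus the bytes of the actual name and value.
overheads = [
    len(encode(row)) - sum(len(encode(v)) for v in row.values())
    for row in rows
]
```

Here encoded matches the hex dump of the last array, and each entry of overheads is the map head plus the two simple values for that row.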
>
>
>
> Even with a form of circumfix compression (e.g., mapping tables with parameters [1]), this is hard to beat encoding-wise.
> The record proposal as is takes four bytes per row (1+2 tag, 1 array).
> This can be optimized significantly further only by amortizing the tag over more than one row, as my “CSV style” does, but that requires homogeneity.
>
>
>
> Grüße, Carsten
>
>
>
> [1]: https://datatracker.ietf.org/doc/draft-bormann-lpwan-cbor-template/
>
>
>
>