[Cbor] Re: Early allocation for packed CBOR (Re: Reminder: CBOR WG Virtual Meeting on 2024-12-11)

Carsten Bormann <cabo@tzi.org> Wed, 11 December 2024 11:07 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <5FBB4831-3E96-44E3-A2AA-B2D83B6C1B05@cursive.net>
Date: Wed, 11 Dec 2024 12:07:12 +0100
To: Joe Hildebrand <hildjj@cursive.net>
CC: Michael Jones <michael_b_jones@hotmail.com>, CBOR <cbor@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/PfYGm4BQqkTOumSS3PXmGXiWP9s>

On 11. Dec 2024, at 07:05, Joe Hildebrand <hildjj@cursive.net> wrote:
> 
> Short arrays are really cheap in CBOR.

Yes, but using the tag numbers is even cheaper.

For argument-bearing tags, the current WG draft creates an overhead of +2 to +5 bytes:

+2 d8 xi arg (0..31 in 1+1 tag)
+3 d9 xi ii arg (32..4095 in 1+2 tag)
+5 da xi ii ii ii arg (4096..a lot in 1+4 tag)

(The x indicates that we are, of course, only using part of the space.
ii indicates both the index and whether this is a straight or inverted reference.)

Let’s do a quick design with fewer tags to compare:
We could drop the meaning that Tag 6 with non-integer content has today (a short argument reference to entry 0)
and instead use the following construct for all argument references:

6([index, argument])
(Use a negative integer for inverted references.)
This leads to:

+3 c6 82 hn arg (0..23)
+4 c6 82 hn nn arg (24..~255)
+5 c6 82 hn nn nn arg (256..~65535)

➔ one byte more (or sometimes two).  Why do users of cbor-packed want to spend this byte?
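
The byte counts for the second design can be checked by actually building the prefix. The following is only a sketch: the function names are made up for illustration, and it assumes the index is encoded as a preferred-serialization (shortest form) CBOR integer, with negative values standing for inverted references.

```python
import struct


def encode_int_head(value: int) -> bytes:
    """Encode a CBOR integer (major type 0 or 1) in its shortest form."""
    major = 0 if value >= 0 else 1
    # Negative integers carry the argument -1 - value (RFC 8949, major type 1).
    n = value if value >= 0 else -1 - value
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 0x100:
        return bytes([(major << 5) | 24, n])
    if n < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 0x100000000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    if n < 1 << 64:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", n)
    raise ValueError("integer does not fit into a CBOR head")


def reference_prefix(index: int) -> bytes:
    """Bytes that precede the argument in 6([index, argument])."""
    # c6 = tag 6, 82 = array of two items, followed by the index itself.
    return b"\xc6\x82" + encode_int_head(index)


if __name__ == "__main__":
    for index in (0, 23, 24, 255, 256, 65535, -1, -24, -25, -256, -257):
        prefix = reference_prefix(index)
        print(f"{index:6d}  +{len(prefix)}  {prefix.hex(' ')}")
```

The negative cases show where the "~" in the ranges comes from: -256 still fits the +4 form, because a negative integer n is carried as -1 - n.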

(Zipf’s law makes it likely that the first indexes have a higher impact.
A quick back-of-the-envelope test with a Zipfian distribution for tables up to size 10000 tells me the second design costs about 1.22 bytes more per straight reference than the first. This is probably a bit tail-heavy, so the real impact will be closer to 1 byte.)
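
One way to redo this kind of estimate is sketched below. The assumptions are mine, not necessarily those of the original test: straight references only, zero-based indexes, a table of exactly 10000 entries, and index k used with weight 1/(k+1) (Zipf exponent 1). The function names are hypothetical.

```python
def tag_overhead(index: int) -> int:
    """Overhead in bytes of the first design (one tag per index)."""
    if index < 32:
        return 2  # d8 xi: 1+1 tag
    if index < 4096:
        return 3  # d9 xi ii: 1+2 tag
    return 5      # da xi ii ii ii: 1+4 tag


def array_overhead(index: int) -> int:
    """Overhead in bytes of the second design, 6([index, argument])."""
    if index < 24:
        return 3  # c6 82 hn
    if index < 256:
        return 4  # c6 82 hn nn
    if index < 65536:
        return 5  # c6 82 hn nn nn
    return 7      # c6 82 hn nn nn nn nn


def mean_extra_bytes(table_size: int, exponent: float = 1.0) -> float:
    """Expected extra bytes per reference of the second design over the first."""
    # Assumption: index k is referenced with probability proportional
    # to 1 / (k + 1) ** exponent.
    weights = [1.0 / (k + 1) ** exponent for k in range(table_size)]
    extra = sum(
        w * (array_overhead(k) - tag_overhead(k))
        for k, w in enumerate(weights)
    )
    return extra / sum(weights)


if __name__ == "__main__":
    print(f"{mean_extra_bytes(10000):.2f} extra bytes per straight reference")
```

Under these assumptions the result comes out at roughly 1.22 for a 10000-entry table. A larger exponent shifts weight towards the low indexes and moves the figure closer to 1.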

We could always use the 6([index, arg]) construct from the second design for the higher index numbers only, keeping the 1+1 allocations.
This would have a more limited impact, and would get rid of the cognitive impact of using millions of tags.
But there is no “technical” reason to do that.

Grüße, Carsten