[Cbor] Re: Early allocation for packed CBOR (Re: Reminder: CBOR WG Virtual Meeting on 2024-12-11)

Carsten Bormann <cabo@tzi.org> Sun, 15 December 2024 06:40 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <20241215062400.795b7401@nuclight.lan>
Date: Sun, 15 Dec 2024 07:39:52 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <CFAB016F-EB06-4FEE-A7F6-427E70E0332C@tzi.org>
References: <CALaySJKDFscUBGw4CPspXJvUTkXywVHc_FrmhO3ybBWTrwjGXw@mail.gmail.com> <CALaySJKaz7C=GN5E=saiDY4KxL+9xCfM0ocZuMStEQ96FnQ4KA@mail.gmail.com> <CALaySJJEXkey9vLAp8VqDXmPsWpxiWN9jjtVnGio1nMQ4K+mDQ@mail.gmail.com> <CALaySJJfc+tET4Vm5UQjHPK5mf61O0iR-1i6=X32CYtWxZLWTQ@mail.gmail.com> <CALaySJKdrk7aPzhT=kbE1B8pq1EBw74nmx_peSJMAoHsG5jyVQ@mail.gmail.com> <CALaySJ+fWX4zEnE5v-Q9R6eCv=kSJjnc-fsXL5PGPgac1GJAcA@mail.gmail.com> <B807C9D3-39A4-4024-BC1D-85DD84EA1735@tzi.org> <DFE56705-CCDD-4172-B577-C873E3DB4898@tzi.org> <5FEA5C07-4A39-4B58-B2AE-F261D111FCE6@cursive.net> <D0618F67-4868-4745-A526-F73DF1A98E1B@tzi.org> <98C6BEDA-C4B2-4657-ABE2-19FE637CE782@cursive.net> <2A875D49-DD88-42D9-969D-0841A6B41F95@tzi.org> <PH7PR02MB92920676E8817271E547F82FB73E2@PH7PR02MB9292.namprd02.prod.outlook.com> <5FBB4831-3E96-44E3-A2AA-B2D83B6C1B05@cursive.net> <03254343-C2C1-4725-8E69-1CF532472C25@tzi.org> <b6788b18-78f3-48e7-b86a-1e369123e7b1@ri.se> <20241215062400.795b7401@nuclight.lan>
To: Vadim Goncharov <vadimnuclight@gmail.com>
CC: Marco Tiloca <marco.tiloca=40ri.se@dmarc.ietf.org>, Joe Hildebrand <hildjj@cursive.net>, Michael Jones <michael_b_jones@hotmail.com>, CBOR <cbor@ietf.org>
Subject: [Cbor] Re: Early allocation for packed CBOR (Re: Reminder: CBOR WG Virtual Meeting on 2024-12-11)
List-Id: "Concise Binary Object Representation (CBOR)" <cbor.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/F6kUmZHPegFdKi9arEzf6CuWBF8>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cbor>
List-Help: <mailto:cbor-request@ietf.org?subject=help>
List-Owner: <mailto:cbor-owner@ietf.org>
List-Post: <mailto:cbor@ietf.org>
List-Subscribe: <mailto:cbor-join@ietf.org>
List-Unsubscribe: <mailto:cbor-leave@ietf.org>

Hi Vadim,

> On 15. Dec 2024, at 04:24, Vadim Goncharov <vadimnuclight@gmail.com> wrote:
> 
> The problem is that cbor-packed does not achieve good compression at all.
> For 2017's CBOR source of 1210 bytes, tag 25/256 method achieves 904
> bytes (note it's forced for 3 bytes per even shortest reference, in
> contrast to cbor-packed cheating with single byte) and cbor-packed
> claims 793 for same method and 564 with prefixes - while even
> limited LZF achieves 425..435 bytes (depends on key order) and less
> limited LZ4 variants get 404..416 bytes.

There is no doubt that classical data compression will give you smaller encoded sizes than packed CBOR.
But you already know where to find deflate, brotli, or zstd; the latter with some nice additions such as dictionaries.
Trying to hyper-optimize Packed CBOR (e.g., by changing everything :-) is out of scope, as you can always turn to well-established classical data compression for that.

Packed CBOR is simply solving a different problem [1]:

>> This specification describes Packed CBOR, a simple transformation of
>> a CBOR data item into another CBOR data item that is almost as easy
>> to consume as the original CBOR data item. A separate decompression
>> step is therefore often not required at the recipient.

[1]: https://www.ietf.org/archive/id/draft-ietf-cbor-packed-13.html#section-abstract-3

Unlike classical data compression, Packed CBOR does not need to be unpacked/decompressed before the data is used.  (Many applications will want to unpack anyway, but the point is that APIs that make sense for constrained implementations can easily get by without doing so.)
Packed CBOR is valid CBOR, so it can be embedded into other CBOR data and processed by generic decoders/encoders (*).

Obviously, not everyone needs Packed CBOR, so I can understand some haggling about tag space, in particular for the 40 code points the draft acquires for the referencing tags from the 1+1 space (plus three function tags and one basic table setup tag).
I believe this number is appropriate, given that the reference tags are pretty universal/future-proof as they can be used with table setup mechanisms we haven’t even invented yet.

$ tag-report.rb
range  used     %                 free                total
0 1+0    13 54.17                   11                   24
1 1+1    73 31.47                  159                  232
2 1+2  1087  1.67                64193                65280
3 1+4 65539  0.00           4294836221           4294901760
4 1+8     2  0.00 18446744069414584318 18446744069414584320
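
The "total" column follows directly from how CBOR encodes tag numbers: "1+n" means one initial byte plus n bytes of argument, and each range holds the tag numbers whose shortest encoding has that size. A minimal sketch in Python that recomputes the totals and percentages (the function names are illustrative and not part of tag-report.rb; the "used" counts are simply copied from the report above):

```python
# Sizes of the CBOR tag-number ranges, grouped by encoded head length.
# "1+n" = one initial byte plus n argument bytes.

def tag_range_sizes():
    """Return {label: count} of tag numbers whose shortest head is 1+n bytes."""
    # Largest tag number representable with each head size.
    upper = {
        "1+0": 23,
        "1+1": 2**8 - 1,
        "1+2": 2**16 - 1,
        "1+4": 2**32 - 1,
        "1+8": 2**64 - 1,
    }
    sizes = {}
    previous = -1  # nothing allocated below tag number 0
    for label, last in upper.items():
        sizes[label] = last - previous
        previous = last
    return sizes


def percent_used(used, total):
    """Percentage of a range that is already allocated, to two decimals."""
    return round(100.0 * used / total, 2)


if __name__ == "__main__":
    # "used" values are taken from the report above (assumption: still current).
    used = {"1+0": 13, "1+1": 73, "1+2": 1087, "1+4": 65539, "1+8": 2}
    for label, total in tag_range_sizes().items():
        print(label, used[label], percent_used(used[label], total),
              total - used[label], total)
```

For the 1+1 space this gives 232 tag numbers in total, of which 73 (31.47 %) are listed as used.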

Grüße, Carsten

(*) And, yes, it would be interesting to come up with a design that instead uses the 24 unused bit combinations (ai=28..30), but such a design could not be used as part of valid CBOR; it’s more like a CBOR 2.0 project.
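
The figure of 24 can be checked by enumeration: a CBOR initial byte carries a 3-bit major type and a 5-bit additional information (ai) field, and the ai values 28..30 are reserved for each of the eight major types. A small sketch, with a hypothetical helper name chosen only for illustration:

```python
def reserved_initial_bytes():
    """List the initial-byte values whose ai field is 28, 29, or 30."""
    # initial byte = (major type << 5) | additional information
    return [(major_type << 5) | ai
            for major_type in range(8)
            for ai in (28, 29, 30)]


if __name__ == "__main__":
    values = reserved_initial_bytes()
    print(len(values))
    print(" ".join(f"0x{v:02x}" for v in values))
```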