[Cbor] Abandon cbor-packed (Re: Early allocation for packed CBOR)

Vadim Goncharov <vadimnuclight@gmail.com> Mon, 16 December 2024 03:51 UTC

Date: Mon, 16 Dec 2024 06:51:03 +0300
From: Vadim Goncharov <vadimnuclight@gmail.com>
To: Carsten Bormann <cabo@tzi.org>
Message-ID: <20241216065103.77270e2f@nuclight.lan>
In-Reply-To: <CFAB016F-EB06-4FEE-A7F6-427E70E0332C@tzi.org>
References: <CALaySJKDFscUBGw4CPspXJvUTkXywVHc_FrmhO3ybBWTrwjGXw@mail.gmail.com> <CALaySJJEXkey9vLAp8VqDXmPsWpxiWN9jjtVnGio1nMQ4K+mDQ@mail.gmail.com> <CALaySJJfc+tET4Vm5UQjHPK5mf61O0iR-1i6=X32CYtWxZLWTQ@mail.gmail.com> <CALaySJKdrk7aPzhT=kbE1B8pq1EBw74nmx_peSJMAoHsG5jyVQ@mail.gmail.com> <CALaySJ+fWX4zEnE5v-Q9R6eCv=kSJjnc-fsXL5PGPgac1GJAcA@mail.gmail.com> <B807C9D3-39A4-4024-BC1D-85DD84EA1735@tzi.org> <DFE56705-CCDD-4172-B577-C873E3DB4898@tzi.org> <5FEA5C07-4A39-4B58-B2AE-F261D111FCE6@cursive.net> <D0618F67-4868-4745-A526-F73DF1A98E1B@tzi.org> <98C6BEDA-C4B2-4657-ABE2-19FE637CE782@cursive.net> <2A875D49-DD88-42D9-969D-0841A6B41F95@tzi.org> <PH7PR02MB92920676E8817271E547F82FB73E2@PH7PR02MB9292.namprd02.prod.outlook.com> <5FBB4831-3E96-44E3-A2AA-B2D83B6C1B05@cursive.net> <03254343-C2C1-4725-8E69-1CF532472C25@tzi.org> <b6788b18-78f3-48e7-b86a-1e369123e7b1@ri.se> <20241215062400.795b7401@nuclight.lan> <CFAB016F-EB06-4FEE-A7F6-427E70E0332C@tzi.org>
X-Mailer: Claws Mail 3.19.1 (GTK+ 2.24.33; amd64-portbld-freebsd12.4)
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
CC: Marco Tiloca <marco.tiloca=40ri.se@dmarc.ietf.org>, Joe Hildebrand <hildjj@cursive.net>, Michael Jones <michael_b_jones@hotmail.com>, CBOR <cbor@ietf.org>
Subject: [Cbor] Abandon cbor-packed (Re: Early allocation for packed CBOR)
List-Id: "Concise Binary Object Representation (CBOR)" <cbor.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/vJzV_imq5dyo6vLIYpl9qyJL4YU>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cbor>

On Sun, 15 Dec 2024 07:39:52 +0100
Carsten Bormann <cabo@tzi.org> wrote:

> Hi Vadim,
> 
> > On 15. Dec 2024, at 04:24, Vadim Goncharov
> > <vadimnuclight@gmail.com> wrote:
> > 
> > The problem is that cbor-packed does not achieve good compression
> > at all. For 2017's CBOR source of 1210 bytes, tag 25/256 method
> > achieves 904 bytes (note it's forced for 3 bytes per even shortest
> > reference, in contrast to cbor-packed cheating with single byte)
> > and cbor-packed claims 793 for same method and 564 with prefixes -
> > while even limited LZF achieves 425..435 bytes (depends on key
> > order) and less limited LZ4 variants get 404..416 bytes.  
> 
> There is no doubt that classical data compression will give you
> smaller encoded sizes than packed CBOR. But you already know where to
> find deflate, brotli, or zstd; the latter with some nice additions
> such as dictionaries. Trying to hyper-optimize Packed CBOR (e.g., by
> changing everything :-) is out of scope, as you always can go to

If changes to it are "out of scope", then the entire draft must be
abandoned as absolutely inadequate and a dangerous precedent.

There was a long message from Christopher Allen on Wednesday; where is
your response to the valid arguments in it?

> well-established classic data compression for that.
> 
> CBOR packed is just solving a different problem [1]:
> 
> >> This specification describes Packed CBOR, a simple transformation
> >> of a CBOR data item into another CBOR data item that is almost as
> >> easy to consume as the original CBOR data item. A separate
> >> decompression step is therefore often not required at the
> >> recipient.  
> 
> [1]:
> https://www.ietf.org/archive/id/draft-ietf-cbor-packed-13.html#section-abstract-3

This is simply not true. cbor-packed is neither a "simple" transformation
nor "easy to consume": in fact, the LED example in the draft
demonstrates splitting one string into five (!) chunks, using recursion.
This leads to complex algorithms for working with such data, *especially*
in constrained implementations.

Where is your proof? Do you have a real implementation that can work
with such heavily crunched data?
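To make the complexity argument concrete, here is a toy Python model. The class names `Ref` and `Prefixed` are hypothetical, this is not the draft's tag-based wire format, and the strings are invented rather than taken from the draft's LED example. It assumes a shared-item reference resolves to a table entry and a prefix reference concatenates a table entry with a rump, where table entries may themselves be references:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Ref:
    """Hypothetical stand-in for a shared-item reference: table[index]."""
    index: int


@dataclass
class Prefixed:
    """Hypothetical stand-in for a prefix reference: table[index] + rump."""
    index: int
    rump: "Item"


Item = Union[str, Ref, Prefixed]


def unpack(item: Item, table: List[Item], depth: int = 0, limit: int = 16) -> str:
    """Rebuild one contiguous string by following references recursively."""
    if depth > limit:
        raise ValueError("reference chain too deep (or cyclic)")
    if isinstance(item, str):
        return item
    if isinstance(item, Ref):
        return unpack(table[item.index], table, depth + 1, limit)
    if isinstance(item, Prefixed):
        prefix = unpack(table[item.index], table, depth + 1, limit)
        return prefix + unpack(item.rump, table, depth + 1, limit)
    raise TypeError(f"unexpected item: {item!r}")


# Five chunks, each table entry building on the previous one (invented data).
table = ["coap://", Prefixed(0, "light"), Prefixed(1, ".example.com"),
         Prefixed(2, "/led/")]
print(unpack(Prefixed(3, "on"), table))  # coap://light.example.com/led/on
```

Even in this simplified form, a consumer that needs one contiguous string has to walk the chain and build a new buffer, and it needs a depth limit so that a cyclic table cannot recurse forever.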

> Different from classical data compression, there is no need to
> unpack/decompress Packed CBOR before using the data.  (Many
> applications will want to do that anyway, but the point is that APIs
> that make sense for constrained implementations can easily get by
> without.) 

No, they can't. There is no "wide applicability area" for cbor-packed;
the opposite is true: it's suitable only for a niche.

> Packed CBOR is valid CBOR, so it can be embedded into other
> CBOR data and processed by generic decoders/encoders (*).

No, it can't. It is absolutely useless after a generic decoder that
doesn't understand how to use it. And this is true for *any* tags that do
not merely signify a data type but try to extend CBOR; e.g.,
compression using tags 25/256 by Marc A. Lehmann is built into the
codec, so it is transparent to the application. And this method is
really simple and suitable for constrained implementations, in contrast
to cbor-packed.
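For comparison, here is a minimal Python sketch of the stringref scheme behind tags 25/256, as I understand Lehmann's specification. Tag 256 opens a namespace. Every sufficiently long string is appended to a table in the order it is encountered, and a repeated string is replaced by tag 25 wrapping its index. The length thresholds are derived from the rule that a string is registered only if it is at least as long as its reference would be. Only unsigned integers, text strings, arrays, and maps are handled. Byte strings, nested namespaces, and everything else are left out, and all function names are illustrative:

```python
from typing import Any, Dict, List, Tuple


def head(major: int, n: int) -> bytes:
    """CBOR initial byte plus argument, in shortest form."""
    if n < 24:
        return bytes([major << 5 | n])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([major << 5 | ai]) + n.to_bytes(size, "big")
    raise ValueError("argument too large")


def min_len(index: int) -> int:
    """Register a string only if it is at least as long as 25(index) would be."""
    if index < 24:
        return 3
    if index < 256:
        return 4
    if index < 65536:
        return 5
    if index < 2**32:
        return 7
    return 11


def _enc(value: Any, table: Dict[str, int]) -> bytes:
    if type(value) is int and value >= 0:
        return head(0, value)
    if isinstance(value, str):
        if value in table:                        # repeat: emit 25(index)
            return head(6, 25) + head(0, table[value])
        raw = value.encode("utf-8")
        if len(raw) >= min_len(len(table)):       # first sighting: register it
            table[value] = len(table)
        return head(3, len(raw)) + raw
    if isinstance(value, list):
        return head(4, len(value)) + b"".join(_enc(v, table) for v in value)
    if isinstance(value, dict):
        return head(5, len(value)) + b"".join(
            _enc(k, table) + _enc(v, table) for k, v in value.items())
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode(value: Any) -> bytes:
    return head(6, 256) + _enc(value, {})         # 256(...) opens a namespace


def decode(data: bytes) -> Any:
    pos = 0
    strings: List[str] = []

    def read_head() -> Tuple[int, int]:
        nonlocal pos
        major, ai = data[pos] >> 5, data[pos] & 0x1F
        pos += 1
        if ai < 24:
            return major, ai
        size = {24: 1, 25: 2, 26: 4, 27: 8}.get(ai)
        if size is None:
            raise ValueError("unsupported additional information")
        n = int.from_bytes(data[pos:pos + size], "big")
        pos += size
        return major, n

    def item() -> Any:
        nonlocal pos
        major, n = read_head()
        if major == 0:
            return n
        if major == 3:
            s = data[pos:pos + n].decode("utf-8")
            pos += n
            if n >= min_len(len(strings)):        # same rule as the encoder
                strings.append(s)
            return s
        if major == 4:
            return [item() for _ in range(n)]
        if major == 5:
            result = {}
            for _ in range(n):
                key = item()
                result[key] = item()
            return result
        if major == 6 and n == 25:
            ref_major, index = read_head()
            if ref_major != 0:
                raise ValueError("stringref index must be an unsigned integer")
            return strings[index]
        raise ValueError(f"unsupported item (major type {major})")

    if read_head() != (6, 256):
        raise ValueError("expected namespace tag 256")
    return item()


doc = [{"name": "alpha", "kind": "sensor"}, {"name": "beta", "kind": "sensor"}]
packed = encode(doc)
print(len(packed), decode(packed) == doc)   # 43 True
```

Both sides apply the same registration rule in the same order, so no table is transmitted, which is what keeps the scheme invisible to the application. The shortest reference is 3 bytes (D8 19 xx), and the two-record example comes out at 43 bytes versus 48 for plain CBOR.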

BTW, I had a talk with him, as he maintains the Compress::LZF [1] module,
which is more lightweight than LZ4 (offsets are 8192 bytes max, so less
memory), and he suggested https://github.com/atomicobject/heatshrink (an
LZSS variant) for really constrained environments: just hundreds of
bytes of memory.

[1] https://metacpan.org/pod/Compress::LZF
    ~200 lines of C: the encoder
    https://metacpan.org/release/MLEHMANN/Compress-LZF-3.8/source/lzf_c_best.c
    and the decoder
    https://metacpan.org/release/MLEHMANN/Compress-LZF-3.8/source/lzf_d.c
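To show how small such a decoder is, here is a rough Python transcription of the decoding loop as I read lzf_d.c. The function name is mine, and this is not the Compress::LZF API. A control byte below 32 introduces a literal run. Otherwise, its top three bits give the match length, with 7 meaning that an extra length byte follows. Its low five bits, plus the next byte, give the back-reference offset, which is where the 8192-byte limit comes from:

```python
def lzf_decompress(data: bytes) -> bytes:
    """Decode an LZF stream (sketch following the lzf_d.c loop)."""
    out = bytearray()
    i = 0
    while i < len(data):
        ctrl = data[i]
        i += 1
        if ctrl < 32:
            # Literal run: copy the next ctrl + 1 bytes unchanged.
            run = ctrl + 1
            if i + run > len(data):
                raise ValueError("truncated literal run")
            out += data[i:i + run]
            i += run
        else:
            # Back-reference: top 3 bits = length code, low 5 bits = offset high.
            length = ctrl >> 5
            if length == 7:             # long match: extra length byte follows
                length += data[i]
                i += 1
            offset = ((ctrl & 0x1F) << 8) + data[i] + 1   # 1 .. 8192
            i += 1
            length += 2                 # shortest match is 3 bytes
            start = len(out) - offset
            if start < 0:
                raise ValueError("back-reference before start of output")
            for k in range(length):     # byte by byte, so overlaps work
                out.append(out[start + k])
    return bytes(out)


# "abc" as a literal, then a 6-byte match at distance 3
print(lzf_decompress(bytes([0x02, 0x61, 0x62, 0x63, 0x80, 0x02])))  # b'abcabcabc'
```

The largest encodable offset is (31 << 8) + 255 + 1 = 8192, which matches the window limit mentioned above, and the decoder needs no state beyond its output buffer.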

> Obviously, not everyone needs Packed CBOR, so I can understand some
> haggling about tag space, in particular for the 40 code points the
> draft acquires for the referencing tags from the 1+1 space (plus
> three function tags and one basic table setup tag). I believe this
> number is appropriate, given that the reference tags are pretty

The main problem we are all angry about is that it takes so much scarce
tag/simple-value space for *NO* real benefit. Packed CBOR is VERY
inefficient, yet wastes so many tags. The same task can be achieved
without using so many short tags, so cbor-packed must be either fixed or
abandoned. I haven't even mentioned other problems with the draft yet,
such as using Undefined instead of (at least) tag 31, or, more
seriously, not supporting CBOR sequences.

> universal/future-proof as they can be used with table setup
> mechanisms we haven’t even invented yet.

What? The draft lacks this extensibility: the setup tags have no
alternatives, e.g. specifying a dictionary ID/hash, as zlib does.
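For reference, this is the zlib mechanism meant here (RFC 1950). When a preset dictionary is used, the FDICT bit is set in the header and a 4-byte DICTID, the Adler-32 checksum of the dictionary, follows. A receiver can therefore tell which dictionary it needs before decompressing. Here is a Python sketch using only the standard zlib module. The helper names and sample data are illustrative:

```python
import zlib
from typing import Optional


def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    comp = zlib.compressobj(zdict=zdict)
    return comp.compress(data) + comp.flush()


def header_dict_id(stream: bytes) -> Optional[int]:
    """DICTID from a zlib (RFC 1950) header, or None if FDICT is not set."""
    flg = stream[1]
    if not flg & 0x20:  # FDICT bit
        return None
    return int.from_bytes(stream[2:6], "big")


def decompress_with_dict(stream: bytes, zdict: bytes) -> bytes:
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(stream) + decomp.flush()


zdict = b'"name" "kind" "sensor" "temperature"'
data = b'{"name": "t1", "kind": "sensor", "temperature": 21}'
stream = compress_with_dict(data, zdict)
print(header_dict_id(stream) == zlib.adler32(zdict))  # True
```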

> $ tag-report.rb
> range  used     %                 free                total
> 0 1+0    13 54.17                   11                   24
> 1 1+1    73 31.47                  159                  232
[..] 
> (*) And, yes, it would be interesting to come up with a design for
> instead using the 24 unused bit combinations (ai=28..30), but that,
> again, cannot be used as part of valid CBOR; it’s more like a CBOR
> 2.0 project.

Absolutely not. It is very simple to retain compatibility: the "modified"
CBOR is just wrapped into a byte string/array under a tag, like in this
CDDL-like sketch:

   CBAR-CBOR = #6.10([atoms, bytedict, CBAR, checksum])  ; Tag 0x0a for "Atom" binary string
             / #6.10(CBAR)   ; everything else was set up earlier

   atomarr = [ + atom ]    ; "atom" is like in the X11 sense: a number for a string
   atom = bstr             ; raw value
        / #6.10(bstr)      ; itself CBAR: definition uses previous atoms
        / any              ; valid CBOR fragment to substitute

   atoms = atomarr / uint / bstr, ; atoms array (may be empty) or their hash
   bytedict = bstr / uint,        ; (zlib) dictionary (may be empty) or its hash
   CBAR = bstr .size (3..),       ; may have a tag for an additional pass, e.g. zlib
   ? checksum = uint
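Here is a Python sketch of how a generic encoder could emit the outer envelope of the first alternative above. All of the following assumptions are mine. Atoms are plain byte strings. The dictionary is sent inline. The CBAR payload is an opaque placeholder, since its "modified CBOR" format is not specified here. The optional checksum is whatever unsigned integer the caller supplies; CRC-32 is used only in the usage line. The helper names are hypothetical:

```python
import zlib
from typing import List, Optional


def head(major: int, n: int) -> bytes:
    """CBOR initial byte plus argument, in shortest form."""
    if n < 24:
        return bytes([major << 5 | n])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([major << 5 | ai]) + n.to_bytes(size, "big")
    raise ValueError("argument too large")


def bstr(b: bytes) -> bytes:
    return head(2, len(b)) + b


def cbar_envelope(atoms: List[bytes], bytedict: bytes, cbar: bytes,
                  checksum: Optional[int] = None) -> bytes:
    """Hypothetical encoder for #6.10([atoms, bytedict, CBAR, ? checksum])."""
    if len(cbar) < 3:
        raise ValueError("CBAR must be at least 3 bytes (bstr .size (3..))")
    fields = [
        head(4, len(atoms)) + b"".join(bstr(a) for a in atoms),  # atom array
        bstr(bytedict),                                           # inline dictionary
        bstr(cbar),                                               # opaque payload
    ]
    if checksum is not None:
        fields.append(head(0, checksum))
    return head(6, 10) + head(4, len(fields)) + b"".join(fields)


payload = b"\x01\x02\x03"   # placeholder, not real "modified CBOR"
env = cbar_envelope([b"temperature"], b"", payload, zlib.crc32(payload))
print(env.hex())
```

Because the result is ordinary CBOR (tag 10 around an array), a decoder that does not understand it can still skip it as a single data item.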

Here, an extensibility point is left via additional types, not yet
specified but intended. The actual compressed data in the CBAR byte
string is that modified CBOR, so CBOR parsing code can be reused, without
the constraint of being compatible with decoders that don't understand
it. Being mostly CBOR satisfies the goal of an implementation API where
the application gets a pointer and length to data in the original CBOR
without copying, the only goal of cbor-packed that looks reasonable...
modulo that it's not achievable in practice in any scheme involving
splitting streams. And see: valid CBOR, with no giant ranges of tags,
and 27 code points available!
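The code-point count can be checked mechanically. Per RFC 8949, additional-information values 28..30 are not well-formed for any of the 8 major types; these are Carsten's 24 combinations. Value 31 (indefinite length) is also not well-formed for major types 0, 1, and 6. The 27 presumably counts both groups. Here is a small Python check:

```python
from typing import List


def not_well_formed_initial_bytes() -> List[int]:
    """Initial bytes that can never start a well-formed CBOR data item."""
    unused = []
    for ib in range(256):
        major, ai = ib >> 5, ib & 0x1F
        if 28 <= ai <= 30:                      # reserved in every major type
            unused.append(ib)
        elif ai == 31 and major in (0, 1, 6):   # no indefinite-length ints/tags
            unused.append(ib)
    return unused


codes = not_well_formed_initial_bytes()
print(len(codes))                                   # 27
print([hex(b) for b in codes if (b & 0x1F) == 31])  # ['0x1f', '0x3f', '0xdf']
```

0xFF (major type 7, value 31) is not counted, because it is the "break" stop code for indefinite-length items.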

Trying to be "just CBOR" is the biggest design flaw of cbor-packed: it
leads to inefficient compression and wasted effort, while being
meaningless.

-- 
WBR, @nuclight