[Cbor] cbor-packed: Circumfix expansion

Christian Amsüss <christian@amsuess.com> Mon, 01 February 2021 11:57 UTC

Date: Mon, 01 Feb 2021 12:57:25 +0100
From: Christian Amsüss <christian@amsuess.com>
To: draft-ietf-cbor-packed@ietf.org
Cc: cbor@ietf.org
Message-ID: <YBfspfVS6GzWkWO6@hephaistos.amsuess.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/5ehs66qL8iBFlMXllGdy-c8MgPI>
Subject: [Cbor] cbor-packed: Circumfix expansion

Hello Carsten, hello CBOR group,

It occurred to me that one variant of compression is unexplored. Suppose
we have SenML data like the following, assuming
draft-groves-core-senml-bto (please mentally replace the strings with
the appropriate numbers):

[{"bn": "urn:dev:ow:X", "bu": "V", "v": 2.047, "t": 1.6e9, "bto": 2},
/* bto indicates all the following items are offset by 2 seconds each */
 {"v": 1.986},
 {"v": 1.874},
 {"v": 1.423},
 ...
]

It'd be tempting to write that as

TBD51([{"v": TBD(0)}], [], [],
  [{"bn": "urn:dev:ow:X", "bu": "V", "v": 2.047, "t": 1.6e9, "bto": 2},
   TBD6'(
     5([-7, 64(h'feefb6...')]) /* [0xfe, 0xef, 0xb6, ...] x 2^-7 */
   )])

which combines bigfloats and arrays (if the two can, on their own, be
combined that way; otherwise one might be tempted to use second-table
units) to expand to

TBD51([{"v": TBD(0)}], [], [],
  [{"bn": "urn:dev:ow:X", "bu": "V", "v": 2.047, "t": 1.6e9, "bto": 2},
   TBD6'(
     [1.986, 1.874, 1.423]
   )])

and would then use circumfix expansion (a variant on the shared item) to
expand the array into the original one-element dictionaries. The rough
rule could be that when the nth circumfix expansion is triggered, the
rump is iterated over and each of its items is expanded into the nth
shared item (or one picked from another table -- details), with the
occurrence of a given tag TBD in that shared item replaced by the rump's
item at that position.
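
To make the rough rule a bit more concrete, here is a minimal Python
sketch of how a decompressor might apply it. Everything in it is an
assumption for illustration: the Placeholder class stands in for the
not-yet-assigned tag TBD, and circumfix_expand and substitute are
hypothetical names, not anything from the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placeholder:
    """Hypothetical stand-in for the unassigned tag TBD that marks the slot to fill."""


def substitute(template, value):
    """Return a copy of template with every Placeholder replaced by value."""
    if isinstance(template, Placeholder):
        return value
    if isinstance(template, dict):
        return {k: substitute(v, value) for k, v in template.items()}
    if isinstance(template, list):
        return [substitute(v, value) for v in template]
    return template


def circumfix_expand(shared_items, n, rump):
    """Expand each item of the rump into a copy of the nth shared item."""
    return [substitute(shared_items[n], item) for item in rump]


# Shared item table from the example: one template with a single slot.
shared = [{"v": Placeholder()}]
expanded = circumfix_expand(shared, 0, [1.986, 1.874, 1.423])
print(expanded)  # [{'v': 1.986}, {'v': 1.874}, {'v': 1.423}]
```

The sketch returns the expanded items as a list of their own; how they
would be spliced into the enclosing array is one of the details it does
not try to settle.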

One straightforward application is like in the example -- generating
repetitive entries from data that can be sent right from some sampling
buffer by using typed arrays.
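
As a sketch of the first step in the example (a bigfloat whose mantissa
is a typed array of unsigned bytes), the following Python snippet scales
each raw sample by the shared exponent. Whether tag 5 can really be
applied element-wise to a typed array is exactly the open question
above, so this only shows the intended arithmetic; scale_samples is a
hypothetical helper name.

```python
import math


def scale_samples(exponent, raw):
    """Treat each byte of raw as an unsigned mantissa m and return m * 2**exponent."""
    return [math.ldexp(m, exponent) for m in raw]


# Only the first three bytes are given in the example; the rest is elided there.
buffer = bytes.fromhex("feefb6")
print(scale_samples(-7, buffer))  # [1.984375, 1.8671875, 1.421875]
```

The results are close to the 1.986, 1.874 and 1.423 used in the
example.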

The obvious reason not to do this is that it adds complexity, even on
the decompressor (let alone on the compressor, though compression might
usually be done manually anyway).

I'm not saying we *should* do this (that's what I hope to find out by
starting a discussion here) -- but if nothing else, it could help shape
a "what kinds of compression are in scope and what's out of scope"
paragraph.

BR
Christian

-- 
To use raw power is to make yourself infinitely vulnerable to greater powers.
  -- Bene Gesserit axiom