[Cbor] Concatenation tag (was: Re: Reminder and call for agenda: CBOR WG Virtual Meeting on 2022-06-01)

Christian Amsüss <christian@amsuess.com> Wed, 29 June 2022 08:30 UTC

Date: Wed, 29 Jun 2022 10:30:14 +0200
From: Christian Amsüss <christian@amsuess.com>
To: cbor@ietf.org, Carsten Bormann <cabo@tzi.org>
Message-ID: <YrwNlhn5NS3F4MDf@hephaistos.amsuess.com>
References: <CALaySJLPtUjdfVss17noK=18RyczpcCGNu=im8CBpiQz=WiLWA@mail.gmail.com> <CALaySJKUNh-AkJa87sCDpzf9OHV8H367VQyzyozXCCXxphUARw@mail.gmail.com> <CALaySJ+P2sP7BU7bNSxRJBByyp04rzVZuukq_e+9wbb5WPRSFQ@mail.gmail.com> <CALaySJKxht1gd1+3mNiAH-kLUAxjdPPk3doK50C_xS74LG+YTQ@mail.gmail.com> <CALaySJJjSHT2q_wpZQ9QFhLSxGuhffWwb=9P1XDUFTsheOvPZA@mail.gmail.com> <5A9B396E-1D9F-455C-949F-9B4C89AA510C@tzi.org> <CALaySJ+Sp=hmc4-kp1UrYPf0BxMtQy4aS+LiCfkREYqmip1Q6w@mail.gmail.com> <B9E21E1E-164D-4306-88D7-A88DC76080A9@tzi.org> <YrmJV7OwrbOI/zKe@hephaistos.amsuess.com> <861D3104-8C33-4819-9488-1885F94C973F@tzi.org>
In-Reply-To: <861D3104-8C33-4819-9488-1885F94C973F@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/UgWdAYRcOrW9C6yo4CeBZPGyhtg>

Hello Carsten, hello group,

> > who is wondering whether indefinite-length strings / bytestrings really
> > needed to be in the original CBOR spec, or would not have been better
> > served with a "concatenate" tag around an indefinite-length array.
> 
> Knowing after almost a decade how we have learned to use tags, this
> could indeed be done differently…

Well, we won't change this any time soon, but I'd like to propose
something a bit like tag 65535 (after it occurred to me while writing
the other mail that it works not only for indefinite-length strings but
also for indefinite-length arrays):

Concatenation tag
=================

This registers tag TBDcat to indicate the concatenation of a sequence of
CBOR items.

The tag takes a finite-length array of largely homogeneous items as its
argument and concatenates the array's elements based on the type of the
first element:

* Text and byte strings are plainly concatenated. Subsequent elements
  may each be of either type (converted through the UTF-8 encoding).
  Byte strings concatenated into text strings need to contain valid
  UTF-8 data. (These rules are consistent with those of Section 2.2 of
  RFC 9165 on CDDL concatenation.)

* Arrays are concatenated.

* For maps, the union of the maps is formed. If a key occurs more than
  once, its last occurrence takes precedence.

All items in the array may be tags; their behavior under concatenation
is specified in the tag, but needs to be compatible with any expanded
form that tag may have. (For example, concatenation tags can be nested
and mixed with array tags.)

Use case: Implementation aids
-----------------------------

Arrays, strings and maps of indefinite length are sometimes poorly
supported by CBOR libraries, especially those for constrained devices --
their consumers might expect strings in contiguous memory, which the
library cannot provide.

Rather than implementing additional interfaces for slice-wise access, a
CBOR-consuming library can represent indefinite-length data structures
as nested TBDcat tags. For example, if it received

```
(_ "hello, ", "world")
```

it may present that to the application as

```
TBDcat(["hello, ", TBDcat(["world", ""])])
```

and also accept data represented that way for encoding.
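
A minimal sketch of how a library might build and resolve that right-nested shape follows. Everything named here is hypothetical: the `Tag` class, the `TBDCAT` placeholder (no tag number has been assigned), and the helpers `nest_chunks` and `flatten`. It also assumes each nested tag holds exactly two elements, a plain text chunk followed by the remainder.

```python
from dataclasses import dataclass
from typing import Any, List

TBDCAT = "TBDcat"  # placeholder: the proposal has no assigned tag number yet


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a library's tagged-item type."""
    number: Any
    value: Any


def nest_chunks(chunks: List[str]) -> Any:
    """Fold string chunks into right-nested TBDcat tags ending in ""."""
    result: Any = ""
    for chunk in reversed(chunks):
        result = Tag(TBDCAT, [chunk, result])
    return result


def flatten(item: Any) -> str:
    """Resolve right-nested TBDcat tags into one contiguous text string."""
    parts: List[str] = []
    while isinstance(item, Tag) and item.number == TBDCAT:
        head, item = item.value  # assumes the two-element [chunk, rest] shape
        parts.append(head)
    parts.append(item)  # the innermost remainder, "" when built by nest_chunks
    return "".join(parts)
```

An application that needs contiguous memory would call `flatten` once, while one that can process chunks incrementally could walk the nesting itself and never allocate the joined string.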

Whether a library performs that transformation is determined by its
configuration and/or described in its documentation. It is not
recommended to define CBOR-based protocols in which there is a semantic
difference between an indefinite-length array / string / map and one
constructed by this tag, but they may specify that only one of them can
be used for encoding. [ Or would it be better to recommend never to
serialize anything with TBDcat unless other tags are involved? ]

Use case: Slightly inhomogeneous arrays
---------------------------------------

Arrays that can largely be expressed as homogeneous arrays can use this
as an escape hatch for out-of-range items:

```
TBDcat([64(h'000101020305080d1522375990e9'), [377, 610]])
```
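
To illustrate how a consumer might expand this, the sketch below treats tag 64 as the uint8 typed array of RFC 8746 and appends the two values that do not fit into a single byte. The `Tag` class and the helpers `expand` and `concat_arrays` are hypothetical names for this example; only tag 64 is handled, and other typed-array tags would need their own decoding.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a library's tagged-item type."""
    number: int
    value: Any


def expand(item: Any) -> List[int]:
    """Turn a plain array or a tag 64 (uint8 typed array) into a list of ints."""
    if isinstance(item, Tag) and item.number == 64:
        return list(item.value)  # each byte is one uint8 element
    if isinstance(item, list):
        return item
    raise TypeError("only plain arrays and tag 64 are handled in this sketch")


def concat_arrays(items: List[Any]) -> List[int]:
    """Concatenate the expanded forms of all elements, in order."""
    out: List[int] = []
    for item in items:
        out.extend(expand(item))
    return out


fib = concat_arrays([
    Tag(64, bytes.fromhex("000101020305080d1522375990e9")),
    [377, 610],  # too large for uint8, so carried in a plain array
])
print(fib)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]
```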

Use case: packed CBOR
---------------------

The concatenation tag can be used with packed CBOR, in particular when a
dictionary item does not on its own support midfix expansions.

[ This might be even more useful when used as part of dictionary items,
but how that works is under active discussion. ]


Thoughts, comments?

BR
c

-- 
To use raw power is to make yourself infinitely vulnerable to greater powers.
  -- Bene Gesserit axiom