Re: [Cbor] Record proposal

Carsten Bormann <> Thu, 02 December 2021 07:35 UTC

Cc: Kris Zyp <>, Christian Amsüss <>, "" <>
To: Emile Cormier <>

On 2. Dec 2021, at 07:18, Emile Cormier <> wrote:
> On Thu, Dec 2, 2021 at 1:34 AM Kris Zyp <> wrote:
>> I certainly empathize with the idea of keeping tags as conceptually distinct in terms of just adding some extra non-transformational semantic description to the underlying CBOR data structure. However, there is a wide range of "meanings" in tags, and realistically, with many, or even most tags, you can't really "make sense" of a payload without the "semantic meaning"; the meaning is what provides the direction for how to make sense of the data. For many tags, something needs to understand the data beyond just a raw CBOR structure.
> But it should be possible for applications to agree in advance on the meaning of the data and still be able to interpret it in the absence of tags.

I do not agree that this is a worthwhile goal.
Ensuring this requires gymnastics that simply do not provide that much benefit.

> This practice of embedding an extra number via a tag range is ruining this ability for applications to agree in advance on the meaning of the data in the absence of tags.

There is no rule that CBOR data need to make sense after removal of tags.

For instance, in YANG-CBOR [1], tag 47 means that a given map key is an absolute SID, not a relative SID delta.
An implementation that discards these tags is not going to work very well.
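
To make the point concrete, here is a minimal Python sketch of what goes wrong when tag 47 is dropped. It assumes a simplified model in which an untagged integer key is a delta added to the parent's SID; the Tag class, the helper names, and the SID values are hypothetical and not taken from any YANG module or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a decoded CBOR tag."""
    number: int
    value: object


ABSOLUTE_SID = 47  # tag number named in the message above


def resolve_key(parent_sid, key):
    """Return the SID that a map key denotes inside a container."""
    if isinstance(key, Tag) and key.number == ABSOLUTE_SID:
        return key.value        # tagged: the key already is an absolute SID
    return parent_sid + key     # untagged: the key is a delta from the parent


def strip_tag(key):
    """Simulate a decoder that discards tags and keeps only the content."""
    return key.value if isinstance(key, Tag) else key


parent = 1700                   # hypothetical parent SID
key = Tag(ABSOLUTE_SID, 1735)   # hypothetical absolute SID

correct = resolve_key(parent, key)
broken = resolve_key(parent, strip_tag(key))
print(correct, broken)
```

With the tag in place the key resolves to 1735; once the tag is stripped, the same integer is read as a delta and resolves to a different SID.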


> Your proposed tag range of 57344 - 57599 could simply be replaced by a single tag followed by an array of size 2.

Yes.  Encoding this needs some additional bytes.
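
A rough way to count those bytes, as a Python sketch: head() follows the initial-byte-plus-argument encoding of RFC 8949, while SINGLE_TAG and the two helper functions are assumptions made for illustration (no single tag number has been assigned for this purpose).

```python
def head(major, value):
    """Encode a CBOR initial byte plus its argument (RFC 8949, Section 3)."""
    if value < 24:
        return bytes([major << 5 | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([major << 5 | info]) + value.to_bytes(size, "big")
    raise ValueError("argument too large")


TAG_BASE = 57344    # start of the proposed range quoted above
SINGLE_TAG = 57344  # hypothetical single tag number of the same size class


def range_encoding(record_id, content):
    """Record ID carried in the tag number itself."""
    return head(6, TAG_BASE + record_id) + content


def pair_encoding(record_id, content):
    """Record ID carried as the first element of a two-element array."""
    return head(6, SINGLE_TAG) + head(4, 2) + head(0, record_id) + content


content = head(4, 0)  # an empty array as a stand-in for the record body
for record_id in (5, 200):
    extra = len(pair_encoding(record_id, content)) - len(range_encoding(record_id, content))
    print(record_id, extra)
```

Under these assumptions the array form costs one byte for the array head plus one or two bytes for the record ID, per record.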

> In section 3.4 of the spec, it says: "If a tag requires further structure to its content, this structure is provided by the enclosed data item." (emphasis mine). This clause is being violated by tags that rely on tag range to convey additional data.

No, a tag that expresses information in its tag number simply may not “require further structure to its content”, so the cited statement of fact simply doesn’t apply.

>> Without the semantic meaning, there isn't data loss; raw CBOR structures could always be transcoded to JSON or anything else (with conventions for tags and such)

That doesn’t work in general: An array with two elements could be a valid data item as well, so you get ambiguity.
Tags provide a clean way to extend the data model.

I think you are better off accepting that the CBOR generic data model is a superset of that of JSON.

> Well, that's the problem: there is no convention for tags in JSON. A CBOR-to-JSON converter now needs to understand your record set tags, and generate a modified structure for JSON that contains the record IDs that would otherwise be lost.

Generic CBOR-to-JSON converters can very well be designed to be round-trippable without information loss (*).
It’s just a bit ugly.
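
As an illustration only, one such convention could look like the Python sketch below. It is not the converter referenced in the footnote: the "$tag", "$value" and "$bytes" marker names, the Tag class, and the key-escaping rule are all assumptions, and the sketch covers only integers, text strings, byte strings, arrays, string-keyed maps, and tags.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a decoded CBOR tag."""
    number: int
    value: object


def to_json(item):
    """Map a decoded CBOR item onto plain JSON values."""
    if isinstance(item, Tag):
        return {"$tag": item.number, "$value": to_json(item.value)}
    if isinstance(item, bytes):
        return {"$bytes": item.hex()}
    if isinstance(item, list):
        return [to_json(x) for x in item]
    if isinstance(item, dict):
        # Escape keys that start with "$" so they cannot be mistaken for markers.
        return {("$" + k if k.startswith("$") else k): to_json(v)
                for k, v in item.items()}
    return item


def from_json(obj):
    """Invert to_json()."""
    if isinstance(obj, list):
        return [from_json(x) for x in obj]
    if isinstance(obj, dict):
        if set(obj) == {"$tag", "$value"}:
            return Tag(obj["$tag"], from_json(obj["$value"]))
        if set(obj) == {"$bytes"}:
            return bytes.fromhex(obj["$bytes"])
        return {(k[1:] if k.startswith("$") else k): from_json(v)
                for k, v in obj.items()}
    return obj


doc = {"rec": Tag(57344, [1, b"\x01\x02"]), "$tag": "plain text key"}
text = json.dumps(to_json(doc))
restored = from_json(json.loads(text))
print(text)
```

The escaping of "$"-prefixed keys is the ugly part: without it, an ordinary map that happens to use a marker name would be misread on the way back.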

> This practice of using a tag range to avoid the few measly bytes of a two-element tuple is making it so that once an application adopts these tags, it can never switch to another encoding other than CBOR.

I don’t know why it would want to, but it can; see above.

> I'm afraid CBOR can no longer be considered as the binary equivalent of JSON plus byte arrays.

It never was!
A simplistic view of its generic data model is JSON plus bytes plus extensibility via tags.

> Instead of tags enhancing data with semantic meaning hints, it's effectively extending the number of data types.


> This is what I've realized with signed CBOR bignums which also rely on tag ranges (a range of two): they are now a new data type that my decoder needs to interpret if I don't want the application layer to be concerned with tags.

Handling tags inside generic codecs is definitely one implementation strategy; it requires the tagged information to be representable by platform types (which often is not a problem for bignums).  I’m not sure what a platform type for Kris’ records would look like; I’m sure one can be defined on many platforms.
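
For bignums, that strategy can be sketched in a few lines of Python, where the platform integer type is already unbounded. Tags 2 and 3 and the "-1 - n" rule are from RFC 8949, Section 3.4.3; the Tag class and the lift() helper are hypothetical names for a post-processing step in a generic decoder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a decoded CBOR tag."""
    number: int
    value: object


def decode_bignum(tag_number, payload):
    """Turn the byte string under tag 2 or 3 into a platform integer."""
    n = int.from_bytes(payload, "big")
    if tag_number == 2:
        return n          # unsigned bignum
    if tag_number == 3:
        return -1 - n     # negative bignum
    raise ValueError("not a bignum tag")


def lift(item):
    """Replace bignum tags by integers; leave all other tags for the application."""
    if isinstance(item, Tag):
        if item.number in (2, 3) and isinstance(item.value, bytes):
            return decode_bignum(item.number, item.value)
        return Tag(item.number, lift(item.value))
    if isinstance(item, list):
        return [lift(x) for x in item]
    if isinstance(item, dict):
        return {lift(k): lift(v) for k, v in item.items()}
    return item


payload = bytes.fromhex("010000000000000000")
decoded = lift([Tag(2, payload), Tag(3, payload), Tag(57344, [1])])
print(decoded)
```

Tags the codec has no platform type for, such as the record tag in the last element, pass through unchanged and remain visible to the application.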

Grüße, Carsten

(*): E.g., see
This isn’t perfect (it would also need code to handle certain strings to obtain full data transparency), but it was good enough for one application.