Re: [Cbor] tag 24, CBOR sequence and non-CBOR sequence

Carsten Bormann <cabo@tzi.org> Fri, 05 June 2020 18:50 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <1811F0B6-3105-4904-987A-72D0BC14D719@island-resort.com>
Date: Fri, 05 Jun 2020 20:50:20 +0200
Cc: cbor@ietf.org
Message-Id: <1B46F405-BF3C-44C7-B154-53AD503AA716@tzi.org>
References: <1811F0B6-3105-4904-987A-72D0BC14D719@island-resort.com>
To: Laurence Lundblade <lgl@island-resort.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/ImTp5ltPI9f3qWnP8Jo0OLvPcUE>
Subject: Re: [Cbor] tag 24, CBOR sequence and non-CBOR sequence

Hi Laurence,

> On 2020-06-05, at 17:13, Laurence Lundblade <lgl@island-resort.com> wrote:
> 
> Here are some examples of a CBOR sequence (e.g., application/cbor-seq) and of CBOR that is not a sequence (e.g., application/cbor):
> 
> 0x00 
> This is a legal sequence or non-sequence

“CBOR data item” (or “encoded CBOR data item”) is my preferred wording for the latter.

> 
> 0x00 0x00
> This is a legal sequence, but not a legal non-sequence. A non-sequence can contain only one data item; this contains two.

Right.

> 0x82 0x00 0x00
> This is a legal sequence or non-sequence. It is how you encode two zeros as a non-sequence.

Right.
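To make the three byte strings above concrete, here is a minimal Python sketch (not from the thread; the function names are hypothetical). It encodes only unsigned integers and definite-length arrays of them. Concatenating two encoded zeros yields the two-item sequence 0x00 0x00, while wrapping them in an array yields the single data item 0x82 0x00 0x00.

```python
from typing import List


def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR initial byte plus its argument (hypothetical helper)."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    # Additional information 24..27 means 1, 2, 4 or 8 argument bytes follow.
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | ai]) + arg.to_bytes(size, "big")
    raise ValueError("argument too large for a CBOR head")


def encode_uint(n: int) -> bytes:
    """Major type 0: unsigned integer."""
    return encode_head(0, n)


def encode_array_of_uints(items: List[int]) -> bytes:
    """Major type 4, definite length: the result is ONE data item."""
    return encode_head(4, len(items)) + b"".join(encode_uint(n) for n in items)


def encode_sequence_of_uints(items: List[int]) -> bytes:
    """CBOR sequence (RFC 8742): items simply concatenated, no enclosing head."""
    return b"".join(encode_uint(n) for n in items)


print(encode_uint(0).hex(" "))                    # 00
print(encode_sequence_of_uints([0, 0]).hex(" "))  # 00 00
print(encode_array_of_uints([0, 0]).hex(" "))     # 82 00 00
```

Only the array form is a single CBOR data item; the plain concatenation is acceptable as application/cbor-seq but not as application/cbor.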

> It looks to me like tag 24 cannot contain a CBOR sequence.

Indeed, no.

> It says “a data item”. The use of “a” implies one data item and thus not a sequence. That means a decoder should always consider a sequence in the contents of a tag 24 invalid.

(A sequence that doesn’t happen to contain exactly one data item.)

> This seems overly restrictive and unfortunate to me.  Would it be wrong for a generic decoder to allow this?

Yes.
(Well, it might not be checking the validity.)
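A decoder that does check validity would need to confirm that the byte string inside tag 24 holds exactly one encoded data item: not zero, and not a longer sequence. The following Python sketch illustrates that check under stated assumptions: the helper names are hypothetical, only definite-length items are handled, and it verifies structure only (no UTF-8, simple-value or nested-tag checks), so it is not a complete validity checker. For reference, tag 24 around the array 0x82 0x00 0x00 encodes as D8 18 43 82 00 00.

```python
from typing import Tuple


def read_head(buf: bytes, pos: int) -> Tuple[int, int, int]:
    """Return (major type, argument, position after the head)."""
    if pos >= len(buf):
        raise ValueError("truncated: expected a data item")
    ib = buf[pos]
    major, ai = ib >> 5, ib & 0x1F
    pos += 1
    if ai < 24:
        return major, ai, pos
    if ai <= 27:  # 1, 2, 4 or 8 argument bytes follow
        size = 1 << (ai - 24)
        if pos + size > len(buf):
            raise ValueError("truncated argument")
        return major, int.from_bytes(buf[pos:pos + size], "big"), pos + size
    raise ValueError(f"additional information {ai} not handled in this sketch")


def skip_item(buf: bytes, pos: int) -> int:
    """Skip one definite-length data item; return the position just past it."""
    major, arg, pos = read_head(buf, pos)
    if major in (2, 3):  # byte/text string: arg bytes of payload
        if pos + arg > len(buf):
            raise ValueError("truncated string")
        return pos + arg
    if major in (4, 5):  # array: arg items; map: 2 * arg items
        for _ in range(arg if major == 4 else 2 * arg):
            pos = skip_item(buf, pos)
        return pos
    if major == 6:  # tag: exactly one enclosed item
        return skip_item(buf, pos)
    return pos  # major types 0, 1, 7: the head is the whole item


def is_valid_tag24_content(content: bytes) -> bool:
    """True only if content is exactly one data item with no trailing bytes."""
    try:
        end = skip_item(content, 0)
    except ValueError:
        return False
    return end == len(content)


print(is_valid_tag24_content(b"\x00"))          # True
print(is_valid_tag24_content(b"\x82\x00\x00"))  # True
print(is_valid_tag24_content(b"\x00\x00"))      # False: a two-item sequence
print(is_valid_tag24_content(b""))              # False: zero items
```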

It seems we forgot to define an equivalent to tag 24 in RFC 8742.
But a registration for such a tag can easily be made now.

> The wording for tag 55799 avoids the issue of sequence or not.
> 
> 
> Technically, it seems that something labeled application/cbor or in a file named foo.cbor can’t contain the CBOR 0x00 0x00 because it is not a legal non-sequence.

Correct.

> That might imply that a generic decoder should have a sequence mode and a non-sequence mode.

Yes.  You can always use the sequence mode, though (just check that you got exactly one data item back).

My preferred API asks the generic decoder to decode one CBOR data item from a position given and returns the position of the first input byte beyond that CBOR data item.
A sequence decoder is built trivially from this.
But you can also use it for cases where a CBOR data item is the “header” for some blob that is not decoded as CBOR.
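The API shape described above (decode one data item at a given position, return the position just past it) can be sketched in Python as follows. This is a hypothetical illustration, not any particular library's API: it handles integers, byte and text strings, definite-length arrays and maps, tags, and the simple values false, true and null, and it rejects floats and indefinite-length items. The sequence decoder and the "exactly one data item" check are both built on decode_item, and the last lines show the "header in front of a non-CBOR blob" case with a made-up framing.

```python
from typing import Any, Iterator, NamedTuple, Tuple


class Tagged(NamedTuple):
    """A tagged data item (major type 6); hypothetical representation."""
    tag: int
    value: Any


def _read_head(buf: bytes, pos: int) -> Tuple[int, int, int, int]:
    """Return (major type, additional info, argument, position after the head)."""
    if pos >= len(buf):
        raise ValueError("truncated input: expected a data item")
    ib = buf[pos]
    major, ai = ib >> 5, ib & 0x1F
    pos += 1
    if ai < 24:
        return major, ai, ai, pos
    if ai <= 27:  # 1, 2, 4 or 8 argument bytes follow
        size = 1 << (ai - 24)
        if pos + size > len(buf):
            raise ValueError("truncated argument")
        return major, ai, int.from_bytes(buf[pos:pos + size], "big"), pos + size
    raise ValueError(f"additional information {ai} not supported in this sketch")


def decode_item(buf: bytes, pos: int = 0) -> Tuple[Any, int]:
    """Decode ONE data item starting at pos; return (value, first byte after it)."""
    major, ai, arg, pos = _read_head(buf, pos)
    if major == 0:
        return arg, pos
    if major == 1:
        return -1 - arg, pos
    if major in (2, 3):
        end = pos + arg
        if end > len(buf):
            raise ValueError("truncated string")
        raw = buf[pos:end]
        return (raw if major == 2 else raw.decode("utf-8")), end
    if major in (4, 5):
        items = []
        for _ in range(arg if major == 4 else 2 * arg):
            item, pos = decode_item(buf, pos)
            items.append(item)
        if major == 4:
            return items, pos
        return dict(zip(items[0::2], items[1::2])), pos
    if major == 6:
        value, pos = decode_item(buf, pos)
        return Tagged(arg, value), pos
    simple = {20: False, 21: True, 22: None}  # major type 7, subset only
    if ai in simple:
        return simple[ai], pos
    raise ValueError("floats and other simple values not supported in this sketch")


def decode_sequence(buf: bytes) -> Iterator[Any]:
    """CBOR sequence (RFC 8742): decode items back to back until the input ends."""
    pos = 0
    while pos < len(buf):
        item, pos = decode_item(buf, pos)
        yield item


def decode_single(buf: bytes) -> Any:
    """One CBOR data item (e.g. application/cbor): sequence mode plus a count check."""
    items = list(decode_sequence(buf))
    if len(items) != 1:
        raise ValueError(f"expected exactly one data item, got {len(items)}")
    return items[0]


print(list(decode_sequence(b"\x00\x00")))  # [0, 0]
print(decode_single(b"\x82\x00\x00"))      # [0, 0]

# Hypothetical framing: a CBOR map {"len": 5} as a header, followed by 5 raw bytes.
msg = bytes([0xA1, 0x63]) + b"len" + bytes([0x05]) + b"hello"
header, body_start = decode_item(msg)
print(header, msg[body_start:body_start + header["len"]])  # {'len': 5} b'hello'
```

Because decode_single is just the sequence decoder plus a count, a protocol that expects one data item gets a clear error for zero or two items instead of silently taking the first.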

> In practice this doesn’t seem like a very useful restriction. What is useful is whether the encoded CBOR is legal for the protocol being implemented.

If the protocol wants one CBOR data item and gets two (or zero), that doesn’t help.

> From what I’ve experienced, it seems easier if everything is just a CBOR sequence and no distinction is made. Individual CBOR protocols can decide on their own structure either deciding to bind things together in an array or not. Am I missing something?

I think the important observation is that it took us a couple of years to rationalize the weird text about “streaming decoders” in RFC 7049 into a clear concept of CBOR sequences.
So there is a lot of code out there that doesn’t cope very well with CBOR sequences, only with single CBOR data items; and that is often also exactly what’s needed.

Grüße, Carsten