Re: [Cbor] tag 24, CBOR sequence and non-CBOR sequence

Laurence Lundblade <lgl@island-resort.com> Sat, 06 June 2020 19:42 UTC

Well, glad I asked.

> On Jun 5, 2020, at 11:50 AM, Carsten Bormann <cabo@tzi.org> wrote:
> 
> Hi Laurence,
> 
>> On 2020-06-05, at 17:13, Laurence Lundblade <lgl@island-resort.com> wrote:
>> 
>> Here are some examples of a CBOR sequence (e.g., application/cbor-seq) and of CBOR that is not a sequence (e.g., application/cbor):
>> 
>> 0x00 
>> This is a legal sequence or non-sequence
> 
> “CBOR data item” (or “encoded CBOR data item”) is my preferred wording for the latter.
> 
>> 
>> 0x00 0x00
>> This is a legal sequence, but not a legal non-sequence. A non-sequence can contain only one data item; this contains two.
> 
> Right.
> 
>> 0x82 0x00 0x00
>> This is a legal sequence or non-sequence. It is how you encode two zeros as a non-sequence.
> 
> Right.
> 
>> Looks to me that tag 24 cannot contain a CBOR sequence.
> 
> Indeed, no.

How about stating it like this? While tag 24 can contain a sequence of length one, it cannot contain most sequences as defined in RFC 8742.

Personally, I’d prefer to just say tag 24 can’t contain sequences. That way you’d have tag 24 for non-sequences and new tag XX for sequences. You don’t want people designing a protocol that uses sequences and using tag 24 when the sequence happens to be length 1 and tag XX when it is larger. That’s just confusing and more code.

Basically a protocol designer should either design a protocol that uses sequences or one that doesn’t and declare that up front. If some messages in a sequence-based protocol happen to be just one item, they are still part of a sequence-based protocol.
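
To make the three examples quoted at the top concrete, here is a minimal sketch in Python of the rule being discussed: tag 24 content is acceptable only if it holds exactly one data item, while a sequence may hold zero or more. The names toy_decode_one, split_sequence and valid_tag24_content are hypothetical and are not taken from any real library. The toy decoder is a stand-in that understands only small unsigned integers and short definite-length arrays of them, which is just enough for these examples.

```python
from typing import Callable, List

# (buffer, position) -> position of the first byte beyond one data item
DecodeOne = Callable[[bytes, int], int]


def toy_decode_one(buf: bytes, pos: int) -> int:
    """Hypothetical stand-in decoder for illustration only.

    Understands unsigned integers 0..23 (0x00..0x17) and definite-length
    arrays with 0..23 elements (0x80..0x97) of such items.
    """
    if pos >= len(buf):
        raise ValueError("unexpected end of input")
    first = buf[pos]
    if first <= 0x17:
        return pos + 1
    if 0x80 <= first <= 0x97:
        pos += 1
        for _ in range(first - 0x80):
            pos = toy_decode_one(buf, pos)
        return pos
    raise ValueError("outside the subset this toy decoder understands")


def split_sequence(buf: bytes, decode_one: DecodeOne = toy_decode_one) -> List[bytes]:
    """Split a CBOR sequence into its encoded data items (zero or more)."""
    items, pos = [], 0
    while pos < len(buf):
        end = decode_one(buf, pos)
        items.append(buf[pos:end])
        pos = end
    return items


def valid_tag24_content(content: bytes, decode_one: DecodeOne = toy_decode_one) -> bool:
    """True only if the byte string wrapped by tag 24 holds exactly one data item."""
    try:
        return len(split_sequence(content, decode_one)) == 1
    except ValueError:
        return False


print(valid_tag24_content(bytes([0x00])))              # one item
print(valid_tag24_content(bytes([0x00, 0x00])))        # two items: a sequence only
print(valid_tag24_content(bytes([0x82, 0x00, 0x00])))  # one array holding two zeros
```

Under this sketch, a hypothetical tag XX for sequences would simply skip the "exactly one" check and accept whatever split_sequence returns, including an empty list.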

> 
>> It says “a data item”. The use of “a” implies one data item and thus not a sequence. That means a decoder should always consider a sequence in the contents of a tag 24 invalid.
> 
> (A sequence that doesn’t happen to contain exactly one data item.)
> 
>> This seems overly restrictive and unfortunate to me.  Would it be wrong for a generic decoder to allow this?
> 
> Yes.
> (Well, it might not be checking the validity.)
> 
> It seems we forgot to define an equivalent to tag 24 in RFC 8742.
> But a registration for such a tag can easily be made now.
> 
>> The wording for tag 55799 avoids the issue of sequence or not.
>> 
>> 
>> Technically, it seems that something labeled application/cbor or in a file named foo.cbor can’t contain the CBOR 0x00 0x00 because it is not a legal non-sequence.
> 
> Correct.
> 
>> That might imply a generic decoder should have a sequence mode and a non-sequence mode.
> 
> Yes.  You can always use the sequence mode, though (just check that you got exactly one data item back).
> 
> My preferred API asks the generic decoder to decode one CBOR data item from a position given and returns the position of the first input byte beyond that CBOR data item.
> A sequence decoder is built trivially from this.
> But you can also use it for cases where a CBOR data item is the “header” for some blob that is not decoded as CBOR.

QCBOR is a more powerful generic decoder than the one implied by the API you describe. It aims to relieve the protocol implementor of a lot of these details and hard-to-understand aspects of CBOR, which makes the implementor’s work a lot easier. As a shared library, it also reduces overall object code because it handles things common to many protocols (e.g., duplicate detection, which is coming soon).

It’s of course fine to have all different sorts of decoders.
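
For reference, the decode-one-item API quoted above (decode one CBOR data item from a given position and return the position of the first byte beyond it) can be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not QCBOR's API or anyone's actual implementation: the names read_head, at_break, skip_item and split_sequence are hypothetical, the code only skips over items rather than building values, and it checks well-formedness only (for example, it does not validate UTF-8 in text strings or limit nesting depth).

```python
from typing import List, Optional, Tuple


def read_head(buf: bytes, pos: int) -> Tuple[int, int, Optional[int], int]:
    """Parse an initial byte and its argument; return (major, info, argument, next_pos)."""
    if pos >= len(buf):
        raise ValueError("unexpected end of input")
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if info < 24:                       # argument is stored in the initial byte
        return major, info, info, pos
    if info <= 27:                      # argument follows in 1, 2, 4 or 8 bytes
        size = 1 << (info - 24)
        if pos + size > len(buf):
            raise ValueError("unexpected end of input")
        return major, info, int.from_bytes(buf[pos:pos + size], "big"), pos + size
    if info == 31 and major in (2, 3, 4, 5, 7):
        return major, info, None, pos   # indefinite length (or "break" for major 7)
    raise ValueError("not well-formed additional information")


def at_break(buf: bytes, pos: int) -> bool:
    """True if the byte at pos is the 0xFF "break" stop code."""
    if pos >= len(buf):
        raise ValueError("unexpected end of input")
    return buf[pos] == 0xFF


def skip_item(buf: bytes, pos: int = 0) -> int:
    """Return the position of the first byte beyond the data item starting at pos."""
    major, info, arg, pos = read_head(buf, pos)
    if major in (0, 1):                 # unsigned or negative integer
        return pos
    if major == 7:                      # simple value or float
        if arg is None:
            raise ValueError("unexpected break")
        if info == 24 and arg < 32:
            raise ValueError("not well-formed simple value")
        return pos
    if major == 6:                      # tag: skip the single tagged item
        return skip_item(buf, pos)
    if major in (2, 3):                 # byte string or text string
        if arg is not None:
            if pos + arg > len(buf):
                raise ValueError("unexpected end of input")
            return pos + arg
        while not at_break(buf, pos):   # indefinite: definite-length chunks, same type
            chunk_major, _, chunk_len, pos = read_head(buf, pos)
            if chunk_major != major or chunk_len is None or pos + chunk_len > len(buf):
                raise ValueError("bad chunk in indefinite-length string")
            pos += chunk_len
        return pos + 1
    per_entry = 2 if major == 5 else 1  # array (4) or map (5); maps hold key and value
    if arg is not None:
        for _ in range(arg * per_entry):
            pos = skip_item(buf, pos)
        return pos
    while not at_break(buf, pos):
        for _ in range(per_entry):
            pos = skip_item(buf, pos)
    return pos + 1


def split_sequence(buf: bytes) -> List[bytes]:
    """The sequence decoder built from skip_item: zero or more encoded items."""
    items, pos = [], 0
    while pos < len(buf):
        end = skip_item(buf, pos)
        items.append(buf[pos:end])
        pos = end
    return items


# A CBOR data item used as the "header" of a blob that is not decoded as CBOR.
message = bytes([0x01]) + b"blob"
header_end = skip_item(message)
print(message[:header_end], message[header_end:])
```

The "non-sequence mode" then falls out as a one-line check on top of the sequence mode: call split_sequence and require that exactly one item came back.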

I find the distinction between sequences and non-sequences subtle, and it took until this conversation about tag 24 (started by mcr) for it to really become sharp for me. It was definitely one of the things that confused me when I started working with CBOR a few years ago. Easy to gloss over in a way, but a niggling issue that finally came to light. Seems like it wasn’t just me.

One of the reasons I try to do more in QCBOR is to help others get these kinds of details right. They are hard to understand from the RFCs. The average protocol implementor won’t spend the five-plus years reading them that we all have here.

> 
>> In practice this doesn’t seem like a very useful restriction. What is useful is whether the encoded CBOR is legal for the protocol being implemented.
> 
> If the protocol wants one CBOR data item and gets two (or zero), that doesn’t help.

Protocol implementors have to deal with lots of errors.

> 
>> From what I’ve experienced, it seems easier if everything is just a CBOR sequence and no distinction is made. Individual CBOR protocols can decide on their own structure either deciding to bind things together in an array or not. Am I missing something?
> 
> I think the important observation is that it took us a couple of years to rationalize the weird text about “streaming decoders” in RFC 7049 into a clear concept of CBOR sequences.
> So there is a lot of code out there that doesn’t cope very well with CBOR sequences, just with single CBOR data items, and that is often also exactly what’s needed.

QCBOR is fine with sequences or not as input. The CBOR playground doesn’t allow sequences.


Personally, I still think it would be better to relax the distinction and allow tag 24 to carry sequences rather than define a new tag. It is a broadening, not a narrowing, so it doesn’t break anything. There’s no backwards compatibility issue, just as there was no backwards compatibility issue for RFC 7049 when RFC 8742 was published.

If the distinction is not relaxed and a new tag is defined for RFC 8742 sequences, maybe Section 3.4.5.1 should say that RFC 8742 CBOR sequences are invalid as content for tag 24. It is OK for CBORbis to make a normative reference to RFC 8742.

LL