Re: [Cbor] correctness of implied top level array?

Laurence Lundblade <lgl@island-resort.com> Sun, 03 March 2019 00:11 UTC

From: Laurence Lundblade <lgl@island-resort.com>
In-Reply-To: <EBB5C10D-5232-4BED-9061-FE28FD5B5534@tzi.org>
Date: Sat, 2 Mar 2019 16:11:32 -0800
Cc: Joe Hildebrand <jhildebrand@mozilla.com>, Michael Richardson <mcr+ietf@sandelman.ca>, cbor@ietf.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <6FC60F75-E182-47DD-B229-3B8908DD3474@island-resort.com>
References: <81789050-5133-48B0-BEE7-4F1E0BBB4C06@island-resort.com> <40A3B694-80A4-4AD7-A2A6-C071C6E88D2D@tzi.org> <F0A06813-3F1F-4D53-80A1-4CBBBB91DC64@island-resort.com> <0A96C82A-85DB-411D-812D-5A3479A8EA87@mozilla.com> <052FFFD1-6145-4451-91A0-B07ED0AEC726@tzi.org> <9644.1551315204@localhost> <01396AC3-0EDE-4AEC-B60E-1274B9E66C52@mozilla.com> <EBB5C10D-5232-4BED-9061-FE28FD5B5534@tzi.org>
To: Carsten Bormann <cabo@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/l0NBGq7sIX-0FF_2t1be6gC4zkM>
Subject: Re: [Cbor] correctness of implied top level array?


> On Feb 28, 2019, at 12:46 PM, Carsten Bormann <cabo@tzi.org> wrote:
> 
>> On Feb 28, 2019, at 20:23, Joe Hildebrand <jhildebrand@mozilla.com> wrote:
>> 
>>> On Feb 27, 2019, at 5:53 PM, Michael Richardson <mcr+ietf@sandelman.ca> wrote:
>>> 
>>> An alternative way to do the desired action (and bring this inside of cbor)
>>> would be an (indefinite?) array type that was specified to concatenate.
>>> 
>>> I'm not really arguing for this, but I think it's worth knowing why this
>>> would be less good a thing.
>> 
>> It's an interesting idea.  One of the reasons this didn't feel right to me was that my initial take on indefinite-length arrays was to read the whole array, growing memory as needed.  I've moved on from that, but would expect others to find themselves in a similar spot.
>> 
>> More interestingly though, there are times when I might want to use an optional external length-framing approach, like embedding a single CBOR data item in a WebSocket message.
> 
> It is hard for a format like CBOR to take in external length information — data items need to be self-delimiting *within* the tree, so taking in that information at the top level would mean that a different encoding would be needed there.
> 
> CBOR sequences are actually exactly that, for the specific case of a top-level array.
> So this is your “array type that was specified to concatenate” — except that this cannot be used within a CBOR data item ([[1, 2], [3, 4]] ≠ [[1, 2, 3, 4]]), only right on top.
> 
> Whether a single data item or a CBOR sequence is expected is indicated by meta information, such as the Content-Type.  We could define more things like CBOR maps, CBOR strings (hey, we already have ct 0 for text and 42 for bytes).  This is becoming complicated quickly.
> 
> CBOR sequences stand out because they are often exactly what is needed for streaming, and for flexible storage of partial streams in containers that can then simply be concatenated.
> 
> Next stop: Getting a CDDL spec to define top-level sequences…  (Right now, we simply describe a top-level array and add English-language information that this is to be encoded as a sequence.  Quite similar to how .cborseq works inside the spec.)

When a decoder receives a CBOR data item, it always knows when it has decoded the whole thing. Through the decoding it knows how many bytes it needs. So one way to frame this is to ask what a decoder should do when it has reached the end of a data item.

With application/cbor, it is an error for there to be more bytes.

With application/cbor-seq, further bytes are expected to be another data item. 

Another way to say this is that application/cbor is truly and fully self-delimiting and application/cbor-seq is not. With application/cbor-seq you rely on the unavailability of more bytes to know it is the end.
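
To make the two behaviours concrete, here is a rough Python sketch. It is only an illustration under my own assumptions, not code from any CBOR library: item_end, decode_single and split_sequence are hypothetical names, and the well-formedness checking is deliberately loose (it does not validate the chunks of indefinite-length strings or the pairing of map entries).

```python
def item_end(buf: bytes, pos: int = 0) -> int:
    """Return the offset just past the one data item that starts at pos."""
    if pos >= len(buf):
        raise ValueError("truncated: missing initial byte")
    major, ai = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if ai < 24:
        arg = ai                              # argument is in the initial byte
    elif ai < 28:
        n = 1 << (ai - 24)                    # 1, 2, 4 or 8 following bytes
        if pos + n > len(buf):
            raise ValueError("truncated: incomplete head")
        arg = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    elif ai == 31 and major in (2, 3, 4, 5):
        arg = None                            # indefinite length
    else:
        raise ValueError("not well-formed")   # reserved value or stray break
    if major in (0, 1, 7):
        return pos                            # integers, simple values, floats
    if arg is None:                           # nested items until the break
        while True:
            if pos >= len(buf):
                raise ValueError("truncated: missing break")
            if buf[pos] == 0xFF:
                return pos + 1
            pos = item_end(buf, pos)
    if major in (2, 3):                       # definite-length string
        if pos + arg > len(buf):
            raise ValueError("truncated: incomplete string")
        return pos + arg
    count = {4: arg, 5: 2 * arg, 6: 1}[major]  # array, map, tag
    for _ in range(count):
        pos = item_end(buf, pos)
    return pos


def decode_single(buf: bytes) -> bytes:
    """application/cbor behaviour: exactly one item, anything after is an error."""
    if item_end(buf) != len(buf):
        raise ValueError("trailing bytes after the data item")
    return buf


def split_sequence(buf: bytes) -> list:
    """application/cbor-seq behaviour: keep decoding items until bytes run out."""
    items, pos = [], 0
    while pos < len(buf):
        end = item_end(buf, pos)
        items.append(buf[pos:end])
        pos = end
    return items
```

Given the two bytes 01 02, split_sequence returns two items while decode_single raises an error. The only difference between the two functions is what happens after item_end returns.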

Maybe this answers Michael’s question. If there were an implied indefinite-length array around all CBOR, then all CBOR would have to either a) rely on something external, like the end of a transmission, or b) have a CBOR break present to know it is the end. I don’t think we want that.
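
A small byte-level sketch of option b), again just an illustration with hypothetical helper names (small_uint, as_sequence, as_indefinite_array) and restricted to unsigned integers below 24 so each one encodes as a single byte:

```python
INDEFINITE_ARRAY = b"\x9f"  # major type 4, additional information 31
BREAK = b"\xff"             # the "break" stop code


def small_uint(n: int) -> bytes:
    """Encode an unsigned integer 0..23, which fits in the initial byte."""
    if not 0 <= n < 24:
        raise ValueError("this sketch only handles 0..23")
    return bytes([n])


def as_sequence(values) -> bytes:
    """Items back to back, with no wrapper and no terminator."""
    return b"".join(small_uint(v) for v in values)


def as_indefinite_array(values) -> bytes:
    """The same items wrapped in an indefinite-length array closed by a break."""
    return INDEFINITE_ARRAY + as_sequence(values) + BREAK


# Concatenating two sequences gives the sequence of all the items.
assert as_sequence([1, 2]) + as_sequence([3]) == as_sequence([1, 2, 3])

# Concatenating two wrapped arrays gives two arrays, not one merged array.
assert (as_indefinite_array([1, 2]) + as_indefinite_array([3])
        != as_indefinite_array([1, 2, 3]))
```

The wrapped form costs an opening byte and a break, and it loses the property that stored pieces can simply be concatenated.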

The net of it, then, seems to be that a protocol using CBOR should be clear about whether it is application/cbor or application/cbor-seq. (It doesn’t have to use MIME and MIME types and all that; it just has to indicate it somehow.)

LL