[Cbor] correctness of implied top level array?

Laurence Lundblade <lgl@island-resort.com> Sat, 23 February 2019 20:32 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/HOstsc18WYOHBsSTBgVYy3koji4>

The current CBOR playground works like this:

    Input 0x00 (one byte) and it decodes to a single integer of value 0
    Input 0x82 0x00 0x00 (three bytes) and it decodes to an array of two zeros
    Input 0x00 0x00 (two bytes) and it decodes to a single integer followed by one extra byte, which is an error
    Similarly, substitute 0x40, a zero-length byte string, for 0x00 and it works the same as the above three
    Similarly, substitute 0xf4, "false", for 0x00 and it works the same as the above three

That is, the top level of a chunk of encoded CBOR can either be a map, an array or a single non-structured data item like an integer, string or simple type. 

I think in theory you could have an implied array at the top level and allow 0x00 0x00 to decode correctly, but it is probably better not to allow this implied array. It might save someone a byte or two or three or five or nine, but I think allowing it creates an unnecessary special case. 
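To make the "byte or two or three or five or nine" concrete: that is the size of the head of a definite-length array, which depends only on the item count. A minimal sketch in Python (the function name array_head_size is made up for illustration):

```python
def array_head_size(count: int) -> int:
    """Bytes needed for the head of a definite-length CBOR array holding `count` items."""
    if count < 0 or count >= 1 << 64:
        raise ValueError("count does not fit in a CBOR array head")
    if count < 24:
        return 1  # count is carried in the initial byte itself
    if count < 1 << 8:
        return 2  # initial byte + 1-byte count
    if count < 1 << 16:
        return 3  # initial byte + 2-byte count
    if count < 1 << 32:
        return 5  # initial byte + 4-byte count
    return 9      # initial byte + 8-byte count
```

So an implied top-level array would save exactly one byte for anything under 24 items, which is the common case.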

So 0x00 0x00 and the like should be documented as not well-formed in the RFC, and the Appendix C pseudocode should be fixed by tracking whether the top-level item is being decoded or not.
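Here is a rough sketch of the rule I have in mind, in Python rather than the Appendix C pseudocode. It is not the playground's code and it only covers the types used in the examples above (unsigned integers, byte strings, definite-length arrays, and false/true). All the names are made up for illustration. Instead of passing a top-level flag down through the recursion, it does the check once in a top-level wrapper, which amounts to the same thing:

```python
class NotWellFormed(ValueError):
    """Raised when the input is not exactly one well-formed data item."""


def _read_head(data: bytes, pos: int):
    """Return (major type, argument, next position) for the item head at pos."""
    if pos >= len(data):
        raise NotWellFormed("unexpected end of input")
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info > 27:
        raise NotWellFormed("additional info not handled in this sketch")
    n = 1 << (info - 24)  # 1, 2, 4 or 8 bytes of argument follow
    if pos + n > len(data):
        raise NotWellFormed("unexpected end of input")
    return major, int.from_bytes(data[pos:pos + n], "big"), pos + n


def _decode_item(data: bytes, pos: int):
    """Decode one data item starting at pos; return (value, next position)."""
    if pos < len(data) and data[pos] in (0xF4, 0xF5):
        return data[pos] == 0xF5, pos + 1  # simple values false / true
    major, arg, pos = _read_head(data, pos)
    if major == 0:  # unsigned integer
        return arg, pos
    if major == 2:  # byte string
        if pos + arg > len(data):
            raise NotWellFormed("byte string runs past end of input")
        return bytes(data[pos:pos + arg]), pos + arg
    if major == 4:  # definite-length array
        items = []
        for _ in range(arg):
            item, pos = _decode_item(data, pos)
            items.append(item)
        return items, pos
    raise NotWellFormed("type not handled in this sketch")


def decode_top_level(data: bytes):
    """Decode exactly one item; leftover bytes mean no implied top-level array."""
    item, pos = _decode_item(data, 0)
    if pos != len(data):
        raise NotWellFormed(f"{len(data) - pos} extra byte(s) after top-level item")
    return item
```

With this, decode_top_level(bytes([0x82, 0x00, 0x00])) gives a two-element array, while decode_top_level(bytes([0x00, 0x00])) raises NotWellFormed because of the trailing byte. Nested items never see the check, so arrays inside arrays decode normally.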

LL