Re: [Cbor] [Technical Errata Reported] RFC8610 (6527)

Carsten Bormann <cabo@tzi.org> Wed, 14 April 2021 05:53 UTC

To: Sean Bartell <smbarte2@illinois.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/bDVcl5zxrcqAw-4H-XqY9ap628Q>

> This ABNF looks correct to me, but it obfuscates more than it clarifies.

I don’t agree; it is a very clear statement of what is allowed.

> I would suggest following the example of the JSON RFC (https://www.rfc-editor.org/rfc/rfc8259.html#section-7). The ABNF there allows arbitrary hex digits (%x75 4HEXDIG), but the text clarifies how non-BMP characters are handled.

RFC 8259 is a product of a process that was more political than technical.  There was a lot of discussion of unpaired surrogates etc., spilling over from UTF-16 land.
There was always a danger of ECMA breaking out of the fragile agreement to align ECMA-404 and RFC 8259, and that precarious situation limited how much could be fixed between RFC 7159 and RFC 8259.
While RFC 8259 makes it clear that the JSON text itself needs to be UTF-8, it wasn't really possible to summon the courage to write text that explicitly disallows constructing invalid text from escapes in text string data items.
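
As an illustration of that gap (a sketch added for clarity, not part of RFC 8259 or of the ABNF proposal), the following Python snippet shows a JSON text that is pure ASCII, and therefore valid UTF-8, yet decodes to a string that cannot be serialized as UTF-8. It relies on the behaviour of CPython's standard `json` module, which accepts a lone surrogate escape; the helper name `is_encodable_utf8` is made up for this example.

```python
import json

# The JSON text itself is plain ASCII, so it is trivially valid UTF-8.
json_text = '"\\ud800"'

# CPython's json module accepts the escape and returns a str holding a lone surrogate.
decoded = json.loads(json_text)

def is_encodable_utf8(s: str) -> bool:
    """Return True if s can be serialized as well-formed UTF-8."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False

print(len(decoded), is_encodable_utf8(decoded))
```

A properly paired escape such as `"\ud83d\ude00"` goes through the same path without trouble; only the unpaired case produces text that has no UTF-8 encoding.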

To summarize, RFC 8259 is in many ways not a good model for how to specify things.

As my ABNF proposal shows, it is easy to specify how surrogates can be used in the ABNF, and doing that saves a validation process step in the implementation.
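
For comparison, here is a rough Python sketch of the separate validation step that a permissive grammar (one that accepts any `\u` followed by four hex digits) leaves to the implementation. It is not taken from any existing parser; the function name `surrogates_well_paired` is hypothetical, and the input is assumed to have already passed such a permissive grammar.

```python
def surrogates_well_paired(raw: str) -> bool:
    r"""Return True if every \uXXXX surrogate escape in `raw` is part of a high+low pair.

    `raw` is the string body as written in the source, with escapes not yet
    processed. Assumption: it already matched a permissive grammar, so every
    \u is followed by exactly four hex digits.
    """
    i = 0
    pending_high = False  # True immediately after a high-surrogate escape
    while i < len(raw):
        if raw[i] == "\\" and raw[i + 1 : i + 2] == "u":
            code = int(raw[i + 2 : i + 6], 16)
            i += 6
            if 0xD800 <= code <= 0xDBFF:  # high surrogate
                if pending_high:
                    return False  # two high surrogates in a row
                pending_high = True
                continue
            if 0xDC00 <= code <= 0xDFFF:  # low surrogate
                if not pending_high:
                    return False  # low surrogate without a preceding high one
                pending_high = False
                continue
        elif raw[i] == "\\":
            i += 2  # some other two-character escape, e.g. \\ or \n
        else:
            i += 1  # ordinary character
        if pending_high:
            return False  # high surrogate followed by something other than a low one
    return not pending_high
```

When the grammar itself only admits paired surrogates, a check like this becomes unnecessary, because ill-formed input never gets past the parser.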

> A similar situation already exists for base64 bytestrings, for which CDDL's ABNF allows invalid base64.

Yes, but it is much harder to write ABNF that combines the syntax for whitespace and comments with the syntax for base64.
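
In practice this means the base64 check stays a separate step after parsing. A minimal Python sketch of such a step follows; it is an illustration only, and several details are assumptions rather than statements about CDDL: the function name `decode_b64_body` is hypothetical, whitespace is assumed to be ignorable, both the classic and the URL-safe alphabets are accepted, and trailing `=` padding is treated as optional. Comments are not handled at all.

```python
import base64
import binascii
import re

def decode_b64_body(body: str) -> bytes:
    """Decode the body of a b64'...' literal, raising ValueError if it is not valid base64.

    Assumptions of this sketch: whitespace is ignored, both the classic and
    the URL-safe alphabets are accepted, and trailing '=' padding is optional.
    """
    compact = re.sub(r"\s+", "", body)
    # Map the URL-safe alphabet onto the classic one so a single decoder suffices.
    compact = compact.replace("-", "+").replace("_", "/")
    compact = compact.rstrip("=")
    if len(compact) % 4 == 1:
        raise ValueError("impossible base64 length")
    # Restore the padding that the standard-library decoder expects.
    padded = compact + "=" * (-len(compact) % 4)
    try:
        return base64.b64decode(padded, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64: {exc}") from exc
```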

> It's also not clear to me whether \' should be allowed in text strings,

The text in section 3.1 doesn’t support allowing that:

   o  Text strings are enclosed by double quotation '"' characters.
      They follow the conventions for strings as defined in Section 7 of
      [RFC8259].  […]

> or whether \" should be allowed in byte strings.

It is:

      If unprefixed, the string is
      interpreted as with a text string, except that single quotes must
      be escaped and that the resulting UTF-8 bytes are marked as a byte
      string (major type 2).

Note the absence of a statement like “double quotes must not be escaped”.
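
Read that way, the interpretation of an unprefixed byte string could be sketched in Python as below. This is one possible reading written for illustration, not code from any CDDL tool: the function name `unprefixed_bytes` and the table `_SIMPLE` are hypothetical, the escape set is assumed to be the JSON one plus `\'`, and the input is assumed to be the literal's body after the enclosing single quotes have been removed.

```python
# Two-character escapes: the JSON set, plus \' for the byte string case.
_SIMPLE = {'"': '"', "'": "'", "\\": "\\", "/": "/",
           "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}

def unprefixed_bytes(body: str) -> bytes:
    r"""Turn the body of an unprefixed '...' literal into its bytes.

    Single quotes must be escaped; double quotes may appear escaped or bare.
    Assumption: the body already matched the grammar, so malformed escapes
    simply surface here as KeyError or ValueError.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "'":
            raise ValueError("single quote must be escaped")
        if ch != "\\":
            out.append(ch)  # ordinary character, including CR and LF
            i += 1
        elif body[i + 1 : i + 2] == "u":
            out.append(chr(int(body[i + 2 : i + 6], 16)))
            i += 6
        else:
            out.append(_SIMPLE[body[i + 1 : i + 2]])
            i += 2
    text = "".join(out)
    # Combine surrogate pairs into single code points; a lone surrogate fails here.
    text = text.encode("utf-16", "surrogatepass").decode("utf-16")
    # The resulting UTF-8 bytes are what gets marked as a byte string (major type 2).
    return text.encode("utf-8")
```

With this reading, both `'say \"hi\"'` and `'say "hi"'` yield the same bytes, while a bare single quote inside the body is rejected.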

> Allowing them would make both types of strings consistent with each other, but prohibiting them would let text strings use the exact same syntax as JSON.

JSON doesn’t have byte strings, but for text strings we indeed should have full alignment.

> And one more issue: the ABNF currently allows CRLF in unprefixed byte strings, and I'm not sure whether that's intentional.

It is.
(And that is important for draft-ietf-cbor-cddl-control.)

BTW:
> FWIW, I'm not actually using CDDL at the moment. I'm using the extended diagnostic notation, and I was using the CDDL ABNF to narrow down my choice of syntax.

Maybe providing ABNF for EDN would be a good idea, after all.
It *is* subtly different in a number of details from CDDL syntax.
One formal interpretation of EDN syntax (albeit not in ABNF form, as it predates the abnftt tool) can be found in https://github.com/cabo/cbor-diag/blob/master/lib/cbor-diag-parser.treetop (and, yes, this contains the equivalent of the surrogate pair handling that is in the ABNF I proposed).

Grüße, Carsten