[Cbor] Disposition of items from "ESON" discussion that might potentially be useful for EDN

Carsten Bormann <cabo@tzi.org> Sun, 28 January 2024 13:51 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/wHk-wqWI-Nk9aq2hAgnSlaBsKTg>

We said [-1] we were going to have a look at the discussion currently going on in the JSON mailing list about a more humane version of JSON.

[-1]: https://mailarchive.ietf.org/arch/msg/cbor/iy8nJ3zrjOztUWbGBwSzWnl4MPE

Here is my take-away from this.
I propose we quickly check whether this has consensus and then submit a -08 based on the PRs cited below.
The proposed changes to EDN are implemented in the edn-abnf PoC tool (0.1.5).
(The parallel change to CDDL is implemented in cddlc 0.2.5.)

* Backward compatibility = old data, new system.
* Forward compatibility = new data, old system.
We generally strive for backward compatibility, but rarely achieve forward compatibility.

Numbers in braces {4711} are items in Section 3 of [0].

[0]: https://hildjj.github.io/draft-hildebrand-eson-requirements/draft-hildebrand-eson-requirements.html#name-requirements


## (0) Checks we could explicitly mandate

I’m currently tending to consider these quality-of-implementation items.
We do not have text for these yet.

* 0.1 {17} Rejecting duplicate/equivalent keys

On the other hand, when using CBOR diagnostic notation as a diagnostic tool it is often useful to be able to express invalid data.
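
For JSON-shaped input, a duplicate-key check of this kind can be sketched in Python with the standard `json` module. This is only an illustration of the idea for text string keys; it is not how edn-abnf works, it does not address "equivalent" keys, and the names `reject_duplicates` and `strict_loads` are made up for this sketch.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated member name."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen

def strict_loads(text):
    # Hypothetical helper: parse JSON, refusing maps with duplicate keys.
    return json.loads(text, object_pairs_hook=reject_duplicates)
```

A diagnostic tool would presumably want this behind a switch, so that invalid data can still be expressed when that is the point.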

* 0.2 {2}{14} Rejecting floating point range errors (as in 1e1000)

On the other hand, 1e1000 is JSON’s Infinity, so it is not clear we aren’t losing some backwards compatibility here.

$ edn-abnf -tedn -e 1e1000
Infinity
$ edn-abnf -tjson -e Infinity
9e9999
$ edn-abnf -tjson -e 1e1000
9e9999

(What?  9e9999 seems to be a platform library thing.)
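
The same trade-off can be reproduced with Python's standard library, where an out-of-range literal quietly becomes infinity. The sketch below shows that behavior and one possible range check; `parse_float_checked` is a hypothetical name, and underflow (as in 1e-1000) is deliberately not handled.

```python
import json
import math

def parse_float_checked(text):
    """Parse a decimal float literal, rejecting finite-looking literals that overflow."""
    value = float(text)
    # float() maps 1e1000 to inf; only accept inf when it was spelled out.
    if math.isinf(value) and "inf" not in text.lower():
        raise OverflowError(f"{text} is out of range for binary64")
    return value

# Without the check, the overflow is silent:
print(json.loads("1e1000"))   # inf
print(json.dumps(math.inf))   # Infinity
```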


## (1) Items that we can and should pick up in EDN

* 1.1 {9} enable \u{1F913} style hex escapes in strings

This makes a lot of sense.  PR #28 [1], merged.

[1]: https://github.com/cbor-wg/edn-literal/pull/28

(This also gave rise to a similar change in update-8610-grammar, PR #7 over in that repo [2].  We'll need to re-submit that draft to the IESG with that change.)

[2]: https://github.com/cbor-wg/update-8610-grammar/pull/7
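
To illustrate what the braced escape of item 1.1 means, here is a minimal Python sketch that expands `\u{...}` in already-extracted string content. It is not the grammar from PR #28; the function name and the choice to reject surrogates and values above U+10FFFF are assumptions of this sketch.

```python
import re

# Match any backslash escape, so that an escaped backslash is consumed as a pair
# and a following "u{...}" is not mistaken for an escape.
_ESCAPE = re.compile(r"\\(?:u\{([0-9A-Fa-f]+)\}|(.))", re.DOTALL)

def expand_braced_escapes(text):
    """Replace braced hex escapes with the code point they name; leave other escapes alone."""
    def repl(match):
        digits = match.group(1)
        if digits is None:
            return match.group(0)  # some other escape, kept unchanged
        code = int(digits, 16)
        if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: U+{code:04X}")
        return chr(code)
    return _ESCAPE.sub(repl, text)
```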


* 1.2 {12} Allow .3 and 3. for decimal and hex float

(... as opposed to requiring 0.3 and 3.0).
Better aligns the floating point syntax with Python, C, C++, IEEE 754, ...
PR #29, merged [3].

[3]: https://github.com/cbor-wg/edn-literal/pull/29

We still have an open comment that we should give guidance on how to achieve that in an implementation on a platform whose own library doesn’t allow that elision.  I answered that in [4], but I don’t know whether we want to have similar text in the document (we generally don’t have this kind of implementation note in this document).

[4]: https://mailarchive.ietf.org/arch/msg/cbor/UWw3NiSXlf7ai6EtNXXyjHzu3TE
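
One conceivable approach for such a platform (not necessarily the one described in [4]) is to normalize the literal before handing it to the library. The Python sketch below does that with two regular expressions; the function name is hypothetical, and it assumes that hex floats always carry a `p` exponent.

```python
import re

_HEX = re.compile(r"^([+-]?0[xX])([0-9a-fA-F]*)\.([0-9a-fA-F]*)([pP][+-]?\d+)$")
_DEC = re.compile(r"^([+-]?)(\d*)\.(\d*)((?:[eE][+-]?\d+)?)$")

def normalize_elided(text):
    """Insert the elided 0 so that .3 becomes 0.3 and 3. becomes 3.0."""
    for pattern in (_HEX, _DEC):
        match = pattern.match(text)
        if match:
            prefix, int_part, frac, exp = match.groups()
            if not int_part and not frac:
                raise ValueError(f"no digits in {text!r}")
            return f"{prefix}{int_part or '0'}.{frac or '0'}{exp}"
    return text  # no dot present, nothing to normalize

print(float(normalize_elided(".3")))                 # 0.3
print(float.fromhex(normalize_elided("0x.8p0")))     # 0.5
```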

(While doing this, we added some references for hex floats, PR #27 [5], merged.)

[5]: https://github.com/cbor-wg/edn-literal/pull/27

(Note that adding a similar relaxation of the syntax to CDDL is more difficult as the dot is also used for control operators there.
It also seems less necessary, as CDDL specs are rarely created from data dumps from other languages.
I propose not to make this change there.)


## (2) Items that are not easy to pick up in EDN

* 2.1 {5} parsing bare words as text string map keys

Problem: false, null, true, undefined, Infinity, NaN can occur in EDN as map keys, and this will need to remain possible going forward for backward compatibility.  If we create a list of exceptions for them, people would need to know that these are not actually barewords for strings, creating a pitfall.
(See also N.2 below.)

There are also some Unicode-related issues: categories like XID_Start/XID_Continue are not available in ABNF, and there are a few normalization issues (that we successfully ignore in quote-enclosed text strings).
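
The pitfall can be made concrete with a small Python sketch of a purely hypothetical bareword rule (nothing like this is proposed for EDN). It leans on `str.isidentifier`, which Python defines in terms of XID_Start/XID_Continue, as a stand-in for whatever identifier rule would be chosen; `UNDEFINED` is a placeholder sentinel.

```python
import math

UNDEFINED = object()  # stand-in for the "undefined" simple value

# Words that already have a non-string meaning as map keys (list from the text above).
RESERVED = {
    "false": False,
    "true": True,
    "null": None,
    "undefined": UNDEFINED,
    "Infinity": math.inf,
    "NaN": math.nan,
}

def bareword_key(word):
    """Hypothetical rule: a bare word is a text string key unless it is reserved."""
    if not word.isidentifier():
        raise ValueError(f"not a bareword: {word!r}")
    return RESERVED.get(word, word)

print(repr(bareword_key("lat")))    # 'lat'
print(repr(bareword_key("true")))   # True
print(repr(bareword_key("True")))   # 'True'
```

The last two lines show the trap: two keys that differ only in case end up with different types.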

* 2.2 {6a} elidable commas between map entries and array elements

Problem: elidable commas conflict with the C-inspired [Section 6.4.5 of N2301, C++: Section 5.13.5 of N4860] string concatenation syntax in Appendix G.4 of RFC 8610 [6].
(We have elidable commas in CDDL, and while it is a nice feature in a specification language, it does sometimes create some pitfalls…)

[6]: https://www.rfc-editor.org/rfc/rfc8610.html#appendix-G.4
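
Python has the same C-inspired concatenation of adjacent string literals, so it can serve as an analogy for the conflict (this is Python, not EDN): once adjacent strings are joined, a missing comma between two strings can no longer be read as an elided one.

```python
# A missing comma between two string literals does not produce an error:
# the two literals are silently joined into one element.
with_comma = ["a", "b"]
without_comma = ["a" "b"]

print(len(with_comma), with_comma)        # 2 ['a', 'b']
print(len(without_comma), without_comma)  # 1 ['ab']
```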

* 2.3 {4} exactly emulate some comment syntax, such as that of JS

Problem: We already have a quite functional comment syntax.  Changing this to look exactly like another syntax destroys backward compatibility:

/* JS-style long comment
   /
*/

(It is easy enough to change this into a compatible comment, as in

/* EDN-style long comment
   //
*/
)
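
The difference between the two examples follows from the existing rule that a comment runs from one slash to the next. A minimal Python sketch of that rule, assuming input without string literals and ignoring any other comment forms (the function name is made up here):

```python
def strip_slash_comments(text):
    """Remove /.../ comments: every slash toggles between comment and non-comment."""
    out = []
    in_comment = False
    for ch in text:
        if ch == "/":
            in_comment = not in_comment
        elif not in_comment:
            out.append(ch)
    if in_comment:
        raise ValueError("unterminated comment")
    return "".join(out)

edn_style = "1 /* EDN-style long comment\n   //\n*/ 2"
print(repr(strip_slash_comments(edn_style)))  # '1  2'
```

With the JS-style variant, the lone slash on the second line closes the comment early, and the final slash opens a new comment that never ends, so the sketch raises `ValueError`.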

* 2.4 {7} single-quoted strings

We do have single-quoted strings, but semantically they are binary strings whose content is given as text (binary strings don’t exist in JSON).
(In both the text string and the binary string form, the “other” quote does not need to be escaped.)
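
As a small illustration in Python: the single-quoted form 'hello world' denotes the same byte string as the hex form printed below. The helper name is hypothetical, and it takes the string content after any escape processing.

```python
def single_quoted_as_hex(content):
    """Show the h'...' form of the byte string that a single-quoted string denotes."""
    return "h'" + content.encode("utf-8").hex() + "'"

print(single_quoted_as_hex("hello world"))  # h'68656c6c6f20776f726c64'
```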

## (3) Items that we already have

* 3.0 {1} include all of I-JSON; 
      {3} use UTF-8;
      {4} comment syntax (vs 2.3 above);

* 3.1 {6b} allow trailing commas in maps and arrays

* 3.2 {8} multi-line strings

Note that the ABNF says that U+000D is ignored on input in strings, so the multi-line strings have platform-independent semantics.
(Note also that we provide no way to dedent multi-line strings, which will make files that use them quite ugly.)
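
Both notes can be illustrated in Python. Dropping U+000D is what makes the string content independent of the platform's line endings; dedenting is something an application would currently have to do itself, shown here with `textwrap.dedent`. The function name is an assumption of this sketch.

```python
import textwrap

def drop_cr(raw):
    """Mimic the ABNF rule: carriage returns in the source are ignored."""
    return raw.replace("\r", "")

# The same text saved with CRLF and with LF line endings yields the same string.
print(drop_cr("line one\r\nline two") == drop_cr("line one\nline two"))  # True

# Dedenting is not part of EDN; an application could apply it afterwards.
print(repr(textwrap.dedent("    first\n    second\n")))  # 'first\nsecond\n'
```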

* 3.3 {10} no limits on integer range

(JSON-the-syntax also already has this feature; it is just hard to use in an I-JSON environment.)

* 3.4 {11} hex numbers

(and octal, binary, and hex floats)

* 3.5 {13} Infinity, -Infinity, NaN

(We don’t have NaN payloads, though.)

* 3.6 {15} + before numbers

* 3.7 {18} type extensibility

* 3.8 {19} date/time syntax

(Separate question: Do we need a CBOR tag for SEDATE-extended RFC 3339, i.e., IXDTF [7]?)

[7]: https://www.ietf.org/archive/id/draft-ietf-sedate-datetime-extended-11.html#name-internet-extended-date-time

* 3.9 {20} base64 encoded binary strings

Our way to cause "explicit interchange of binary data" is CBOR …


## (4) Items we likely do not want to do

* 4.1 {16} Any whitespace character with the Unicode class Zs SHALL be valid whitespace.

We are quite accepting here already, but U+00A0, U+2003 (and possibly U+2028, which is Zl though) etc. mostly add an unneeded Unicode dependency.
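
The dependency is visible in how one would have to implement such a rule: it takes a lookup in the Unicode Character Database, as in this Python sketch using the standard `unicodedata` module (the helper name is hypothetical).

```python
import unicodedata

def is_zs(ch):
    """True if the character has Unicode general category Zs (space separator)."""
    return unicodedata.category(ch) == "Zs"

for ch in ("\u0020", "\u00a0", "\u2003", "\u2028"):
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)}")
# U+0020 Zs
# U+00A0 Zs
# U+2003 Zs
# U+2028 Zl
```

The set of Zs characters is defined by the Unicode version in use, so a parser built this way is tied to its platform's Unicode tables.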


## (N) Even more random discussions (maybe too innovative)

* N.1 Allow leaving out the outer braces for a top-level map?

* N.2 Do an almost bareword syntax for map keys?

As in {
 :lat: 53,
 :lon: 8,
}

* N.3 Allow RS instead of comma for sequences (or something even closer to RFC 7464 Section 2.1)?

As a hack, that could easily be added (triggered by the leading RS).
While it would facilitate ingesting RFC 7464 style JSON sequences, it sounds less useful for the applications we see for EDN, which are resting on printable characters plus newline.
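
For reference, the RFC 7464 framing puts an RS character (0x1E) before each JSON text and a line feed after it. A minimal Python sketch of ingesting such a sequence follows; the function name is made up, and the recovery rules of RFC 7464 for truncated items are not implemented.

```python
import json

RS = "\x1e"  # ASCII record separator

def parse_json_seq(text):
    """Parse an RFC 7464 style sequence: each item is RS, a JSON text, then LF."""
    items = []
    for chunk in text.split(RS):
        if chunk.strip():  # skip the empty piece before the first RS
            items.append(json.loads(chunk))
    return items

print(parse_json_seq('\x1e{"a": 1}\n\x1e[2, 3]\n'))  # [{'a': 1}, [2, 3]]
```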


TL;DR:
Lots of discussion, whittled down to two PRs (and a related one for CDDL).

Grüße, Carsten