Re: [Cbor] Disposition of items from "ESON" discussion that might potentially be useful for EDN

Carsten Bormann <cabo@tzi.org> Thu, 01 February 2024 12:39 UTC


As I wasn’t hearing any more discussion, I submitted -08 with the changes discussed below.

This differs from -07 in the following way [1]:

## Technical
* Enable hex escapes \u{…} for Unicode scalar values in strings, as in \u{1F913} for 🤓
* Allow eliding an all-zero integer part or fractional part around the decimal point (3. and .3)
## Editorial
* Added references (IEEE 754, C, C++) explaining the origin of the floating-point format
* Added a note about the use of platform number parsers that may not provide for the above
* (Note that the automatic submission today did not yet benefit from the fixes in xml2rfc 3.19.2, so the rendering for the subseries references is still incomplete.)

[1]: https://author-tools.ietf.org/iddiff?url1=draft-ietf-cbor-edn-literals-07&url2=draft-ietf-cbor-edn-literals-08&difftype=--html

edn-abnf 0.1.5 implements the technical changes; update with:    gem update edn-abnf
Its source can also serve as an example of how to use a platform number parser that does not allow eliding the zeros before or after the decimal point.

$ edn-abnf -e '[.3, 3., "Hello \u{1f3bc}"]'
[0.3, 3.0, "Hello 🎼"]
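A minimal sketch of that approach (not the edn-abnf source): insert the elided zero, then hand the literal to the strict platform parser. It assumes Python's json module as the stand-in for such a parser, since JSON's number grammar requires digits on both sides of the decimal point; normalize_decimal and parse_decimal are hypothetical names.

```python
import json
import re

# Optional sign, possibly empty integer part, ".", possibly empty fraction, optional exponent.
_DECIMAL = re.compile(r"([+-]?)([0-9]*)\.([0-9]*)([eE][+-]?[0-9]+)?")


def normalize_decimal(literal: str) -> str:
    """Rewrite an EDN-style decimal such as ".3" or "3." as "0.3" or "3.0"."""
    m = _DECIMAL.fullmatch(literal)
    if m is None:
        return literal  # no decimal point: pass through unchanged
    sign, int_part, frac_part, exponent = m.groups()
    if not int_part and not frac_part:
        raise ValueError(f"no digits in {literal!r}")  # a bare "." stays invalid
    sign = "" if sign == "+" else sign  # JSON has no leading "+"
    return f"{sign}{int_part or '0'}.{frac_part or '0'}{exponent or ''}"


def parse_decimal(literal: str) -> float:
    """Parse with the strict (JSON-grammar) parser after normalization."""
    return json.loads(normalize_decimal(literal))


if __name__ == "__main__":
    for lit in [".3", "3.", "-.5e2", "+3."]:
        print(lit, "->", normalize_decimal(lit), "->", parse_decimal(lit))
```

The rewrite is purely textual, so the platform parser still does all the rounding.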

The chairs can now check how we should handle this post-WGLC technical change.

Grüße, Carsten


> On 2024-01-28, at 14:51, Carsten Bormann <cabo@tzi.org> wrote:
> 
> We said [-1] we were going to have a look at the discussion currently going on in the JSON mailing list about a more humane version of JSON.
> 
> [-1]: https://mailarchive.ietf.org/arch/msg/cbor/iy8nJ3zrjOztUWbGBwSzWnl4MPE
> 
> Here is my take-away from this.
> I propose we do a quick check whether this has consensus and submit a -08 based on the PRs cited below.
> The proposed changes to EDN are implemented in the edn-abnf PoC tool (0.1.5).
> (The parallel change to CDDL is implemented in cddlc 0.2.5.)
> 
> * Backward compatibility = old data, new system.
> * Forward compatibility = new data, old system.
> We generally strive for backward compatibility, but rarely achieve forward compatibility.
> 
> Numbers in braces {4711} are items in Section 3 of [0].
> 
> [0]: https://hildjj.github.io/draft-hildebrand-eson-requirements/draft-hildebrand-eson-requirements.html#name-requirements
> 
> 
> ## (0) Checks we could explicitly mandate
> 
> I currently tend to consider these quality-of-implementation items.
> We do not have text for these yet.
> 
> * 0.1 {17} Rejecting duplicate/equivalent keys
> 
> On the other hand, when using CBOR diagnostic notation as a diagnostic tool it is often useful to be able to express invalid data.
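Making the check optional keeps that diagnostic use open. A minimal sketch, assuming Python's json module as a stand-in for an EDN parser (EDN includes JSON syntax): object_pairs_hook is a real json.loads parameter, while load_map and allow_duplicates are hypothetical names. It only catches keys that compare equal as Python strings, not the wider notion of "equivalent" keys.

```python
import json
from typing import Any


def _reject_duplicates(pairs: list[tuple[str, Any]]) -> dict[str, Any]:
    """object_pairs_hook that raises instead of silently keeping the last value."""
    result: dict[str, Any] = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate map key: {key!r}")
        result[key] = value
    return result


def load_map(text: str, allow_duplicates: bool = False) -> Any:
    """Hypothetical loader: strict by default, permissive for diagnostic use."""
    hook = None if allow_duplicates else _reject_duplicates
    return json.loads(text, object_pairs_hook=hook)
```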
> 
> * 0.2 {2}{14} Rejecting floating point range errors (as in 1e1000)
> 
> On the other hand, JSON has no literal for Infinity, and 1e1000 (which overflows binary64) is how it is commonly spelled, so it is not clear we would not be losing some backward compatibility here.
> 
> $ edn-abnf -tedn -e 1e1000
> Infinity
> $ edn-abnf -tjson -e Infinity
> 9e9999
> $ edn-abnf -tjson -e 1e1000
> 9e9999
> 
> (What?  9e9999 seems to be a platform library thing.)
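The overflow itself is easy to reproduce: any decimal literal beyond the largest binary64 value (about 1.8e308) rounds to Infinity, which is why 9e9999 works as a JSON spelling of Infinity. A minimal sketch of an optional range check, assuming Python floats as the binary64 target; parse_float_checked and its strict flag are hypothetical, and underflow to zero is not covered.

```python
import math
import sys


def parse_float_checked(literal: str, strict: bool = False) -> float:
    """Parse a literal; with strict=True, reject digit literals that overflow to Infinity."""
    value = float(literal)
    # "Infinity" itself contains no digits, so it is never treated as a range error.
    if strict and math.isinf(value) and any(ch.isdigit() for ch in literal):
        raise OverflowError(
            f"{literal!r} exceeds the binary64 range (max {sys.float_info.max!r})")
    return value
```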
> 
> 
> ## (1) Items that we can and should pick up in EDN
> 
> * 1.1 {9} enable \u{1F913} style hex escapes in strings
> 
> This makes a lot of sense.  PR #28 [1], merged.
> 
> [1]: https://github.com/cbor-wg/edn-literal/pull/28
> 
> (This also gave rise to a similar change in update-8610-grammar, PR #7 over in that repo [2].  We'll need to re-submit that draft to the IESG with that change.)
> 
> [2]: https://github.com/cbor-wg/update-8610-grammar/pull/7
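To illustrate the difference to JSON, here is a minimal sketch of decoding the escapes in a double-quoted string body, covering the JSON escapes plus the new \u{…} form. It is not the edn-abnf implementation; unescape is a hypothetical name, and the accepted escape set and code-point checks are assumptions to be checked against the ABNF in the draft. 🤓 needs the surrogate pair \ud83e\udd13 in JSON but just \u{1F913} here.

```python
import string

# JSON's single-character escapes.
_SIMPLE = {'"': '"', "\\": "\\", "/": "/", "b": "\b", "f": "\f",
           "n": "\n", "r": "\r", "t": "\t"}


def _hex(digits: str) -> int:
    """Convert hex digits to an int, rejecting empty or non-hex input."""
    if not digits or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"bad hex digits: {digits!r}")
    return int(digits, 16)


def unescape(body: str) -> str:
    """Decode backslash escapes in the body of a double-quoted string (hypothetical helper)."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        if i + 1 == len(body):
            raise ValueError("dangling backslash")
        kind = body[i + 1]
        if kind in _SIMPLE:
            out.append(_SIMPLE[kind])
            i += 2
        elif kind == "u" and body.startswith("{", i + 2):
            # New form: \u{hex} names the scalar value directly (no digit-count limit enforced).
            end = body.find("}", i + 3)
            if end < 0:
                raise ValueError("unterminated \\u{...} escape")
            cp = _hex(body[i + 3:end])
            if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
                raise ValueError(f"not a Unicode scalar value: U+{cp:X}")
            out.append(chr(cp))
            i = end + 1
        elif kind == "u":
            # JSON form: exactly four hex digits; non-BMP characters need a surrogate pair.
            digits = body[i + 2:i + 6]
            if len(digits) != 4:
                raise ValueError("truncated \\uXXXX escape")
            cp = _hex(digits)
            i += 6
            if 0xD800 <= cp <= 0xDBFF and body.startswith("\\u", i):
                low_digits = body[i + 2:i + 6]
                if len(low_digits) == 4 and all(c in string.hexdigits for c in low_digits):
                    low = int(low_digits, 16)
                    if 0xDC00 <= low <= 0xDFFF:
                        cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
                        i += 6
            if 0xD800 <= cp <= 0xDFFF:
                raise ValueError(f"unpaired surrogate U+{cp:X}")
            out.append(chr(cp))
        else:
            raise ValueError(f"unknown escape \\{kind}")
    return "".join(out)
```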
> 
> 
> * 1.2 {12} Allow .3 and 3. for decimal and hex float
> 
> (... as opposed to requiring 0.3 and 3.0).
> This better aligns the floating-point syntax with Python, C, C++, IEEE 754, ...
> PR #29, merged [3].
> 
> [3]: https://github.com/cbor-wg/edn-literal/pull/29
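As a quick check of the Python side of that alignment (Python semantics only, not a statement about the EDN ABNF), Python's built-in parsers already accept both elisions, including for hex floats via float.fromhex:

```python
# Decimal literals with an elided integer or fractional part.
assert float(".3") == 0.3
assert float("3.") == 3.0

# Hex floats: hex digits, optional ".", hex fraction, "p" and a decimal power of two.
assert float.fromhex("0x1.8p1") == 3.0  # 1.5 * 2**1
assert float.fromhex("0x.8p1") == 1.0   # 0.5 * 2**1, integer part elided
assert float.fromhex("0x3.p0") == 3.0   # fractional part elided

# The same value printed back in Python's canonical hex-float form.
print((3.0).hex())
```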
> 
> We still have an open comment asking that we give guidance on how to achieve this in an implementation on a platform whose own number-parsing library does not allow that elision; I answered that in [4], but I don’t know whether we want similar text in the document (it generally does not contain this kind of implementation note).
> 
> [4]: https://mailarchive.ietf.org/arch/msg/cbor/UWw3NiSXlf7ai6EtNXXyjHzu3TE
> 
> (While doing this, we added some references for hex floats, PR #27 [5], merged.)
> 
> [5]: https://github.com/cbor-wg/edn-literal/pull/27
> 
> (Note that adding a similar relaxation of the syntax to CDDL is more difficult as the dot is also used for control operators there.
> It also seems less necessary, as CDDL specs are rarely created from data dumps from other languages.
> I propose not to make this change there.)
> 
> 
> ## (2) Items that are not easy to pick up in EDN
> 
> * 2.1 {5} parsing bare words as text string map keys
> 
> Problem: false, null, true, undefined, Infinity, NaN can already occur in EDN as map keys, and this will need to remain possible going forward for backward compatibility.  If we create a list of exceptions for them, people would need to know that these are not actually barewords for text strings, creating a pitfall.
> (See also N.2 below.)
> 
> There are also some Unicode-related issues: categories like XID_Start/XID_Continue are not available in ABNF, and there are a few normalization issues (that we successfully ignore in quote-enclosed text strings).
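Both problems can be seen in a minimal sketch of a hypothetical bareword rule. It borrows Python's str.isidentifier, which implements XID_Start/XID_Continue, as the bareword test; bareword_key and UNDEFINED are hypothetical names, not a proposal for EDN.

```python
import math
import unicodedata

UNDEFINED = object()  # hypothetical stand-in for the CBOR simple value "undefined"

# Words that already have a meaning in EDN and can already appear as map keys.
_KEYWORDS = {"false": False, "true": True, "null": None, "undefined": UNDEFINED,
             "Infinity": math.inf, "NaN": math.nan}


def bareword_key(word: str):
    """Hypothetical rule: Python identifiers (XID_Start/XID_Continue), minus keywords."""
    if not word.isidentifier():
        raise ValueError(f"not a bareword: {word!r}")
    # The pitfall: these look like barewords but do not yield text strings.
    return _KEYWORDS.get(word, word)


# Normalization: these are different strings, but NFKC (which Python applies to its
# own identifiers) maps the first onto the second.
ligature_key = "\ufb01le"  # starts with U+FB01 LATIN SMALL LIGATURE FI
assert ligature_key != "file"
assert unicodedata.normalize("NFKC", ligature_key) == "file"
```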
> 
> * 2.2 {6a} elidable commas between map entries and array elements
> 
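> Problem: elidable commas conflict with the string concatenation syntax in Appendix G.4 of RFC 8610 [6], which is inspired by C [C: Section 6.4.5 of N2301; C++: Section 5.13.5 of N4860]: without mandatory commas, two adjacent strings could be either two elements or one concatenated string.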
> (We have elidable commas in CDDL, and while it is a nice feature in a specification language, it does sometimes create some pitfalls…)
> 
> [6]: https://www.rfc-editor.org/rfc/rfc8610.html#appendix-G.4
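The ambiguity can be shown with a minimal sketch that reads the same token list under both rules. The lexer is assumed to exist already; COMMA and read_array are hypothetical names.

```python
COMMA = object()  # sentinel for a "," token; every other token is a lexed text string


def read_array(tokens: list, elidable_commas: bool) -> list[str]:
    """Group array tokens under one of two rules for adjacent strings (hypothetical)."""
    elements: list[str] = []
    after_separator = True  # the opening "[" behaves like a separator
    for tok in tokens:
        if tok is COMMA:
            after_separator = True
            continue
        if after_separator or elidable_commas:
            elements.append(tok)  # start a new element
        else:
            elements[-1] += tok  # RFC 8610 G.4 style: concatenate with the previous string
        after_separator = False
    return elements


tokens = ["a", "b", COMMA, "c"]  # i.e. ["a" "b", "c"]
print(read_array(tokens, elidable_commas=False))
print(read_array(tokens, elidable_commas=True))

# Python itself, like C, concatenates adjacent string literals.
assert ["a" "b", "c"] == ["ab", "c"]
```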
> 
> * 2.3 {4} exactly emulate some comment syntax, such as that of JS
> 
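> Problem: We already have a quite functional comment syntax, with comments delimited by slashes.  Changing it to look exactly like another syntax would destroy backward compatibility.  The two also do not mix: in EDN, the lone slash on the second line below ends the comment early: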
> 
> /* JS-style long comment
>   /
> */
> 
> (It is easy enough to change this into a compatible comment, as in
> 
> /* EDN-style long comment
>   //
> */
> )
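A minimal sketch of slash-delimited comment removal shows why the first example fails and the second works. It assumes, for simplicity, that the text contains no string literals (a real parser must not treat "/" inside strings as a delimiter); strip_slash_comments is a hypothetical name.

```python
def strip_slash_comments(text: str) -> str:
    """Remove EDN /.../ comments, assuming (hypothetically) no string literals in text."""
    out = []
    in_comment = False
    for ch in text:
        if ch == "/":
            in_comment = not in_comment  # each "/" opens or closes a comment
        elif not in_comment:
            out.append(ch)
    if in_comment:
        raise ValueError("unterminated comment")
    return "".join(out)


js_style = "/* JS-style long comment\n  /\n*/"
edn_style = "/* EDN-style long comment\n  //\n*/"
```

For the JS-style text, the second slash closes the comment, leaves a stray "*", and the final slash opens a comment that never ends; in the EDN-style text, "//" closes one comment and opens the next.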
> 
> * 2.4 {7} single-quoted strings
> 
> We do have single-quoted strings, but semantically they are byte strings holding text (and byte strings don’t exist in JSON).
> (The “other” quote does not need to be escaped in both text and binary strings of this form.)
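The difference shows up in the CBOR encoding: "Hello" is a text string (major type 3), while 'Hello' is the byte string h'48656c6c6f' (major type 2) holding the same UTF-8 bytes. A minimal sketch; encode_short_string is a hypothetical helper that only handles lengths below 24 (one-byte head).

```python
def encode_short_string(value) -> bytes:
    """CBOR-encode a short text (str) or byte (bytes) string; lengths 0..23 only."""
    if isinstance(value, str):
        major, payload = 3, value.encode("utf-8")  # text string, like EDN "Hello"
    else:
        major, payload = 2, bytes(value)  # byte string, like EDN 'Hello'
    if len(payload) > 23:
        raise ValueError("this sketch only handles the one-byte length form")
    return bytes([(major << 5) | len(payload)]) + payload


print(encode_short_string("Hello").hex())
print(encode_short_string("Hello".encode("utf-8")).hex())
```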
> 
> ## (3) Items that we already have
> 
> * 3.0 {1} include all of I-JSON; 
>      {3} use UTF-8;
>      {4} comment syntax (vs 2.3 above);
> 
> * 3.1 {6b} allow trailing commas in maps and arrays
> 
> * 3.2 {8} multi-line strings
> 
> Note that the ABNF says that U+000D is ignored on input in strings, so the multi-line strings have platform-independent semantics.
> (Note also that we provide no way to dedent multi-line strings, which will make files that use them quite ugly.)
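The effect of the U+000D rule is simply that a file saved with CRLF line endings and one saved with LF yield the same string. A minimal sketch of just that rule (escape processing and any dedenting are out of scope; multiline_string_value is a hypothetical name):

```python
def multiline_string_value(raw_body: str) -> str:
    """Hypothetical helper: drop U+000D so CRLF and LF sources yield the same string."""
    return raw_body.replace("\r", "")


unix = "first line\nsecond line"
windows = "first line\r\nsecond line"
assert multiline_string_value(unix) == multiline_string_value(windows)
```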
> 
> * 3.3 {10} no limits on integer range
> 
> (JSON-the-syntax also already has this feature; it is just hard to use in an I-JSON environment.)
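For comparison, I-JSON (RFC 7493) advises staying within ±(2**53 - 1), CBOR's basic integer major types cover -2**64 to 2**64 - 1, and tags 2 and 3 (bignums) go beyond that. A minimal sketch; classify_integer is a hypothetical name.

```python
IJSON_LIMIT = 2**53 - 1  # RFC 7493: integers beyond this magnitude are not expected to interoperate


def classify_integer(n: int) -> tuple[str, bool]:
    """Return (CBOR representation, whether n is within the I-JSON range)."""
    if n >= 0:
        cbor = "major type 0" if n <= 2**64 - 1 else "tag 2 (unsigned bignum)"
    else:
        cbor = "major type 1" if n >= -(2**64) else "tag 3 (negative bignum)"
    return cbor, -IJSON_LIMIT <= n <= IJSON_LIMIT
```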
> 
> * 3.4 {11} hex numbers
> 
> (and octal, binary, and hex floats)
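A minimal sketch of reading the prefixed integer forms without depending on a platform's own literal rules (Python's int(s, 0), for instance, also accepts underscores and uppercase prefixes). Accepting only lowercase 0x/0o/0b is an assumption here, to be checked against the ABNF; parse_prefixed_int is a hypothetical name.

```python
import re

_PREFIXED = re.compile(r"([+-]?)0([xob])([0-9a-fA-F]+)")
_BASES = {"x": 16, "o": 8, "b": 2}


def parse_prefixed_int(literal: str) -> int:
    """Parse hex/octal/binary integers such as 0x2a, 0o52, 0b101010 (no size limit)."""
    m = _PREFIXED.fullmatch(literal)
    if m is None:
        raise ValueError(f"not a prefixed integer: {literal!r}")
    sign, kind, digits = m.groups()
    value = int(digits, _BASES[kind])  # int() rejects digits that are invalid for the base
    return -value if sign == "-" else value
```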
> 
> * 3.5 {13} Infinity, -Infinity, NaN
> 
> (We don’t have NaN payloads, though.)
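What is missing is the ability to write a particular NaN: in binary64, any value with all exponent bits set and a non-zero significand is a NaN, and the significand bits are its payload. A minimal sketch with Python's struct module; the helper names are hypothetical.

```python
import math
import struct


def float64_bits(x: float) -> int:
    """The 64 bits of x as an IEEE 754 binary64 value."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def float64_from_bits(bits: int) -> float:
    """The binary64 value with the given 64 bits."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


# Exponent all ones + non-zero significand = NaN; EDN's single NaN literal cannot
# distinguish between these two.
plain_nan = float64_from_bits(0x7FF8_0000_0000_0000)
payload_nan = float64_from_bits(0x7FF8_0000_0000_0001)
assert math.isnan(plain_nan) and math.isnan(payload_nan)
print(hex(float64_bits(payload_nan)))
```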
> 
> * 3.6 {15} + before numbers
> 
> * 3.8 {18} type extensibility
> 
> * 3.9 {19} date/time syntax
> 
> (Separate question: Do we need a CBOR tag for SEDATE-extended RFC 3339, i.e.,  IXDTF [7]?)
> 
> [7]: https://www.ietf.org/archive/id/draft-ietf-sedate-datetime-extended-11.html#name-internet-extended-date-time
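For reference, RFC 8949's two standard date/time tags express the same instant as 0("2013-03-21T20:04:00Z") (tag 0, RFC 3339 text) or 1(1363896240) (tag 1, epoch seconds). A minimal sketch converting between the two with Python's datetime; the function names are hypothetical, older Python versions accept only some RFC 3339 variants (for example, limited fractional-second digits), and IXDTF suffixes such as bracketed time-zone names are not handled.

```python
from datetime import datetime, timezone


def tag0_to_epoch(text: str) -> float:
    """RFC 3339 string (tag 0 content) -> epoch seconds (tag 1 content)."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on, so map it first.
    if text[-1:] in ("Z", "z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text).timestamp()


def epoch_to_tag0(seconds: float) -> str:
    """Epoch seconds -> RFC 3339 string in UTC, whole seconds only."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```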
> 
> * 3.7 {20} base64 encoded binary strings
> 
> Our way to cause “explicit interchange of binary data” is CBOR …
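Within EDN itself, base64 is already available as an encoding prefix for byte strings: RFC 8949's diagnostic notation gives h'12345678', b32'CI2FM6A' and b64'EjRWeA' as the same four bytes. A minimal sketch decoding the unpadded b64 and b32 forms with Python's base64 module; the helpers are hypothetical, and whether the base64url alphabet is also accepted is left open here.

```python
import base64


def _pad(text: str, block: int) -> str:
    """Restore the "=" padding that the EDN forms may omit."""
    return text + "=" * (-len(text) % block)


def decode_b64(text: str) -> bytes:
    """Contents of b64'...' (standard alphabet) -> bytes."""
    return base64.b64decode(_pad(text, 4), validate=True)


def decode_b32(text: str) -> bytes:
    """Contents of b32'...' -> bytes."""
    return base64.b32decode(_pad(text, 8))
```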
> 
> 
> ## (4) Items we likely do not want to do
> 
> * 4.1 {16} Any whitespace character with the Unicode class Zs SHALL be valid whitespace.
> 
> We are already quite accepting here, but U+00A0, U+2003, etc. (and possibly U+2028, though that is in category Zl, not Zs) would mostly add an unneeded Unicode dependency.
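The dependency is visible in a minimal sketch: a Zs-based rule needs the Unicode character database at runtime, and it still does not cover ASCII tab or newline, which are in category Cc. The names below are illustrative only.

```python
import unicodedata


def is_zs_whitespace(ch: str) -> bool:
    """True if ch is in Unicode general category Zs (space separator)."""
    return unicodedata.category(ch) == "Zs"


candidates = {
    "SPACE": " ",
    "NO-BREAK SPACE": "\u00a0",
    "EM SPACE": "\u2003",
    "LINE SEPARATOR": "\u2028",
    "CHARACTER TABULATION": "\t",
}
for name, ch in candidates.items():
    print(f"U+{ord(ch):04X} {name}: {unicodedata.category(ch)}")
```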
> 
> 
> ## (N) Even more random discussions (maybe too innovative)
> 
> * N.1 Allow leaving out the outer braces for a top-level map?
> 
> * N.2 Do an almost bareword syntax for map keys?
> 
> As in {
> :lat: 53,
> :lon: 8,
> }
> 
> * N.3 Allow RS instead of comma for sequences (or something even closer to RFC 7464 Section 2.1)?
> 
> As a hack, that could easily be added (triggered by the leading RS).
> While it would facilitate ingesting RFC 7464 style JSON sequences, it sounds less useful for the applications we see for EDN, which rely on printable characters plus newline.
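For reference, an RFC 7464 sequence puts RS (U+001E) before each JSON text and LF after it. A minimal sketch of a reader, assuming Python's json module for the individual texts; read_json_seq is a hypothetical name, and unlike what RFC 7464 recommends, it raises on a truncated text instead of skipping it.

```python
import json

RS = "\x1e"  # RFC 7464 record separator, written before each JSON text


def read_json_seq(data: str) -> list:
    """Parse an RFC 7464 JSON text sequence (RS, JSON text, LF) into a list."""
    items = []
    for chunk in data.split(RS)[1:]:  # anything before the first RS is not a record
        if not chunk.strip():
            continue  # consecutive RS characters: nothing to parse
        items.append(json.loads(chunk))  # the trailing LF is just whitespace to json
    return items
```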
> 
> 
> TL;DR:
> Lots of discussion, whittled down to two PRs (and a related one for CDDL).
> 
> Grüße, Carsten
>