Re: [Cbor] Disposition of items from "ESON" discussion that might potentially be useful for EDN
Carsten Bormann <cabo@tzi.org> Thu, 01 February 2024 12:39 UTC
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <E9CACAEB-0F42-4987-837B-2AD558D1CE27@tzi.org>
Date: Thu, 01 Feb 2024 13:39:37 +0100
Message-Id: <13C1207E-2800-40E0-82DB-276CA2F8A13A@tzi.org>
References: <E9CACAEB-0F42-4987-837B-2AD558D1CE27@tzi.org>
To: cbor@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/q7focX2JEZY6RTF8pJuQn_kIumc>

As I wasn’t hearing any more discussion, I submitted -08 with the changes discussed below.
This differs from -07 in the following way [1]:

## Technical

* Enable hex scalars \u{…} in strings, as in \u{1F913} for 🤓
* Allow eliding all-zeros before and after the decimal point (3. and .3)

## Editorial

* Added references (754, C, C++) explaining the origin of the floating point format
* Added a note about the use of platform number parsers that may not provide for the above
* (Note that the automatic submission today did not yet benefit from the fixes in xml2rfc 3.19.2, so the rendering for the subseries references is still incomplete.)

[1]: https://author-tools.ietf.org/iddiff?url1=draft-ietf-cbor-edn-literals-07&url2=draft-ietf-cbor-edn-literals-08&difftype=--html

edn-abnf 0.1.5 has the technical changes:

gem update edn-abnf

The source can also serve as an example of how to use a platform parser that does not allow eliding all-zeros before or after the decimal point.

$ edn-abnf -e '[.3, 3., "Hello \u{1f3bc}"]'
[0.3, 3.0, "Hello 🎼"]

The chairs can now check how we should handle this post-WGLC technical change.

Grüße, Carsten

> On 2024-01-28, at 14:51, Carsten Bormann <cabo@tzi.org> wrote:
>
> We said [-1] we were going to have a look at the discussion currently going on in the JSON mailing list about a more humane version of JSON.
>
> [-1]: https://mailarchive.ietf.org/arch/msg/cbor/iy8nJ3zrjOztUWbGBwSzWnl4MPE
>
> Here is my take-away from this.
> I propose we do a quick check if this has consensus and submit a -08 based on the PRs cited below.
> The proposed changes to EDN are implemented in the edn-abnf PoC tool (0.1.5).
> (The parallel change to CDDL is implemented in cddlc 0.2.5.)
>
> * Backward compatibility = old data, new system.
> * Forward compatibility = new data, old system.
> We generally strive for backward compatibility, but rarely achieve forward compatibility.
>
> Numbers in braces {4711} are items in Section 3 of [0].
>
> [0]: https://hildjj.github.io/draft-hildebrand-eson-requirements/draft-hildebrand-eson-requirements.html#name-requirements
>
>
> ## (0) Checks we could explicitly mandate
>
> I’m currently tending to consider these quality-of-implementation items.
> We do not have text for these yet.
>
> * 0.1 {17} Rejecting duplicate/equivalent keys
>
> On the other hand, when using CBOR diagnostic notation as a diagnostic tool it is often useful to be able to express invalid data.
>
> * 0.2 {2}{14} Rejecting floating point range errors (as in 1e1000)
>
> On the other hand, 1e1000 is JSON’s Infinity, so it is not clear we aren’t losing some backwards compatibility here.
>
> $ edn-abnf -tedn -e 1e1000
> Infinity
> $ edn-abnf -tjson -e Infinity
> 9e9999
> $ edn-abnf -tjson -e 1e1000
> 9e9999
>
> (What? 9e9999 seems to be a platform library thing.)
>
>
> ## (1) Items that we can and should pick up in EDN
>
> * 1.1 {9} enable \u{1F913} style hex escapes in strings
>
> This makes a lot of sense. PR #28 [1], merged.
>
> [1]: https://github.com/cbor-wg/edn-literal/pull/28
>
> (This also gave rise to a similar change in update-8610-grammar, PR #7 over in that repo [2]. We'll need to re-submit that draft to the IESG with that change.)
>
> [2]: https://github.com/cbor-wg/update-8610-grammar/pull/7
>
>
> * 1.2 {12} Allow .3 and 3. for decimal and hex float
>
> (... as opposed to requiring 0.3 and 3.0).
> Better aligns the floating point syntax with Python, C, C++, IEEE 754, ...
> PR #29, merged [3].
>
> [3]: https://github.com/cbor-wg/edn-literal/pull/29
>
> We still have an open comment that we should give guidance how to achieve that in an implementation on a platform that doesn’t allow that elision in its own library; I answered that in [4], but I don’t know whether we want to have similar text in the document (we generally don’t have this kind of implementation notes in this document).
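
One way to handle the elision on a stricter platform can be sketched as follows. This is only an illustration, not the approach taken in edn-abnf or in the message cited as [4]: the helper name `parse_decimal_float` is hypothetical, it covers decimal floats only (not hex floats), and Python's `float()` merely stands in for a platform parser that would reject `.3` and `3.` (Python's own parser happens to accept both).

```python
import re

# Hypothetical helper for illustration; not taken from edn-abnf.
# Matches: optional sign, optional integer digits, a point, optional
# fraction digits, and an optional exponent.
_POINTED = re.compile(r"^([+-]?)([0-9]*)\.([0-9]*)((?:[eE][+-]?[0-9]+)?)$")

def parse_decimal_float(text: str) -> float:
    """Pad an elided integer or fraction part with "0", then delegate
    to the platform parser (float() is a stand-in for a strict one)."""
    m = _POINTED.match(text)
    if m is None:
        # No decimal point involved: hand the text over unchanged.
        return float(text)
    sign, int_part, frac_part, exponent = m.groups()
    if not int_part and not frac_part:
        raise ValueError("a lone '.' is not a number")
    padded = f"{sign}{int_part or '0'}.{frac_part or '0'}{exponent}"
    return float(padded)
```

With this, `parse_decimal_float(".3")` hands `0.3` to the platform parser and `parse_decimal_float("3.")` hands it `3.0`, so the stricter parser never sees the elided form.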
>
> [4]: https://mailarchive.ietf.org/arch/msg/cbor/UWw3NiSXlf7ai6EtNXXyjHzu3TE
>
> (While doing this, we added some references for hex floats, PR #27 [5], merged.)
>
> [5]: https://github.com/cbor-wg/edn-literal/pull/27
>
> (Note that adding a similar relaxation of the syntax to CDDL is more difficult as the dot is also used for control operators there.
> It also seems less necessary, as CDDL specs are rarely created from data dumps from other languages.
> I propose not to make this change there.)
>
>
> ## (2) Items that are not easy to pick up in EDN
>
> * 2.1 {5} parsing bare words as text string map keys
>
> Problem: false, null, true, undefined, Infinity, NaN can occur in EDN as map keys, which will need to remain possible going forward for backwards compatibility. If we create a list of exceptions for them, people would need to know these are not actually barewords for strings, creating a pitfall.
> (See also N.2 below.)
>
> There are also some Unicode-related issues: categories like XID_Start/XID_Continue are not available in ABNF, and there are a few normalization issues (that we successfully ignore in quote-enclosed text strings).
>
> * 2.2 {6a} elidable commas between map entries and array elements
>
> Problem: elidable commas conflict with the C-inspired [Section 6.4.5 of N2301, C++: Section 5.13.5 of N4860] string concatenation syntax in Appendix G.4 of RFC 8610 [6].
> (We have elidable commas in CDDL, and while it is a nice feature in a specification language, it does sometimes create some pitfalls…)
>
> [6]: https://www.rfc-editor.org/rfc/rfc8610.html#appendix-G.4
>
> * 2.3 {4} exactly emulate some comment syntax, such as that of JS
>
> Problem: We already have a quite functional comment syntax.
> Changing this to look exactly like another syntax destroys backward compatibility:
>
> /* JS-style long comment
> /
> */
>
> (It is easy enough to change this into a compatible comment, as in
>
> /* EDN-style long comment
> //
> */
> )
>
> * 2.4 {7} single-quoted strings
>
> We do have single-quoted strings, but they are semantically text in binary strings (which don’t exist in JSON).
> (The “other” quote does not need to be escaped in both text and binary strings of this form.)
>
> ## (3) Items that we already have
>
> * 3.0 {1} include all of I-JSON;
>   {3} use UTF-8;
>   {4} comment syntax (vs 2.3 above);
>
> * 3.1 {6b} allow trailing commas in maps and arrays
>
> * 3.2 {8} multi-line strings
>
> Note that the ABNF says that U+000D is ignored on input in strings, so the multi-line strings have platform-independent semantics.
> (Note also that we provide no way to dedent multi-line strings, which will make files that use them quite ugly.)
>
> * 3.3 {10} no limits on integer range
>
> (JSON-the-syntax also already has this feature; it is just hard to use in an I-JSON environment.)
>
> * 3.4 {11} hex numbers
>
> (and octal, binary, and hex floats)
>
> * 3.5 {13} Infinity, -Infinity, NaN
>
> (We don’t have NaN payloads, though.)
>
> * 3.6 {15} + before numbers
>
> * 3.7 {18} type extensibility
>
> * 3.8 {19} date/time syntax
>
> (Separate question: Do we need a CBOR tag for SEDATE-extended RFC 3339, i.e., IXDTF [7]?)
>
> [7]: https://www.ietf.org/archive/id/draft-ietf-sedate-datetime-extended-11.html#name-internet-extended-date-time
>
> * 3.9 {20} base64 encoded binary strings
>
> Our way to cause “explicit interchange of binary data” is CBOR …
>
>
> ## (4) Items we likely do not want to do
>
> * 4.1 {16} Any whitespace character with the Unicode class Zs SHALL be valid whitespace.
>
> We are quite accepting here already, but U+00A0, U+2003 (and possibly U+2028, which is Zl though) etc. mostly add an unneeded Unicode dependency.
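
The Unicode dependency mentioned in 4.1 can be made concrete with a short check. This is a sketch only, using Python's standard `unicodedata` module; the helper name `is_zs` is hypothetical and no EDN implementation is implied to work this way.

```python
import unicodedata

def is_zs(ch: str) -> bool:
    """True if the character has Unicode general category Zs
    (space separator). Needs the Unicode character database."""
    return unicodedata.category(ch) == "Zs"

# Code points named in item 4.1, plus plain SPACE for comparison.
for cp in (0x0020, 0x00A0, 0x2003, 0x2028):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")
```

The point of the sketch is that answering "is this whitespace?" for requirement {16} takes a lookup in the character database rather than a fixed, short list of code points, and that U+2028 falls into a different category (Zl) than U+00A0 and U+2003.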
>
>
> ## (N) Even more random discussions (maybe too innovative)
>
> * N.1 Allow leaving out the outer braces for a top-level map?
>
> * N.2 Do an almost bareword syntax for map keys?
>
> As in {
>   :lat: 53,
>   :lon: 8,
> }
>
> * N.3 Allow RS instead of comma for sequences (or something even closer to RFC 7464 Section 2.1)?
>
> As a hack, that could easily be added (triggered by the leading RS).
> While it would facilitate ingesting RFC 7464 style JSON sequences, it sounds less useful for the applications we see for EDN, which are resting on printable characters plus newline.
>
>
> TL;DR:
> Lots of discussion, whittled down to two PRs (and a related one for CDDL).
>
> Grüße, Carsten
>
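
For readers who want to see what the \u{…} escape from item 1.1 amounts to, here is a minimal sketch. It is not the edn-abnf implementation: the function name `expand_braced_escapes` is hypothetical, rejecting surrogates and out-of-range values is an assumption of this sketch rather than something stated in the thread, and the regex deliberately ignores the interaction with other backslash escapes (an escaped backslash directly before `u{` would be mishandled).

```python
import re

# One or more hex digits inside \u{...}.
_BRACED = re.compile(r"\\u\{([0-9A-Fa-f]+)\}")

def expand_braced_escapes(s: str) -> str:
    """Replace each \\u{H...} with the Unicode scalar value it names.
    Simplified: does not track preceding backslash escapes."""
    def repl(m: re.Match) -> str:
        cp = int(m.group(1), 16)
        # Assumption of this sketch: only Unicode scalar values are
        # accepted, so surrogates and values past U+10FFFF are errors.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {m.group(0)}")
        return chr(cp)
    return _BRACED.sub(repl, s)
```

Applied to the raw text `Hello \u{1f3bc}` from the edn-abnf invocation in this message, the sketch yields `Hello 🎼`; a real parser would do this as part of its string grammar rather than as a separate pass.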