[Cbor] EDN redux

Carsten Bormann <cabo@tzi.org> Wed, 07 August 2024 10:51 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/gTeu3xvpeAZGrc8RGlA2L9_PEf8>

We didn't have enough time in Vancouver to complete the discussion
about the EDN ABNF in draft-ietf-cbor-edn-literals.

## Explanation: Analogy

In Vancouver, Christian Amsüss had a good explanation in the form of
an analogy that I would like to try repeating here.

For JSON, RFC 8259 provides an ABNF.
This ABNF describes the grammar for valid JSON texts.
These JSON texts are typically parsed by a JSON parser to yield a
(parsed) JSON value.
The JSON value is not described by RFC 8259 in any formal way, but
intuition (= previous knowledge) about programming languages with
built-in data representation formats (and specifically JavaScript)
makes it relatively clear what the JSON value looks like and how it is
obtained from the parsed JSON text.

An application may want to transport a URI in a JSON string.
RFC 3986 provides an ABNF grammar for URIs.
(This was done before JSON was written up, but the sequence here isn't
relevant; it just shows the independence/loose coupling achieved.)

To use URIs in a JSON text, the application parses/validates the JSON
text (e.g., using the ABNF in RFC 8259), constructs the JSON value
(which embeds a JSON string) based on the above intuition (ABNF
doesn't help with that), and then parses/validates the JSON string
(e.g., using the ABNF in RFC 3986).
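
As a rough illustration of this two-stage procedure, here is a minimal Python sketch. It is not taken from any specification: the function name `extract_uri` and the member name `href` are made up for the example, and the second stage uses the component-splitting regular expression from RFC 3986, Appendix B, instead of a full ABNF-based validator.

```python
import json
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
# It separates a URI reference into its parts; it is NOT a full
# validator for the ABNF in RFC 3986.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def extract_uri(json_text, key):
    """Hypothetical helper: parse JSON first, then the URI inside it."""
    # Stage 1: parse the JSON text and build the JSON value.
    # The JSON parser knows nothing about URIs.
    value = json.loads(json_text)
    content = value[key]
    if not isinstance(content, str):
        raise TypeError("expected a JSON string")

    # Stage 2: parse the *content* of the JSON string as a URI.
    # The URI parser knows nothing about JSON escapes.
    m = URI_RE.match(content)
    if m.group(2) is None:
        raise ValueError("no scheme, so not an absolute URI")
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
    }


# The JSON text uses the (legal) escape \/ for the slash; after stage 1
# the string content is a plain URI, so stage 2 never sees the escapes.
parts = extract_uri(r'{"href": "http:\/\/example.com\/a?b=1"}', "href")
```

The only coupling between the two grammars is the string content handed from stage 1 to stage 2.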

This procedure is so obvious I'm not aware of any document that even
describes how the two ABNF grammars interact (they don't, directly,
only via the intuition-based transformation of the parsed JSON text
into a JSON value).

## Applying the analogy to EDN

We would like to do exactly the same with application-oriented
literals in EDN: the literal is represented in an sqstr (~ JSON text),
which is parsed/validated (e.g., using the ABNF in Section 4.1 of EDN)
and transformed to the content of the sqstr (the same intuition works
here right away, as sqstr is mostly adapted from JSON strings); that
content is then parsed/validated (e.g., using the ABNF in 4.2.x or a
separate document) for the specific application-oriented literal.
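
The same separation can be sketched for EDN in a few lines of Python. This is only an illustration under simplifying assumptions, not the grammar from the draft: the regular expression `APP_STRING_RE`, the helper names, and the handler table are hypothetical, only the escapes `\'` and `\\` are handled, and `datetime.fromisoformat` stands in for a real RFC 3339 parser.

```python
import re
from datetime import datetime

# Stage 1 (simplified assumption): recognize prefix'content', where the
# content may only use the escapes \' and \\ .  The real sqstr grammar
# in the draft is richer than this.
APP_STRING_RE = re.compile(r"^([a-z][a-z0-9]*)'((?:[^'\\]|\\['\\])*)'$")


def unescape_sqstr(raw):
    """Turn the raw sqstr text into its content (simplified)."""
    return re.sub(r"\\(['\\])", r"\1", raw)


def parse_dt(content):
    """Stage 2 for a dt literal: date-time text to epoch seconds.

    fromisoformat() is used as a stand-in for an RFC 3339 parser.
    """
    moment = datetime.fromisoformat(content.replace("Z", "+00:00"))
    return int(moment.timestamp())


# Hypothetical registry: one independent stage-2 parser per prefix.
HANDLERS = {"dt": parse_dt}


def parse_app_string(edn_text):
    # Stage 1: generic, knows nothing about any specific literal.
    m = APP_STRING_RE.match(edn_text)
    if m is None:
        raise ValueError("not an application-oriented literal")
    prefix, raw = m.groups()
    content = unescape_sqstr(raw)

    # Stage 2: specific to the prefix, knows nothing about sqstr escapes.
    handler = HANDLERS.get(prefix)
    if handler is None:
        raise ValueError("unknown application-extension prefix: " + prefix)
    return handler(content)


result = parse_app_string("dt'1969-07-21T02:56:16Z'")
```

Supporting a new application-oriented literal then means adding one entry to the handler table; the stage-1 code does not change.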

What PR #49 proposes would be as follows in the JSON/URI example:

* Define ABNF that merges the ABNF grammar for strings in JSON texts
  (RFC 8259), the transformation of parsed input into a JSON value,
  and the ABNF grammar for URIs (RFC 3986), into a single piece of
  "single-pass" ABNF.

* Modify the ABNF grammar for JSON to include this new ABNF.

* Build a new JSON parser that includes this modified grammar; make
  sure that such a modified JSON parser is available wherever
  JSON is ingested with URIs to be included in JSON strings.

This clearly *can* be done.

It is not what we want to do; we are quite happy with the way JSON is
processed today.
(EDN is an extended form of JSON, by the way, so the analogy is
actually rather close here.)

## Side discussions

The side discussion on whether ABNF is used correctly in the WG
specification is somewhat sophistic.
I can't follow this discussion, as it is not about anything that is
material for the use of ABNF and its ecosystem for EDN.

We cited RFC 8288 as an example that corrected the earlier approach of
the spec it obsoleted (RFC 5988), which was conflating the base
grammar of Web links with the grammars for each specific parameter,
much as PR #49 does.

That approach created interoperability problems (as many of us noticed
when implementing RFC 6690, which is based on the single-pass grammar
attempt in RFC 5988).

### Side-side discussion

RFC 8288 describes individual parameter grammars without writing out
the rule name and equals sign required by the rule production in
ABNF's own grammar:

   rule           =  rulename defined-as elements c-nl

...essentially only keeping elements and c-nl; one can discuss whether
that description technique strictly follows STD 68.
We are not doing this (all ABNF grammars in the WG spec parse
correctly using the grammar for ABNF in RFC 5234), so this point is
not relevant for this discussion.

## Conclusion

I'm proposing that we now decide to keep the approach for describing
EDN that maintains separation between the parsing and transformation
of sqstr on the one hand and the grammars for individual
application-oriented literals on the other.
This will allow us to use ABNF grammars such as those in RFC 3986 and
RFC 3339 as well as those of future application-oriented literals
already described in RFCs, unchanged.

With that, I'm asking to stop blocking on the proposal made in PR #49
and continue processing the document.

Grüße, Carsten