Re: [Cbor] Review of draft-ietf-cbor-edn-literals-06

Carsten Bormann <cabo@tzi.org> Thu, 21 December 2023 14:06 UTC

Return-Path: <cabo@tzi.org>
X-Original-To: cbor@ietfa.amsl.com
Delivered-To: cbor@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 19808C151073; Thu, 21 Dec 2023 06:06:45 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.906
X-Spam-Level:
X-Spam-Status: No, score=-1.906 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, URIBL_BLOCKED=0.001, URIBL_DBL_BLOCKED_OPENDNS=0.001, URIBL_ZEN_BLOCKED_OPENDNS=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id g5XlDplUccvk; Thu, 21 Dec 2023 06:06:40 -0800 (PST)
Received: from smtp.zfn.uni-bremen.de (smtp.zfn.uni-bremen.de [IPv6:2001:638:708:32::21]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id E9975C14CE3F; Thu, 21 Dec 2023 06:06:30 -0800 (PST)
Received: from eduroam-0160.wlan.uni-bremen.de (eduroam-0160.wlan.uni-bremen.de [134.102.16.160]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.zfn.uni-bremen.de (Postfix) with ESMTPSA id 4Swsfp3LbGzDCfM; Thu, 21 Dec 2023 15:06:26 +0100 (CET)
Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.7\))
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <ZXrsy__RRV2GVAdI@hephaistos.amsuess.com>
Date: Thu, 21 Dec 2023 15:06:22 +0100
Cc: cbor@ietf.org, draft-ietf-cbor-edn-literals@ietf.org
X-Mao-Original-Outgoing-Id: 724860382.925505-bc9bdc9b58c47f8e37751723201ee2db
Content-Transfer-Encoding: quoted-printable
Message-Id: <2CBBDA4A-9636-46A1-B3F4-CA1FAC2C78CA@tzi.org>
References: <ZXrsy__RRV2GVAdI@hephaistos.amsuess.com>
To: Christian Amsüss <christian@amsuess.com>
X-Mailer: Apple Mail (2.3608.120.23.2.7)
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/AUNv_K0iSuWRwX4OTnZT58cu1tY>
Subject: Re: [Cbor] Review of draft-ietf-cbor-edn-literals-06
X-BeenThere: cbor@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: "Concise Binary Object Representation \(CBOR\)" <cbor.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/cbor>, <mailto:cbor-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cbor/>
List-Post: <mailto:cbor@ietf.org>
List-Help: <mailto:cbor-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/cbor>, <mailto:cbor-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Dec 2023 14:06:45 -0000

On 2023-12-14, at 12:53, Christian Amsüss <christian@amsuess.com> wrote:
> 
> Hello Carsten, hello group,
> 
> as part of shepherding this document, I'm also giving the full document
> one more read.

Thank you for this one as well!

I have generated a PR with multiple commits:

https://github.com/cbor-wg/edn-literal/pull/26

https://cbor-wg.github.io/edn-literal/shepherd-review/draft-ietf-cbor-edn-literals.html

https://author-tools.ietf.org/api/iddiff?doc_1=draft-ietf-cbor-edn-literals&url_2=https://cbor-wg.github.io/edn-literal/shepherd-review/draft-ietf-cbor-edn-literals.txt

Details below.

Grüße, Carsten

> It's in good over-all shape, and most comments are merely editorial.
> 
> * The introducction still reads like it adds application literals and
>  then slaps an ABNF on it. There are other changes (such as permitting
>  trailing commas, end-of line comments and the extensions to encoding
>  indicators and how to even apply them to some things), which would go
>  completely unnoticed by readers. I have no preference on whether those
>  would be enumerated here completely, or just summarizes as ~"closing
>  some gaps and fixing some details".

I went with the very brief " A few further additions close some gaps in usability.”.

5c87a56 * Shepherd review: Mention further additions

> * 'there is little expectation of "roundtripping"': There is no
>  expectation of roundtripping EDN-CBOR-EDN. A CBOR-EDN-CBOR round trip
>  is possible provided any non-default encoding choices are given
>  explicitly (at least if they're non-default), and map sequences are
>  preserved (even though they have no semantic impact).

I made this a bit more redundant now while limiting the paragraph to EDN-CBOR-EDN:

506efd7 * Shepherd review: *No* roundtripping EDN-CBOR-EDN

> * In addition, this document finally registers a media type identifier":
>  I don't think that this sentence belongs in section 2.

Moved it to Section 1.

6e30fd2 * Shepherd review: Move mention of media type registration

> * dt: time-tag will be done before this document.

As will be the SEDATE format, IXDTF.

> Does the presence of
>  extensions in a dt literal imply that it should be used?
> 
>  I.e., are those to be accepted and equivalent?
> 
>  DT"1996-12-19T16:39:57-08:00[America/Los_Angeles][u-ca=hebrew]"
>  1001({ 1: 851042397,
>     -10: "America/Los_Angeles",
>     -11: { "u-ca": "hebrew" }
>  })
> 
>  (The lowercase form would be unusable with those literals).

The lower case form could simply spit out the map inside the 1001 tag.
(Not so useful.)

More interesting is when the EDN ingester decides to move from 1 to 1001, and what choices are made there.
E.g., DT"1996-12-19T16:39:57.471108151234" cannot properly represented in tag 1, but can be in tag 1001 with picoseconds (key -12), or possibly extended decimals (key 4).
Not sure we are prepared to address this complexity right now; I’d expect we do EDN literals for 1001..1003 once we have a better understanding of the use cases.

So I think we can stick with the enticing simplicity of what dt/DT is now (stick with RFC 3339/~tdate input, always generate a Tag number 1, float/int decided by the presence of a decimal point).

>  It would sound like nice closure, but I'd like to avoid wasting time
>  -- can this be done with just a swap of references and a copy-paste of
>  into A.2.3, or is this harder?

I think it is harder.
We can always define one or more extensions for IXDTF-based notation of time-tags, once we have a better understanding of the use cases.

> * IP extension: Anticipating what I'd put into an ARTART review: Please
>  consider using IPv6 for the examples in the prose part. The examples
>  there are of prefixes anyway, so the examples wouldn't even get much
>  longer.

I think we need to balance this need with the need for the examples to immediately appeal to users of this specification, which may not all have arived in v6-land yet.
So I simply put in v6 *and* v4 (v6 first, of course).

4786f47 * Shepherd review: Add IPv6 examples to ip'' prose

> * 3.1 has language about this being an error and processing stopping
>  except in certain cases. (Which, as an implementer, I'd read implement
>  that as diag2cbor failing on encounter, and cbor2diag rendering them
>  as 999(...), unless they get some flag like --use-stand-in-tags).
>  Should the same apply to 3.2? (I do think so, and I'd implement them
>  behind the same switch).

Not sure that it needs to be the same switch, but otherwise fully agree.

7017fa7 * Shepherd review: Clarify that ingesting ellipses usually is an error

> * 3.2 states that EDN supports the use of an ellipsis, but while I do
>  remember that this was added, I found it neither in 8610 appendix G
>  nor in 8949. Where was that originally described?

PR #19/PR #20.
I worked the fix for this misunderstanding-waiting-to-happen into the previous commit.

> * 3.2, "elements of which are alternating": For interoperability it may
>  make sense to place a "usually" here, lest someone does [1, 2, ...,
>  ..., 3] and another implementation trips over it. (One could go
>  overboard with requirements that there needs to be at least one
>  888(null) in it, and that no two non-888 items can be adjacent, but
>  that'd be overkill for a diagnostic tool).

[1, 2, …, 3] is fine.
888([1, 2, 888(null), 3]) is not fine — the fragments need to be strings, and they do need to alternate.

$ edn-abnf -e "h'abcd' ... ... 'efgh'" -tedn
888([h'ABCD', 888(null), h'65666768’])
$ edn-abnf -e "... h'abcd' ... ... 'efgh' 'ijk'" -tedn
888([888(null), h'ABCD', 888(null), h'65666768696A6B’])

So I think the text says the right thing here.

> * 4.3 media type: Intended usage says COMMON. Given this is explicitly
>  not intended for interchange, it may make sense to put LIMITED USE
>  here -- nobody should expect that their peer can process it.

OK, that does make some sense.
But then, people are starting to use EDN for config files etc…

I wrote up a “Restrictions on usage” entry in

27ed7cc * Shepherd review: media type: LIMITED USE/Restrictions on usage

It is interesting to note that there are dozens of media types that combine

>> Intended usage: LIMITED USE
>> Restrictions on usage: none

…while RFC 6838 is quite explicit about using "Restrictions on usage” to explain "LIMITED USE”.

I’m wondering what kinds of confusion this change will cause, but I do like the opportunity for explaining that EDN is not a wholesale replacement for CBOR.

> * Remark: The media type (and content format) registrations go for a
>  single format, where CBOR is used both as application/cbor and as
>  application/...+cbor. This makes sense considering that any tool that
>  converts cbor2diag and knows the underlying format may easily add
>  sensible annotations (say, names to keys), and thus do away with the
>  need to know the precise format. In ambiguous situations, a converter
>  may prepend a comment on the original format into the file.
> 
>  (I've briefly pondered whether that makes sense as an optional
>  parameter, but probably not, especially because it complicates the
>  content-format situation).

I take the “remark:” remark as saying that there is no need for action.
(Defining a +cbor-diagnostic is probably not what we want to do…)

> * References:
> 
>  * Does 9165 need to be normative? It is referenced when talking about
>    what CDDL is (but we don't even use CDDL here, let alone the
>    extensions), and as a source reference for the date-time production,
>    but that is copied over and originatres from 3339 which is
>    normative.

Agree.

923e1b6 * Shepherd review: Make RFC 9165 informative.

>    Same goes for 8610-grammar.

It is somewhat weird that 8610 is normative and 8610-grammar is not, but that is because of Appendix G of 8610.

497ecfc * Shepherd review: Make update-8610-grammar informative.

>  * STD90 probably does indeed need to be normative because "the
>    escaping rules of JSON are applied" in section 2. It may be possible
>    to remove that normative reference (which I find a bit misleading,
>    you don't need to know JSON to implement this) by rephrasing that
>    "are applied" to refer to the existing STD94 string processing rules
>    (that reference can still say that those are inherited from JSON).

Well, EDN is extended JSON, so when implementing EDN you will apply your knowledge of JSON; less so with this document, as the meat is in the ABNF now.
We somehow made JSON informative in RFC 8949, though; so I’ll agree that we should follow suit here.

80193bb shepherd-review Shepherd review: Make STD90 informative.

>  * IANA registries are inconsistently normative and informative
>    references. I have no strong opinion what they should be, but
>    consistent.

I made them all normative, as you need to understand Content-Formats to implement them…

3fda126 * Shepherd review: Make all IANA referencs normative.

The question of whether IANA references should be normative is often rehashed during IESG review, so we’ll likely have that discussion again...

> * seq vs. item spaces: The ABNF allows space (including comments) in a
>  file when the root production is seq, but not when representing a
>  single item (unless that item is conventionally transported in a
>  sequence of exactly one element).
> 
>  Is the expectation that EDN implementations, when converting
>  diagnostic notation to a single CBOR item, unconditionally process the
>  diagnostic as a CBOR stream, and then do an extra check that the
>  stream contains just a single item?

This is where I’d like to nudge implementations toward.

> If not, it may make sense to place
>  the S around the item variants rather than in the seq.

The empty sequence still needs an S.

As a matter of consistency, with the exception of “seq” the current ABNF generally keeps each single production clean (no S around right-hand-sides).
I’d like to keep “item” clean; otherwise half the ABNF changes (risky).
So I added a

one-item        = S item S

as a minimal change.

dd43475 * Shepherd review: add "one-item" as a top-level for a single-item seq


> * Note that I have not mentally processed every single part of the ABNF,
>  I trust that this has been tested with parser imlementations that
>  could then process EDN.

I tested this with my implementation edn-abnf.
I think we would benefit from further implementations closely tracking this, but we can do another solicitation for those once the draft is in IESG.

> * Remark on the joining of chunked strings containing app-strings: Had
>  we had this from the onset, and had we not had the JSON history, this
>  might have replaced the backslash escapes completely:
> 
>  "Hello \ud83c\udf0d" could hve just been "Hello " u+"1f30d" (maybe
>  also "Hello " 127757_u), or (taking from Python's "\N{name}" notation)
>  "Hello " un"Earth globe Europe-Africa"
> 
>  I don't think that this warrants a remark in the text, but I find it a
>  noteworthy observation for the design of future formats with strings.

We should go ahead and define u’’ now in a new document…
U’’ could then be the name lookup variant...

> * such as hj'', h'0815: Missing trailing ' after 0815

c3c6d95 * Shepherd review: missing quote

> * A.2.4: What is PEG?

https://www.rfc-editor.org/rfc/rfc8610#appendix-A

c6e7410 * Shepherd review: explain PEG

> * B "Separator character.": "doesn't allow a trailing comma" is not
>  correct any more.

Indeed.

62b93da * Shepherd review: rectify separator/terminator discussion