[Cbor] Re: draft-ietf-cbor-edn-literals-10 implementation notes

Joe Hildebrand <hildjj@cursive.net> Sat, 17 August 2024 19:10 UTC

From: Joe Hildebrand <hildjj@cursive.net>
In-Reply-To: <B6B33BBA-2844-4FBF-B93E-9BC63A76CD47@tzi.org>
Date: Sat, 17 Aug 2024 13:10:02 -0600
Message-Id: <D8B4A36A-ADA7-46F0-98CF-4F1DC2AEFEEE@cursive.net>
References: <FFD7DB6F-2313-4A64-B434-110B1C524D0C@cursive.net> <4DDB6F09-D038-4A13-80F3-A4C13F3EF529@tzi.org> <2E46E3B5-6F88-49D0-AD75-10CF611BBFB5@cursive.net> <B6B33BBA-2844-4FBF-B93E-9BC63A76CD47@tzi.org>
To: Carsten Bormann <cabo@tzi.org>, Rohan Mahy <rohan.mahy@gmail.com>
CC: CBOR <cbor@ietf.org>
Subject: [Cbor] Re: draft-ietf-cbor-edn-literals-10 implementation notes
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/Rg_FOPlW3jMqiequfvm4Lcu4DBk>

Note to Rohan: look for the part about PR#49.

(elided places where we've come to conclusions)


> On Aug 17, 2024, at 10:58 AM, Carsten Bormann <cabo@tzi.org> wrote:
> 
>>>> - s4.1 It should be an error to mix app-strings and strings, or app-strings of multiple types.  The text at the end of the section starting with "Some of the strings may be app-strings..." is not proscriptive enough.
>>> 
>>> This is a reasonable conclusion when you just look at the set of application-extensions that are in the document, but it breaks down when constructing strings from components that can be described by app-strings (something like a JWT that mixes base64-encoded strings with delimiters such as “.” and “~”).
>>> (The example would need to build the base64-encoded parts using application-oriented literals, for which we don’t have an application-oriented literal — b64 goes the other way around.  I find it quite likely that we’ll have application-oriented literals for JWT and CWT over time.)
>> 
>> Can we at least say that the output types of the app-strings need to be compatible enough to concatenate?
> 
> The last bullet in 4.1 tries to do that.

>> Take your 'f' approach from above; what would f'2' + f'3' mean?
> 
> " • Some of the strings may be app-strings. If the type of the app-string is an actual string, joining of chunked strings occurs as with directly notated strings; otherwise the occurrence of more than one app-string or an app-string together with a directly notated string cannot be processed.”
> 
> I read the answer as “cannot be processed”, which was a way to say “not allowed”.  Wording fixes appreciated.

After reading G.4 of RFC 8610 several times, I think I may finally understand.  However, I feel strongly that all of G.4 needs to be pulled into this document and made clearer, with "+" in the correct places.  I'm willing to help with that effort.

This is an example of why relying on extending 8610 does not work for me: the syntax here has *changed*; it is not merely being extended.
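To pin down how I currently read the last bullet of 4.1, here is a minimal Python sketch. It assumes each chunk between "+" signs has already been evaluated to a Python value (str for a text string, bytes for a byte string, anything else for a non-string app-string result such as a number). The names `join_chunks` and `ConcatError` are hypothetical, and the text/byte mixing policy is an assumption modelled on the RFC 8610 G.4 examples, not wording taken from the draft.

```python
class ConcatError(ValueError):
    """Raised when chunks joined with "+" cannot be combined."""


def join_chunks(chunks: list) -> object:
    """Combine already-evaluated chunks that were separated by "+" (sketch)."""
    if not chunks:
        raise ConcatError("nothing to join")
    if len(chunks) == 1:
        # A lone app-string may yield any type, e.g. a number.
        return chunks[0]
    if not all(isinstance(c, (str, bytes)) for c in chunks):
        # More than one chunk, at least one of them not a string:
        # the quoted 4.1 text says this "cannot be processed".
        raise ConcatError("non-string app-string in a concatenation")
    # Assumption (modelled on RFC 8610 G.4): join at the byte level and
    # produce a text string if any chunk was a text string.
    joined = b"".join(c.encode("utf-8") if isinstance(c, str) else c for c in chunks)
    if any(isinstance(c, str) for c in chunks):
        return joined.decode("utf-8")  # UnicodeDecodeError if not valid UTF-8
    return joined
```

Under this reading, a hypothetical f'2' + f'3' (two float-valued chunks) raises ConcatError, while "Hello" + h'20' + "world" yields "Hello world".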

>> OK.  It would be nice to have just a few words in this spec about the concatenation, then, particularly because the commas are now optional, unlike in 8610.
> 
> I would like to avoid the term concatenation, as we are not concatenating data-model level strings here, but we are assembling CBOR sequences (sequences of encoded data items).  This is briefly described in Section 4.2 of RFC 8742; this doesn’t outright extend EDN, but is the basis for allowing seq at the top level of the current ABNF.

RFC 8742 is not referenced by this doc.  This doc should contain a complete description of at least the semantic notions that an implementer needs in order to process the ABNF.
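For readers who, like me, don't have RFC 8742 at hand: the distinction Carsten draws is visible at the byte level. Top-level EDN such as `1, "a"` would denote a CBOR sequence, which is just the encoded items placed back to back, whereas `[1, "a"]` is a single array with its own head. The sketch below covers only integers, byte strings and text strings, and its function names are hypothetical.

```python
import struct


def _head(major: int, arg: int) -> bytes:
    """Encode a CBOR initial byte plus argument (RFC 8949, Section 3)."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for ai, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if arg < 1 << (8 * struct.calcsize(fmt)):
            return bytes([major << 5 | ai]) + struct.pack(fmt, arg)
    raise ValueError("argument does not fit in 64 bits")


def encode(item) -> bytes:
    """Encode ints, byte strings and text strings only (illustrative subset)."""
    if isinstance(item, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(item, int):
        return _head(0, item) if item >= 0 else _head(1, -1 - item)
    if isinstance(item, bytes):
        return _head(2, len(item)) + item
    if isinstance(item, str):
        data = item.encode("utf-8")
        return _head(3, len(data)) + data
    raise TypeError(f"unsupported type: {type(item).__name__}")


def encode_sequence(items) -> bytes:
    """A CBOR sequence (RFC 8742) is the plain concatenation of encoded items."""
    return b"".join(encode(item) for item in items)


def encode_array(items) -> bytes:
    """An array, by contrast, adds a major-type-4 head carrying the item count."""
    return _head(4, len(items)) + encode_sequence(items)
```

Here `encode_sequence([1, "a"])` gives the bytes 01 61 61, and `encode_array([1, "a"])` gives 82 01 61 61.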

>> There's not a *lot* of overhead, but there's enough that it will become noticeable, I think.  Therefore, I'm going to interpolate the app-string grammars into my larger grammar.  I hope I do it in a way that is similar enough to everyone else that it will interoperate and still allow extensibility correctly.  Maybe I'll get it right.
> 
> I think that is the point of the discussion about keeping some of PR #49 around, either as an alternative (e.g., in an appendix) or as some wiki.

Then let me state for the record that I agree with Rohan.  I reserve the right to find nits in PR#49 until after I've implemented it, but it looks more or less on the right track to me.
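On extensibility: whether or not the built-in app-string grammars are interpolated into the main grammar, an implementation still needs a generic path for prefixes it doesn't know. One possible shape, sketched in Python with hypothetical names, is a registry keyed by app-prefix. The `h` handler ignores whitespace but not EDN comments, and the `dt` handler is a simplification that accepts only whole-second UTC times and returns integer seconds since 1970; treat both as assumptions rather than the draft's definitions.

```python
import calendar
from datetime import datetime
from typing import Callable, Dict

# Hypothetical registry mapping an app-prefix to a handler for its string body.
APP_STRING_HANDLERS: Dict[str, Callable[[str], object]] = {}


def app_prefix(prefix: str):
    """Decorator that registers a handler for prefix'...'."""
    def register(fn: Callable[[str], object]) -> Callable[[str], object]:
        APP_STRING_HANDLERS[prefix] = fn
        return fn
    return register


@app_prefix("h")
def _hex(body: str) -> bytes:
    # Simplification: whitespace is ignored; comments inside h'' are not handled.
    return bytes.fromhex("".join(body.split()))


@app_prefix("dt")
def _date_time(body: str) -> int:
    # Simplification: only whole-second timestamps ending in "Z" are accepted.
    parsed = datetime.strptime(body, "%Y-%m-%dT%H:%M:%SZ")
    return calendar.timegm(parsed.timetuple())


def eval_app_string(prefix: str, body: str) -> object:
    handler = APP_STRING_HANDLERS.get(prefix)
    if handler is None:
        # Assumption: unknown prefixes are reported rather than guessed at.
        raise ValueError(f"unsupported app-string prefix: {prefix}'...'")
    return handler(body)
```

With this sketch, dt'1969-07-21T02:56:16Z' evaluates to -14159024, and an unregistered prefix raises an error instead of being silently accepted.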


>>>> - (nit) I'd prefer basenumber to be split into 3 or 4 rules, one for each base, since each needs special processing.
>>> 
>>> I personally like the more compact way, but I’ll put up a change proposal in a PR.
>> 
>> It's not worth the change if others had tools available to them to process 0x, 0o, and 0b strings (including hex floats) all in one place.  Maybe I could have gotten all of them except hex floats if I'd been willing to use eval().  If we are going to make a change for this, I'd suggest hex floats be separated from hex ints as well, since they are more likely to need custom code.
> 
> Done in an update to #58.
> As I wrote, my code is now trivial, if a bit more repetitive.

Nod.  Like I said, it's a nit that I have worked around by doing this separation in my own version of the grammar.  The ABNF doesn't need to change unless someone else thinks it's useful.  

I strongly agree with the other change in PR#59 though.  "nonfin" is a much better name for the rule that includes NaN.
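For comparison, this is roughly what the split looks like on the consuming side, in a Python sketch with hypothetical names. The regular expressions only approximate the ABNF (for example, I have not checked whether the draft allows uppercase X, O, B or P), and hex floats get their own pattern because they need custom conversion; Python's `float.fromhex` happens to accept that syntax directly. The nonfin literals are checked first.

```python
import math
import re

# Hypothetical per-base rules replacing a single "basenumber" rule.
# Hex floats get their own pattern because they need custom conversion.
HEXFLOAT = re.compile(
    r"-?0[xX](?:[0-9a-fA-F]+(?:\.[0-9a-fA-F]*)?|\.[0-9a-fA-F]+)[pP][+-]?[0-9]+"
)
INT_RULES = (
    (re.compile(r"-?0[xX][0-9a-fA-F]+"), 16),
    (re.compile(r"-?0[oO][0-7]+"), 8),
    (re.compile(r"-?0[bB][01]+"), 2),
)
# The "nonfin" rule: the non-finite float literals, including NaN.
NONFIN = {"Infinity": math.inf, "-Infinity": -math.inf, "NaN": math.nan}


def parse_number_token(text: str):
    if text in NONFIN:
        return NONFIN[text]
    if HEXFLOAT.fullmatch(text):
        return float.fromhex(text)  # accepts the 0x...p... form
    for pattern, base in INT_RULES:
        if pattern.fullmatch(text):
            negative = text.startswith("-")
            digits = text[3:] if negative else text[2:]  # drop sign and prefix
            value = int(digits, base)
            return -value if negative else value
    raise ValueError(f"not a based number or nonfin literal: {text!r}")
```

For example, `parse_number_token("0x1.8p1")` returns 3.0 and `parse_number_token("-0b101")` returns -5.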

>>> ### Document structure
>> 
>> I *didn't* read 8610 as I sat down to implement this, and didn't understand from the text that it was important to do so.  I started work on a CDDL implementation months ago, but never got very far.  Nevertheless, I expected CDDL to be completely unrelated to implementing diagnostic notation,
> 
> It is.  It just happened to be a document that we were completing at the time we decided to write up what became the E in EDN.
> I’ll look for opportunities to clarify this.

I'd much rather this doc not depend on 8610.  It does not say that it updates 8610, yet its syntax is incompatible with 8610's.  Again, I'm willing to provide text and time.

>> and assumed I didn't need that doc.  In particular, section 1 says:
>> 
>> "This document sets forth a further step of evolution of EDN, and it is intended to serve as a single reference target in specifications that use EDN."
> 
> A single reference target doesn’t imply that this needs to be without references…

Fair enough.  That doesn't mean I agree that this doc achieves its goals in its current state, however.

> The current CSV represents a failure as 
> 
> -,mybadedn,
> 
> (i.e., “-” for not OK, and leave the third column empty).
> 
>> Those probably should be in a separate file.  
> 
> I like them together with the related succeeding cases.

Ah, now that I understand the syntax, that works for me.  What would you think about adding column headers to make it clearer?

I would have made the third column plain hex-encoded CBOR, but I can live with the h'' notation, I guess.
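Here is a sketch of how a test harness might read that CSV today, in Python with hypothetical names. It assumes that a first column of "-" marks an EDN string that must be rejected, that any other value marks a success case, and that the third column holds h'' notation containing only hex digits and whitespace (no comments).

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EdnCase:
    edn: str
    expected: Optional[bytes]  # None means the EDN must be rejected


def parse_expected(cell: str) -> bytes:
    """Turn h'...' notation (hex digits and whitespace only) into bytes."""
    cell = cell.strip()
    if cell.startswith("h'") and cell.endswith("'"):
        cell = cell[2:-1]
    return bytes.fromhex("".join(cell.split()))


def load_cases(text: str) -> List[EdnCase]:
    cases = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        # Pad short rows so a trailing empty third column is tolerated.
        status, edn, expected = (row + ["", "", ""])[:3]
        if status == "-":
            cases.append(EdnCase(edn, None))
        else:
            cases.append(EdnCase(edn, parse_expected(expected)))
    return cases
```

If the third column were plain hex, `parse_expected` would collapse to a single `bytes.fromhex` call.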

— 
Joe Hildebrand