[Cbor] Re: draft-ietf-cbor-edn-literals-10 implementation notes
Joe Hildebrand <hildjj@cursive.net> Sat, 17 August 2024 15:55 UTC
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/B76nM9uIgVz5nbMrcya4H-rr8x8>
> On Aug 17, 2024, at 5:40 AM, Carsten Bormann <cabo@tzi.org> wrote:
>
> ## Proposals for technical changes
>
>> - I wish that floating point encodings had a separate spec syntax from integers, rather than relying on getting a decimal point or "e" into the output somehow (e.g. JS doesn't have good float formatting built in). For example, 2_f1 could mean 0xf94000, while 2_1 would mean 0x190002.
>
> We have used diagnostic notation in a different way for a long time.
> So I’d consider this a technical change.
>
> The specific change suggested here violates an invariant of encoding indicators: You can leave them off and will still get an equivalent data item from a data model point of view.
>
> (It did take us some time to learn enough from implementations to evolve the CBOR data model from a muddy boundary between integer and float in RFC 7049 to a much more well-defined boundary that we now have in RFC 8949.)
>
> I would probably solve the specific use case by two application-oriented extension literals:
>
> f'2' is equivalent to 2.0, not 2
> i'1e3' is equivalent to 1000, not 1000.0

Nod. After thinking about it some more, I don't think it's important to add. Let me try a couple of things I thought of overnight before we add this complexity.

>> - s4.1 It should be an error to mix app-strings and strings, or app-strings of multiple types. The text at the end of the section starting with "Some of the strings may be app-strings..." is not proscriptive enough.
>
> This is a reasonable conclusion when you just look at the set of application-extensions that are in the document, but it breaks down when constructing strings from components that can be described by app-strings (something like a JWT that mixes base64-encoded strings with delimiters such as “.” and “~”).
> (The example would need to build the base64-encoded parts using application-oriented literals, for which we don’t have an application-oriented literal - b64 goes the other way around.
> I find it quite likely that we’ll have application-oriented literals for JWT and CWT over time.)

Can we at least say that the output type of the app-strings needs to be compatible enough to concatenate? Take your 'f' approach from above; what would f'2' + f'3' mean?

>> - It's not clear to me what happens if there are multiple items in the sequence inside <<>>. I assume they are concatenated together, even though that's a little odd unless you are generating cbor streams. I would have expected the production to be:
>>
>> embedded = "<<" one-item ">>"
>
> (Current ABNF is:
>
> embedded = "<<" seq ">>"
>
> where seq is a CBOR sequence.)
>
> This syntax for Embedded CBOR sequences is quite explicitly part of Appendix G.3 of RFC 8610.
> Embedded CBOR sequences are a natural way to employ the fact that the byte string encoding already provides an outer delineation, so an array head is superfluous. I’ve seen them used in various contexts but don’t have an example handy off the top of my head.

OK. It would be nice to have just a few words in this spec about the concatenation, then. Particularly because the commas are now optional, unlike in 8610.

>> You would still get concatenation with << 1 >> + << 2 >> in the unlikely event you need it.
>
> I would consider needing to write this (instead of << 1, 2 >>) noise.

Nod. Again, I don't think there's a use case for this that matters, but it doesn't hurt anything, so there's no need to change it.

> ## Bugs
>
>> - s3.2, "Herewith I buy" /.../ "gned: Alice & Bob" doesn't match the grammar. I think it needs a +.
>
> Good point.
> (Mental note: I need to run the examples through the newest edn-abnf…)
>
> I propose to simply remove the “contract” member of the example, as it is no longer really a compelling way of using comments.
>
> Now:
> https://github.com/cbor-wg/edn-literal/pull/58

+1.

> ## Exposition
>
> ### ABNF
>
>> - Like others, I don't like the two-level ABNF.
>> I understand that you're going for extensibility, but you can still leave an extension point in place while having a single grammar.
>
> As an implementers’ note for an implementation that just implements the extensions and constructs that already are in the document, I can follow that view (even if it works out differently in not only my implementation).
> Extensibility, however, is important, and I think I have made my point why a little more work for some initial implementations will be worth it for extensibility.
> Maybe an implementers’ note always will take the perspective of an implementer, not one of caring about protocol evolution; but some implementers have found that the two-level approach is also easier to implement (and more robust, to boot).

I really don't want to crank up an entire other parser instance for each app-string. There's not a *lot* of overhead, but there's enough that it will become noticeable, I think. Therefore, I'm going to interpolate the app-string grammars into my larger grammar. I hope I do it in a way that is similar enough to everyone else's that it will interoperate and still allow extensibility correctly. Maybe I'll get it right.

>> - (nit) I'd prefer basenumber to be split into 3 or 4 rules, one for each base, since each needs special processing.
>
> I personally like the more compact way, but I’ll put up a change proposal in a PR.

It's not worth the change if others had tools available to them to process 0x, 0o, and 0b strings (including hex floats) all in one place. Maybe I could have gotten all of them except hexfloats if I'd been willing to use eval(). If we are going to make a change for this, I'd suggest hexfloats be separated from hexints as well, since they are more likely to need custom code.

> ### Document structure
>
>> - The doc structure is quite odd in that it presents the extension points before the main format.
>> I understand that's an outgrowth of how the doc grew over time, but it needs a small refactor before publishing. I'm willing to provide more suggestions or help, if the authors want.
>
> One comment on another recent document essentially was that we should do a complete roll-up of the documents that constitute our ecosystem - we already did a roll-up for CBOR (RFC 8949) but could may revise that with a target publication date of 2027 (another seven years).
> I’d rather do the editorial work then.
> Right now the document relies on the reader having read Section 8 of 8949 and Appendix G of 8610; so focusing on the additions first sounds acceptable to me.

I *didn't* read 8610 as I sat down to implement this, and didn't understand that it was important to do so from the text. I started work on a CDDL implementation months ago, but never got very far. Nevertheless, I expected CDDL to be completely unrelated to implementing diagnostic notation, and assumed I didn't need that doc. In particular, section 1 says:

"This document sets forth a further step of evolution of EDN, and it is intended to serve as a single reference target in specifications that use EDN."

>> - s3 could be moved after the ABNF section and the (possibly new) app-string section and make more sense.
>
> I need to re-read that to have an opinion.
> (Moving the ABNF from an appendix to a main section strikes back.)
>
> It seems to me a forward reference from Section 3 to the rule “app-string” in Section 4.1 would also do it.

Let's agree on the previous point before tackling this.

>> - s4.2 could be outdented to s5, containing both ABNF and the descriptions of the app-string formats from s2. Having to go back and forth made reading more difficult than it needed to be.
>
> Makes sense on a napkin. I’ll try to do a PR.
>
>> - s4.2.1, h'/head/ 63 /contents/ 66 6f 6f' should become << "cfoo" >>, not << "foo" >> if I'm understanding correctly.
>
> “foo” is a text string of length 3, so it begins with 0x63, followed by 66 6f 6f for the f o o characters.

Got it!

> (Embedded CBOR can be confusing because of the multiple layers, why it was a great addition to get syntax for that in Appendix G.3 of RFC 8610.)

If the comment was a little more than /head/ maybe it would have been more clear? Perhaps /string, length 3/?

>> - s6 Security Considerations seems like it could use some more text about how this format isn't intended for interchange.
>
> Some text (or just key words) would be appreciated.

This probably needs to be a separate thread. What I'm most worried about is that we haven't thought of the ways that you could mangle input to a parser constructed from the ABNF to get surprising results. JSON is small enough that we think we've worked through those boundaries (although we did find the issue with U+2028 and U+2029 relatively late). Maybe we should at least do some fuzzing of the ABNF.

>> I'm building up a large-ish set of test vectors. I'm willing to put those into a separate repo for sharing if anyone is interested in collaborating.
>
> Absolutely!
> I have a very lazy set of tests (specifically testing out recent changes) as a CSV (first column = for equivalent and - for not equivalent, second column EDN, third column also EDN but different features):
>
> https://github.com/cabo/edn-abnf/blob/explicit-concatenation/tests/basic.csv
>
> (This CSV contains carriage returns in some of the CSV-quoted strings; this abuse of CSV might not survive a Windows environment, but the tests will magically still pass :-).
>
> (This could have been EDN plus hex encoded CBOR data items, but it is so much easier to write the latter in EDN, too…)

Thanks, I'll take a look. I have a bunch of strings that are supposed to cause failures. Those probably should be in a separate file. I'll send PRs.

--
Joe Hildebrand
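
For anyone following the byte-level points in this message (the 0x63 head on "foo", what << 1, 2 >> produces, and the 0x190002 / 0xf94000 values from the first item), here is a minimal sketch that reproduces those bytes. It is illustrative Python only, not code from either implementation discussed in the thread; the helper names (head, uint, text, bstr, embedded) are made up for this note, and it assumes definite-length items with shortest-form heads as described in RFC 8949.

```python
import struct


def head(major: int, arg: int) -> bytes:
    """Encode a CBOR head: 3-bit major type plus argument, shortest form."""
    if arg < 0:
        raise ValueError("argument must be non-negative")
    if arg < 24:
        # Small arguments live directly in the low 5 bits of the initial byte.
        return bytes([(major << 5) | arg])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if arg < 1 << (8 * struct.calcsize(fmt)):
            return bytes([(major << 5) | info]) + struct.pack(fmt, arg)
    raise ValueError("argument does not fit in 64 bits")


def uint(n: int) -> bytes:
    """Unsigned integer, major type 0."""
    return head(0, n)


def text(s: str) -> bytes:
    """Text string, major type 3; the length counts UTF-8 bytes."""
    data = s.encode("utf-8")
    return head(3, len(data)) + data


def bstr(data: bytes) -> bytes:
    """Byte string, major type 2."""
    return head(2, len(data)) + data


def embedded(*items: bytes) -> bytes:
    """<< a, b, ... >>: a byte string wrapping the concatenated encoded items."""
    return bstr(b"".join(items))


print(text("foo").hex(" "))                  # 63 66 6f 6f
print(embedded(text("foo")).hex(" "))        # 44 63 66 6f 6f
print(embedded(uint(1), uint(2)).hex(" "))   # 42 01 02

# The two encodings mentioned in the first item, built by hand:
two_as_uint16 = bytes([0x19]) + struct.pack(">H", 2)   # 2_1   -> 19 00 02
two_as_half = bytes([0xF9]) + struct.pack(">e", 2.0)   # 2.0_1 -> f9 40 00
print(two_as_uint16.hex(" "), "/", two_as_half.hex(" "))
```

The second print shows why h'63 66 6f 6f' corresponds to << "foo" >>: the 0x63 is the text string's own head, and the byte string adds a further 0x44 head around it. The third shows that << 1, 2 >> gives the same bytes as concatenating the contents of << 1 >> and << 2 >>.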