[Tools-discuss] Re: Exploring ABNF extracts from RFCs

Carsten Bormann <cabo@tzi.org> Fri, 26 July 2024 12:01 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <89a4a566-8ffe-413e-9196-3f08bebe8d20@w3.org>
Date: Fri, 26 Jul 2024 14:01:07 +0200
Message-Id: <99E8B992-AF60-4CD6-9786-2EC180E95E4D@tzi.org>
References: <89a4a566-8ffe-413e-9196-3f08bebe8d20@w3.org>
To: Dominique Hazael-Massieux <dom@w3.org>, art@ietf.org
CC: tools-discuss <tools-discuss@ietf.org>, CBOR <cbor@ietf.org>, abnf-discuss@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/cvcLtLVCkYn_qLaAFX4rtKjUvNQ>

Hi Dominique,

> (I hope this list is the appropriate place to bring this project up; let me know if there is a better place)

Not sure, as it is about both the tools we use and the specs that make use of these tools, so I have added art@ietf.org and a few more lists and will quote your message in full here.

> Inspired by some of the work happening in the W3C community around extracting, curating and publishing reusable data and code from Web specifications [1], I have been exploring what this might look like for RFCs.

This is much-needed work!
I have started a much simpler effort that does similar things for CDDL [c1] [c4], so I think we will be able to compare our approaches and maybe find a common way to deal with “exportable” content from specifications.

[c1]: https://www.ietf.org/archive/id/draft-bormann-cbor-rfc-cddl-models-03.html

> I had initially planned to present and continue that exploration during the IETF 120 hackathon [2], but didn't realize the hackathon start was moved to Saturday, so ended up working on the project on my own.
> 
> The current state of my exploration is visible in two github repos:
>  https://github.com/dontcallmedom/rfcref/tree/main/abnf
>  https://github.com/dontcallmedom/ABNFary
> 
> I have in particular focused on extracting ABNF from (some) RFCs (re-using aex on pre-XML RFCs, and a simpler XML-extraction script on XML RFCs), and crucially, identifying the dependencies that exist between ABNF fragments (where e.g. the URI production defined in RFC3986 gets "imported" into e.g. RFC9110):
>  https://github.com/dontcallmedom/rfcref/tree/main/abnf/dependencies

I have put up the source code extracted from the 8650+ RFCs [c7] and from the XML I-Ds [c8]. This has already turned out to be useful for a lot of applications, but it needs some additional care and more frequent updates; I hope something like this becomes rsyncable content in the near future.

[c7]: https://user.informatik.uni-bremen.de/~cabo/rfc/
[c8]: https://user.informatik.uni-bremen.de/~cabo/i-d/
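As a rough illustration of what such an extraction can look like for XML RFCs, here is a minimal sketch. It assumes the RFCXML v3 convention of <sourcecode type="..."> elements; the function name and the toy sample document are made up for this example, and this is not the script behind [c7] or [c8]. Real RFC XML files may need extra handling (for example, entity declarations) before a plain XML parser accepts them.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def extract_sourcecode(rfcxml_text):
    """Group the text of all <sourcecode> elements by their 'type' attribute.

    Hypothetical helper: returns {type: [block text, ...]}; elements without
    a type attribute are collected under "untyped".
    """
    root = ET.fromstring(rfcxml_text)
    blocks = defaultdict(list)
    for element in root.iter("sourcecode"):
        kind = (element.get("type") or "untyped").lower()
        # itertext() also picks up CDATA content; trim surrounding newlines.
        blocks[kind].append("".join(element.itertext()).strip("\n"))
    return dict(blocks)


# Toy document, not taken from any real RFC.
SAMPLE = '''<rfc><middle><section>
<sourcecode type="abnf"><![CDATA[
greeting = "hello" SP name
name     = 1*ALPHA
]]></sourcecode>
<sourcecode type="cddl">person = { name: tstr }</sourcecode>
</section></middle></rfc>'''

if __name__ == "__main__":
    for kind, texts in extract_sourcecode(SAMPLE).items():
        print(kind, len(texts))
```

Keying on the type attribute is what makes it possible to publish ABNF, CDDL, and other languages side by side from one pass over the XML.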

> With these dependencies semi-manually detected and documented, a (node.js) tool I wrote allows building consolidated ABNF files that are guaranteed to have all their rules defined:
>  https://github.com/dontcallmedom/rfcref/tree/main/abnf/consolidated

This is very much what we need to make ABNF more prominent in building (and not just specifying) our applications.
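To make the "all rules defined" property concrete, here is a minimal Python sketch of the check (not the node.js tool referred to above). It assumes that each rule starts at column 0 and that continuation lines are indented; the helper names and the two toy fragments are invented for illustration and are not the actual RFC text.

```python
import re

# Core rules from RFC 5234, Appendix B.1, treated as always available.
CORE_RULES = {
    "alpha", "bit", "char", "cr", "crlf", "ctl", "digit", "dquote",
    "hexdig", "htab", "lf", "lwsp", "octet", "sp", "vchar", "wsp",
}

# Rule names are case-insensitive in ABNF, so everything is lowercased.
RULE_START = re.compile(r"^([A-Za-z][A-Za-z0-9-]*)[ \t]*=/?", re.MULTILINE)
# Tokens that are not rule references: strings, prose values,
# numeric values, and comments.
NOISE = re.compile(r'(?:%[si])?"[^"]*"|<[^>]*>|%[bdxBDX][0-9A-Fa-f.-]+|;[^\n]*')
NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def parse_rules(abnf_text):
    """Return {rule name: set of rule names referenced on its right-hand side}."""
    text = NOISE.sub(" ", abnf_text)
    starts = list(RULE_START.finditer(text))
    rules = {}
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        refs = {name.lower() for name in NAME.findall(text[match.end():end])}
        rules.setdefault(match.group(1).lower(), set()).update(refs)
    return rules


def unresolved_rules(fragments):
    """List rule names that are referenced but defined in none of the fragments."""
    defined, referenced = set(), set()
    for text in fragments:
        for name, refs in parse_rules(text).items():
            defined.add(name)
            referenced |= refs
    return sorted(referenced - defined - CORE_RULES)


# Toy fragments for illustration only.
HTTP = (
    'request-target = origin-form / absolute-form\n'
    'origin-form = absolute-path [ "?" query ]\n'
    'absolute-form = absolute-URI\n'
)
URI = (
    'absolute-URI = scheme ":" hier-part [ "?" query ]\n'
    'query = *( pchar / "/" / "?" )\n'
)

if __name__ == "__main__":
    print(unresolved_rules([HTTP]))
    print(unresolved_rules([HTTP, URI]))
```

A consolidated file is complete when this list is empty; each remaining name points at a fragment that still has to be pulled in from another document.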

At some point we might want to have a syntax where a new document’s ABNF can point to another document’s ABNF and explicitly import from that.  CDDL is gaining such a syntax [c2], and a similar syntax for ABNF should be easy to construct (and would actually become part of the CDDL ecosystem as CDDL includes ABNF).

[c2]: https://www.ietf.org/archive/id/draft-ietf-cbor-cddl-modules-02.html

A rough snapshot of a consolidated set of CDDL modules is part of the cddlc tool [c3]; you may specifically want to look at [c4].

[c3]: https://github.com/cabo/cddlc
[c4]: https://github.com/cabo/cddlc/tree/master/data

> (I identified a couple of errata in the process of consolidating some of these RFCs [3])

Indeed, not all RFCs provide good exportable ABNF (or even CDDL); e.g., some reuse the same rule name with different right-hand sides…
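A check for that particular problem can be sketched in a few lines of Python. This is only an illustration: it compares right-hand sides textually after stripping comments and collapsing whitespace, it skips incremental alternatives ("=/") because those extend a rule on purpose, and it assumes rules start at column 0. All names and the two toy fragments are hypothetical.

```python
import re
from collections import defaultdict

RULE_START = re.compile(r"^([A-Za-z][A-Za-z0-9-]*)[ \t]*(=/?)", re.MULTILINE)
# Matches strings and prose values (kept) or comments (dropped).
STRING_OR_COMMENT = re.compile(r'("[^"]*"|<[^>]*>)|;[^\n]*')


def strip_comments(text):
    """Remove ';' comments without touching ';' inside quoted strings."""
    return STRING_OR_COMMENT.sub(lambda m: m.group(1) or "", text)


def definitions(abnf_text):
    """Yield (rule name, whitespace-normalized right-hand side) pairs."""
    text = strip_comments(abnf_text)
    starts = list(RULE_START.finditer(text))
    for i, match in enumerate(starts):
        if match.group(2) == "=/":
            continue  # incremental alternative, not a redefinition
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        yield match.group(1).lower(), " ".join(text[match.end():end].split())


def conflicts(fragments):
    """Return {rule name: {right-hand side: [documents]}} for rules defined differently."""
    seen = defaultdict(dict)
    for doc, text in fragments.items():
        for name, rhs in definitions(text):
            seen[name].setdefault(rhs, []).append(doc)
    return {name: variants for name, variants in seen.items() if len(variants) > 1}


# Toy fragments for illustration only.
FRAGMENTS = {
    "doc-a": "token = 1*tchar ; as used in doc-a\nvalue = token\n",
    "doc-b": "token = 1*( ALPHA / DIGIT )\nvalue   =  token\n",
}

if __name__ == "__main__":
    for name, variants in conflicts(FRAGMENTS).items():
        print(name, variants)
```

A purely textual comparison will flag some definitions that are equivalent but written differently, so the output is a list of candidates for manual review rather than a list of errors.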

> Finally, I built a simple Web interface on top of the said consolidated ABNF that lets you see interlinked ABNF rules, with railroad diagrams illustrating the "endemic" rules of a given RFC:
>  https://dontcallmedom.github.io/ABNFary/?num=9110
> 
> The broader motivation for that project is to enable easier and broader usage of ABNF definitions (and more generally, data/code provided in RFCs); the similar effort in W3C I mentioned [1] has found impactful re-use in implementation conformance and interop testing, developer documentation, IDEs, libraries, fuzz-testing, spec authoring tools, spec validation.
> 
> Although the errata I filed show potential, I'm seeking input and feedback on whether this project (and in particular its current focus on ABNF) is likely to provide opportunities for similar re-use.
> 
> And if so, I'm also interested to hear what would be the best way to make this a collaborative effort that would align best with the IETF community expectations and processes.
> 
> (I'm about to disappear for a few weeks, but thought I would start this discussion while the project is still fresh in my mind)

I think that this is a great discussion point for the next interim of the CBOR WG (August 21?  I think we’ll set that date in the CBOR meeting today).  The WG is in the process of completing [c2]; we certainly want to be able to make this as useful as possible with the existing body of specs and probably need similar tooling.

Apart from ABNF and CDDL, the third spec language we probably want to think about is YANG [c5]; this has yangcatalog.org [c6], but also a lot of datatracker and IANA support.  With YANG, there has been much more attention to making models referenceable, so we might learn a bit from them as well.

[c5]: https://www.rfc-editor.org/rfc/rfc7950.html
[c6]: https://www.yangcatalog.org/

Regards, Carsten

> Dom
> 
> 1. https://github.com/w3c/webref/
> 2. https://wiki.ietf.org/en/meeting/120/hackathon#collecting-consolidated-abnf-from-rfcs
> 3. https://www.rfc-editor.org/errata/eid8040 https://www.rfc-editor.org/errata/eid8039
> 