[Cbor] Re: [art] Re: [Tools-discuss] Exploring ABNF extracts from RFCs

Paul Kyzivat <pkyzivat@alum.mit.edu> Fri, 26 July 2024 16:10 UTC

Message-ID: <2867abac-8c62-41b2-a20c-cb9fe8d3736d@alum.mit.edu>
Date: Fri, 26 Jul 2024 12:09:40 -0400
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Content-Language: en-US
To: Carsten Bormann <cabo@tzi.org>, Dominique Hazael-Massieux <dom@w3.org>, art@ietf.org
References: <89a4a566-8ffe-413e-9196-3f08bebe8d20@w3.org> <99E8B992-AF60-4CD6-9786-2EC180E95E4D@tzi.org>
From: Paul Kyzivat <pkyzivat@alum.mit.edu>
In-Reply-To: <99E8B992-AF60-4CD6-9786-2EC180E95E4D@tzi.org>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
CC: tools-discuss <tools-discuss@ietf.org>, CBOR <cbor@ietf.org>, abnf-discuss@ietf.org
Subject: [Cbor] Re: [art] Re: [Tools-discuss] Exploring ABNF extracts from RFCs
List-Id: "Concise Binary Object Representation (CBOR)" <cbor.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/u2tfJpGcHeALnaE9Z8cET-8WibY>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cbor>
List-Help: <mailto:cbor-request@ietf.org?subject=help>
List-Owner: <mailto:cbor-owner@ietf.org>
List-Post: <mailto:cbor@ietf.org>
List-Subscribe: <mailto:cbor-join@ietf.org>
List-Unsubscribe: <mailto:cbor-leave@ietf.org>

This is great work!

Long ago (starting around 2015) there were discussions about how to 
formalize dependencies (an "import" mechanism) between abnf fragments in 
different RFCs (and drafts). Ultimately they didn't get traction and 
petered out.

Extracting abnf with all dependencies resolved is useful for multiple 
purposes. Of note are:
1) for import into a formal verifier / syntax checker
2) for import into tools that can generate a parser

I find (1) more compelling than (2). An implementer typically doesn't 
need to do the extraction very often, so doing it manually, while 
annoying, isn't a huge burden. But (1) should, in principle, be done 
every time a draft containing abnf is submitted. I have done this 
several times while acting as a reviewer. But for some docs the burden 
of doing it is high enough that it is rarely done. Instead the abnf is 
simply "eyeballed". A tool that could do it for every draft submission 
would be hugely beneficial.
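
To give a sense of how small a first automated check could be, here is 
a minimal Python sketch (not an existing tool) that reports rule names 
that are referenced but never defined in an extracted abnf block. The 
function name undefined_rules and the regular expressions are my own 
assumptions. It also assumes each rule definition starts in column 0, 
with continuation lines indented, and that the RFC5234 Core Rules are 
implicitly available.

```python
import re

# Core Rules from RFC 5234, Appendix B.1 (assumed to be implicitly available).
CORE_RULES = {
    "ALPHA", "BIT", "CHAR", "CR", "CRLF", "CTL", "DIGIT", "DQUOTE",
    "HEXDIG", "HTAB", "LF", "LWSP", "OCTET", "SP", "VCHAR", "WSP",
}

# A rule definition: a name in column 0 followed by "=" or "=/".
RULE_START = re.compile(r"^([A-Za-z][A-Za-z0-9-]*)\s*=/?")
# Text that cannot contain rule references: quoted strings (optionally
# prefixed with %s or %i), prose-vals, numeric values, and comments.
NON_REFERENCE = re.compile(
    r'(?:%[siSI])?"[^"]*"|<[^>]*>|%[bdxBDX][0-9A-Fa-f.-]+|;.*'
)
NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def undefined_rules(abnf: str) -> set[str]:
    """Return lowercased rule names that are used but never defined."""
    defined: set[str] = set()
    referenced: set[str] = set()
    for line in abnf.splitlines():
        body = line
        match = RULE_START.match(line)
        if match:
            # abnf rule names are case-insensitive, so compare in lowercase.
            defined.add(match.group(1).lower())
            body = line[match.end():]
        body = NON_REFERENCE.sub(" ", body)
        referenced.update(name.lower() for name in NAME.findall(body))
    core = {name.lower() for name in CORE_RULES}
    return referenced - defined - core


if __name__ == "__main__":
    sample = (
        'greeting = "hello" SP name CRLF\n'
        "name = 1*ALPHA / nickname ; nickname is not defined here\n"
    )
    print(sorted(undefined_rules(sample)))
```

A real checker would need a proper abnf parser. This only shows that 
the "are all rules defined?" question can be asked mechanically at 
submission time.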

There are interesting open issues around abnf dependencies:

* If I want to use a rule defined in another document, do I add to my 
abnf *all* of the foreign abnf that contains the rule definitions I 
need? Or do I only import the specific rules I need and any other rules 
those depend on?

* What to do if the imported abnf has rule names that duplicate ones I 
have defined in my own abnf? Should the abnf of each document live in a 
distinct namespace, linked only for the cross referencing rule names? Or 
should the writer of abnf with dependencies be responsible for avoiding 
naming conflicts?

* How to deal with the different ways abnf is used in documents. E.g.,

- the simple case: the document contains a single block of abnf defining 
the syntax for the protocol that is the subject of the document.

- the doc has several blocks of abnf, each defining a separate syntax. 
Rule names may be reused, with different syntax, in each.

- the doc contains (possibly incomplete) fragments of abnf scattered 
through the doc text, plus a complete consolidated abnf syntax that 
likely duplicates some or all of the fragments.

- the doc contains a complete abnf syntax that is broken into fragments 
interspersed with text descriptions.
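
For the first two questions above (importing only the needed rules plus 
whatever they depend on, and noticing name clashes), a minimal Python 
sketch might look like the following. It is only an illustration under 
assumptions: each grammar is represented as a dict mapping a rule name 
to the set of rule names its definition references, and the function 
names and the example grammar are hypothetical.

```python
def rules_to_import(foreign: dict[str, set[str]], wanted: set[str]) -> set[str]:
    """Return the wanted rules plus every foreign rule they depend on."""
    needed: set[str] = set()
    pending = list(wanted)
    while pending:
        name = pending.pop()
        # Names not defined in the foreign grammar (for example Core Rules)
        # are skipped here; a real tool would have to resolve them elsewhere.
        if name in needed or name not in foreign:
            continue
        needed.add(name)
        pending.extend(foreign[name])
    return needed


def name_conflicts(local: set[str], imported: set[str]) -> set[str]:
    """Return imported names that clash with local ones (case-insensitive)."""
    local_lower = {name.lower() for name in local}
    return {name for name in imported if name.lower() in local_lower}


if __name__ == "__main__":
    # Hypothetical foreign grammar: rule name -> names its definition uses.
    foreign = {
        "uri": {"scheme", "host"},
        "scheme": {"ALPHA"},
        "host": {"label"},
        "label": set(),
        "unused": {"label"},
    }
    imported = rules_to_import(foreign, {"uri"})
    print(sorted(imported))
    print(sorted(name_conflicts({"host", "message"}, imported)))
```

Whether a clash like "host" is an error or is resolved by giving each 
document its own namespace is exactly the open design question; the 
sketch only detects it.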

My interest in this came from working on sip for many years. It has a 
root document RFC3261, and many extension & revision documents. There 
are extensive dependencies among these documents. The manner of 
expressing dependencies evolved over time. E.g.,

- Simple text in the doc prior to the abnf. E.g., "The rules foo, bar, 
and baz are defined in RFCnnn".

- Saying (or simply assuming) that all the Core Rules from RFC5234 are 
available.

- Stating the dependency in a comment in the abnf

- Stating the dependency in an abnf prose-val. E.g.,
     foo = <foo as defined in RFCnnn>

The last of these is now my recommended best practice. It results in an 
abnf that can be fully syntax checked. Ideally we would formalize the 
syntax within the prose-val so it can be mechanically processed. Or 
define a new syntax for import, and possibly for namespaces.
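
As an illustration of what "mechanically processed" could mean, here is 
a minimal Python sketch that pulls dependencies out of prose-vals 
written in the style shown above. No such syntax has been formalized, so 
the accepted wording ("<name as defined in RFCnnnn>") and the names 
IMPORT and find_imports are assumptions made for the example.

```python
import re

# Matches lines such as:   foo = <foo as defined in RFC1234>
# This wording is an assumed convention, not a standardized syntax.
IMPORT = re.compile(
    r"^\s*([A-Za-z][A-Za-z0-9-]*)\s*=\s*"
    r"<\s*([A-Za-z][A-Za-z0-9-]*)\s+(?:as\s+)?defined\s+in\s+"
    r"RFC\s*(\d+)[^>]*>",
    re.IGNORECASE | re.MULTILINE,
)


def find_imports(abnf: str) -> list[tuple[str, str, int]]:
    """Return (local name, foreign name, RFC number) for each import."""
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in IMPORT.finditer(abnf)
    ]


if __name__ == "__main__":
    sample = """
         URI = <URI as defined in RFC3986>
         SIP-URI = <SIP-URI as defined in RFC 3261>
         target = URI / SIP-URI
    """
    for local, foreign, rfc in find_imports(sample):
        print(f"{local} <- {foreign} (RFC {rfc})")
```

A tool could feed the resulting (name, RFC) pairs into the extraction 
step to fetch the foreign definitions. Free-form prose-vals that do not 
follow the pattern are simply not recognized, which is the argument for 
agreeing on one fixed form.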

	Thanks,
	Paul

On 7/26/24 8:01 AM, Carsten Bormann wrote:
> Hi Dominique,
> 
>> (I hope this list is the appropriate place to bring this project up; let me know if there is a better place)
> 
> Not sure, as it both is about tools we use and the specs that make use of these tools, so I have added art@ietf.org and a few more lists and will do a full cite in here.
> 
>> Inspired by some of the work happening in the W3C community around extracting, curating and publishing reusable data and code from Web specifications [1], I have been exploring what this might look like for RFCs.
> 
> This is much-needed work!
> I have started a much simpler effort that does similar things for CDDL [c1] [c4], so I think we will be able to compare our approaches and maybe find some common way to deal with “exportable” from specifications.
> 
> [c1]: https://www.ietf.org/archive/id/draft-bormann-cbor-rfc-cddl-models-03.html
> 
>> I had initially planned to present and continue that exploration during the IETF 120 hackathon [2], but didn't realize the hackathon start was moved to Saturday, so ended up working on the project on my own.
>>
>> The current state of my exploration is visible in two github repos:
>>   https://github.com/dontcallmedom/rfcref/tree/main/abnf
>>   https://github.com/dontcallmedom/ABNFary
>>
>> I have in particular focused on extracting ABNF from (some) RFCs (re-using aex on pre-XML RFCs, and a simpler XML-extraction script on XML RFCs), and crucially, identifying the dependencies that exist between ABNF fragments (where e.g. the URI production defined in RFC3986 gets "imported" into e.g. RFC9110):
>>   https://github.com/dontcallmedom/rfcref/tree/main/abnf/dependencies
> 
> I put up extracted sourcecode from the 8650+ RFCs [c7] and the XML I-Ds [c8], which already has turned out to be useful for a lot of applications but needs some additional care and more frequent updates; I hope something like this becomes rsyncable content in the near future.
> 
> [c7]: https://user.informatik.uni-bremen.de/~cabo/rfc/
> [c8]: https://user.informatik.uni-bremen.de/~cabo/i-d/
> 
>> With these dependencies semi-manually detected and documented, a (node.js) tool I wrote allows to build consolidated ABNF files that are guaranteed to have all their rules defined:
>>   https://github.com/dontcallmedom/rfcref/tree/main/abnf/consolidated
> 
> This is very much what we need to make ABNF more prominent in building (and not just specifying) our applications.
> 
> At some point we might want to have a syntax where a new document’s ABNF can point to another document’s ABNF and explicitly import from that.  CDDL is gaining such a syntax [c2], and a similar syntax for ABNF should be easy to construct (and would actually become part of the CDDL ecosystem as CDDL includes ABNF).
> 
> [c2]: https://www.ietf.org/archive/id/draft-ietf-cbor-cddl-modules-02.html
> 
> A rough snapshot of a consolidated set of CDDL modules is part of the cddlc tool [c3], you may specifically want to look at [c4].
> 
> [c3]: https://github.com/cabo/cddlc
> [c4]: https://github.com/cabo/cddlc/tree/master/data
> 
>> (I identified a couple of errata in the process of consolidating some of these RFCs [3])
> 
> Indeed, not all RFCs are providing good exportable ABNF (or even CDDL); e.g., some reuse the same rule name with different right hand sides…
> 
>> Finally, I built a simple Web interface on top of the said consolidated ABNF that allows to see interlinked ABNF rules, with railroad diagrams illustrating the "endemic" rules of a given RFC:
>>   https://dontcallmedom.github.io/ABNFary/?num=9110
>>
>> The broader motivation for that project is to enable easier and broader usage of ABNF definitions (and more generally, data/code provided in RFCs); the similar effort in W3C I mentioned [1] has found impactful re-use in implementation conformance and interop testing, developer documentation, IDEs, libraries, fuzz-testing, spec authoring tools, spec validation.
>>
>> Although the errata I filed show potential, I'm seeking input and feedback on whether this project (and in particular its current focus on ABNF) is likely to provide opportunities for similar re-use.
>>
>> And if so, I'm also interested to hear what would be the best way to make this a collaborative effort that would align best with the IETF community expectations and processes.
>>
>> (I'm about to disappear for a few weeks, but thought I would start this discussion while the project is still fresh in my mind)
> 
> I think that this is a great discussion point for the next interim of the CBOR WG (August 21?  I think we’ll set that date in the CBOR meeting today).  The WG is in the process of completing [c2]; we certainly want to be able to make this as useful as possible with the existing body of specs and probably need similar tooling.
> 
> Apart from ABNF and CDDL, the third spec language we probably want to think about is YANG [c5]; this has yangcatalog.org [c6], but also a lot of datatracker and IANA support.  With YANG, there has been much more attention to making models referenceable, so we might learn a bit from them as well.
> 
> [c5]: https://www.rfc-editor.org/rfc/rfc7950.html
> [c6]: https://www.yangcatalog.org/
> 
> Grüße, Carsten
> 
>> Dom
>>
>> 1. https://github.com/w3c/webref/
>> 2. https://wiki.ietf.org/en/meeting/120/hackathon#collecting-consolidated-abnf-from-rfcs
>> 3. https://www.rfc-editor.org/errata/eid8040 https://www.rfc-editor.org/errata/eid8039
>>