Re: [Json] [rfc-i] sourcecode type="json"

Carsten Bormann <cabo@tzi.org> Wed, 27 October 2021 06:21 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <20211027003607.5C4472E60559@ary.qy>
Date: Wed, 27 Oct 2021 08:21:00 +0200
Message-Id: <86844357-A8C7-4590-B8DC-D801E223A60A@tzi.org>
References: <20211027003607.5C4472E60559@ary.qy>
To: rfc-interest <rfc-interest@rfc-editor.org>, json@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/_FgQkgb9_3T1QUVGXXBuLbJ7xec>
Subject: Re: [Json] [rfc-i] sourcecode type="json"

Thank you all for your great feedback.

> On 27. Oct 2021, at 02:36, John Levine <johnl@taugh.com> wrote:
> 
> Right.  We have lots of sourcecode which is a chunk of a program, not a full program.
> 
> If it's a chunk of JSON, call it JSON.

That is certainly true of, e.g., the C language: there is no expectation that C code in a sourcecode block is a complete program.

The real question is why we are marking up sourcecode with its type in the first place.

One need would be for rendering.
But amazingly, it’s 2021 and we don’t yet have syntax coloring in RFCs.
This, however, could be added from the information that is in the XML today; full JSON texts and JSON fragments probably can be handled by the same coloring engine.
(The coloring would be heuristic during rendering, not embedded in the XML; only the type information would be needed to select the appropriate heuristics.)

With FDT (formal description technique) languages such as ABNF or CDDL, extraction is needed.
Since some ABNF/CDDL is just for exposition, extraction needs to distinguish that from the normative ABNF/CDDL; in recent RFCs this has sometimes been done by using the @name attribute alongside @type.
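
As a rough illustration of that kind of extraction, here is a minimal Python sketch that pulls all sourcecode elements of one type out of RFCXML and keeps only the named ones. The rule "a block with a @name is meant for extraction" is an assumption made for this example, not an established convention, and the function names and sample document are made up.

```python
import xml.etree.ElementTree as ET


def extract_sourcecode(xml_text, wanted_type):
    """Return (name, text) pairs for <sourcecode> elements of the given type."""
    root = ET.fromstring(xml_text)
    blocks = []
    for elem in root.iter("sourcecode"):
        if elem.get("type") == wanted_type:
            # name is None when the attribute is absent
            blocks.append((elem.get("name"), "".join(elem.itertext())))
    return blocks


def normative_only(blocks):
    """Assumed convention: only blocks with a non-empty name are extracted."""
    return [text for name, text in blocks if name]


# Hypothetical document with one named and one purely expository CDDL block.
SAMPLE = """<rfc><middle><section>
<sourcecode type="cddl" name="example.cddl">thing = { name: tstr }
</sourcecode>
<sourcecode type="cddl">illustration = 1 / 2
</sourcecode>
</section></middle></rfc>"""

if __name__ == "__main__":
    for text in normative_only(extract_sourcecode(SAMPLE, "cddl")):
        print(text, end="")
```

The extracted text could then be concatenated and handed to whatever ABNF or CDDL tool the CI uses.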

Apart from that, the needs I’m more interested in are those that support authoring.
This is mostly based on extraction, but also on the ability to run CI processes on the extracted sourcecode (per block and across blocks).
Additional metadata may be required as input to e.g. a validation process; YANG puts that into the “<CODE BEGINS>” line.
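
To show what carrying metadata in the “<CODE BEGINS>” line looks like to a tool, here is a small Python sketch that picks the file name out of that line and collects the text up to the matching “<CODE ENDS>”. It assumes the usual form `<CODE BEGINS> file "..."`; the module name in the sample is hypothetical.

```python
import re

# The file part is optional; group 1 is the quoted file name when present.
CODE_BEGINS = re.compile(r'^\s*<CODE BEGINS>(?:\s+file\s+"([^"]+)")?\s*$')
CODE_ENDS = re.compile(r'^\s*<CODE ENDS>\s*$')


def extract_code_components(text):
    """Return (filename or None, body) for each delimited code component."""
    components = []
    filename = None
    body = None  # None while we are outside a code component
    for line in text.splitlines():
        if body is None:
            match = CODE_BEGINS.match(line)
            if match:
                filename = match.group(1)
                body = []
        elif CODE_ENDS.match(line):
            components.append((filename, "\n".join(body)))
            body = None
        else:
            body.append(line)
    return components


# Hypothetical input, as it might look after extraction from the XML.
SAMPLE = '''Some prose.
<CODE BEGINS> file "example-module@2021-10-27.yang"
module example-module {
}
<CODE ENDS>
More prose.'''

if __name__ == "__main__":
    for name, body in extract_code_components(SAMPLE):
        print(name)
        print(body)
```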

The specific question came up because I was adding some CI to the SDF spec repo.
Inevitably, that turned up one typo which, although it did not obscure the intention, would sooner or later have led to an errata report had it gone undetected.
I expect authoring tools like kramdown-rfc will provide some of this validation as a matter of course, but that requires metadata.
Of course, this metadata doesn’t *have* to be saved in the XML, but for practical reasons much of the CI processing operates on the XML output.

So I’d rather have a way to embed metadata that is required for automatic processing (extraction, validation) in the XML.
This would also help with keeping validation in place through the RFC production process.

My current CI code has to guess whether type=json means JSON text or JSON fragment (where the latter is well-defined for SDF, but maybe not in general).
My concern is less that this adds 5 lines of code to the now 19-line validation script; it is rather that a validation step should not have to guess the author’s intention.
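
The kind of guessing involved can be sketched in a few lines of Python. This is not the validation script mentioned above, just an illustration; in particular, it assumes that a "fragment" is a run of object members with the enclosing braces left off, which is one plausible definition and not necessarily the one used for SDF.

```python
import json


def classify_json(block):
    """Return "text", "fragment", or "invalid" for a type=json block."""
    try:
        json.loads(block)
        return "text"  # parses as a complete JSON text
    except json.JSONDecodeError:
        pass
    try:
        # Assumed fragment definition: object members without the braces.
        json.loads("{" + block + "}")
        return "fragment"
    except json.JSONDecodeError:
        return "invalid"


if __name__ == "__main__":
    print(classify_json('{"a": 1}'))
    print(classify_json('"a": 1, "b": [2]'))
    print(classify_json('"a": '))
```

With explicit metadata saying which of the two was intended, the second attempt and the ambiguity it hides would go away: a block declared as a JSON text that only parses as a fragment could then be reported as an error.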

(The next step then is to add CDDL validation for the JSON texts and fragments.
Additional metadata needed here is: which specific CDDL rule is intended to govern that text/fragment.
I would not be too happy about having to cram that into @name, but it looks that way.)

Grüße, Carsten