Re: [Tools-discuss] xml2rfc in --v2 mode -- bug report?

Jay Daley <exec-director@ietf.org> Mon, 13 June 2022 14:21 UTC

Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.100.31\))
From: Jay Daley <exec-director@ietf.org>
In-Reply-To: <E022AAF289DF04D70F449FF7@PSB>
Date: Mon, 13 Jun 2022 15:21:30 +0100
Cc: tools-discuss@ietf.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <49687028-4FF4-44D1-A3D3-79FDF670A5A1@ietf.org>
References: <B39D28F0353AE74800217ADC@PSB> <7EDFAAE2-3109-4D16-BC16-1A47DB365522@ietf.org> <E022AAF289DF04D70F449FF7@PSB>
To: John C Klensin <john-ietf@jck.com>
X-Mailer: Apple Mail (2.3696.100.31)
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/44tiIv8jXJrKhkHQrENVn4RLbMA>
Subject: Re: [Tools-discuss] xml2rfc in --v2 mode -- bug report?

Hi John

From reading your message I think I need to start with a clear taxonomy of the various moving parts here because this is still not clear to most participants, afaict:

1.  All XML languages define what elements and attributes are acceptable within that language and what element can appear inside what other element.  This is what we call RFCXML and what was previously called the xml2rfc vocabulary.  This is more generally called the *grammar*.  The grammar is defined in RFCs.

2.  We have chosen to use another language to formally define the grammar above.  For v1, defined in RFC 2629, the formal definition used a DTD.  For v2, defined in RFC 7749, this was specified in a RelaxNG schema and not a DTD.  For v3, defined in RFC 7991, this was also defined in a RelaxNG schema.  This is more generally called the *schema*.  At some point work was put into changing rfc2629.dtd to make it compliant with v2 (possibly even v3), but I believe it does not (and, because of the limitations of DTDs, cannot) correctly define the v2 grammar.

3.  There is an XML construct called a *processing instruction* (aka a PI), which is embedded inside an XML document and provides instructions that are to be interpreted by any XML processor.  These are not part of the grammar and therefore cannot be part of the formal definition.  To repeat myself - PIs sit outside of the grammar, they are conceptually similar to escape codes in that respect.  DOCTYPE is treated here as a processing instruction (strictly it is a document type declaration, but it sits outside the grammar in the same way), as are <?xml-model…> and <?xml-stylesheet…>.  None of the above RFCs have comprehensively covered PIs - RFC 2629 does not mention them at all, RFC 7749 notes that certain things are set by PIs but does not define them and RFC 7991 notes that certain grammar changes are intended to deprecate some PIs but doesn’t formally define those.
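
To make the three-way split above concrete, here is a minimal Python sketch using only the standard library's xml.dom.minidom.  The sample document is a made-up skeleton, not a valid v2 draft, and the helper name top_level_kinds is purely illustrative.  It lists what sits directly under the document node, which shows the PI and the DOCTYPE sitting beside the root element rather than inside the element tree that the schema describes.

```python
from xml.dom import minidom, Node

# Hypothetical sample: a skeletal v2-style document, for illustration only.
SAMPLE = """<?xml version="1.0"?>
<?xml-model href="rfc7749.rnc"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<rfc><front><title>Example</title></front></rfc>
"""

# Human-readable labels for the node types we expect at document level.
KINDS = {
    Node.PROCESSING_INSTRUCTION_NODE: "PI",
    Node.DOCUMENT_TYPE_NODE: "DOCTYPE",
    Node.ELEMENT_NODE: "element",
}


def top_level_kinds(xml_text):
    """Return (kind, name) for each node directly under the document node."""
    doc = minidom.parseString(xml_text)
    return [(KINDS.get(n.nodeType, "other"), n.nodeName) for n in doc.childNodes]


if __name__ == "__main__":
    for kind, name in top_level_kinds(SAMPLE):
        print(kind, name)
```

Only the last entry, the rfc element and everything beneath it, is what the RelaxNG schema constrains.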

With that in mind ...

> On 12 Jun 2022, at 18:05, John C Klensin <john-ietf@jck.com> wrote:
> 
> 
> 
> --On Sunday, June 12, 2022 16:43 +0100 Jay Daley
> <exec-director@ietf.org> wrote:
> 
>> Hi John
>> 
>>> On 11 Jun 2022, at 20:29, John C Klensin <john-ietf@jck.com>
>>> wrote:
>>> 
>>> Hi. I have an old document, in xml2rfc v2 format, whose
>>> content I'm trying to upgrade. When I run
>>> xml2rfc DocName.xml --v2
>>> (with version 3.12.7) I get a series of messages that look
>>> like
>>> 
>>> Warning:
>>> 	file:/C:/Users/Klensin/AppData/Local/Programs/Python/Python37/lib/site-packages/xml2rfc/templates/rfc2629-xhtml.ent
>>> 	is no longer needed as the special processing of
>>> 	non-ASCII characters has been superseded by direct
>>> 	support for non-ASCII characters in RFCXML.
>>> 
>>> The source does not contain any references to
>>> rfc2629-xhtml.ent. It does, of course, contain 
>>> 	<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
>>> as well as the DOCTYPE statement that specifies the DTD
>>> 
>>> Are these warnings actually addressed to something in the DTD
>>> or stylesheet file that should be cleaned out and, if so, how
>>> does that get done?
>> 
>> The DOCTYPE that you have included I am guessing is for
>> rfc2629.dtd which includes by reference rfc2629-xhtml.ent,
>> 
>> The DOCTYPE is superfluous for document validation and has
>> been since v2 of the grammar because v2 is when the change was
>> made from XML Schema (as specified in a DOCTYPE) to RelaxNG.
>> However, it continued to be used partly I think because the
>> templates were never updated to be RelaxNG-aware and partly
>> because it was a convenient way to incorporate the character
>> entities.
> 
> But this is a v2 document whose first version dates to early
> 2017, when v3 was still in its infancy and, IIR, the online
> version of xml2rfc would not yet handle v3. RelaxNG was, again
> IIR, only introduced with RFC 7749 in February 2016, bringing
> the change from a DTD-based definition to a Schema-based one
> with it. Most of the changes described in its Appendix B had
> (as RFC 7749 says) been adopted some years earlier and templates
> adjusted, so that the actual changes needed in 2016 were very
> small and many documents and templates did not require changes
> at all.

That "most of the changes … had been adopted some years earlier" doesn’t change the fact that v2 is defined in RFC 7749 and that uses a RelaxNG schema.  

> I also note that DOCTYPE appears in Section 4 of RFC
> 7749 with language that implies to me that it is required.

No, it’s a PI, not part of the grammar.

> 
>> I normally recommend that authors replace the DOCTYPE
>> statement with this:
>> 
>> 	<?xml-model href="rfc7991bis.rnc"?>
>> 
>> (The file referenced can be found at
>> https://raw.githubusercontent.com/ietf-tools/rfcxml-templates-and-schemas/main/rfc7991bis.rnc)
> 
> But that is a piece of version 3 vocabulary (see below).

No, it’s a PI and therefore outside of the v3 vocabulary.  

However, I did make a mistake here, forgetting that you are a v2 user.  The correct PI for a v2 document would be 

	<?xml-model href="rfc7749.rnc"?>

That file can be found at https://raw.githubusercontent.com/ietf-tools/legacy-templates-and-schemas/main/rfc7749.rnc
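
For anyone who wants to script that swap, a rough Python sketch follows.  It is not something xml2rfc provides; the function name and the regular expression are assumptions made for illustration.  It only handles the simple <!DOCTYPE rfc SYSTEM "..."> form and deliberately leaves alone any DOCTYPE with an internal subset, since that may declare entities the document still relies on.  Likewise, if the document uses character entities that came in via the DTD, removing the DOCTYPE will leave them undefined.

```python
import re

# Matches only the simple form <!DOCTYPE rfc SYSTEM "...">.
# A DOCTYPE carrying an internal subset ([...]) does not match and is kept.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+rfc\s+SYSTEM\s+(["\'])[^"\'>]*\1\s*>')

# The PI suggested above for a v2 document.
V2_MODEL_PI = '<?xml-model href="rfc7749.rnc"?>'


def swap_doctype_for_model(xml_text, pi=V2_MODEL_PI):
    """Replace the first simple DOCTYPE declaration with an xml-model PI."""
    # A function replacement avoids any backslash interpretation in `pi`.
    return DOCTYPE_RE.sub(lambda match: pi, xml_text, count=1)


if __name__ == "__main__":
    before = '<?xml version="1.0"?>\n<!DOCTYPE rfc SYSTEM "rfc2629.dtd">\n<rfc/>'
    print(swap_doctype_for_model(before))
```

The href here is relative, so this assumes a copy of rfc7749.rnc has been saved next to the document.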

>> Doing this tells any XML processor that this uses the
>> referenced RelaxNG schema and a RelaxNG aware editor will then
>> both validate against this schema and provide schema-aware
>> editing support (such as auto-suggestion and
>> auto-completion). 
> 
>> However afaict your editor, Epsilon, does not appear to do
>> schema validation of any sort and so neither the statement
>> above nor a DOCTYPE will result in any validation.
> 
> Epsilon (and emacs) are "just" editors. Their modes for
> handling XML are not aware of schema, just such fundamental
> --and essentially lexical-- issues as formatting, element
> matching, and so on. 

As explained by Carsten, that is not correct for Emacs.  From my research, Epsilon is almost unique in not doing any form of schema validation.  The reason all the others do it appears to be that they use the same underlying open source XML libraries that provide this functionality.

> 
> So I am confused by your explanation and suggestions:
> 
> (1) They would seem to lead to documents that are v2-v3 hybrids.
> I don't know how the current versions of xml2rfc would deal with
> that but, given assorted v2 elements and constructions that were
> deprecated in v3, I'd guess it would be very hard to get right
> and that going in that direction would be a bad idea.

As noted above, PIs are not part of the grammar and so you can have a v2 document that uses new PIs and it is still a v2 document.

> 
> (2) If retaining DOCTYPE, or at least DOCTYPE with those
> definitions, in a v3 document is, as you suggest, obsolete and a
> bad practice, then that should be reflected in the v2v3
> conversion process. However, when I did the conversion
> yesterday, that definition (straight out of the v2 document and
> RFC 7749) is retained unchanged. I presume that should go
> onto the list of bugs in the converter.

Except that it’s not a bug.  Having a DOCTYPE in a v3 document doesn’t stop it from being a v3 document.  What it does is provide an instruction to any XML processor that may be wrong and which the processor may choose to ignore.  I agree, however, that a warning from xml2rfc would be helpful.
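
As a rough idea of what such a check could look like, here is a small Python sketch.  It is an assumption for illustration only and not how xml2rfc is implemented; the function name and the message wording are invented.  It relies on v3 documents carrying version="3" on the rfc element.

```python
from xml.dom import minidom


def doctype_warning(xml_text):
    """Return a warning string if a v3 document still carries a DOCTYPE."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    # v3 documents are assumed to declare version="3" on the root element.
    if doc.doctype is not None and root.getAttribute("version") == "3":
        return (
            "Warning: DOCTYPE referencing %r found in a v3 document; "
            "it is not needed for validation and may mislead XML processors."
            % doc.doctype.systemId
        )
    return None


if __name__ == "__main__":
    sample = '<!DOCTYPE rfc SYSTEM "rfc2629.dtd">\n<rfc version="3"/>'
    print(doctype_warning(sample))
```

A v2 document, or a v3 document without a DOCTYPE, produces no warning in this sketch.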
> 
> 
> More generally, I have no idea what happens behind the scenes
> when I invoke xml2rfc v3.12.7 with "--v2"

https://authors.ietf.org/en/upgrading-from-v2

> but, given the very
> large number of documents in the RFC Editor's collection in v2
> format that, I assume, have not been converted to v3 and tested
> for consistency with the output produced, a decision to retire
> support for v2 should be taken only with great care (and, IMO,
> given the risks and tradeoffs, made only with IESG signoff after
> a community Last Call). Until then, I believe the tools team
> and IETF staff have considerable responsibility for keeping
> version 2 supported. I don't think a few spurious warning
> messages are a big deal unless they are a sign of things to
> come, but, when the answer to a problem with constructions that
> are valid and well-documented under v2 is "convert to v3", that
> is not supporting v2 properly and as the community has a
> reasonable right to expect. 

v2 was officially obsoleted when RFC 7991 was published and RFC 7991 is explicit about that.  Yes, a transition process should be supported and it has been for six years now, but I disagree that the community has any right to expect that to continue.  It will inevitably get more expensive and complex to support v2.  Having said that, the transition certainly has taken longer than many might expect, which I attribute to the templates that were provided until recently, which still used a number of v2 idioms and did not showcase v3.  The new templates should ease the transition considerably.

> 
> Similar comments apply to your comment about epsilon: my
> expectation is that I can continue to use a text editor to work
> with what are now called RFCXML files in both v2 and v3,
> expecting that validation will come out of the xml2rfc program
> (or, at worst, a competent, LINT-like, validator will be
> provided by the IETF). I also expect those validation processes
> will produce clear, correct, and, where possible, actionable
> warning and error messages. If that ever becomes not the case,
> e.g., if it were expected that people creating documents will
> use an editor with XML Schema validating capability and
> validation responsibility will lie there, that would essentially
> require document authors who wish to work in XML to use such an
> editor. I would assume that, too, would require IESG signoff
> after an IETF Last Call if only because it would increase the
> barriers to entry and participation in the IETF and hence reduce
> the diversity of its active, document-writing, participants.

There’s no expectation that you use a schema-aware editor, but if you do then, with the appropriate processing instructions, it will make your work considerably easier.  Your choice though.

Jay

> 
> best,
> john

-- 
Jay Daley
IETF Executive Director
exec-director@ietf.org