Re: [Tools-discuss] xml2rfc in --v2 mode -- bug report?

John C Klensin <john-ietf@jck.com> Mon, 13 June 2022 15:42 UTC

Date: Mon, 13 Jun 2022 11:42:14 -0400
From: John C Klensin <john-ietf@jck.com>
To: Jay Daley <exec-director@ietf.org>
cc: tools-discuss@ietf.org
Message-ID: <280CEA676989D89FF1789101@PSB>
In-Reply-To: <49687028-4FF4-44D1-A3D3-79FDF670A5A1@ietf.org>
References: <B39D28F0353AE74800217ADC@PSB> <7EDFAAE2-3109-4D16-BC16-1A47DB365522@ietf.org> <E022AAF289DF04D70F449FF7@PSB> <49687028-4FF4-44D1-A3D3-79FDF670A5A1@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/kPtZIG62B1cXGBz3ln-9vw5VwnU>
Subject: Re: [Tools-discuss] xml2rfc in --v2 mode -- bug report?

Jay,

This is very helpful, even though I think that, for the person who
is trying to get work done rather than being steeped in the theory,
the vocabulary, schema, and PI distinctions (including the
additional distinction in Carsten's recent note) might be
considered fussy details.  I don't believe (but have not gone
back and checked) that any of the definitions say something
equivalent to "don't bother to try to understand this without
first becoming very familiar with the details of XML terminology
and distinctions".

As to the transition, you may reasonably disagree either about
the principle or about the boundary point but, from my
perspective, the purpose of tools like the RFCXML definition and
xml2rfc is to let people contributing to the IETF get work
done without unreasonable barriers (including sorting out or
working around significant bugs).  From that perspective, it
would be a grave disservice to the community to declare v2
support at an end until xml2rfc is actually stable and
relatively problem-free.  As long as people, such as Carsten,
who have been paying far more attention to this than I have, are
mentioning hundreds of "issues" as significant or making
comments about keeping the newer versions of things alive, we
are not near "relatively problem-free" yet.

Finally and most important, we should all remember that there
has never been a publicly stated expectation that everyone in
the IETF who might need to write a document or use other tools
will be on this list.  I have no way to know, but I assume that
only a very small fraction of those contributors are watching
the list carefully.  Information such as that below --if you
and others think it is as important as it appears to be-- should
be on an easily found web page somewhere and the community
pointed to it, including in a normative reference from the
revised vocabulary document.

thanks,
   john


--On Monday, June 13, 2022 15:21 +0100 Jay Daley
<exec-director@ietf.org> wrote:

> Hi John
> 
> From reading your message I think I need to start with a clear
> taxonomy of the various moving parts here because this is
> still not clear to most participants, afaict:
> 
> 1.  All XML languages define what elements and attributes are
> acceptable within that language and what element can appear
> inside what other element.  This is what we call RFCXML and
> what was previously called the xml2rfc vocabulary.  This is
> more generally called the *grammar*.  The grammar is defined
> in RFCs.
> 
> 2.  We have chosen to use another language to formally define
> the grammar above.  For v1, defined in RFC 2629, the formal
> definition used a DTD.  For v2, defined in RFC 7749, this was
> specified in a RelaxNG schema and not a DTD.  For v3, defined
> in RFC 7991, this was also defined in a RelaxNG schema.  This
> is more generally called the *schema*. At some point work was
> put into changing rfc2629.dtd to make it compliant with v2
> (possibly even v3) but I believe it does not (and cannot,
> because of the limitations of DTDs) correctly define the v2
> grammar.
> 
> 3.  There is an XML construct called a *processing
> instruction* (aka a PI), which is embedded inside an XML
> document and provides instructions that are to be interpreted
> by any XML processor.  These are not part of the grammar and
> therefore cannot be part of the formal definition.  To repeat
> myself - PIs sit outside of the grammar, they are conceptually
> similar to escape codes in that respect.  DOCTYPE is a
> processing instruction, as are <?xml-model…> and
> <?xml-stylesheet…>.  None of the above RFCs have
> comprehensively covered PIs - RFC 2629 does not mention them
> at all, RFC 7749 notes that certain things are set by PIs but
> does not define them and RFC 7991 notes that certain grammar
> changes are intended to deprecate some PIs but doesn't
> formally define those.
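
A minimal sketch, assuming Python's standard xml.dom.minidom, of how a
generic XML parser sees the items discussed in point 3.  The sample
document and the helper name prolog_summary are made up for
illustration.  It shows that the xml-stylesheet PI and the DOCTYPE both
sit outside the root element, which is where the grammar applies.  The
parser reports the DOCTYPE as a document type declaration node rather
than as a PI node.

```python
from xml.dom import minidom, Node

# Hypothetical v2-style source: one PI, one DOCTYPE, then the root element.
SAMPLE = (
    '<?xml-stylesheet type="text/xsl" href="rfc2629.xslt"?>\n'
    '<!DOCTYPE rfc SYSTEM "rfc2629.dtd">\n'
    "<rfc><front><title>Example</title></front></rfc>\n"
)

# Human-readable labels for the node types of interest.
KIND = {
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.DOCUMENT_TYPE_NODE: "document type declaration",
    Node.ELEMENT_NODE: "root element",
}


def prolog_summary(xml_text):
    """Return (kind, name) for each top-level node of the document."""
    doc = minidom.parseString(xml_text)
    return [(KIND.get(n.nodeType, "other"), n.nodeName) for n in doc.childNodes]


if __name__ == "__main__":
    for kind, name in prolog_summary(SAMPLE):
        print(kind, "->", name)
```

Only the last entry, the rfc element and what it contains, is something
a RelaxNG schema or DTD for RFCXML describes.
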
> 
> With that in mind ...
> 
>> On 12 Jun 2022, at 18:05, John C Klensin <john-ietf@jck.com>
>> wrote:
>> 
>> 
>> 
>> --On Sunday, June 12, 2022 16:43 +0100 Jay Daley
>> <exec-director@ietf.org> wrote:
>> 
>>> Hi John
>>> 
>>>> On 11 Jun 2022, at 20:29, John C Klensin <john-ietf@jck.com>
>>>> wrote:
>>>> 
>>>> Hi. I have an old document, in xml2rfc v2 format, whose
>>>> content I'm trying to upgrade. When I run
>>>> xml2rfc DocName.xml --v2
>>>> (with version 3.12.7) I get a series of messages that look
>>>> like
>>>> 
>>>> Warning:
>>>> 	file:/C:/Users/Klensin/AppData/Local/Programs/Python/Python37/lib/site-packages/xml2rfc/templates/rfc2629-xhtml.ent
>>>> 	is no longer needed as the special processing of
>>>> 	non-ASCII characters has been superseded by direct
>>>> 	support for non-ASCII characters in RFCXML.
>>>> 
>>>> The source does not contain any references to
>>>> rfc2629-xhtml.ent. It does, of course, contain 
>>>> 	<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
>>>> as well as the DOCTYPE statement that specifies the DTD
>>>> 
>>>> Are these warnings actually addressed to something in the
>>>> DTD or stylesheet file that should be cleaned out and, if
>>>> so, how does that get done?
>>> 
>>> The DOCTYPE that you have included I am guessing is for
>>> rfc2629.dtd which includes by reference rfc2629-xhtml.ent.
>>> 
>>> The DOCTYPE is superfluous for document validation and has
>>> been since v2 of the grammar because v2 is when the change
>>> was made from a DTD (as specified in a DOCTYPE) to
>>> RelaxNG. However, it continued to be used partly I think
>>> because the templates were never updated to be RelaxNG-aware
>>> and partly because it was a convenient way to incorporate
>>> the character entities.
>> 
>> But this is a v2 document whose first version dates to early
>> 2017, when v3 was still in its infancy and, IIR, the online
>> version of xml2rfc would not yet handle v3. RelaxNG was, again
>> IIR, only introduced with RFC 7749 in February 2016, bringing
>> the change from a DTD-based definition to a Schema-based one
>> with it. Most of the changes described in its Appendix B had
>> (as RFC 7749 says) been adopted some years earlier and
>> templates adjusted, so that the actual changes needed in 2016
>> were very small and many documents and templates did not
>> require changes at all.
> 
> That "most of the changes … had been adopted some years
> earlier" doesn't change the fact that v2 is defined in RFC
> 7749 and that uses a RelaxNG schema.  
> 
>> I also note that DOCTYPE appears in Section 4 of RFC
>> 7749 with language that implies to me that it is required.
> 
> No, it's a PI, not part of the grammar.
> 
>> 
>>> I normally recommend that authors replace the DOCTYPE
>>> statement with this:
>>> 
>>> 	<?xml-model href="rfc7991bis.rnc"?>
>>> 
>>> (The file referenced can be found at
>>> https://raw.githubusercontent.com/ietf-tools/rfcxml-templates-and-schemas/main/rfc7991bis.rnc)
>> 
>> But that is a piece of version 3 vocabulary (see below).
> 
> No, it's a PI and therefore outside of the v3 vocabulary.  
> 
> However, I did make a mistake here, forgetting that you are a
> v2 user.  The correct PI for a v2 document would be 
> 
> 	<?xml-model href="rfc7749.rnc"?>
> 
> Where that file can be found at
> https://raw.githubusercontent.com/ietf-tools/legacy-templates-and-schemas/main/rfc7749.rnc
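
A rough sketch of the substitution described above, assuming the source
is handled as plain text and that a regular expression is good enough
to find the DOCTYPE.  The function name and the default schema_href are
illustrative only.  One caveat: if the DOCTYPE carries an internal
subset that declares entities the document actually uses, dropping it
would leave those entity references unresolved, so this is not a safe
blanket transformation.

```python
import re

# Matches a DOCTYPE declaration, with or without an internal subset in [...].
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s[^\[>]*(?:\[.*?\]\s*)?>", re.DOTALL)


def swap_doctype_for_xml_model(source, schema_href="rfc7749.rnc"):
    """Replace the first DOCTYPE with an xml-model PI.

    The text is returned unchanged when no DOCTYPE is found.
    """
    pi = '<?xml-model href="%s"?>' % schema_href
    return DOCTYPE_RE.sub(pi, source, count=1)


if __name__ == "__main__":
    v2_source = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<!DOCTYPE rfc SYSTEM "rfc2629.dtd">\n'
        "<rfc/>\n"
    )
    print(swap_doctype_for_xml_model(v2_source))
```

Because the xml-model PI is not part of the grammar, the rfc element
and everything inside it are left untouched.
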
> 
>>> Doing this tells any XML processor that this uses the
>>> referenced RelaxNG schema and a RelaxNG aware editor will
>>> then both validate against this schema and provide
>>> schema-aware editing support (such as auto-suggestion and
>>> auto-completion).
>> 
>>> However afaict your editor, Epsilon, does not appear to do
>>> schema validation of any sort and so neither the statement
>>> above nor a DOCTYPE will result in any validation.
>> 
>> Epsilon (and emacs) are "just" editors. Their modes for
>> handling XML are not aware of schema, just such fundamental
>> --and essentially lexical-- issues as formatting, element
>> matching, and so on. 
> 
> As explained by Carsten that is not correct for Emacs.  From
> my research Epsilon is almost unique in not doing any form of
> schema validation.  The reason all the others do it appears to
> be because they use the same underlying open source XML
> libraries that provide this functionality.
> 
>> 
>> So I am confused by your explanation and suggestions:
>> 
>> (1) They would seem to lead to documents that are v2-v3
>> hybrids. I don't know how the current versions of xml2rfc
>> would deal with that but, given assorted v2 elements and
>> constructions that were deprecated in v3, I'd guess it would
>> be very hard to get right and that going in that direction
>> would be a bad idea.
> 
> As noted above, PIs are not part of the grammar and so you can
> have a v2 document that uses new PIs and it is still a v2
> document.
> 
>> 
>> (2) If retaining DOCTYPE, or at least DOCTYPE with those
>> definitions, in a v3 document is, as you suggest, obsolete
>> and a bad practice, then that should be reflected in the v2v3
>> conversion process. However, when I did the conversion
>> yesterday, that definition (straight out of the v2 document
>> and RFC 7749) is retained unchanged. I presume that should go
>> onto the list of bugs in the converter.
> 
> Except that it's not a bug.  Having a DOCTYPE in a v3
> document doesn't stop it from being a v3 document.  What it
> does is provide an instruction to any XML processor that may
> be wrong and which it may choose to ignore.  I agree however
> that a warning from xml2rfc would be helpful.
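
A small sketch of the kind of lint-style warning mentioned above,
assuming Python's standard xml.dom.minidom.  This is not how xml2rfc
itself behaves; the function name and the message wording are invented
for illustration.

```python
from xml.dom import minidom


def doctype_warnings(xml_text):
    """Return a list of warning strings about a DOCTYPE, if one is present."""
    doc = minidom.parseString(xml_text)
    warnings = []
    if doc.doctype is not None:
        # systemId is the quoted file name after SYSTEM, when there is one.
        system_id = doc.doctype.systemId or "(no system identifier)"
        warnings.append(
            "DOCTYPE present (%s); it is not needed for validation "
            "against a RelaxNG schema" % system_id
        )
    return warnings


if __name__ == "__main__":
    sample = '<!DOCTYPE rfc SYSTEM "rfc2629.dtd">\n<rfc/>\n'
    for message in doctype_warnings(sample):
        print("Warning:", message)
```
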
>> 
>> 
>> More generally, I have no idea what happens behind the scenes
>> when I invoke xml2rfc v3.12.7 with "--v2"
> 
> https://authors.ietf.org/en/upgrading-from-v2
> 
>> but, given the very
>> large number of documents in the RFC Editor's collection in v2
>> format that, I assume, have not been converted to v3 and
>> tested for consistency with the output produced, a decision
>> to retire support for v2 should be taken only with great care
>> (and, IMO, given the risks and tradeoffs, made only with IESG
>> signoff after a community Last Call). Until then, I believe
>> the tools team and IETF staff have considerable
>> responsibility for keeping version 2 supported. I don't think
>> a few spurious warning messages are a big deal unless they
>> are a sign of things to come, but, when the answer to a
>> problem with constructions that are valid and well-documented
>> under v2 is "convert to v3", that is not supporting v2
>> properly and as the community has a reasonable right to
>> expect. 
> 
> v2 was officially obsoleted when RFC 7991 was published and
> RFC 7991 is explicit about that.  Yes a transition process
> should be supported and it has been for six years now, but I
> disagree that the community has any right to expect that to
> continue.  It will inevitably get more expensive and complex
> to support v2.  Having said that, the transition certainly has
> taken longer than many might expect, which I attribute to the
> templates that were provided until recently, which still used
> a number of v2 idioms and did not showcase v3.  The new
> templates should ease the transition considerably.
> 
>> 
>> Similar comments apply to your comment about epsilon: my
>> expectation is that I can continue to use a text editor to
>> work with what are now called RFCXML files in both v2 and v3,
>> expecting that validation will come out of the xml2rfc program
>> (or, at worst, a competent, LINT-like, validator will be
>> provided by the IETF). I also expect those validation
>> processes will produce clear, correct, and, where possible,
>> actionable warning and error messages. If that ever becomes
>> not the case, e.g., if it were expected that people creating
>> documents will use an editor with XML Schema validating
>> capability and validation responsibility will lie there, that
>> would essentially require document authors who wish to work
>> in XML to use such an editor. I would assume that, too, would
>> require IESG signoff after an IETF Last Call if only because
>> it would increase the barriers to entry and participation in
>> the IETF and hence reduce the diversity of its active,
>> document-writing, participants.
> 
> There's no expectation that you use a schema-aware editor,
> but if you do then with the appropriate processing
> instructions, it will make your work considerably easier.
> Your choice though.
> 
> Jay
> 
>> 
>> best,
>> john