Re: [Tools-discuss] xml2rfc in --v2 mode -- bug report?

Jay Daley <exec-director@ietf.org> Mon, 13 June 2022 21:36 UTC

From: Jay Daley <exec-director@ietf.org>
Message-Id: <5C2BC1A3-7DAE-4AB3-ABE3-19AB161D38BA@ietf.org>
Content-Type: multipart/alternative; boundary="Apple-Mail=_67D18536-29C9-4230-A6BF-158BC2ECCA82"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.100.31\))
Date: Mon, 13 Jun 2022 23:36:26 +0200
In-Reply-To: <280CEA676989D89FF1789101@PSB>
Cc: tools-discuss@ietf.org
To: John C Klensin <john-ietf@jck.com>
References: <B39D28F0353AE74800217ADC@PSB> <7EDFAAE2-3109-4D16-BC16-1A47DB365522@ietf.org> <E022AAF289DF04D70F449FF7@PSB> <49687028-4FF4-44D1-A3D3-79FDF670A5A1@ietf.org> <280CEA676989D89FF1789101@PSB>
X-Mailer: Apple Mail (2.3696.100.31)
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/bKfOOqMRCA4RVMwXw8LsrL0fyIg>
Subject: Re: [Tools-discuss] xml2rfc in --v2 mode -- bug report?


> On 13 Jun 2022, at 17:42, John C Klensin <john-ietf@jck.com> wrote:
> 
> Jay,
> 
> This is very helpful even though I think, for the person who is
> trying to get work done rather than being steeped in the theory,
> the vocabulary - schema and PI distinctions (including the
> additional distinction in Carsten's recent note) might be
> considered fussy details.  I don't believe (but have not gone
> back and checked) that any of the definitions say something
> equivalent to "don't bother to try to understand this without
> first becoming very familiar with the details of XML terminology
> and distinctions".

Actually John, the only reason I had to explain that is because you took my recommendation about adding a PI as meaning that this was somehow creating a v2/v3 hybrid.  Someone less worried about the formalities might not need that level of detail.  Having said that, XML is indeed very complex and does come with a lot of baggage which we are all exposed to at times - but that ship has sailed.

> 
> As to the transition, you may reasonably disagree either about
> the principle or about the boundary point but, from my
> perspective, the purpose of tools like the RFCXML definition and
> xml2rfc is to let people contributing to the IETF get work
> done without unreasonable barriers (including sorting out or
> working around significant bugs).  From that perspective, it
> would be a grave disservice to the community to declare v2
> support at an end until xml2rfc is actually stable and
> relatively problem-free.  As long as people, such as Carsten,
> who have been paying far more attention to this than I have, are
> mentioning hundreds of "issues" as significant or making
> comments about keeping the newer versions of things alive, we
> are not near "relatively problem-free" yet.

I do not agree with the view that xml2rfc is not ready and I think that does a disservice to the work put into it over the years.  There are indeed anomalies in the grammar that are a bit more fundamental than xml2rfc but those are issues with the RFC (mostly due to hindsight) and not the tool.

> 
> Finally and most important, we should all remember that there
> has never been a publicly stated expectation that everyone in
> the IETF who might need to write a document or use other tools
> will be on this list.  I have no way to know, but I assume that
> only a very small fraction of those contributors are watching
> the list carefully.   Information such as that below --if you
> and others think it is as important as it appears to be-- should
> be on an easily found web page somewhere and the community
> pointed to it, including in a normative reference from the
> revised vocabulary document.

Good job we have https://authors.ietf.org which meets all of your criteria above.

Jay

> 
> thanks,
>   john
> 
> 
> --On Monday, June 13, 2022 15:21 +0100 Jay Daley
> <exec-director@ietf.org> wrote:
> 
>> Hi John
>> 
>> From reading your message I think I need to start with a clear
>> taxonomy of the various moving parts here because this is
>> still not clear to most participants, afaict:
>> 
>> 1.  All XML languages define what elements and attributes are
>> acceptable within that language and what element can appear
>> inside what other element.  This is what we call RFCXML and
>> what was previously called the xml2rfc vocabulary.  This is
>> more generally called the *grammar*.  The grammar is defined
>> in RFCs.
>> 
>> 2.  We have chosen to use another language to formally define
>> the grammar above.  For v1, defined in RFC 2629, the formal
>> definition used a DTD.  For v2, defined in RFC 7749, this was
>> specified in a RelaxNG schema and not a DTD.  For v3, defined
>> in RFC 7991, this was also defined in a RelaxNG schema.  This
>> is more generally called the *schema*.  At some point work was
>> put into changing rfc2629.dtd to make it compliant with v2
>> (possibly even v3) but I believe it does not (and cannot,
>> because of the limitations of DTDs) correctly define the v2
>> grammar.
>> 
>> 3.  There is an XML construct called a *processing
>> instruction* (aka a PI), which is embedded inside an XML
>> document and provides instructions that are to be interpreted
>> by any XML processor.  These are not part of the grammar and
>> therefore cannot be part of the formal definition.  To repeat
>> myself - PIs sit outside of the grammar, they are conceptually
>> similar to escape codes in that respect.  DOCTYPE is a
>> processing instruction, as are <?xml-model…> and
>> <?xml-stylesheet…>.  None of the above RFCs have
>> comprehensively covered PIs - RFC 2629 does not mention them
>> at all, RFC 7749 notes that certain things are set by PIs but
>> does not define them and RFC 7991 notes that certain grammar
>> changes are intended to deprecate some PIs but doesn't
>> formally define those.
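
A minimal Python sketch of point 3, not taken from xml2rfc: it parses a made-up v2-style fragment with the standard library's xml.dom.minidom and lists the processing instructions that sit beside the root element rather than inside it. The sample document and the helper name top_level_pis are assumptions for illustration; only the xml-model href value comes from this thread.

```python
from xml.dom import minidom

# Hypothetical sample document; only the href value appears in the thread.
SOURCE = (
    '<?xml-model href="rfc7749.rnc"?>\n'
    '<rfc category="info"><front><title>Example</title></front></rfc>\n'
)


def top_level_pis(xml_text):
    """Return (target, data) for each PI that is a sibling of the root element."""
    doc = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


doc_pis = top_level_pis(SOURCE)
root_name = minidom.parseString(SOURCE).documentElement.tagName
print(doc_pis, root_name)
```

Because the PI is a sibling of <rfc> rather than a child of it, a schema that describes which elements may appear inside <rfc> has nothing to say about it.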
>> 
>> With that in mind ...
>> 
>>> On 12 Jun 2022, at 18:05, John C Klensin <john-ietf@jck.com>
>>> wrote:
>>> 
>>> 
>>> 
>>> --On Sunday, June 12, 2022 16:43 +0100 Jay Daley
>>> <exec-director@ietf.org> wrote:
>>> 
>>>> Hi John
>>>> 
>>>>> On 11 Jun 2022, at 20:29, John C Klensin <john-ietf@jck.com>
>>>>> wrote:
>>>>> 
>>>>> Hi. I have an old document, in xml2rfc v2 format, whose
>>>>> content I'm trying to upgrade. When I run
>>>>> xml2rfc DocName.xml --v2
>>>>> (with version 3.12.7) I get a series of messages that look
>>>>> like
>>>>> 
>>>>> Warning:
>>>>> 	file:/C:/Users/Klensin/AppData/Local/Programs/Python/Python37/lib/site-packages/xml2rfc/templates/rfc2629-xhtml.ent
>>>>> 	is no longer needed as the special processing of
>>>>> 	non-ASCII characters has been superseded by direct
>>>>> 	support for non-ASCII characters in RFCXML.
>>>>> 
>>>>> The source does not contain any references to
>>>>> rfc2629-xhtml.ent. It does, of course, contain 
>>>>> 	<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
>>>>> as well as the DOCTYPE statement that specifies the DTD
>>>>> 
>>>>> Are these warnings actually addressed to something in the
>>>>> DTD or stylesheet file that should be cleaned out and, if
>>>>> so, how does that get done?
>>>> 
>>>> The DOCTYPE that you have included is, I am guessing, for
>>>> rfc2629.dtd, which includes by reference rfc2629-xhtml.ent.
>>>> 
>>>> The DOCTYPE is superfluous for document validation and has
>>>> been since v2 of the grammar because v2 is when the change
>>>> was made from a DTD (as specified in a DOCTYPE) to
>>>> RelaxNG. However, it continued to be used partly I think
>>>> because the templates were never updated to be RelaxNG-aware
>>>> and partly because it was a convenient way to incorporate
>>>> the character entities.
>>> 
>>> But this is a v2 document whose first version dates to early
>>> 2017, when v3 was still in its infancy and, IIR, the online
>>> version of xml2rfc would not yet handle v3. RelaxNG was, again
>>> IIR, only introduced with RFC 7749 in February 2016, bringing
>>> the change from a DTD-based definition to a Schema-based one
>>> with it. Most of the changes described in its Appendix B had
>>> (as RFC 7749 says) been adopted some years earlier and
>>> templates adjusted, so that the actual changes needed in 2016
>>> were very small and many documents and templates did not
>>> require changes at all.
>> 
>> That "most of the changes … had been adopted some years
>> earlier" doesn't change the fact that v2 is defined in RFC
>> 7749 and that uses a RelaxNG schema.  
>> 
>>> I also note that DOCTYPE appears in Section 4 of RFC
>>> 7749 with language that implies to me that it is required.
>> 
>> No, it's a PI, not part of the grammar.
>> 
>>> 
>>>> I normally recommend that authors replace the DOCTYPE
>>>> statement with this:
>>>> 
>>>> 	<?xml-model href="rfc7991bis.rnc"?>
>>>> 
>>>> (The file referenced can be found at
>>>> https://raw.githubusercontent.com/ietf-tools/rfcxml-templates-and-schemas/main/rfc7991bis.rnc)
>>> 
>>> But that is a piece of version 3 vocabulary (see below).
>> 
>> No, it's a PI and therefore outside of the v3 vocabulary.  
>> 
>> However, I did make a mistake here, forgetting that you are a
>> v2 user.  The correct PI for a v2 document would be 
>> 
>> 	<?xml-model href="rfc7749.rnc"?>
>> 
>> Where that file can be found at
>> https://raw.githubusercontent.com/ietf-tools/legacy-templates-and-schemas/main/rfc7749.rnc
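
A hedged Python sketch of the substitution described above, assuming the document uses only the simple form of the declaration (no internal subset with entity definitions). The function and constant names are hypothetical and this is not something xml2rfc itself provides. Since the thread notes that the DOCTYPE was also a convenient way to pull in the character entities, a document that still uses named entities would need those replaced separately.

```python
import re

# Matches only the simple form <!DOCTYPE rfc SYSTEM "...">, with no internal
# subset ([...]); anything more complex is deliberately left untouched.
SIMPLE_DOCTYPE = re.compile(r'<!DOCTYPE\s+rfc\s+SYSTEM\s+(["\'])[^"\']*\1\s*>')

# PI text for a v2 document, as given in the message above.
V2_MODEL_PI = '<?xml-model href="rfc7749.rnc"?>'


def swap_doctype_for_model(xml_text):
    """Replace the first simple DOCTYPE declaration with the xml-model PI."""
    return SIMPLE_DOCTYPE.sub(V2_MODEL_PI, xml_text, count=1)


before = '<?xml version="1.0"?>\n<!DOCTYPE rfc SYSTEM "rfc2629.dtd">\n<rfc/>\n'
after = swap_doctype_for_model(before)
print(after)
```

Restricting the pattern to the simple form is a deliberate choice: a DOCTYPE with an internal subset may define entities the document depends on, so it is safer to leave that case for manual editing.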
>> 
>>>> Doing this tells any XML processor that this uses the
>>>> referenced RelaxNG schema and a RelaxNG aware editor will
>>>> then both validate against this schema and provide
>>>> schema-aware editing support (such as auto-suggestion and
>>>> auto-completion).
>>> 
>>>> However afaict your editor, Epsilon, does not appear to do
>>>> schema validation of any sort and so neither the statement
>>>> above nor a DOCTYPE will result in any validation.
>>> 
>>> Epsilon (and emacs) are "just" editors. Their modes for
>>> handling XML are not aware of schema, just such fundamental
>>> --and essentially lexical-- issues as formatting, element
>>> matching, and so on. 
>> 
>> As explained by Carsten, that is not correct for Emacs.  From
>> my research Epsilon is almost unique in not doing any form of
>> schema validation.  The reason all the others do it appears to
>> be that they use the same underlying open source XML
>> libraries that provide this functionality.
>> 
>>> 
>>> So I am confused by your explanation and suggestions:
>>> 
>>> (1) They would seem to lead to documents that are v2-v3
>>> hybrids. I don't know how the current versions of xml2rfc
>>> would deal with that but, given assorted v2 elements and
>>> constructions that were deprecated in v3, I'd guess it would
>>> be very hard to get right and that going in that direction
>>> would be a bad idea.
>> 
>> As noted above, PIs are not part of the grammar and so you can
>> have a v2 document that uses new PIs and it is still a v2
>> document.
>> 
>>> 
>>> (2) If retaining DOCTYPE, or at least DOCTYPE with those
>>> definitions, in a v3 document is, as you suggest, obsolete
>>> and a bad practice, then that should be reflected in the v2v3
>>> conversion process. However, when I did the conversion
>>> yesterday, that definition (straight out of the v2 document
>>> and RFC 7749) is retained unchanged. I presume that should go
>>> onto the list of bugs in the converter.
>> 
>> Except that it's not a bug.  Having a DOCTYPE in a v3
>> document doesn't stop it from being a v3 document.  What it
>> does is provide an instruction to any XML processor that may
>> be wrong and which it may choose to ignore.  I agree however
>> that a warning from xml2rfc would be helpful.
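
As an illustration of what such a warning could look like, here is a minimal Python sketch using the standard library's xml.dom.minidom. It is not xml2rfc's behaviour or code; the function name doctype_warning and the message wording are assumptions.

```python
from xml.dom import minidom


def doctype_warning(xml_text):
    """Return a warning string if the document carries a DOCTYPE, else None."""
    doctype = minidom.parseString(xml_text).doctype
    if doctype is None:
        return None
    return (
        "Warning: DOCTYPE referencing %r found; it is not needed for "
        "RelaxNG validation and may be ignored by XML processors."
        % (doctype.systemId,)
    )


print(doctype_warning('<!DOCTYPE rfc SYSTEM "rfc2629.dtd"><rfc/>'))
print(doctype_warning("<rfc/>"))
```

The check only reports the declaration and leaves the document unchanged, which matches the point above that a DOCTYPE does not stop a document from being a v3 document.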
>>> 
>>> 
>>> More generally, I have no idea what happens behind the scenes
>>> when I invoke xml2rfc v3.12.7 with "--v2"
>> 
>> https://authors.ietf.org/en/upgrading-from-v2
>> 
>>> but, given the very
>>> large number of documents in the RFC Editor's collection in v2
>>> format that, I assume, have not been converted to v3 and
>>> tested for consistency with the output produced, a decision
>>> to retire support for v2 should be taken only with great care
>>> (and, IMO, given the risks and tradeoffs, made only with IESG
>>> signoff after a community Last Call). Until then, I believe
>>> the tools team and IETF staff have considerable
>>> responsibility for keeping version 2 supported. I don't think
>>> a few spurious warning messages are a big deal unless they
>>> are a sign of things to come, but, when the answer to a
>>> problem with constructions that are valid and well-documented
>>> under v2 is "convert to v3", that is not supporting v2
>>> properly and as the community has a reasonable right to
>>> expect. 
>> 
>> v2 was officially obsoleted when RFC 7991 was published and
>> RFC 7991 is explicit about that.  Yes, a transition process
>> should be supported, and it has been for six years now, but I
>> disagree that the community has any right to expect that to
>> continue.  It will inevitably get more expensive and complex
>> to support v2.  Having said that, the transition certainly has
>> taken longer than many might expect, which I attribute to the
>> templates that were provided until recently, which still used
>> a number of v2 idioms and did not showcase v3.  The new
>> templates should ease the transition considerably.
>> 
>>> 
>>> Similar comments apply to your comment about epsilon: my
>>> expectation is that I can continue to use a text editor to
>>> work with what are now called RFCXML files in both v2 and v3,
>>> expecting that validation will come out of the xml2rfc program
>>> (or, at worst, a competent, LINT-like, validator will be
>>> provided by the IETF). I also expect those validation
>>> processes will produce clear, correct, and, where possible,
>>> actionable warning and error messages. If that ever becomes
>>> not the case, e.g., if it were expected that people creating
>>> documents will use an editor with XML Schema validating
>>> capability and validation responsibility will lie there, that
>>> would essentially require document authors who wish to work
>>> in XML to use such an editor. I would assume that, too, would
>>> require IESG signoff after an IETF Last Call if only because
>>> it would increase the barriers to entry and participation in
>>> the IETF and hence reduce the diversity of its active,
>>> document-writing, participants.
>> 
>> There's no expectation that you use a schema-aware editor,
>> but if you do then with the appropriate processing
>> instructions, it will make your work considerably easier.
>> Your choice though.
>> 
>> Jay
>> 
>>> 
>>> best,
>>> john
> 
> 

-- 
Jay Daley
IETF Executive Director
exec-director@ietf.org