Re: [Tools-discuss] xml2rfc in --v2 mode -- bug report?

John C Klensin <john-ietf@jck.com> Tue, 14 June 2022 00:30 UTC

Date: Mon, 13 Jun 2022 20:30:14 -0400
From: John C Klensin <john-ietf@jck.com>
To: Julian Reschke <julian.reschke@gmx.de>, tools-discuss@ietf.org
Message-ID: <389F1569A899D1D63752B92D@PSB>
In-Reply-To: <0888d7b6-0009-dd1a-cb89-f5967fbf8f30@gmx.de>
References: <B39D28F0353AE74800217ADC@PSB> <7EDFAAE2-3109-4D16-BC16-1A47DB365522@ietf.org> <E022AAF289DF04D70F449FF7@PSB> <0888d7b6-0009-dd1a-cb89-f5967fbf8f30@gmx.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/hzeuxcAs6Xj0oc2Voc33ieTyntI>


--On Monday, June 13, 2022 20:34 +0200 Julian Reschke
<julian.reschke@gmx.de> wrote:

> Am 12.06.2022 um 19:05 schrieb John C Klensin:
>> ...
>> But this is a v2 document whose first version dates to early
>> 2017, when v3 was still in its infancy and, IIR, the online
>> version of xml2rfc would not yet handle v3.  RelaxNG was,
>> again IIR, only introduced with RFC 7749 in February 2016,
>> bringing the change from a DTD-based definition to a
>> Schema-based one with it.  Most of the changes described in
>> its Appendix B had (as RFC 7749 says) been adopted some years
>> earlier and templates adjusted, so that the actual changes
>> needed in 2016 were very small and many documents and
>> templates did not require changes at all.   I also note that
>> DOCTYPE appears in Section 4 of RFC 7749 with language that
>> implies to me that it is required. ...
> 
> Trying to clarify:
> 
> - there never ever was a *need* to include a DOCTYPE

Sorry, but I obviously had a misimpression from years of text
and templates that, in retrospect, were not clear enough.

> - RFC 7749 was published over 16 years after 2629. During that
> time frame, the grammar had been extended quite a bit,
> sometimes with lots of community discussion, sometimes rather
> ad-hoc. RFC 7749 documents the common grammar understood by
> the existing implementations (the TCL script, the Python
> re-implementation, and my XSLT) at the time of publication.

Thanks.  FWIW, that is completely consistent with my memory and
understanding.

> It's not a new design, just an inventory of what was there
> back then, to be used as basis when working on v3 (and, fwiw,
> I think that has mostly worked well)

Again, I would agree.  One effect, however, is that the
publication of 7749 was not a major new development with
community support for brand new significant changes from v1.
Instead, many members of the community who had been using v2,
with at least some of those grammatical extensions and changes,
before (sometimes a long time before) 7749 went through Last
Call and was published just continued about their business.
IIR, it was presented at the time as a new and consolidated
definition, using a different definitional method than the DTD
of 2629, but not a significant and/or substantive change in what
had been done and what was being done.

> - Section 4 explains how to declare named character entities
> (https://greenbytes.de/tech/webdav/rfc7749.html#special.unicode.code.points).
> These indeed need some kind of doctype declaration (for
> precision:
> https://www.w3.org/TR/2008/REC-xml-20081126/#sec-prolog-dtd)
> to work; but that doesn't imply that you need a full DTD, nor
> that the recipient/processor will actually look at it even if
> you have one.

Understood.  And I think I understood that long before the
current discussion started.

> To be clear: if you don't want to type non-ASCII character
> codes directly, you'll need a Unicode-capable editor, use
> numeric references (such as "&#160;" instead of "&nbsp;"), or
> declare these entities; and the only way to do that in XML is
> what Section 4 of RFC 7749 describes (and I believe Jay's note
> about referencing the RelaxNG grammar using a PI is misleading
> as it doesn't help with that case at all).

Again, consistent with my understanding, so I am perhaps a
little bit less confused and ignorant than I had begun to
suspect a few minutes ago.
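
To make the distinction above concrete, here is a minimal sketch of
the three cases: a numeric reference, a named entity declared in an
internal DOCTYPE subset, and the same named entity with no
declaration at all.  It assumes Python's standard
xml.etree.ElementTree parser, not xml2rfc itself, and the tiny
documents and helper names (text_of_t, parses) are made up for
illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal documents; <t> stands in for any text element.
NUMERIC = '<rfc><t>RFC&#160;7749</t></rfc>'
DECLARED = (
    '<!DOCTYPE rfc [\n'
    '  <!ENTITY nbsp "&#160;">\n'
    ']>\n'
    '<rfc><t>RFC&nbsp;7749</t></rfc>'
)
UNDECLARED = '<rfc><t>RFC&nbsp;7749</t></rfc>'


def text_of_t(source):
    """Parse source and return the text of its <t> element."""
    return ET.fromstring(source).find('t').text


def parses(source):
    """Return True if source parses, False if the parser rejects it."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


if __name__ == '__main__':
    print(repr(text_of_t(NUMERIC)))   # numeric reference, no DOCTYPE needed
    print(repr(text_of_t(DECLARED)))  # named entity declared in the DOCTYPE
    print(parses(UNDECLARED))         # named entity with no declaration
```

The first two documents yield the same text, with a no-break space
(U+00A0) between "RFC" and "7749"; the third is rejected because
nothing declares "nbsp".  Note that the DOCTYPE here carries only an
internal subset with one entity declaration, with no full DTD
involved.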

> Finally, I do agree that xml2rfc's attempt to discourage use
> of DTDs might be well-intended, but really increases confusion
> with something that's already confusing enough in XML.

And I clearly fell victim to that confusion.

Picking up from your later note to save time and traffic...


> Am 13.06.2022 um 20:38 schrieb Julian Reschke:
>> Am 12.06.2022 um 22:41 schrieb John C Klensin:
>>> ...
>>>   * a working interpreter for the v2 format
>>>   * a working converter between v2 and v3 that gets very
>>>     nearly everything right and produces clear messages
>>>     when it does not
>>>   * really good documentation about differences to look
>>>     for and what to do about them.  Such documentation
>>>     is different from just supplying two or more
>>>     carefully-written "vocabulary" documents.
>>> ...
>> 
>> I believe the first two if these. If there's something broken
>> with these, it should be treated as regular bug report (where
>> I'd prefer to put the focus on the conversion tool, so that
>> we can stop worrying about v2 documents as soon as possible).
> 
> "I believe the first two are here".

I believe the first two are at least nearly here.   How nearly
depends on the rate at which people are having problems and I
have no competent opinion on that: I have not been on this list
for very long (and will probably drop off later this week) and
have not reviewed the issues list carefully enough to form an
opinion from it.  I do not expect bug-compatibility with the
earlier v2 version (and would consider such an expectation
fairly silly).  So (to use my problem report as an example), if
an author does something careless or worse in a source document
like having a "day" argument to a <date> element have a
non-numeric value or a numeric one in three or four digits, and
the v2 processor ignores it but the conversion tool rejects it,
I don't think that is a big deal as long as people are willing
to help.  I would, however, prefer a more obvious error message.
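
As a rough sketch of what a "more obvious" message for that case
could look like, the check below flags any <date> whose "day" is
non-numeric or outside 1 to 31.  It is only an illustration using
Python's standard xml.etree.ElementTree; the function name
check_date_days and the message wording are hypothetical and are
not taken from xml2rfc or the conversion tool.

```python
import xml.etree.ElementTree as ET


def check_date_days(source):
    """Return a list of messages for <date> elements with an implausible day."""
    problems = []
    for date in ET.fromstring(source).iter('date'):
        day = date.get('day')
        if day is None:
            continue  # no day given; nothing to check
        # Reject anything that is not plain ASCII digits, then range-check.
        if not (day.isascii() and day.isdigit()) or not 1 <= int(day) <= 31:
            problems.append(
                f'<date> has day="{day}"; expected a number from 1 to 31'
            )
    return problems


if __name__ == '__main__':
    sample = '<rfc><front><date day="2022" month="June"/></front></rfc>'
    for message in check_date_days(sample):
        print(message)
```

A message of this shape names the element, the attribute, and the
offending value, which is the part that was hard to work out from
the original failure.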

And we are in strong agreement that, at this stage, it is better
to concentrate on the converter.  Again using my own report as
an example, I think it would be a waste of scarce resources to
put energy into changing the v2 processor to reject that
impossible construction rather than continuing to ignore the
argument with the invalid value.

Again, many thanks.
   john

p.s.  For the avoidance of doubt given the recent discussion on
the IETF list, comments about competence, confusion, ignorance,
carelessness, etc., above are strictly about my own feelings and
mistakes.  While I believe it would be inappropriate to apply
such terms to others, I believe it is still ok for me to hold
those opinions of myself and my errors.