Re: [Rfc-markdown] [Tools-discuss] Xml2rfc regression?: rfc2629.dtd (emergency fix in 1.5.25)

Carsten Bormann <cabo@tzi.org> Tue, 25 January 2022 06:24 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <d68b79c6-2240-89e0-fb24-47c35e9a8b3f@staff.ietf.org>
Date: Tue, 25 Jan 2022 07:24:03 +0100
Cc: tools-discuss <tools-discuss@ietf.org>, rfc-markdown@ietf.org
Message-Id: <29409D64-A1A8-45DA-8757-64D5EB304C1A@tzi.org>
References: <81E41F85-E62B-4329-83AE-F84C4AB3165A@tzi.org> <d68b79c6-2240-89e0-fb24-47c35e9a8b3f@staff.ietf.org>
To: Kesara Rathnayake <kesara@staff.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rfc-markdown/WyhM1DpH149tbKcS7X2z40KlR-8>

Hi Kesara,

thanks for the super-quick response!

> On 2022-01-25, at 03:17, Kesara Rathnayake <kesara@staff.ietf.org> wrote:
> 
> Note that `SYSTEM` inclusions are allowed without the new flag, as long as the included file is in the templates directory.
> The templates directory can be configured by using `--template-dir`.

There is indeed an rfc2629.dtd under
lib/python3.9/site-packages/xml2rfc/templates
(I’m assuming that this is where my xml2rfc thinks its (default) template directory is.)

However, there also is an rfc2629.dtd in the local directory where the draft’s .xml is (this is needed so we can extract elements from the xml without xmlstarlet complaining).  Xml2rfc apparently finds the local one but then decides it shouldn’t allow access to it, instead of using the template one.

So, essentially, almost all builds that use xml tools for extracting components from the XML file are now broken.
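A minimal Python sketch of how a build could detect the situation described above: a draft whose DOCTYPE names a SYSTEM file that also exists as a local copy next to the .xml. The function name local_doctype_copy and the regular expression are assumptions made for illustration; this does not reproduce how xml2rfc itself resolves or rejects the file.

```python
import re
from pathlib import Path

# Matches the system identifier in e.g. <!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
# (assumed pattern; real drafts may format the DOCTYPE differently).
DOCTYPE_SYSTEM = re.compile(r'<!DOCTYPE\s+rfc\s+SYSTEM\s+["\']([^"\']+)["\']')


def local_doctype_copy(xml_path):
    """Return the path of a local file named by the DOCTYPE, or None."""
    xml_path = Path(xml_path)
    # The DOCTYPE sits near the top of the file, so only the start is read.
    with open(xml_path, encoding="utf-8", errors="replace") as f:
        head = f.read(4096)
    match = DOCTYPE_SYSTEM.search(head)
    if match is None:
        return None
    # Look for a file with that name in the same directory as the draft.
    candidate = xml_path.parent / match.group(1)
    return candidate if candidate.is_file() else None
```

A non-None result marks a draft directory that is likely to run into the access check.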

I did an emergency fix in kramdown-rfc 1.5.25 that substitutes

<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">

for

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [

if --v3 (kdrfc -3) is given.
This leaves v2 high and dry, but those files are more likely to use the predefined character entities and less likely to use additional XML tooling and thus a local rfc2629.dtd.
The emergency fix also makes the predefined character entities (except the four that now seem to be canonical for the RFC editor) unusable with v3 — this is a breaking change (I didn’t check how many drafts this breaks), but ultimately possibly a good one.  If you need a workaround, add:

doctype-reference: 'SYSTEM "rfc2629.dtd"'
or
doctype-reference: 'SYSTEM "rfc2629-xhtml.ent"'

to the YAML (and make sure you don’t have a local copy in the directory).
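For illustration, the DOCTYPE substitution described above can be sketched in Python. This is not the kramdown-rfc code (which is written in Ruby); the names V3_DOCTYPE, OLD_DOCTYPE and inline_entities are hypothetical, and the sketch assumes the old DOCTYPE line appears exactly as quoted in this message.

```python
import re

# Replacement header: an internal subset declaring the four entities inline,
# so that no external rfc2629.dtd needs to be read.
V3_DOCTYPE = '''<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
'''

# The old header, up to and including the "[" that opens the internal subset.
OLD_DOCTYPE = re.compile(r'<!DOCTYPE rfc SYSTEM "rfc2629\.dtd" \[\n?')


def inline_entities(xml_text):
    """Swap the SYSTEM DOCTYPE for one with inline entity declarations."""
    # A function replacement keeps the text literal (no escape processing).
    return OLD_DOCTYPE.sub(lambda m: V3_DOCTYPE, xml_text, count=1)
```

Text that does not contain the old DOCTYPE line is returned unchanged.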

(FYI: When I looked today in /archive/id, 
of about 53451 XML I-Ds, approximately 
49185 XML I-Ds contained matches for
<!DOCTYPE rfc SYSTEM "rfc2629.dtd"
or similar and approximately 1870 contained matches for
<!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent">
which probably has the same problem with a local copy.
So this is not a theoretical concern.)
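A count of this kind can be sketched in Python as follows. The message does not say how the search was actually done, so the regular expression, the function name count_system_doctypes and the restriction to *.xml files directly in one directory are all assumptions.

```python
import re
from pathlib import Path

# Assumed pattern for the two SYSTEM references discussed above.
SYSTEM_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+rfc\s+SYSTEM\s+"(rfc2629\.dtd|rfc2629-xhtml\.ent)"'
)


def count_system_doctypes(directory):
    """Count XML files per SYSTEM identifier found in their DOCTYPE."""
    counts = {"total": 0, "rfc2629.dtd": 0, "rfc2629-xhtml.ent": 0}
    for path in sorted(Path(directory).glob("*.xml")):
        counts["total"] += 1
        text = path.read_text(encoding="utf-8", errors="replace")
        match = SYSTEM_DOCTYPE.search(text)
        if match:
            # group(1) is the matched file name, which is also a dict key.
            counts[match.group(1)] += 1
    return counts
```

Calling count_system_doctypes on a directory of drafts returns a dictionary with the total number of XML files and the number referencing each of the two files.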

Grüße, Carsten