Re: [Rfc-markdown] [Tools-discuss] Tool to convert TXT to RFCXML

Carsten Bormann <cabo@tzi.org> Mon, 17 July 2023 06:48 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <66ef1749-f4b1-51af-8e08-06eb58a9005a@gmx.de>
Date: Mon, 17 Jul 2023 08:47:54 +0200
Cc: rfc-markdown@ietf.org, RFC Interest <rfc-interest@rfc-editor.org>
Message-Id: <869922F5-DA96-40DB-A749-32E801A1A632@tzi.org>
References: <CABXxEz97ZDeHhtMeX6CwX842d=s9CXfUtG5DFWpxNKbtBcoW6Q@mail.gmail.com> <CAD2=Z86=DyfT0Fp23DqdCyz32Od4uhAhA48K=pst64eZ+CS8NA@mail.gmail.com> <CABXxEz-KC5Nayv=KMce9mDi_ngu9J8PY1-pAbMUBypxEX2Pw8Q@mail.gmail.com> <430B62BB-45BB-4ECF-812F-39E84926CFA7@tzi.org> <CABXxEz-_6MV0P9sc=GBJw+eReFpLZC2vDFFvo9katc1ECwpyRw@mail.gmail.com> <9f95474b-c049-2e5a-1cfd-f3e9ebf48e2d@gmx.de> <fe96fb1a-1c56-4416-b937-b0b66c38ae71@betaapp.fastmail.com> <66ef1749-f4b1-51af-8e08-06eb58a9005a@gmx.de>
To: Julian Reschke <julian.reschke@gmx.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rfc-markdown/mEHQCIzkUeiYZYnMSfYAnAVaZiA>
Subject: Re: [Rfc-markdown] [Tools-discuss] Tool to convert TXT to RFCXML

On 17. Jul 2023, at 07:13, Julian Reschke <julian.reschke@gmx.de> wrote:
> 
> I agree that this is a problem. But that's also what the RFC Editor has
> been doing in the past (AFAIR).

The RFC Editor can only express what RFCXML provides.
Since there is no <nobr> in RFCXML, only character substitutions remain.
As Martin notes, there can be undesirable effects:

* xml2rfc can process these no-break characters and replace them with basic characters when generating the final form for plaintext rendering.
* Theoretically, it could do that for PDF, too, but that requires PDF tooling that supports this.
* For HTML, it is the HTML renderer (browser) that has to react to the no-break characters, so xml2rfc has to leave them in.
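
To make the plaintext case concrete, here is a minimal sketch of that kind of folding. The mapping table and the name fold_no_break are my assumptions for illustration; this is not xml2rfc's actual code, and its real table may differ.

```python
# Hypothetical mapping from no-break characters to basic ones.
# Assumption for illustration only; not taken from xml2rfc.
NO_BREAK_TO_BASIC = {
    "\u00a0": " ",  # NO-BREAK SPACE -> SPACE
    "\u2011": "-",  # NON-BREAKING HYPHEN -> HYPHEN-MINUS
    "\u2060": "",   # WORD JOINER -> removed entirely
}

# Build the translation table once; str.translate then does the folding.
_TABLE = str.maketrans(NO_BREAK_TO_BASIC)


def fold_no_break(text: str) -> str:
    """Return text with no-break characters replaced by basic characters."""
    return text.translate(_TABLE)
```

The same folding applied to text extracted from PDF or HTML output would give strings that plain search and copy/paste can match.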

Fortunately, there is some support for searching in the presence of no-break characters in browsers, so the HTML exception is not as bad.

We should

* find out more about browser support for searching in the presence of no-break characters
* make sure that our PDF support also generates text that allows useful copy/paste
* add <nobr> or similar to RFCXML, so we don’t have to play the character substitution games.

I know that <nobr> in HTML was a browser vendor invention that did not make it into HTML5, but that is because HTML5 can represent the same information in style properties, which is the route it took instead.  This doesn’t apply to RFCXML.  (Any argument that this *is* a style property and therefore should not be part of the RFCXML gamut is confused.)

(I re-added rfc-interest to rfc-markdown because, while markdown can help by automating the character substitution games, that is just a stopgap until RFCXML catches up, at which point we can just change the markdown processors to emit <nobr> without a need to change the actual markdown.)
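
As a rough sketch of what that switch could look like inside a markdown processor: the function no_break and its use_element flag are hypothetical, and <nobr> does not exist in RFCXML today, so the element branch only shows where the change would land.

```python
def no_break(phrase: str, use_element: bool = False) -> str:
    """Protect a phrase against line breaks in the generated RFCXML.

    Assumes phrase is already XML-escaped. Both the function and the
    <nobr> element are hypothetical; RFCXML has no such element yet.
    """
    if use_element:
        # Possible future: say "do not break here" directly in markup.
        return f"<nobr>{phrase}</nobr>"
    # Stopgap: substitute no-break characters for spaces and hyphens.
    return phrase.replace(" ", "\u00a0").replace("-", "\u2011")
```

Only the processor's output changes between the two branches; the markdown source stays the same.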

Grüße, Carsten