Re: [Rfc-markdown] [Tools-discuss] New xml2rfc release: v3.18.0

Carsten Bormann <cabo@tzi.org> Fri, 04 August 2023 04:26 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <CAD2=Z85hDSHt9gmAz4OGZ3HpYsyY_0tVjUuad7qPOiYFHdEKgA@mail.gmail.com>
Date: Fri, 04 Aug 2023 06:25:52 +0200
Cc: XML2RFC Interest Group <xml2rfc@ietf.org>, tools-discuss <tools-discuss@ietf.org>, Kesara Rathnayake <kesara@staff.ietf.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <0C87712C-4F97-4150-A7ED-F6438B157462@tzi.org>
References: <CAD2=Z85hDSHt9gmAz4OGZ3HpYsyY_0tVjUuad7qPOiYFHdEKgA@mail.gmail.com>
To: rfc-markdown@ietf.org
X-Mailer: Apple Mail (2.3731.700.6)
Archived-At: <https://mailarchive.ietf.org/arch/msg/rfc-markdown/t-poUOdWioubJKwlzI7HQzk4qNU>
Subject: Re: [Rfc-markdown] [Tools-discuss] New xml2rfc release: v3.18.0

> See https://github.com/ietf-tools/xml2rfc/releases/tag/v3.18.0 for
> release details.
> 
> This release allows the use of Unicode characters everywhere.

Wonderful!

(This release allows the use of non-ASCII Unicode characters everywhere;
xml2rfc already allowed Unicode characters that were in its “ASCII” subset, which included a few select non-ASCII characters.)

This update should not require any updates in kramdown-rfc, but of course the need for workarounds like {{{}}{{🤦‍♂️}}} is gone.

> The  `--warn-bare-unicode` command line option will warn if Unicode
> characters are present in any element except artwork, city, cityarea,
> code, country, email, extaddr, organization, pobox, postalLine,
> refcontent, region, sortingcode, sourcecode, street, title and u.
> See https://github.com/ietf-tools/xml2rfc/pull/1017 for more details.

We generally want a soft transition to using the full Unicode repertoire, not least because xml2rfc’s PDF generator may need attention as new character blocks come into use; Gurmukhi may not quite work just yet.
RFCXML’s <u> element stays useful as an easy way to fulfil RFC 7997’s requirement to fully explain non-ASCII characters when that may be needed for interchange.

Non-ASCII characters sometimes sneak into drafts via copy-paste from sources that use full Unicode as a matter of course.
This includes not just typographic quotes, which can be jarring when mixed with typewriter quotes, but also various invisible characters such as zero-width spaces and word joiners, which were already part of xml2rfc’s “ASCII” repertoire.

Kramdown-rfc comes with an analysis tool called “echars” (explain characters).

Running this on a markdown (or XML or TXT!) file generates output such as:

$ echars draft-bormann-restatement.md
*** Latin-1 Supplement
»: U+00BB    1 RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK (Common)
ä: U+00E4    1 LATIN SMALL LETTER A WITH DIAERESIS (Latin)
*** General Punctuation (Common)
—: U+2014    5 EM DASH
’: U+2019    1 RIGHT SINGLE QUOTATION MARK
”: U+201D    2 RIGHT DOUBLE QUOTATION MARK
…: U+2026    1 HORIZONTAL ELLIPSIS
⁠: U+2060    1 WORD JOINER

So there is nothing strange in this document, but it is still worth knowing where these characters outside the LF + %x20-7e space are (in this case: mostly in the titles of references), so I check this now and then for my documents (*).
Your editor might help with that, e.g. in Emacs use:

M-C-s [^^J-~]

(where ^J is a newline character, entered as ctrl-j, while the ^ preceding it is a caret.)
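Outside an editor, the same search can be scripted. The following is a minimal Python sketch, not part of kramdown-rfc or echars; the function name find_outside and the sample text are made up for illustration. It reuses the character range of the Emacs regexp above (everything from LF up to "~") and reports line, column, and character name for each hit.

```python
import re
import unicodedata

# Same character class as the Emacs regexp: anything outside the
# range LF (U+000A) .. "~" (U+007E).
OUTSIDE = re.compile(r"[^\n-~]")

def find_outside(text):
    """Return 1-based (line, column, char) triples for each match."""
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for m in OUTSIDE.finditer(line):
            hits.append((lineno, m.start() + 1, m.group()))
    return hits

# Hypothetical sample input: an em dash and an (invisible) word joiner.
sample = "plain line\nem\u2014dash and word\u2060joiner\n"
for lineno, col, ch in find_outside(sample):
    name = unicodedata.name(ch, "<unnamed>")
    print(f"{lineno}:{col} U+{ord(ch):04X} {name}")
```

Like the Emacs regexp, this range also accepts the control characters between LF and space; a stricter class would be needed to flag those as well.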

Of course, in XML you might be hiding beyond-ASCII by using entity references such as &nbsp; or character references such as &#x20AC; or &#8364; — echars doesn’t show these, but then the intent should be quite obvious in the manuscript.
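If you do want to see what numeric character references stand for, a small script can list them. This is only a sketch under assumptions: explain_char_refs is a hypothetical helper, not something echars provides, and it handles numeric references only. Named entities such as &nbsp; depend on the declarations in scope and are left alone.

```python
import re
import unicodedata

# XML numeric character references: &#xHEX; or &#DECIMAL;
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

def explain_char_refs(xml_text):
    """Return (reference, code point, name) for each numeric reference."""
    out = []
    for m in CHAR_REF.finditer(xml_text):
        cp = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        if cp > 0x10FFFF:
            continue  # not a valid Unicode code point; skip it
        name = unicodedata.name(chr(cp), "<unnamed>")
        out.append((m.group(0), f"U+{cp:04X}", name))
    return out

for ref, cp, name in explain_char_refs("&#x20AC; or &#8364;"):
    print(ref, cp, name)
```

Both references in the example resolve to the same character, U+20AC EURO SIGN.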

> Report any issues on https://github.com/ietf-tools/xml2rfc/issues

… and any issues with kramdown-rfc on rfc-markdown@ietf.org and/or as issues in https://rfc.space

Grüße, Carsten

(*) I contemplate generating this report with each kramdown-rfc run, possibly modulated by declarations in the YAML header that say which characters the document author already expects.  But I’m going on a couple of vacations first now…
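To make the idea in (*) a bit more concrete, here is a rough Python sketch of a report that skips characters the author has declared as expected. Everything in it is an assumption for illustration: the function unexpected_chars, its expected parameter, and the sample text are hypothetical and do not describe kramdown-rfc or any existing YAML header field.

```python
from collections import Counter
import unicodedata

def unexpected_chars(text, expected=""):
    """Count characters outside LF + printable ASCII not listed in `expected`."""
    counts = Counter(
        ch for ch in text
        if ch != "\n" and not (" " <= ch <= "~") and ch not in expected
    )
    # Sort by code point for a stable report.
    return sorted(counts.items())

# Hypothetical document text; the author declares the two German letters as expected.
text = "Gr\u00fc\u00dfe \u2014 \u2026 \u2014\n"
for ch, n in unexpected_chars(text, expected="\u00fc\u00df"):
    print(f"U+{ord(ch):04X} {n:4d} {unicodedata.name(ch, '<unnamed>')}")
```

With the two declared characters filtered out, only the em dash and the ellipsis remain in the report.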