Re: [xml2rfc] [Rfc-markdown] [Tools-discuss] New xml2rfc release: v3.16.0

Marc Petit-Huguenin <> Thu, 19 January 2023 17:20 UTC

On 1/19/23 08:01, Jay Daley wrote:
>> On 19 Jan 2023, at 15:41, Marc Petit-Huguenin <> wrote:
>> On 1/18/23 14:09, Kesara Rathnayake wrote:
>>> See for
>>> release details.
>>> New changes include,
>>> * Permit non-ASCII within <t> without the use of <u>.
>> Isn't an unconditional use of non-ASCII a violation of RFC 7997?
> Section 3.4 says:
>    When the mention of non-ASCII characters is required for correct
>    protocol operation and understanding, the characters' Unicode code
>    points must be used in the text.  The addition of each character name
>    is encouraged.
>    o  Non-ASCII characters will require identifying the Unicode code
>        point.
>    o  Use of the actual UTF-8 character (e.g., (See PDF for non-ASCII
>        character string)) is encouraged so that a reader can more easily
>        see what the character is, if their device can render the text.
>    o  The use of the Unicode character names like "INCREMENT" in
>        addition to the use of Unicode code points is also encouraged.
>        When used, Unicode character names should be in all capital
>        letters.
> <u> is a convenient way of ensuring that this happens because it is recognised by xml2rfc and processed in line with those bullets above.  However, note that the text says "is required for correct protocol operation" and that does not cover usage such as an example where the specific character chosen for that example doesn’t matter (e.g. when demonstrating output using RTL script).  Under such circumstances non-ASCII characters should be allowed without the adornment listed above.
> The previous implementation of <u> (which btw was added after RFC 7991 and so never had consensus) required a <u> for *all* non-ASCII characters and so exceeded the requirement of 3.4 above.  This change now allows non-ASCII to be used without a <u> being enforced automatically but it does not mean that 3.4 will be ignored for RFCs.  <u> will still be required for RFCs to follow the principle of "required for correct protocol operation" and it will be for the RPC, authors and stream owners to work that out.

RFC 7997 clearly says that Unicode CANNOT be used except in a finite list of cases:

1. Purely as part of an example (3.1)
2. English words imported from foreign languages, with the strict constraint that they are defined in the Merriam-Webster dictionary (3.1)
3. Person or organization names (3.2, 3.3)
4. When the Unicode character is described, instead of being used (3.4)
5. In a table
6. In code
7. In a bibliographic item
8. In address information

The modification above is clearly not restricted to these cases.

I notice that the xml2rfc language already contains some elements that can be used to enforce these cases.  Where one is missing, a new element could be added:

(1) An <artwork> element can contain Unicode.
(2) A new element (as <t> content) can mark words that can contain Unicode.  xml2rfc can then extract them and check that they are valid English words.
(3) A <contact> element can contain Unicode.
(4) A <u> element, used to describe a Unicode character, can contain Unicode.
(5) A <tr> element can contain Unicode.
(6) A <sourcecode> element can contain Unicode.
(7) A <reference> element can contain Unicode (not just the organization/address).
(8) The <address> and <organization> elements can contain Unicode.
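
To make the idea concrete, here is a rough sketch of what such a check could look like. It is not how xml2rfc works today, and the names (ALLOWED, find_violations, check) are made up for this illustration. It assumes that non-ASCII is accepted anywhere inside one of the elements listed above and flagged everywhere else. It leaves out case (2), since that element does not exist yet, and it does not look at attribute values.

```python
import xml.etree.ElementTree as ET

# Hypothetical allow-list built from cases (1) and (3)-(8) above.
# This is an assumption for illustration, not xml2rfc's actual behavior.
ALLOWED = {
    "artwork", "contact", "u", "tr",
    "sourcecode", "reference", "address", "organization",
}


def has_non_ascii(text):
    """Return True if text contains at least one non-ASCII character."""
    return text is not None and not text.isascii()


def find_violations(element, allowed=False, path=""):
    """Yield (path, text) for non-ASCII text outside any allowed element."""
    here = f"{path}/{element.tag}"
    inside = allowed or element.tag in ALLOWED
    if not inside and has_non_ascii(element.text):
        yield here, element.text
    for child in element:
        yield from find_violations(child, inside, here)
        # Tail text follows the child but belongs to the current element.
        if not inside and has_non_ascii(child.tail):
            yield here, child.tail


def check(xml_source):
    """Parse an XML string and return the list of violations."""
    return list(find_violations(ET.fromstring(xml_source)))


if __name__ == "__main__":
    doc = "<rfc><t>caf\u00e9 <u>\u0394</u> ok</t><artwork>\u2192</artwork></rfc>"
    for where, text in check(doc):
        print(where, repr(text))
```

In the small document at the bottom, the accented word directly inside <t> is reported, while the characters inside <u> and <artwork> are accepted.  A real implementation would also need the dictionary lookup for case (2) and a decision on attributes.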

Doing that and documenting it in the next revision of RFC 7991 seems the sensible thing to do.

But unconditionally letting everyone add Unicode characters willy-nilly looks to me like a way to be able to say, at some point in the future, that we have no choice other than officially authorizing Unicode everywhere because there are already too many legacy RFCs doing that (a well-known tactic to work around standards).

Marc Petit-Huguenin