Re: [xml2rfc] [Rfc-markdown] [Tools-discuss] New xml2rfc release: v3.16.0

Marc Petit-Huguenin <> Thu, 19 January 2023 17:47 UTC

To: Jay Daley <>
Cc:, tools-discuss <>

On 1/19/23 09:36, Jay Daley wrote:
>> On 19 Jan 2023, at 17:20, Marc Petit-Huguenin <> wrote:
>> On 1/19/23 08:01, Jay Daley wrote:
>>>> On 19 Jan 2023, at 15:41, Marc Petit-Huguenin <> wrote:
>>>> On 1/18/23 14:09, Kesara Rathnayake wrote:
>>>>> See for
>>>>> release details.
>>>>> New changes include,
>>>>> * Permit non-ASCII within <t> without the use of <u>.
>>>> Isn't an unconditional use of non-ASCII a violation of RFC 7997?
>>> Section 3.4 says:
>>>   When the mention of non-ASCII characters is required for correct
>>>   protocol operation and understanding, the characters' Unicode code
>>>   points must be used in the text.  The addition of each character name
>>>   is encouraged.
>>>   o  Non-ASCII characters will require identifying the Unicode code
>>>       point.
>>>   o  Use of the actual UTF-8 character (e.g., (See PDF for non-ASCII
>>>       character string)) is encouraged so that a reader can more easily
>>>       see what the character is, if their device can render the text.
>>>   o  The use of the Unicode character names like "INCREMENT" in
>>>       addition to the use of Unicode code points is also encouraged.
>>>       When used, Unicode character names should be in all capital
>>>       letters.
>>> <u> is a convenient way of ensuring that this happens because it is recognised by xml2rfc and processed in line with those bullets above.  However, note that the text says "is required for correct protocol operation" and that does not cover usage such as an example where the specific character chosen for that example doesn’t matter (e.g. when demonstrating output using RTL script).  Under such circumstances non-ASCII characters should be allowed without the adornment listed above.
>>> The previous implementation of <u> (which btw was added after RFC 7991 and so never had consensus) required a <u> for *all* non-ASCII characters and so exceeded the requirement of 3.4 above.  This change now allows non-ASCII to be used without a <u> being enforced automatically but it does not mean that 3.4 will be ignored for RFCs.  <u> will still be required for RFCs to follow the principle of "required for correct protocol operation" and it will be for the RPC, authors and stream owners to work that out.
>> RFC 7997 clearly says that Unicode CANNOT be used except in a finite list of cases:
>> 1. Purely part of an example (3.1)
>> 2. for English words imported from foreign languages, with the strict constraint that they are defined in the Merriam-Webster dictionary (3.1).
>> 3. person or organization name (3.2, 3.3)
>> 4. when the Unicode character is described, instead of being used (3.4)
>> 5. in a table
>> 6. in code
>> 7. in a bibliographic item
>> 8. in address information
>> The modification above is clearly not restricted to these cases.
>> I notice that the xml2rfc language already contains some elements that can be used to enforce these cases.  Where one is missing, a new element could be added:
>> (1) An <artwork> element can contain Unicode
>> (2) a new element (as <t> content) can mark a word that can contain Unicode.  xml2rfc can then extract such words and check that they are valid English words
>> (3) <contact> can contain Unicode
>> (4) <u>, used to describe a Unicode character, can contain Unicode
>> (5) a <tr> element can contain Unicode
>> (6) a <sourcecode> element can contain Unicode
>> (7) a <reference> can contain Unicode (not just the organization/address)
>> (8) The <address> and <organization> elements can contain Unicode.
>> Doing that and documenting it in the next revision of RFC 7991 seems the sensible thing to do.
>> But unconditionally letting everyone add Unicode characters willy-nilly looks to me like a way to be able to say, at some point in the future, that we have no other choice than to officially authorize Unicode everywhere because there are already too many legacy RFCs doing that (a well-known tactic to work around standards).
> While a tool can be used to enforce policy, it is not the only way and in some cases it is not the best way.
> For example, RFC 7322 (the RFC Style Guide) says in section 3.1 "The RFC publication language is English".  Nobody would suggest that xml2rfc checks every word to determine if it is English or not and errors if it finds one that isn’t, because we all know that this policy is best enforced by people at the appropriate stages.  That’s what’s happening here - compliance with 7997 is now with the RPC editors.  So no, there will not be a set of legacy RFCs with non-ASCII used incorrectly that can then be used to reverse engineer a policy change.

Or, as I proposed above, create a new element in the xml2rfc language that tags a word as permitted to contain Unicode, e.g., "<loanword>attaché</loanword>".  That can be mechanically verified against a dictionary, does not require a person to look up each word, and clearly prevents uses that are not authorized by RFC 7997.
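
To make the idea concrete, here is a minimal sketch of what such a check might look like.  It is not xml2rfc code: the <loanword> element is the hypothetical element proposed above, the LOANWORDS set is a stand-in for a real dictionary, and the ALLOWED set is only my reading of the element list earlier in this thread.  It uses nothing beyond the Python standard library.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Stand-in for a real dictionary lookup (assumption: a tiny hard-coded list).
LOANWORDS = {"attaché", "café", "naïve", "résumé"}

# Elements assumed to permit non-ASCII content, per the list in this thread.
# "loanword" is hypothetical and not part of the xml2rfc vocabulary.
ALLOWED = {"artwork", "sourcecode", "contact", "u", "tr",
           "reference", "address", "organization", "loanword"}


def has_non_ascii(text):
    """Return True if the (possibly None) text contains a non-ASCII character."""
    return any(ord(ch) > 127 for ch in text or "")


def check(xml_text):
    """Return a list of problems found in the given XML fragment."""
    problems = []

    def walk(elem, allowed):
        # Permission is inherited from any enclosing allowed element.
        allowed = allowed or elem.tag in ALLOWED
        if elem.tag == "loanword":
            word = unicodedata.normalize("NFC", (elem.text or "").strip().lower())
            if word not in LOANWORDS:
                problems.append(f"unknown loanword: {word}")
        if not allowed and has_non_ascii(elem.text):
            problems.append(f"non-ASCII text in <{elem.tag}>")
        for child in elem:
            walk(child, allowed)
            # The tail text after a child belongs to the parent element.
            if not allowed and has_non_ascii(child.tail):
                problems.append(f"non-ASCII text in <{elem.tag}>")

    walk(ET.fromstring(xml_text), False)
    return problems
```

With this sketch, check('<t>The <loanword>attaché</loanword> left.</t>') reports no problems, while the same sentence with a bare "attaché" directly inside <t> is flagged.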

Marc Petit-Huguenin