Re: [xml2rfc] [Rfc-markdown] [Tools-discuss] New xml2rfc release: v3.16.0

Jay Daley <> Thu, 19 January 2023 17:36 UTC

Cc: Kesara Rathnayake <>,, tools-discuss <>
To: Marc Petit-Huguenin <>

> On 19 Jan 2023, at 17:20, Marc Petit-Huguenin <> wrote:
> On 1/19/23 08:01, Jay Daley wrote:
>>> On 19 Jan 2023, at 15:41, Marc Petit-Huguenin <> wrote:
>>> On 1/18/23 14:09, Kesara Rathnayake wrote:
>>>> See for
>>>> release details.
>>>> New changes include,
>>>> * Permit non-ASCII within <t> without the use of <u>.
>>> Isn't an unconditional use of non-ASCII a violation of RFC 7997?
>> Section 3.4 says:
>>   When the mention of non-ASCII characters is required for correct
>>   protocol operation and understanding, the characters' Unicode code
>>   points must be used in the text.  The addition of each character name
>>   is encouraged.
>>   o  Non-ASCII characters will require identifying the Unicode code
>>       point.
>>   o  Use of the actual UTF-8 character (e.g., (See PDF for non-ASCII
>>       character string)) is encouraged so that a reader can more easily
>>       see what the character is, if their device can render the text.
>>   o  The use of the Unicode character names like "INCREMENT" in
>>       addition to the use of Unicode code points is also encouraged.
>>       When used, Unicode character names should be in all capital
>>       letters.
>> <u> is a convenient way of ensuring that this happens because it is recognised by xml2rfc and processed in line with the bullets above.  However, note that the text says "is required for correct protocol operation", and that does not cover usage such as an example where the specific character chosen doesn’t matter (e.g. when demonstrating output using RTL script).  Under such circumstances non-ASCII characters should be allowed without the adornment listed above.
>> The previous implementation of <u> (which btw was added after RFC 7991 and so never had consensus) required a <u> for *all* non-ASCII characters and so exceeded the requirement of 3.4 above.  This change now allows non-ASCII to be used without a <u> being enforced automatically, but it does not mean that 3.4 will be ignored for RFCs.  <u> will still be required for RFCs to follow the principle of "required for correct protocol operation", and it will be for the RPC, authors and stream owners to work that out.
> RFC 7997 clearly says that Unicode CANNOT be used except in a finite list of cases:
> 1. Purely part of an example (3.1)
> 2. for English words imported from foreign languages, with the strict constraints that they are defined in the Merriam-Webster dictionary (3.1).
> 3. person or Organization name (3.2, 3.3)
> 4. when the Unicode character is described, instead of being used (3.4)
> 5. in a table
> 6. in code
> 7. in a bibliographic item
> 8. in address information
> The modification above is clearly not restricted to these cases.
> I notice that the xml2rfc language already contains some elements that can be used to enforce these cases.  Where one is missing, a new element could be added:
> (1) An <artwork> element can contain Unicode
> (2) a new element (as <t> content) can mark words that can contain Unicode.  Xml2rfc can then extract them and check that they are valid English words
> (3) <contact> can contain Unicode
> (4) <u>, used to describe a Unicode character, can contain Unicode
> (5) a <tr> element can contain Unicode
> (6) a <sourcecode> element can contain Unicode
> (7) a <reference> can contain Unicode (not just the organization/address)
> (8) The <address> and <organization> elements can contain Unicode.
> Doing that and documenting it in the next revision of RFC 7991 seems the sensible thing to do.
> But unconditionally letting everyone add Unicode characters willy-nilly looks to me like a way to be able to say, at some point in the future, that we have no choice but to officially authorize Unicode everywhere because there are already too many legacy RFCs doing that (a well-known tactic for working around standards).

While a tool can be used to enforce policy, it is not the only way and in some cases it is not the best way.

For example, RFC 7322 (the RFC Style Guide) says in section 3.1 "The RFC publication language is English".  Nobody would suggest that xml2rfc check every word to determine whether it is English and raise an error if it finds one that isn’t, because we all know that this policy is best enforced by people at the appropriate stages.  That’s what’s happening here: compliance with 7997 now rests with the RPC editors.  So no, there will not be a set of legacy RFCs with non-ASCII used incorrectly that can then be used to reverse engineer a policy change.
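
As a rough illustration of what tool-side checking against the element list quoted above could involve, here is a minimal Python sketch. It is not how xml2rfc works: the ALLOWED set is an assumption copied from the quoted proposal, and the function names describe() and find_unmarked_non_ascii() are hypothetical. It only reports non-ASCII characters found in element text outside those elements, formatted with the code point and character name that Section 3.4 asks for; it does not inspect attribute values, and deciding whether a character is "required for correct protocol operation" would still need a person.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Assumption: element names taken from the proposal quoted above,
# not from xml2rfc's actual rules.
ALLOWED = {"artwork", "contact", "u", "tr", "sourcecode",
           "reference", "address", "organization"}


def describe(ch):
    """Format a character as code point plus Unicode name (RFC 7997, 3.4 style)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"


def find_unmarked_non_ascii(xml_text):
    """Return (enclosing tag, description) for non-ASCII text outside ALLOWED elements."""
    findings = []

    def scan(text, tag):
        for ch in text or "":
            if ord(ch) > 127:
                findings.append((tag, describe(ch)))

    def walk(elem, inside_allowed):
        allowed_here = inside_allowed or elem.tag in ALLOWED
        if not allowed_here:
            scan(elem.text, elem.tag)
        for child in elem:
            walk(child, allowed_here)
            # Text following a child (its tail) belongs to the parent element.
            if not allowed_here:
                scan(child.tail, elem.tag)

    walk(ET.fromstring(xml_text), False)
    return findings


sample = "<rfc><t>caf\u00e9 <u>\u0394</u> ok</t><artwork>\u2206</artwork></rfc>"
for tag, description in find_unmarked_non_ascii(sample):
    print(f"<{tag}>: {description}")
```

On the sample, only the accented letter directly inside <t> is reported; the characters inside <u> and <artwork> are skipped because those elements are in the assumed ALLOWED set.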


Jay Daley
IETF Executive Director