Re: [xml2rfc] [Tools-discuss] [Rfc-markdown] New xml2rfc release: v3.16.0

John C Klensin <> Thu, 19 January 2023 22:49 UTC

Date: Thu, 19 Jan 2023 17:49:24 -0500
From: John C Klensin <>
To: Carsten Bormann <>, Marc Petit-Huguenin <>
cc:, tools-discuss <>
Message-ID: <0208735A97AFC7D999F13FD7@PSB>

--On Thursday, January 19, 2023 22:01 +0100 Carsten Bormann
<> wrote:

> On 2023-01-19, at 21:46, Marc Petit-Huguenin
> <> wrote:
>> Because what I find to be problematic is allowing Unicode
>> everywhere.
> We already allow Unicode everywhere (all that ASCII is
> Unicode, too).
> It's only that some of the Unicode characters are shunned in
> certain contexts (and these aren't even exactly the
> non-ASCII characters).
> This discussion would be easier to take serious if we could at
> least get the terminology right.


I'm feeling forced into the uncomfortable role of picker of
nits.  However, strictly speaking, "all that ASCII is Unicode,
too" is not true either.  For starters, ASCII is a
seven-bit encoding, not an eight-bit or longer one.  It is
certainly the case that the Unicode characters from U+0021
through U+007E correspond to the "printable" ASCII characters
designated in the ASCII standard as 2/1 through 7/14 and that,
with, IIRC, slight differences in definitions, the so-called C0
controls plus SP (U+0020) and DEL (U+007F) correspond as
well.  Expressed in hex UTF-8, the ASCII characters have the
same numeric values as the corresponding Unicode characters if
the ASCII ones are represented as right-justified in octets (not
the only plausible or historically important representation).
But that does not make them the same, especially if UTF-8 is a
preference and not a requirement.  And, of course, in UTF-16 or
UTF-32 (both perfectly good Unicode representations), the
characters that correspond to ASCII occupy 16 or 32 bits
respectively, with many leading zero bits.
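
For anyone who wants to see the difference rather than take my
word for it, here is a minimal Python sketch.  It is only an
illustration: the helper name encoded_hex is made up for this
note, and the big-endian variants are chosen purely so that no
byte-order mark is prepended to the output.

```python
# Illustration only: serialize one ASCII-range character under
# three Unicode encoding forms and show the resulting octets in hex.
# "encoded_hex" is a hypothetical helper name, not part of any tool.
def encoded_hex(ch):
    return {
        "utf-8": ch.encode("utf-8").hex(),
        # Big-endian forms are used so that no BOM is emitted.
        "utf-16-be": ch.encode("utf-16-be").hex(),
        "utf-32-be": ch.encode("utf-32-be").hex(),
    }

if __name__ == "__main__":
    # First and last "printable" characters (2/1 and 7/14), plus "A".
    for ch in ("!", "A", "~"):
        print(f"U+{ord(ch):04X}", encoded_hex(ch))
```

For "A" (U+0041) this gives 41 in UTF-8, 0041 in UTF-16 and
00000041 in UTF-32: the same numeric value, but a different
number of octets on the wire in each case.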

I now return you to the usual hair-splitting.