Re: [xml2rfc] [irsg] character sets, was UPDATE regarding <u>

John Levine <> Sat, 04 March 2023 19:03 UTC

Date: Sat, 04 Mar 2023 14:03:15 -0500
Message-Id: <20230304190316.05346A51F3D2@ary.qy>
From: John Levine <>
In-Reply-To: <>
Organization: Taughannock Networks

It appears that Fred Baker said:
>My name includes an umlaut, as yours does. “Juergens”. I might expect that this is reasonably common.

The Noto fonts include every alphabetic character used in every living
language that uses Latin, Greek, or Cyrillic alphabets.  That's not
the problem.

One issue is our policy about where any non-ASCII goes, but a separate
issue that Carsten has run into is exotic characters beyond the 3000
or so that are in the fonts we normally use.


>Sent using a machine that autocorrects in interesting ways...
>> On Mar 4, 2023, at 8:38 AM, Carsten Bormann wrote:
>> On 2023-03-04, at 16:46, John R Levine wrote:
>>> In any event, this reminds us that we need some discipline in what we allow beyond letters and punctuation.  Unicode does not make this any easier by providing so many different glyphs that look nearly or exactly the same.
>> Correct, except that the “allow” is a bit misplaced.  “Recommend”, “nudge authors towards”, “consider good style” etc. would have worked better for me.
>> Anyway, that’s why there is now authoring support in kramdown-rfc for character repertoire diagnostics, initially with the tool “echars” (which doesn’t require actually using markdown).
>> For those actually using markdown, eventually, I expect the yaml header to the markdown input to be able to carry a declaration of what non 10,32-126,160,8203,8209,8288 characters are actually desired in the input, so warnings can be emitted if the document isn’t staying inside those bounds.
>> Both of these would be helped by access to information about the current repertoire limitations of xml2rfc, which is why I initiated this subthread.
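
For illustration only, the kind of repertoire check described in the quoted text can be sketched in a few lines of Python. This is not the echars tool and not anything xml2rfc or kramdown-rfc actually does. The baseline set is simply the code points listed above (10, 32-126, 160, 8203, 8209, 8288), and the function name `out_of_repertoire` and its `declared` parameter are hypothetical stand-ins for whatever declaration the yaml header might eventually carry.

```python
import unicodedata

# Baseline code points as listed in the quoted message: LF, printable ASCII,
# NO-BREAK SPACE, ZERO WIDTH SPACE, NON-BREAKING HYPHEN, WORD JOINER.
BASELINE = {10, 160, 8203, 8209, 8288} | set(range(32, 127))


def out_of_repertoire(text, declared=""):
    """Return (line, column, char, name) for each character that is neither
    in the baseline set nor in the author's declared extras.

    `declared` is a hypothetical string of extra characters the author wants.
    """
    allowed = BASELINE | {ord(c) for c in declared}
    findings = []
    # Splitting on LF means the line terminator itself is never reported.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) not in allowed:
                name = unicodedata.name(ch, "UNKNOWN")
                findings.append((line_no, col, ch, name))
    return findings


if __name__ == "__main__":
    sample = "Author: J\u00fcrgens\nplain ASCII line"
    for line_no, col, ch, name in out_of_repertoire(sample):
        print(f"warning: line {line_no}, col {col}: U+{ord(ch):04X} {name}")
```

Passing the umlaut in `declared` silences the warning, which is roughly the "stay inside declared bounds" behavior described above. A real tool would also need to know which characters the fonts in use can render, which is the separate question raised in this subthread.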