Re: [Alldispatch] Proposed discussion topic for ALLDISPATCH: draft-bray-unichars

Peter Saint-Andre <stpeter@stpeter.im> Tue, 12 March 2024 21:48 UTC

Message-ID: <c28adb0d-96ec-4d8a-8c8e-52e795fea009@stpeter.im>
Date: Tue, 12 Mar 2024 15:48:20 -0600
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
To: alldispatch@ietf.org
Content-Language: en-US
From: Peter Saint-Andre <stpeter@stpeter.im>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/alldispatch/jZVj1LVRBy-5K1iZCSeRNPMdDJk>
Subject: Re: [Alldispatch] Proposed discussion topic for ALLDISPATCH: draft-bray-unichars

I'm late to this party, so I can't reply in the existing thread. Sorry!

In a previous message [1] in the existing thread, Tim Bray made what to 
me is a surprising claim:

###

In the context of this discussion, perhaps the notable thing about the
Unichars draft is that it’s the first (AFAIK) attempt to seriously 
consider all the flavors of code points, with plenty of community input, 
and identify the ones that nobody should really ever use, and to provide 
ABNF to support the result of that research.

###

ABNF aside, I would posit that we have a long history of considering all 
the flavors of Unicode code points and identifying which ones should and 
should not be used for various purposes in IETF protocols. That was the 
whole point of the IETF's work on IDNA2003, IDNA2008, EAI, PRECIS, etc.

More specifically, in RFC 8264 (the PRECIS framework) we defined two 
string classes for general use in IETF protocols based on years of 
experience by various "customers" of Stringprep (see Appendix B of RFC 
6885 for a list of the existing protocols whose needs were considered in 
developing PRECIS):

1. IdentifierClass specifies a smaller subset of Unicode characters for 
use when safety and identifier stability are more important than 
expressiveness.

2. FreeformClass specifies a larger subset of Unicode characters for use 
when expressiveness is more important than safety and stability.

We also provided mechanisms for profiling those string classes into more 
restrictive subsets (to meet the needs of specific protocols), and for 
defining new string classes if the PRECIS classes don't meet people's needs.

In defining these classes the PRECIS framework (following on IDNA2008) 
considered over a dozen different categories of Unicode code points, 
based on the Unicode properties of those code points, in much greater 
detail than draft-bray-unichars does (see RFC 8264 for details).
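
As a rough illustration of what classifying code points by their Unicode properties looks like in practice, here is a minimal Python sketch. It is not the derived-property algorithm from RFC 8264: the function name and the bucket labels are hypothetical, and it looks only at the General_Category property, whereas PRECIS also weighs other properties and exceptions.

```python
import unicodedata

# General categories treated here as "letters and digits".
# The bucket names below are made up for this sketch.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}


def coarse_bucket(ch: str) -> str:
    """Place a single code point in a coarse, illustrative bucket."""
    cat = unicodedata.category(ch)
    if cat in LETTER_DIGIT_CATEGORIES:
        return "letter-digit"
    if cat == "Cc":
        return "control"
    if cat == "Cs":
        return "surrogate"
    if cat == "Co":
        return "private-use"
    if cat == "Cn":
        return "unassigned-or-noncharacter"
    if cat.startswith("Z"):
        return "space"
    if cat.startswith("S"):
        return "symbol"
    if cat.startswith("P"):
        return "punctuation"
    return "other"


if __name__ == "__main__":
    for ch in ["a", "5", "$", "!", " ", "\u0000", "\ue000", "\uffff"]:
        print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {coarse_bucket(ch)}")
```

Even this coarse view already separates controls, surrogates, private-use code points, and noncharacters from ordinary letters and digits; a real string class has to go further and decide, category by category, what is valid, contextually allowed, or disallowed.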

Also surprising to me is that draft-bray-unichars does not discuss a 
number of thorny issues related to the use of Unicode code points in 
text fields, including but not limited to visually similar characters, 
directionality, string comparison, normalization, width mapping, and 
case mapping.
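
To make a few of these issues concrete, the following Python sketch shows why comparing text fields code point by code point is not enough. The helper name `naive_compare_key` is hypothetical, and the sketch is not a PRECIS profile: it uses NFKC as a blunt stand-in for width mapping (NFKC folds many other compatibility characters too) and `str.casefold()` as a stand-in for case mapping.

```python
import unicodedata


def naive_compare_key(s: str) -> str:
    """Build an illustrative comparison key; NOT a PRECIS profile."""
    # NFKC folds fullwidth/halfwidth variants, among other things.
    folded = unicodedata.normalize("NFKC", s)
    # casefold() applies Unicode case folding.
    folded = folded.casefold()
    # Case folding can denormalize a string, so normalize once more.
    return unicodedata.normalize("NFC", folded)


if __name__ == "__main__":
    composed = "\u00e9"       # e with acute, precomposed
    decomposed = "e\u0301"    # e followed by combining acute accent
    fullwidth_a = "\uff21"    # FULLWIDTH LATIN CAPITAL LETTER A
    cyrillic_a = "\u0430"     # CYRILLIC SMALL LETTER A

    # Normalization: same rendered text, different code point sequences.
    print(composed == decomposed)
    print(naive_compare_key(composed) == naive_compare_key(decomposed))

    # Width and case mapping: fullwidth capital A versus ASCII "a".
    print(naive_compare_key(fullwidth_a) == naive_compare_key("a"))

    # Visually similar characters: no normalization form unifies these.
    print(naive_compare_key(cyrillic_a) == naive_compare_key("a"))
```

The last comparison is the important one for this discussion: normalization and case folding do nothing about confusable characters from different scripts, which is why that problem has to be addressed separately.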

At the very least, it seems to me that draft-bray-unichars needs to make 
a strong argument for why we should consider the PRECIS string classes 
to be wrong, harmful, unusable, or obsolete.

Peter

[1] https://mailarchive.ietf.org/arch/msg/alldispatch/-OTmPk46W4wkIVvjCkB2IMfYElc/