Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-04.txt

"Manger, James" <James.H.Manger@team.telstra.com> Mon, 18 September 2023 14:05 UTC

From: "Manger, James" <James.H.Manger@team.telstra.com>
To: Tim Bray <tbray@textuality.com>, ART Area <art@ietf.org>, "i18ndir@ietf.org" <i18ndir@ietf.org>
Date: Mon, 18 Sep 2023 14:05:05 +0000
Message-ID: <SY4PR01MB5980D8DDE229D1C57AEDFB55E5FBA@SY4PR01MB5980.ausprd01.prod.outlook.com>
References: <169479938668.18742.9199862891950651366@ietfa.amsl.com> <CAHBU6ivzUV947N+n7AoYkCFT3ZfaLobCQ4fBXw3dvkqTT=LBAw@mail.gmail.com>
In-Reply-To: <CAHBU6ivzUV947N+n7AoYkCFT3ZfaLobCQ4fBXw3dvkqTT=LBAw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/JdcbTxlcPlUz57gK04QSh-W3B9w>
Subject: Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-04.txt

I don’t think draft-bray-unichars-04<https://www.ietf.org/archive/id/draft-bray-unichars-04.html> should advance in the IETF.

The draft’s unicode-scalar-values, xml-chars, and useful-assignables are three helpful “subsets of Unicode characters” that can be used in protocols and data formats.
Defining unicode-code-points as though it were a similar subset is a category error, however.

Sections 3.1 and 3.2 deliberately make %x0-10FFFF and %x0-D7FF / %xE000-10FFFF look like similar repertoires, which is misleading.
UTF-8, UTF-16 and UTF-32 are only defined for the latter. There may be “obvious” extensions of UTF-8 (WTF-8) and UTF-32 that can cover %x0-10FFFF, but they are simply not widely supported in modern software, so those extensions are no use in an IETF standard. And the “obvious” extension to UTF-16 gives something that no longer works as you expect from a repertoire.
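
As a rough illustration (Python 3 is my choice of language here, not something from the draft; the helper names are made up for this sketch), a strict UTF-8 codec refuses a lone surrogate in both directions. Only the opt-in "surrogatepass" error handler produces the generalized, WTF-8-style bytes:

```python
# Sketch: strict UTF-8 covers only %x0-D7FF / %xE000-10FFFF.
lone = "\ud834"  # a lone high surrogate; a Python str can hold one


def strict_utf8_encodes(s: str) -> bool:
    """Return True if s can be encoded by the strict UTF-8 codec."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def strict_utf8_decodes(b: bytes) -> bool:
    """Return True if b can be decoded by the strict UTF-8 codec."""
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The non-standard extension has to be requested explicitly.
generalized = lone.encode("utf-8", "surrogatepass")
```

Here strict_utf8_encodes(lone) is False, and the bytes in generalized are in turn rejected by strict_utf8_decodes.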

An implementation that accepts surrogates cannot distinguish a {high-surrogate, low-surrogate} pair from a non-BMP character. ECMA-404 is clear on this when it says, “whether a processor of JSON texts interprets such a surrogate pair (“\uD834\uDD1E”) as a single code point (U+1D11E) or as an explicit surrogate pair is a semantic decision that is determined by the specific processor”. That is totally unexpected from seeing %x0-10FFFF as a seemingly simple repertoire.
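
For one concrete processor making that decision, here is a small sketch using CPython's standard json module (only an example of a single implementation; other JSON processors may choose differently):

```python
import json

# JSON text that spells U+1D11E as two \u escapes.
escaped_pair = '"\\ud834\\udd1e"'
decoded = json.loads(escaped_pair)

# The same character written literally in the JSON text.
literal = json.loads('"\U0001d11e"')

# This processor takes the "single code point" interpretation,
# so the two inputs become the same one-code-point string.
same_result = decoded == literal
code_point_count = len(decoded)
```

The escaped pair and the literal character decode to the same string, so nothing downstream can tell which one was sent.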

It makes sense for a spec to define:
  unicode-scalar-value = %x0-D7FF / %xE000-10FFFF
  string = *unicode-scalar-value

It does not make sense for a spec to define:
  unicode-code-point = %x0-10FFFF
  string = *unicode-code-point
because, for instance, %xD834 %xDD1E and %x1D11E are distinct values under that ABNF grammar but will not be treated as distinct by implementations. In implementations they will be indistinguishable strings. An internal 16-bit format will store the same two 16-bit words for both, so only one form can come back out.
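
A minimal sketch of that collapse, again in Python 3 and assuming UTF-16 (little-endian) stands in for the "internal 16-bit format"; the variable names are mine:

```python
pair = "\ud834\udd1e"    # two code points: %xD834 %xDD1E
single = "\U0001d11e"    # one code point:  %x1D11E

# As sequences of code points the two strings are different values.
distinct_as_code_points = pair != single

# Stored as 16-bit code units they are byte-for-byte identical.
# "surrogatepass" is needed only to let the surrogates through.
pair_units = pair.encode("utf-16-le", "surrogatepass")
single_units = single.encode("utf-16-le")

# Reading the units back yields the single-code-point form.
round_tripped = pair_units.decode("utf-16-le", "surrogatepass")
```

After the round trip, the two-surrogate string has turned into the one-code-point string.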

For understandable reasons, JSON supports both *(%x0-D7FF / %xE000-10FFFF) and *(%x0-FFFF) (arbitrary 16-bit data) as models for the logical strings it can represent. An implementation can pick either. The two models do not coincide. There is probably a complicated ABNF that can cover both, involving *(%x0-D7FF / %xE000-10FFFF) and unpaired-surrogate, but it would be non-trivial (and not that practical). And that ABNF for the logical strings JSON can represent is different from the ABNF for JSON text itself, which excludes controls other than whitespace and has escape sequences. *(%x0-10FFFF), as implied by 3.1, doesn’t match any concept here.
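
To show how the two models diverge in practice, here is a hedged sketch with CPython's json module, which happens to accept an unpaired surrogate escape. is_scalar_value_string is a hypothetical helper written for this example, not part of any library or of the draft:

```python
import json


def is_scalar_value_string(s: str) -> bool:
    """True if every code point is in %x0-D7FF / %xE000-10FFFF."""
    return not any(0xD800 <= ord(c) <= 0xDFFF for c in s)


# An unpaired surrogate escape: valid only in the 16-bit-data model.
lone = json.loads('"\\ud834"')

# A proper pair: representable in both models.
paired = json.loads('"\\ud834\\udd1e"')

# Serializing the lone surrogate gives the escape back.
reserialized = json.dumps(lone)
```

The lone value fails is_scalar_value_string while the paired value passes, so a receiver built on the scalar-value model has no representation for the first string at all.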

The Unicode spec (chapter 2 General Structure<https://www.unicode.org/versions/Unicode15.0.0/ch02.pdf>) covers abstract characters, code points, codespace, code unit, encoding form, encoding scheme etc. draft-bray-unichars-04 is shorter and simpler, but the simplifications aren’t quite precise enough to add clarity.

P.S. The ABNF productions in draft-bray-unichars-04 should be singular, not plural, e.g. unicode-scalar-value.

--
James Manger


