Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-06.txt

"Manger, James" <James.H.Manger@team.telstra.com> Sat, 30 September 2023 09:15 UTC

From: "Manger, James" <James.H.Manger@team.telstra.com>
To: Tim Bray <tbray@textuality.com>, "i18ndir@ietf.org" <i18ndir@ietf.org>, ART Area <art@ietf.org>
Thread-Topic: [art] Fwd: New Version Notification for draft-bray-unichars-06.txt
Date: Sat, 30 Sep 2023 09:15:01 +0000
Message-ID: <SYBPR01MB59814B3448F5754AAEDA1740E5C7A@SYBPR01MB5981.ausprd01.prod.outlook.com>
References: <169566019635.41806.9804796677919971070@ietfa.amsl.com> <CAHBU6is-wU2NLXNWL56nSJ4=nKvDzGv_Aw4qJN6N2O8CuM4-yw@mail.gmail.com>
In-Reply-To: <CAHBU6is-wU2NLXNWL56nSJ4=nKvDzGv_Aw4qJN6N2O8CuM4-yw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/mcR9UZjI5lS91nGQqDIJ4otc5V4>
List-Id: Internationalization Directorate <i18ndir.ietf.org>

Comments on draft-bray-unichars-06 <https://datatracker.ietf.org/doc/html/draft-bray-unichars>.

§2 “Characters and Code Points” should start with scalars; they are far more important than code points. Suggested replacement text:

[UNICODE<http://www.unicode.org/versions/latest/>] defines the 1,112,064 integers in the ranges 0 to D7FF₁₆ and E000₁₆ to 10FFFF₁₆ as "Unicode scalars". Every character is assigned to one scalar. As of Unicode 15.1 (2023), 149,813 characters have been assigned; the remaining 962,251 scalars are private-use code points, noncharacters, or reserved for assignment in future versions.

unicode-scalar = %x0-D7FF / %xE000-10FFFF
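As a rough illustration (not part of the draft), the unicode-scalar rule translates directly into a range check, and summing the inclusive range sizes gives the total number of scalars. The names `SCALAR_RANGES`, `is_unicode_scalar` and `SCALAR_COUNT` are hypothetical.

```python
# Hypothetical helper mirroring the ABNF rule:
#   unicode-scalar = %x0-D7FF / %xE000-10FFFF
SCALAR_RANGES = [(0x0, 0xD7FF), (0xE000, 0x10FFFF)]

def is_unicode_scalar(cp: int) -> bool:
    """True if cp is a Unicode scalar: any code point except a surrogate."""
    return any(lo <= cp <= hi for lo, hi in SCALAR_RANGES)

# Total number of scalars: sum of the inclusive range sizes.
SCALAR_COUNT = sum(hi - lo + 1 for lo, hi in SCALAR_RANGES)

if __name__ == "__main__":
    print(SCALAR_COUNT)                 # 1112064
    print(is_unicode_scalar(0xD800))    # False: a surrogate
    print(is_unicode_scalar(0x10FFFF))  # True: the last scalar
```

The count works out to the 1,114,112 code points (17 planes of 65,536) minus the 2,048 surrogates.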

Scalars are the complete set of values that can be uniquely represented in all three Unicode encoding forms (UTF-8, UTF-16, and UTF-32), which use 8-bit, 16-bit, and 32-bit code units respectively. So scalars are the repertoire that works for representing characters in memory, in storage, and in network protocols, regardless of whether 8-, 16-, or 32-bit code units are used.
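A small sketch of that claim using Python's built-in (strict by default) codecs: every scalar survives an encode/decode round trip in all three forms, while a lone surrogate cannot be encoded at all. The big-endian codec names are chosen only to avoid BOMs; `round_trips` is a hypothetical helper.

```python
# Sketch: scalars round-trip through UTF-8, UTF-16 and UTF-32;
# a lone surrogate code point cannot be encoded in any of them.
ENCODINGS = ["utf-8", "utf-16-be", "utf-32-be"]

def round_trips(cp: int) -> bool:
    """True if chr(cp) encodes and decodes losslessly in all three forms."""
    s = chr(cp)
    try:
        return all(s.encode(enc).decode(enc) == s for enc in ENCODINGS)
    except UnicodeEncodeError:
        # Python's strict codecs refuse to encode surrogate code points.
        return False

if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0xFFFD, 0x1F600, 0x10FFFF, 0xD800):
        print(f"U+{cp:04X}", round_trips(cp))
```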

I’d rename the section to be “Characters and scalars”.

Drop §2.1 “Transformation formats”.

§2.2 “Problematic Code Point Types” could be renamed “Problematic characters”.

The BOM (U+FEFF) should be considered a problematic character (and excluded from unicode-assignables) because it is used as an encoding-layer signal.
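To illustrate the encoding-layer role of U+FEFF, here is a minimal Python sketch using only the standard `utf-8`, `utf-8-sig` and `utf-16` codecs: a signature-aware decoder consumes the BOM, while a plain decoder leaks it into the text as a character.

```python
# Sketch: U+FEFF at the start of a byte stream acts as an encoding signature.
data = "\ufeffabc".encode("utf-8")
print(data[:3])                        # b'\xef\xbb\xbf'
print(repr(data.decode("utf-8")))      # '\ufeffabc' (BOM survives as a character)
print(repr(data.decode("utf-8-sig")))  # 'abc' (BOM consumed as a signature)

# With the generic UTF-16 codec, the BOM selects the byte order and is removed.
le = b"\xff\xfe" + "abc".encode("utf-16-le")
print(repr(le.decode("utf-16")))       # 'abc'
```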

Surrogates are handled at the encoding layer, not the character layer, so drop §2.2.1 “Surrogates”. I suggest a new §2.3 “Ill-formed encodings” to replace §2.2.1 and most of §3.
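To show why surrogates belong to the encoding layer, this sketch encodes a supplementary scalar into a UTF-16 surrogate pair and back. The function names are hypothetical; the arithmetic is the standard UTF-16 algorithm.

```python
from typing import List

def to_utf16_units(cp: int) -> List[int]:
    """Encode one Unicode scalar as its UTF-16 code units."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar")
    if cp <= 0xFFFF:
        return [cp]                      # BMP scalar: one code unit
    v = cp - 0x10000                     # 20-bit offset into planes 1-16
    return [0xD800 + (v >> 10),          # high surrogate carries the top 10 bits
            0xDC00 + (v & 0x3FF)]        # low surrogate carries the bottom 10 bits

def from_utf16_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the scalar it encodes."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

if __name__ == "__main__":
    units = to_utf16_units(0x1F600)
    print([f"{u:04X}" for u in units])         # ['D83D', 'DE00']
    print(f"U+{from_utf16_pair(*units):04X}")  # U+1F600
```

The surrogate values never name characters of their own; they only appear as halves of such a pair.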

2.3. Ill-formed encodings

A sequence of 8-bit, 16-bit or 32-bit code units representing scalars is a well-formed UTF-8, UTF-16, or UTF-32 encoding respectively. However, each of these three encodings also has code unit sequences that don’t map to scalars (e.g. C0₁₆ 80₁₆ in UTF-8; an unpaired D800₁₆ in UTF-16; 20FFFF₁₆ in UTF-32). Such sequences are called ill-formed. They do occur in practice. Reasonable options when interpreting such code unit sequences are signalling an error or treating them as "�" (U+FFFD, REPLACEMENT CHARACTER). Silently ignoring ill-formed code unit sequences is a known security risk.
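A sketch of these options using Python's codec error handlers: `"strict"` signals an error, `"replace"` substitutes U+FFFD, and `"ignore"` silently drops bytes. The byte strings are the examples above, serialized big-endian for UTF-16 and UTF-32 (an assumption made only for illustration).

```python
# Sketch: handling ill-formed code unit sequences with Python codecs.
ill_formed = {
    "utf-8": b"\xc0\x80",              # C0 80: overlong form, never valid UTF-8
    "utf-16-be": b"\xd8\x00",          # unpaired high surrogate D800
    "utf-32-be": b"\x00\x20\xff\xff",  # 20FFFF is beyond U+10FFFF
}

for enc, raw in ill_formed.items():
    try:
        raw.decode(enc)                # strict: signal an error
    except UnicodeDecodeError as e:
        print(enc, "error:", e.reason)
    print(enc, "replaced:", repr(raw.decode(enc, errors="replace")))

# Silently dropping bad bytes can splice a filtered string back together.
sneaky = b"<scr\xc0ipt>"
print(sneaky.decode("utf-8", errors="ignore"))  # <script>
```

The last line is why "ignore" is the risky choice: a filter that inspected the raw bytes for `<script>` would not have matched.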

Drop §3 “Dealing With Problematic Code Points”.

Typo: \U0089 should be \u0089.

Typo: RFC19413 should be RFC9413.

I’d define unicode-scalar in §2 so we don’t need §4.1 “Unicode Scalars”. §4 can say:

Specifications can refer to these by the names “Unicode scalars” (section 2), “XML Characters”, and “Unicode Assignables”.

I-JSON can’t be used as an example of unicode-scalar because it explicitly excludes noncharacters. Also, the draft does not explain the difference between the repertoire of the JSON text itself and the repertoire of the logical string that a JSON string can represent (using escapes):

i-json-value-repertoire = %x9 / %xA / %xD / %x20-D7FF / %xE000-FDCF / %xFDF0-FFFD / %x10000-1FFFD / %x20000-2FFFD / … / %x100000-10FFFD

i-json-logical-string-repertoire = %x0-D7FF / %xE000-FDCF / %xFDF0-FFFD / %x10000-1FFFD / %x20000-2FFFD / … / %x100000-10FFFD
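A hedged sketch of the two repertoires as predicates. It is based on RFC 7493's rule that I-JSON strings exclude surrogates and all noncharacters (U+FDD0 to U+FDEF plus the last two code points of each plane), not on the elided parts of the ABNF; all function names are hypothetical.

```python
# Sketch of the two I-JSON repertoires, assuming RFC 7493's exclusion
# of surrogates and all 66 Unicode noncharacters.
def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus xxFFFE and xxFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def is_scalar(cp: int) -> bool:
    return 0 <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF

def in_logical_string_repertoire(cp: int) -> bool:
    """Characters an I-JSON string may denote, literally or via escapes."""
    return is_scalar(cp) and not is_noncharacter(cp)

def in_value_repertoire(cp: int) -> bool:
    """Characters that may appear literally in I-JSON text.
    Controls below U+0020 are allowed only as JSON whitespace (tab, LF, CR)."""
    return in_logical_string_repertoire(cp) and (cp >= 0x20 or cp in (0x9, 0xA, 0xD))
```

For example, U+0000 is in the logical-string repertoire (written `\u0000`) but not in the value repertoire.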
--
James Manger





From: art <art-bounces@ietf.org> on behalf of Tim Bray <tbray@textuality.com>
Date: Tuesday, 26 September 2023 at 2:51 am
To: i18ndir@ietf.org <i18ndir@ietf.org>, ART Area <art@ietf.org>
Subject: [art] Fwd: New Version Notification for draft-bray-unichars-06.txt
What’s new and different here:


  1.  Locked down definition of “problematic”
  2.  Locked down definition of “character repertoire”
  3.  Changed “Useful Assignables” to “Unicode Assignables” (checked with Asmus first)

A new version of Internet-Draft draft-bray-unichars-06.txt has been
successfully submitted by Paul Hoffman and posted to the
IETF repository.

Name:     draft-bray-unichars
Revision: 06
Title:    Unicode Character Repertoire Subsets
Date:     2023-09-25
Group:    Individual Submission
Pages:    10
URL:      https://www.ietf.org/archive/id/draft-bray-unichars-06.txt
Status:   https://datatracker.ietf.org/doc/draft-bray-unichars/
HTML:     https://www.ietf.org/archive/id/draft-bray-unichars-06.html
HTMLized: https://datatracker.ietf.org/doc/html/draft-bray-unichars
Diff:     https://author-tools.ietf.org/iddiff?url2=draft-bray-unichars-06

Abstract:

  This document discusses specifying subsets of the Unicode character
  repertoire for use in protocols and data formats.



The IETF Secretariat