Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-06.txt

"Manger, James" <James.H.Manger@team.telstra.com> Sun, 01 October 2023 01:53 UTC

From: "Manger, James" <James.H.Manger@team.telstra.com>
To: Tim Bray <tbray@textuality.com>
CC: "i18ndir@ietf.org" <i18ndir@ietf.org>, ART Area <art@ietf.org>
Date: Sun, 01 Oct 2023 01:53:28 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/LHF_dyZW7Wbv1gI4wSc7V7TrDfs>

Why is U+D800 so much more crucial than U+20FFFF or an ill-formed UTF-8 byte sequence such as 0xC0 0x80?

There are lots of byte sequences that are ill-formed UTF-8. Some correspond to non-minimal-length encodings (eg C0 80 ~ U+0000); some to encoding surrogates (eg ED A0 80); some to encoding values greater than U+10FFFF (eg F7 BF BF BF ~ U+1FFFFF); some that just don’t fit UTF-8’s structure (eg FF). A robust system needs to cope with all of these (signalling an error or using a marker such as U+FFFD). draft-bray-unichars only covers surrogates. That’s poor. I feel it adds to the confusion about what is lurking in Unicode by conflating surrogates with abstract characters, whereas it is much clearer to group surrogates with other ill-formed code unit sequences.
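To make the four categories concrete, here is a minimal sketch in Python (not from the draft; the function names are only illustrative). It relies on the standard `utf-8` codec, whose strict mode rejects each of the example sequences and whose `replace` mode substitutes U+FFFD.

```python
from typing import Optional

# Each sample is ill-formed UTF-8 for a different reason.
samples = {
    "non-minimal-length encoding of U+0000": bytes([0xC0, 0x80]),
    "encoded surrogate U+D800": bytes([0xED, 0xA0, 0x80]),
    "value greater than U+10FFFF": bytes([0xF7, 0xBF, 0xBF, 0xBF]),
    "byte that does not fit UTF-8's structure": bytes([0xFF]),
}

def decode_strict(data: bytes) -> Optional[str]:
    """Return the decoded text, or None if data is not well-formed UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

def decode_with_marker(data: bytes) -> str:
    """Decode, substituting U+FFFD for anything ill-formed."""
    return data.decode("utf-8", errors="replace")

for reason, data in samples.items():
    # Signalling an error: every sample is rejected.
    assert decode_strict(data) is None
    # Using a marker: only U+FFFD comes out.
    print(f"{reason}: {data.hex(' ')} -> {decode_with_marker(data)!r}")
```

The point is that the surrogate case goes through exactly the same code path as the other three.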

> the explanation of the problem works better if you start from code points

I disagree. Problems include:

  1.  Arbitrary code unit sequences can be ill-formed
  2.  BOM
  3.  Noncharacters
  4.  Legacy controls
  5.  Tag chars and other deprecated format chars
  6.  NFC vs NFKC; canonical chars
  7.  Private use chars
  8.  BIDI
  9.  …
  10. Escaping

#1 is about encodings. #2 is about encodings and is a char. #3-7 are about char subsets. #10 is about a higher level.
Code points don’t help explain any of these.

> motivate why scalars exist

Explaining the 1,112,064 size and the U+D800-U+DFFF gap would be interesting. I’d love to read about why that gap was chosen and not, say, U+F800-U+FFFF. My guess is that that was an unused (or little used) part of 16-bit charsets at the time. Scalars could have been defined as 0-1112063, but I guess that would mean more math when converting 16-bit code units to & from scalars. Stick it in an appendix, however, as it doesn’t help the doc’s primary purpose.
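For reference, a small Python sketch of the arithmetic involved (the helper names are mine, purely illustrative). It derives the size of the scalar range from the range bounds and shows the standard UTF-16 conversion between a supplementary scalar and a surrogate pair.

```python
from typing import Tuple

SURROGATE_FIRST, SURROGATE_LAST = 0xD800, 0xDFFF
MAX_CODE_POINT = 0x10FFFF

def scalar_count() -> int:
    """All code points 0..0x10FFFF minus the 2,048 surrogate code points."""
    return (MAX_CODE_POINT + 1) - (SURROGATE_LAST - SURROGATE_FIRST + 1)

def to_surrogate_pair(scalar: int) -> Tuple[int, int]:
    """Split a scalar above U+FFFF into (high, low) 16-bit code units."""
    if not 0x10000 <= scalar <= MAX_CODE_POINT:
        raise ValueError("only scalars above U+FFFF need a surrogate pair")
    offset = scalar - 0x10000               # 20-bit value
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into one scalar."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a well-formed surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print(f"{scalar_count():,}")
print([hex(unit) for unit in to_surrogate_pair(0x1F600)])
```

With the gap placed at U+D800-U+DFFF, every scalar below it keeps its own value as its 16-bit code unit, and the conversion above only applies to scalars beyond U+FFFF.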

> I don’t think the discussion about scalars and UTF-8/16/32 is appropriate. UTF-16 and UTF-32 shouldn’t be used in protocols, the Internet has converged on UTF-8

Argh!
Sure, only use UTF-8 in protocols. But there is also memory & APIs where 16-bit code units (hence UTF-16) are still important. A set to work across UTF-8/16/32 is the motivation for scalars. And this doc only started because JSON uses 16-bit code units in its escape mechanism.
If we are only interested in UTF-8 why are surrogates ever mentioned?
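A quick illustration of the JSON point, sketched with Python's standard `json` module (behaviour described is that of CPython's implementation, not something the draft specifies). JSON's `\uXXXX` escapes are 16-bit code units, so a character beyond U+FFFF needs two escapes, and nothing in the syntax stops a lone surrogate escape.

```python
import json

# Two 16-bit escapes forming a surrogate pair decode to one scalar, U+1F600.
paired = json.loads('"\\ud83d\\ude00"')
print(len(paired), hex(ord(paired)))

# A lone surrogate escape is accepted by this parser, yielding a string
# that holds a code point which is not a scalar.
lone = json.loads('"\\ud800"')
print(len(lone), hex(ord(lone)))

# The problem only surfaces later, when the string meets an encoder.
try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print("lone surrogate encodable as UTF-8:", encodable)

# Going the other way, the default serializer emits the pair of escapes.
print(json.dumps("\U0001F600"))
```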

--
James Manger





From: Tim Bray <tbray@textuality.com>
Date: Sunday, 1 October 2023 at 8:46 am
To: Manger, James <James.H.Manger@team.telstra.com>
Cc: i18ndir@ietf.org <i18ndir@ietf.org>, ART Area <art@ietf.org>
Subject: Re: [art] Fwd: New Version Notification for draft-bray-unichars-06.txt
[External Email] This email was sent from outside the organisation – be cautious, particularly with links and attachments.
This is interesting, thank you James.

James proposes a restructure that cleverly removes mentions of code points and surrogates, on the basis that scalars are more important than code points.

He also points out an important error; the I-JSON repertoire is not scalars, it’s scalars minus noncharacters. I had entirely forgotten this.  Should we introduce an I-JSON subset for people who might want to use control codes?  I personally think using control codes is a bad idea, do others disagree?

While I agree that scalars are more important than code points, I have a problem with some of this, which is not technical.  I think the most common scenario in which this document would be used (should it progress), would be someone who is working on an internet draft with text fields and gets told “you need to specify your character repertoire, go look at [Unichars]”.

So the document exists at least in part to educate people about what’s lurking in Unicode.   For that reason, I don’t want to have a spec in the style of “just do this, don’t worry about why, trust us”. I think the explanation of the problem works better if you start from code points and explicitly caution against surrogates. Also, it’s hard to motivate why scalars exist without introducing surrogates.

I don’t think the discussion about scalars and UTF-8/16/32 is appropriate. UTF-16 and UTF-32 shouldn’t be used in protocols, the Internet has converged on UTF-8, RFC2277 mandates it, and this document should assume that characters on the wire are UTF-8.

What do people think about the BOM?  Officially, it’s ZERO WIDTH NO-BREAK SPACE.  It probably doesn't add value to a UTF-8 field but it also probably won’t break anything.  I can see people using it to pad out fixed-length fields?  If we remove it from “Unicode Assignables” then we probably have to include an explanation about the BOM functionality.



On Sep 30, 2023 at 2:15:01 AM, "Manger, James" <James.H.Manger@team.telstra.com> wrote:
Comments on draft-bray-unichars-06<https://datatracker.ietf.org/doc/html/draft-bray-unichars>.

§2 “Characters and Code Points” should start with scalars; they are far more important than code points. Suggested replacement text.

[UNICODE<http://www.unicode.org/versions/latest/>] defines the 1,112,064 integers in the ranges 0 to 0xD7FF and 0xE000 to 0x10FFFF as "Unicode scalars". Every character is assigned to one scalar. As of Unicode 15.1 (2023), 149,813 characters have been assigned, leaving 962,251 scalars available for assignment in future versions.

unicode-scalar = %x0-D7FF / %xE000-10FFFF

Scalars are the complete set of values that can be uniquely represented in all 3 Unicode encoding forms – UTF-8, UTF-16, and UTF-32 – which use 8-bit, 16-bit and 32-bit code units respectively. So scalars are the repertoire that works for representing characters in memory, storage, and in network protocols regardless of choices to use 8, 16 or 32-bit words.
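As a rough check of that claim, here is a Python sketch (illustrative names, not from the draft) that mirrors the unicode-scalar rule and tests whether a code point round-trips through all three encoding forms using the standard codecs.

```python
def is_unicode_scalar(cp: int) -> bool:
    """Mirror of: unicode-scalar = %x0-D7FF / %xE000-10FFFF"""
    return 0x0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF

def round_trips_in_all_forms(cp: int) -> bool:
    """True if cp can be encoded and decoded in UTF-8, UTF-16 and UTF-32."""
    if not 0 <= cp <= 0x10FFFF:
        return False                      # chr() would reject it anyway
    ch = chr(cp)
    try:
        for codec in ("utf-8", "utf-16-le", "utf-32-le"):
            if ch.encode(codec).decode(codec) != ch:
                return False
    except UnicodeError:
        return False                      # e.g. a surrogate code point
    return True

for cp in (0x41, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x1F600, 0x10FFFF):
    print(f"U+{cp:04X}", is_unicode_scalar(cp), round_trips_in_all_forms(cp))
```

For every value in the loop the two predicates agree, which is the property the paragraph describes.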

I’d rename the section to be “Characters and scalars”.

Drop §2.1 “Transformation formats”.

§2.2 “Problematic Code Point Types” could be renamed “Problematic characters”.

BOM should be considered a problematic character (and excluded from unicode-assignables) as it is used as an encoding-layer signal.
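A short Python sketch of the two readings of the same bytes, using the standard `utf-8` and `utf-8-sig` codecs: one decoder treats the leading U+FEFF as a character, the other as an encoding-layer signal and drops it.

```python
import codecs

data = codecs.BOM_UTF8 + "hi".encode("utf-8")   # EF BB BF 68 69

as_character = data.decode("utf-8")       # BOM survives as U+FEFF
as_signal = data.decode("utf-8-sig")      # BOM consumed by the decoder

print(repr(as_character))
print(repr(as_signal))
```

Two receivers that pick different decoders end up with strings of different length, which is one way the BOM can be problematic inside a text field.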

Surrogates are handled at the encoding layer, not the character layer, so drop §2.2.1 “Surrogates”. I suggest a new §2.3 “Ill-formed encodings” to replace §2.2.1 and most of §3.

2.3. Ill-formed encodings

A sequence of 8-bit, 16-bit or 32-bit code units representing scalars is a well-formed UTF-8, UTF-16, or UTF-32 encoding respectively. However, there are other code unit sequences in each of these 3 encodings that don’t map to scalars (eg 0xC0 0x80 in UTF-8; 0xD800 in UTF-16; 0x20FFFF in UTF-32). Such sequences are called ill-formed. They can exist in practice. Reasonable options when interpreting such code unit sequences are signalling an error or treating them as "�" (U+FFFD, REPLACEMENT CHARACTER). Silently ignoring ill-formed code unit sequences is a known security risk.
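The two reasonable options, and the risky third one, can be sketched in Python with the standard codecs' `strict`, `replace` and `ignore` error handlers. The `<script>` filter scenario is a made-up illustration, not something taken from the draft.

```python
# One ill-formed code unit sequence per encoding form.
ill_formed = [
    (bytes([0xC0, 0x80]), "utf-8"),
    ((0xD800).to_bytes(2, "little"), "utf-16-le"),
    ((0x20FFFF).to_bytes(4, "little"), "utf-32-le"),
]

for data, encoding in ill_formed:
    # Option 1: signal an error.
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        pass
    else:
        raise AssertionError("expected a decoding error")
    # Option 2: substitute U+FFFD.
    assert "\ufffd" in data.decode(encoding, errors="replace")

# Why silently ignoring is risky: a byte-level filter and a lenient
# decoder can disagree about what the input says.
raw = b"<scr\xffipt>"
seen_by_filter = b"<script>" in raw                 # filter finds nothing
ignored = raw.decode("utf-8", errors="ignore")      # ill-formed byte dropped
replaced = raw.decode("utf-8", errors="replace")    # ill-formed byte marked

print(seen_by_filter, repr(ignored), repr(replaced))
```

With `ignore`, the decoded text contains a tag the filter never saw; with `replace`, the marker keeps the two views consistent.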

Drop §3 “Dealing With Problematic Code Points”.

Typo: \U0089 should be \u0089.

Typo: RFC19413 should be RFC9413.

I’d define unicode-scalar in §2 so we don’t need §4.1 “Unicode Scalars”. §4 can say:

Specifications can refer to these by the names “Unicode scalars” (section 2), “XML Characters”, and “Unicode Assignables”.

I-JSON can’t be used as an example using unicode-scalar as it explicitly excludes noncharacters; and the difference between the repertoire for the JSON vs the repertoire for the logical string that can be represented by a JSON string is not explained.

i-json-value-repertoire = %x9 / %xA / %xD / %x20-D7FF / %xE000-FDCF / %xFDF0-FFFD / %x10000-1FFFD / %x20000-2FFFD / … / %x100000-10FFFD

i-json-logical-string-repertoire = %x0-D7FF / %xE000-FDCF / %xFDF0-FFFD / %x10000-1FFFD / %x20000-2FFFD / … / %x100000-10FFFD
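A sketch of these two repertoires as Python predicates (function names are hypothetical). It uses the Unicode definition of noncharacters, that is U+FDD0..U+FDEF plus the last two code points of every plane, and assumes that the only difference between the two repertoires is the treatment of C0 controls.

```python
def is_scalar(cp: int) -> bool:
    return 0x0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF

def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, or a code point ending in FFFE / FFFF in any plane."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def in_logical_string_repertoire(cp: int) -> bool:
    """Code points allowed in the string a JSON value represents."""
    return is_scalar(cp) and not is_noncharacter(cp)

def in_json_text_repertoire(cp: int) -> bool:
    """Code points allowed literally in the JSON text: as above, but the
    only C0 controls permitted are tab, LF and CR."""
    if cp < 0x20 and cp not in (0x9, 0xA, 0xD):
        return False
    return in_logical_string_repertoire(cp)

# Sanity check: Unicode defines exactly 66 noncharacters.
print(sum(is_noncharacter(cp) for cp in range(0x110000)))
```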
--
James Manger




From: art <art-bounces@ietf.org> on behalf of Tim Bray <tbray@textuality.com>
Date: Tuesday, 26 September 2023 at 2:51 am
To: i18ndir@ietf.org <i18ndir@ietf.org>, ART Area <art@ietf.org>
Subject: [art] Fwd: New Version Notification for draft-bray-unichars-06.txt
What’s new and different here.


  1.  Locked down definition of “problematic”
  2.  Locked down definition of “character repertoire”
  3.  Changed “Useful Assignables” to “Unicode Assignables” (checked with Asmus first)

A new version of Internet-Draft draft-bray-unichars-06.txt has been
successfully submitted by Paul Hoffman and posted to the
IETF repository.

Name:     draft-bray-unichars
Revision: 06
Title:    Unicode Character Repertoire Subsets
Date:     2023-09-25
Group:    Individual Submission
Pages:    10
URL:      https://www.ietf.org/archive/id/draft-bray-unichars-06.txt
Status:   https://datatracker.ietf.org/doc/draft-bray-unichars/
HTML:     https://www.ietf.org/archive/id/draft-bray-unichars-06.html
HTMLized: https://datatracker.ietf.org/doc/html/draft-bray-unichars
Diff:     https://author-tools.ietf.org/iddiff?url2=draft-bray-unichars-06

Abstract:

  This document discusses specifying subsets of the Unicode character
  repertoire for use in protocols and data formats.



The IETF Secretariat