Re: [I18ndir] [art] New Version Notification for draft-bray-unichars-06.txt

"Manger, James" <James.H.Manger@team.telstra.com> Sat, 07 October 2023 00:10 UTC

From: "Manger, James" <James.H.Manger@team.telstra.com>
To: Rob Sayre <sayrer@gmail.com>, Tim Bray <tbray@textuality.com>
CC: "i18ndir@ietf.org" <i18ndir@ietf.org>, ART Area <art@ietf.org>
Date: Sat, 07 Oct 2023 00:10:11 +0000
Message-ID: <SY4PR01MB5980A1D1A942722DF360889EE5C9A@SY4PR01MB5980.ausprd01.prod.outlook.com>
References: <169566019635.41806.9804796677919971070@ietfa.amsl.com> <CAHBU6is-wU2NLXNWL56nSJ4=nKvDzGv_Aw4qJN6N2O8CuM4-yw@mail.gmail.com> <SYBPR01MB59814B3448F5754AAEDA1740E5C7A@SYBPR01MB5981.ausprd01.prod.outlook.com> <CAHBU6iueqtd5T1T-ciYUMWvmo8XqBQqO5LkWbdRaoXQzPYSQOQ@mail.gmail.com> <SY4PR01MB5980D009F1623E3694B871B7E5C5A@SY4PR01MB5980.ausprd01.prod.outlook.com> <CAChr6SzMXqmEJvwQ0Vb0+CfchBn2kMueQJ-2Th1=4Oct8b9t6A@mail.gmail.com> <E1464943-EB11-4FA4-B933-4F138C6C34A0@tzi.org> <CAHBU6itgC07j0P5DcACDyHSjEOG6=j5kWE=eYF8E0NA3mm_b5A@mail.gmail.com> <SY4PR01MB59803C733B6B6A1C9D4E04F4E5C5A@SY4PR01MB5980.ausprd01.prod.outlook.com> <CAHBU6iuEbKOri56HiTB+HcsPKOpXJArFpbkVnf68=5i8FMWPUg@mail.gmail.com> <CAChr6Sy34Ca16imTu7Db7hWEMEY_7dKj2ZsZrNNWkWWbZG=D9Q@mail.gmail.com>
In-Reply-To: <CAChr6Sy34Ca16imTu7Db7hWEMEY_7dKj2ZsZrNNWkWWbZG=D9Q@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/9DYjf23NqXXB2l8afWdZwcUlU7E>
Subject: Re: [I18ndir] [art] New Version Notification for draft-bray-unichars-06.txt
List-Id: Internationalization Directorate <i18ndir.ietf.org>

% jshell
|  Welcome to JShell -- Version 20.0.2
|  For an introduction type: /help intro

jshell> import static java.nio.charset.StandardCharsets.*;

jshell> HexFormat.of().toHexDigits(new String(new char[] { 0xDEAD}).codePointAt(0))
$2 ==> "0000dead"

jshell> HexFormat.of().formatHex(new String(new char[] { 0xDEAD}).getBytes(UTF_16BE))
$3 ==> "fffd"

jshell> HexFormat.of().formatHex(new String(new char[] { 0xDEAD}).getBytes(UTF_8))
$4 ==> "3f"

The Java snippets above show that:
1. You can store an unpaired surrogate in a Java string.
2. The easiest way to get a UTF-16 encoding (String.getBytes(UTF_16BE)) replaces ill-formed code units with U+FFFD.
3. The easiest way to get a UTF-8 encoding (String.getBytes(UTF_8)) replaces ill-formed code units with U+003F QUESTION MARK.
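
For anyone who wants to rerun this outside jshell, here is a minimal standalone sketch, assuming Java 17 or later (for java.util.HexFormat). The class LoneSurrogateDemo and the helper hasUnpairedSurrogate are hypothetical names, not JDK API; the helper just checks that every surrogate code unit belongs to a high-low pair, which is what a String must satisfy to be well-formed UTF-16.

```java
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

// Standalone sketch of the jshell session above (Java 17+ for HexFormat).
public class LoneSurrogateDemo {

    // A String holding a single unpaired low surrogate, 0xDEAD.
    static final String LONE = new String(new char[] { 0xDEAD });

    // codePointAt() hands the unpaired surrogate back unchanged.
    static String codePointHex() {
        return HexFormat.of().toHexDigits(LONE.codePointAt(0));
    }

    // String.getBytes() silently substitutes the charset's replacement bytes.
    static String utf16Hex() {
        return HexFormat.of().formatHex(LONE.getBytes(StandardCharsets.UTF_16BE));
    }

    static String utf8Hex() {
        return HexFormat.of().formatHex(LONE.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical helper: true if s contains a surrogate that is not part of
    // a high-low pair, i.e. s is not well-formed UTF-16.
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                } else {
                    return true; // high surrogate not followed by a low one
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // low surrogate with no preceding high one
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(codePointHex());                           // 0000dead
        System.out.println(utf16Hex());                               // fffd
        System.out.println(utf8Hex());                                // 3f
        System.out.println(hasUnpairedSurrogate(LONE));               // true
        System.out.println(hasUnpairedSurrogate("\uD83D\uDE00"));     // false (U+1F600)
    }
}
```

A check like hasUnpairedSurrogate is one way to catch ill-formed strings before getBytes() hides the problem behind a replacement character.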


jshell> var cb = CharBuffer.allocate(1).append((char)0xDEAD);
cb ==>

jshell> Charset.forName("UTF-8").newEncoder().encode(cb.rewind())
|  Exception java.nio.charset.MalformedInputException: Input length = 1
|        at CoderResult.throwException (CoderResult.java:274)
|        at CharsetEncoder.encode (CharsetEncoder.java:820)
|        at (#20:1)

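A UTF-8 encoder in Java can also signal an error for ill-formed code units instead of replacing them: a CharsetEncoder obtained via newEncoder() reports malformed input by default, which is why the call above throws MalformedInputException.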
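
A minimal sketch of catching that error rather than letting it propagate, assuming Java 9 or later. The class StrictUtf8 and the method encodesCleanly are hypothetical names; the REPORT actions are set explicitly only for readability, since they are already the defaults for a fresh encoder.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch: ask a UTF-8 encoder to reject ill-formed input instead of substituting '?'.
public class StrictUtf8 {

    // Hypothetical helper: true if s encodes cleanly, false if the encoder reports an error.
    static boolean encodesCleanly(String s) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                // REPORT is already the default for newEncoder(); set here for clarity.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            enc.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            // An unpaired surrogate surfaces here as a MalformedInputException.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(encodesCleanly("caf\u00e9"));                   // true
        System.out.println(encodesCleanly("\uD83D\uDE00"));                // true (valid pair)
        System.out.println(encodesCleanly(String.valueOf((char) 0xDEAD))); // false
    }
}
```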

From the CharsetEncoder Javadoc<https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/CharsetEncoder.html#:~:text=How%20an%20encoding%20error%20is%20handled%20depends>:
How an encoding error is handled depends upon the action requested for that type of error, which is described by an instance of the CodingErrorAction<https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/CodingErrorAction.html> class. The possible error actions are to ignore<https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/CodingErrorAction.html#IGNORE> the erroneous input, report<https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/CodingErrorAction.html#REPORT> the error to the invoker via the returned CoderResult<https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/CoderResult.html> object, or replace<https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/CodingErrorAction.html#REPLACE> the erroneous input with the current value of the replacement byte array. The replacement is initially set to the encoder's default replacement, which often (but not always) has the initial value { (byte)'?' }; its value may be changed via the replaceWith<https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/CharsetEncoder.html#replaceWith(byte%5B%5D)> method.
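
To make the three actions concrete, here is a minimal sketch (Java 17+ for HexFormat) that encodes the same ill-formed string under REPLACE with the default replacement, REPLACE with the UTF-8 bytes of U+FFFD installed via replaceWith(), and IGNORE. The class Utf8ErrorActions and the helper encodeHex are hypothetical names, not JDK API.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

// Sketch: compare IGNORE and REPLACE, and swap the default '?' for U+FFFD.
public class Utf8ErrorActions {

    // UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER.
    static final byte[] FFFD_UTF8 = { (byte) 0xEF, (byte) 0xBF, (byte) 0xBD };

    // Hypothetical helper: encode s as UTF-8 with the given malformed-input action;
    // a non-null replacement is installed via replaceWith().
    static String encodeHex(String s, CodingErrorAction action, byte[] replacement) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder().onMalformedInput(action);
        if (replacement != null) {
            // Must be non-empty, no longer than maxBytesPerChar(), and legal UTF-8.
            enc.replaceWith(replacement);
        }
        try {
            ByteBuffer out = enc.encode(CharBuffer.wrap(s));
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return HexFormat.of().formatHex(bytes);
        } catch (CharacterCodingException e) {
            // Reached when the action is REPORT and s is ill-formed.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s = "a" + (char) 0xDEAD + "b";
        System.out.println(encodeHex(s, CodingErrorAction.REPLACE, null));      // 613f62
        System.out.println(encodeHex(s, CodingErrorAction.REPLACE, FFFD_UTF8)); // 61efbfbd62
        System.out.println(encodeHex(s, CodingErrorAction.IGNORE, null));       // 6162
    }
}
```

So Java code can produce the U+FFFD substitution the draft calls for, but only by opting in; the convenient String.getBytes(UTF_8) path still yields "?".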

--
James Manger

From: Rob Sayre <sayrer@gmail.com>
Date: Saturday, 7 October 2023 at 10:29 am
To: Tim Bray <tbray@textuality.com>
Cc: Manger, James <James.H.Manger@team.telstra.com>, i18ndir@ietf.org <i18ndir@ietf.org>, ART Area <art@ietf.org>
Subject: Re: [art] New Version Notification for draft-bray-unichars-06.txt
On Fri, Oct 6, 2023 at 4:10 PM Tim Bray <tbray@textuality.com> wrote:

  2.  U+FFFD is an obvious choice to replace code units or scalars you don’t want. But Unicode does allow choices. Unicode ch3<https://www.unicode.org/versions/Unicode15.1.0/ch03.pdf> C10 only says “with a marker such as U+FFFD”. Unicode TR36<https://unicode.org/reports/tr36/#Substituting_for_Ill_Formed_Subsequences> says “where U+FFFD is not available, a common alternative is "?"”. Java, for instance, uses “?” in some common circumstances. Unichars does not admit such an option.
Also worth a reference. If you’re writing Java code you should probably do what Java does, no?

Can we write the code here, and see what Java does? I find the other points uncontroversial,

thanks,
Rob