Re: [I18ndir] [art] New Version Notification for draft-bray-unichars-06.txt

Tim Bray <tbray@textuality.com> Sat, 07 October 2023 00:13 UTC

From: Tim Bray <tbray@textuality.com>
Date: Fri, 06 Oct 2023 20:13:48 -0400
Message-ID: <CAHBU6iu7vRYEUGY-bDo6ZtX8sMrezeDDE+ZfY61FnQrFA00qxw@mail.gmail.com>
To: "Manger, James" <James.H.Manger@team.telstra.com>
Cc: "i18ndir@ietf.org" <i18ndir@ietf.org>, ART Area <art@ietf.org>, Rob Sayre <sayrer@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/wiFisaYDEaLUvmcAqW-GtSUbz_U>
Subject: Re: [I18ndir] [art] New Version Notification for draft-bray-unichars-06.txt

Thanks. It’s a long weekend in Canada, which may help me recover from my
depression concerning the world’s most popular programming language.

On Oct 6, 2023 at 5:10:11 PM, "Manger, James" <
James.H.Manger@team.telstra.com> wrote:

> % jshell
> |  Welcome to JShell -- Version 20.0.2
> |  For an introduction type: /help intro
>
> jshell> import static java.nio.charset.StandardCharsets.*;
>
> jshell> HexFormat.of().toHexDigits(new String(new char[] {0xDEAD}).codePointAt(0))
> $2 ==> "0000dead"
>
> jshell> HexFormat.of().formatHex(new String(new char[] {0xDEAD}).getBytes(UTF_16BE))
> $3 ==> "fffd"
>
> jshell> HexFormat.of().formatHex(new String(new char[] {0xDEAD}).getBytes(UTF_8))
> $4 ==> "3f"
>
>
>
> The Java code snippets above show that:
>
> 2. You can store an unpaired surrogate in a Java string
>
> 3. The easiest way to get a UTF-16 encoding replaces ill-formed code units
> with U+FFFD
>
> 4. The easiest way to get a UTF-8 encoding replaces ill-formed code units
> with U+003F QUESTION MARK
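
For anyone who wants to reproduce the three observations above outside jshell, here is a minimal sketch as a standalone program. The class and method names (SurrogateDemo, utf16Hex, utf8Hex) are made up for illustration; the API calls are the same ones used in the quoted session, and HexFormat needs Java 17 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

// Hypothetical demo class; only the JDK calls come from the quoted session.
final class SurrogateDemo {
    // A one-char string holding only the unpaired low surrogate U+DEAD.
    static final String LONE = new String(new char[] {0xDEAD});

    // String.getBytes substitutes the charset's default replacement bytes
    // for input it cannot encode instead of throwing.
    static String utf16Hex(String s) {
        return HexFormat.of().formatHex(s.getBytes(StandardCharsets.UTF_16BE));
    }

    static String utf8Hex(String s) {
        return HexFormat.of().formatHex(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The string happily stores the unpaired surrogate.
        System.out.println(HexFormat.of().toHexDigits(LONE.codePointAt(0)));
        // UTF-16BE output carries U+FFFD in place of the surrogate.
        System.out.println(utf16Hex(LONE));
        // UTF-8 output carries a question mark in place of the surrogate.
        System.out.println(utf8Hex(LONE));
    }
}
```

Run with `java SurrogateDemo.java`; the three printed lines should match $2, $3 and $4 in the session above.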
>
>
>
> jshell> var cb = CharBuffer.allocate(1).append((char)0xDEAD);
> cb ==>
>
> jshell> Charset.forName("UTF-8").newEncoder().encode(cb.rewind())
> |  Exception java.nio.charset.MalformedInputException: Input length = 1
> |        at CoderResult.throwException (CoderResult.java:274)
> |        at CharsetEncoder.encode (CharsetEncoder.java:820)
> |        at (#20:1)
>
>
>
> A UTF-8 encoder in Java can also be configured to signal an error for
> ill-formed code units.
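
A rough sketch of what that strict configuration might look like in application code follows. StrictUtf8 and its methods are hypothetical names, not anything from the JDK or the draft. A fresh CharsetEncoder already defaults to REPORT for both error kinds, so the two on...() calls are there only to make the intent explicit.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encode to UTF-8, failing on ill-formed UTF-16 input.
final class StrictUtf8 {
    static byte[] encode(String s) throws CharacterCodingException {
        ByteBuffer bb = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
        // Copy exactly the encoded bytes out of the buffer.
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    // True if the string encodes cleanly, false if it holds e.g. a lone surrogate.
    static boolean isWellFormed(String s) {
        try {
            encode(s);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("ok"));
        System.out.println(isWellFormed(String.valueOf((char) 0xDEAD)));
    }
}
```

The point of the design is that the caller, not the library, decides what to do with ill-formed input: reject it, log it, or fall back to a replacement.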
>
>
>
> From CharsetEncoder Javadoc
> <https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/CharsetEncoder.html#:~:text=How%20an%20encoding%20error%20is%20handled%20depends>
>
> How an encoding error is handled depends upon the action requested for
> that type of error, which is described by an instance of the
> CodingErrorAction
> <https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/CodingErrorAction.html> class.
> The possible error actions are to ignore
> <https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/CodingErrorAction.html#IGNORE> the
> erroneous input, report
> <https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/CodingErrorAction.html#REPORT> the
> error to the invoker via the returned CoderResult
> <https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/CoderResult.html> object,
> or replace
> <https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/CodingErrorAction.html#REPLACE> the
> erroneous input with the current value of the replacement byte array. The
> replacement is initially set to the encoder's default replacement, which
> often (but not always) has the initial value { (byte)'?' }; its value may
> be changed via the replaceWith
> <https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/CharsetEncoder.html#replaceWith(byte%5B%5D)>
>  method.
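
The Javadoc text above suggests that a UTF-8 encoder can be told to emit U+FFFD instead of "?". Here is a minimal sketch under that reading; ReplacingUtf8 and encodeHex are illustrative names only, and the replacement bytes EF BF BD are simply the UTF-8 encoding of U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

// Hypothetical helper: UTF-8 encoding that substitutes U+FFFD for bad input.
final class ReplacingUtf8 {
    // UTF-8 bytes of U+FFFD REPLACEMENT CHARACTER.
    private static final byte[] FFFD_UTF8 = {(byte) 0xEF, (byte) 0xBF, (byte) 0xBD};

    static String encodeHex(String s) {
        try {
            ByteBuffer bb = StandardCharsets.UTF_8.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE)
                    .replaceWith(FFFD_UTF8)
                    .encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return HexFormat.of().formatHex(out);
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE, but encode() declares it.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "a", a lone surrogate, "b"
        System.out.println(encodeHex("a" + (char) 0xDEAD + "b"));
    }
}
```

With this configuration the lone surrogate should come out as ef bf bd between 61 and 62, rather than as 3f.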
>
>
>
> --
>
> James Manger
>
>
>
>
>
>
>
> *From: *Rob Sayre <sayrer@gmail.com>
> *Date: *Saturday, 7 October 2023 at 10:29 am
> *To: *Tim Bray <tbray@textuality.com>
> *Cc: *Manger, James <James.H.Manger@team.telstra.com>, i18ndir@ietf.org <
> i18ndir@ietf.org>, ART Area <art@ietf.org>
> *Subject: *Re: [art] New Version Notification for
> draft-bray-unichars-06.txt
>
> On Fri, Oct 6, 2023 at 4:10 PM Tim Bray <tbray@textuality.com> wrote:
>
>
>    1.
>    2. U+FFFD is an obvious choice to replace code units or scalars you
>    don’t want. But Unicode does allow choices. Unicode ch3
>    <https://www.unicode.org/versions/Unicode15.1.0/ch03.pdf> C10 only
>    says “with a marker such as U+FFFD”. Unicode TR36
>    <https://unicode.org/reports/tr36/#Substituting_for_Ill_Formed_Subsequences>
>    says “where U+FFFD is not available, a common alternative is "?"”. Java,
>    for instance, uses “?” in some common circumstances. Unichars does not
>    admit such an option.
>
> Also worth a reference. If you’re writing Java code you should probably do
> what Java does, no?
>
>
>
> Can we write the code here, and see what Java does? I find the other
> points uncontroversial,
>
>
>
> thanks,
>
> Rob
>
>
>
>
>
>
>