Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-04.txt

Rob Sayre <sayrer@gmail.com> Tue, 19 September 2023 16:44 UTC

References: <169479938668.18742.9199862891950651366@ietfa.amsl.com> <CAHBU6ivzUV947N+n7AoYkCFT3ZfaLobCQ4fBXw3dvkqTT=LBAw@mail.gmail.com> <SY4PR01MB5980D8DDE229D1C57AEDFB55E5FBA@SY4PR01MB5980.ausprd01.prod.outlook.com> <CAChr6SzRa8F+OrELa8N3rAMLmxdvr-g5c0i_9ESnWnwZY-iA4A@mail.gmail.com> <CAChr6Sy05spOW9nsy36kYr8Ob6OYS7vCgrEVPhhWs9Pe4LkpNA@mail.gmail.com> <2e6c2d13-9fc9-d320-3803-2b9a4df3b042@ix.netcom.com> <CAChr6Swr5tS2-wW8dZ0A4J7_Jd+RoHZNJkzhNfcVTi84oDvOPA@mail.gmail.com> <SY4PR01MB5980320C8286458EA28EFF32E5FAA@SY4PR01MB5980.ausprd01.prod.outlook.com>
In-Reply-To: <SY4PR01MB5980320C8286458EA28EFF32E5FAA@SY4PR01MB5980.ausprd01.prod.outlook.com>
From: Rob Sayre <sayrer@gmail.com>
Date: Tue, 19 Sep 2023 09:44:30 -0700
Message-ID: <CAChr6Sw=ZKoxv-veN2KPKd2je8uptZS4DrhfmmW6Mi=KkwH7Sw@mail.gmail.com>
To: "Manger, James" <James.H.Manger@team.telstra.com>
Cc: Asmus Freytag <asmusf@ix.netcom.com>, Tim Bray <tbray@textuality.com>, ART Area <art@ietf.org>, "i18ndir@ietf.org" <i18ndir@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000bfbc520605b8f9b0"
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/uo8jpACtsRr9-7SwMcCdlW1upGU>
Subject: Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-04.txt

On Tue, Sep 19, 2023 at 5:59 AM Manger, James <
James.H.Manger@team.telstra.com> wrote:

> Plenty of systems will parse the JSON "\uDEAD" to & from an internal
> string representation, but that JSON is 8 ASCII characters (including the
> quotes) – there is nothing ill-formed there. It’s far less clear to me if
> many systems will parse "<ill-formed code unit sequence for U+DEAD>" from
> five 8-bit WTF-8 code units, three WTF-16 code units, or three WTF-32 code
> units. And it’s unclear to me whether ECMA-404 *expects* anything to
> parse those.
>

Ah, but no one is advocating this. The ABNF does not describe the
transformation formats; it describes the Unicode code points /after/
decoding. Perhaps the document should be clearer here. For example,
ECMA-404 says:

"To escape a code point that is not in the Basic Multilingual Plane, the
character may be represented as a twelve-character sequence, encoding the
UTF-16 surrogate pair corresponding to the code point. So for example, a
string containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E"."

So, the ABNF is concerned with U+1D11E, not the ASCII escape sequences.
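
To make the distinction concrete, here is a minimal sketch using Python's
standard json module. It is only an illustration, not something taken from
the draft or from ECMA-404, and the variable names are made up. Python's
parser happens to be lenient about unpaired surrogates; other parsers may
reject or replace them.

```python
import json

# The JSON text is plain ASCII: 14 characters including the quotes.
escaped_clef = '"\\uD834\\uDD1E"'
assert escaped_clef.isascii() and len(escaped_clef) == 14

# After decoding, the string holds a single code point, U+1D11E.
clef = json.loads(escaped_clef)
print([f"U+{ord(c):04X}" for c in clef])

# The 8-character JSON text "\uDEAD" is also plain ASCII.
escaped_lone = '"\\uDEAD"'
assert escaped_lone.isascii() and len(escaped_lone) == 8

# Python's json module decodes it to a lone surrogate code point,
# which is exactly the kind of value the ABNF has to rule in or out.
lone = json.loads(escaped_lone)
print([f"U+{ord(c):04X}" for c in lone])

# The lone surrogate cannot be written out as well-formed UTF-8.
try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```

The question the ABNF answers is about the decoded values (U+1D11E versus
U+DEAD), not about the escape sequences that carried them.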


> No. TR17 says a Unicode 16-bit string is *(%x0-FFFF). Quite different
> from *(%x0-10FFFF).
>
> TR17 does look good.
>

I was referring to the phrase: "A string data type is simply a sequence of
code units."

That is even looser than the ECMA-404 definition.
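
As a rough sketch of what "a sequence of code units" permits, the hypothetical
helper below (my own name, not from TR17) checks whether a list of 16-bit code
units is well-formed UTF-16. Any list of integers in the range 0-FFFF counts as
a string under the looser reading; only some of them pass this check.

```python
def is_well_formed_utf16(units):
    """Return True if every surrogate code unit is part of a high/low pair."""
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        i += 1
    return True


g_clef_units = [0xD834, 0xDD1E]   # the surrogate pair for U+1D11E
lone_units = [0xDEAD]             # an unpaired low surrogate

print(is_well_formed_utf16(g_clef_units))
print(is_well_formed_utf16(lone_units))
```

Both lists are valid "sequences of code units"; only the first is well-formed
UTF-16.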

thanks,
Rob