Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-04.txt

Rob Sayre <sayrer@gmail.com> Mon, 18 September 2023 17:51 UTC

MIME-Version: 1.0
References: <169479938668.18742.9199862891950651366@ietfa.amsl.com> <CAHBU6ivzUV947N+n7AoYkCFT3ZfaLobCQ4fBXw3dvkqTT=LBAw@mail.gmail.com> <SY4PR01MB5980D8DDE229D1C57AEDFB55E5FBA@SY4PR01MB5980.ausprd01.prod.outlook.com>
In-Reply-To: <SY4PR01MB5980D8DDE229D1C57AEDFB55E5FBA@SY4PR01MB5980.ausprd01.prod.outlook.com>
From: Rob Sayre <sayrer@gmail.com>
Date: Mon, 18 Sep 2023 10:51:05 -0700
Message-ID: <CAChr6SzRa8F+OrELa8N3rAMLmxdvr-g5c0i_9ESnWnwZY-iA4A@mail.gmail.com>
To: "Manger, James" <James.H.Manger=40team.telstra.com@dmarc.ietf.org>
Cc: Tim Bray <tbray@textuality.com>, ART Area <art@ietf.org>, "i18ndir@ietf.org" <i18ndir@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000fda45c0605a5c9e4"
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/JREW5aym-gbo9UBLy9riVCYU9-A>
Subject: Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-04.txt

On Mon, Sep 18, 2023 at 7:05 AM Manger, James
<James.H.Manger=40team.telstra.com@dmarc.ietf.org> wrote:

> Defining unicode-scalar-values, xml-chars, and useful-assignables are 3
> helpful “subsets of Unicode characters” that can be used in protocols and
> data formats.
>
> Defining unicode-code-points as though it is similar is a category error,
> however.
>
>
>
> Sections 3.1 and 3.2 deliberately make %x0-10FFFF and %x0-D7FF /
> %xE000-10FFFF look like similar repertoires, which is too misleading.
>
> UTF-8, UTF-16 and UTF-32 are only defined for the latter. There may be
> “obvious” extensions of UTF-8 (WTF-8) and UTF-32 that can cover %x0-10FFFF,
> but they are simply not widely supported in modern software, so those
> extensions are no use in an IETF standard....
>

They are, in fact, very widely supported, although they may not have names
or behave exactly the same across languages. In Rust, see the OsString docs:
<https://doc.rust-lang.org/src/std/ffi/os_str.rs.html#17>

That's why they're out there.
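
As a small illustration (my own sketch, not taken from the linked source;
the helper names are made up), Rust's standard library draws the line in its
types. `String` and `char` hold only scalar values, so strict UTF-16 decoding
rejects an unpaired surrogate, while `OsString` on Windows accepts one and
hands it back unchanged. The last function is compiled only on Windows.

```rust
/// True if `units` is well-formed UTF-16, i.e. contains no unpaired surrogate.
/// `String::from_utf16` is strict and returns an error otherwise.
pub fn is_well_formed_utf16(units: &[u16]) -> bool {
    String::from_utf16(units).is_ok()
}

/// Lossy decoding: each unpaired surrogate becomes U+FFFD, so the original
/// 16-bit data cannot be recovered from the result.
pub fn decode_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

/// Windows only: `OsString` keeps the unpaired surrogate and returns the
/// same 16-bit units from `encode_wide`.
#[cfg(windows)]
pub fn os_string_round_trips(units: &[u16]) -> bool {
    use std::ffi::OsString;
    use std::os::windows::ffi::{OsStrExt, OsStringExt};

    let os = OsString::from_wide(units);
    os.encode_wide().collect::<Vec<u16>>() == units
}
```

With the input `[0x0061, 0xD800, 0x0062]`, the strict check fails and the
lossy decoder substitutes U+FFFD for the surrogate, whereas the Windows
helper is expected to return true.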

Here, you've gotten into transformation formats right away, but the
document is not really about transformation formats. Section 6 would seem
to cover this: "...this sort of character-repertoire restriction, which
applies to data content, not textual representation in packaging formats."



> For understandable reasons, JSON supports both *(%x0-D7FF / %xE000-10FFFF)
> and *(%x0-FFFF) (arbitrary 16-bit data) as models for the logical strings
> it can represent.
>

ECMA-404 is clear: "JSON syntax describes a sequence of Unicode code
points." The discrepancy between this text and RFC 8259 is what
motivated this document. The document also seems to recommend fairly
clearly against using this production if you can help it.

thanks,
Rob