Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-04.txt

Asmus Freytag <asmusf@ix.netcom.com> Tue, 19 September 2023 21:09 UTC

Message-ID: <1d19f72f-8c41-f10c-831c-8e5cea347478@ix.netcom.com>
Date: Tue, 19 Sep 2023 14:09:19 -0700
To: Rob Sayre <sayrer@gmail.com>
Cc: "Manger, James" <James.H.Manger=40team.telstra.com@dmarc.ietf.org>, Tim Bray <tbray@textuality.com>, ART Area <art@ietf.org>, "i18ndir@ietf.org" <i18ndir@ietf.org>
References: <169479938668.18742.9199862891950651366@ietfa.amsl.com> <CAHBU6ivzUV947N+n7AoYkCFT3ZfaLobCQ4fBXw3dvkqTT=LBAw@mail.gmail.com> <SY4PR01MB5980D8DDE229D1C57AEDFB55E5FBA@SY4PR01MB5980.ausprd01.prod.outlook.com> <CAChr6SzRa8F+OrELa8N3rAMLmxdvr-g5c0i_9ESnWnwZY-iA4A@mail.gmail.com> <CAChr6Sy05spOW9nsy36kYr8Ob6OYS7vCgrEVPhhWs9Pe4LkpNA@mail.gmail.com> <2e6c2d13-9fc9-d320-3803-2b9a4df3b042@ix.netcom.com> <CAChr6Swr5tS2-wW8dZ0A4J7_Jd+RoHZNJkzhNfcVTi84oDvOPA@mail.gmail.com>
From: Asmus Freytag <asmusf@ix.netcom.com>
In-Reply-To: <CAChr6Swr5tS2-wW8dZ0A4J7_Jd+RoHZNJkzhNfcVTi84oDvOPA@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/iv-aJ7dIOUpnPT_rbfxpxslXy2k>

On 9/18/2023 1:31 PM, Rob Sayre wrote:
>
>
> On Mon, Sep 18, 2023 at 1:17 PM Asmus Freytag <asmusf@ix.netcom.com> 
> wrote:
>
>     On 9/18/2023 11:09 AM, Rob Sayre wrote:
>>     On Mon, Sep 18, 2023 at 10:51 AM Rob Sayre <sayrer@gmail.com> wrote:
>>
>>         On Mon, Sep 18, 2023 at 7:05 AM Manger, James
>>         <James.H.Manger=40team.telstra.com@dmarc.ietf.org> wrote:
>>
>>             For understandable reasons, JSON supports both *(%x0-D7FF
>>             / %xE000-10FFFF) and *(%x0-FFFF) (arbitrary 16-bit data)
>>             as models for the logical strings it can represent.
>>
>>
>>         ECMA-404 is clear: "JSON syntax describes a sequence of
>>         Unicode code points." and the discrepancy between this text
>>         and RFC8259 is what motivated this document. The document
>>         also seems to fairly clearly recommend against using this
>>         production if you can help it.
>>
>>
>>     Perhaps this document should reference
>>     <https://unicode.org/reports/tr17/#Strings> (note to authors), which
>>     covers similar territory.
>>
>     Thanks for noticing.
>
>     The need for transient states that are discoverable (that is, not
>     fully encapsulated) is a big reason why many specs are not tighter.
>
>     However, there are points in a protocol where strings aren't in a
>     transient processing state, and here the full restrictions should
>     apply (and be specified).
>
> Yes. The problem here is that JSON can transmit stuff resembling 
> these: "For example, strings in Java, C#, or ECMAScript are Unicode 
> 16-bit strings, but are not necessarily well-formed UTF-16 sequences." 
> I also mentioned it because it says "A string data type is simply a 
> sequence of code units.", which matches ECMA-404 pretty well.
>
> Here, the distinction between "string" and UTF-8/UTF-16/UTF-32 is 
> clearly drawn. To use James' example:
>
> ---
>
> It does not make sense for a spec to define:
>   unicode-code-point = %x0-10FFFF
>   string = *unicode-code-point
>
> ---
>
> It seems to me that TR17 defines "string" this way. Which is not to 
> say that I recommend sending these things over the internet, just that 
> it can happen. I think the draft does a decent job discouraging this 
> one, but I guess it will have to be yet clearer.
>
The disconnect may be that Unicode needs to define "string" in ways that 
are not limited to interchange.

A string data type may transiently be in a state that is not well-formed, 
or may be able to hold ill-formed input data before verifying its status.
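To make the transient case concrete, here is a minimal Python sketch, 
assuming the string is held as a list of UTF-16 code units (the 16-bit 
model that Java, C#, and ECMAScript strings follow). The names 
`is_well_formed_utf16` and `buffer` are hypothetical. The buffer passes 
through an ill-formed state (a lone high surrogate) and becomes 
well-formed once the pair is completed.

```python
def is_well_formed_utf16(units: list[int]) -> bool:
    """Return True if the code units form well-formed UTF-16.

    Assumes every element is already a 16-bit value (0..0xFFFF).
    """
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate is only valid when a low surrogate follows it.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate not consumed as part of a pair is unpaired.
            return False
        i += 1
    return True


# Hypothetical buffer filled one code unit at a time, e.g. while reading input.
# U+1F600 is encoded in UTF-16 as the pair D83D DE00.
buffer: list[int] = [0x0041]         # "A"
buffer.append(0xD83D)                # transient state: lone high surrogate
print(is_well_formed_utf16(buffer))  # False
buffer.append(0xDE00)                # pair completed
print(is_well_formed_utf16(buffer))  # True
```

The check is run explicitly here so the transient state is visible; a 
real string type would more likely verify once, when the data leaves 
the transient state.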

Unicode, in other words, has to be universal, covering ALL possible uses 
of a "string". The same is not true for a protocol that defines a record 
with data fields. Those fields don't need to be able to accommodate 
ill-formed data.
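As an illustration of a field that refuses ill-formed data at the 
protocol boundary, here is a minimal Python sketch. It assumes a 
hypothetical JSON record with a single `display_name` field; 
`UserRecord` and `parse_user_record` are illustrative names, not taken 
from any specification. A JSON escape such as "\ud800" decodes to a lone 
surrogate (the *(%x0-FFFF) model quoted above), and the record rejects 
it, so only sequences of Unicode scalar values get in.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class UserRecord:
    """Hypothetical protocol record with one text field."""

    display_name: str

    def __post_init__(self) -> None:
        # A Python str may hold lone surrogates (for example, from the JSON
        # escape "\ud800"). Strict UTF-8 encoding rejects them, so a successful
        # encode means the field is a sequence of Unicode scalar values.
        try:
            self.display_name.encode("utf-8")
        except UnicodeEncodeError as exc:
            raise ValueError(
                "display_name must contain only Unicode scalar values"
            ) from exc


def parse_user_record(text: str) -> UserRecord:
    """Parse a JSON object such as {"display_name": "..."} into a UserRecord."""
    obj = json.loads(text)
    return UserRecord(display_name=obj["display_name"])


print(parse_user_record('{"display_name": "caf\\u00e9"}'))  # UserRecord(display_name='café')
try:
    parse_user_record('{"display_name": "\\ud800"}')
except ValueError as err:
    print("rejected:", err)
```

An escaped surrogate pair such as "\ud83d\ude00" is still accepted, 
because Python's json module combines it into the single code point 
U+1F600 before the record sees it.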

The exception might be a protocol that is "just a pipe". In that case, 
the responsibility for enforcing well-formedness belongs to (usually) 
the receiver connected to that "pipe". Such a protocol may well want to 
allow "all Unicode code points".
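A minimal Python sketch of the "just a pipe" case, assuming byte 
payloads and UTF-8 as the receiver's text encoding (both assumptions, as 
are the hypothetical names `pipe_forward` and `receive_text`). The pipe 
forwards payloads unchanged, and well-formedness is enforced only at the 
endpoint that interprets them as text.

```python
def pipe_forward(payload: bytes) -> bytes:
    """Hypothetical "just a pipe" hop: forwards the payload without inspecting it."""
    return payload


def receive_text(payload: bytes) -> str:
    """Hypothetical receiving endpoint: enforces well-formedness when it decodes."""
    # Python's strict UTF-8 decoder rejects ill-formed input, including the
    # three-byte encoding of a surrogate such as b"\xed\xa0\x80" (U+D800).
    return payload.decode("utf-8", errors="strict")


good = "caf\u00e9".encode("utf-8")
bad = b"\xed\xa0\x80"

print(pipe_forward(bad) == bad)          # True: the pipe does not inspect data
print(receive_text(pipe_forward(good)))  # café
try:
    receive_text(pipe_forward(bad))
except UnicodeDecodeError:
    print("rejected by the receiver")
```

Whether the receiver rejects, replaces (errors="replace"), or otherwise 
handles such input is its own policy; the pipe stays neutral either way.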

A./