Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-04.txt

Asmus Freytag <asmusf@ix.netcom.com> Sat, 23 September 2023 15:24 UTC

To: Rob Sayre <sayrer@gmail.com>, Carsten Bormann <cabo@tzi.org>
Cc: "Manger, James" <James.H.Manger=40team.telstra.com@dmarc.ietf.org>, Tim Bray <tbray@textuality.com>, ART Area <art@ietf.org>, "i18ndir@ietf.org" <i18ndir@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/WGQrkzwYTVHRvIoVc6JnTyVZmzQ>

On 9/22/2023 11:44 AM, Rob Sayre wrote:
> On Wed, Sep 20, 2023 at 9:56 AM Rob Sayre <sayrer@gmail.com> wrote:
>
>     Well, it depends on whether you need to interoperate with the Web.
>     This can mean not only detecting the "toxic waste" code points,
>     but accepting them. That seems straightforward to me. I think the
>     other protocols and subsets are a better idea in other situations.
>
>
> ...
>
> Since strings in some languages and systems are Unicode strings, but 
> not necessarily well-formed UTF-8 or UTF-16, it can make sense to 
> transmit text of this sort.
>
> thanks,
> Rob
>
I think we need to distinguish three different needs.

(1) identifiers (names)

(2) clean definition of text content fields

(3) string data type (which is also used internally and may have 
transient states)

These use cases need different repertoires, and have different constraints.

Identifiers benefit from a limited repertoire, often sharply limited, to 
aid in reliable recognition. Being mnemonic devices, there's no need to 
be able to represent all edge cases for all languages, as there would be 
for text. There are many other specifications that provide identifier 
repertoires, such as UAX#31, IDNA2008, domain-specific repertoires, and
so on. For the purpose here, it would be useful to mention that there is 
a class of use cases that does need stricter repertoire limits than 
discussed here and that what is discussed here isn't recommended for 
those purposes.
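
To illustrate how sharply an identifier repertoire can differ from one meant for free text, here is a minimal sketch that uses Python's built-in identifier rule. That rule is derived from the XID_Start / XID_Continue properties of UAX#31; it stands in here only as a convenient example of an identifier profile, and the sample strings are made up for the illustration.

```python
# Illustration only: Python's own identifier rule, which is derived from
# the UAX #31 XID_Start / XID_Continue properties. It is not the
# repertoire of any particular protocol.
candidates = ["name", "naïve", "two words", "zero\u200bwidth", "1st"]

# Map each candidate to whether Python would accept it as an identifier.
results = {c: c.isidentifier() for c in candidates}

for candidate, ok in results.items():
    print(f"{candidate!r}: {'accepted' if ok else 'rejected'}")
```

All five samples are unremarkable as text content, yet only the first two pass as identifiers: the space, the zero width space and the leading digit are each rejected.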

A clean text content field will be specified as text data in a
well-formed encoding form, with a repertoire that excludes internal-use
code points as well as "useless" controls. As an output specification,
that would be complete; as an input specification, it raises the issue
of what to do with input that fails to fully conform (by being
ill-formed or by exceeding the repertoire). We have had useful
discussion on that.
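
As a rough sketch of the input-side check for such a field, the following separates the two ways input can fail: ill-formed encoding versus code points outside the repertoire. The exact exclusions (noncharacters, plus controls other than tab, LF and CR) are an assumption made for the example, and the function names are hypothetical; the actual repertoire is whatever the specification ends up defining.

```python
def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus the last two code points of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_excluded_control(cp: int) -> bool:
    # Assumption for this sketch: tab, LF and CR are the only useful controls.
    if cp < 0x20:
        return cp not in (0x09, 0x0A, 0x0D)
    return 0x7F <= cp <= 0x9F  # DEL and the C1 controls


def check_text_field(raw: bytes) -> list:
    """Return a list of problems; an empty list means the field conforms."""
    try:
        # Python's strict decoder rejects ill-formed UTF-8, including
        # encoded surrogates and overlong sequences.
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"ill-formed UTF-8 at byte offset {exc.start}"]

    problems = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if is_noncharacter(cp):
            problems.append(f"noncharacter U+{cp:04X} at index {index}")
        elif is_excluded_control(cp):
            problems.append(f"excluded control U+{cp:04X} at index {index}")
    return problems


print(check_text_field("plain text\n".encode("utf-8")))
print(check_text_field(b"\xed\xa0\x80"))           # encoded lone surrogate
print(check_text_field("a\x00b\ufdd0".encode("utf-8")))
```

The sketch only reports problems. Whether a receiver then rejects the field, replaces the offending code points, or passes them through is exactly the open question for the input case.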

Then there's the string data type. We don't need to define it, because
Unicode defines it already, but it would be useful to put it into 
perspective. As defined, the string data type includes code points that 
can only ever be used in internal processing, and it includes ill-formed 
encoding forms, because they will normally occur in transient states 
during processing and may need to be temporarily held when receiving 
input (before and during verification). Those are the reasons that 
generic string data types are not restricted, unlike text content fields 
conforming to a protocol. (There may be protocols that should be defined
on the basis of strings and not text content; for those, the
recommendations here would not apply, and that should be explained.)
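
The transient states mentioned above are easy to show with input that arrives in pieces. In this small sketch (the sample text and the chunk boundaries are arbitrary choices for the illustration), the chunks that split a multi-byte sequence are ill-formed on their own and have to be held as they are until the rest arrives, while the reassembled whole is perfectly well-formed.

```python
import codecs

data = "naïve 😀".encode("utf-8")             # 11 bytes of well-formed UTF-8
chunks = [data[:3], data[3:8], data[8:]]      # boundaries split "ï" and "😀"


def decodes_alone(chunk: bytes) -> bool:
    """True if the chunk is well-formed UTF-8 when looked at in isolation."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


standalone = [decodes_alone(c) for c in chunks]

# An incremental decoder buffers the incomplete tail of each chunk and
# only judges the input once the stream is complete.
decoder = codecs.getincrementaldecoder("utf-8")()
parts = [decoder.decode(c, final=(i == len(chunks) - 1))
         for i, c in enumerate(chunks)]
text = "".join(parts)

print(standalone)
print(text == "naïve 😀")
```

A buffer type that refused ill-formed content could not hold any of these chunks, which is one reason a generic string type is left unrestricted while a text content field on the wire is not.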


Being explicit about these use cases and their different needs, even if
the suggested repertoires only address the middle case, keeps readers
from bringing in assumptions based on a different use case.

A./