Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-04.txt

Asmus Freytag <asmusf@ix.netcom.com> Sat, 23 September 2023 17:25 UTC

To: Tim Bray <tbray@textuality.com>
Cc: "Manger, James" <James.H.Manger=40team.telstra.com@dmarc.ietf.org>, ART Area <art@ietf.org>, "i18ndir@ietf.org" <i18ndir@ietf.org>, Rob Sayre <sayrer@gmail.com>, Carsten Bormann <cabo@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/4EHzYz3je8kx8DJ3T0hNVxydi4c>

On 9/23/2023 9:51 AM, Tim Bray wrote:
> On Sep 23, 2023 at 8:24:15 AM, Asmus Freytag <asmusf@ix.netcom.com> wrote:
>> Identifiers benefit from a limited repertoire, often sharply limited, 
>> to aid in reliable recognition. Being mnemonic devices, there's no 
>> need to be able to represent all edge cases for all languages, as 
>> there would be for text. There are many other specifications that 
>> provide identifier repertoires, such as UAX#31, IDNA2008, 
>> domain-specific repertoires, and so on. For the purpose here, it 
>> would be useful to mention that there is a class of use cases that 
>> does need stricter repertoire limits than discussed here and that 
>> what is discussed here isn't recommended for those purposes.
>
> So, you’re suggesting a note saying that protocols can further 
> restrict repertoires on a per-field basis.  This seems so obvious that 
> I doubt it adds value, but if multiple people want it I guess it can 
> do no harm.
Yes and no. The fact that JSON decided to use code points as the 
repertoire for names means that the *need* to make a restriction is 
_not_ as obvious as you might think.

I understand one of the key points of your draft to be that protocol 
designers should think through what the appropriate repertoire should be.

There are other constraints that people put on text fields (like 
formatted dates, etc.) but they are not usually based on the notion of a 
repertoire. Identifiers are an interesting case in that modern ones 
support fairly large repertoires, but making the repertoire too large 
doesn't make the result better. There is a tendency for people to think 
that for Unicode "the more the merrier" is always the right answer. You 
are going to the trouble of writing a draft that explains why that isn't 
so, and I'm suggesting you take a well-known class of use cases to point 
out that there are occasions where even the "Unicode Assignables" are 
too unrestricted.

To me, that level of discussion is different from saying "you can always 
be more selective", which contains just one bit of information but zero 
guidance or context.
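
To make the difference between a broad text repertoire and an identifier 
repertoire concrete, here is a minimal Python sketch. It rests on 
assumptions: the broad check only approximates the draft's "Unicode 
Assignables" as "no surrogates, no noncharacters, no C0/C1 controls other 
than tab, LF and CR", and the identifier check borrows Python's own 
identifier syntax (built on the XID_Start/XID_Continue properties that 
UAX#31 uses) as a stand-in for a protocol-specific repertoire. The 
function names are hypothetical and do not come from the draft.

```python
def is_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF plus the last two code points of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_legacy_control(cp: int) -> bool:
    # C0 controls (except tab, LF, CR), DEL, and C1 controls.
    if cp in (0x09, 0x0A, 0x0D):
        return False
    return cp < 0x20 or 0x7F <= cp <= 0x9F


def in_text_repertoire(s: str) -> bool:
    """Broad repertoire for general text fields (an approximation)."""
    return not any(
        is_surrogate(cp) or is_noncharacter(cp) or is_legacy_control(cp)
        for cp in map(ord, s)
    )


def in_identifier_repertoire(s: str) -> bool:
    """Much stricter repertoire, here Python's identifier syntax."""
    return s.isidentifier()


for sample in ["naïve_name", "price: 10 €", "a\u200bb"]:
    print(repr(sample), in_text_repertoire(sample),
          in_identifier_repertoire(sample))
```

All three samples pass the broad check, but only the first passes the 
identifier check; the third contains an invisible ZERO WIDTH SPACE, the 
kind of code point that undermines reliable recognition of identifiers.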
>>
>> Then there's the string data type. We don't need to define it, 
>> because Unicode defines it already, but it would be useful to put it 
>> into perspective. As defined, the string data type includes code 
>> points that can only ever be used in internal processing, and it 
>> includes ill-formed encoding forms, because they will normally occur 
>> in transient states during processing and may need to be temporarily 
>> held when receiving input (before and during verification). Those are 
>> the reasons that generic string data types are not restricted, unlike 
>> text content fields conforming to a protocol. (There may be protocols 
>> that should be defined on the basis of strings and not text content 
>> and for those, the recommendations here would not apply. And that 
>> should be explained).
>>
> I can’t see any reason to discuss string processing in programming 
> languages. The IETF is concerned with the bytes that flow across the 
> network. Our job is to document what to transmit and what to receive.

Again, my suggestion was for a discussion limited to explaining what is 
out of scope and why.
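
As an illustration of why a generic string data type is wider than text 
content, here is a small Python sketch. Python's built-in str is used 
only as one example of such a type; the helper name is hypothetical. It 
shows that a string value can hold a lone surrogate internally, while the 
strict UTF-8 codec refuses to put it on the wire or accept it from the 
wire.

```python
def is_well_formed_text(s: str) -> bool:
    """True if s can be serialized as well-formed UTF-8."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# A lone surrogate is a legal value of the string type in memory...
transient = "abc\ud800"
print(len(transient), is_well_formed_text(transient))

# ...but its byte form is rejected by the strict UTF-8 decoder.
try:
    b"\xed\xa0\x80".decode("utf-8")
    accepted = True
except UnicodeDecodeError:
    accepted = False
print(accepted)
```

The in-memory value exists and can be inspected, which is what 
verification of input needs; only the boundary to the network enforces 
well-formedness.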

A./