Re: [I18ndir] Fwd: New Version Notification for draft-bray-unichars-04.txt

Asmus Freytag <asmusf@ix.netcom.com> Mon, 18 September 2023 05:30 UTC

Date: Sun, 17 Sep 2023 22:30:24 -0700
To: Tim Bray <tbray@textuality.com>
Cc: ART Area <art@ietf.org>, i18ndir@ietf.org
From: Asmus Freytag <asmusf@ix.netcom.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/E_E4Klb7_1Zsox_ifkQ1ASSZ34Q>

On 9/17/2023 4:06 PM, Tim Bray wrote:
> On Sep 15, 2023 at 2:01:48 PM, Asmus Freytag <asmusf@ix.netcom.com> wrote:
>>
>> This time I looked only at the diffs and sometimes a bit of adjacent 
>> text.
>>
>> This first one is not major, but a small fix would avoid a 
>> contradiction in terms.
>>
>>> The numbers assigned to Unicode characters are called “code points”;
>> This is backwards, as can be seen by "unassigned" code points. Easy 
>> to fix:
>
> You are correct that there’s a problem here.  However, as we 
> previously established (on the art@ list only, so you probably didn’t 
> see it), there’s not really a “backward” and “forward”. The Unicode 
> Standard is inconsistent on “assignment”, saying both that integer 
> code points are assigned to characters, and that characters are 
> assigned to code points. See 
> https://mailarchive.ietf.org/arch/msg/art/3s70K1uHFguqeL8DFkIjiNfP5ZY/

OK - would have been helpful if I had seen that earlier.

And I'm not surprised to see some variability in usage in the Unicode 
Standard.

However, the fact that the underlying mapping is bidirectional makes it 
possible to speak of "assigned code points" while simultaneously 
considering the code point a piece of, as you call it, metadata on the 
character.

In this case "the numbers used to represent Unicode characters are ...." 
would also work, as you found out re: "representation".


>
> The current draft talks about code points being assigned to characters 
> and is consistent, end to end. That’s because we felt that the 
> abstract character was the important thing, and the code point is a 
> useful piece of attached metadata.  I’m strongly of the opinion that 
> we should be consistent even if the Unicode standard isn’t.  So we can 
> either address this problem by reversing the direction of assignment 
> end-to-end and adopting your suggestion, or by modifying the language 
> to make it clearer what code points are - and in fact the definition 
> from the Unicode standard (which we quote later) says explicitly it's 
> the numbers in the code space 0-0x10FFFF.  I’ll figure out which gives 
> a better result.

The nub of my comment was that you had assignment going one way, but then 
talk about "un/assigned" code points, which reads oddly.
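
A minimal sketch of the distinction, assuming Python's standard unicodedata
module and using General_Category "Cn" as a rough proxy for "unassigned"
(Cn also covers noncharacters, and the answer depends on the Unicode version
bundled with the interpreter). The function names are hypothetical. It shows
that every integer in the code space is a code point, whether or not a
character has been assigned to it:

```python
import unicodedata

# Upper end of the Unicode code space (0 to 0x10FFFF inclusive).
MAX_CODE_POINT = 0x10FFFF


def is_code_point(n: int) -> bool:
    """True if n lies in the Unicode code space, assigned or not."""
    return 0 <= n <= MAX_CODE_POINT


def is_assigned(n: int) -> bool:
    """Rough proxy: anything outside General_Category Cn counts as assigned.

    Assumption: Cn lumps together reserved code points and noncharacters,
    so this is an approximation, not the standard's formal definition.
    """
    return unicodedata.category(chr(n)) != "Cn"


if __name__ == "__main__":
    for cp in (0x41, 0x0378, 0x10FFFF):
        print(f"U+{cp:04X}", is_code_point(cp), is_assigned(cp))
```

Under this reading, U+0378 is a perfectly good code point that simply has no
character assigned to it, which is why "unassigned code point" only makes
sense if code points exist independently of the assignment.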

You may be able to retain the sense you have by talking a bit about the 
assignment and also introducing representation.

It might not hurt to give the definition of "encoded character".

I think you have all the information in hand to come to a good resolution.

A./


>> New text on dealing with problematic code points:
>>
> …
>>> In applying the recommendations of RFC9413 for text fields 
>>> containing ill-formed UTF-8, for example, the recommendations must 
>>> be applied to the field as a whole, not on the character or byte 
>>> level. In fact, silently ignoring an ill-formed part of a string is 
>>> a known security risk. Responding to that risk, [UNICODE] section 
>>> 3.2 ....
>
> Looks good to me.
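
A small sketch of the "field as a whole" point, assuming Python's built-in
UTF-8 codec; the helper name accept_field is hypothetical and the example
input is made up. Strict decoding rejects the entire field, whereas
errors="ignore" silently drops the ill-formed byte and can turn the input
into a different, meaningful string:

```python
from typing import Optional


def accept_field(raw: bytes) -> Optional[str]:
    """Decode a text field, rejecting it as a whole if it is ill-formed UTF-8.

    Returns the decoded string, or None when any part of the field is
    ill-formed. Nothing is repaired or skipped at the byte level.
    """
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


def lenient_decode(raw: bytes) -> str:
    """The risky alternative: silently drop ill-formed bytes."""
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    suspicious = b"ad\xffmin"          # 0xFF can never appear in UTF-8
    print(accept_field(suspicious))    # whole field rejected
    print(lenient_decode(suspicious))  # bad byte vanishes from the result
```

With the lenient variant a filter that looked for "admin" in the raw bytes
would not match, yet the decoded value is exactly that string, which is the
kind of risk the quoted paragraph refers to.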
>>
>> The last paragraph is overselling RFC9413, because the phrasing 
>> conceivably implies that it contains guidance specific to code 
>> points, when it is more generically concerned with problematic input. 
>> It also doesn't flow particularly well.
>>
>> You could move it to the head of Section 5, with a tweak.
>>
>>> Problematic code points are an example of problematic input. 
>>> [RFC9413], "Maintaining Robust Protocols", provides a thorough 
>>> discussion of error-handling options when choosing a strategy for 
>>> dealing with problematic input. Different types of problematic code 
>>> points cause different issues.
>>>
>>> Noncharacters....
>>
>> (I'm also suggesting adding a sentence to make the transition)
>>
> Will do something like what you suggest.
>>