Re: [I18ndir] [art] Just uploaded draft-bray-unichars-03
Asmus Freytag <asmusf@ix.netcom.com> Fri, 08 September 2023 21:01 UTC
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/8mhypkdutRAw-tgU-g7EoKcffKA>
List-Id: Internationalization Directorate <i18ndir.ietf.org>
On 9/8/2023 1:16 PM, Rob Sayre wrote:
> On Fri, Sep 8, 2023 at 12:38 PM Tim Bray <tbray@textuality.com> wrote:
>
> > See https://www.ietf.org/archive/id/draft-bray-unichars-03.html
> >
> > A bunch of minor corrections and improvements, thanks to everyone
> > for that, especially James Manger for noticing that the ABNF was
> > entirely wrong in one place.
> >
> > The word “useless” has been replaced by “legacy”.

I like.

On re-reading, I note that I keep stumbling over this statement:

"While the inclusion of unassigned code points in text data is undesirable, it is difficult to specify that it should be avoided, because unassigned code points regularly become assigned as new characters are added to Unicode."

Small, closed character sets are a luxury of the past. They are unrealistic, because they assume a static universe. A comprehensive character set like the Unicode Standard must be able to grow. Not only because it intends to be universal, with many writing systems not yet cataloged, but also because existing writing systems change. Western Europe alone, despite its well-established use of the Latin script, has seen two new characters, the € and the capital sharp s. When you have a closed set, your only way to react to such changes is to switch character sets.

A minimal fix to the text might be to change "is undesirable" to "may have its drawbacks".

Now, there are cases where it is necessary to restrict the use of Unicode characters to the subset that represents a known version. This is usually the case if data has to be normalized (and must stay normalized, something that cannot be guaranteed for unassigned code points). For an example, see IDNA2008.

Which leads to a suggestion for a fifth set (really a series), which would be "versioned-assigned". It would include all code points assigned as of a given version, minus the "legacy controls". That is a nice repertoire to use whenever you need to normalize your data and guarantee that its normalization does not change.

(As it turns out, you can migrate your repertoire, over time, to later versions, as long as data that is normalized is free of unassigned code points relative to the version of the normalization tables.)

> > I think the feedback was pretty clear that the draft needed to be
> > more opinionated; just because we document the existence of the
> > default JSON repertoire (“all the code points”) doesn’t mean that
> > anyone should use it in the present or future. So, introduced a
> > new section “Refining Character Repertoires” to highlight those
> > issues and offer a suggestion.
>
> This one is tougher and correct. Fully in favor.
>
> I would change
>
> "These numbers are used to represent the characters in computer memory
> and storage systems and, in specifications, to specify the allowed
> repertoires of Unicode characters."
>
> to
>
> "These numbers are used to represent the allowed repertoires of
> Unicode characters."

This change makes it sound like that is their only use. For that reason, I like the original formulation better.

I can see where it could be a little confusing in the other direction, because the original formulation might imply that internal representation is directly in terms of those numbers, whereas it usually is in terms of a transformation format, whether UTF-8 or UTF-32.

Now, we don't want to delve deeply into a discussion of transformation formats, code units and the like.

"These numbers underlie the representation of characters in computer memory, storage systems and data transmission. In specifications they are used..."

To me, that better reflects the central nature of this concept, but removes the implication that they are used "as is" in data representation.

> Other commenters have said the "useful" term is not that great. I
> agree, but I can't think of anything better. In particular, I thought
> "no, people really do use NULL". Maybe "text-characters"? IDK, up to
> the editors.

There are two types of data formats: those that can contain NULL and those that cannot. NULL, where used, always has a special meaning. The days when it could be inserted into a datastream with "null effect" are long over, even though that was among the original use cases.

Most people don't use VT or FF (or NEL), but there are legacy data formats that do.

Perhaps the best thing would be to acknowledge that there may be a need to be more permissive, if and when lossless conversion from legacy data is needed.

A useful discussion would contain a recommendation on what spec authors should do. The section on "Refining Character Repertoires" currently represents one half of the coin -- it is limited to a discussion of how to be more restrictive.

It could benefit from discussing the other half: what about citing something like "useful assignables" *plus* selected controls, instead of either allowing all controls, or using one of the wider repertoires that drags in all the noncharacters?

I think that would be a useful extension of the work.

> thanks,
> Rob
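
The two repertoire ideas raised in this message ("versioned-assigned", and a restrictive repertoire *plus* selected controls) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not text from the draft: all function names are hypothetical, the choice of tab, LF and CR as the baseline controls is an assumption about what the restrictive repertoire keeps, and "assigned" is judged against whatever Unicode version the running interpreter's `unicodedata` tables carry (`unicodedata.unidata_version`), not against an arbitrary chosen version.

```python
import unicodedata

# Assumption: controls that the restrictive repertoire keeps (tab, LF, CR).
BASELINE_CONTROLS = frozenset({0x09, 0x0A, 0x0D})


def is_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF plus the last two code points of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_control(cp: int) -> bool:
    # C0 controls, DEL, and C1 controls.
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


def in_repertoire(cp: int, extra_controls: frozenset = frozenset()) -> bool:
    """Restrictive repertoire, optionally widened by selected controls.

    Surrogates and noncharacters stay excluded no matter what is
    passed in extra_controls.
    """
    if is_surrogate(cp) or is_noncharacter(cp):
        return False
    if is_control(cp):
        return cp in BASELINE_CONTROLS or cp in extra_controls
    return True


def in_versioned_assigned(cp: int) -> bool:
    """Assigned in this interpreter's Unicode version, minus the controls.

    General category "Cn" covers unassigned code points (and noncharacters).
    """
    return in_repertoire(cp) and unicodedata.category(chr(cp)) != "Cn"


def first_violation(text: str, predicate) -> "int | None":
    """Index of the first character rejected by predicate, or None."""
    for i, ch in enumerate(text):
        if not predicate(ord(ch)):
            return i
    return None
```

A format that must round-trip legacy data containing form feeds could, for instance, check with `lambda cp: in_repertoire(cp, frozenset({0x0C}))`, while a format that needs stable normalization would check with `in_versioned_assigned` and record `unicodedata.unidata_version` alongside the data.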