Re: [I18ndir] draft-bray-unichars-01
Tim Bray <tbray@textuality.com> Wed, 30 August 2023 02:50 UTC
From: Tim Bray <tbray@textuality.com>
Date: Tue, 29 Aug 2023 21:50:10 -0500
Message-ID: <CAHBU6isnxUvx7wFz=Z8QjgFWncekPUpfQ8gK=pjF5C+vh=EkPQ@mail.gmail.com>
To: Asmus Freytag <asmusf@ix.netcom.com>
Cc: "i18ndir@ietf.org" <i18ndir@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/5mZtov5ZK_8k24ZaWrsytqjta0s>
Subject: Re: [I18ndir] draft-bray-unichars-01

On Aug 29, 2023 at 5:22:03 PM, Asmus Freytag <asmusf@ix.netcom.com> wrote:

> On 8/29/2023 5:05 PM, Tim Bray wrote:
>
> Asmus, these are great! Thanks. If this document goes forward I think it
> will include almost all of what you suggest.
>
> I would be thrilled. I think it's useful to move this forward (with
> changes).
>
> My favorite input is (7) - Neither co-author is in love with “Basic”, a
> better suggestion would be greeted with joy.
>
> "Minimally restrictive" describes what you are aiming for. You are
> restricting everything that has no business being interchanged and no more.

Standing alone, that makes sense. But compared to the other options in the draft, such as those used by JSON & XML, we are *more* restrictive. So there also could be a case for “maximally restrictive”. Maybe just “Restricted Code Points”? Hmm, not obvious.

> Something like this then sets up the idea that there are other scenarios
> where the likely direction is "more restrictive", except perhaps some
> protocols that do assign meaning to certain controls and therefore rely on
> them -- we all think that these should be the exception; in other words, if
> you picked something that indicates an endpoint (like "minimally") the nice
> feature would be that most other specifications would tend to subset.
>
> For example, by restricting private use.
>
> A./
>
> PS: you might think about
>
> (8)
>
> Suggest a standard methodology for specifying subsets other than the "XML"
> or "minimally restrictive" ones / or at least discuss some concepts around
> this area, or point to specs in and out of IETF that create limited
> subsets for various purposes.
>
> There's always a tension between defining data formats that don't impose
> practical restrictions on the text content and situations where general text
> is not what you need (but pure ASCII strings don't cut it either).
>
> On Aug 29, 2023 at 12:10:42 PM, Asmus Freytag <asmusf@ix.netcom.com>
> wrote:
>
>> Comments on the draft (in no particular order or priority).
>>
>> (1)
>> "Since the C0 controls include zero and the 32 smallest integers, they
>> are likely to occur in data as a result of programming errors."
>>
>> In the way the Unicode standard refers to characters "the control codes"
>> are the characters, so "the C0 control codes include" would be read that
>> way. What you are referring to here are the integer values, corresponding
>> to a code point. Suggest rewording because "zero" as a digit is not a
>> control code and while it's possible to figure out what must be meant, it's
>> a needless ambiguity.
>>
>> How about "Since the code points for C0 controls include the 32 smallest
>> integers including zero..."
>>
>> (2)
>> I recommend thinking not of "problematic code points" but of "problematic
>> code point types" as per this definition from the Unicode Glossary.
>>
>> *Code Point Type <https://www.unicode.org/glossary/#code_point_type>*.
>> Any of the seven fundamental classes of code points in the standard:
>> Graphic, Format, Control, Private-Use, Surrogate, Noncharacter, Reserved.
>> (See definition D10a in Section 3.4, Characters and Encoding
>> <https://www.unicode.org/versions/latest/ch03.pdf#G2212>.)
>>
>> This would let you also address the private-use problematic in
>> interchange (even if you conclude they might be useful enough not to
>> restrict them by default, they suffer from the same lack of consensus
>> interpretation as the controls).
>>
>> (3)
>>
>> "reserved for internal use" should be "reserved for internal use by
>> application" (as opposed to internal use by the standard).
>>
>> (4)
>> This subset has the advantage of excluding surrogates, which can never
>> add any value and have the potential to cause problems.
>>
>> should be reworded a bit to:
>>
>> "This subset has the advantage of excluding surrogates, which are not
>> assigned to any characters, and thus can never add any value.
>> They have the potential to cause problems, for example it is not possible
>> to represent them individually in UTF-8."
>>
>> Rationale for this suggestion is to be slightly more specific, so the
>> reader comes away with the conclusion that the "can never add any value" is
>> based on well-founded reasons and not editorial opinion by the writers.
>>
>> (5)
>> The ABNF in section 4.1: the comments are confusing.
>>
>> (6)
>> I would like to refer you to table 2-3 in the Unicode Standard. The
>> "Basic Unicode Characters" that you propagate consist of what Unicode
>> considers "Assigned" code points plus "Reserved", with the modification
>> that you are subtracting the "useless controls".
>>
>> Reserved code points are sometimes referred to as "assignable code
>> points" (for example on the bottom of page 30 in Unicode 15.0.0). That
>> makes that subset the combination of "assigned" plus "assignable" code
>> points. (Which then, modulo "useless" controls, corresponds to the bulk of
>> the basic set.)
>>
>> I would suggest to explicitly relate your definition to those terms.
>>
>> (6)
>> The definition "useful controls" is currently buried in the text and
>> there's not even a header to locate the definition. Because this (or the
>> complementary definition of "useless" ones) is the only value added piece
>> over "assigned + assignable", I suggest elevating the definition.
>>
>> I'm not going to quibble about the very opinionated naming of the
>> concept.
>>
>> (7)
>> The term "Basic" is an interesting choice. Because the set is anything
>> but "basic" -- it includes all code points that can be maximally assigned,
>> except for 61 C0 / C1 Controls. The only part of the set that is "basic" is
>> in fact the subset of control codes.
>>
>> What you have defined is the "maximally useful set of Unicode code points
>> for data interchange, absent a protocol defining specific control code
>> semantics".
>>
>> Hope you find some of these comments useful,
>> A./
>>
>> On 8/29/2023 9:37 AM, Tim Bray wrote:
>>
>> Hello I18ndir (anyone still here?), Paul Hoffman and I just submitted
>> draft-bray-unichars-01 - our AD Francesca Palombini suggested we notify
>> this list: https://datatracker.ietf.org/doc/draft-bray-unichars/
>>
>> This draft fell out of a conversation originally provoked by this errata
>> report: https://www.rfc-editor.org/errata/eid7603
>>
>> It revealed a distressing lack of consensus about Unicode characters and
>> code points and character repertoires. I feel personally bad because I am
>> the editor of a couple of RFCs that are open to criticism on this front.
>>
>> So, this tries to say “here’s how an RFC should specify which Unicode
>> characters it supports”. We think this would be useful to multiple groups
>> almost immediately, including especially those who don’t realize the area
>> can be problematic.
>>
>> Anyhow, the purpose of this note is to ask your advice on how to take
>> this forward. We think this works well as an individual submission. We
>> don’t *think* it needs a working group, but that’s not our call. The draft
>> doesn’t express any opinions about best practices, it just points out
>> several alternative character repertoires, provides ABNF, and discusses
>> their trade-offs. None of them are wrong.
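
Illustrative note on comments (1), (2) and (4) in the quoted review: the sketch below is not taken from the draft or from either correspondent. It assumes the mapping from General_Category values to the seven code point types as laid out in Table 2-3 of the Unicode Standard, and the function names (`code_point_type`, `is_noncharacter`, `utf8_encodable`) are hypothetical. Results for "Reserved" depend on the Unicode version bundled with the Python interpreter, since reserved code points may be assigned later.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters: U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def code_point_type(cp: int) -> str:
    """Classify a code point into one of the seven code point types (assumed mapping, see lead-in)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if is_noncharacter(cp):
        # Noncharacters share General_Category Cn with reserved code points,
        # so they have to be checked by range first.
        return "Noncharacter"
    category = unicodedata.category(chr(cp))
    if category == "Cc":
        return "Control"
    if category in ("Cf", "Zl", "Zp"):
        return "Format"
    if category == "Co":
        return "Private-Use"
    if category == "Cs":
        return "Surrogate"
    if category == "Cn":
        return "Reserved"
    return "Graphic"


def utf8_encodable(cp: int) -> bool:
    """True if the single code point can be encoded as UTF-8 on its own."""
    try:
        chr(cp).encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Comment (1): the code points of the C0 controls are the 32 smallest integers, zero included.
    print(all(code_point_type(cp) == "Control" for cp in range(0x20)))
    # Comment (4): a lone surrogate has no UTF-8 representation.
    print(code_point_type(0xD800), utf8_encodable(0xD800))
```

A repertoire such as the ones discussed in the thread could then be described as a set of permitted code point types plus an explicit list of allowed controls, rather than as a list of individual problematic code points.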