Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-05.txt

Tim Bray <tbray@textuality.com> Wed, 20 September 2023 19:25 UTC

From: Tim Bray <tbray@textuality.com>
Date: Wed, 20 Sep 2023 21:25:41 +0200
Message-ID: <CAHBU6iu-taEfDuO=BYHwPFtkKjqe=hX2xH4jAZQzySvXZ-Psbg@mail.gmail.com>
To: Rob Sayre <sayrer@gmail.com>
Cc: i18ndir@ietf.org, ART Area <art@ietf.org>, asmus@unicode.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/VjIaey-U0tVTlrdjEjpEG8ugTbA>

On Sep 19, 2023 at 4:04:02 PM, Rob Sayre <sayrer@gmail.com> wrote:

>
> There are still some bugs here. Generally, I think "Abstract Character
> Repertoire" as used here is good:
> https://unicode.org/reports/tr17/#Repertoire
>

> Making it clear that the various encoding and escaping routines happen
> before or after this idea. I don't think you need to add "Abstract" as a
> qualifier. Just explain it.
>

Yeah, per comments below, we need to tighten up the definition of what we
mean when we say “character repertoire”; will do so.


> > The Unicode Standard's definition of "Unicode character" is conceptual.
> > However, each Unicode character is assigned a code point, used to represent
> > the characters in computer memory and storage systems and, in specifications,
> > to specify the allowed repertoires of Unicode characters.
>
> I think you want to add: "Not all code points represent characters."
>

I think this is rather fully covered in the immediately following paragraph.


> > In ABNF, the hexadecimal values for characters are preceded by "%x"
> > rather than "U+".
>
> But these are code points in the ABNF, right? For example:
>

Right, but we haven’t defined “code point” yet, so I just took out “for
characters”. In ABNF, hex is signaled by "%x".
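
To make the notational point concrete, here is a minimal sketch of the two
spellings of the same hexadecimal value. The function names are hypothetical
and used only for illustration; nothing here is taken from the draft.

```python
MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode code space


def to_u_plus(value: int) -> str:
    """Spell a value in the Unicode "U+" style, padded to at least 4 hex digits."""
    if not 0 <= value <= MAX_CODE_POINT:
        raise ValueError("value is outside the Unicode code space")
    return f"U+{value:04X}"


def to_abnf(value: int) -> str:
    """Spell the same value as an ABNF hexadecimal terminal ("%x" prefix)."""
    if not 0 <= value <= MAX_CODE_POINT:
        raise ValueError("value is outside the Unicode code space")
    return f"%x{value:X}"


def to_abnf_range(first: int, last: int) -> str:
    """Spell an inclusive range in ABNF value-range form, e.g. %xD800-DFFF."""
    if first > last:
        raise ValueError("range bounds are reversed")
    return f"{to_abnf(first)}-{last:X}"


if __name__ == "__main__":
    print(to_u_plus(0x41), to_abnf(0x41))
    print(to_abnf_range(0xD800, 0xDFFF))
```

The only difference between the two forms is the prefix and the padding; the
number being written is the same in both.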

> https://www.ietf.org/archive/id/draft-bray-unichars-05.html#section-4.1
>
> "; exclude surrogates"
>
> These are in the problematic code point types. They are not characters.
> So, it's probably best to go through and clean that up.
>

Hmm. I think it will suffice to clean up the definition of “character
repertoire”. None of ours include surrogates, hence this production, but
some of ours include noncharacters, which I think is OK. I really want to
retain the term "character repertoire" because that maps to the way people
think about the problem: “What Unicode characters should I allow?” That’s a
reasonable question to ask; to answer it, you have to learn about code
points and scalars and surrogates and so on. Thus this document.
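
For readers following along, the distinctions above can be sketched in a few
lines. This is an illustration only, assuming the standard Unicode ranges for
surrogates (U+D800 to U+DFFF) and noncharacters (U+FDD0 to U+FDEF, plus the
last two code points of every plane). The in_repertoire function and its
allow_noncharacters flag are hypothetical and do not reproduce any specific
subset defined in the draft.

```python
MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode code space


def is_code_point(cp: int) -> bool:
    """True for any integer in the Unicode code space."""
    return 0 <= cp <= MAX_CODE_POINT


def is_surrogate(cp: int) -> bool:
    """True for the code points reserved for UTF-16 surrogate pairs."""
    return 0xD800 <= cp <= 0xDFFF


def is_scalar_value(cp: int) -> bool:
    """A scalar value is any code point that is not a surrogate."""
    return is_code_point(cp) and not is_surrogate(cp)


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF in every plane."""
    if not is_code_point(cp):
        return False
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def in_repertoire(cp: int, allow_noncharacters: bool) -> bool:
    """Hypothetical repertoire check: never surrogates, optionally noncharacters."""
    if not is_scalar_value(cp):
        return False
    return allow_noncharacters or not is_noncharacter(cp)


if __name__ == "__main__":
    for cp in (0x41, 0xD800, 0xFFFE, 0x1F600):
        print(f"U+{cp:04X}", in_repertoire(cp, allow_noncharacters=False))
```

A repertoire that excludes surrogates but admits noncharacters corresponds to
calling in_repertoire with allow_noncharacters=True.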

> I think the "Restricting Character Repertoires" section should be run
> through a grammar checker (MS Word or something). It doesn't say anything
> incorrect, but I often thought "hmm, there should be a comma there" and
> little things like that. Thank you for taking the "conforming JSON text"
> suggestion, but the capitalization differs between the two uses: "JSON
> text" vs "JSON Text".
>

If this progresses, there will be lots of opportunities for language
revision.

