Re: [I18ndir] draft-bray-unichars-01

Tim Bray <tbray@textuality.com> Wed, 30 August 2023 16:48 UTC

References: <CAHBU6isuZ1fgAjv14JRCiWaq-cmE69iEGajQkDDNA4CzfTKoxQ@mail.gmail.com> <122f70b8-62f8-cd24-a0e1-c3e0052b37e8@ix.netcom.com>
In-Reply-To: <122f70b8-62f8-cd24-a0e1-c3e0052b37e8@ix.netcom.com>
From: Tim Bray <tbray@textuality.com>
Date: Wed, 30 Aug 2023 09:48:21 -0700
Message-ID: <CAHBU6ivmCCOghYSP5zT1d6q3KbGtrtpC=pa4JZOMruz8iU=Bsg@mail.gmail.com>
To: Asmus Freytag <asmusf@ix.netcom.com>
Cc: "i18ndir@ietf.org" <i18ndir@ietf.org>
Content-Type: multipart/alternative; boundary="00000000000000c02e060426b230"
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/jaTVMiHFS6Lixbc9tpLHPNJVXvo>
Subject: Re: [I18ndir] draft-bray-unichars-01

Working on a PR to include this input.

On Aug 29, 2023 at 12:10:42 PM, Asmus Freytag <asmusf@ix.netcom.com> wrote:

>
> (3)
>
> "reserved for internal use" should be "reserved for internal use by
> application" (as opposed to internal use by the standard).
>

D14 just says “reserved for internal use” and I’m not really comfortable
adding more interpretation.  I’m aware of more interpretation out there on
the net but I think this is OK as it stands?

> (4)
> This subset has the advantage of excluding surrogates, which can never add
> any value and have the potential to cause problems.
>
> should be reworded a bit to:
>
> "This subset has the advantage of excluding surrogates, which are not
> assigned to any characters, and thus can never add any value.
> They have the potential to cause problems, for example it is not possible
> to represent them individually in UTF-8."
>

The Unicode Standard says that surrogates can only be used in UTF-16, and
RFC 3629 explicitly says they are prohibited. Having said that, it’s easy to
generate UTF-8 that includes them (start by truncating a Java “char” array
incautiously, or just have a bad day while programming in C), and most
(all?) language implementations, per Postel’s law, will happily give you that
surrogate code point. That’s the problem: although it’s forbidden, it can be
done and it does happen. So I think that instead of “not possible” we
should say something like “… potential to cause problems; while it is
possible to include them in UTF-8 strings, this is forbidden by the
specification of UTF-8”.
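
To make the "forbidden but it happens" point concrete, here is a minimal
sketch in Python. It is only an illustration, not text proposed for the
draft. Python's UTF-8 codec happens to be strict by default, so the
"surrogatepass" error handler is used here to stand in for a lenient
implementation; the helper names are hypothetical.

```python
# A lone high surrogate, as might be left behind when a UTF-16 buffer
# holding U+1F600 (D83D DE00) is truncated between its two code units.
LONE_SURROGATE = "\ud83d"


def lenient_encode(text: str) -> bytes:
    """Encode to UTF-8 the way a permissive encoder would (assumption:
    'surrogatepass' models that permissiveness)."""
    return text.encode("utf-8", errors="surrogatepass")


def lenient_decode(data: bytes) -> str:
    """Decode UTF-8, handing back surrogate code points instead of failing."""
    return data.decode("utf-8", errors="surrogatepass")


def is_well_formed_utf8(data: bytes) -> bool:
    """True if a strict decoder, which rejects surrogates, accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The lenient encoder emits the three-byte form of U+D83D.
leaked = lenient_encode("a" + LONE_SURROGATE)

# A strict decoder refuses these bytes; a lenient one returns the surrogate.
strict_ok = is_well_formed_utf8(leaked)
round_tripped = lenient_decode(leaked)
```

The bytes in `leaked` look like UTF-8 at a glance, which is how they end up
travelling between systems that disagree about whether to accept them.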

> Rationale for this suggestion is to be slightly more specific, so the
> reader comes away with the conclusion that the "can never add any value" is
> based on well-founded reasons and not editorial opinion by the writers.
>
> (5)
> The ABNF in section 4.1: the comments are confusing.
>

[more later]

>
>