Re: [I18ndir] Fwd: New Version Notification for draft-bray-unichars-04.txt

Tim Bray <tbray@textuality.com> Sun, 17 September 2023 23:06 UTC

From: Tim Bray <tbray@textuality.com>
Date: Sun, 17 Sep 2023 16:06:21 -0700
Message-ID: <CAHBU6ivN5mWfH1f8SaBfQgWbwtT8MjzABVVPnOx+8hgj8RZYVQ@mail.gmail.com>
To: Asmus Freytag <asmusf@ix.netcom.com>
Cc: ART Area <art@ietf.org>, i18ndir@ietf.org
Content-Type: multipart/alternative; boundary="000000000000075292060596139e"
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/x-8A_J85nus1ttnPXDfGMJr4gfQ>
Subject: Re: [I18ndir] Fwd: New Version Notification for draft-bray-unichars-04.txt

On Sep 15, 2023 at 2:01:48 PM, Asmus Freytag <asmusf@ix.netcom.com> wrote:

> This time I looked only at the diffs and sometimes a bit of adjacent text.
>
> This first one is not major, but a small fix would avoid a contradiction
> in terms.
>
> The numbers assigned to Unicode characters are called “code points”;
>
> This is backwards, as can be seen by "unassigned" code points. Easy to fix:
>

You are correct that there’s a problem here.  However, as we previously
established (on the art@ list only, so you probably didn’t see it), there’s
not really a “backward” and “forward”. The Unicode Standard is inconsistent
on “assignment”, saying both that integer code points are assigned to
characters, and that characters are assigned to code points. See
https://mailarchive.ietf.org/arch/msg/art/3s70K1uHFguqeL8DFkIjiNfP5ZY/

The current draft talks about code points being assigned to characters and
is consistent, end to end. That’s because we felt that the abstract
character was the important thing, and the code point is a useful piece of
attached metadata.  I’m strongly of the opinion that we should be
consistent even if the Unicode Standard isn’t.  So we can either address
this problem by reversing the direction of assignment end-to-end and
adopting your suggestion, or by modifying the language to make it clearer
what code points are. In fact, the definition from the Unicode Standard
(which we quote later) says explicitly that they are the numbers in the
code space 0-0x10FFFF.  I’ll figure out which gives a better result.
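
To make the distinction concrete, here is a minimal Python sketch of "code
point as a number in the code space" versus "code point that has a character
assigned". The function names are hypothetical and not taken from the draft.
The sketch assumes that General_Category "Cn" is a reasonable stand-in for
"no character assigned"; surrogates and private-use code points have their
own categories and are not treated specially here. The answer for any given
code point depends on the Unicode version shipped with the Python runtime.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode code space


def is_code_point(n: int) -> bool:
    """True if n is an integer in the code space 0-0x10FFFF."""
    return 0 <= n <= MAX_CODE_POINT


def has_assigned_character(cp: int) -> bool:
    """True if the runtime's Unicode database does not report 'Cn'.

    Assumption: 'Cn' (unassigned, which also covers noncharacters) is
    used as a proxy for "no abstract character assigned".
    """
    if not is_code_point(cp):
        raise ValueError(f"{cp:#x} is outside the Unicode code space")
    return unicodedata.category(chr(cp)) != "Cn"


# U+0041 is a code point with a character assigned (LATIN CAPITAL LETTER A).
print(is_code_point(0x41), has_assigned_character(0x41))
# U+50000 is a valid code point, but nothing is assigned there at the
# time of writing.
print(is_code_point(0x50000), has_assigned_character(0x50000))
# 0x110000 is not a code point at all.
print(is_code_point(0x110000))
```

Every integer in the range is a code point whether or not a character lives
there, which is why "unassigned code point" is not a contradiction under
either direction of assignment.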

> New text on dealing with problematic code points:
>
…

> In applying the recommendations of RFC9413 for text fields containing
> ill-formed UTF-8, for example, the recommendations must be applied to the
> field as a whole, not on the character or byte level. In fact, silently
> ignoring an ill-formed part of a string is a known security risk.
> Responding to that risk, [UNICODE] section 3.2 ....
>
>
Looks good to me.
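
A short Python sketch of why the field has to be handled as a whole. The
helper name decode_field and the choice to return None on rejection are
assumptions for illustration only; what a protocol does with a rejected
field is its own decision. The sketch contrasts strict decoding, which
fails the entire field, with the "ignore" error handler, which silently
drops the ill-formed bytes.

```python
from typing import Optional


def decode_field(raw: bytes) -> Optional[str]:
    """Decode a text field as UTF-8, rejecting it entirely if ill-formed.

    Hypothetical helper: returns None instead of a partially repaired
    string, so the caller has to decide what to do with the whole field.
    """
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


well_formed = "café".encode("utf-8")
ill_formed = b"ad\xffmin"  # 0xFF can never appear in well-formed UTF-8

print(decode_field(well_formed))  # the decoded string
print(decode_field(ill_formed))   # None: the whole field is rejected

# The risky alternative: dropping the bad byte yields a different,
# plausible-looking string that may slip past an earlier check.
print(ill_formed.decode("utf-8", errors="ignore"))
```

With the "ignore" handler the ill-formed input decodes to "admin", a value
that a filter looking at the raw bytes would not have matched.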

> The last paragraph is overselling RFC9413, because the phrasing
> conceivably implies that it contains guidance specific to code points, when
> it is more generically concerned with problematic input. It also doesn't
> flow particularly well.
>
> You could move it to the head of Section 5, with a tweak.
>
> Problematic code points are an example of problematic input. [RFC9413],
> "Maintaining Robust Protocols", provides a thorough discussion of
> error-handling options when choosing a strategy for dealing with
> problematic input. Different types of problematic code points cause
> different issues.
>
> Noncharacters....
>
> (I'm also suggesting adding a sentence to make the transition)
>
Will do something like what you suggest.
