Re: [I18ndir] [art] New Version Notification for draft-bray-unichars-06.txt

Tim Bray <tbray@textuality.com> Fri, 06 October 2023 23:10 UTC

From: Tim Bray <tbray@textuality.com>
Date: Fri, 06 Oct 2023 19:10:18 -0400
Message-ID: <CAHBU6iuEbKOri56HiTB+HcsPKOpXJArFpbkVnf68=5i8FMWPUg@mail.gmail.com>
To: "Manger, James" <James.H.Manger@team.telstra.com>
Cc: "i18ndir@ietf.org" <i18ndir@ietf.org>, ART Area <art@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/p4dhLLAzKzRfJf_HKl4sQFxGCD8>
Subject: Re: [I18ndir] [art] New Version Notification for draft-bray-unichars-06.txt

 [inline]

On Oct 2, 2023 at 5:20:59 PM, "Manger, James" <
James.H.Manger@team.telstra.com> wrote:

> draft-bray-unichars
> <https://datatracker.ietf.org/doc/html/draft-bray-unichars> §3 “Dealing
> with problematic code points” suggests “replacing problematic code points
> with "�" (U+FFFD, REPLACEMENT CHARACTER)” (or signalling an error, but I’ll
> only talk about the replacement option in this email).
>
>    1. An ill-formed sequence of code units needs to be replaced. It is
>    far less obvious to me that “problematic” scalars should be replaced. Even
>    for noncharacters Unicode provides a good FAQ
>    <https://www.unicode.org/faq/private_use.html#nonchar9> and corrigendum
>    #9 “Clarification about noncharacters”
>    <https://www.unicode.org/versions/corrigendum9.html> that suggests
>    passing them along (treating them like unassigned scalars) is often the
>    best policy (because the internal/interchange boundary is blurry).
>
OK, that’s worth a reference.
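
To make the distinction in point 1 concrete, here is a minimal Python sketch, not anything taken from the draft. It replaces ill-formed code unit sequences at decode time but only flags noncharacters, leaving the pass-or-reject decision to the caller. The helper names is_noncharacter and decode_utf8 are invented for illustration.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters (hypothetical helper)."""
    # U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def decode_utf8(data: bytes) -> str:
    # Ill-formed code unit sequences are replaced with U+FFFD at decode time.
    return data.decode("utf-8", errors="replace")


# A stray 0xFF byte, followed by a well-formed encoding of the noncharacter U+FFFE.
text = decode_utf8(b"ok \xff \xef\xbf\xbe")

# The noncharacter survives decoding untouched; it is only reported here.
flagged = [hex(ord(ch)) for ch in text if is_noncharacter(ord(ch))]
```

The decoder has no choice about the 0xFF byte, while U+FFFE is a valid scalar value that arrives intact, which is why what to do with it is a separate policy question.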


>    2. So §4.3 defining unicode-assignable to exclude noncharacters is
>    fine; when to be lenient on receiving a supposed unicode-assignable value
>    is less obvious.
>    But §3 looks dodgy.
>
Would a note that it might be reasonable to accept nonchars, referencing
that corrigendum, de-dodgify it in your view?

>
>    3. U+FFFD is an obvious choice to replace code units or scalars you
>    don’t want. But Unicode does allow choices. Unicode ch3
>    <https://www.unicode.org/versions/Unicode15.1.0/ch03.pdf> C10 only
>    says “with a marker such as U+FFFD”. Unicode TR36
>    <https://unicode.org/reports/tr36/#Substituting_for_Ill_Formed_Subsequences>
>    says “where U+FFFD is not available, a common alternative is "?"”. Java,
>    for instance, uses “?” in some common circumstances. Unichars does not
>    admit such an option.
>
Also worth a reference. If you’re writing Java code you should probably do
what Java does, no?


>    4. “Silently ignoring” is the wrong phrase. The security risk is
>    “deleting” ill-formed sequences or unwanted scalars. “Silently ignoring”
>    feels the same as “deleting” when decoding code units to scalars, but feels
>    different when processing input chars to output chars, as it also covers
>    passing along an unwanted scalar untouched.
>
> Sounds about right.
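
A quick sketch of why deletion is the risky option, again in Python and purely illustrative. Python happens to name its deleting error handler "ignore", which shows how easily the two words get conflated. The input string is a made-up example in the spirit of the TR36 discussion.

```python
raw = b"<scr\xffipt>"  # 0xFF can never appear in well-formed UTF-8

# "ignore" deletes the bad byte, so text that a filter saw as harmless
# collapses into a different, meaningful string.
deleted = raw.decode("utf-8", errors="ignore")

# "replace" leaves a visible U+FFFD marker where the bad byte was.
replaced = raw.decode("utf-8", errors="replace")
```

Here deleted is "<script>" while replaced still has the marker splitting the tag name, so a check made before decoding stays valid after it.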


> --
>
> James Manger
>
> *From: *Tim Bray <tbray@textuality.com>
> *Date: *Tuesday, 3 October 2023 at 9:40 am
> *To: *Carsten Bormann <cabo@tzi.org>
> *Cc: *Manger, James <James.H.Manger@team.telstra.com>, i18ndir@ietf.org <
> i18ndir@ietf.org>, ART Area <art@ietf.org>, Rob Sayre <sayrer@gmail.com>
> *Subject: *Re: [art] New Version Notification for
> draft-bray-unichars-06.txt
>
> On Oct 2, 2023 at 9:14:18 AM, Carsten Bormann <cabo@tzi.org> wrote:
>
> > > The IETF could pound its collective fist and say “all ill-formed Unicode
> > > must be rejected”,
> >
> > Yes, please.
> > The fact that this is the only reasonable way forward is the point of RFC
> > 9413.
>
> Now we agree! And further (especially given the threats described in
> Unicode TR36) you often also want to reject control codes and
> noncharacters. I think the IETF should be shouting this!
>
> To promote this, it would be helpful if people actually understood what
> the problems are, and which code points to reject, and had a reference that
> explained the issues and provided ABNF for what to accept, at increasing
> levels of fussiness. Then when the IETF starts shouting, there will be a
> short clean reference to accompany the shouting.