Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-06.txt

Tim Bray <tbray@textuality.com> Tue, 26 September 2023 01:10 UTC

From: Tim Bray <tbray@textuality.com>
Date: Mon, 25 Sep 2023 21:10:44 -0400
Message-ID: <CAHBU6ivSkEv0AcT52BWrYadmutdYNFx0D0MYR3Sv62a2LXckJw@mail.gmail.com>
To: Rob Sayre <sayrer@gmail.com>
Cc: i18ndir@ietf.org, ART Area <art@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/ZOWjBimwP73Gq3JwZ8hvlFpFiPQ>
Subject: Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-06.txt

On Sep 25, 2023 at 11:06:38 AM, Rob Sayre <sayrer@gmail.com> wrote:

> > It cannot be serialized into well-formed UTF-8, but the behavior of libraries
> > asked to parse the sample is unpredictable; some will silently parse this
> > and generate an ill-formed UTF-8 string.
>
> It might be better to say "Its code points do not represent well-formed
> UTF-8..." (or well-formed Unicode?), because the example does show it
> serialized as well-formed UTF-8 via escape sequences. Not attached to my
> suggestion, but the current text is a little confusing.
>

Well, “It” at the front of the sentence is (from the top of the paragraph)
“the value of the ‘example’ field”, so I think it’s OK. But that’s 2
sentences away at this point, so it would probably be good to replace “It”
with “That value”.
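
To make the point about "That value" concrete, here is a minimal sketch in Python. It is not taken from the draft: the sample JSON text and the helper name `try_encode` are assumptions chosen for illustration. It shows a JSON text that is itself plain ASCII, yet whose parsed value holds a lone surrogate and so has no well-formed UTF-8 serialization.

```python
import json

# Hypothetical sample (not the draft's own example): the JSON text is plain
# ASCII, but the "example" value contains an escaped lone surrogate, U+DEAD.
sample = '{"example": "\\uDEAD"}'

# Python's json module accepts the escape without complaint and returns a
# str that holds the lone surrogate code point.
value = json.loads(sample)["example"]


def try_encode(text):
    """Return the UTF-8 bytes for text, or None if no well-formed form exists."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None


# The strict encoder refuses the value.
strict = try_encode(value)

# The "surrogatepass" error handler forces bytes out anyway; the result is
# ill-formed UTF-8 that a strict decoder will reject.
forced = value.encode("utf-8", errors="surrogatepass")
```

Other parsers may behave differently on the same input, which is the unpredictability the quoted sentence describes.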

> > Reasonable options for dealing with problematic input include, first, rejecting text
> > containing problematic code points, and second, replacing them with placeholders.
> > Silently ignoring an ill-formed part of a string is a known security risk. Responding
> > to that risk, [UNICODE <https://www.ietf.org/archive/id/draft-bray-unichars-06.html#UNICODE>]
> > section 3.2 recommends dealing with ill-formed byte
> > sequences by by signaling an error, or replacing problematic code points with
> > "�" (U+FFFD, REPLACEMENT CHARACTER).
>
> typo: "by by"
>

Oops. Couple more too, got sloppy this draft.

>
> "There are reasonable options for dealing with problematic input. First, an implementation
> can reject text containing problematic input. Secondly, it's possible to replace problematic
> code points with placeholders.  Responding to that risk, [UNICODE <https://www.ietf.org/archive/id/draft-bray-unichars-06.html#UNICODE>]
> section 3.2 recommends
> dealing with ill-formed byte sequences by signaling an error, or
> replacing problematic code points with '�' (U+FFFD, REPLACEMENT
> CHARACTER). Lastly, it can make sense to accept it, if the entire implementation
> is designed to accommodate ill-formed Unicode."
>
> Not attached to my words, just trying to get the point across.
>

Not sure what that point is. The proposed language loses the narrative
about silently ignoring, and then the added “make sense to ignore” flies in
the face of that narrative, which I think is important given the security
considerations.  What do others think?
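
For readers weighing the options, the difference between rejecting, replacing, and silently ignoring can be sketched in Python using the standard `bytes.decode` error handlers. This is an illustration only, not text from the draft; the input bytes, the function names, and the naive filter scenario are all assumptions.

```python
# Hypothetical input: a stray 0xFF byte (never valid in UTF-8) placed inside
# what would otherwise be the ASCII text "<script>".
raw = b"<scr\xffipt>"


def reject(data):
    """Option 1: signal an error (strict decoding raises UnicodeDecodeError)."""
    return data.decode("utf-8", errors="strict")


def replace(data):
    """Option 2: substitute U+FFFD REPLACEMENT CHARACTER for the bad byte."""
    return data.decode("utf-8", errors="replace")


def silently_ignore(data):
    """The risky behavior: drop the ill-formed byte without any signal."""
    return data.decode("utf-8", errors="ignore")


# A naive byte-level filter looking for b"<script>" does not match the raw input.
filter_matched = b"<script>" in raw

try:
    rejected = reject(raw)
except UnicodeDecodeError:
    rejected = None  # the input was refused outright

replaced = replace(raw)         # the marker stays visible in the output
ignored = silently_ignore(raw)  # the bad byte vanishes and "<script>" appears
```

With replacement, the damage stays visible and the filtered token never forms; with silent ignoring, text that passed the filter turns into exactly the string the filter was meant to catch.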



