Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-06.txt

Rob Sayre <sayrer@gmail.com> Mon, 25 September 2023 18:06 UTC

MIME-Version: 1.0
References: <169566019635.41806.9804796677919971070@ietfa.amsl.com> <CAHBU6is-wU2NLXNWL56nSJ4=nKvDzGv_Aw4qJN6N2O8CuM4-yw@mail.gmail.com>
In-Reply-To: <CAHBU6is-wU2NLXNWL56nSJ4=nKvDzGv_Aw4qJN6N2O8CuM4-yw@mail.gmail.com>
From: Rob Sayre <sayrer@gmail.com>
Date: Mon, 25 Sep 2023 11:06:38 -0700
Message-ID: <CAChr6SwM9re+0X8V9YkFLxkuxhSnu0chW9ecKq1JuNuo4fAEWw@mail.gmail.com>
To: Tim Bray <tbray@textuality.com>
Cc: i18ndir@ietf.org, ART Area <art@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000847faf060632d29e"
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/0k-2MXJ8y3gM9h5FgGvd2Qafx5M>
Subject: Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-06.txt

Hi,

> It cannot be serialized into well-formed UTF-8, but the behavior of libraries
> asked to parse the sample is unpredictable; some will silently parse this
> and generate an ill-formed UTF-8 string.

It might be better to say "Its code points do not represent well-formed
UTF-8..." (or well-formed Unicode?), because the example does show it
serialized as well-formed UTF-8 via escape sequences. Not attached to my
suggestion, but the current text is a little confusing.
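
To make the distinction concrete, here is a minimal Python sketch. The sample
JSON text is a hypothetical stand-in, not the draft's actual example, and the
variable names are only illustrative. It suggests that the JSON text travels
as well-formed UTF-8 (the escapes are plain ASCII), while the string a lenient
parser produces from it holds a code point that cannot be encoded as
well-formed UTF-8.

```python
import json

# Hypothetical sample: JSON text containing an escaped unpaired surrogate.
json_text = '{"example": "\\uDEAD"}'

# The JSON text itself is plain ASCII, so it serializes as well-formed UTF-8.
wire_bytes = json_text.encode("utf-8")

# Python's json module is one parser that accepts the escape without complaint.
parsed = json.loads(wire_bytes.decode("utf-8"))
value = parsed["example"]

# The parsed string holds a lone surrogate code point ...
has_surrogate = any(0xD800 <= ord(ch) <= 0xDFFF for ch in value)

# ... which cannot be encoded as well-formed UTF-8.
try:
    value.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

Other parsers may reject the same input or substitute U+FFFD, which is the
unpredictability the quoted text describes.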

> Reasonable options for dealing with problematic input include, first, rejecting text
> containing problematic code points, and second, replacing them with placeholders.
> Silently ignoring an ill-formed part of a string is a known security risk. Responding
> to that risk, [UNICODE <https://www.ietf.org/archive/id/draft-bray-unichars-06.html#UNICODE>]
> section 3.2 recommends dealing with ill-formed byte
> sequences by by signaling an error, or replacing problematic code points with
> "�" (U+FFFD, REPLACEMENT CHARACTER).

typo: "by by"

I'll try a suggestion:

"There are reasonable options for dealing with problematic input. First, an
implementation can reject text containing problematic input. Secondly, it's
possible to replace problematic code points with placeholders. Responding to
that risk, [UNICODE <https://www.ietf.org/archive/id/draft-bray-unichars-06.html#UNICODE>]
section 3.2 recommends dealing with ill-formed byte sequences by signaling an
error, or replacing problematic code points with '�' (U+FFFD, REPLACEMENT
CHARACTER). Lastly, it can make sense to accept it, if the entire
implementation is designed to accommodate ill-formed Unicode."

Not attached to my words, just trying to get the point across.
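
For illustration, the options could be sketched in Python roughly as follows.
The function names and the sample bytes are hypothetical; the error-handler
names are the ones Python's built-in codecs provide. Treating
"surrogateescape" as the "accept it" option is my assumption about one way an
implementation might carry ill-formed input through unchanged, not something
the draft specifies.

```python
# Hypothetical input: the byte 0xFF never appears in well-formed UTF-8.
ILL_FORMED = b"ab\xffcd"


def decode_rejecting(data: bytes) -> str:
    """Option 1: signal an error (raises UnicodeDecodeError)."""
    return data.decode("utf-8", errors="strict")


def decode_replacing(data: bytes) -> str:
    """Option 2: substitute U+FFFD for each ill-formed sequence."""
    return data.decode("utf-8", errors="replace")


def decode_ignoring(data: bytes) -> str:
    """The risky behavior: silently drop the ill-formed part."""
    return data.decode("utf-8", errors="ignore")


def decode_accepting(data: bytes) -> str:
    """Option 3 (assumed analogue): keep the bad bytes in a reversible form."""
    return data.decode("utf-8", errors="surrogateescape")
```

With the sample above, the replacing variant yields "ab\ufffdcd", the ignoring
variant yields "abcd" with no trace that anything was dropped, and the
accepting variant can be encoded back to the original bytes with the same
error handler, which only helps if every later stage expects such strings.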

thanks,
Rob


On Mon, Sep 25, 2023 at 9:51 AM Tim Bray <tbray@textuality.com> wrote:

> What’s new and different here.
>
>
>    1. Locked down definition of “problematic”
>    2. Locked down definition of “character repertoire”
>    3. Changed “Useful Assignables” to “Unicode Assignables” (checked with
>    Asmus first)
>
>
> A new version of Internet-Draft draft-bray-unichars-06.txt has been
> successfully submitted by Paul Hoffman and posted to the
> IETF repository.
>
> Name:     draft-bray-unichars
> Revision: 06
> Title:    Unicode Character Repertoire Subsets
> Date:     2023-09-25
> Group:    Individual Submission
> Pages:    10
> URL:      https://www.ietf.org/archive/id/draft-bray-unichars-06.txt
> Status:   https://datatracker.ietf.org/doc/draft-bray-unichars/
> HTML:     https://www.ietf.org/archive/id/draft-bray-unichars-06.html
> HTMLized: https://datatracker.ietf.org/doc/html/draft-bray-unichars
> Diff:     https://author-tools.ietf.org/iddiff?url2=draft-bray-unichars-06
>
> Abstract:
>
>   This document discusses specifying subsets of the Unicode character
>   repertoire for use in protocols and data formats.
>
>
>
> The IETF Secretariat
>
>
> _______________________________________________
> art mailing list
> art@ietf.org
> https://www.ietf.org/mailman/listinfo/art
>