Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-06.txt

Rob Sayre <sayrer@gmail.com> Tue, 26 September 2023 01:59 UTC

From: Rob Sayre <sayrer@gmail.com>
Date: Mon, 25 Sep 2023 18:59:26 -0700
Message-ID: <CAChr6SyuLc6-fLsThQJie2G_K4-vZtPK_emnFyA7NWoakBowiA@mail.gmail.com>
To: Tim Bray <tbray@textuality.com>
Cc: i18ndir@ietf.org, ART Area <art@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/r2pXIispT4aK3_lHUVgZh26Mp_0>

On Mon, Sep 25, 2023 at 6:10 PM Tim Bray <tbray@textuality.com> wrote:

> On Sep 25, 2023 at 11:06:38 AM, Rob Sayre <sayrer@gmail.com> wrote:
>
>> typo: "by by"
>>
>
> Oops. Couple more too, got sloppy this draft.
>

FWIW, I didn't notice it myself. Gmail flagged it when I pasted in the
quote. I think a grammar checker of your choice (MS Word, Grammarly,
Google, etc.) is worth it here, because writing about language is difficult.
It would just save some back-and-forth.


> "There are reasonable options for dealing with problematic input. First,
>> an implementation
>> can reject text containing problematic input. Secondly, it's possible to
>> replace problematic
>> code points with placeholders.  Responding to that risk, [UNICODE
>> <https://www.ietf.org/archive/id/draft-bray-unichars-06.html#UNICODE>] section
>> 3.2 recommends
>> dealing with ill-formed byte sequences by signaling an error, or
>> replacing problematic code points with '�' (U+FFFD, REPLACEMENT
>> CHARACTER). Lastly, it can make sense to accept it, if the entire
>> implementation
>> is designed to accommodate ill-formed Unicode."
>>
>> Not attached to my words, just trying to get the point across.
>>
>
> Not sure what that point is. The proposed language loses the narrative
> about silently ignoring, and then the added “make sense to ignore” flies in
> the face of that narrative, which I think is important given the security
> considerations.  What do others think?
>

You're right. I didn't mean to drop that sentence. Let me try again:

"There are reasonable options for dealing with problematic input. First, an
implementation can reject text containing problematic input. Secondly, it's
possible to replace problematic code points with placeholders. Silently
ignoring an ill-formed part of a string is a known security risk. Responding
to that risk, [UNICODE] section 3.2 recommends dealing with ill-formed byte
sequences by signaling an error, or replacing problematic code points with
'�' (U+FFFD, REPLACEMENT CHARACTER). Lastly, it can make sense to accept it,
if the entire implementation is designed to accommodate ill-formed Unicode."
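
To make the options in that paragraph concrete, here is a rough Python sketch.
It is only an illustration, not text for the draft. It works at the level of
ill-formed UTF-8 bytes rather than the draft's full set of problematic code
points, and the function names and sample input are my own assumptions.

```python
# Sample input (an assumption): 0xFF never appears in well-formed UTF-8.
ILL_FORMED = b"abc\xffdef"


def reject(data: bytes) -> str:
    """Option 1: signal an error (raises UnicodeDecodeError on bad input)."""
    return data.decode("utf-8", errors="strict")


def replace(data: bytes) -> str:
    """Option 2: substitute U+FFFD for each ill-formed sequence."""
    return data.decode("utf-8", errors="replace")


def silently_ignore(data: bytes) -> str:
    """The risky behaviour: bad bytes vanish with no trace."""
    return data.decode("utf-8", errors="ignore")


def accept(data: bytes) -> str:
    """Option 3: carry the bad bytes through so they can be restored later.

    This only makes sense if every consumer of the string is designed
    to cope with the lone surrogates that 'surrogateescape' produces.
    """
    return data.decode("utf-8", errors="surrogateescape")
```

With this input, replace() keeps a visible marker where the bad byte was,
silently_ignore() joins "abc" and "def" as if nothing had been there, and the
string from accept() can be turned back into the original bytes by encoding it
with the same 'surrogateescape' handler.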

The aim here is to point out that the Web (and Java, and Windows) can
accommodate ill-formed Unicode. Is it possible to transmit any Windows path
name via conforming JSON in UTF-8? Yes. Is it a good idea to naively design
that into a protocol? No. But you might have to accept these things to be
sufficiently compatible with the Web.
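
A small sketch of the path-name case, again in Python and only as an
illustration. The file name is hypothetical, and the helper names are mine.
It relies on the standard json module escaping non-ASCII characters by
default, so the unpaired surrogate travels as a \uXXXX escape inside ASCII
text, which is trivially well-formed UTF-8.

```python
import json

# Hypothetical file name containing an unpaired high surrogate (U+D800),
# which Windows permits in path names.
NAME = "report_\ud800.txt"


def to_wire(name: str) -> bytes:
    """Serialize as JSON; ensure_ascii=True (the default) escapes the surrogate."""
    text = json.dumps(name)
    return text.encode("utf-8")  # pure ASCII at this point


def from_wire(data: bytes) -> str:
    """Parse the JSON text back into a string, lone surrogate included."""
    return json.loads(data.decode("utf-8"))


def is_well_formed(s: str) -> bool:
    """True if the string can be encoded as strict UTF-8."""
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True
```

The name survives to_wire() followed by from_wire() unchanged, yet
is_well_formed() reports False for it. The transport is fine; the trouble
starts in whatever receives the string and assumes it is well-formed.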

One could call some of this input "toxic waste", but there is a flip side.
It would be something like "I'm sorry, ACME Basic JSON Parser Version 1.0
can't handle web content". The other formats in the document make this
intent explicit, and I support that.

thanks,
Rob