Re: [I18ndir] [art] Fwd: New Version Notification for draft-bray-unichars-06.txt

Rob Sayre <sayrer@gmail.com> Mon, 09 October 2023 05:49 UTC

References: <169566019635.41806.9804796677919971070@ietfa.amsl.com> <CAHBU6is-wU2NLXNWL56nSJ4=nKvDzGv_Aw4qJN6N2O8CuM4-yw@mail.gmail.com> <CAChr6SwM9re+0X8V9YkFLxkuxhSnu0chW9ecKq1JuNuo4fAEWw@mail.gmail.com> <CAHBU6ivSkEv0AcT52BWrYadmutdYNFx0D0MYR3Sv62a2LXckJw@mail.gmail.com> <CAChr6SyuLc6-fLsThQJie2G_K4-vZtPK_emnFyA7NWoakBowiA@mail.gmail.com> <ec50602c-8778-1a31-def6-0218c93d3033@it.aoyama.ac.jp>
In-Reply-To: <ec50602c-8778-1a31-def6-0218c93d3033@it.aoyama.ac.jp>
From: Rob Sayre <sayrer@gmail.com>
Date: Sun, 08 Oct 2023 22:49:36 -0700
Message-ID: <CAChr6SzqXdbngX3JknHr4vmJTb+vRKRym11eg-XYMeBY4D4Wdw@mail.gmail.com>
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Cc: Tim Bray <tbray@textuality.com>, i18ndir@ietf.org, ART Area <art@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/CYyQ_dvPcnelb3rJZDtdtJ19KCM>

On Sun, Oct 8, 2023 at 8:45 PM Martin J. Dürst <duerst@it.aoyama.ac.jp>
wrote:

> On 2023-09-26 10:59, Rob Sayre wrote:
>
> > The aim here is to point out that the Web (and Java, and Windows) can
> > accommodate ill-formed Unicode. Is it possible to transmit any Windows
> > path name via conforming JSON in UTF-8? Yes. Is it a good idea to naively
> > design that into a protocol? No. But you might have to accept these things
> > to be sufficiently compatible with the Web.
>
> "the Web" is a big place. Among else, UTF-8 is "The Encoding" (see
> https://encoding.spec.whatwg.org/#the-encoding), and in the definition
> at https://encoding.spec.whatwg.org/#utf-8-decoder, surrogates (whether
> alone or not) produce an error.
>
> So when you speak about the above toxic stuff, please specifically say
> "in JSON" or "in JavaScript".
>

True in both, and on the Web in general, because you can put the "toxic"
stuff in a script element in a web page that is itself well-formed UTF-8.
Very few pages contain no JS at all.
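
To make that concrete, here is a minimal Python sketch (Python rather than JS
only because its strict UTF-8 codec makes the failure explicit). The sample
document, the member name "name", and the helper functions are illustrative
assumptions, not anything taken from the draft. It shows a JSON text that is
pure ASCII, and therefore well-formed UTF-8, whose parsed value is a lone
surrogate that cannot itself be serialized as UTF-8.

```python
import json

# Illustrative JSON text: every byte is ASCII, so the document itself is
# well-formed UTF-8. The six characters \ud800 are a JSON escape sequence,
# not a raw surrogate on the wire.
DOC_BYTES = rb'{"name": "\ud800"}'


def is_wellformed_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def can_encode_utf8(text: str) -> bool:
    """Return True if the string can be serialized as strict UTF-8."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def parse_name(data: bytes) -> str:
    """Decode the document as UTF-8, parse it, and return the 'name' member."""
    return json.loads(data.decode("utf-8"))["name"]


if __name__ == "__main__":
    value = parse_name(DOC_BYTES)
    print(is_wellformed_utf8(DOC_BYTES))        # the transport bytes
    print(hex(ord(value)))                      # the parsed code point
    print(can_encode_utf8(value))               # the parsed value, re-encoded
    print(is_wellformed_utf8(b"\xed\xa0\x80"))  # a surrogate encoded directly
```

Python's json module happens to accept the unpaired escape; other parsers may
reject it or substitute U+FFFD, which is part of the interoperability problem.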

Maybe this page will be of interest:
<https://hsivonen.fi/chardetng/>

A recurring point in this thread is that someone says "proper UTF-8", or
something similar, and then we discover that things are so much worse if
one really looks, along almost any axis. Here, we see that the system
charset and time itself are the problem.

So, I am still in favor of describing how to do it right, without making
things seem nicer than they really are.

thanks,
Rob