Re: [I18ndir] [art] New Version Notification for draft-bray-unichars-06.txt

Rob Sayre <sayrer@gmail.com> Mon, 09 October 2023 23:58 UTC

From: Rob Sayre <sayrer@gmail.com>
Date: Mon, 09 Oct 2023 16:58:34 -0700
Message-ID: <CAChr6Sx-+GFTF=1a-8t_GQ7-6Cu=5qfHyDihR86vrvL3b4YPbQ@mail.gmail.com>
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Cc: "i18ndir@ietf.org" <i18ndir@ietf.org>, ART Area <art@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000dc5fa90607515e32"
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/hcEP6BVj0MJFUMtxR3tPMjAO1lk>
Subject: Re: [I18ndir] [art] New Version Notification for draft-bray-unichars-06.txt

On Mon, Oct 9, 2023 at 4:38 PM Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:

> Hello Rob, others,
>
> On 2023-10-03 01:42, Rob Sayre wrote:
>
> > Basically, UTF-8 is a work of genius, but it arrived 10 years too late.
> > JavaScript, Windows, Java, and all the rest of them were underway before
> > we got RFC 2277. Things would be much different if it had appeared in 1982
> > and everyone then realized that you don't need Θ(1) access to characters
> > in a string...
>
> You make it sound like Java already existed in the (early!) 1980ies, but


Well, I know how to write Common Lisp and pre-98 C++, but I was not trying
to take the conversation in that direction. I have a 2005 PowerBook right
here, and I have managed to connect it to the internet, but it is a bit
rough.


> the history is much more compressed. Java started in the early 1990ies.
> And RFC 2277 was published in 1998, but before that, there was RFC 2130,
> and the workshop that was the base for it in Feb/March 1996.
>
> Another way to express the problem is to say that the emoji craze should
> have happened much earlier, because emoji beyond the BMP (the original
> 16-bit space) were the main driver to get UTF-16 taken seriously in
> programming languages such as Java and in databases :-).
>

I agree with your point here, but I did write something similar a few months
ago. I cited this rationale:
<https://simonsapin.github.io/wtf-8/#motivation>

It cites emoji, if you give it a close read.

UTF-8 was around, but the web was not in UTF-8 then, so it was still
dealing with many other charsets, and operating system strings were still
uniformly awful. This comment does not mean I am advocating sending WTF-8.
But that data format still underlies several different systems, so it
arises in escape sequences, etc. This draft is about repertoires of code
points, so we still have to describe it this way, if only to exclude them.
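
To make the escape-sequence point concrete, here is a minimal Python sketch.
It is only an illustration under my own assumptions, not text from the draft
or from the WTF-8 document: the helper name has_surrogate is hypothetical,
and Python's "surrogatepass" error handler is used merely as a convenient
stand-in that happens to produce the same three bytes WTF-8 would use for a
single unpaired surrogate.

```python
import json

# A JSON escape sequence can name a lone (unpaired) surrogate code point.
text = json.loads('"\\ud83d"')
lone = ord(text)  # a code point in the surrogate range U+D800..U+DFFF


def has_surrogate(s: str) -> bool:
    """Hypothetical helper: True if s contains any code point in U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


# Strict UTF-8 refuses to encode surrogate code points.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The "surrogatepass" handler emits a three-byte form instead of failing.
# For one unpaired surrogate this matches the generalized UTF-8 byte
# pattern; it is not a full WTF-8 encoder.
lenient = text.encode("utf-8", "surrogatepass")

# A properly paired escape decodes to one supplementary code point (an
# emoji), which contains no surrogates and encodes as ordinary UTF-8.
emoji = json.loads('"\\ud83d\\ude00"')
emoji_bytes = emoji.encode("utf-8")
```

A repertoire that excludes surrogates would reject the first string and
accept the second, which is the kind of distinction the draft has to spell
out.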

This document describes a 2020 rewrite of charset detection:
<https://hsivonen.fi/chardetng/>

I am in favor of describing the bad parts, but I do not view doing so as an
endorsement.

thanks,
Rob