Re: [I18ndir] [art] Just uploaded draft-bray-unichars-03

Steffen Nurpmeso <steffen@sdaoden.eu> Mon, 11 September 2023 21:50 UTC

Date: Mon, 11 Sep 2023 23:50:19 +0200
Author: Steffen Nurpmeso <steffen@sdaoden.eu>
From: Steffen Nurpmeso <steffen@sdaoden.eu>
To: Tim Bray <tbray@textuality.com>
Cc: i18ndir@ietf.org, ART Area <art@ietf.org>, Steffen Nurpmeso <steffen@sdaoden.eu>
Message-ID: <20230911215019.QNrf1%steffen@sdaoden.eu>
In-Reply-To: <CAHBU6iuixTeS=X1kccw11zEnHVG5tx9aHUC-pH00ociBmukhGQ@mail.gmail.com>
References: <CAHBU6is50TkpDsqXTp6WxdVSgE66j3gGHZ60ey2jFYbefaHFJw@mail.gmail.com> <20230909165843.GlTJy%steffen@sdaoden.eu> <CAHBU6iuixTeS=X1kccw11zEnHVG5tx9aHUC-pH00ociBmukhGQ@mail.gmail.com>
Mail-Followup-To: Tim Bray <tbray@textuality.com>, i18ndir@ietf.org, ART Area <art@ietf.org>, Steffen Nurpmeso <steffen@sdaoden.eu>
User-Agent: s-nail v14.9.24-508-g5394c8bef3
OpenPGP: id=EE19E1C1F2F7054F8D3954D8308964B51883A0DD; url=https://ftp.sdaoden.eu/steffen.asc; preference=signencrypt
BlahBlahBlah: Any stupid boy can crush a beetle. But all the professors in the world can make no bugs.
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/P7pMgmwk_sggjfmKQBUE_7KXC94>

Tim Bray wrote in
 <CAHBU6iuixTeS=X1kccw11zEnHVG5tx9aHUC-pH00ociBmukhGQ@mail.gmail.com>:

I will keep this a short tit-for-tat reply.

 |On Sep 9, 2023 at 9:58:43 AM, Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
 ...
 |In 2.2.2.2 i would not say "legacy controls", and that they are
 |> "mostly obsolete".  ECMA-48 is very alive in at least the POSIX
 |> aka Linux world, for many purposes, for example terminal
 ...
 |So, in section 23.1 of [UNICODE] it says "There are 65 code points set
 |aside in the Unicode Standard for compatibility with the C0 and C1 control

Yes.  Unicode's first 256 code points map 1:1 onto ISO-8859-1.
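
As a quick check of both statements (the 1:1 mapping and the 65
control code points the quoted Unicode text mentions), a minimal
Python sketch using only the standard library; the variable names
are just for illustration:

```python
import unicodedata

# Decode every byte value 0x00..0xFF as ISO-8859-1 (Latin-1).
decoded = bytes(range(256)).decode("latin-1")

# Each byte value should map to the code point with the same number.
same_numbers = all(ord(ch) == i for i, ch in enumerate(decoded))

# Collect the code points in that range whose General_Category is Cc.
controls = [i for i in range(256) if unicodedata.category(chr(i)) == "Cc"]

print(same_numbers)   # True
print(len(controls))  # 65: C0 (0x00-0x1F), DEL (0x7F), C1 (0x80-0x9F)
```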

  ...
 |practices may survive in legacy systems but I am pretty sure they are not
 |relevant to any protocol or data format the IETF might work on. This
 |clearly feels like “legacy” and “mostly obsolete” to me. (For those who

To you, maybe, but surely not for data that is passed around in
protocols.  Yes, those characters have the dedicated Cc class.
But since you seem to be addressing implementors of new protocols,
why does it matter?  They can assign meaning to these characters
just as they want.
You mutilate for no reason, and on a false premise.

The Unicode text you quote also wants to push people away from
using old controls when better ones are available, for example
for separating paragraphs in plain text.  Using the old controls
for that is a habit of older programmers who lived in times when
there was no other possibility; *therefore* slapping "legacy" on
them to make people look around seems like understandable and
laudable reasoning.
It does not mean anything else in practice: an enormous number of
protocols make use of those characters, terminals will not work
without them, and, in fact, new sequences are being invented in
the OSC space.
Heck, you do not even seem to read the Unicode mailing lists:
Kent Karlsson was (and likely still is) working on an unofficial
update of ECMA-48.  It was on unicode@, last in January 2023.
(I, mind you, said "whether it will fly .. who can tell".
He is on github, kent-karlsson.)
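
To make the OSC remark concrete, here is a minimal sketch of how
such sequences are framed in 7-bit form under ECMA-48: ESC ] opens
an OPERATING SYSTEM COMMAND and ESC \ (STRING TERMINATOR) closes
it.  The helper names osc, set_title and hyperlink are
hypothetical.  The payloads "0;<title>" (window title, as in
xterm) and "8;;<url>" (hyperlinks, a more recent convention
supported by a number of terminal emulators) are illustrations
only, and not every terminal honors them.

```python
ESC = "\x1b"      # C0 control ESCAPE
OSC = ESC + "]"   # 7-bit form of OPERATING SYSTEM COMMAND
ST = ESC + "\\"   # 7-bit form of STRING TERMINATOR


def osc(payload: str) -> str:
    """Frame a payload as an OSC control string (hypothetical helper)."""
    return OSC + payload + ST


def set_title(title: str) -> str:
    """OSC 0: ask the terminal to set its window title."""
    return osc("0;" + title)


def hyperlink(url: str, text: str) -> str:
    """OSC 8: wrap text in a hyperlink; an empty URL ends the link."""
    return osc("8;;" + url) + text + osc("8;;")


if __name__ == "__main__":
    # repr() shows the control characters instead of sending them
    # to the terminal.
    print(repr(set_title("demo")))
    print(repr(hyperlink("https://example.org", "example")))
```

Every one of these sequences starts with a character of class Cc,
which is exactly what a blanket exclusion of controls would forbid
in transit.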

  ...
 |> No.  Not me.  Sorry .. we are talking string data?
 |> I mean, with your restriction one (possibly) cannot even generate
 |> a protocol that carries around Linux/POSIX path names?
 |
 |Which characters necessary for Linux/POSIX path names are being excluded?
 |I went looking for the appropriate specs but ended up getting lost in

You make a valid point; a look at the Wikipedia page revealed
that in the Microsoft world, too, filenames can contain almost
any byte value.  (But be aware of special names like "aux".*, and
there are others.)

 |opengroup.org pages.  I note that for 25 years, the lifetime of XML, the
 |exclusion of control codes with the exception of \r, \t, and \n, seems to
 |have worked fine.

Polemically speaking you refer to that <de-/humanized/ mutilated
SGML that was invented to allow for simpler and safer parsers and
a more secure network.  A goal that <was> achieved</>.
XML is as sensitive as JSON; interesting that you mention exactly
that.  I in turn note that you refer to a data format but speak
about a protocol.
It seems to me that it is rather useless for me to say anything.

I for one strongly protest against _any_ byte being excluded from
just about anything.  Why should you?  And treating the mere
presence of some bytes as a "programming error" is beyond my
understanding.
In text data: not even a replacement character, not even BOM, not
even what-not?  No.
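
To illustrate the path name concern, a minimal sketch under stated
assumptions: a POSIX filename (one path component) may contain any
byte except NUL and "/", whereas the XML 1.0 Char production
admits only tab, LF and CR from the C0 range.  The function names
are hypothetical, and the filename check ignores length limits and
the special names "." and "..".

```python
def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_posix_filename(name: bytes) -> bool:
    """Assumed rule: non-empty, no NUL byte, no slash."""
    return len(name) > 0 and b"\x00" not in name and b"/" not in name


def xml10_representable(name: bytes) -> bool:
    """True if the bytes are valid UTF-8 made only of XML 1.0 Chars."""
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return all(is_xml10_char(ord(ch)) for ch in text)


if __name__ == "__main__":
    for name in (b"caf\xc3\xa9.txt", b"report\x1b[1m.txt", b"caf\xe9.txt"):
        print(name, is_posix_filename(name), xml10_representable(name))
```

All three names pass the filename check, but only the first one
could be carried verbatim as XML 1.0 character data: the second
contains ESC, the third is not valid UTF-8.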

Thank you.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)