Re: [I18ndir] [art] Modern Network Unicode

Carsten Bormann <cabo@tzi.org> Tue, 09 July 2019 21:50 UTC

From: Carsten Bormann <cabo@tzi.org>
Date: Tue, 9 Jul 2019 23:50:44 +0200
Cc: art@ietf.org, i18ndir@ietf.org
To: John C Klensin <john-ietf@jck.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/9ulB3fQ-MINRi2Ur-zRkI6tx-Ng>
Subject: Re: [I18ndir] [art] Modern Network Unicode

Hi John,

Thank you for chiming in.  I would have summoned you (and i18ndir) in due time; in the current phase of rapid respins of the document, I was first trying to make it useful before trying to nail down the details.

I was a toddler when ASA X3.4-1963 was released, but I caught up to ASCII about a dozen years later.  I have used ASR33s (although I spent more time on LA36s), and I have fully internalized the teleprinter model that led to NVT.  (I also spent some time on glass tty models of teleprinters [1].)  But while the teleprinter model originally shaped what became “text files”, the latter are now living a life of their own.  MNU in effect tries to codify the emancipation of text from the NVT model, and I’m not surprised that this breaking away causes some sorrow.

Some specific comments:

> developments.  It may be a bit ironic that your solution to some
> other problems is to devise yet another network-standard form,
> especially one that has options that basically encourage (or
> require) profiles (or, if you prefer, combinations of variances).

Most new applications will be fine with (1D) CMNU or (2D) MNU with lines.
But sometimes legacy rears its ugly head, and therefore additional variances are defined.  I believe it is much better to expressly define these variances than to lump all the legacy into one big blob (single profile) that confuses the cleaner applications of MNU.

Now what is text being used for?  Some is exclusively for display to humans in 1D (CMNU) or 2D (MNU with lines) environments.
Some is then also used by machines, and that is where easy comparison comes in.
But the advantages of a reasonably predictable encoding go well beyond that, which is why NFC is a de facto standard in most places that would use MNU.
(NFKC is in the current document mostly as a reminder that variances can be made in normalization, as well; it is probably the only reasonable one beyond NFC among the normalization forms, but has its own problems as you note.)
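
To illustrate the comparison point, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `nfc_equal` and the sample strings are made up for illustration and are not taken from the MNU draft; the sketch only shows why a predictable normalization form makes machine comparison easier, and how NFKC differs from NFC.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Two encodings of the same visible text:
composed = "Gr\u00fc\u00dfe"      # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "Gru\u0308\u00dfe"   # "u" followed by U+0308 COMBINING DIAERESIS

print(composed == decomposed)            # code-point comparison: False
print(nfc_equal(composed, decomposed))   # comparison after NFC: True

# NFKC additionally folds compatibility characters, which NFC leaves alone.
ligature = "\ufb01"                      # U+FB01 LATIN SMALL LIGATURE FI
print(unicodedata.normalize("NFC", ligature) == ligature)   # True
print(unicodedata.normalize("NFKC", ligature))              # fi
```

If text is already in NFC when it is produced, a receiver can compare it code point by code point without normalizing first; the NFKC lines show the kind of information loss that makes that form more contentious.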

The CRLF (which really has survived only because the ASR33 needed up to 200 ms for a CR and the LF was a good excuse to waste those other 100 ms) is no longer needed; with the exception of a single popular operating system family, bare LF has won the line ending competition.  CRLF is one legacy feature that it’s worth getting rid of (except maybe as the “CR-tolerance” variance), as is HT.  (I’m assuming FF and VT are no longer relevant in most of today’s applications.)  All this may be a bit opinionated, but it is also forward-looking.  If full NVT is needed, we can always reference RFC 5198; MNU is for the cases where that is not needed.
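
As a rough sketch of what this could mean for an implementation, the following Python flags the legacy control characters mentioned above and converts CRLF to bare LF. The names `to_bare_lf`, `legacy_controls`, and the `cr_tolerant` flag are hypothetical, and treating CR-tolerance as "accept CRLF and read it as LF" is an assumption made here for illustration, not the draft's definition.

```python
from typing import List

# Legacy controls named in the message, besides CR (illustrative table).
LEGACY = {"\t": "HT", "\x0b": "VT", "\x0c": "FF"}


def to_bare_lf(text: str) -> str:
    """Replace each CRLF pair with a bare LF; lone CRs are left untouched."""
    return text.replace("\r\n", "\n")


def legacy_controls(text: str, cr_tolerant: bool = False) -> List[str]:
    """Return the names of legacy controls found, in order of first appearance.

    With cr_tolerant=True (an assumed reading of the CR-tolerance variance),
    CRLF pairs are accepted and only a lone CR is reported.
    """
    if cr_tolerant:
        text = to_bare_lf(text)
    found: List[str] = []
    for ch in text:
        name = "CR" if ch == "\r" else LEGACY.get(ch)
        if name is not None and name not in found:
            found.append(name)
    return found


sample = "first line\r\nsecond\tline\n"
print(legacy_controls(sample))                     # ['CR', 'HT']
print(legacy_controls(sample, cr_tolerant=True))   # ['HT']
print(repr(to_bare_lf(sample)))                    # 'first line\nsecond\tline\n'
```

Applications that need the full NVT behaviour would not use a check like this at all and would follow RFC 5198 instead.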

I hope this doesn’t come across as too brash.  I really like RFC 5198 and the careful tradeoffs it makes, but there are lots of applications that need something simpler; MNU reflects current practice and is a sane design for the 2020s+ (*).

Grüße, Carsten

[1]: https://en.wikipedia.org/wiki/GNU_Screen
(*): Yes, I’m fully aware that nothing about text and writing systems is genuinely simple; Einstein/Sessions applies.