Re: [I18ndir] [art] Modern Network Unicode

John C Klensin <> Wed, 10 July 2019 07:10 UTC

Date: Wed, 10 Jul 2019 03:09:54 -0400
From: John C Klensin <>
To: Carsten Bormann <>
Message-ID: <248A8DD5DA0D3D34D6B6EFC9@PSB>
In-Reply-To: <>
References: <0A5251342D480BA6437F7549@PSB> <>
List-Id: Internationalization Directorate <>

--On Tuesday, July 9, 2019 23:50 +0200 Carsten Bormann
<> wrote:

> Hi John,
> thank you for chiming in.  I would have summoned you (and
> i18ndir) in time; in the current phase of rapid respins of the
> document I was first trying to make it useful before trying to
> nail down the details.

The counterargument is that it is better to catch any
significant misconceptions early.  I haven't seen those in your
draft(s), but couldn't know without looking through them.

> I was a toddler when ASA X3.4-1963 was released, but I caught
> up to ASCII about a dozen years later.  I have used ASR33s
> (although I spent more time on LA36s), and I have fully
> internalized the teleprinter model that led to NVT.  (I also
> spent some time on glass tty models of teleprinters [1].)  But
> while the teleprinter model has originally shaped what became
> "text files", the latter are now living a life of their
> own.  MNU in effect tries to codify the emancipation of text
> from the NVT model, and I'm not surprised that this breaking
> away causes some sorrow.

Understood.  And, as you have probably figured out, 5198 was an
attempt to update NVT but to do so in as direct a way as
possible, which meant carrying some of the NVT baggage along.
I also suspect from your explanation that I'm close to 20 years
older than you are and have a larger selection of
contemporaneous scars from that period.  Probably less sorrow or
nostalgia than you might assume, but a keen sense of why some
things got to be the way they are.

I will warn you that another effect on me of that period is an
inclination toward simplicity and a minimum of options.  That is
not specifically an NVT issue, but NVT (and early Telnet) show
some of that thinking.  I would even claim that one of the
reasons the Internet succeeded is that many people treated
"profile" as nearly an obscenity.

In principle, I have no difficulty with what I think you are
trying to do.  I do think it is going to need more explanation
about why, scope of applications, etc., than the I-D has now.
Probably that can wait although getting to at least some of it
relatively early may help clarify what this is all about.

> Some specific comments:
>> developments.  It may be a bit ironic that your solution to
>> some other problems is to devise yet another network-standard
>> form, especially one that has options that basically
>> encourage (or require) profiles (or, if you prefer,
>> combinations of variances).
> Most new applications will be fine with (1D) CMNU or (2D) MNU
> with lines. But sometimes legacy rears its ugly head, and
> therefore additional variances are defined.  I believe it is
> much better to expressly define these variances than to lump
> all the legacy into one big blob (single profile) that
> confuses the cleaner applications of MNU.

I will need to think more about this.  It may depend a bit on
how the variances are presented and what you encourage and
discourage.  And I'd be happier if, e.g., what is needed for 2D
were separated more clearly from accommodations for legacy.

> Now what is text being used for?  Some is exclusively for
> display to humans in 1D (CMNU) or 2D (MNU with lines)
> environments. Some is then also used by machines, and that is
> where easy comparison comes in. But the advantages of
> reasonably predictable encoding go way beyond that; which is
> why NFC coding is a de facto standard in most places that
> would use MNU. (NFKC is in the current document mostly as a
> reminder that variances can be made in normalization, as well;
> it is probably the only reasonable one beyond NFC among the
> normalization forms, but has its own problems as you note.)

This difference in perspective may or may not be useful, but NFC
is also a close approximation to what any sensible terminal
driver or IME is going to produce natively from a plausible
keyboard layout for the relevant script.  It would just make no
obvious sense to start decomposing characters selected by a
single keystroke (even if multiple fingers were involved), table
entry selection, or equivalent.  If you stick to
Unicode-specified normalization (and there are no
generally-known alternatives), there is nothing else "beyond"
NFC.  NFD and NFKD decompose, which has disadvantages implied
above by both your causal hypothesis and mine.  On the other
hand, there is at least one modern operating system that, IIRC,
prefers NFD internally.
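
A minimal sketch of the four Unicode-specified normalization
forms, using Python's standard unicodedata module.  The helper
names and the sample input are hypothetical, chosen only to
illustrate why a precomposed keystroke is already in NFC while
NFD and NFKD decompose it.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def codepoints(text):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

def normalization_table(text):
    """Map each Unicode normalization form to the code points it yields."""
    return {form: codepoints(unicodedata.normalize(form, text)) for form in FORMS}

# Hypothetical input: "é" as a single precomposed character, the way a
# keyboard layout or IME would typically deliver it.
typed = "\u00e9"
table = normalization_table(typed)

# NFC and NFKC keep the single code point; NFD and NFKD split it into
# a base letter plus a combining acute accent.
for form in FORMS:
    print(form, table[form])

# The "K" forms additionally fold compatibility characters, e.g. the
# "fi" ligature U+FB01, which NFC leaves alone.
ligature = normalization_table("\ufb01")
print(ligature["NFC"], "vs", ligature["NFKC"])
```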

> The CRLF (which really has survived only because the ASR33
> needed up to 200 ms for a CR and the LF was a good excuse to
> waste those other 100 ms) is no longer needed; with the
> exception of a single popular operating system family, bare LF
> has won the line ending competition.  CRLF is one legacy
> feature that it's worth getting rid of (except maybe as the
> "CR-tolerance" variance), as is HT.  (I'm assuming FF and
> VT are no longer relevant in most of today's applications.)
> All this may be a bit opinionated, but it is also
> forward-looking.  If full NVT is needed, we can always
> reference RFC 5198; MNU is for the cases where that is not
> needed.

I think you are right about the above although I could provide
an even better justification for obsolescence based on trying to
simulate "extended Latin" characters by overstriking.  For
example, in Unicode, we have "ó" (U+00F3) and U+006F U+0301,
which turns into U+00F3 if NFC is applied, but consider the NVT
overstrike sequence
   o BS '
(o, backspace, apostrophe), which no Unicode normalization form
maps to either of them, an example that turns out to be
historically very important.
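
The contrast can be checked with a short sketch, again assuming
Python's standard unicodedata module; the variable names are
hypothetical.  It compares the combining sequence, which NFC
composes, with the overstrike sequence, which normalization
leaves untouched.

```python
import unicodedata

def nfc(text):
    """Apply Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)

precomposed = "\u00f3"   # "ó" as one code point (U+00F3)
combining = "o\u0301"    # U+006F followed by COMBINING ACUTE ACCENT
overstrike = "o\b'"      # o, BS (U+0008), apostrophe

# The combining sequence composes to the precomposed character.
composes = nfc(combining) == precomposed

# The overstrike sequence is just three unrelated characters as far
# as normalization is concerned, so NFC returns it unchanged.
overstrike_unchanged = nfc(overstrike) == overstrike

print(composes, overstrike_unchanged)
```
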
> I hope this doesn't come over as too brash — I really like
> RFC 5198 and the careful tradeoffs it makes, but there are
> lots of applications that need something simpler and MNU
> reflects current practice and is a sane design for the 2020s+
> (*).

As I said, I thoroughly approve of this effort.

For the record, I do have one other concern.  The examples above
use extended Latin script.  Because of its NVT origins, much of
5198 makes assumptions about that script or scripts closely
related to it.  If you are doing something for this century and
beyond, you should really think carefully about the implications
of scripts that are very different.   For example, does anything
change when using a script that is primarily RtoL?  Do you need
to be concerned about non-spacing breaks and other
usually-invisible separators for strings that might be wrapped
in display or where graphemes are intrinsically not monospaced?
Do you need rules about rational string composition or character
sequences for complex scripts?  At one level, those questions
are where your Einstein/Sessions comment really fits in.  At
another, I think what you are trying to do gets much less
useful if you blow those questions off, so I think you should at
least try to work through them.


> Grüße, Carsten
> [1]:
> (*): Yes, I'm fully aware that nothing about text and
> writing systems is genuinely simple; Einstein/Sessions applies.