Re: [I18ndir] [art] Modern Network Unicode
John C Klensin <john-ietf@jck.com> Wed, 10 July 2019 07:10 UTC
Date: Wed, 10 Jul 2019 03:09:54 -0400
From: John C Klensin <john-ietf@jck.com>
To: Carsten Bormann <cabo@tzi.org>
cc: art@ietf.org, i18ndir@ietf.org
Message-ID: <248A8DD5DA0D3D34D6B6EFC9@PSB>
In-Reply-To: <B243365E-F7C5-4C53-A64F-2E3E87C4CD66@tzi.org>
References: <0A5251342D480BA6437F7549@PSB> <B243365E-F7C5-4C53-A64F-2E3E87C4CD66@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/Ha-Dhqz8GXmKIeD-41xqri0nFk4>

--On Tuesday, July 9, 2019 23:50 +0200 Carsten Bormann
<cabo@tzi.org> wrote:

> Hi John,
>
> thank you for chiming in. I would have summoned you (and
> i18ndir) in time; in the current phase of rapid respins of the
> document I was first trying to make it useful before trying to
> nail down the details.

The counterargument is that it is better to catch any significant
misconceptions early. I haven't seen those in your draft(s), but
couldn't know without looking through them.

> I was a toddler when ASA X3.4-1963 was released, but I caught
> up to ASCII about a dozen years later. I have used ASR33s
> (although I spent more time on LA36s), and I have fully
> internalized the teleprinter model that led to NVT. (I also
> spent some time on glass tty models of teleprinters [1].) But
> while the teleprinter model has originally shaped what became
> "text files", the latter are now living a life of their
> own. MNU in effect tries to codify the emancipation of text
> from the NVT model, and I'm not surprised that this breaking
> away causes some sorrow.

Understood. And, as you have probably figured out, 5198 was an
attempt to update NVT but to do so in as direct a way as
possible, which meant carrying some of the NVT baggage along. I
also suspect from your explanation that I'm close to 20 years
older than you are and have a larger selection of
contemporaneous scars from that period. Probably less sorrow or
nostalgia than you might assume, but a keen sense of why some
things got to be the way they are.

I will warn you that another effect on me of that period is an
inclination toward simplicity and a minimum of options. That is
not specifically an NVT issue, but NVT (and early Telnet) show
some of that thinking. I would even claim that one of the
reasons the Internet succeeded is that many people treated
"profile" as nearly an obscenity.

In principle, I have no difficulty with what I think you are
trying to do.
I do think it is going to need more explanation about why, scope
of applications, etc., than the I-D has now. Probably that can
wait although getting to at least some of it relatively early
may help clarify what this is all about.

> Some specific comments:
>
>> developments. It may be a bit ironic that your solution to
>> some other problems is to devise yet another network-standard
>> form, especially one that has options that basically
>> encourage (or require) profiles (or, if you prefer,
>> combinations of variances).
>
> Most new applications will be fine with (1D) CMNU or (2D) MNU
> with lines. But sometimes legacy rears its ugly head, and
> therefore additional variances are defined. I believe it is
> much better to expressly define these variances than to lump
> all the legacy into one big blob (single profile) that
> confuses the cleaner applications of MNU.

I will need to think more about this. It may depend a bit on how
the variances are presented and what you encourage and
discourage. And I'd be happier if, e.g., what is needed for 2D
were separated more clearly from accommodations for legacy
situations.

> Now what is text being used for? Some is exclusively for
> display to humans in 1D (CMNU) or 2D (MNU with lines)
> environments. Some is then also used by machines, and that is
> where easy comparison comes in. But the advantages of
> reasonably predictable encoding go way beyond that; which is
> why NFC coding is a de facto standard in most places that
> would use MNU. (NFKC is in the current document mostly as a
> reminder that variances can be made in normalization, as well;
> it is probably the only reasonable one beyond NFC among the
> normalization forms, but has its own problems as you note.)

This difference in perspective may or may not be useful, but NFC
is also a close approximation to what any sensible terminal
driver or IME is going to produce natively from a plausible
keyboard layout for the relevant script.
It would just make no obvious sense to start decomposing
characters selected by a single keystroke (even if multiple
fingers were involved), table entry selection, or equivalent.

If you stick to Unicode-specified normalization (and there are no
generally-known alternatives), there is nothing else "beyond"
NFC. NFD and NFKD decompose, which has disadvantages implied
above by both your causal hypothesis and mine. On the other
hand, there is at least one modern operating system that, IIRC,
prefers NFD internally.

> The CRLF (which really has survived only because the ASR33
> needed up to 200 ms for a CR and the LF was a good excuse to
> waste those other 100 ms) is no longer needed; with the
> exception of a single popular operating system family, bare LF
> has won the line ending competition. CRLF is one legacy
> feature that it's worth getting rid of (except maybe as the
> "CR-tolerance" variance), as is HT. (I'm assuming FF and
> VT are no longer relevant in most of today's applications.)
> All this may be a bit opinionated, but it is also
> forward-looking. If full NVT is needed, we can always
> reference RFC 5198; MNU is for the cases where that is not
> needed.

I think you are right about the above, although I could provide
an even better justification for obsolescence based on trying to
simulate "extended Latin" characters by overstriking. For
example, in Unicode, we have "ó" (U+00F3) and U+006F U+0301,
which turns into U+00F3 if NFC is applied, but consider the
ASCII sequence o BS ', an example that turns out to be
historically very important.

> I hope this doesn't come over as too brash -- I really like
> RFC 5198 and the careful tradeoffs it makes, but there are
> lots of applications that need something simpler and MNU
> reflects current practice and is a sane design for the 2020s+
> (*).

As I said, I thoroughly approve of this effort.

For the record, I do have one other concern. The examples above
use extended Latin script.
Because of its NVT origins, much of 5198 makes assumptions about
that script or scripts closely related to it. If you are doing
something for this century and beyond, you should really think
carefully about the implications of scripts that are very
different. For example, does anything change when using a script
that is primarily RtoL? Do you need to be concerned about
non-spacing breaks and other usually-invisible separators for
strings that might be wrapped in display or where graphemes are
intrinsically not monospaced? Do you need rules about rational
string composition or character sequences for complex scripts?
At one level, those questions are where your Einstein/Sessions
comment really fits in. At another, I think what you are trying
to do gets much less useful if you blow those questions off, so
I think you should at least try to work through them.

best,
   john

>
> Grüße, Carsten
>
> [1]: https://en.wikipedia.org/wiki/GNU_Screen
> (*): Yes, I'm fully aware that nothing about text and
> writing systems is genuinely simple; Einstein/Sessions applies.
>
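
Editorial illustration (not part of the original message): the
normalization and line-ending points discussed above can be checked
with Python's standard unicodedata module. This is only a minimal
sketch. The helper names to_lf and normalize_text are hypothetical,
and combining LF line endings with NFC is an assumption made for
illustration, not the definition of MNU or of RFC 5198.

```python
import unicodedata

precomposed = "\u00f3"   # "ó" as the single code point U+00F3
decomposed = "o\u0301"   # U+006F followed by U+0301 COMBINING ACUTE ACCENT

# The two spellings are different code point sequences, so a plain
# comparison fails until both sides are normalized the same way.
assert precomposed != decomposed
assert unicodedata.normalize("NFC", decomposed) == precomposed
assert unicodedata.normalize("NFD", precomposed) == decomposed

# The teleprinter-style overstrike "o BS '" consists of ASCII characters
# only, so NFC leaves it exactly as it is.
overstrike = "o\b'"
assert unicodedata.normalize("NFC", overstrike) == overstrike


def to_lf(text: str) -> str:
    """Hypothetical helper: replace CRLF line endings with bare LF."""
    return text.replace("\r\n", "\n")


def normalize_text(text: str) -> str:
    """Hypothetical helper: LF line endings, then NFC (an assumption)."""
    return unicodedata.normalize("NFC", to_lf(text))
```

For example, normalize_text("o\u0301\r\nx") yields a string holding
U+00F3, a bare LF, and "x". The sketch deliberately leaves a lone CR
alone, since how to treat it is one of the open variance questions in
the thread.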