Re: [art] [I18ndir] Modern Network Unicode
John C Klensin <john-ietf@jck.com> Thu, 11 July 2019 17:25 UTC
Date: Thu, 11 Jul 2019 13:25:09 -0400
From: John C Klensin <john-ietf@jck.com>
To: Carsten Bormann <cabo@tzi.org>
cc: "Asmus Freytag (c)" <asmusf@ix.netcom.com>, i18ndir@ietf.org, art@ietf.org
Message-ID: <DFB116527FF004C961182B15@PSB>
In-Reply-To: <C7BBF677-E752-4258-A357-AE56338F6326@tzi.org>
References: <0A5251342D480BA6437F7549@PSB> <B243365E-F7C5-4C53-A64F-2E3E87C4CD66@tzi.org> <248A8DD5DA0D3D34D6B6EFC9@PSB> <213ae024-b819-4f56-6e37-0cd53eb566c9@ix.netcom.com> <D921117F-BA9E-430B-8287-06D15248E1B7@tzi.org> <90f8f2b5-ff3d-f9f1-860c-ae4d43f92c81@ix.netcom.com> <7F1F41C25D0AC5960D95A67E@PSB> <C7BBF677-E752-4258-A357-AE56338F6326@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/art/Cbvz_nTqO22Zj8PgKjeiIuV4JNY>
--On Thursday, July 11, 2019 18:44 +0200 Carsten Bormann
<cabo@tzi.org> wrote:

> On Jul 11, 2019, at 18:31, John C Klensin <john-ietf@jck.com>
> wrote:
>>
>> (top post and adding the ART list back in)
>>
>> Carsten,
>> I think Asmus and I are in complete agreement, but you might
>> find an attempt to summarize the state of work for relevant
>> characteristics and behaviors on the web, some of which
>> interact with this discussion, illuminating.  It is a chart
>> with explanation, not a long document.  See
>> https://w3c.github.io/typography/gap-analysis/language-matrix.html
>
> Thank you for this interesting link.
> This is very condensed information; what column should I look
> for specifically on the issue of normalization and
> applicability of NFC?  E.g., in the linked page
> https://w3c.github.io/iip/gap-analysis/beng-gap I find
> something about grapheme clusters but nothing about
> normalization -- normalization of course is not an immediate
> need for Web interoperability.

I'll try to hunt for it when I have time unless someone else
gets to it first, but there are W3C recommendations that
specifically recommend against inserting a normalization step
before transmission or immediately before storage, and that
instead treat normalization as a comparison-time operation.
That is consistent with what Asmus and I have been trying to
tell you.  Let me try to summarize those comments in a different
way:

(1) For most language-script combinations, the form in which a
network application gets a string is usually going to be the
right form for that string.  If there is a strong preference
that the string be normalized, it will probably arrive at the
network application normalized (and will probably have left
something rather close to the keyboard that way).  If there is a
strong preference that it not be normalized, it will probably
come to you in _that_ form.  If there is no preference at all,
normalizing the string (as long as you stay away from the
compatibility forms, NFKC and NFKD) will probably not cause
harm, but it won't buy much either.

(2) Checking whether a string is normalized before sending it is
unlikely to produce any safely actionable information unless you
know a lot about the language, script, and circumstances.

(3) If a string is unnormalized for what whoever created it
thinks is a good reason, then normalizing on general principles
is going to lose whatever information that decision was intended
to convey.

The above may be different for strings that are specifically
intended as identifiers, but I don't think that is what you are
talking about.

I used to believe that "normalize everywhere" and "store strings
only in normalized form" were good ideas, but I've been
gradually (and sometimes quite unpleasantly) educated by a long
series of cases and counterexamples.  If I were updating RFC 5198
today, that recommendation is the thing I would be most likely
to either change or at least moderate and qualify considerably.

   john
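As an illustration of the comparison-time approach described above, here is a minimal sketch in Python using the standard unicodedata module. The helper name equal_under_nfc and the sample strings are assumptions made up for this example; they do not come from any W3C or IETF document. The sketch leaves the stored and received strings untouched and normalizes only temporary copies for the comparison, and it also shows how a compatibility form (NFKC) folds away a distinction that plain NFC preserves.

```python
import unicodedata


def equal_under_nfc(a: str, b: str) -> bool:
    """Compare two strings as NFC without altering either original value.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same word, once with precomposed U+00E9 and once with "e" + U+0301.
stored = "caf\u00e9"
received = "cafe\u0301"

print(stored == received)                 # False: code point sequences differ
print(equal_under_nfc(stored, received))  # True: canonically equivalent
print(len(received))                      # 5: the received form is left as it arrived

# Compatibility normalization can erase a distinction that may be deliberate.
ligature = "\ufb01le"  # begins with U+FB01 LATIN SMALL LIGATURE FI
print(unicodedata.normalize("NFC", ligature) == ligature)  # True: NFC keeps the ligature
print(unicodedata.normalize("NFKC", ligature))             # file
```

Whether NFC is the right form for a given comparison still depends on the language, script, and circumstances, as noted in point (2); the sketch only shows where in the pipeline the normalization step would sit.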