Re: [Lucid] FW: [mark@macchiato.com: Re: Non-normalizable diacritics - new property]
John C Klensin <john-ietf@jck.com> Thu, 19 March 2015 08:48 UTC
Date: Thu, 19 Mar 2015 04:47:58 -0400
From: John C Klensin <john-ietf@jck.com>
To: Asmus Freytag <asmusf@ix.netcom.com>, lucid@ietf.org
--On Wednesday, March 18, 2015 21:21 -0700 Asmus Freytag <asmusf@ix.netcom.com> wrote:

> As a btw:
>
> I'm amazed at the near total lack of IDNs registered for the
> Latin script in the root. It seems that people like the
> "fall-back" nature of non-accented ASCII labels for anything
> that should be accessed universally (top level).
>
> So, for that script at least, you could say that users don't
> like being surprised by a more linguistically accurate, but
> less universally accessible way of constructing identifiers.
>
> Interesting....

Or it is a policy artifact.

Suppose I have a language (note, not just a script) with well-defined and often-used conventions for representing characters in, e.g., a simplified or less decorated form. I'm almost certainly going to want that form in the DNS (not just in the root, but at other levels as well). That is especially likely to be true if people have been using those simplified forms in the DNS for 20 or 25 years (as is the case for most European use of Latin script), so that "often-used" includes users having gotten used to those forms in Internet contexts.

Now we introduce IDNs, making "linguistically accurate" forms possible. In at least some domains, those IDN forms are registered alongside the simplified (for Latin script, ASCII) ones, either as privileged "variants" or because registering both forms separately is cheap, certain, and efficient.

Then ICANN comes along with IDN rules for the root. For, e.g., Chinese, delegation of both the Simplified and the more decorated Traditional form is straightforward, and the marginal cost of doing so (in dollars and/or aggravation) is almost zero. For Latin script, a decision was made (at least at one point) to ban variants in the root, so getting both the simplified (i.e., ASCII) form and the decorated (and more correct) one requires USD 168K in fees, plus whatever it costs to prepare the application, plus an aggravating review process, plus the possibility that some committee, process, or bureaucrat will decide that the decorated form is confusingly similar to the simplified one and reject the application, causing that (probably USD 200K plus) investment to disappear with no benefit to the applicant.

The situation might be reinforced, quite accidentally, by the observation that acronyms and abbreviations are much more common and accepted (and likely to end up in the DNS) in many languages that use Latin script than in some other languages. Especially for the large subset of those languages in which typical strings are mostly ASCII with a few "decorated" letters, those abbreviations or acronyms are reasonably likely to be ASCII even in their "linguistically accurate" form.

Under those conditions, would you expect a lot of Latin script, "linguistically accurate" registrations?

It would be an interesting experiment to ask Latin script registrants whose own languages make significant use of non-ASCII characters, or whose chosen labels are ASCII adaptations, whether they would register a "linguistically accurate" form if they were offered the option at little or no charge (and after a simple application). Absent that experiment or something equivalent, I think it is hard to infer anything reliably from your observation about the number of such strings now registered.

best,
john