Re: [Lucid] FW: [mark@macchiato.com: Re: Non-normalizable diacritics - new property]

John C Klensin <john-ietf@jck.com> Thu, 19 March 2015 08:48 UTC

Date: Thu, 19 Mar 2015 04:47:58 -0400
From: John C Klensin <john-ietf@jck.com>
To: Asmus Freytag <asmusf@ix.netcom.com>, lucid@ietf.org
Message-ID: <C884797C1998868E85D84343@JcK-HP8200.jck.com>
In-Reply-To: <550A4EC6.3090203@ix.netcom.com>
References: <20150311013300.GC12479@dyn.com> <CA+9kkMDZW9yPtDxtLTfY1=VS6itvHtXHF1qdZKtXdwwORwqnew@mail.gmail.com> <55008F97.8040701@ix.netcom.com> <CA+9kkMAcgSA1Ch0B9W1Np0LMn2udegZ=AzU1b26dAi+SDcbGgg@mail.gmail.com> <CY1PR0301MB07310C68F6CFDD46AE22086F82190@CY1PR0301MB0731.namprd03.prod.outlook.com> <20150311200941.GV15037@mx1.yitter.info> <CY1PR0301MB0731F4EBE5EB5C3340F7059282190@CY1PR0301MB0731.namprd03.prod.outlook.com> <20150319014018.GI5743@mx1.yitter.info> <BLUPR03MB1378184CE32E928A3086665582010@BLUPR03MB1378.namprd03.prod.outlook.com> <20150319023029.GA6046@mx1.yitter.info> <550A4EC6.3090203@ix.netcom.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/lucid/tz_xzvBCy3o_2foj-NPmHe7LbZU>
Subject: Re: [Lucid] FW: [mark@macchiato.com: Re: Non-normalizable diacritics - new property]


--On Wednesday, March 18, 2015 21:21 -0700 Asmus Freytag
<asmusf@ix.netcom.com> wrote:

> As  a btw:
> 
> I'm amazed at the near total lack of IDNs registered for the
> Latin script in the root. It seems that people like the
> "fall-back" nature of non-accented ASCII labels for anything
> that should be accessed universally (top level).
> 
> So, for that script at least, you could say that users don't
> like being surprised by a more linguistically accurate, but
> less universally accessible way of constructing identifiers.
> 
> Interesting....

Or it is a policy artifact.

Suppose I have a language (note: a language, not just a script) with
well-defined and often-used conventions for representing
characters in, e.g., a simplified or less decorated form.  I'm
almost certainly going to want that form in the DNS (not just
the root, but at other levels as well).  That is especially
likely to be true if people have been using those simplified
forms in the DNS for 20 or 25 years (as is the case for most
European use of Latin script), so that "often-used" includes
users getting used to those forms in Internet contexts.

Now we introduce IDNs, making "linguistically accurate" forms
possible.  In at least some domains, those IDN forms are
registered alongside the simplified (for Latin script, ASCII)
ones, either as privileged "variants" or because registration of
both forms separately is cheap, certain, and efficient.
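To make concrete why the simplified and decorated forms are separate things in the DNS (each has to be registered or delegated on its own unless a variant mechanism ties them together), here is a minimal Python sketch. It assumes a deliberately naive "simplification" that drops combining marks after NFD decomposition; real language conventions (e.g., German writing u-umlaut as "ue") are not captured by it. The helper names `simplify` and `a_label` are hypothetical, and Python's built-in "idna" codec implements IDNA 2003 (RFC 3490), not IDNA 2008.

```python
import unicodedata


def simplify(label: str) -> str:
    """Hypothetical, naive simplification: decompose to NFD and drop
    nonspacing combining marks (category Mn), then recompose to NFC.
    This is an assumption for illustration, not any language's rule."""
    decomposed = unicodedata.normalize("NFD", label)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def a_label(label: str) -> str:
    """ASCII-compatible form of a single label via Python's built-in
    "idna" codec (IDNA 2003 / Punycode), which is what would actually
    appear in the DNS."""
    return label.encode("idna").decode("ascii")


decorated = "caf\u00e9"          # "cafe" with e-acute
plain = simplify(decorated)

print(plain)                 # cafe
print(a_label(decorated))    # xn--caf-dma
print(a_label(plain))        # cafe
```

The two A-labels share nothing at the DNS level, so owning one says nothing about who may hold the other; that linkage has to come from registry policy (bundled variants, or a separate registration of each form).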

Then ICANN comes along with IDN rules for the root.  For, e.g.,
Chinese, delegation of both the Simplified and more decorated
Traditional form is straightforward and the marginal cost of
doing so (in dollars and/or aggravation) is almost zero.  For
Latin script, a decision was made (at least at one point) to ban
variants in the root, so the only way to get both the simplified
(i.e., ASCII) form and the decorated (and more correct) one
costs USD 168K in fees, plus whatever it costs to prepare the
application, plus an aggravating review process, plus the
possibility that some committee, process, or bureaucrat will
decide that the decorated form is confusingly similar to the
simplified one and reject the application, causing that (probably
USD 200K-plus) investment to disappear with no benefit to the
applicant.

The situation might be quite accidentally reinforced by the
observation that acronyms and abbreviations are much more common
and accepted (and likely to end up in the DNS) in many languages
that use Latin script than with some other languages.
Especially for the large subset of those languages for which
typical strings are mostly ASCII with a few "decorated" letters,
those abbreviations or acronyms are reasonably likely to be
ASCII in "linguistically accurate" form.

Under those conditions, would you expect a lot of Latin script,
"linguistically accurate", registrations?

It would be an interesting experiment to ask Latin script
registrants whose own languages make significant use of
non-ASCII characters, or whose chosen labels are ASCII
adaptations, whether they would register a "linguistically
accurate" form if they were offered the option of doing so at
little or no charge (and after a simple application).  Absent that
experiment or something equivalent, I
think it is hard to infer anything reliably from your
observation about the number of such strings now registered.

best,
    john