Re: [Lucid] FW: [mark@macchiato.com: Re: Non-normalizable diacritics - new property]

Shawn Steele <Shawn.Steele@microsoft.com> Thu, 19 March 2015 10:26 UTC

From: Shawn Steele <Shawn.Steele@microsoft.com>
To: John C Klensin <john-ietf@jck.com>, Asmus Freytag <asmusf@ix.netcom.com>, "lucid@ietf.org" <lucid@ietf.org>
Date: Thu, 19 Mar 2015 10:26:32 +0000

I agree with John on this one... there's a lot of momentum behind Latin, IDN is really new, and there are still systems that don't handle it (so I "need" to have a pure ASCII name anyway).  Businesses could register alternates, but I'm always amazed at the simple ASCII-only variations that they don't bother to register/protect/claim.

I think it'll take the non-Latin scripts really taking off to kick Latin into gear.  The other scripts have a much bigger incentive to make sure their systems work with IDN.  They'll drag the Latin crowd along, but it's going to take a bit.

-Shawn

-----Original Message-----
From: Lucid [mailto:lucid-bounces@ietf.org] On Behalf Of John C Klensin
Sent: Thursday, March 19, 2015 1:48 AM
To: Asmus Freytag; lucid@ietf.org
Subject: Re: [Lucid] FW: [mark@macchiato.com: Re: Non-normalizable diacritics - new property]



--On Wednesday, March 18, 2015 21:21 -0700 Asmus Freytag <asmusf@ix.netcom.com> wrote:

> As a btw:
> 
> I'm amazed at the near total lack of IDNs registered for the Latin 
> script in the root. It seems that people like the "fall-back" nature 
> of non-accented ASCII labels for anything that should be accessed 
> universally (top level).
> 
> So, for that script at least, you could say that users don't like 
> being surprised by a more linguistically accurate, but less 
> universally accessible way of constructing identifiers.
> 
> Interesting....

Or it is a policy artifact.

Suppose I have a language (note, not just a script) with well-defined and often-used conventions for representing characters in, e.g., a simplified or less decorated form.  I'm almost certainly going to want that form in the DNS (not just the root, but at other levels as well).  That is especially likely to be true if people have been using those simplified forms in the DNS for 20 or 25 years (as is the case for most European use of Latin script), so that "often-used" includes users having grown accustomed to those forms in Internet contexts.

Now we introduce IDNs, making "linguistically accurate" forms possible.  In at least some domains, those IDN forms are registered alongside the simplified (for Latin script, ASCII) ones, either as privileged "variants" or because registration of both forms separately is cheap, certain, and efficient.

Then ICANN comes along with IDN rules for the root.  For, e.g., Chinese, delegation of both the Simplified and the more decorated Traditional form is straightforward, and the marginal cost of doing so (in dollars and/or aggravation) is almost zero.  For Latin script, a decision was made (at least at one point) to ban variants in the root, so the only way to get both the simplified (i.e., ASCII) form and the decorated (and more correct) one is to pay USD 168K in fees, plus whatever it costs to prepare the application, plus an aggravating review process, plus the risk that some committee, process, or bureaucrat will decide that the decorated form is confusingly similar to the simplified one and reject the application, causing that investment (probably USD 200K plus) to disappear with no benefit to the applicant.

The situation might be quite accidentally reinforced by the observation that acronyms and abbreviations are much more common and accepted (and likely to end up in the DNS) in many languages that use Latin script than in some other languages.
Especially for the large subset of those languages in which typical strings are mostly ASCII with a few "decorated" letters, those abbreviations or acronyms are reasonably likely to be pure ASCII even in their "linguistically accurate" form.

Under those conditions, would you expect a lot of Latin script, "linguistically accurate", registrations?

It would be an interesting experiment to ask Latin-script registrants whose own languages make significant use of non-ASCII characters, or whose chosen labels are an ASCII adaptation, whether they would register a "linguistically accurate" form if they were offered the option at little or no charge (and after a simple application).  Absent that experiment or something equivalent, I think it is hard to infer anything reliably from your observation about the number of such strings now registered.

best,
    john



_______________________________________________
Lucid mailing list
Lucid@ietf.org
https://www.ietf.org/mailman/listinfo/lucid