Re: [DNSOP] draft-liman-tld-names-04

Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> Thu, 18 November 2010 12:55 UTC


James Mitchell wrote:

> As a thought, consider names written in the Arabic script.

Arabic? Isn't Latin/French localization already more than enough
to show that IDNs are not operational?

If not...

> Being
> a cursive script, how is a TLD applicant expected to separate
> 'words' in a top level domain without the use of a hyphen or
> equivalent.

> Removing the spaces will cause the characters to join, and the
> meaning lost (besides which the A-label will contain hyphens
> anyway?!).

> I'm far from qualified to talk authoritatively on the Arabic
> script, or how it will be used in domain names, however I do
> know of parties that will be applying for Arabic TLDs - are
> we to exclude these applicants?

As explained in Wikipedia

   http://en.wikipedia.org/wiki/Arabic_alphabet

Arabic letters have four forms, "isolated", "end", "middle"
and "beginning", and the form is determined by the location
of a letter within a word. Thus, they are no different
from the Latin distinction between capital and small letters,
which is determined by the location of a letter within a sentence.

So, it is an extended case insensitivity problem actively
ignored by people working on Unicode.

Moreover, as the Wikipedia entry says:

   For compatibility with previous standards, all these forms can
   be encoded separately in Unicode; however, they can also be
   inferred from their joining context, using the same encoding.
   The following table shows this common encoding, in addition to
   the compatibility encodings for their normally contextual
   forms (Arabic texts should be encoded today using only the
   common encoding, but the rendering must then infer the joining
   types to determine the correct glyph forms, with or without
   ligation).

while Unicode assigns separate code points to the different
forms, extended case insensitivity between the forms of an Arabic
character is not specified at all.
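
To make the two encodings in the quoted passage concrete, here is a
minimal sketch using Python's standard unicodedata module. It takes the
letter BEH as an example (my choice, not one from the thread), prints
the four compatibility code points for its positional forms, and shows
what Unicode compatibility normalization (NFKC) does to them. The helper
names describe and fold_forms are hypothetical. Whether this folding is
the kind of mapping the message has in mind, and how IDN processing
treats these code points, is not something the sketch tries to settle.

```python
import unicodedata

# Common (context-free) encoding of the letter, as recommended in the quote.
BASE_BEH = "\u0628"  # ARABIC LETTER BEH

# Compatibility code points for the four positional forms of BEH
# (Arabic Presentation Forms-B block): isolated, final, initial, medial.
PRESENTATION_FORMS = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def fold_forms(text: str) -> str:
    """Apply NFKC, which replaces compatibility characters by their base."""
    return unicodedata.normalize("NFKC", text)


for form in PRESENTATION_FORMS:
    # Each line shows a positional-form code point and what NFKC maps it to.
    print(describe(form), "->", describe(fold_forms(form)))
```

Canonical normalization (NFC) leaves the four presentation forms
untouched; only the compatibility forms (NFKC/NFKD) fold them.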

The situation is even more confusing than with Latin/French, which
is why I wrote:

   It should be noted that, extended case insensitivities beyond
   European characters, such as correspondence between Chinese
   ones, the problem is even more unsolvable.

It is trivially easy to declare some broken specification to be
IDN 2008 or something like that, but doing so does not make
localized domain names operational.

						Masataka Ohta