Re: [DNSOP] clue w.r.t. arabic

Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> Fri, 19 November 2010 07:43 UTC

Eric Brunner-Williams wrote:

>> Anyway, Arabic strings are examples of exponential explosions
>> with large coefficients a lot easier to understand for most
>> of you than Chinese ones.
> 
> the density of variant (context dependent) characters in arabic script, 
> whether sampled as text, or sampled as domain names, is sparse,

Meaningful variations may well be sparse, but meaningful
capitalization of Latin is also sparse. That is, the meaningful
capitalizations of "mydomain" would be

	myDomain
	Mydomain
	MyDomain
	MYDOMAIN

However, meaningless capitalization such as:

	mYdOmAiN

should also be protected, and the same should hold for Arabic.
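
To make the count concrete, here is a minimal sketch that enumerates
every capitalization of a label. The function name case_variants is
hypothetical, chosen only for illustration, and the sketch assumes
plain ASCII letters.

```python
from itertools import product

def case_variants(label):
    """Return every upper/lower-case spelling of an ASCII label."""
    # Each letter contributes two choices; non-letters contribute one.
    choices = [sorted({ch.lower(), ch.upper()}) for ch in label]
    return {"".join(combo) for combo in product(*choices)}

variants = case_variants("mydomain")
print(len(variants))           # 2**8 = 256 spellings for 8 letters
print("mYdOmAiN" in variants)  # True: meaningless, but still a variant
```

Only a handful of the 256 spellings are meaningful, yet all of them
have to resolve to the same name.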

One form of protection could be to reject the domain registration.
But even that requires a document giving a complete and unambiguous
definition of the extended case insensitivities.

And the definition used for TLDs must be an international one.

So the requirement for a complete and unambiguous specification of
case insensitivity or canonicalization is the same.
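
As a rough sketch of protection by rejection, a registry could compare
canonical forms instead of enumerating variants. The names canonicalize
and Registry are hypothetical, and the canonicalization here is ASCII
case folding only. The extended rules for Arabic or Chinese are exactly
what would have to be specified, so they are not attempted.

```python
def canonicalize(label):
    """Fold ASCII upper-case letters to lower case (assumption: ASCII rule only)."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in label
    )

class Registry:
    """Hypothetical registry that rejects labels equal under canonicalization."""

    def __init__(self):
        self._taken = {}  # canonical form -> label as first registered

    def register(self, label):
        key = canonicalize(label)
        if key in self._taken:
            return False  # some variant is already registered
        self._taken[key] = label
        return True

registry = Registry()
print(registry.register("MyDomain"))  # True: first registration
print(registry.register("mYdOmAiN"))  # False: same canonical form
```

The check costs one lookup per registration, but only because
canonicalize is completely defined for every input.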

> relative
> to the density of "variant" characters in the (unified) han script(s), 
> which is not quite 2^^n,

You seem to think that "variant" for Chinese means simplified
versus complex. But there are other types of "variant" Chinese
characters. It's somewhat like plain 'C' without a cedilla and
'C' with a cedilla, which sometimes (depending on locale
information) must be treated as identical characters.

> their answers were as i expected, and fail to support a
> "Arabic strings are examples of exponential explosions with
> large coefficients" claim.

They should, naturally, reject meaningless combinations.

						Masataka Ohta