Re: [idn] Re: character tables

William Tan <wil@dready.org> Thu, 03 March 2005 06:41 UTC

Message-ID: <4226AF7F.9010205@dready.org>
Date: Thu, 03 Mar 2005 17:32:31 +1100
From: William Tan <wil@dready.org>
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Erik van der Poel <erik@vanderpoel.org>
CC: Cary Karp <ck@nic.museum>, idn@ops.ietf.org
Subject: Re: [idn] Re: character tables
References: <421B8484.3070802@vanderpoel.org> <20050223072837.GA21463~@nicemice.net> <D872CCF059514053ECF8A198@scan.jck.com> <421D8411.9030006@vanderpoel.org> <p06210208be4390618c81@[192.168.0.101]> <421E0D0C.2000309@vanderpoel.org> <p06210202be43c3888991@[192.168.0.101]> <E07CE813AD23B2D95DA0C740@scan.jck.com> <421E30F2.1040408@vanderpoel.org> <0E7F74C71945B923C52211F3@scan.jck.com> <421EA0C9.1010500@vanderpoel.org> <00a401c51af3$7863aae0$030aa8c0@DEWELL> <A574CA1BE87BFDA3C2A1AC0E@scan.jck.com> <421FA55B.9000308@vanderpoel.org> <421FCBD7.8000805@vanderpoel.org> <42227EBF.9040703@vanderpoel.org> <45781B7428C6AA07C3B283BD@scan.jck.com> <42229BBC.8020608@vanderpoel.org> <p0621021ebe484f52c0c5@[10.20.30.249]> <4225ABAB.60002@mozilla.org> <p0621022dbe4ab4b8a3fa@[10.20.30.249]> <42251B80.5050503@vanderpoel.org> <Pine.LNX.4.61.0503020759240.17184@nic.museum> <42261AC2.3020004@vanderpoel.org>
In-Reply-To: <42261AC2.3020004@vanderpoel.org>
X-Enigmail-Version: 0.89.5.0
X-Enigmail-Supports: pgp-inline, pgp-mime
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Sender: owner-idn@ops.ietf.org
Precedence: bulk

> Other communities have other needs. I've been told that some 
> communities use a set of letters that are currently encoded in two 
> different ranges of the Unicode space (e.g. Latin and Cyrillic). 
> Today, my idea is that these communities can "occupy" their "own" part 
> of the DNS space, for example a .tld or a .2ld.tld. 

If by "community" you mean users of a certain language / culture group 
within a geographic region, yes. In fact, they already do. The Japanese 
already "occupy" .jp, Korean .kr, Chinese in PRC .cn, Chinese in Taiwan 
.tw, Chinese in Singapore .sg, etc.

I'm not sure what you are proposing here - do you mean allocating 
new TLDs for each "community"?

> They can publish the rules that they enforce in their registries, ...

They already do. The rules are just not in a machine-readable format, 
and John has already made the case against standardizing language 
tables, let alone other rules that may not be character-based (imagine 
the .th registry saying: allow both Thai and Latin digits, but not both 
in the same label/domain).
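
To make the .th example concrete, here is a rough sketch of what such a 
non-character-based rule looks like in code. It is purely illustrative: 
the function names and the per-label/per-domain switch are my own 
assumptions, not anything a registry publishes.

```python
# Hypothetical sketch: these names do not come from any real registry policy.
THAI_DIGITS = {chr(cp) for cp in range(0x0E50, 0x0E5A)}  # U+0E50..U+0E59
LATIN_DIGITS = set("0123456789")


def mixes_digit_sets(text: str) -> bool:
    """Return True if text contains both Thai and Latin (ASCII) digits."""
    chars = set(text)
    return bool(chars & THAI_DIGITS) and bool(chars & LATIN_DIGITS)


def violates_digit_rule(domain: str, per_label: bool = True) -> bool:
    """Apply the rule to each label, or to the whole domain name."""
    if per_label:
        return any(mixes_digit_sets(label) for label in domain.split("."))
    return mixes_digit_sets(domain)
```

Each digit is individually allowed, so a plain table of permitted 
characters cannot express this rule; it is a constraint on combinations.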

> and then the browsers can either allow any character sequence in those 
> labels or check them to see if the rules were indeed followed.

I'd vote against browsers trying to enforce rules set by various 
registries. This is the sort of thing you'd build into a specialized 
tool (as you mentioned) but not into a general-purpose application.

>
> Of course, it is much harder to come up with and enforce rules in a 
> "global" TLD like .com.

Don't forget countries that choose to honour multiple cultures within 
their society - .PL allows many different tables, including Cyrillic 
(but does not allow mixing of Cyrillic and Latin scripts; see Andrzej 
Bartosiewicz's draft).
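
A no-mixing rule of this kind could be sketched as below. This is only an 
approximation and not the procedure from the draft: Python's standard 
library does not expose the Unicode Script property, so the sketch 
guesses the script from the character name prefix, and the function 
names are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Collect which of LATIN / CYRILLIC appear in the label.

    Assumption: the Unicode character name prefix is used as a stand-in
    for the real Script property. Digits and hyphens match neither.
    """
    found = set()
    for ch in label:
        name = unicodedata.name(ch, "")
        for script in ("LATIN", "CYRILLIC"):
            if name.startswith(script):
                found.add(script)
    return found


def mixes_latin_and_cyrillic(label: str) -> bool:
    """Return True if the label contains both Latin and Cyrillic letters."""
    return scripts_in(label) == {"LATIN", "CYRILLIC"}
```

For instance, a label spelled "paypal" with U+0430 (Cyrillic a) in place 
of the first Latin "a" is flagged, while an all-Latin or all-Cyrillic 
label passes.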

> As a result, the browsers may simply blacklist .com in its entirety.

It looks like a reasonable interim solution, but I'm worried about 
whether .com can actually get off the list. Unlike with a DNSBL, if the 
list is hardcoded or statically included in the installation package, 
it's going to be difficult to get off that list.

Come to think of it, as an off-IETF solution, maintaining an IDNBL of 
sorts might be a good idea. I know it didn't really work for mail 
abuse, but the landscape is quite different for the problem at hand. 
The IDNBL can ban an entire zone on the grounds that the zone 
administrator has been shown to be negligent (".com"), or ban individual 
domains of known phishers, and can even be used to implement character 
blacklists instead of having them hard-wired in the browser.
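
To show the three kinds of entries side by side, here is a minimal 
in-memory sketch. Everything in it is hypothetical (the class name, the 
verdict strings, the order of checks); a real IDNBL would presumably be 
queried over the network like a DNSBL, which this does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class IdnBlocklist:
    """Hypothetical IDNBL holding zone, domain and character bans."""
    banned_zones: set = field(default_factory=set)    # e.g. {"com"}
    banned_domains: set = field(default_factory=set)  # known phishers
    banned_chars: set = field(default_factory=set)    # character blacklist

    def verdict(self, domain: str) -> str:
        name = domain.lower().rstrip(".")
        if name in self.banned_domains:
            return "banned-domain"
        labels = name.split(".")
        # Check the name itself and every parent zone against zone bans.
        for i in range(len(labels)):
            if ".".join(labels[i:]) in self.banned_zones:
                return "banned-zone"
        if any(ch in self.banned_chars for ch in name):
            return "banned-char"
        return "ok"
```

Because the list is data rather than code, an entry such as "com" could 
be dropped by the list maintainer without shipping a new browser release.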

> Or maybe .com will eventually figure out some rules and actually 
> enforce them in the 2LDs, so that the browsers don't have to check the 
> 2LDs. 

I'm hopeful that this will happen.

> Indeed, in a perfect world, .com would even enforce rules in 3LDs, 
> 4LDs, etc, so that browsers would not have to check those either. 

It's not enforceable at the 3LD level and beyond. Period.


wil.