[idn] Re: character tables

Cary Karp <ck@nic.museum> Thu, 03 March 2005 09:07 UTC

Date: Thu, 03 Mar 2005 10:03:34 +0100
From: Cary Karp <ck@nic.museum>
To: idn@ops.ietf.org
Subject: [idn] Re: character tables

Quoting Erik van der Poel:

> I've been told that some communities use a set of letters that are 
> currently encoded in two different ranges of the Unicode space 
> (e.g. Latin and Cyrillic). Today, my idea is that these 
> communities can "occupy" their "own" part of the DNS space, for 
> example a .tld or a .2ld.tld.

The community occupying 2ld.tld doesn't write the rules that 
determine the character repertoire available for use in .tld, and 
therefore cannot necessarily represent even its own name as it 
might ideally prefer. The 2ld.tld folks do get to make the 
corresponding decision for 3ld.2ld.tld (if permitted in .tld 
policy). In the reasonably commonplace situation where all 
subdomains under 2ld.tld are operated by a single entity, coherent 
rules can be applied throughout. This situation is, however, by no 
means the only one that pertains, and it certainly does not apply to 
the point of delegation under a TLD.

Most people would probably use the term 'language' to designate the 
attribute of community identity that is expressed by its use of a 
certain set of letters. A community wishing to project that identity 
into the domain namespace will therefore need either to locate a 
parent domain that accepts the registration of names including the 
needed characters, or convince what would otherwise be the most 
desirable parent to implement that support. Languages are, however, 
frequently shared by numerous communities without any other aspect 
of shared identity, and identical sets of letters often appear in 
more than one language. (This is one of the reasons why gTLDs so 
prominently appear in the present discussion.)

> I have an idea for the Guidelines. As Paul has indicated, the 
> various communities around the world have different needs, and 
> some have already started writing down the rules that they are 
> following in their registries. The JET community comes to mind:

Both the JET and the ICANN guidelines are intended to assist TLD 
operators in establishing safe and responsible IDN policies that 
will prove useful to the broadest range of nameholder communities. 
The JET action addresses the needs of three languages but I doubt 
that the people who use those languages perceive the slightest 
additional sense of 'JET community'. And, as an uncomfortable matter 
of fact, the extent to which the genuinely excellent JET Guidelines 
will be accepted by the target language groups remains to be 
determined.

> Of course, it is much harder to come up with and enforce rules in 
> a "global" TLD like .com

One might think the situation to be straightforward in a ccTLD, but 
there are clear current trends in ccTLD policy development toward 
removing the entry-level requirement for national nexus, and 
permitting the use of more extensive character repertoires than 
would normally be associated with the nominal TLD designation. It is 
by no means uncommon for a country to have more than one official 
language and an even larger number of officially recognized minority 
languages. It is also common for countries to belong to 
multinational alliances, with member states recognizing all of the 
languages used within that union. All this needs to be reflected in 
a ccTLD's IDN policies, which will often require every bit as artful 
juggling of a range of scripts and languages as would be needed in a 
gTLD, with a heavy further amount of political intricacy that a gTLD 
might be able to avoid.

It is true, nonetheless, that a ccTLD operator will generally be in 
a better position to produce an authoritative statement of the 
character repertoire necessary for the IDN representation of one of 
'its' languages, than would the operator of a gTLD serving the same 
language community. For this reason, many gTLDs introduce IDN 
support for a given language only after a ccTLD clearly associated 
with that language has described its requirements, or after a 
similar statement has been produced by some other obviously 
authoritative group.

Please also keep in mind that the geographic perimeters within 
which many languages are used do not coincide with national 
boundaries, and that many communities do not associate their 
language identities with the national identities of the countries in 
which they reside. IDN provides unprecedented means for such things 
as allowing a diaspora to maintain its sense of cultural cohesion, or 
furthering the cause of a group struggling to have its language 
officially recognized or attempting to reverse threats to its 
survival. In such contexts the national implication of a cc label 
may be undesired, which is another of the reasons why gTLDs so 
prominently appear in the present discussion.

> They can publish the rules that they enforce in their registries, 
> and then the browsers can either allow any character sequence in 
> those labels or check them to see if the rules were indeed 
> followed.

I am grateful that my only headache in this regard is anticipating 
the policy and technical requirements for supporting the thousand or 
so languages that some segment of the museum community may sooner or 
later express interest in representing via IDN in .museum. The same 
repertoire may also appear elsewhere in the TLD space and I 
certainly don't envy the people who intend to devise and implement 
the algorithmic underpinnings for the automation of that process or
the validation of its results :-)
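
The rule check proposed in the quoted text above can be sketched
roughly as follows. This is a minimal illustration in Python, not
anything specified by the JET or ICANN guidelines: the zone names, the
contents of PUBLISHED_TABLES, and both function names are hypothetical,
and NFKC normalization plus lowercasing is only a crude stand-in for
the Nameprep step that IDNA applies before comparison.

```python
import unicodedata

ASCII_LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")

# Hypothetical published repertoires, keyed by zone (illustrative only).
PUBLISHED_TABLES = {
    # Latin letters plus a-ring, a-diaeresis, o-diaeresis.
    "example-latin.tld": ASCII_LDH | set("\u00e5\u00e4\u00f6"),
    # A community whose letters span the Latin and Cyrillic ranges.
    "example-mixed.tld": ASCII_LDH | set("\u0436\u0448\u0449"),
}


def disallowed_characters(label, zone):
    """Return the characters of `label` that are not in the zone's table."""
    table = PUBLISHED_TABLES[zone]
    # Assumption: NFKC + lowercasing approximates Nameprep for this sketch.
    normalized = unicodedata.normalize("NFKC", label).lower()
    return [ch for ch in normalized if ch not in table]


def label_follows_rules(label, zone):
    """True if every character of `label` is in the zone's published table."""
    return not disallowed_characters(label, zone)
```

With these made-up tables, "museum" passes under example-latin.tld,
while the same label with U+0435 (Cyrillic "ie") in place of the Latin
"e" is rejected there. A plain set lookup of this kind says nothing
about variants or context-dependent rules, which real tables may also
need to express.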

/Cary