[idn] IDN spoofing

Erik van der Poel <erik@vanderpoel.org> Fri, 18 February 2005 13:48 UTC

Message-ID: <4215F099.9020002@vanderpoel.org>
Date: Fri, 18 Feb 2005 05:41:45 -0800
From: Erik van der Poel <erik@vanderpoel.org>
To: unicode@unicode.org, idn@ops.ietf.org
Subject: [idn] IDN spoofing

All,

This email is being sent to both the Unicode and IDN mailing lists. I'm 
wondering how we can move forward with the IDN spoofing issue. Let me 
take a stab at it.

Regarding the proposal to unify (or map) all the homographs, Doug Ewell 
wrote a humorous email illustrating how difficult such an effort would be:

http://ops.ietf.org/lists/idn/idn.2002/msg00498.html

John Klensin says that a "one label, one language" rule has been 
suggested to combat look-alike confusion. See section 1.5.1 in:

http://www.ietf.org/internet-drafts/draft-klensin-reg-guidelines-06.txt

Indeed, this label-based idea makes sense because DNS administration is 
delegated label by label. For example, the .com operator might be able 
to impose some restrictions on the 2nd level domain label, but if 
someone registers foo.com, then it's up to them to decide what will be 
allowed at the 3rd level (e.g. bar.foo.com). No?
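
To make the per-label idea concrete, here is a minimal Python sketch in which each label is checked only against the policy of the zone directly above it. The policy functions, the POLICIES table and the check_name helper are all hypothetical names invented for illustration; no registry works exactly this way.

```python
def ascii_only(label):
    """Hypothetical policy: accept only ASCII code points."""
    return all(ord(ch) < 128 for ch in label)

def anything_goes(label):
    """Hypothetical policy: the zone owner imposes no restriction."""
    return True

# Assumed example policies, keyed by the parent zone that sets them.
POLICIES = {"com": ascii_only, "foo.com": anything_goes}

def check_name(name, policies=POLICIES):
    """Return (label, parent_zone, verdict) for every label below the TLD.

    verdict is True/False from the parent's policy, or None when the
    parent zone has no known policy.
    """
    labels = name.rstrip(".").lower().split(".")
    verdicts = []
    for i in range(len(labels) - 1):
        parent = ".".join(labels[i + 1:])
        policy = policies.get(parent)
        verdict = policy(labels[i]) if policy else None
        verdicts.append((labels[i], parent, verdict))
    return verdicts
```

In this sketch the .com policy never sees the label "bar" in bar.foo.com; only the foo.com policy does, which is the administrative split described above.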

Recent discussion on the IDN mailing list has suggested that we might 
want to think more in terms of *script* than language. However, I note 
that there is a very diverse history of mixing scripts:

http://ptolemy.tlg.uci.edu/~opoudjis/unicode/unicode_mixing.html

But do we really need to allow for such rich script mixing in DNS? Some 
of the script mixing described in the document above is scholarly 
transliteration or "one-offs".

So, instead, I propose that we start thinking of a "one label, one 
writing system" rule. The Unicode book defines "writing system" as "a 
set of rules for using one or more scripts to write a particular language".

This makes a lot of sense for some of the ccTLDs. For example, the .jp 
domain could choose to allow the Japanese writing system in the 2nd 
level domain label.
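
As a rough illustration of why "writing system" may be a more useful unit than "script", the Python sketch below lists the scripts that appear in a label. Python's standard library has no script property, so the first word of each character's Unicode name is used as a crude stand-in; that heuristic, and the scripts_in_label name, are my assumptions and not a real script lookup.

```python
import unicodedata

# Name prefixes treated as script-neutral in this sketch (digits and hyphen).
NEUTRAL = {"DIGIT", "HYPHEN-MINUS"}

def scripts_in_label(label):
    """Approximate the set of scripts in a label.

    Uses the first word of the Unicode character name (e.g. "LATIN",
    "CYRILLIC", "HIRAGANA", "CJK") as a rough proxy for the script.
    """
    found = set()
    for ch in label:
        prefix = unicodedata.name(ch, "UNKNOWN").split()[0]
        if prefix not in NEUTRAL:
            found.add(prefix)
    return found
```

With this heuristic, a label mixing Latin and Cyrillic letters reports two scripts, but so does an ordinary Japanese label that combines kanji and hiragana. A strict "one script" rule would reject both, whereas a writing-system rule could accept the Japanese one.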

But what can we do about .com? It's clearly a worldwide TLD now. It 
should probably allow multiple writing systems. Perhaps the .com 
operator could specify that 2nd level domain labels must stick to one 
writing system, and that that writing system must be indicated in the 
RRP (Registry Registrar Protocol) in order to validate the 2nd level 
name against the table of characters allowed in that writing system.
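
The validation step could look something like the Python sketch below: the registrar declares a writing system, and the registry checks every character of the label against that system's table. The identifiers "en", "ru" and "ja" and the code point ranges are illustrative assumptions only, not actual registry tables, and nothing here reflects real RRP syntax.

```python
# Hypothetical tables: inclusive code point ranges per writing system.
COMMON = [(0x30, 0x39), (0x2D, 0x2D)]  # ASCII digits and hyphen

WRITING_SYSTEMS = {
    "en": COMMON + [(0x61, 0x7A)],                        # a-z
    "ru": COMMON + [(0x0430, 0x044F), (0x0451, 0x0451)],  # Cyrillic lowercase
    "ja": COMMON + [(0x3041, 0x3096),                     # hiragana
                    (0x30A1, 0x30FA),                     # katakana
                    (0x4E00, 0x9FFF)],                    # CJK ideographs
}

def label_allowed(label, writing_system):
    """Return True if every character of the lowercased label falls in
    the table declared for writing_system; unknown systems and empty
    labels are rejected."""
    ranges = WRITING_SYSTEMS.get(writing_system)
    if ranges is None or not label:
        return False
    return all(
        any(lo <= ord(ch) <= hi for lo, hi in ranges)
        for ch in label.lower()
    )
```

Under these assumed tables, a label that mixes Latin and Cyrillic letters fails no matter which writing system is declared. A label written entirely in look-alike characters from a single table would still pass, so the rule narrows the problem without eliminating it.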

This would probably require a (new?) set of names for writing systems, 
somewhat similar to the language codes of ISO 639.

Some people might point out that it is unfair to impose a writing system 
rule on domain labels since DNS has not had such restrictions in the 
past. Or has it? The DNS spec itself may allow various octet values, but 
the infrastructure and conventions appear to be restricted to a subset 
of ASCII (letters, digits and hyphen), which I guess you could just call 
the English writing system, no?

Also, I'm guessing that any "one label, one writing system" rule cannot 
really be mandated, since TLD operators have historically been free to 
do whatever they want, to make as much money as they want. So this rule 
would just be a guideline (Klensin's document is titled "Suggested 
Practices ...") and the TLD operators could follow it, if they wish to 
combat the IDN spoofing problem more than they wish to make money (in 
the short term :-)

Erik