[idn] dichotomies

Erik van der Poel <erik@vanderpoel.org> Sun, 27 February 2005 19:15 UTC

Received: from psg.com (mailnull@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA23276 for <idn-archive@lists.ietf.org>; Sun, 27 Feb 2005 14:15:15 -0500 (EST)
Received: from majordom by psg.com with local (Exim 4.44 (FreeBSD)) id 1D5Tma-0009RK-7x for idn-data@psg.com; Sun, 27 Feb 2005 19:08:48 +0000
Received: from [207.115.63.77] (helo=pimout1-ext.prodigy.net) by psg.com with esmtp (Exim 4.44 (FreeBSD)) id 1D5TmY-0009Qx-Ta for idn@ops.ietf.org; Sun, 27 Feb 2005 19:08:47 +0000
Received: from [10.1.1.2] (adsl-64-174-147-206.dsl.sntc01.pacbell.net [64.174.147.206]) by pimout1-ext.prodigy.net (8.12.10 milter /8.12.10) with ESMTP id j1RJ8eSJ218144; Sun, 27 Feb 2005 14:08:40 -0500
Message-ID: <42221AB7.9070000@vanderpoel.org>
Date: Sun, 27 Feb 2005 11:08:39 -0800
From: Erik van der Poel <erik@vanderpoel.org>
User-Agent: Mozilla Thunderbird 1.0 (X11/20041206)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: IETF idn working group <idn@ops.ietf.org>
Subject: [idn] dichotomies
References: <D872CCF059514053ECF8A198@scan.jck.com> <421D8411.9030006@vanderpoel.org> <p06210208be4390618c81@[192.168.0.101]> <421E0D0C.2000309@vanderpoel.org> <p06210202be43c3888991@[192.168.0.101]> <E07CE813AD23B2D95DA0C740@scan.jck.com> <421E30F2.1040408@vanderpoel.org> <0E7F74C71945B923C52211F3@scan.jck.com> <421EA0C9.1010500@vanderpoel.org> <00a401c51af3$7863aae0$030aa8c0@DEWELL> <20050226081913.GD14956~@nicemice.net>
In-Reply-To: <20050226081913.GD14956~@nicemice.net>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on psg.com
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.0.1
Sender: owner-idn@ops.ietf.org
Precedence: bulk

Adam M. Costello wrote:
> That (or something very similar) was a principle that went into the
> IDNA spec.  I personally was inclined to define both internationalized
> domain names and internationalized host names, where the former would
> be completely general (allowing *all* Unicode characters, even the
> invisible ones), and the latter would be much narrower (excluding most
> punctuation and symbols).  This would be an analogy to traditional
> domain names (which allow all ASCII characters, even control characters)
> and traditional host names (which allow only the ASCII letters, digits,
> and one punctuation mark, the hyphen-minus).
> 
> On the other hand, there was an argument that the traditional
> distinction between domain names and host names was the source of
> endless confusion and debate, and was a mistake that should not be
> repeated with IDNs.  I have some sympathy for that argument.
> 
> In any case, we ended up with just one set of non-ASCII characters for
> IDNs, between the two extremes: only invisible characters are excluded.
> (I think there's one exception--a visible space character that is also
> excluded).
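
For reference, the traditional host-name rule mentioned in the quote can be sketched in Python. This is a minimal sketch, assuming the RFC 952 syntax as relaxed by RFC 1123 (a label may start with a digit) plus the 63-character label limit; the function names are hypothetical, and the "domain name" check only models the quoted "all ASCII characters" rule, not every octet the DNS wire format permits.

```python
import re

# Traditional host-name ("LDH") label: ASCII letters, digits and
# hyphen-minus, 1 to 63 characters, no leading or trailing hyphen.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_host_name_label(label: str) -> bool:
    return _LDH_LABEL.fullmatch(label) is not None


def is_domain_name_label(label: str) -> bool:
    # Models the quoted rule only: any ASCII character, even control
    # characters, within the 63-character label limit.
    return 1 <= len(label) <= 63 and label.isascii()


for label in ["example", "3com", "-abc", "a_b", "a\x07b"]:
    print(ascii(label), is_host_name_label(label), is_domain_name_label(label))
```

The gap between the two functions is the "endless confusion" the quote refers to: "a_b" is a legal domain-name label but not a legal host-name label.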

Another bifurcation that could be considered somewhat analogous is that
of http vs https. We might even want to consider bringing the topic of
security into the ACE prefix discussion. One could imagine a world where
two different ACE prefixes co-exist: a new prefix for "secure" domain
labels, and the other (old) prefix for less secure labels. The secure
prefix would have similar encoding and decoding rules, but would not have
the sometimes-confusing mappings currently found in nameprep, and would
prohibit a rather large number of Unicode characters and/or character
types (some of which could be allowed later, leaving room for future
expansion).
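
To make the nameprep point concrete, here is a minimal Python sketch of one possible reading of the "no mappings" rule: rather than silently mapping a label, reject any label that nameprep would change. It relies on CPython's `encodings.idna.nameprep` (the RFC 3491 implementation shipped with the standard library); the function name `acceptable_without_mapping` is hypothetical, and the large set of additional prohibited characters is not modeled.

```python
from encodings.idna import nameprep  # CPython's RFC 3491 Nameprep profile


def acceptable_without_mapping(label: str) -> bool:
    """Hypothetical "secure prefix" rule: accept a label only if Nameprep
    leaves it exactly as typed, instead of letting Nameprep map it."""
    try:
        return nameprep(label) == label
    except UnicodeError:
        # Nameprep raises for prohibited characters and bidi violations.
        return False


# plain ASCII, upper case, soft hyphen, fullwidth 'p', Cyrillic 'a'
for candidate in ["paypal", "PayPal", "pay\u00adpal", "\uff50aypal", "p\u0430ypal"]:
    print(ascii(candidate), acceptable_without_mapping(candidate))
```

Under this rule "PayPal" (case folding), the soft hyphen (mapped to nothing) and the fullwidth U+FF50 (NFKC) are all rejected, but "p\u0430ypal" with a Cyrillic U+0430 passes unchanged. Dropping the mappings alone therefore does nothing about homographs, which is why the proposal also prohibits many characters.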

We might then choose "xn--s" as the prefix, so that the raw Punycode
form would also be more secure: there would be an 's' next to whatever
follows, rather than a hyphen, which looks more like a delimiter. For
example, xn--spypal-4ve instead of xn--pypal-4ve (the ACE form of
"paypal" with its first 'a' replaced by a Cyrillic 'a'). Note that spypal
looks quite different from pypal. Of course, this example isn't very
good, since the beginning of pypal doesn't resemble the beginning of
paypal. A better example would be one where the second 'a' of paypal was
a homograph.
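
The two prefixes can be compared with Python's built-in "punycode" codec, which performs raw RFC 3492 encoding without nameprep. This is a minimal sketch, assuming "xn--s" as the hypothetical secure prefix and labels that contain at least one non-ASCII character; `ace_label` and `decode_ace_label` are illustrative names, not an existing API.

```python
# Raw Punycode (RFC 3492) through Python's built-in "punycode" codec.
# Unlike the "idna" codec, this applies no Nameprep mapping.
OLD_PREFIX = "xn--"
SECURE_PREFIX = "xn--s"  # hypothetical prefix floated in this message


def ace_label(label: str, prefix: str) -> str:
    """Encode a label that contains at least one non-ASCII character."""
    return prefix + label.encode("punycode").decode("ascii")


def decode_ace_label(ace: str, prefix: str) -> str:
    # The caller has to say which prefix applies: an old-prefix label whose
    # Punycode happens to start with "s" also begins with "xn--s".
    if not ace.lower().startswith(prefix):
        raise ValueError(f"{ace!r} does not start with {prefix!r}")
    return ace[len(prefix):].encode("ascii").decode("punycode")


first_a = "p\u0430ypal"   # U+0430 CYRILLIC SMALL LETTER A as the first 'a'
second_a = "payp\u0430l"  # ... and as the second 'a'
for spoof in (first_a, second_a):
    print(ace_label(spoof, OLD_PREFIX), ace_label(spoof, SECURE_PREFIX))
# xn--pypal-4ve xn--spypal-4ve
# xn--paypl-7ve xn--spaypl-7ve
```

The second line is the "better example": with the old prefix, "paypl" follows "xn--" directly, while the hypothetical prefix puts an 's' in front of it. The comment in `decode_ace_label` notes one catch with this particular choice: "xn--s" is also the start of some old-prefix labels, so a decoder could not tell the two apart from the label text alone.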

However, a second ACE prefix might be fraught with difficulties. For
starters, we might end up with FQDNs that mix three different encodings
(if there are three or more labels), i.e. both ACE prefixes and the pure
ASCII TLD name. Then there would also be the question of *which* ACE
prefix to choose when encoding. We might have to specify that *all* the
labels use the same ACE prefix (or pure ASCII, e.g. for the TLD). This
would be consistent with RFC 1591 and current conventions (except for
TLDs that allow just about anything underneath them). For example, the
.jp registry might have a rule that says that *all* domain labels either
use one prefix or the other, together with pure ASCII for the final ".jp"
part (or any part).
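
A registry rule like that is simple to check mechanically. This is a minimal Python sketch, assuming the hypothetical "xn--s" prefix discussed earlier in this message and a longest-prefix-wins classification; `label_kind` and `uses_one_ace_prefix` are illustrative names only.

```python
OLD_PREFIX = "xn--"
SECURE_PREFIX = "xn--s"  # hypothetical second ACE prefix


def label_kind(label: str) -> str:
    """Classify a label as "secure", "old" or "ascii".

    Assumption: the longer prefix wins, so an old-prefix label whose
    Punycode starts with "s" would be misclassified as "secure".
    """
    lower = label.lower()
    if lower.startswith(SECURE_PREFIX):
        return "secure"
    if lower.startswith(OLD_PREFIX):
        return "old"
    return "ascii"


def uses_one_ace_prefix(fqdn: str) -> bool:
    """Hypothetical registry rule: every ACE label in the name uses the same
    prefix; pure-ASCII labels (such as the TLD) may appear anywhere."""
    labels = fqdn.rstrip(".").split(".")
    ace_kinds = {label_kind(label) for label in labels} - {"ascii"}
    return len(ace_kinds) <= 1


print(uses_one_ace_prefix("xn--spypal-4ve.example.jp"))        # True
print(uses_one_ace_prefix("xn--spypal-4ve.xn--pypal-4ve.jp"))  # False
```

The check itself is trivial; the hard part, as the docstring admits, is classifying each label reliably when one prefix is a string prefix of the other.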

Co-existence is quite different from transition. Although migration 
typically requires the co-existence of the old and the new during the 
transition period, people normally intend to complete the transition by 
getting rid of the old (entirely or almost entirely). However, there are 
probably many examples of migrations that started with good intentions 
but ended up with rather long periods of co-existence. One that comes to 
mind is HTML vs XHTML. I don't know whether we will ever be able to 
exterminate HTML, regardless of our "good" intentions.

Erik