Re: [idn] Re: stability

Erik van der Poel <erik@vanderpoel.org> Tue, 15 March 2005 06:15 UTC

Received: from psg.com (mailnull@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id BAA13417 for <idn-archive@lists.ietf.org>; Tue, 15 Mar 2005 01:15:27 -0500 (EST)
Received: from majordom by psg.com with local (Exim 4.44 (FreeBSD)) id 1DB5Cp-0007RY-V5 for idn-data@psg.com; Tue, 15 Mar 2005 06:07:03 +0000
Received: from [207.115.63.102] (helo=pimout3-ext.prodigy.net) by psg.com with esmtp (Exim 4.44 (FreeBSD)) id 1DB5Cj-0007Qc-AP for idn@ops.ietf.org; Tue, 15 Mar 2005 06:07:02 +0000
Received: from [10.1.1.2] (adsl-64-174-147-206.dsl.sntc01.pacbell.net [64.174.147.206]) by pimout3-ext.prodigy.net (8.12.10 milter /8.12.10) with ESMTP id j2F66S9N065464; Tue, 15 Mar 2005 01:06:32 -0500
Message-ID: <42367B63.6080300@vanderpoel.org>
Date: Mon, 14 Mar 2005 22:06:27 -0800
From: Erik van der Poel <erik@vanderpoel.org>
User-Agent: Mozilla Thunderbird 1.0 (X11/20041206)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Simon Josefsson <jas@extundo.com>, Mark Davis <mark.davis@jtcsv.com>
CC: idn@ops.ietf.org
Subject: Re: [idn] Re: stability
References: <421B8484.3070802@vanderpoel.org> <20050223072837.GA21463~@nicemice.net> <D872CCF059514053ECF8A198@scan.jck.com> <421D8411.9030006@vanderpoel.org> <p06210208be4390618c81@[192.168.0.101]> <421E0D0C.2000309@vanderpoel.org> <p06210202be43c3888991@[192.168.0.101]> <E07CE813AD23B2D95DA0C740@scan.jck.com> <421E30F2.1040408@vanderpoel.org> <0E7F74C71945B923C52211F3@scan.jck.com> <421EA0C9.1010500@vanderpoel.org> <00a401c51af3$7863aae0$030aa8c0@DEWELL> <A574CA1BE87BFDA3C2A1AC0E@scan.jck.com> <42322CE2.4040509@vanderpoel.org> <4232B2FD.1080104@vanderpoel.org> <4232BA56.5090001@vanderpoel.org> <iluk6odazwb.fsf@latte.josefsson.org> <00e801c528a8$99ad37d0$72703009@sanjose.ibm.com> <ilull8qb5n5.fsf@latte.josefsson.org>
In-Reply-To: <ilull8qb5n5.fsf@latte.josefsson.org>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on psg.com
X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00,URIBL_SBL autolearn=no version=3.0.1
Sender: owner-idn@ops.ietf.org
Precedence: bulk

Simon Josefsson wrote:
> "Mark Davis" <mark.davis@jtcsv.com> writes:
>>Implementations that claim conformance to Unicode 3.2 normalization may
>>not produce identical results in all cases, and may not produce *correct*
>>normalizations, because versions of UAX #15 prior to 4.1.0 have been
>>internally inconsistent.
> 
> We seem to disagree on this.  I believe Unicode 3.2 was consistent.
> Only the non-normative sections were in conflict with the normative
> text.  I admit an implementation would not meet some normalization
> invariants discussed in the document.  But I don't believe the
> invariants were discussed as requirements on the implementation.

I read UAX #15 and PRI #29. It's quite unfortunate that such a mistake 
was made in the spec, and that several implementations have implemented 
that mistake so faithfully. Although I would normally feel that the IETF 
should just stick with the original normalization table and rules (to 
avoid DNS lookup failures or, heaven forbid, security breaches), in this 
case, it may be wiser to adopt the new UAX #15 rules, since the 
invariants are important to IDNA also. The idempotence invariant seems 
especially important.
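
For anyone who wants to check an implementation against these invariants, here is a minimal sketch. It assumes Python and its standard unicodedata module, which, as far as I can tell, follows the corrected composition rule in current releases. The helper names are mine, and the sample sequence (a starter, a combining mark, then another starter that could compose with the first) is my own illustration of the kind of malformed data PRI #29 is about, not something copied from either spec.

```python
import unicodedata


def nfc(s: str) -> str:
    """Normalize to NFC using whatever rules the local library implements."""
    return unicodedata.normalize("NFC", s)


def nfd(s: str) -> str:
    """Normalize to NFD (canonical decomposition plus canonical ordering)."""
    return unicodedata.normalize("NFD", s)


def is_idempotent(s: str) -> bool:
    """Idempotence invariant: normalizing twice equals normalizing once."""
    once = nfc(s)
    return nfc(once) == once


def preserves_equivalence(s: str) -> bool:
    """NFC output must stay canonically equivalent to its input,
    i.e. both must have the same NFD form."""
    return nfd(nfc(s)) == nfd(s)


# Illustrative sample (an assumption, not quoted from the spec):
# ORIYA VOWEL SIGN E, COMBINING GRAVE ACCENT, ORIYA VOWEL SIGN AA.
# Without the intervening accent, the two vowel signs compose to U+0B4B.
SAMPLE = "\u0b47\u0300\u0b3e"

if __name__ == "__main__":
    print("idempotent:           ", is_idempotent(SAMPLE))
    print("preserves equivalence:", preserves_equivalence(SAMPLE))
    print("NFC code points:      ", [f"U+{ord(c):04X}" for c in nfc(SAMPLE)])
```

An implementation that composes the two vowel signs across the accent would fail the second check, because the accent then ends up after both vowel signs in the decomposed form.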

I feel that we are still at the very beginning of the adoption of the
particular code points affected by this mistake. Most of them are for
South Asian languages. Hangul is much further along, but not the
particular code points that are affected here (i.e. the Jamo). More
importantly, this mistake only affects highly unusual, malformed data.
I think that if IDNA decides not to follow Unicode's recommendation now
or in the next couple of years, then 10 or 20 years from now we would
look back and regret that decision. If there is a time to break
compatibility for something, it is now, for this.

The Korean IDN table at IANA does not contain the Jamo that are affected 
by this mistake. (They use the precomposed syllables, rather than the 
individual pieces.) I don't know anything about IDN in South Asia, but I 
doubt that any labels have been registered with this particular type of 
malformed data.
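
To make the distinction between precomposed syllables and Jamo concrete, here is a small sketch, again assuming Python's unicodedata module. The function name and the sample syllable are hypothetical, and the range check covers only the basic Hangul Jamo block (U+1100 to U+11FF), which is a simplification.

```python
import unicodedata

# Basic Hangul Jamo block only; a simplifying assumption for this sketch.
JAMO_FIRST, JAMO_LAST = 0x1100, 0x11FF


def contains_conjoining_jamo(label: str) -> bool:
    """Return True if any character of the label is a conjoining Jamo."""
    return any(JAMO_FIRST <= ord(ch) <= JAMO_LAST for ch in label)


# A precomposed syllable (U+D55C) and its canonical decomposition.
precomposed = "\ud55c"
decomposed = unicodedata.normalize("NFD", precomposed)

if __name__ == "__main__":
    print(contains_conjoining_jamo(precomposed))
    print(contains_conjoining_jamo(decomposed))
    print([f"U+{ord(ch):04X}" for ch in decomposed])
```

A label built only from precomposed syllables never reaches the Jamo-specific composition path, which is why a table restricted to those syllables is unaffected.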

It is interesting that, in this case, Unicode seems to have implemented 
first and written the spec later, which is the way the IETF is supposed 
to do things too. It's just unfortunate that the Unicode spec was 
transcribed incorrectly from the implementation(s). On the other hand, 
IDNA seems to have done it in the opposite order. First, the spec was 
written, and now that we have deployed some implementations, we are 
finding serious problems with punctuation marks and symbols.

Erik