Re: [idn] Re: stability
"Martin v. Löwis" <martin@v.loewis.de> Tue, 15 March 2005 20:36 UTC
Message-ID: <4237450A.9010901@v.loewis.de>
Date: Tue, 15 Mar 2005 21:26:50 +0100
From: "Martin v. Löwis" <martin@v.loewis.de>
To: Erik van der Poel <erik@vanderpoel.org>
CC: Simon Josefsson <jas@extundo.com>, Mark Davis <mark.davis@jtcsv.com>, idn@ops.ietf.org
Subject: Re: [idn] Re: stability
References: <421B8484.3070802@vanderpoel.org> <20050223072837.GA21463~@nicemice.net> <D872CCF059514053ECF8A198@scan.jck.com> <421D8411.9030006@vanderpoel.org> <p06210208be4390618c81@[192.168.0.101]> <421E0D0C.2000309@vanderpoel.org> <p06210202be43c3888991@[192.168.0.101]> <E07CE813AD23B2D95DA0C740@scan.jck.com> <421E30F2.1040408@vanderpoel.org> <0E7F74C71945B923C52211F3@scan.jck.com> <421EA0C9.1010500@vanderpoel.org> <00a401c51af3$7863aae0$030aa8c0@DEWELL> <A574CA1BE87BFDA3C2A1AC0E@scan.jck.com> <42322CE2.4040509@vanderpoel.org> <4232B2FD.1080104@vanderpoel.org> <4232BA56.5090001@vanderpoel.org> <iluk6odazwb.fsf@latte.josefsson.org> <00e801c528a8$99ad37d0$72703009@sanjose.ibm.com> <ilull8qb5n5.fsf@latte.josefsson.org> <42367B63.6080300@vanderpoel.org>
In-Reply-To: <42367B63.6080300@vanderpoel.org>
Erik van der Poel wrote:
> I read UAX #15 and PRI #29. It's quite unfortunate that such a mistake
> was made in the spec, and that several implementations have implemented
> that mistake so faithfully.

It's also quite understandable. It is not at all obvious that the
correction is necessary; even now that I have read it, and even though I
have implemented the algorithm myself (for Python), I found it very
difficult to understand the issue.

Here is the problem: In NFD, combining characters are sorted according
to their combining class, in increasing order. So you always have

   starter small_combiner_A large_combiner_B next_starter ...
   (with A <= B)

The old text says that a combiner is blocked if it has the same
combining class, so

   starter combiner_A other_combiner_B
   (with A==B; if starter cannot be combined with combiner_A,
    then combiner_A blocks combiner_B)

Now, the correction says that you should also consider the case

   starter combiner_A combiner_B; with A > B

?! How can that be? NFD should have sorted them so that combiner_B
comes *before* combiner_A, so it would not be blocked.

Think about it.

The answer is this: This is *only* possible if combiner_B is a starter,
i.e. B==0. But if so, how could you possibly combine it with the
starter? Can you ever combine two starters?

Think about it.

The answer is yes: for Hangul Jamo. They all have combining class 0,
yet they can be combined. There are also a few other characters which
have combining class 0 and still can be combined.

However, it is not at all obvious. For the specific case of Python,
it turns out that I special-cased Hangul composition, so it won't
apply the standard algorithm (of looking for blockers); this means
that all the Hangul examples in PRI #29 apparently work "correctly"
with Python. However, for the non-Hangul cases, it is possible to
produce the "bad" behaviour with Python 2.4.

> I feel that we are still at the very beginning of the adoption of the
> particular Unicodes affected by this mistake. Most of them are for South
> Asian languages. Hangul is much further along, but not the particular
> Unicodes that are affected here (i.e. the Jamo).

It's not that easy. When you use the old algorithm, you get normal
Hangul syllables, which would be allowed in IDNA. It's only that the
sequence *before* the normalization should not be allowed.

> More importantly, this
> mistake only affects highly unusual, malformed data. I think that if
> IDNA decides not to follow Unicode's recommendation now or in the next
> couple of years, 10 or 20 years from now we would look back in time and
> regret this decision.

I don't think so. "We" could still change the decision in 20 years,
and not a single registration would be affected. The sequences causing
the behaviour change are *really* unusual - I don't know if software
can visually render them in a meaningful way, and I guess a native
speaker would just consider them mojibake. So it is unlikely that
anybody would try to use them as input to IDNA in the next 20 years
in a reasonable application.

> It is interesting that, in this case, Unicode seems to have implemented
> first and written the spec later, which is the way the IETF is supposed
> to do things too. It's just unfortunate that the Unicode spec was
> transcribed incorrectly from the implementation(s). On the other hand,
> IDNA seems to have done it in the opposite order. First, the spec was
> written, and now that we have deployed some implementations, we are
> finding serious problems with punctuation marks and symbols.

That's why IDNA is still a Proposed Standard Protocol (not even a
Draft Standard Protocol); see STD 1. It will advance to Draft Standard
if two independent and interoperable implementations from different
code bases have been developed, and sufficient successful operational
experience has been gained; see BCP 9.

It is also *not* the case that it was specified first and implemented
afterwards. All along the process, people have been implementing bits
and pieces of it, test beds have been run, and so on. You might not
have been around, but some people still remember.

Regards,
Martin
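
The ordering and composition points made in the message can be checked with Python's standard `unicodedata` module. The following is only a minimal sketch run against a current Python, not the Python 2.4 implementation the message refers to; the helper `classes` and the sample strings are illustrative choices, not taken from PRI #29 except for the final sequence, which is printed rather than asserted because its result depends on which version of the blocking rule the installed Unicode database follows.

```python
import unicodedata


def classes(s):
    """Return the canonical combining class of each code point in s."""
    return [unicodedata.combining(ch) for ch in s]


# NFD sorts combining marks by increasing combining class:
# acute (class 230) typed before dot below (class 220) gets reordered.
unordered = "a\u0301\u0323"
ordered = unicodedata.normalize("NFD", unordered)

# Hangul Jamo are starters (class 0), yet two of them compose
# into a precomposed syllable under NFC.
jamo = "\u1100\u1161"
syllable = unicodedata.normalize("NFC", jamo)

# A non-Hangul case: both Oriya vowel signs have class 0 and still compose.
oriya = "\u0b47\u0b3e"
oriya_nfc = unicodedata.normalize("NFC", oriya)

# A sequence of the kind PRI #29 discusses: a class-230 mark sits between
# the two class-0 characters. Under the corrected rule the intervening
# mark should prevent the composition; the result is shown, not assumed.
pri29 = "\u0b47\u0300\u0b3e"
pri29_nfc = unicodedata.normalize("NFC", pri29)

if __name__ == "__main__":
    print(classes(unordered), "->", classes(ordered))
    print(classes(jamo), "->", [hex(ord(c)) for c in syllable])
    print(classes(oriya), "->", [hex(ord(c)) for c in oriya_nfc])
    print(classes(pri29), "->", [hex(ord(c)) for c in pri29_nfc])
```

The first three cases do not depend on the PRI #29 correction; only the last one distinguishes the old blocking rule from the corrected one.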