Re: [idn] Re: stability
"Mark Davis" <mark.davis@jtcsv.com> Mon, 14 March 2005 15:28 UTC
Message-ID: <00e801c528a8$99ad37d0$72703009@sanjose.ibm.com>
From: Mark Davis <mark.davis@jtcsv.com>
To: Simon Josefsson <jas@extundo.com>, Erik van der Poel <erik@vanderpoel.org>
Cc: idn@ops.ietf.org
References: <421B8484.3070802@vanderpoel.org><20050223072837.GA21463~@nicemice.net><D872CCF059514053ECF8A198@scan.jck.com><421D8411.9030006@vanderpoel.org><p06210208be4390618c81@[192.168.0.101]><421E0D0C.2000309@vanderpoel.org><p06210202be43c3888991@[192.168.0.101]><E07CE813AD23B2D95DA0C740@scan.jck.com><421E30F2.1040408@vanderpoel.org><0E7F74C71945B923C52211F3@scan.jck.com><421EA0C9.1010500@vanderpoel.org><00a401c51af3$7863aae0$030aa8c0@DEWELL><A574CA1BE87BFDA3C2A1AC0E@scan.jck.com><42322CE2.4040509@vanderpoel.org> <4232B2FD.1080104@vanderpoel.org><4232BA56.5090001@vanderpoel.org> <iluk6odazwb.fsf@latte.josefsson.org>
Subject: Re: [idn] Re: stability
Date: Mon, 14 Mar 2005 07:15:04 -0800
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
Sender: owner-idn@ops.ietf.org
Precedence: bulk

You keep harping on that, but we really had no choice in that matter. The definition of normalization in UAX #15 was internally inconsistent. Certain implementations of the UAX algorithm would exhibit unacceptably aberrant behavior, although only in a small number of degenerate cases, none of which occur in ordinary text. The problems are:

1. Broken Idempotency. A non-idempotent implementation by its very nature cannot be stable, because repeated application of a non-idempotent normalization could produce different results. The application of the inconsistent interpretation therefore causes fundamental problems for implementations as further outlined in PRI #29; briefly, these are comparable to using a comparison function that isn't transitive when sorting.

2. Broken Canonical Equivalence. The inconsistent interpretation of the old UAX version could "normalize" some text to something that is not canonically equivalent to the input -- it changes some text to some completely different text.

3. Broken Canonical Order. Application of NFC[old UAX] or NFKC[old UAX] produces output that is not only different text (not canonically equivalent) but also not in canonical order. As a result, something returned from a normalization function may not even pass the normalization quick check: NFC_quick_check(NFC(string))=NO.

After carefully evaluating the nature and effects of this inconsistency, the UTC reached a decision to address these problems as follows:

The current version of UAX #15 in Unicode 4.1.0 addresses the internal inconsistency. The changes do not affect any versions of UAX #15 prior to Unicode 4.1.0 and therefore do not affect stringprep or IDN. No backwards-compatibility problems will be introduced as a result of the changes.

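The three properties listed above can be checked mechanically. What follows is only a minimal sketch in Python, not part of UAX #15 or PRI #29: the helper name check_normalization and the choice of sample strings are assumptions made for illustration, and it relies on the standard unicodedata module (is_normalized needs Python 3.8 or later). The interpreter's bundled Unicode data postdates 4.1.0, so the samples are expected to pass all three checks; an implementation following the inconsistent older wording could fail them.

```python
import unicodedata

# Degenerate sequences of the kind discussed in PRI #29: a starter, a
# non-starter combining mark, then a second character that could compose
# with the first if the intervening mark were wrongly ignored.
PRI29_SAMPLES = [
    "\u1100\u0300\u1161",  # Hangul choseong, combining grave, Hangul jungseong
    "\u0B47\u0300\u0B3E",  # Oriya vowel sign E, combining grave, vowel sign AA
]


def check_normalization(text: str, form: str = "NFC") -> dict:
    """Hypothetical helper: report the three properties for one input."""
    once = unicodedata.normalize(form, text)
    twice = unicodedata.normalize(form, once)
    # Compare decomposed forms to test equivalence; for the K forms this
    # is compatibility equivalence rather than canonical equivalence.
    decomposed = "NFKD" if "K" in form else "NFD"
    return {
        # 1. Idempotency: a second pass must not change the result.
        "idempotent": once == twice,
        # 2. Equivalence: input and output decompose to the same string.
        "equivalent": unicodedata.normalize(decomposed, once)
        == unicodedata.normalize(decomposed, text),
        # 3. The output must itself be reported as normalized.
        "passes_check": unicodedata.is_normalized(form, once),
    }


if __name__ == "__main__":
    for sample in PRI29_SAMPLES:
        for form in ("NFC", "NFKC"):
            print(form, [f"U+{ord(c):04X}" for c in sample],
                  check_normalization(sample, form))
```

Running the same checks over every string an application actually stores is one way to confirm that it contains none of the degenerate cases.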
Stringprep and IDN rely on the Unicode 3.2 version of UAX #15, which is:

http://www.unicode.org/unicode/reports/tr15/tr15-22.html

Implementations that claim conformance to Unicode 3.2 normalization may not produce identical results in all cases, and may not produce *correct* normalizations, because versions of UAX #15 prior to 4.1.0 have been internally inconsistent.

While normalization problems only happen in degenerate cases, the inconsistency in the definition is significant enough that UTC felt compelled to make the change. During deliberations, UTC did discuss stability policies in the standard, and concluded that this inconsistency itself is unstable; it led to demonstrably divergent implementations, and could not stand without correction.

In addition to the new 4.1.0 version of UAX #15, the UTC decided to issue a corrigendum which can be applied to other versions of Unicode. None of the prior versions of the Unicode Standard or its annexes will be changed in any way. Any implementation that claims conformance to Unicode 3.2 can stay precisely the same. Only if an implementation claims conformance to 3.2 plus the new corrigendum, or to version 4.1.0 or later of Unicode, would it change. So the current stringprep and IDN are not affected.

When it comes time to update stringprep to a new version of Unicode, such as 4.1.0, there are two paths that IETF can take:

(a) simply update to the newer version, or
(b) specify a method which takes the previous algorithm and applies it to the new Unicode data.

Option (a) sacrifices some compatibility, although (1) strings that have already been stringprepped *once* with the old version will have the same results under either version, and (2) the UTC does not expect any real data to contain the degenerate cases that trigger the problem.

The UTC strongly recommends against Option (b).
While it maintains backwards compatibility, it does not fix the underlying problems: two successive applications of stringprep can still result in different strings.

And if you look carefully at the stability requirements, you see

"If a string contains only characters from a given version of the Unicode Standard (e.g., Unicode 3.1.1), and it is put into a normalized form in accordance with that version of Unicode, then it will be in normalized form according to any past or future versions of Unicode."

which is true, even after applying PRI #29.

It would also be interesting to me to see the level of stability that is guaranteed by the other organizations. I know that there are W3C Recommendations that do not maintain perfect stability. How about the IETF? Is there a policy that any RFC that obsoletes another RFC is required to be absolutely -- bug-for-bug -- backwards compatible?

Mark

----- Original Message -----
From: "Simon Josefsson" <jas@extundo.com>
To: "Erik van der Poel" <erik@vanderpoel.org>
Cc: <idn@ops.ietf.org>
Sent: Saturday, March 12, 2005 03:04
Subject: [idn] Re: stability

> Erik van der Poel <erik@vanderpoel.org> writes:
>
> > All,
> >
> > This is probably well known to most of you, but the General Category
> > Value in the Unicode Character Database and the stability of that value
> > are not very relevant to IDNA, which does not depend on the Unicode
> > Categories.
> >
> > IDNA depends on the Unicode Normalization Form KC table, and there have
> > been very few changes indeed in this table:
> >
> > http://www.unicode.org/Public/UNIDATA/NormalizationCorrections.txt
>
> Don't forget the normalization flaw in Unicode 3.2 NFKC discussed in:
>
> http://www.unicode.org/review/pr-29.html
>
> Apparently the recommendation will be applied to future Unicode
> versions.
>
> PR-29 doesn't merely affect a small set of code points, but rather a
> class of strings. The special strings are all unstable under NFKC3.2.
>
> I think PR-29 is a useful example to consider when deciding how much
> trust you should place in the UTC's stability guarantees. The UTC's
> track record in this area suggest to me that the guarantee is
> worthless in practice. I haven't seen an evaluation of alternative
> solutions to the PR-29 problem. Not even signs that alternative
> approaches were considered. I would have expected both.
>
> > Also, IDNA apps depend on tables for converting from various non-Unicode
> > encodings to Unicode. This is another place where instability could
> > affect lookups, potentially even in dangerous ways. Stringprep and IDNA
> > already mention this issue in their Security Considerations sections.
>
> Right.
>
> Thanks,
> Simon