Re: [idn] Re: stability

"Mark Davis" <mark.davis@jtcsv.com> Mon, 14 March 2005 15:28 UTC

Message-ID: <00e801c528a8$99ad37d0$72703009@sanjose.ibm.com>
From: Mark Davis <mark.davis@jtcsv.com>
To: Simon Josefsson <jas@extundo.com>, Erik van der Poel <erik@vanderpoel.org>
Cc: idn@ops.ietf.org
References: <421B8484.3070802@vanderpoel.org><20050223072837.GA21463~@nicemice.net><D872CCF059514053ECF8A198@scan.jck.com><421D8411.9030006@vanderpoel.org><p06210208be4390618c81@[192.168.0.101]><421E0D0C.2000309@vanderpoel.org><p06210202be43c3888991@[192.168.0.101]><E07CE813AD23B2D95DA0C740@scan.jck.com><421E30F2.1040408@vanderpoel.org><0E7F74C71945B923C52211F3@scan.jck.com><421EA0C9.1010500@vanderpoel.org><00a401c51af3$7863aae0$030aa8c0@DEWELL><A574CA1BE87BFDA3C2A1AC0E@scan.jck.com><42322CE2.4040509@vanderpoel.org> <4232B2FD.1080104@vanderpoel.org><4232BA56.5090001@vanderpoel.org> <iluk6odazwb.fsf@latte.josefsson.org>
Subject: Re: [idn] Re: stability
Date: Mon, 14 Mar 2005 07:15:04 -0800
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
Sender: owner-idn@ops.ietf.org
Precedence: bulk

You keep harping on that, but we really had no choice in that matter. The
definition of normalization in UAX #15 was internally inconsistent. Certain
implementations of the UAX algorithm would exhibit unacceptably aberrant
behavior, although only in a small number of degenerate cases, none of which
occur in ordinary text. The problems are:

1. Broken Idempotency. A non-idempotent implementation by its very nature
cannot be stable, because repeated application of a non-idempotent
normalization could produce different results. The application of the
inconsistent interpretation therefore causes fundamental problems for
implementations, as further outlined in PRI #29; briefly, these are
comparable to using a comparison function that isn't transitive when
sorting.

2. Broken Canonical Equivalence. The inconsistent interpretation of the
old UAX version could "normalize" some text to something that is not
canonically equivalent to the input -- it changes some text to some
completely different text.

3. Broken Canonical Order. Application of NFC[old UAX] or NFKC[old UAX]
produces output that is not only different text (not canonically
equivalent) but also not in canonical order. As a result, something
returned from a normalization function may not even pass the normalization
quick check: NFC_quick_check(NFC(string))=NO.
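
The three properties above can be spot-checked against any normalizer. The following is only a sketch: it assumes Python 3.8 or later, whose `unicodedata` module is expected to follow the corrected definition, and it takes one trigger sequence and its flawed output (U+0B47 U+0300 U+0B3E becoming U+0B4B U+0300) as recalled from the PRI #29 write-up rather than computed. The helper names are made up for illustration.

```python
import unicodedata

# Assumed trigger sequence: ORIYA VOWEL SIGN E, COMBINING GRAVE ACCENT,
# ORIYA VOWEL SIGN AA.
SAMPLE = "\u0B47\u0300\u0B3E"

# Output attributed to the flawed reading (an assumption, not computed here):
# ORIYA VOWEL SIGN O followed by COMBINING GRAVE ACCENT.
FLAWED_OUTPUT = "\u0B4B\u0300"


def is_idempotent(text, form="NFC"):
    """Problem 1: normalizing twice must give the same result as once."""
    once = unicodedata.normalize(form, text)
    return unicodedata.normalize(form, once) == once


def canonically_equivalent(a, b):
    """Problem 2: two strings are canonically equivalent iff their NFDs match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def output_is_normalized(text, form="NFC"):
    """Problem 3: normalizer output must itself test as normalized.

    is_normalized() stands in for NFC_quick_check here; it falls back to a
    full normalization when the quick check alone is inconclusive.
    """
    return unicodedata.is_normalized(form, unicodedata.normalize(form, text))


if __name__ == "__main__":
    nfc = unicodedata.normalize("NFC", SAMPLE)
    print("idempotent:", is_idempotent(SAMPLE))
    print("output equivalent to input:", canonically_equivalent(SAMPLE, nfc))
    print("output passes check:", output_is_normalized(SAMPLE))
    print("flawed output equivalent to input:",
          canonically_equivalent(SAMPLE, FLAWED_OUTPUT))
```

If the assumed flawed output is right, the last line reports False: the flawed reading moves the accent past the second vowel sign, which is the "completely different text" of problem 2.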

After carefully evaluating the nature and effects of this inconsistency
the UTC reached a decision to address these problems as follows:

The current version of UAX #15 in Unicode 4.1.0 addresses the internal
inconsistency. The changes do not affect any versions of UAX #15 prior to
Unicode 4.1.0 and therefore do not affect stringprep or IDN. No
backwards-compatibility problems will be introduced as a result of the
changes.

Stringprep and IDN rely on the Unicode 3.2 version of UAX #15, which is:

http://www.unicode.org/unicode/reports/tr15/tr15-22.html

Implementations that claim conformance to Unicode 3.2 normalization may
not produce identical results in all cases, and may not produce *correct*
normalizations, because versions of UAX #15 prior to 4.1.0 have been
internally inconsistent. While normalization problems only happen in
degenerate cases, the inconsistency in the definition is significant enough
that UTC felt compelled to make the change. During deliberations, UTC did
discuss stability policies in the standard, and concluded that this
inconsistency itself is unstable; it led to demonstrably divergent
implementations, and could not stand without correction.

In addition to the new 4.1.0 version of UAX #15, the UTC decided to issue
a corrigendum which can be applied to other versions of Unicode. None of
the prior versions of the Unicode Standard or its annexes will be changed
in any way. Any implementation that claims conformance to Unicode 3.2 can
stay precisely the same. Only if an implementation claims conformance to
3.2 plus the new corrigendum, or to version 4.1.0 or later of Unicode,
would it change. So the current stringprep and IDN are not affected.

When it comes time to update stringprep to a new version of Unicode, such
as 4.1.0, there are two paths that IETF can take:

(a) simply update to the newer version, or
(b) specify a method which takes the previous algorithm and applies it to
the new Unicode data.

Option (a) sacrifices some compatibility, although (1) strings that have
already been stringprepped *once* with the old version will have the same
results under either version, and (2) the UTC does not expect any real data
to contain the degenerate cases that trigger the problem.

The UTC strongly recommends against Option (b). While it maintains
backwards compatibility, it does not fix the underlying problems: two
successive applications of stringprep can still result in different
strings.

And if you look carefully at the stability requirements, you see "If a
string contains only characters from a given version of the Unicode Standard
(e.g., Unicode 3.1.1), and it is put into a normalized form in accordance
with that version of Unicode, then it will be in normalized form according
to any past or future versions of Unicode." That remains true, even after
applying PRI #29.
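
As a rough illustration of that requirement (a spot check, not a proof), one could normalize a string with a Unicode 3.2 database and confirm that a later version leaves the result alone. The sketch below assumes Python, whose `unicodedata.ucd_3_2_0` object carries the 3.2 data; the function name and sample strings are made up for illustration, and the samples use only characters that were already assigned in 3.2.

```python
import unicodedata


def stays_normalized(text, form="NFKC"):
    """Normalize with the Unicode 3.2 data, then re-normalize the result with
    the interpreter's current Unicode data; the second pass should be a no-op."""
    old = unicodedata.ucd_3_2_0.normalize(form, text)
    return unicodedata.normalize(form, old) == old


if __name__ == "__main__":
    samples = [
        "A\u030Alesund",   # A + COMBINING RING ABOVE, composes under NFKC
        "cafe\u0301",      # e + COMBINING ACUTE ACCENT
        "\uFB01le",        # LATIN SMALL LIGATURE FI, compatibility-decomposed
    ]
    for s in samples:
        print(ascii(s), stays_normalized(s))
```

A check like this only exercises ordinary text; it says nothing about the degenerate sequences discussed in PRI #29.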

It would also be interesting to me to see the level of stability that is
guaranteed by the other organizations. I know that there are W3C
Recommendations that do not maintain perfect stability. How about the IETF?
Is there a policy that any RFC that obsoletes another RFC is required to be
absolutely -- bug-for-bug -- backwards compatible?

Mark

----- Original Message ----- 
From: "Simon Josefsson" <jas@extundo.com>
To: "Erik van der Poel" <erik@vanderpoel.org>
Cc: <idn@ops.ietf.org>
Sent: Saturday, March 12, 2005 03:04
Subject: [idn] Re: stability

> Erik van der Poel <erik@vanderpoel.org> writes:
>
> > All,
> >
> > This is probably well known to most of you, but the General Category
> > Value in the Unicode Character Database and the stability of that value
> > are not very relevant to IDNA, which does not depend on the Unicode
> > Categories.
> >
> > IDNA depends on the Unicode Normalization Form KC table, and there have
> > been very few changes indeed in this table:
> >
> > http://www.unicode.org/Public/UNIDATA/NormalizationCorrections.txt
>
> Don't forget the normalization flaw in Unicode 3.2 NFKC discussed in:
>
> http://www.unicode.org/review/pr-29.html
>
> Apparently the recommendation will be applied to future Unicode
> versions.
>
> PR-29 doesn't merely affect a small set of code points, but rather a
> class of strings.  The special strings are all unstable under NFKC3.2.
>
> I think PR-29 is a useful example to consider when deciding how much
> trust you should place in the UTC's stability guarantees.  The UTC's
> track record in this area suggests to me that the guarantee is
> worthless in practice.  I haven't seen an evaluation of alternative
> solutions to the PR-29 problem.  Not even signs that alternative
> approaches were considered.  I would have expected both.
>
> > Also, IDNA apps depend on tables for converting from various non-Unicode
> > encodings to Unicode. This is another place where instability could
> > affect lookups, potentially even in dangerous ways. Stringprep and IDNA
> > already mention this issue in their Security Considerations sections.
>
> Right.
>
> Thanks,
> Simon
>
>
