Re: [idn] Re: stability

"Mark Davis" <mark.davis@jtcsv.com> Mon, 14 March 2005 15:28 UTC

Received: from psg.com (mailnull@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id KAA12629 for <idn-archive@lists.ietf.org>; Mon, 14 Mar 2005 10:28:15 -0500 (EST)
Received: from majordom by psg.com with local (Exim 4.44 (FreeBSD)) id 1DArHv-0002YT-Ll for idn-data@psg.com; Mon, 14 Mar 2005 15:15:23 +0000
Received: from [32.97.110.131] (helo=e33.co.us.ibm.com) by psg.com with esmtps (TLSv1:DES-CBC3-SHA:168) (Exim 4.44 (FreeBSD)) id 1DArHl-0002Xk-Kx for idn@ops.ietf.org; Mon, 14 Mar 2005 15:15:13 +0000
Received: from d03relay05.boulder.ibm.com (d03relay05.boulder.ibm.com [9.17.195.107]) by e33.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j2EFF84I505338 for <idn@ops.ietf.org>; Mon, 14 Mar 2005 10:15:08 -0500
Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by d03relay05.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j2EFF7Uq161114 for <idn@ops.ietf.org>; Mon, 14 Mar 2005 08:15:08 -0700
Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j2EFF7WR021170 for <idn@ops.ietf.org>; Mon, 14 Mar 2005 08:15:07 -0700
Received: from markdavis (sig-9-48-123-179.mts.ibm.com [9.48.123.179]) by d03av03.boulder.ibm.com (8.12.11/8.12.11) with SMTP id j2EFF63I021153; Mon, 14 Mar 2005 08:15:06 -0700
Message-ID: <00e801c528a8$99ad37d0$72703009@sanjose.ibm.com>
From: Mark Davis <mark.davis@jtcsv.com>
To: Simon Josefsson <jas@extundo.com>, Erik van der Poel <erik@vanderpoel.org>
Cc: idn@ops.ietf.org
References: <421B8484.3070802@vanderpoel.org><20050223072837.GA21463~@nicemice.net><D872CCF059514053ECF8A198@scan.jck.com><421D8411.9030006@vanderpoel.org><p06210208be4390618c81@[192.168.0.101]><421E0D0C.2000309@vanderpoel.org><p06210202be43c3888991@[192.168.0.101]><E07CE813AD23B2D95DA0C740@scan.jck.com><421E30F2.1040408@vanderpoel.org><0E7F74C71945B923C52211F3@scan.jck.com><421EA0C9.1010500@vanderpoel.org><00a401c51af3$7863aae0$030aa8c0@DEWELL><A574CA1BE87BFDA3C2A1AC0E@scan.jck.com><42322CE2.4040509@vanderpoel.org> <4232B2FD.1080104@vanderpoel.org><4232BA56.5090001@vanderpoel.org> <iluk6odazwb.fsf@latte.josefsson.org>
Subject: Re: [idn] Re: stability
Date: Mon, 14 Mar 2005 07:15:04 -0800
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1437
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2800.1441
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on psg.com
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.0.1
Sender: owner-idn@ops.ietf.org
Precedence: bulk

You keep harping on that, but we really had no choice in that matter. The
definition of normalization in UAX #15 was internally inconsistent. Certain
implementations of the UAX algorithm would exhibit unacceptably aberrant
behavior, although only in a small number of degenerate cases, none of which
occur in ordinary text. The problems are:

1. Broken Idempotency. A non-idempotent implementation by its very nature
cannot be stable, because repeated application of a non-idempotent
normalization could produce different results. The inconsistent
interpretation therefore causes fundamental problems for implementations,
as further outlined in PRI #29; briefly, these are comparable to using a
comparison function that isn't transitive when sorting.

2. Broken Canonical Equivalence. The inconsistent interpretation of the
old UAX version could "normalize" some text to something that is not
canonically equivalent to the input -- it changes some text to some
completely different text.

3. Broken Canonical Order. Application of NFC[old UAX] or NFKC[old UAX]
produces output that is not only different text (not canonically
equivalent) but also not in canonical order. As a result, something
returned from a normalization function may not even pass the normalization
quick check: NFC_quick_check(NFC(string))=NO.
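
The three properties above can be tested mechanically. The following is a
minimal sketch, assuming Python's standard unicodedata module (which tracks
a recent Unicode version, not Unicode 3.2); the helper name
check_invariants and the sample string are illustrative only and not taken
from UAX #15:

```python
import unicodedata


def check_invariants(text, form="NFC"):
    """Report whether one normalization pass keeps the three properties.

    Intended for the canonical forms (NFC/NFD); the compatibility forms
    deliberately change some characters, so the equivalence test below
    would not apply to them.
    """
    once = unicodedata.normalize(form, text)
    twice = unicodedata.normalize(form, once)
    return {
        # 1. Idempotency: a second pass must not change anything.
        "idempotent": once == twice,
        # 2. Canonical equivalence: two strings are canonically
        #    equivalent exactly when their NFD forms are identical.
        "canonically_equivalent": (
            unicodedata.normalize("NFD", once)
            == unicodedata.normalize("NFD", text)
        ),
        # 3. The output must itself be accepted as normalized.
        "passes_quick_check": unicodedata.is_normalized(form, once),
    }


# Illustrative degenerate sequence: a starter, a combining mark, and a
# second starter that could compose with the first one.
sample = "\u0b47\u0300\u0b3e"
result = check_invariants(sample)
```

With an implementation that follows the corrected definition, all three
entries of result should be True for this sample.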

After carefully evaluating the nature and effects of this inconsistency,
the UTC decided to address these problems as follows:

The current version of UAX #15 in Unicode 4.1.0 addresses the internal
inconsistency. The changes do not affect any versions of UAX #15 prior to
Unicode 4.1.0 and therefore do not affect stringprep or IDN. No
backwards-compatibility problems will be introduced as a result of the
changes.

Stringprep and IDN rely on the Unicode 3.2 version of UAX #15, which is:

http://www.unicode.org/unicode/reports/tr15/tr15-22.html

Implementations that claim conformance to Unicode 3.2 normalization may
not produce identical results in all cases, and may not produce *correct*
normalizations, because versions of UAX #15 prior to 4.1.0 have been
internally inconsistent. While normalization problems only happen in
degenerate cases, the inconsistency in the definition is significant enough
that UTC felt compelled to make the change. During deliberations, UTC did
discuss stability policies in the standard, and concluded that this
inconsistency itself is unstable; it led to demonstrably divergent
implementations, and could not stand without correction.
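
To illustrate how two implementations could diverge, here is a small
sketch of the "blocked" test used during composition. It assumes, based on
a reading of PR-29, that the older wording only treated an intervening
character B as blocking when B was a starter or had the same combining
class as the candidate C, while the corrected wording says "same or
higher". Both function names are hypothetical:

```python
import unicodedata


def blocked_literal(ccc_b, ccc_c):
    # Assumed literal reading of the older text: B blocks C only if B is
    # a starter (class 0) or has the same combining class as C.
    return ccc_b == 0 or ccc_b == ccc_c


def blocked_corrected(ccc_b, ccc_c):
    # Corrected reading: B blocks C if B is a starter or has the same
    # or a higher combining class than C.
    return ccc_b == 0 or ccc_b >= ccc_c


# Starter S, intervening mark B, candidate C (itself a starter).
s, b, c = "\u0b47", "\u0300", "\u0b3e"
ccc_b = unicodedata.combining(b)  # combining class of U+0300
ccc_c = unicodedata.combining(c)  # combining class of U+0B3E

literal = blocked_literal(ccc_b, ccc_c)
corrected = blocked_corrected(ccc_b, ccc_c)
```

Under the literal reading C is not blocked, so an implementation may
compose S and C across the intervening mark; under the corrected reading
it is blocked and the sequence is left alone.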

In addition to the new 4.1.0 version of UAX #15, the UTC decided to issue
a corrigendum which can be applied to other versions of Unicode. None of
the prior versions of the Unicode Standard or its annexes will be changed
in any way. Any implementation that claims conformance to Unicode 3.2 can
stay precisely the same. Only if an implementation claims conformance to
3.2 plus the new corrigendum, or to version 4.1.0 or later of Unicode,
would it change. So the current stringprep and IDN are not affected.

When it comes time to update stringprep to a new version of Unicode, such
as 4.1.0, there are two paths that IETF can take:

(a) simply update to the newer version, or
(b) specify a method which takes the previous algorithm and applies it to
the new Unicode data.

Option (a) sacrifices some compatibility, although (1) strings that have
already been stringprepped *once* with the old version will have the same
results under either version, and (2) the UTC does not expect any real data
to contain the degenerate cases that trigger the problem.
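
Point (1) can be spot-checked on a single degenerate case. This sketch
assumes Python's unicodedata module implements the corrected algorithm,
and it takes the old-style output for the sample from PR-29 as a
hard-coded assumption rather than computing it:

```python
import unicodedata

# Degenerate input of the kind described in PR-29.
degenerate_input = "\u0b47\u0300\u0b3e"

# Assumption: the result an old-style implementation would return for
# that input, as reported in PR-29 (not computed here).
assumed_old_output = "\u0b4b\u0300"

# Apply the corrected normalization to both strings.
new_from_old = unicodedata.normalize("NFC", assumed_old_output)
new_from_input = unicodedata.normalize("NFC", degenerate_input)

# A string that already went through the old algorithm once is left
# unchanged by the new one ...
stable_after_one_old_pass = new_from_old == assumed_old_output

# ... while the raw degenerate input is where the two versions differ.
differs_on_raw_input = new_from_input != assumed_old_output
```

This is one data point, not a proof, but it shows where the
incompatibility of option (a) is confined: to raw inputs that contain the
degenerate sequences.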

The UTC strongly recommends against Option (b). While it maintains
backwards compatibility, it does not fix the underlying problems: two
successive applications of stringprep can still result in different
strings.

And if you look carefully at the stability requirements, you see: "If a
string contains only characters from a given version of the Unicode Standard
(e.g., Unicode 3.1.1), and it is put into a normalized form in accordance
with that version of Unicode, then it will be in normalized form according
to any past or future versions of Unicode." That remains true, even after
applying PRI #29.
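
A rough way to spot-check that guarantee is to normalize with Unicode 3.2
data and confirm that a current normalizer leaves the result untouched.
The sketch below assumes Python's unicodedata.ucd_3_2_0 object; note that
it pairs the 3.2 character data with Python's own algorithm, so it is not
a literal model of the 3.2 text. The helper name and the samples are
illustrative only:

```python
import unicodedata

# Python ships a frozen Unicode 3.2 database for IDNA support.
ucd32 = unicodedata.ucd_3_2_0


def stays_normalized(text, form="NFC"):
    """Normalize with 3.2 data, then check a current normalizer agrees."""
    normalized_32 = ucd32.normalize(form, text)
    return unicodedata.normalize(form, normalized_32) == normalized_32


# Samples that use only characters already present in Unicode 3.2.
samples = [
    "e\u0301",         # e + combining acute accent
    "\u1100\u1161",    # two conjoining Hangul jamo
    "A\u030a\u0323",   # marks that need canonical reordering
]
results = [stays_normalized(s) for s in samples]
```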

It would also be interesting to me to see the level of stability that is
guaranteed by the other organizations. I know that there are W3C
Recommendations that do not maintain perfect stability. How about the IETF?
Is there a policy that any RFC that obsoletes another RFC is required to be
absolutely -- bug-for-bug -- backwards compatible?

Mark

----- Original Message ----- 
From: "Simon Josefsson" <jas@extundo.com>
To: "Erik van der Poel" <erik@vanderpoel.org>
Cc: <idn@ops.ietf.org>
Sent: Saturday, March 12, 2005 03:04
Subject: [idn] Re: stability


> Erik van der Poel <erik@vanderpoel.org> writes:
>
> > All,
> >
> > This is probably well known to most of you, but the General Category
> > Value in the Unicode Character Database and the stability of that value
> > are not very relevant to IDNA, which does not depend on the Unicode
> > Categories.
> >
> > IDNA depends on the Unicode Normalization Form KC table, and there have
> > been very few changes indeed in this table:
> >
> > http://www.unicode.org/Public/UNIDATA/NormalizationCorrections.txt
>
> Don't forget the normalization flaw in Unicode 3.2 NFKC discussed in:
>
> http://www.unicode.org/review/pr-29.html
>
> Apparently the recommendation will be applied to future Unicode
> versions.
>
> PR-29 doesn't merely affect a small set of code points, but rather a
> class of strings.  The special strings are all unstable under NFKC3.2.
>
> I think PR-29 is a useful example to consider when deciding how much
> trust you should place in the UTC's stability guarantees.  The UTC's
> track record in this area suggests to me that the guarantee is
> worthless in practice.  I haven't seen an evaluation of alternative
> solutions to the PR-29 problem.  Not even signs that alternative
> approaches were considered.  I would have expected both.
>
> > Also, IDNA apps depend on tables for converting from various non-Unicode
> > encodings to Unicode. This is another place where instability could
> > affect lookups, potentially even in dangerous ways. Stringprep and IDNA
> > already mention this issue in their Security Considerations sections.
>
> Right.
>
> Thanks,
> Simon
>
>