[idn] Re: stability

Simon Josefsson <jas@extundo.com> Wed, 16 March 2005 10:10 UTC

From: Simon Josefsson <jas@extundo.com>
To: IETF idn working group <idn@ops.ietf.org>
Subject: [idn] Re: stability
OpenPGP: id=B565716F; url=http://josefsson.org/key.txt
Blog: http://www.livejournal.com/users/jas4711/
Date: Wed, 16 Mar 2005 11:03:38 +0100
In-Reply-To: <20050316091126.GA24254~@nicemice.net> (Adam M. Costello's message of "Wed, 16 Mar 2005 09:11:27 +0000")
Message-ID: <iluzmx36h6t.fsf@latte.josefsson.org>

"Adam M. Costello" <idn.amc+0@nicemice.net.RemoveThisWord> writes:

> Erik van der Poel <erik@vanderpoel.org> wrote:
>
>> I was referring to the RFC that has a pointer to UAX #15, i.e. RFC
>> 3454 (Stringprep).  The pointer should be updated to tracking number
>> 24 or higher.
>>
>> http://www.unicode.org/reports/tr15/tr15-24.html
>
> I agree.  Right now there does not exist a correct implementation of
> Unicode normalization anywhere.  Some implementations are wrong because
> they deviate from the spec, and other implementations are wrong because
> they adhere to the spec which itself is wrong (because it violates all
> the fundamental properties that anything called a canonical form is
> assumed to have).  It's an unfortunate situation, but the best thing to
> do now is fix the spec and encourage all implementations to converge on
> the fixed spec, which is what the Unicode Consortium is doing.  Our part
> in this is to update our pointer, so that it points at a correct spec
> rather than a wrong spec.

I'm less certain that this is the best thing.  I'd like to see
alternatives explored before making such a confident statement.

There appear to me to be a lot of decisions based on subjective
opinions about how normalization "should" behave, or is "assumed" to
behave.  I'd like to see the focus on what the changes will actually
mean, technically, in practice.  Or perhaps I'm simply not clued in on
the evaluations of different approaches.

>> I believe it would be useful to start thinking of the problem in terms
>> of a transition plan from what we have today to what we would like to
>> have tomorrow.  It is not clear to me exactly what we would like to
>> have tomorrow, so settling that would have to be part of the plan as
>> well.
>
> It's clear to me what we ought to have tomorrow: a canonical form
> (that is, a function that selects a unique representative from every
> equivalence class).  The Unicode Consortium is taking care of defining
> that.
>
> Do you have any ideas for a transition to that?
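To make the "canonical form" definition concrete, here is a minimal Python sketch that tests the properties in question on a single equivalence class: every member maps to the same representative, that representative is canonically equivalent to the members (checked by comparing NFD forms), and applying the function again changes nothing.  The helper names are hypothetical; only `unicodedata.normalize` is a real API, and it uses whatever Unicode version the running Python ships.

```python
import unicodedata
from typing import Callable, List


def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def check_canonical_form(f: Callable[[str], str], members: List[str]) -> bool:
    """Check f against one equivalence class, given as a list of its members."""
    reps = {f(s) for s in members}
    # 1. Every member must be mapped to one and the same representative.
    if len(reps) != 1:
        return False
    rep = next(iter(reps))
    # 2. The representative must belong to the class (canonical equivalence
    #    is tested by comparing decomposed forms).
    if any(nfd(rep) != nfd(s) for s in members):
        return False
    # 3. Applying f to the representative must leave it unchanged.
    return f(rep) == rep


# Three canonically equivalent spellings of A WITH RING ABOVE.
A_RING = ["\u00c5", "A\u030a", "\u212b"]

print(check_canonical_form(nfc, A_RING))          # True
print(check_canonical_form(lambda s: s, A_RING))  # False
```

All three spellings (U+00C5, A + U+030A, U+212B ANGSTROM SIGN) map to U+00C5 under NFC, while the identity function fails the first property.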

I think reaching that goal can be done in different ways.

One way is to incorporate the PR-29 fix, declare the earlier attempt
buggy, and re-cycle at PROPOSED.  I suspect that is the way you prefer?
I am hesitant about that approach, because we have already deployed the
old RFC and it is not clear what problems will arise from mixing old
and new code.  Both Kerberos and SASL appear to be going to use the old
StringPrep as well, so we will be seeing security-critical
infrastructure based on the old interpretation.

Another way is to carry on with Unicode 3.2 NFKC even though it breaks
some people's assumptions about what "normalization" means in a
theoretical setting.  Only a select few weird cases (the PR-29 problem
sequences) will be affected.  Machines will cope: they compute an
algorithm, and they don't care whether the output meets some unstated
invariant or not.
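As a concrete illustration of which invariant is at stake, here is a small Python sketch using the Oriya sequence U+0B47 U+0300 U+0B3E, the kind of sequence PR-29 is about.  The variable `old_nfc` is an assumption: it hard-codes the result the Unicode 3.2 wording of UAX #15 is taken to produce (U+0B3E not blocked by U+0300, so it composes with U+0B47 into U+0B4B), rather than calling any library that implements the old rule.  The other values come from the running Python's `unicodedata` module, which is expected to follow the corrected definition.

```python
import unicodedata

# A PR-29 problem sequence: ORIYA VOWEL SIGN E, COMBINING GRAVE ACCENT,
# ORIYA VOWEL SIGN AA.
problem = "\u0b47\u0300\u0b3e"

# Assumption: the output of NFC as worded in the Unicode 3.2 version of
# UAX #15, where U+0B3E is not blocked by U+0300 and composes with U+0B47
# into U+0B4B.  Hard-coded, since no library call here implements that rule.
old_nfc = "\u0b4b\u0300"

# NFC as computed by the running Python's unicodedata module.
new_nfc = unicodedata.normalize("NFC", problem)


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms are equal."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(new_nfc == problem)                        # True: nothing composes
print(canonically_equivalent(problem, new_nfc))  # True
print(canonically_equivalent(problem, old_nfc))  # False: meaning changed
```

The old output is still a deterministic function of the input, which is the sense in which "machines will cope"; what it loses is canonical equivalence with the input.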

A third way, which is what I am deploying, is to use Unicode 3.2 NFKC
together with a filter that rejects the PR-29 problem sequences.  This
is in line with the RFCs, it solves the problems related to the PR-29
problem sequences, and it is simple to implement.
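The third approach might look roughly like the following Python sketch.  It is an approximation under stated assumptions, not the filter actually being deployed: the function names are hypothetical, Python's `unicodedata.ucd_3_2_0` supplies the Unicode 3.2 data that StringPrep pins, and the detector flags a starter followed by one or more non-starters and then a combining-class-0 character that would compose with that starter.  It may reject a few harmless strings and does not attempt every corner case.

```python
import unicodedata
from typing import Optional

# RFC 3454 (StringPrep) pins normalization to Unicode 3.2; Python ships
# that version of the database as unicodedata.ucd_3_2_0.
UCD = unicodedata.ucd_3_2_0


def _compose(first: str, second: str) -> Optional[str]:
    """Primary composite of first+second, or None if the pair does not compose."""
    composed = UCD.normalize("NFC", first + second)
    return composed if len(composed) == 1 else None


def has_pr29_problem_sequence(text: str) -> bool:
    """Hypothetical, approximate detector for PR-29 problem sequences."""
    starter = None          # last starter seen, after ordinary compositions
    saw_nonstarter = False  # any nonzero-class characters since that starter?
    for ch in UCD.normalize("NFKD", text):
        if UCD.combining(ch) != 0:
            saw_nonstarter = True
            continue
        if starter is not None:
            composite = _compose(starter, ch)
            if composite is not None:
                if saw_nonstarter:
                    # The Unicode 3.2 wording composes across the marks here.
                    return True
                starter = composite  # ordinary adjacent composition
                continue
        starter = ch
        saw_nonstarter = False
    return False


def nfkc_with_pr29_filter(text: str) -> str:
    """Unicode 3.2 NFKC, refusing input the filter flags as a PR-29 sequence."""
    if has_pr29_problem_sequence(text):
        raise ValueError("input contains a PR-29 problem sequence")
    return UCD.normalize("NFKC", text)
```

For example, `nfkc_with_pr29_filter("\uff21")` returns "A", while U+0B47 U+0300 U+0B3E or U+1100 U+0300 U+1161 raise ValueError.  Rejecting instead of picking one interpretation means old and corrected implementations can only disagree on input that never gets through.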

Thanks,
Simon