[idn] stringprep: PRI #29

Erik van der Poel <erik@vanderpoel.org> Sun, 20 March 2005 02:11 UTC

Received: from psg.com (mailnull@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id VAA11179 for <idn-archive@lists.ietf.org>; Sat, 19 Mar 2005 21:11:27 -0500 (EST)
Received: from majordom by psg.com with local (Exim 4.44 (FreeBSD)) id 1DCpmi-000H1k-IG for idn-data@psg.com; Sun, 20 Mar 2005 02:03:20 +0000
Received: from [207.115.63.101] (helo=pimout2-ext.prodigy.net) by psg.com with esmtp (Exim 4.44 (FreeBSD)) id 1DCpmh-000H19-3h for idn@ops.ietf.org; Sun, 20 Mar 2005 02:03:19 +0000
Received: from [10.1.1.2] (adsl-64-174-147-206.dsl.sntc01.pacbell.net [64.174.147.206]) by pimout2-ext.prodigy.net (8.12.10 milter /8.12.10) with ESMTP id j2K239MW214474; Sat, 19 Mar 2005 21:03:13 -0500
Message-ID: <423CD9DC.5080401@vanderpoel.org>
Date: Sat, 19 Mar 2005 18:03:08 -0800
From: Erik van der Poel <erik@vanderpoel.org>
User-Agent: Mozilla Thunderbird 1.0 (X11/20041206)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Simon Josefsson <jas@extundo.com>
CC: idn@ops.ietf.org
Subject: [idn] stringprep: PRI #29
References: <42322CE2.4040509@vanderpoel.org> <4232B2FD.1080104@vanderpoel.org> <4232BA56.5090001@vanderpoel.org> <iluk6odazwb.fsf@latte.josefsson.org> <00e801c528a8$99ad37d0$72703009@sanjose.ibm.com> <ilull8qb5n5.fsf@latte.josefsson.org> <42367B63.6080300@vanderpoel.org> <4237450A.9010901@v.loewis.de> <423754F3.50405@vanderpoel.org> <ilumzt47ezc.fsf@latte.josefsson.org> <20050316091126.GA24254~@nicemice.net> <iluzmx36h6t.fsf@latte.josefsson.org>
In-Reply-To: <iluzmx36h6t.fsf@latte.josefsson.org>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on psg.com
X-Spam-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.0.1
Sender: owner-idn@ops.ietf.org
Precedence: bulk

Simon Josefsson wrote:
> There appears to me be a lot of decisions made out of subjective
> opinions on how normalization "should" behave, or is "assumed" to
> behave.

I don't think it's subjective. The concept of normalization requires 
that it be idempotent.
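
As a rough illustration of what I mean by idempotent, here is a minimal
Python sketch. It uses the standard unicodedata module, which follows
whatever Unicode version the interpreter ships rather than the Unicode
3.2 tables that Stringprep references, so it only illustrates the
property and is not a test of any Stringprep implementation. The helper
name is_idempotent and the sample strings are my own choices.

```python
import unicodedata


def is_idempotent(text, form="NFKC"):
    """Return True if normalizing twice gives the same result as normalizing once."""
    once = unicodedata.normalize(form, text)
    twice = unicodedata.normalize(form, once)
    return once == twice


# Illustrative inputs (assumptions, not taken from any spec text):
samples = [
    "\u0b47\u0300\u0b3e",  # Oriya vowel sign, combining grave, Oriya vowel sign
    "\u1100\u0300\u1161",  # Hangul leading jamo, combining grave, Hangul vowel jamo
    "\ufb01",              # LATIN SMALL LIGATURE FI (has a compatibility mapping)
    "A\u030a",             # A followed by COMBINING RING ABOVE
]

for s in samples:
    print([hex(ord(c)) for c in s], is_idempotent(s))
```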

> One way is to incorporate the PR-29 fix, declare the earlier attempt
> as buggy, and re-cycle at PROPOSED.  I suspect you prefer that way?  I
> am hesitant about that approach, because we have already deployed the
> old RFC and it is not clear what problems there will be in mixing the
> old and the new code.

We already have the situation where some implementations do it one way 
and some do it the other way. It is quite clear what will happen when 
somebody uses a character sequence that these implementations interpret 
differently: they will produce different output for the same input and 
fail to interoperate. Keep in mind that Unicode may add new characters 
in the future that may also be affected.

> Both Kerberos and SASL appears to be going to
> use the old StringPrep as well, so we will be seeing security critical
> infrastructure based on the old interpretation.

You write "the old interpretation" as if there were only one 
interpretation of the old spec. That's not true. As we have seen, some 
implementations do it one way, and others do it the other way.

> Another way is to carry on with the Unicode 3.2 NFKC even though it
> breaks some human's assumptions on what "normalization" means in a
> theoretic setting.

It's not just an "assumption", and it's not merely "theoretical". This 
is a very basic requirement for the normalization process.

> Machines will cope, they compute an
> algorithm, they don't care if the output meet some unstated invariant
> or not.

IDNA specifies that a Punycode label must be decoded and then 
Nameprepped and Punycoded again to make sure you get the same string 
back in order to decide what to display (Unicode vs Punycode). This, in 
itself, should make you realize that the process is supposed to be 
idempotent. So we *do* care how the machines compute this algorithm.
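
The round trip described above can be sketched roughly as follows. This
is a simplified illustration, assuming Python's encodings.idna.nameprep
function and its built-in "punycode" codec; the function name
display_form is hypothetical, and the sketch leaves out parts of the
real ToUnicode operation such as the length checks and the
UseSTD3ASCIIRules and AllowUnassigned flags.

```python
import encodings.idna as idna

ACE_PREFIX = "xn--"


def display_form(label):
    """Return the Unicode form of an ACE label if it survives the round trip.

    The label is Punycode-decoded, Nameprepped, and Punycode-encoded again.
    If the result does not match the input (ASCII case-insensitively),
    the original label is returned unchanged.
    """
    if not label.lower().startswith(ACE_PREFIX):
        return label  # not an ACE label, nothing to decode
    try:
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        prepped = idna.nameprep(decoded)
        reencoded = ACE_PREFIX + prepped.encode("punycode").decode("ascii")
    except UnicodeError:
        return label  # any failure means: keep showing the ACE form
    if reencoded.lower() == label.lower():
        return decoded
    return label


print(display_form("xn--bcher-kva"))
```

If Nameprep were allowed to change its own output on a second pass, the
comparison at the end would become unreliable, which is why the check
only makes sense for an idempotent process.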

> A third way, which is what I am deploying, is to use the Unicode 3.2
> NFKC together with a filter to reject the PR-29 problem sequences.
> This is in line with the RFC's, it solves problems related to PR-29
> problem sequences, and is simple to implement.

I don't think this is in line with the RFCs. You are rejecting sequences 
that are not rejected by the RFCs.
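
For concreteness, a filter of this kind might look roughly like the
sketch below. This is only my guess at the shape of such a check, not
the code in any particular library. It assumes, as I understand PR-29,
that a problem sequence is a starter, followed by one or more
non-starter combining marks, followed by a second starter that would
compose with the first. The function name has_pr29_sequence is
hypothetical, and the sketch relies on unicodedata from a current
Python rather than on Unicode 3.2 data.

```python
import unicodedata


def has_pr29_sequence(text):
    """Heuristically detect starter + combining marks + composing starter."""
    n = len(text)
    for i, first in enumerate(text):
        if unicodedata.combining(first) != 0:
            continue  # a candidate sequence must begin with a starter
        # Skip over the run of non-starter marks that follows.
        j = i + 1
        while j < n and unicodedata.combining(text[j]) != 0:
            j += 1
        if j == i + 1 or j == n:
            continue  # no intervening marks, or nothing after them
        last = text[j]  # the next starter after the marks
        # If the two starters compose into one character when adjacent,
        # the sequence has the shape that PR-29 is concerned with.
        if len(unicodedata.normalize("NFC", first + last)) == 1:
            return True
    return False


print(has_pr29_sequence("\u0b47\u0300\u0b3e"))
```

A label containing such a sequence would be refused by the filter even
though Stringprep itself, as written, would accept it.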

More importantly, as long as you continue to ship your implementation 
as is, installations of your popular library will keep accumulating, 
making it more difficult for the world to adjust if and when the 
affected types of character sequences are introduced, either with the 
current characters or with new characters.

You are in a position to make a difference. You already have. Please 
reconsider.

Erik