Re: [http-auth] Normalization forms in draft-ietf-httpauth-basicauth-enc

Peter Saint-Andre <stpeter@stpeter.im> Tue, 02 July 2013 13:18 UTC

Message-ID: <51D2D317.1050405@stpeter.im>
Date: Tue, 02 Jul 2013 07:18:15 -0600
From: Peter Saint-Andre <stpeter@stpeter.im>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130509 Thunderbird/17.0.6
MIME-Version: 1.0
To: Yoav Nir <ynir@checkpoint.com>
References: <20130630142838.31885.15315.idtracker@ietfa.amsl.com> <51D04326.5060600@gmx.de> <DEA2EA74-7587-4CAA-9424-4478B136308E@vpnc.org> <51D09F98.2070508@gmail.com> <D434C8F9-D3DC-40EB-A25A-3A259C1A22E6@vpnc.org> <51D1175C.3020007@it.aoyama.ac.jp> <51D11AD4.5050705@gmx.de> <FD268C10-8429-4D09-9A19-6755B9B0DC13@vpnc.org> <42111C34-9B32-45B2-AC94-B18CE5CC081F@checkpoint.com>
In-Reply-To: <42111C34-9B32-45B2-AC94-B18CE5CC081F@checkpoint.com>
X-Enigmail-Version: 1.5.1
X-Enigmail-Draft-Status: 513
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
Cc: Julian Reschke <julian.reschke@gmx.de>, "http-auth@ietf.org" <http-auth@ietf.org>
Subject: Re: [http-auth] Normalization forms in draft-ietf-httpauth-basicauth-enc

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 7/2/13 1:59 AM, Yoav Nir wrote:
> Hi
> 
> For those of us not so well versed in I18N issues, what do you mean
> by normalization?

Martin described it already.

You might find it useful to look at the slides I put together for a
two-hour tutorial on internationalization at an IETF meeting a few
years ago:

https://stpeter.im/files/i18n-into.pdf

Perhaps I need to offer that tutorial again? Or maybe turn it into a
small book?

> Is it just consolidating look-alike code points, like the multiple
> hyphens that exist in Unicode?

There is no complete solution to the problem of confusable characters.
Here is a fun example for you:

https://stpeter.im/journal/1420.html

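However, normalization takes care of some (but not all) instances of
code points that look alike: it folds together only those sequences
that Unicode defines as canonically equivalent or (with NFKC/NFKD)
compatibility equivalent.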
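
To make the "some but not all" point concrete, here is a minimal
sketch using Python's standard unicodedata module. The choice of NFKC
and of these particular code points is mine, purely for illustration;
the helper name nfkc is hypothetical.

```python
import unicodedata


def nfkc(s: str) -> str:
    """Apply Unicode Normalization Form KC to a string."""
    return unicodedata.normalize("NFKC", s)


HYPHEN_MINUS = "\u002d"      # the ASCII hyphen
FULLWIDTH_HYPHEN = "\uff0d"  # FULLWIDTH HYPHEN-MINUS
EN_DASH = "\u2013"           # EN DASH
LATIN_A = "\u0061"           # LATIN SMALL LETTER A
CYRILLIC_A = "\u0430"        # CYRILLIC SMALL LETTER A

# A compatibility variant is folded onto the ASCII hyphen...
fullwidth_folds = nfkc(FULLWIDTH_HYPHEN) == HYPHEN_MINUS

# ...but a look-alike dash with no decomposition is left untouched,
en_dash_unchanged = nfkc(EN_DASH) == EN_DASH

# and so is a confusable letter from another script.
cyrillic_unchanged = nfkc(CYRILLIC_A) == CYRILLIC_A

# Precomposed and decomposed forms of the same letter do become equal.
composed_equals_decomposed = nfkc("\u00e9") == nfkc("e\u0301")
```

All four flags come out True: normalization unifies the fullwidth
hyphen and the two spellings of 'é', but the en dash and the Cyrillic
letter stay distinct from their look-alikes.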

> Does it also involve removing Arabic and Hebrew points? (I think
> Paul raised this one)

There is no reason to *remove* characters from right-to-left scripts.
However, allowing them means you need to deal with interesting bidi
issues.
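
As a rough sketch of what is involved, the Unicode character database
already tells you which code points are right-to-left letters and
which are points (combining marks). The helper mixes_directions below
is hypothetical and is only a crude check, not the full bidi rule from
stringprep or PRECIS.

```python
import unicodedata

HEBREW_ALEF = "\u05d0"         # HEBREW LETTER ALEF
HEBREW_POINT_HIRIQ = "\u05b4"  # HEBREW POINT HIRIQ
ARABIC_BEH = "\u0628"          # ARABIC LETTER BEH
ARABIC_FATHA = "\u064e"        # ARABIC FATHA


def mixes_directions(s: str) -> bool:
    """Return True if s has both strong LTR and strong RTL characters."""
    classes = {unicodedata.bidirectional(ch) for ch in s}
    has_rtl = bool(classes & {"R", "AL"})
    has_ltr = "L" in classes
    return has_rtl and has_ltr


# Letters carry a strong right-to-left class...
alef_class = unicodedata.bidirectional(HEBREW_ALEF)   # "R"
beh_class = unicodedata.bidirectional(ARABIC_BEH)     # "AL"

# ...while the points are nonspacing marks (category Mn, bidi NSM).
hiriq_info = (unicodedata.category(HEBREW_POINT_HIRIQ),
              unicodedata.bidirectional(HEBREW_POINT_HIRIQ))
fatha_info = (unicodedata.category(ARABIC_FATHA),
              unicodedata.bidirectional(ARABIC_FATHA))

mixed = mixes_directions("abc" + HEBREW_ALEF)  # True
plain = mixes_directions("abc")                # False
```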

> Does it involve removing diacritics?

Not necessarily -- why can't my username be saintandré or whatever?
And *removing* diacritics might be problematic, if it implies that 'é'
would be transformed to 'e' (thus introducing the possibility of
additional false positives).
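
A small sketch of that false-positive risk, assuming a naive
"decompose, then drop combining marks" approach. The helper
strip_marks is hypothetical and is shown only as the thing to be wary
of, not as a recommendation.

```python
import unicodedata


def strip_marks(s: str) -> str:
    """Decompose to NFD, then drop every combining mark (lossy)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(
        ch for ch in decomposed if not unicodedata.category(ch).startswith("M")
    )


def nfc(s: str) -> str:
    """Plain NFC normalization, which keeps diacritics."""
    return unicodedata.normalize("NFC", s)


with_accent = "saintandr\u00e9"   # ends in U+00E9
without_accent = "saintandre"

# Normalization alone keeps the two usernames distinct.
distinct_after_nfc = nfc(with_accent) != nfc(without_accent)

# Stripping marks makes them collide.
collide_after_strip = strip_marks(with_accent) == strip_marks(without_accent)
```

Both flags are True: NFC preserves the difference, while stripping
marks maps two different usernames to the same string.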

> Does it involve splitting combined characters (like U+00E6 into 'a'
> and 'e')?

Actually, there is no decomposition of U+00E6 æ into 'a' and 'e' in
Unicode. This might be counter-intuitive, but it's true.
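
You can check this against the Unicode character database, for
instance with Python's unicodedata module. The comparison with
U+FB01 (the 'fi' ligature, which does have a compatibility
decomposition) is my own addition for contrast.

```python
import unicodedata

AE = "\u00e6"           # LATIN SMALL LETTER AE
FI_LIGATURE = "\ufb01"  # LATIN SMALL LIGATURE FI

# An empty string means the character has no decomposition mapping.
ae_mapping = unicodedata.decomposition(AE)
fi_mapping = unicodedata.decomposition(FI_LIGATURE)

# Even the most aggressive form, NFKD, leaves U+00E6 alone...
ae_after_nfkd = unicodedata.normalize("NFKD", AE)

# ...whereas the ligature is split into its two letters.
fi_after_nfkd = unicodedata.normalize("NFKD", FI_LIGATURE)

print(repr(ae_mapping), repr(fi_mapping))
print(ae_after_nfkd == AE, fi_after_nfkd == "fi")
```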

> Is there a standard we can point to and say "do this before
> comparing or hashing"?

I'd suggest looking carefully at SASLprep (RFC 4013) and its proposed
replacement:

http://datatracker.ietf.org/doc/draft-ietf-precis-saslprepbis/

If you have feedback on the latter, please post to the precis@ietf.org
list:

https://www.ietf.org/mailman/listinfo/precis
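
For a feel of what "do this before comparing or hashing" looks like
in practice, here is a rough sketch of the RFC 4013 steps built on
Python's standard stringprep and unicodedata modules. The function
name saslprep_sketch is hypothetical, the unassigned-code-point check
is omitted, and this is an illustration under those assumptions
rather than a conforming implementation.

```python
import stringprep
import unicodedata

# Prohibited-output tables listed in RFC 4013, section 2.3.
_PROHIBITED = (
    stringprep.in_table_c12,      # non-ASCII space characters
    stringprep.in_table_c21_c22,  # control characters
    stringprep.in_table_c3,       # private use
    stringprep.in_table_c4,       # non-character code points
    stringprep.in_table_c5,       # surrogate codes
    stringprep.in_table_c6,       # inappropriate for plain text
    stringprep.in_table_c7,       # inappropriate for canonical repr.
    stringprep.in_table_c8,       # change display / deprecated
    stringprep.in_table_c9,       # tagging characters
)


def saslprep_sketch(s: str) -> str:
    """Approximate SASLprep: map, normalize (NFKC), prohibit, bidi."""
    # 1. Mapping: drop "mapped to nothing", turn odd spaces into U+0020.
    mapped = []
    for ch in s:
        if stringprep.in_table_b1(ch):
            continue
        mapped.append(" " if stringprep.in_table_c12(ch) else ch)

    # 2. Normalization: NFKC, using the Unicode 3.2 data stringprep expects.
    normalized = unicodedata.ucd_3_2_0.normalize("NFKC", "".join(mapped))

    # 3. Prohibited output.
    for ch in normalized:
        if any(check(ch) for check in _PROHIBITED):
            raise ValueError("prohibited character U+%04X" % ord(ch))

    # 4. Bidi: if any RandALCat character is present, no LCat character
    #    may appear, and the string must start and end with RandALCat.
    if any(stringprep.in_table_d1(ch) for ch in normalized):
        if any(stringprep.in_table_d2(ch) for ch in normalized):
            raise ValueError("mixed right-to-left and left-to-right text")
        if not (stringprep.in_table_d1(normalized[0])
                and stringprep.in_table_d1(normalized[-1])):
            raise ValueError("RTL string must start and end with RTL")

    return normalized


# Compare (or hash) the prepared forms, never the raw input.
same_user = saslprep_sketch("\u2168") == saslprep_sketch("IX")
```

Here same_user is True, because U+2168 (ROMAN NUMERAL NINE) is
normalized to "IX" by NFKC, matching one of the examples in RFC 4013.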

Peter


- -- 
Peter Saint-Andre
https://stpeter.im/


-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJR0tMXAAoJEOoGpJErxa2ptSwP/iKfNqKsVsw+e2C8BSfkHhgw
HWrqWOkbIfvmQyOCssdsEuw6+cjZCtyDpGYKbQIP1WulxFlvjTT7DxEaEkVExuFk
kHYVc16AL0HxBBfMZlGAgtHQDm0J0P4FQ1V0A5JFckh31ZnE2lgbcF1aQGwHm0CS
58IY+4bMdZP7DFoKPSSU7p7DGw7dpacY9GdMGNAnCopPFfFsuaP2WEA7Q8P+HcEk
0nK7/ucc72DcagpzAFfXQ7MQ5TaMy1echSmw2w7SmuSr3KmYOaDqiEj17azTEKlX
lwsfGnZ06Exd6Q1qgLfwmXJjmaa0bU2NWKOtsAMVbcKpSMUa3838KEynSxd/j5fT
+2+t8KKAVf8uKs5wC6IEFn3qGeORbD4/q3v+GhLve9hksmu7OaIhQ8/EqpkyomUP
cD6SU3DfFas2PSAynOWqkkHPOFPGaGrVGwn/9bQrtsXtC5rrlMphLLKngzD4Jf68
GeyJXYZ4GCp8K6xNAeiA9Tu9PVS3Cpvo1tgq2/c+oxVTAZc3PWCp7oUOfYxzG0c0
oLv7/Fwfio8PP94rRjWgD/iP7+ZVzMplA7FSL0tkYUtnLL/9zohlFZYmLgcZcTiZ
vIkuCSY0o7hh6p2VVPFEgxFm4M0Y1WxkZe1C8FKL+S7/PBF6Xi4eUCkqx66EdrDF
vS/SGLuK5sl4t81vXXjH
=sP1o
-----END PGP SIGNATURE-----