Re: [http-auth] Normalization forms in draft-ietf-httpauth-basicauth-enc

Yoav Nir <ynir@checkpoint.com> Tue, 02 July 2013 14:02 UTC

From: Yoav Nir <ynir@checkpoint.com>
To: Peter Saint-Andre <stpeter@stpeter.im>
Date: Tue, 02 Jul 2013 14:02:31 +0000
Cc: Julian Reschke <julian.reschke@gmx.de>, "http-auth@ietf.org" <http-auth@ietf.org>
Subject: Re: [http-auth] Normalization forms in draft-ietf-httpauth-basicauth-enc

On Jul 2, 2013, at 4:18 PM, Peter Saint-Andre <stpeter@stpeter.im> wrote:

> On 7/2/13 1:59 AM, Yoav Nir wrote:
>> Hi
>> 
>> For those of us not so well versed in I18N issues, what do you mean
>> by normalization?
> 
> Martin described it already.
> 
> You might find it useful to look at the slides I put together for a
> two-hour tutorial on internationalization at an IETF meeting a few
> years ago:
> 
> https://stpeter.im/files/i18n-into.pdf
> 
> Perhaps I need to offer that tutorial again? Or maybe turn it into a
> small book?

Excellent resource. Yeah, you should.

>> Is it just consolidating look-alike code points, like the multiple
>> hyphens that exist in Unicode?
> 
> There is no complete solution to the problem of confusable characters.
> Here is a fun example for you:
> 
> https://stpeter.im/journal/1420.html
> 
> However, normalization takes care of some (but not all) instances of
> code points that look alike.

For our particular use case, the results of normalization must be bit-for-bit equal: typing the same name or password must give the same result in different browsers on PCs, Macs, all Linux distros, and all smartphones and tablets. As mentioned earlier in this thread, what these platforms send is not always identical.
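
To make the bit-for-bit requirement concrete, here is a minimal sketch in Python using the standard unicodedata module. The helper name to_wire and the choice of NFC are my assumptions for illustration only; the draft may well settle on a different form.

```python
import unicodedata

# Two ways a platform might encode the same typed name.
composed = "Andr\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "Andre\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

def to_wire(text: str, form: str = "NFC") -> bytes:
    """Normalize, then encode as UTF-8 (hypothetical helper; NFC is an assumption)."""
    return unicodedata.normalize(form, text).encode("utf-8")

# Without normalization the byte strings differ.
raw_equal = composed.encode("utf-8") == decomposed.encode("utf-8")

# After normalization both inputs produce the same bytes.
normalized_equal = to_wire(composed) == to_wire(decomposed)

print(raw_equal, normalized_equal)
```

Both strings render the same on screen, yet only the normalized versions compare equal as bytes.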

>> Does it also involve removing Arabic and Hebrew points? (I think
>> Paul raised this one)
> 
> There is no reason to *remove* characters from right-to-left scripts.
> However, allowing them means you need to deal with interesting bidi
> issues.

There is a reason. In French, André is correct while Andre is simply wrong. In Arabic and Hebrew, by contrast, the diacritics are reading aids that were bolted on to an existing script. The text is fine with or without them, so I could write my name as either יואב or יוֹאָב, and both would be perfectly fine and in some sense equivalent. It's still fine to decide that these forms are not equivalent for our purposes, so if you used points in your username when you created it, you'll have to use points forever.
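
As a quick check of the "points forever" consequence, the sketch below (Python, standard unicodedata module) compares the pointed and unpointed spellings under each of the four Unicode normalization forms. The variable and function names are mine, purely for illustration.

```python
import unicodedata

# The same name without and with Hebrew points, written as escapes
# so the exact code points are unambiguous.
plain = "\u05d9\u05d5\u05d0\u05d1"                # yod, vav, alef, bet
pointed = "\u05d9\u05d5\u05b9\u05d0\u05b8\u05d1"  # adds U+05B9 holam and U+05B8 qamats

def same_after(form: str, a: str, b: str) -> bool:
    """True if a and b are identical once both are normalized to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# None of the standard forms drops combining marks, so every entry is False.
results = {form: same_after(form, plain, pointed)
           for form in ("NFC", "NFD", "NFKC", "NFKD")}

print(results)
```

So with standard normalization alone, the two spellings stay distinct usernames.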

>> Does it involve removing diacritics?
> 
> Not necessarily -- why can't my username be saintandré or whatever?
> And *removing* diacritics might be problematic, if it implies that 'é'
> would be transformed to 'e' (thus introducing the possibility of
> additional false positives).

It all goes back to the keyboard the user uses to enter the username and password. I have no idea how to type the example above on Windows or on any phone. I just don't know the key combinations (yes, I could google it). So having your username be saintandré could leave you unable to log in from some platforms. The case could be made that we need stronger normalization than other use cases do.
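
For illustration, "stronger normalization" could look something like the sketch below: decompose, drop nonspacing marks, recompose. This is not taken from any draft; strip_marks is a hypothetical helper, and it shows exactly the false-positive risk Peter describes, since distinct names collapse into one.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Hypothetical 'stronger' folding: remove all nonspacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)

# The accented username becomes typeable on any keyboard...
folded_latin = strip_marks("saintandr\u00e9")

# ...and the pointed Hebrew name folds to the unpointed one.
folded_hebrew = strip_marks("\u05d9\u05d5\u05b9\u05d0\u05b8\u05d1")

# The cost: two different inputs now map to the same account name.
collision = strip_marks("saintandr\u00e9") == strip_marks("saintandre")

print(folded_latin, collision)
```

Whether that trade-off is acceptable is the open question; the code only shows what the folding would do.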

>> Does it involve splitting combined characters (like U+00E6 into 'a'
>> and 'e')?
> 
> Actually, there is no decomposition of U+00E6 æ into 'a' and 'e' in
> Unicode. This might be counter-intuitive, but it's true.
> 
>> Is there a standard we can point to and say "do this before
>> comparing or hashing"?
> 
> I'd suggest looking carefully at SASLprep (RFC 4013) and its proposed
> replacement:
> 
> http://datatracker.ietf.org/doc/draft-ietf-precis-saslprepbis/
> 
> If you have feedback on the latter, please post to the precis@ietf.org
> list:
> 
> https://www.ietf.org/mailman/listinfo/precis
> 
> Peter
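
To check my understanding of "do this before comparing or hashing", here is a rough sketch in Python. It applies only NFKC, which is the normalization form RFC 4013 uses; it is not a full SASLprep implementation (no mapping tables, prohibited-character checks, or bidi checks). The names prepare and digest, and the use of SHA-256, are my own assumptions.

```python
import hashlib
import unicodedata

def prepare(text: str) -> bytes:
    """Normalization step only (NFKC), then UTF-8. Not full SASLprep."""
    return unicodedata.normalize("NFKC", text).encode("utf-8")

def digest(text: str) -> str:
    """Hash the prepared bytes; SHA-256 is a placeholder choice."""
    return hashlib.sha256(prepare(text)).hexdigest()

# Composed and decomposed spellings hash identically once prepared.
same_hash = digest("Andr\u00e9") == digest("Andre\u0301")

# U+00E6 has no decomposition mapping, so it passes through unchanged.
ae_mapping = unicodedata.decomposition("\u00e6")
ae_unchanged = prepare("\u00e6") == "\u00e6".encode("utf-8")

# By contrast, the ligature U+FB01 does have a compatibility decomposition.
ligature = prepare("\ufb01")

print(same_hash, repr(ae_mapping), ae_unchanged, ligature)
```

This matches Peter's point about U+00E6: it is left alone even by the compatibility forms.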