Re: [idn] space-like unicode char

Erik van der Poel <erik@vanderpoel.org> Fri, 08 April 2005 19:44 UTC

Received: from psg.com (mailnull@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id PAA01382 for <idn-archive@lists.ietf.org>; Fri, 8 Apr 2005 15:44:18 -0400 (EDT)
Received: from majordom by psg.com with local (Exim 4.44 (FreeBSD)) id 1DJzHe-000Fso-26 for idn-data@psg.com; Fri, 08 Apr 2005 19:36:50 +0000
Received: from [207.115.63.98] (helo=pimout4-ext.prodigy.net) by psg.com with esmtp (Exim 4.44 (FreeBSD)) id 1DJzHa-000FsS-JV for idn@ops.ietf.org; Fri, 08 Apr 2005 19:36:46 +0000
Received: from [10.1.1.2] (adsl-64-174-147-206.dsl.sntc01.pacbell.net [64.174.147.206]) by pimout4-ext.prodigy.net (8.12.10 milter /8.12.10) with ESMTP id j38JaP5K214294; Fri, 8 Apr 2005 15:36:25 -0400
Message-ID: <4256DD38.3070708@vanderpoel.org>
Date: Fri, 08 Apr 2005 12:36:24 -0700
From: Erik van der Poel <erik@vanderpoel.org>
User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050317)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Soobok Lee <lsb@lsb.org>
CC: idn@ops.ietf.org
Subject: Re: [idn] space-like unicode char
References: <42181FD5.3070608@lsb.org> <4255E488.8010302@vanderpoel.org> <42562D22.3090609@lsb.org>
In-Reply-To: <42562D22.3090609@lsb.org>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on psg.com
X-Spam-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.0.1
Sender: owner-idn@ops.ietf.org
Precedence: bulk

Soobok Lee wrote:
> U+1160 problem has been raised 3.5 years ago (you can look into this
> huge idn-list archive by keyword search for 1160 or filler)
> with some additional hangul jamo problem. One draft has been submitted
> by me (you may find that in www.i-d-n.net)
> to filter out these invalid char sequences. But the draft had been
> discarded . Someone argued that such filtering * complicates *
> stringprep algorithms with context-sensitive filtering/prohibiting and
> the problem is up to UTC/NFC not to IETF. of course, i couldn't accept that.

The i-d-n.net name no longer takes you to a real site, but I believe I 
found your draft here:

http://www.watersprings.org/pub/id/draft-ietf-idn-hangeulchar-00.txt

I agree that handling the U+1160 issues would complicate a spec, and I
can see why the IETF decided not to address them in the RFCs. But now
that we have seen a number of implementations display this character in
a potentially dangerous, space-like way, we should reconsider the specs.

Unicode may not be able to address these issues in the normalization
spec, since the Unicode Consortium has promised not to make any
incompatible changes to it. Unicode might be able to address the issues
in other normative or informative parts of the standard or its related
documents, and the IETF might just want to refer to those parts.
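
To illustrate why normalization alone does not help here, the following
is a minimal sketch, assuming Python's standard unicodedata module. The
helper name survives_normalization is made up for this example. It only
checks that U+1160 comes out of each normalization form unchanged.

```python
import unicodedata

FILLER = "\u1160"  # HANGUL JUNGSEONG FILLER

def survives_normalization(ch):
    """Report, per normalization form, whether ch is returned unchanged.

    Hypothetical helper for illustration only; not part of any spec.
    """
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {form: unicodedata.normalize(form, ch) == ch for form in forms}

print(unicodedata.name(FILLER))
print(survives_normalization(FILLER))
```

Since U+1160 has no decomposition mapping, every form should report True,
which is why a fix would have to live somewhere other than NFC/NFKC.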

Alternatively, the IETF can write up its own specifications or
recommendations. It's not immediately clear to me whether U+1160 ought
to be addressed in Stringprep or in Nameprep. As we have seen, Stringprep
is used by various profiles, including SASLprep, which is for user
names and passwords. Some perverse people might suggest that passwords
ought to allow strange character sequences, like multiple consecutive
U+1160s, in order to make the password harder to guess. I'm new to
Stringprep, so I don't know how most IETFers feel about this type of thing.
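
As a rough sketch of what an extra check might look like if it lived in
a profile such as Nameprep rather than in Stringprep itself, the code
below flags Hangul filler code points in a label after NFKC. This is an
assumption for discussion, not anything the RFCs specify. The names
HANGUL_FILLERS and find_fillers are hypothetical.

```python
import unicodedata

# Assumed set of code points to flag; this list is illustrative only.
HANGUL_FILLERS = {
    "\u115F",  # HANGUL CHOSEONG FILLER
    "\u1160",  # HANGUL JUNGSEONG FILLER
}

def find_fillers(label):
    """Return (index, code point) pairs for fillers found after NFKC.

    NFKC is applied first because the compatibility characters U+3164
    and U+FFA0 normalize to U+1160.
    """
    normalized = unicodedata.normalize("NFKC", label)
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(normalized)
        if ch in HANGUL_FILLERS
    ]

print(find_fillers("example"))
print(find_fillers("a\u1160b"))
print(find_fillers("a\u3164b"))
```

A domain-name profile could reject any label for which this list is
non-empty, while a password profile could choose to skip the check.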

In the meantime, I have added U+1160 and the combining mark issue to my 
list and I have filed a bug report for Mozilla:

http://nameprep.org/#display
https://bugzilla.mozilla.org/show_bug.cgi?id=289588

Erik