Re: [idn] space-like unicode char

Erik van der Poel <erik@vanderpoel.org> Fri, 08 April 2005 02:02 UTC

Received: from psg.com (mailnull@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id WAA29743 for <idn-archive@lists.ietf.org>; Thu, 7 Apr 2005 22:02:02 -0400 (EDT)
Received: from majordom by psg.com with local (Exim 4.44 (FreeBSD)) id 1DJiiX-000GIT-QY for idn-data@psg.com; Fri, 08 Apr 2005 01:55:29 +0000
Received: from [207.115.63.98] (helo=pimout4-ext.prodigy.net) by psg.com with esmtp (Exim 4.44 (FreeBSD)) id 1DJiiW-000GIA-Ku for idn@ops.ietf.org; Fri, 08 Apr 2005 01:55:28 +0000
Received: from [10.1.1.2] (adsl-64-174-147-206.dsl.sntc01.pacbell.net [64.174.147.206]) by pimout4-ext.prodigy.net (8.12.10 milter /8.12.10) with ESMTP id j381tL5K222276; Thu, 7 Apr 2005 21:55:21 -0400
Message-ID: <4255E488.8010302@vanderpoel.org>
Date: Thu, 07 Apr 2005 18:55:20 -0700
From: Erik van der Poel <erik@vanderpoel.org>
User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050317)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Soobok Lee <lsb@lsb.org>
CC: idn@ops.ietf.org
Subject: Re: [idn] space-like unicode char
References: <42181FD5.3070608@lsb.org>
In-Reply-To: <42181FD5.3070608@lsb.org>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Sender: owner-idn@ops.ietf.org
Precedence: bulk

Soobok Lee wrote:
> U+1160 is a space-like char and even stringprep/nameprep does not
> filter it out because the char is not for punctuational purpose.

U+1160 is HANGUL JUNGSEONG FILLER. It is used to transform 
nonstandard syllables into standard ones (Unicode 3.0, section 3.11; 
RFC 3454 itself refers to Unicode 3.2.0). That transformation is one of 
the additional transformations that are not considered part of Unicode 
normalization (UAX #15 Annex 10 in 3.2.0), so Stringprep/Nameprep never 
generates this character.
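
As a rough illustration of the point above, here is a small Python sketch. 
It assumes Python's unicodedata module, whose ucd_3_2_0 object exposes the 
Unicode 3.2.0 data; the helper name nfkc and the variable names are 
made up for this example. It only shows that NFKC neither adds nor 
removes the filler.

```python
import unicodedata

# Unicode 3.2.0 database, the version RFC 3454 refers to.
ucd = unicodedata.ucd_3_2_0

FILLER = "\u1160"  # HANGUL JUNGSEONG FILLER


def nfkc(text: str) -> str:
    """Apply NFKC using the Unicode 3.2.0 data (hypothetical helper)."""
    return ucd.normalize("NFKC", text)


# A lone leading consonant stays as it is: normalization does not
# append a filler to turn it into a standard syllable.
lone_choseong = nfkc("\u1100")

# A filler that is already in the input passes through unchanged,
# even when it is followed by a Latin letter.
filler_then_latin = nfkc(FILLER + "a")

# An ordinary leading consonant + vowel pair still composes.
composed = nfkc("\u1100\u1161")

print(FILLER in lone_choseong)          # False
print(filler_then_latin == FILLER + "a")  # True
print(hex(ord(composed)))               # 0xac00
```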

However, it is not prohibited either, so it may occur in the input to 
(and output from) Stringprep/Nameprep. I read some of the sections on 
Hangul in the Unicode book and Web site, but I did not see any rules 
regarding repeated occurrences of U+1160 (as you had in your example, 
not quoted above). I also did not see any rules about what to do when a 
filler is not followed by a Hangul jamo. It would be nice to have these 
rules in Unicode or in Stringprep.
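
A quick way to check the "not prohibited" part is to run the character 
through the RFC 3454 tables that Nameprep (RFC 3491) uses. The sketch 
below assumes Python's stringprep module; the function name 
nameprep_verdict and the table dictionary are only illustrative, and 
the bidi check is left out.

```python
import stringprep

FILLER = "\u1160"  # HANGUL JUNGSEONG FILLER

# Prohibition tables listed by Nameprep (RFC 3491, section 5).
PROHIBITED_TABLES = {
    "C.1.2": stringprep.in_table_c12,
    "C.2.2": stringprep.in_table_c22,
    "C.3": stringprep.in_table_c3,
    "C.4": stringprep.in_table_c4,
    "C.5": stringprep.in_table_c5,
    "C.6": stringprep.in_table_c6,
    "C.7": stringprep.in_table_c7,
    "C.8": stringprep.in_table_c8,
    "C.9": stringprep.in_table_c9,
}


def nameprep_verdict(ch: str) -> str:
    """Say what the Nameprep tables do with one character (illustrative)."""
    if stringprep.in_table_b1(ch):
        return "mapped to nothing"
    hits = [name for name, test in PROHIBITED_TABLES.items() if test(ch)]
    if hits:
        return "prohibited by " + ", ".join(hits)
    return "allowed"


print(nameprep_verdict(FILLER))    # allowed
print(nameprep_verdict("\u3000"))  # prohibited by C.1.2
print(nameprep_verdict("\u00ad"))  # mapped to nothing
```

For comparison, U+3000 IDEOGRAPHIC SPACE is caught by table C.1.2 and 
U+00AD SOFT HYPHEN is removed by table B.1, while U+1160 falls through 
both steps.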

I tried U+1160 followed by a Latin character in MSIE with i-Nav and in 
Firefox with IDN turned on, and it was displayed as a wide space. It is 
unfortunate that both implementations chose to display it as a space 
instead of deleting it.

Erik