Re: [idn] space-like unicode char
Erik van der Poel <erik@vanderpoel.org> Fri, 08 April 2005 02:02 UTC
Received: from psg.com (mailnull@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id WAA29743 for <idn-archive@lists.ietf.org>; Thu, 7 Apr 2005 22:02:02 -0400 (EDT)
Received: from majordom by psg.com with local (Exim 4.44 (FreeBSD)) id 1DJiiX-000GIT-QY for idn-data@psg.com; Fri, 08 Apr 2005 01:55:29 +0000
Received: from [207.115.63.98] (helo=pimout4-ext.prodigy.net) by psg.com with esmtp (Exim 4.44 (FreeBSD)) id 1DJiiW-000GIA-Ku for idn@ops.ietf.org; Fri, 08 Apr 2005 01:55:28 +0000
Received: from [10.1.1.2] (adsl-64-174-147-206.dsl.sntc01.pacbell.net [64.174.147.206]) by pimout4-ext.prodigy.net (8.12.10 milter /8.12.10) with ESMTP id j381tL5K222276; Thu, 7 Apr 2005 21:55:21 -0400
Message-ID: <4255E488.8010302@vanderpoel.org>
Date: Thu, 07 Apr 2005 18:55:20 -0700
From: Erik van der Poel <erik@vanderpoel.org>
User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050317)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Soobok Lee <lsb@lsb.org>
CC: idn@ops.ietf.org
Subject: Re: [idn] space-like unicode char
References: <42181FD5.3070608@lsb.org>
In-Reply-To: <42181FD5.3070608@lsb.org>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on psg.com
X-Spam-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.0.1
Sender: owner-idn@ops.ietf.org
Precedence: bulk
Soobok Lee wrote:
> U+1160 is a space-like char and even stringprep/nameprep does not
> filter it out because the char is not for punctuational purpose.

U+1160 is HANGUL JUNGSEONG FILLER and it is used to transform nonstandard syllables into standard ones (Unicode 3.0 section 3.11 (RFC 3454 refers to Unicode 3.2.0)). However, this transformation is one of the additional transformations not considered part of Unicode normalization (3.2.0's UAX #15 Annex 10). So this character is not generated by Stringprep/Nameprep. However, it is not prohibited either, so it may occur in the input to (and output from) Stringprep/Nameprep.

I read some of the sections on Hangul in the Unicode book and Web site, but I did not see any rules regarding repeated occurrences of U+1160 (as you had in your example, not quoted above). I also did not see any rules about what to do when a filler is not followed by a Hangul jamo. It would be nice to have these rules in Unicode or in Stringprep.

I tried U+1160 followed by a Latin character in MSIE with i-Nav and in Firefox with IDN turned on, and it was displayed as a wide space. It is unfortunate that both implementations chose to display it as a space instead of deleting it.

Erik
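
As a rough illustration of the point above, the following sketch checks how the Stringprep tables that Nameprep relies on classify U+1160. It assumes Python's standard `stringprep` module and the `unicodedata.ucd_3_2_0` database (both follow Unicode 3.2.0, as RFC 3454 does). The helper name `nameprep_view` is hypothetical and only for this example. It is not a full Nameprep implementation, since it skips case folding and the bidi checks.

```python
import stringprep
import unicodedata

FILLER = "\u1160"  # HANGUL JUNGSEONG FILLER

# Prohibition tables that the Nameprep profile applies (C.1.2, C.2.2, C.3 to C.9).
PROHIBITED_TABLES = (
    stringprep.in_table_c12,
    stringprep.in_table_c22,
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def nameprep_view(ch):
    """Hypothetical helper: summarize how the Stringprep tables treat one character."""
    ucd = unicodedata.ucd_3_2_0  # Unicode 3.2.0 data, matching RFC 3454
    sample = ch + "a"            # the character followed by a Latin letter
    return {
        "name": unicodedata.name(ch),
        "unassigned": stringprep.in_table_a1(ch),         # table A.1
        "mapped_to_nothing": stringprep.in_table_b1(ch),  # table B.1
        "prohibited": any(check(ch) for check in PROHIBITED_TABLES),
        "nfkc_unchanged": ucd.normalize("NFKC", sample) == sample,
    }


if __name__ == "__main__":
    for key, value in nameprep_view(FILLER).items():
        print(key, value)
```

Under these assumptions the filler is neither mapped to nothing nor prohibited, and NFKC leaves it in place, so it passes through unchanged. By contrast, a character such as U+00A0 NO-BREAK SPACE is caught by table C.1.2.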
- [idn] combining marks and space-like unicode char Soobok Lee
- Re: [idn] space-like unicode char Soobok Lee
- [idn] space-like unicode char Soobok Lee
- Re: [idn] space-like unicode char Erik van der Poel
- Re: [idn] space-like unicode char Erik van der Poel
- Re: [idn] space-like unicode char Soobok Lee
- Re: [idn] space-like unicode char Soobok Lee