Re: [idn] space-like unicode char
Soobok Lee <lsb@lsb.org> Fri, 08 April 2005 07:11 UTC
Received: from psg.com (mailnull@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id DAA12263 for <idn-archive@lists.ietf.org>; Fri, 8 Apr 2005 03:11:39 -0400 (EDT)
Received: from majordom by psg.com with local (Exim 4.44 (FreeBSD)) id 1DJnYH-0000Fq-Tb for idn-data@psg.com; Fri, 08 Apr 2005 07:05:13 +0000
Received: from [211.196.150.53] (helo=postel5.postel.co.kr) by psg.com with esmtp (Exim 4.44 (FreeBSD)) id 1DJnYF-0000FO-De for idn@ops.ietf.org; Fri, 08 Apr 2005 07:05:11 +0000
Received: from [10.1.1.21] ([61.73.48.22]) by postel5.postel.co.kr (8.13.0.PreAlpha4/8.13.0.PreAlpha4) with ESMTP id j38758JR024364; Fri, 8 Apr 2005 16:05:08 +0900
Message-ID: <42562D22.3090609@lsb.org>
Date: Fri, 08 Apr 2005 16:05:06 +0900
From: Soobok Lee <lsb@lsb.org>
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Erik van der Poel <erik@vanderpoel.org>
CC: idn@ops.ietf.org
Subject: Re: [idn] space-like unicode char
References: <42181FD5.3070608@lsb.org> <4255E488.8010302@vanderpoel.org>
In-Reply-To: <4255E488.8010302@vanderpoel.org>
Content-Type: text/plain; charset="EUC-KR"
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on psg.com
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.0.1
Sender: owner-idn@ops.ietf.org
Precedence: bulk

Erik van der Poel wrote:

> Soobok Lee wrote:
>
>> U+1160 is a space-like char and even stringprep/nameprep does not
>> filter it out because the char is not for punctuational purpose.
>
> U+1160 is HANGUL JUNGSEONG FILLER and it is used to transform
> nonstandard syllables into standard ones (Unicode 3.0 section 3.11
> (RFC 3454 refers to Unicode 3.2.0)). However, this transformation is
> one of the additional transformations not considered part of Unicode
> normalization (3.2.0's UAX #15 Annex 10).

Exactly. U+1160 is not "touched" by Unicode normalization (NFC).

> So this character is not generated by Stringprep/Nameprep. However, it
> is not prohibited either, so it may occur in the input to (and output
> from) Stringprep/Nameprep.

Yes, it may occur.

> I read some of the sections on Hangul in the Unicode book and Web
> site, but I did not see any rules regarding repeated occurrences of
> U+1160 (as you had in your example, not quoted above). I also did not
> see any rules about what to do when a filler is not followed by a
> Hangul jamo. It would be nice to have these rules in Unicode or in
> Stringprep.

The U+1160 problem was raised 3.5 years ago (you can search this huge
idn-list archive for the keywords "1160" or "filler"), along with some
additional Hangul jamo problems. I submitted a draft (you may find it
at www.i-d-n.net) to filter out these invalid character sequences, but
the draft was discarded. Someone argued that such filtering
*complicates* the stringprep algorithms with context-sensitive
filtering/prohibiting, and that the problem is up to UTC/NFC, not the
IETF. Of course, I couldn't accept that. Anyway, we can't go back to
December 2002 without giving up stringprep's backward-compatibility
promise.

> I tried U+1160 followed by a Latin character in MSIE with i-Nav and in
> Firefox with IDN turned on, and it was displayed as a wide space. It
> is unfortunate that both implementations chose to display it as a
> space instead of deleting it.

Yes. Plugins M U S T filter out U+1160 from validated ToUnicode()ed
labels, whether or not IDNA requires that.

Soobok
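
For illustration only, here is a minimal Python sketch of the two points above: NFC leaves U+1160 untouched, and a plugin could strip it from a label after ToUnicode(). The function name `strip_filler` is hypothetical and is not taken from IDNA, stringprep, or any plugin. The sketch assumes that plain deletion of every U+1160 is the intended filtering, and it does not perform ToUnicode() itself.

```python
import unicodedata

# U+1160 HANGUL JUNGSEONG FILLER, the character discussed in this thread.
JUNGSEONG_FILLER = "\u1160"


def strip_filler(label: str) -> str:
    """Hypothetical display-side filter: drop every U+1160 from a label.

    Assumption: the label has already been produced by ToUnicode();
    this function does no IDNA processing of its own.
    """
    return label.replace(JUNGSEONG_FILLER, "")


# A filler followed by Latin letters, as in the browser test quoted above.
sample = "a" + JUNGSEONG_FILLER + JUNGSEONG_FILLER + "b"

# NFC does not remove or alter the filler.
nfc_unchanged = unicodedata.normalize("NFC", sample) == sample

# After filtering, only the visible characters remain.
filtered = strip_filler(sample)
```

Whether deletion or outright rejection of such labels is the better policy is left open here; the sketch only shows the deletion variant named in the message.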