[idn] combining marks and space-like unicode char
Soobok Lee <lsb@lsb.org> Fri, 08 April 2005 07:23 UTC
Received: from psg.com (mailnull@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id DAA13098 for <idn-archive@lists.ietf.org>; Fri, 8 Apr 2005 03:23:47 -0400 (EDT)
Received: from majordom by psg.com with local (Exim 4.44 (FreeBSD)) id 1DJno4-0002BI-Uc for idn-data@psg.com; Fri, 08 Apr 2005 07:21:32 +0000
Received: from [211.196.150.53] (helo=postel5.postel.co.kr) by psg.com with esmtp (Exim 4.44 (FreeBSD)) id 1DJno3-0002Al-QM for idn@ops.ietf.org; Fri, 08 Apr 2005 07:21:32 +0000
Received: from [10.1.1.21] ([61.73.48.22]) by postel5.postel.co.kr (8.13.0.PreAlpha4/8.13.0.PreAlpha4) with ESMTP id j387LTJR025641; Fri, 8 Apr 2005 16:21:29 +0900
Message-ID: <425630F8.5030204@lsb.org>
Date: Fri, 08 Apr 2005 16:21:28 +0900
From: Soobok Lee <lsb@lsb.org>
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Soobok Lee <lsb@lsb.org>
CC: Erik van der Poel <erik@vanderpoel.org>, idn@ops.ietf.org
Subject: [idn] combining marks and space-like unicode char
References: <42181FD5.3070608@lsb.org> <4255E488.8010302@vanderpoel.org> <42562D22.3090609@lsb.org>
In-Reply-To: <42562D22.3090609@lsb.org>
Content-Type: text/plain; charset="EUC-KR"
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on psg.com
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.0.1
Sender: owner-idn@ops.ietf.org
Precedence: bulk
>> I tried U+1160 followed by a Latin character in MSIE with i-Nav and in
>> Firefox with IDN turned on, and it was displayed as a wide space. It
>> is unfortunate that both implementations chose to display it as a
>> space instead of deleting it.
>
> Yes. Plugins M U S T filter out U+1160 from validated ToUnicode()ed
> labels, whether or not IDNA requires that.
>
> Soobok

I will add this:

In the standard Hangul writing system, U+1160 is meaningful only in some contexts (when it is accompanied by at least one jamo character). But is a standalone U+1160 illegal? No, it is NOT illegal. So blind filtering of U+1160 is a mistake. Plugins' filtering should be context-sensitive. That is why including it in stringprep would complicate stringprep. :-)

We can find similar problems with the combining diacritical marks (U+03xx). What if a label consists of a single combining accent or dot-above character, without any preceding letter? It will combine with the dot delimiter that precedes it, and that will produce a confusing appearance (it looks like a colon, which is a protocol delimiter). AFAIK, a single standalone combining accent character is not prohibited by stringprep.

Soobok
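A minimal sketch of the two context-sensitive checks discussed above, in Python. It is not taken from any plugin or from stringprep. The function names are hypothetical, and the rule used for "meaningful context" (a conjoining jamo from U+1100..U+11FF immediately before or after U+1160) is an assumption made for illustration only.

```python
import unicodedata

HANGUL_CHOSEONG_FILLER = "\u115f"
HANGUL_JUNGSEONG_FILLER = "\u1160"


def is_conjoining_jamo(ch):
    # Assumption: "jamo" means a code point in the Hangul Jamo block
    # (U+1100..U+11FF), not counting the two filler characters.
    if ch in (HANGUL_CHOSEONG_FILLER, HANGUL_JUNGSEONG_FILLER):
        return False
    return "\u1100" <= ch <= "\u11ff"


def has_isolated_jungseong_filler(label):
    """Return True if some U+1160 in the label has no jamo next to it."""
    for i, ch in enumerate(label):
        if ch != HANGUL_JUNGSEONG_FILLER:
            continue
        prev_ok = i > 0 and is_conjoining_jamo(label[i - 1])
        next_ok = i + 1 < len(label) and is_conjoining_jamo(label[i + 1])
        if not (prev_ok or next_ok):
            return True
    return False


def starts_with_combining_mark(label):
    """Return True if the first character of the label is a combining mark."""
    # General categories Mn, Mc and Me are the Unicode "Mark" categories.
    return bool(label) and unicodedata.category(label[0]) in ("Mn", "Mc", "Me")


def suspicious_labels(domain):
    """Return the labels of a dotted name that fail either check."""
    return [
        label
        for label in domain.split(".")
        if has_isolated_jungseong_filler(label)
        or starts_with_combining_mark(label)
    ]
```

For example, `suspicious_labels("example.\u0307.com")` flags the middle label, whose only character is COMBINING DOT ABOVE and would visually attach to the preceding dot. A U+1160 that directly follows a leading consonant such as U+1100 is not flagged.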