[idn] combining marks and space-like unicode char

Soobok Lee <lsb@lsb.org> Fri, 08 April 2005 07:23 UTC

Message-ID: <425630F8.5030204@lsb.org>
Date: Fri, 08 Apr 2005 16:21:28 +0900
From: Soobok Lee <lsb@lsb.org>
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Soobok Lee <lsb@lsb.org>
CC: Erik van der Poel <erik@vanderpoel.org>, idn@ops.ietf.org
Subject: [idn] combining marks and space-like unicode char
References: <42181FD5.3070608@lsb.org> <4255E488.8010302@vanderpoel.org> <42562D22.3090609@lsb.org>
In-Reply-To: <42562D22.3090609@lsb.org>
Content-Type: text/plain; charset="EUC-KR"
Content-Transfer-Encoding: 7bit


>>I tried U+1160 followed by a Latin character in MSIE with i-Nav and in
>>Firefox with IDN turned on, and it was displayed as a wide space. It
>>is unfortunate that both implementations chose to display it as a
>>space instead of deleting it.
>>    
>>
>
>Yes. Plugins M U S T filter out U+1160 from validated ToUnicode()ed
>labels, whether or not IDNA requires that.
>
>Soobok
>
I would add this: in the standard Hangul writing system,
U+1160 (HANGUL JUNGSEONG FILLER) is meaningful only in certain contexts,
i.e. when it is next to at least one other jamo character.
But is a standalone U+1160 illegal? No, it is NOT illegal.

So blindly filtering out every U+1160 is a mistake; plugins' filtering
should be context-sensitive. That is also why including such a rule in
stringprep would complicate it. :-)
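
To illustrate what "context-sensitive" could mean, here is a minimal
Python sketch. It assumes a hypothetical plugin policy, not anything
IDNA or stringprep defines: drop U+1160 only when neither neighbour is
another character from the Hangul Jamo block (U+1100..U+11FF), and keep
it otherwise. The function names are made up for illustration.

```python
JUNGSEONG_FILLER = "\u1160"  # HANGUL JUNGSEONG FILLER


def is_other_jamo(ch: str) -> bool:
    """True for a character in the Hangul Jamo block other than U+1160 itself."""
    return "\u1100" <= ch <= "\u11ff" and ch != JUNGSEONG_FILLER


def filter_standalone_filler(label: str) -> str:
    """Hypothetical policy: remove U+1160 only when it has no jamo neighbour."""
    out = []
    for i, ch in enumerate(label):
        if ch == JUNGSEONG_FILLER:
            # Look at the neighbours in the original label ("" at the edges).
            prev_ch = label[i - 1] if i > 0 else ""
            next_ch = label[i + 1] if i + 1 < len(label) else ""
            if not (is_other_jamo(prev_ch) or is_other_jamo(next_ch)):
                continue  # standalone filler: would render as a blank, so drop it
        out.append(ch)
    return "".join(out)
```

With this policy, filter_standalone_filler("\u1160abc") gives "abc",
while "\u1100\u1160" (choseong kiyeok followed by the filler) is left
alone, because the filler there is part of a jamo sequence.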

We can find similar problems with the "Combining Diacritical Marks"
(U+0300..U+036F). What if a label consists of a single combining
character, such as a combining accent or dot above, with no preceding
letter? It will combine visually with the preceding dot delimiter, and
that can produce a confusing appearance (e.g., a dot above a full stop
looks like a colon, which is a protocol delimiter).

AFAIK, a single standalone combining accent character is not prohibited
by stringprep.
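
A plugin could at least detect such labels. Below is a minimal Python
sketch, assuming that "combining mark" means Unicode general category
Mn, Mc or Me and that the domain name has already gone through
ToUnicode. The function names are hypothetical; nothing here is
required by stringprep.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """True if the label's first character has general category M* (Mn, Mc, Me)."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")


def labels_with_leading_mark(domain: str) -> list[str]:
    """Return the labels of a dot-separated domain that begin with a combining mark."""
    return [lab for lab in domain.split(".") if starts_with_combining_mark(lab)]
```

For example, labels_with_leading_mark("example.\u0307.com") returns
["\u0307"], flagging the label whose U+0307 COMBINING DOT ABOVE would sit
on the preceding dot.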

Soobok