[idn] combining marks and space-like unicode char

Soobok Lee <lsb@lsb.org> Fri, 08 April 2005 07:23 UTC

Message-ID: <425630F8.5030204@lsb.org>
Date: Fri, 08 Apr 2005 16:21:28 +0900
From: Soobok Lee <lsb@lsb.org>
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
MIME-Version: 1.0
To: Soobok Lee <lsb@lsb.org>
CC: Erik van der Poel <erik@vanderpoel.org>, idn@ops.ietf.org
Subject: [idn] combining marks and space-like unicode char
References: <42181FD5.3070608@lsb.org> <4255E488.8010302@vanderpoel.org> <42562D22.3090609@lsb.org>
In-Reply-To: <42562D22.3090609@lsb.org>
Content-Type: text/plain; charset="EUC-KR"
Content-Transfer-Encoding: 7bit
Sender: owner-idn@ops.ietf.org
Precedence: bulk


>>I tried U+1160 followed by a Latin character in MSIE with i-Nav and in
>>Firefox with IDN turned on, and it was displayed as a wide space. It
>>is unfortunate that both implementations chose to display it as a
>>space instead of deleting it.
>>    
>>
>
>Yes. Plugins M U S T filter out U+1160 from validated ToUnicode()ed
>labels, whether or not IDNA requires that.
>
>Soobok
>
I will add this: in the standard Hangul writing system, U+1160 is
meaningful only in certain contexts (when it is accompanied by at least
one jamo char).
But is a standalone U+1160 illegal? No, it is NOT illegal.

So blind filtering of U+1160 is a mistake. Plugins' filtering should be
context-sensitive.
That is why including this filtering in stringprep would complicate
stringprep. :-)
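
As a rough sketch of what context-sensitive filtering could look like
(not any plugin's actual behaviour): the hypothetical helper below drops
U+1160 only when neither neighbour is a conjoining jamo in
U+1100..U+11FF. The function names and the "adjacent jamo" rule are
assumptions made for illustration; a real rule would have to follow
Hangul syllable structure more carefully.

```python
JUNGSEONG_FILLER = "\u1160"  # HANGUL JUNGSEONG FILLER


def is_jamo(ch):
    """True for a conjoining jamo in U+1100..U+11FF other than U+1160.

    Assumption: the Hangul Jamo block is used as the definition of
    "jamo char" here; the original message does not define it.
    """
    return "\u1100" <= ch <= "\u11ff" and ch != JUNGSEONG_FILLER


def strip_isolated_filler(label):
    """Remove U+1160 only where it has no jamo neighbour (hypothetical rule)."""
    kept = []
    for i, ch in enumerate(label):
        if ch == JUNGSEONG_FILLER:
            has_prev = i > 0 and is_jamo(label[i - 1])
            has_next = i + 1 < len(label) and is_jamo(label[i + 1])
            if not (has_prev or has_next):
                continue  # isolated filler: would only render as a wide space
        kept.append(ch)
    return "".join(kept)
```

With this rule, U+1160 followed by a Latin letter is removed, while
U+1160 sitting between two jamo is left untouched.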

We can find similar problems with the "combining diacritical marks"
(U+03xx). What if a label consists of a single combining accent or
dot-above char, without any preceding letter? It will combine with the
preceding dot delimiter, and that will produce a confusing appearance
(it looks like a colon, which is a protocol delimiter).

AFAIK, any single standalone combining accent char is not prohibited by
stringprep.
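
A minimal sketch of how an application might flag such labels, assuming
it already has the name in Unicode form: the hypothetical helpers below
use Python's standard unicodedata module and treat any character whose
general category starts with "M" as a combining mark. This is only an
illustration of the check, not something stringprep itself performs.

```python
import unicodedata


def starts_with_combining_mark(label):
    """True if the first char of the label is a mark (category Mn, Mc or Me)."""
    first = label[:1]
    return bool(first) and unicodedata.category(first).startswith("M")


def labels_with_leading_mark(domain):
    """Return the labels of a dot-separated name that begin with a mark.

    Assumption: labels are split on U+002E only; other dot-like
    separators are ignored in this sketch.
    """
    return [lab for lab in domain.split(".") if starts_with_combining_mark(lab)]
```

For example, a name whose middle label is just U+0307 (COMBINING DOT
ABOVE) would be reported, since that mark would attach to the dot in
front of it when displayed.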

Soobok
