[idn] turkish i

"Soobok Lee" <lsb@postel.co.kr> Wed, 21 November 2001 04:54 UTC

To: idn@ops.ietf.org
Content-Type: multipart/mixed; boundary="----------=_1006317445-7125-1973"
Mime-Version: 1.0
Message-Id: <091401c17246$48b679c0$ec1bd9d2@temp>
Subject: [idn] turkish i
From: Soobok Lee <lsb@postel.co.kr>
Date: Wed, 21 Nov 2001 13:37:53 +0900
Sender: owner-idn@ops.ietf.org
Precedence: bulk

Hi, All

While studying case-preservation issues, I came across the following sections, which are well known but, as far as I know, have not been discussed thoroughly recently.

http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-06.txt

0130; 0069; Case map
0131; 0069; Case map

Dotless i (0131) and capital I with dot above (0130) are both mapped to small i (0069) by this language-independent case folding.

This comes at the cost of a shrunken Turkish/Azerbaijani namespace: dotless i is lost.
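To make the namespace loss concrete, here is a minimal Python sketch. `DRAFT_CASE_MAP` holds only the two entries quoted above plus plain I -> i; it is a hypothetical subset, not the full nameprep-06 table, and every other character falls back to Python's `str.lower()` as a stand-in (an assumption) for the rest of the table.

```python
# Hypothetical subset of the nameprep-06 case map: the two entries quoted
# above plus plain ASCII I -> i. Not the full table from the draft.
DRAFT_CASE_MAP = {
    "\u0049": "\u0069",  # I -> i
    "\u0130": "\u0069",  # I with dot above -> i
    "\u0131": "\u0069",  # dotless i -> i
}


def case_map(label: str) -> str:
    """Map a label character by character.

    Characters not in DRAFT_CASE_MAP fall back to str.lower(), a stand-in
    (assumption) for the rest of the draft's table.
    """
    return "".join(DRAFT_CASE_MAP.get(ch, ch.lower()) for ch in label)


# Four distinct inputs: dotless, dotted, and two uppercase spellings.
labels = ["s\u0131r", "sir", "S\u0130R", "SIR"]
mapped = {case_map(label) for label in labels}
print(mapped)  # {'sir'}: all four collapse onto one label
```

Because dotless i has nowhere else to go, two labels that differ only in dotless i versus i can never be told apart after mapping.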

Can 0130 and 0131 be regarded as the same from the Turkish point of view?

Do people who use the Latin script accept that small i and dotless i are the same?

Do Turkish people accept that small i and dotless i are the same, too?

1:n mappings:
  I -> i (Latin)
  I -> dotless i (Turkish)

n:1 mappings:
  I with dot above -> i (Turkish)
  I -> i (Latin)

Here the current case mapping loses dotless i: following the chain I -> dotless i -> i, Turkish dotless i ends up as dotted i.
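The 1:n conflict and the lost round trip can be reproduced with Python's built-in, language-independent case functions. `turkish_lower` and `TURKISH_LOWER` are hypothetical helpers that special-case only I and U+0130; real Turkish-aware case mapping involves more context than this sketch.

```python
def latin_lower(s: str) -> str:
    """Language-independent lowercasing: Python's default str.lower()."""
    return s.lower()


# Hypothetical Turkish/Azerbaijani rules, limited to the dotted/dotless pair.
TURKISH_LOWER = {"I": "\u0131", "\u0130": "i"}


def turkish_lower(s: str) -> str:
    """Sketch of Turkish lowercasing; other characters use str.lower()."""
    return "".join(TURKISH_LOWER.get(ch, ch.lower()) for ch in s)


# 1:n: the same capital I has a different lowercase in each language.
print(latin_lower("I"), turkish_lower("I"))  # i ı

# Lost round trip: dotless i uppercases to plain I, and the
# language-independent mapping then turns I into dotted i.
print(latin_lower("\u0131".upper()))  # i

# Python's default full lowercasing sends U+0130 to i + U+0307 (two code
# points), not to a single i as in the draft table quoted above.
print([f"U+{ord(c):04X}" for c in latin_lower("\u0130")])  # ['U+0069', 'U+0307']
```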

A similar impasse, caused by cross-language conflicts between 1:n and n:1 mappings, is also found in TC/SC (Traditional/Simplified Chinese) and JC/KC equivalence. Could TC/SC equivalence likewise be done in a language-independent way?

http://www.unicode.org/charts/PDF/U0100.pdf

0130 İ LATIN CAPITAL LETTER I WITH DOT ABOVE
= LATIN CAPITAL LETTER I DOT
• Turkish, Azerbaijani
• lowercase is 0069 i
→0049 I latin capital letter i
≡0049 I 0307

0131 ı LATIN SMALL LETTER DOTLESS I
• Turkish, Azerbaijani
• uppercase is 0049 I
→0069 i latin small letter i
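The two chart properties that matter for the normalization question below, the canonical decomposition of 0130 and the absence of any decomposition for 0131, can be checked with Python's standard `unicodedata` module, which reports data from the Unicode version bundled with the interpreter. The constant names are illustrative only.

```python
import unicodedata

DOTTED_CAPITAL_I = "\u0130"  # İ
DOTLESS_SMALL_I = "\u0131"   # ı

# The chart's "≡0049 I 0307" line: U+0130 canonically decomposes to
# I followed by COMBINING DOT ABOVE, so NFC recomposes that pair into U+0130.
print(unicodedata.decomposition(DOTTED_CAPITAL_I))                 # 0049 0307
print(unicodedata.normalize("NFC", "I\u0307") == DOTTED_CAPITAL_I)  # True

# U+0131 has no decomposition; its only link to I/i is through case mapping.
print(repr(unicodedata.decomposition(DOTLESS_SMALL_I)))  # ''
print(DOTLESS_SMALL_I.upper())                           # I
```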

The following describes another problem, concerning the order of case mapping and normalization in nameprep.

When I ran RACE conversion with mDNkit v2.0 on the following four code-point sequences, I got these results:

  input           RACE output
  0069 0307       bq--ap7wsby
  0049 0307 (İ)   bq--ap7wsby
  0130 (İ)        i
  0131            i

Even though 0049 0307 and 0130 are canonically equivalent (NFC turns 0049 0307 into 0130), the two produce different output labels.

That could have been avoided if we had chosen CaseMap(NFKC(x)) instead of NFKC(CaseMap(x)).

So CaseMap(NFKC(x)) != NFKC(CaseMap(x)) in general?
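The effect of the two orderings can be sketched in Python. `DRAFT_CASE_MAP` is a hypothetical subset containing only the case-map entries quoted in this message; other characters fall back to `str.lower()` as a stand-in (an assumption). The sketch does not perform RACE encoding; it only compares the code points that would be handed to the encoder, which is enough to show the 0069 0307 versus 0069 split seen in the mDNkit results.

```python
import unicodedata

# Hypothetical subset of the draft's case map (entries quoted in this message).
DRAFT_CASE_MAP = {"\u0049": "\u0069", "\u0130": "\u0069", "\u0131": "\u0069"}


def case_map(s: str) -> str:
    # Characters outside the subset fall back to str.lower() (assumption).
    return "".join(DRAFT_CASE_MAP.get(ch, ch.lower()) for ch in s)


def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


def map_then_normalize(s: str) -> str:
    """NFKC(CaseMap(x)): the order producing the results above."""
    return nfkc(case_map(s))


def normalize_then_map(s: str) -> str:
    """CaseMap(NFKC(x)): the alternative order suggested above."""
    return case_map(nfkc(s))


decomposed = "\u0049\u0307"  # I + COMBINING DOT ABOVE
precomposed = "\u0130"       # İ

for fn in (map_then_normalize, normalize_then_map):
    a, b = fn(decomposed), fn(precomposed)
    print(fn.__name__,
          [f"{ord(c):04X}" for c in a],
          [f"{ord(c):04X}" for c in b],
          a == b)
# map_then_normalize ['0069', '0307'] ['0069'] False
# normalize_then_map ['0069'] ['0069'] True
```

With case mapping first, the I in 0049 0307 is lowercased before NFKC can compose it into 0130, leaving 0069 0307, which has no precomposed form. One trade-off of the second order is that the case map's output is not re-normalized, so it need not itself be in NFKC form.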

Soobok Lee
