[idn] turkish i

"Soobok Lee" <lsb@postel.co.kr> Wed, 21 November 2001 04:54 UTC

Received: from psg.com (exim@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id XAA18495 for <idn-archive@lists.ietf.org>; Tue, 20 Nov 2001 23:54:17 -0500 (EST)
Received: from lserv by psg.com with local (Exim 3.33 #1) id 166P8m-0004Ut-00 for idn-data@psg.com; Tue, 20 Nov 2001 20:37:40 -0800
Received: from [164.124.123.208] (helo=mrn.lsb.org) by psg.com with esmtp (Exim 3.33 #1) id 166P8l-0004Un-00 for idn@ops.ietf.org; Tue, 20 Nov 2001 20:37:39 -0800
Received: (from root@localhost) by mrn.lsb.org (8.9.3/8.8.7) id NAA07126 for idn@ops.ietf.org; Wed, 21 Nov 2001 13:37:25 +0900
To: idn@ops.ietf.org
Received: Postel SMTP Relay v1.0
Received: from marketing ([210.217.27.236]) by mrn.lsb.org (8.9.3/8.8.7) with SMTP id NAA07122 for <idn@ops.ietf.org>; Wed, 21 Nov 2001 13:37:23 +0900
Content-Type: multipart/mixed; boundary="----------=_1006317445-7125-1973"
Mime-Version: 1.0
X-Msmail-Priority: Normal
Message-Id: <091401c17246$48b679c0$ec1bd9d2@temp>
Subject: [idn] turkish i
From: Soobok Lee <lsb@postel.co.kr>
X-Priority: 3
X-Mailer: Microsoft Outlook Express 6.00.2600.0000
X-Mimeole: Produced By Microsoft MimeOLE V6.00.2600.0000
Date: Wed, 21 Nov 2001 13:37:53 +0900
Sender: owner-idn@ops.ietf.org
Precedence: bulk

 
 
Hi, All
 
While studying case-preservation issues, I found the following entries, which are well known
but, AFAIK, have not been discussed thoroughly recently.
 
 
 
 
0130; 0069; Case map
0131; 0069; Case map
 
Dot-less i (0131) and dot-above I (0130) are both mapped to small i (0069)
by the language-independent case folding.
The cost is a shrunken Turkish/Azerbaijani namespace: dot-less i is lost.
Can 0130 and 0131 be regarded as the same from the Turkish viewpoint?
Do people who use the Latin script accept small i === dot-less i?
Do Turkish people accept small i === dot-less i, too?
 
1:n mappings:
 
 I -> i             ( latin)
 I -> dot-less i    ( turkish )
 
n:1 mappings:
 
 Dot-above I ->  i  ( turkish )
 I  -> i            ( latin )
 
 
Here, the current case mapping loses dot-less i:  I -> dot-less i -> i .
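
The conflict between the two lowercase rules can be sketched in Python as follows. All three tables are illustrative assumptions built from the mappings listed above; they are not taken from any library.

```python
# Illustrative tables built from the mappings listed above (assumptions).
TURKISH_LOWER = {0x0049: 0x0131, 0x0130: 0x0069}  # I -> dot-less i, dot-above I -> i
LATIN_LOWER = {0x0049: 0x0069}                    # I -> i
NAMEPREP_FOLD = {0x0049: 0x0069, 0x0130: 0x0069, 0x0131: 0x0069}

def apply(table: dict, text: str) -> str:
    """Map each code point through the table; unmapped code points pass through."""
    return text.translate(table)

turkish = apply(TURKISH_LOWER, "I")      # what a Turkish user expects
latin = apply(LATIN_LOWER, "I")          # what other Latin-script users expect
chained = apply(NAMEPREP_FOLD, turkish)  # I -> dot-less i -> i

print(repr(turkish), repr(latin), repr(chained))
```

The same capital I has two different lowercase forms depending on the language, and the fold then merges them, so the dot-less form cannot survive.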
 
 
A similar impasse, caused by cross-language conflicts in 1:n and n:1 mappings, is also found in TC/SC and JC/KC equivalence. Can TC/SC equivalence likewise be handled in a language-independent way?
 
 
 
0130 İ LATIN CAPITAL LETTER I WITH DOT ABOVE
= LATIN CAPITAL LETTER I DOT
• Turkish, Azerbaijani
• lowercase is 0069 i
→0049 I latin capital letter i
≡0049 I 0307
 
0131 ı LATIN SMALL LETTER DOTLESS I
• Turkish, Azerbaijani
• uppercase is 0049 I
→0069 i latin small letter i
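
The names and the decomposition quoted above can be checked against the Unicode data shipped with Python. This is only a quick sketch using the standard unicodedata module; the exact data depends on the Unicode version of the interpreter, although these entries have been stable for a long time.

```python
import unicodedata

dotted_capital = "\u0130"
dotless_small = "\u0131"

# Character names as recorded in the Unicode Character Database.
print(unicodedata.name(dotted_capital))
print(unicodedata.name(dotless_small))

# 0130 has a canonical decomposition to 0049 0307; 0131 has none.
print(repr(unicodedata.decomposition(dotted_capital)))
print(repr(unicodedata.decomposition(dotless_small)))

# NFC recomposes the two-code-point sequence into 0130.
recomposed = unicodedata.normalize("NFC", "I\u0307")
print(recomposed == dotted_capital)
```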
 
 
 
The following describes another problem, in the order of mapping and normalization in nameprep.
 
Doing RACE conversion on the following four code point sequences with mDNkit v2.0 gives:
 
0069 0307         
bq--ap7wsby

 
0049 0307        
bq--ap7wsby

0130                
i

0131
i
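
For readers who want to see where bq--ap7wsby comes from, here is a partial Python sketch of RACE-style encoding as I understand the draft. It is an assumption-laden simplification: it covers only BMP input whose non-zero high octets are all the same, it omits the escape handling for low octets 0xFF and 0x99, and it is not how mDNkit itself is implemented. The function name race_sketch is hypothetical.

```python
import base64

def race_sketch(text: str) -> str:
    """Partial RACE-style encoding; see the stated limits in the lead-in."""
    if all(ord(c) < 0x80 for c in text):
        return text  # plain ASCII labels are left as they are
    if any(ord(c) > 0xFFFF for c in text):
        raise ValueError("non-BMP input is outside this sketch")
    highs = {ord(c) >> 8 for c in text} - {0}
    if len(highs) != 1:
        raise ValueError("this sketch needs exactly one non-zero high octet")
    u1 = highs.pop()
    out = bytearray([u1])  # the compressed form starts with the shared high octet
    for c in text:
        hi, lo = ord(c) >> 8, ord(c) & 0xFF
        if lo in (0xFF, 0x99):
            raise ValueError("escape handling is omitted in this sketch")
        if hi == u1:
            out.append(lo)            # high octet implied by the first byte
        else:
            out += bytes([0xFF, lo])  # high octet 00 is flagged with 0xFF
    b32 = base64.b32encode(bytes(out)).decode("ascii")
    return "bq--" + b32.rstrip("=").lower()

print(race_sketch("i\u0307"))  # 0069 0307, as left after case mapping
print(race_sketch("i"))        # what 0130 and 0131 are mapped to
```

Under these assumptions the sequence 0069 0307 is compressed to the octets 03 FF 69 07, whose base32 form is the label shown above, while a label that has already been folded to plain i needs no encoding at all.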
 
 
Even though 0049 0307 === 0130 (modulo NFC), the two have different output labels.
That could have been avoided
 if we had chosen CaseMap(NFKC(?)) instead of NFKC(CaseMap(?)).
 
CaseMap(NFKC(?))  !=  NFKC(CaseMap(?))  ???
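
The difference between the two orders can be reproduced with a small Python sketch. The CASE_MAP table is an assumption: it holds only the three mappings discussed in this message, not the full nameprep table, and the helper names are hypothetical. NFKC comes from the standard unicodedata module.

```python
import unicodedata

# Hypothetical case map: only the entries discussed in this message.
CASE_MAP = {0x0049: 0x0069, 0x0130: 0x0069, 0x0131: 0x0069}

def case_map(text: str) -> str:
    return text.translate(CASE_MAP)

def nfkc(text: str) -> str:
    return unicodedata.normalize("NFKC", text)

def map_then_normalize(text: str) -> str:
    return nfkc(case_map(text))   # NFKC(CaseMap(?))

def normalize_then_map(text: str) -> str:
    return case_map(nfkc(text))   # CaseMap(NFKC(?))

for seq in ("I\u0307", "\u0130"):
    cps = " ".join(f"{ord(c):04X}" for c in seq)
    a = [f"{ord(c):04X}" for c in map_then_normalize(seq)]
    b = [f"{ord(c):04X}" for c in normalize_then_map(seq)]
    print(cps, "map-then-normalize:", a, "normalize-then-map:", b)
```

With mapping first, 0049 0307 becomes 0069 0307, which has no precomposed form, while 0130 becomes plain 0069. With normalization first, 0049 0307 is composed to 0130 before mapping, so both inputs end at 0069.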
 
 
Soobok Lee
 
 