CJK Incompatibilities (was: Re: Question about the agenda)
Kenneth Whistler <kenw@sybase.com> Sat, 21 March 2009 00:49 UTC
Message-Id: <200903210049.n2L0nTY20646@birdie.sybase.com>
Date: Fri, 20 Mar 2009 17:49:29 -0700
From: Kenneth Whistler <kenw@sybase.com>
Subject: CJK Incompatibilities (was: Re: Question about the agenda)
To: phoffman@imc.org
Cc: idna-update@alvestrand.no, kenw@sybase.com
> >Perhaps we are simply reflecting a different
> >interpretation of "conclusions"?
>
> Not really. The abstract of the JET draft says "[IDNA2008] will
> cause incompatibilities for Chinese, Japanese and Korean (CJK)
> scripts and languages." Section 3 of that draft gives a good
> list of incompatibilities, none of which were listed in your
> document. It does not seem fair to ask the WG "complete discussions,
> if necessary on IDNA2008 implications" while purposely ignoring
> some of the implications that have been brought to the WG's
> attention, particularly those from major registries with a
> lot of IDNA experience who spent the time to write them down
> in an Internet Draft.

The incompatibilities noted in draft-jet-idnabis-cjk-localmapping-01 are a small subset of the incompatibilities noted and discussed in:

http://www.unicode.org/reports/tr46/tr46-1.html

which we (the UTC), although not a major registry, have also spent the time to write down and bring to the WG's attention. To wit:

jet-idnabis-cjk-localmapping-01

3.1 Label separators

This deals with the well-known problem of the processing convention that treats U+3002 IDEOGRAPHIC FULL STOP, U+FF0E FULLWIDTH FULL STOP, and U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP as equivalent to "." when separating labels. That is also accounted for in D-UTR #46.

3.2 Compatibility characters

This deals with the fullwidth letters and digits and the halfwidth katakana. IDNA2003 maps those; IDNA2008 simply makes them DISALLOWED and does not map them. The preprocessing mapping in D-UTR #46 accounts for them.

Either IDNA2008 lets casing and NFKC mapping back into the protocol to eliminate this kind of incompatibility (which is widespread now -- hence the perceived need for "local mappings" such as the one described in the JET draft), or IDNA2008 stands as is, without case and NFKC mapping, in which case D-UTR #46 will likely turn into the de facto standard for preprocessing to maintain maximal compatibility with existing IDNA2003 processing.
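As a rough illustration of the two mapping issues in 3.1 and 3.2, here is a minimal Python sketch. It assumes that IDNA2003-style preprocessing can be approximated by splitting on the four label separators and then applying NFKC and case folding to each label. `split_labels`, `fold`, `is_stable` and `preprocess` are hypothetical helpers, not the D-UTR #46 algorithm (which defines its own mapping table), and `is_stable` is only an approximation of how an IDNA2008-style table excludes code points that change under NFKC and case folding. Python's standard `idna` codec, which implements IDNA2003, is used only for comparison.

```python
import re
import unicodedata
from encodings import idna  # stdlib IDNA2003 (RFC 3490/3491) codec, for comparison only

# U+002E FULL STOP, U+3002 IDEOGRAPHIC FULL STOP,
# U+FF0E FULLWIDTH FULL STOP, U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP:
# the characters IDNA2003 accepts as label separators.
SEPARATORS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(domain: str) -> list[str]:
    """Hypothetical helper: split a domain name on any of the four separators."""
    return SEPARATORS.split(domain)


def fold(text: str) -> str:
    """Approximate case + compatibility mapping: NFKC, case fold, NFKC again."""
    return unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", text).casefold())


def is_stable(cp: str) -> bool:
    """Rough stand-in for an IDNA2008-style stability test: a code point that
    fold() changes cannot appear in a label and must be mapped beforehand."""
    return fold(cp) == cp


def preprocess(domain: str) -> str:
    """Hypothetical preprocessing in the spirit of D-UTR #46: normalize the
    separators to '.' and map each label with fold()."""
    return ".".join(fold(label) for label in split_labels(domain))


if __name__ == "__main__":
    name = "ＡＢＣ。ｊｐ"  # fullwidth letters joined by IDEOGRAPHIC FULL STOP
    # Every code point here would be rejected without prior mapping:
    print([cp for cp in "ＡＢＣｊｐ" if not is_stable(cp)])
    print(preprocess(name))                # abc.jp
    print(name.encode("idna"))             # b'abc.jp' (IDNA2003 result)
    print(preprocess("ｶﾀｶﾅ\uff0eｊｐ"))    # halfwidth katakana, FULLWIDTH FULL STOP
    print(idna.nameprep("ｶﾀｶﾅ"))           # IDNA2003 nameprep of the same label
```

Under those assumptions, a D-UTR #46-style preprocessing step reproduces the labels IDNA2003 produced for these inputs, while the IDNA2008-style check alone would simply reject them.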
That would also eliminate the need for a CJK-specific local mapping for this particular issue.

3.3 Exceptions

U+3005 IDEOGRAPHIC ITERATION MARK (there is a code point error in the JET draft)
U+30FB KATAKANA MIDDLE DOT (there is a name error in the JET draft)

Those two are CONTEXTO in the current tables document for IDNA, rather than PVALID, so there are potential incompatibilities: a label containing them that is valid under IDNA2003 could be disallowed under the IDNA2008 CONTEXTO rules A.10 and A.12 for the two characters, respectively.

I have no idea how those two ended up getting CONTEXTO designations in the tables document -- I must have been snoozing when that happened.

U+3005 should just get derived as PVALID by regular category derivation (its General_Category is Lm). It is no more contextually constrained than several other iteration marks that are PVALID in the table, such as U+309D HIRAGANA ITERATION MARK. So that is simply a mistake, an overabundance of misplaced caution in the tables document. Make U+3005 --> PVALID and that problem goes away.

U+30FB KATAKANA MIDDLE DOT needed an exception in the derivation, since it is General_Category=Po in the Unicode Character Database. But, in my opinion, the right answer here is to specify that it is simply PVALID, and to give up on the overspecification of exactly where it has to occur in a label, which is what causes the incompatibility that the JET draft notes. If the tables document is changed this way, then this unnecessary incompatibility also goes away.

At that point, there is only the very generic issue of mapping left (one part of which, the treatment of label separators, is technically outside the definition of the labels themselves anyway).

--Ken
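The General_Category facts behind section 3.3 above can be checked with Python's `unicodedata` module. The sketch below is a deliberately simplified, hypothetical version of the category-based derivation: it assumes just two steps (reject code points that change under NFKC and case folding, then accept the letter, mark and digit categories), whereas the actual tables document applies more rules plus explicit exceptions. `derive` and `is_stable` are illustrative names, not part of any specification.

```python
import unicodedata

# General_Category values the category-based derivation treats as
# candidates for PVALID (letters, combining marks, decimal digits).
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}


def is_stable(cp: str) -> bool:
    """Approximation: the code point survives NFKC + case folding unchanged."""
    folded = unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", cp).casefold())
    return folded == cp


def derive(cp: str) -> str:
    """Hypothetical two-step derivation; the real tables have more rules."""
    if not is_stable(cp):
        return "DISALLOWED"
    if unicodedata.category(cp) in LETTER_DIGITS:
        return "PVALID"
    return "DISALLOWED"  # unless an explicit exception overrides it


if __name__ == "__main__":
    for cp in ("\u3005", "\u309D", "\u30FB"):
        print(f"U+{ord(cp):04X} {unicodedata.name(cp)}: "
              f"gc={unicodedata.category(cp)}, derived {derive(cp)}")
```

Under these assumptions U+3005 and U+309D (both Lm) derive as PVALID with no special handling, while U+30FB (Po) falls through to DISALLOWED, which is why it needs an exception at all.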