Re: Layer 2 and "idn identities" (was: Re: [idn] what are the IDN identifiers?)
"James Seng/Personal" <jseng@pobox.org.sg> Tue, 04 December 2001 23:56 UTC
Received: from psg.com (exim@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA18134 for <idn-archive@lists.ietf.org>; Tue, 4 Dec 2001 18:56:02 -0500 (EST)
Received: from lserv by psg.com with local (Exim 3.33 #1) id 16BP6A-0004Ae-00 for idn-data@psg.com; Tue, 04 Dec 2001 15:35:38 -0800
Received: from mail.i-dns.net ([203.126.116.228]) by psg.com with esmtp (Exim 3.33 #1) id 16BP68-00048c-00 for idn@ops.ietf.org; Tue, 04 Dec 2001 15:35:36 -0800
Received: from jamessonyvaio (pl039.nas312.n-yokohama.nttpc.ne.jp [210.165.193.39]) by mail.i-dns.net (Postfix) with SMTP id DF11DFFC17; Wed, 5 Dec 2001 07:34:48 +0800 (SGT)
Message-ID: <0a0101c17d1c$56c45bf0$1119d73d@jamessonyvaio>
From: James Seng/Personal <jseng@pobox.org.sg>
To: liana Ye <liana.ydisg@juno.com>
Cc: DougEwell2@cs.com, liana.ydisg@juno.com, idn@ops.ietf.org
References: <20011204.140704.-319607.1.liana.ydisg@juno.com>
Subject: Re: Layer 2 and "idn identities" (was: Re: [idn] what are the IDN identifiers?)
Date: Wed, 05 Dec 2001 07:35:06 +0800
MIME-Version: 1.0
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4807.1700
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700
Sender: owner-idn@ops.ietf.org
Precedence: bulk
Yes, unless you can demonstrate the necessary 7 supporters. Thank you.
Please respect our need to stay focused.

-James Seng

----- Original Message -----
From: "liana Ye" <liana.ydisg@juno.com>
To: <jseng@pobox.org.sg>
Cc: <DougEwell2@cs.com>; <liana.ydisg@juno.com>; <idn@ops.ietf.org>
Sent: Wednesday, December 05, 2001 3:36 AM
Subject: Re: Layer 2 and "idn identities" (was: Re: [idn] what are the IDN identifiers?)

> The StepCode discussion was dropped at the June meeting; the
> stated reason was "It is only concerning Chinese character encoding".
> It has never been brought back for discussion at all.
> This version, 01, was submitted in October and includes
> discussion of worldwide scripts. This version has never been
> discussed or readmitted into the pool.
>
> Does this mean we cannot discuss the idea
> on the list? I am really puzzled.
>
> Liana
>
> On Tue, 4 Dec 2001 21:50:32 +0800 "James Seng/Personal"
> <jseng@pobox.org.sg> writes:
> > In my email dated 23rd Oct 2001,
> > http://www.imc.org/idn/mail-archive/msg04363.html
> >
> > I indicated that "StepCode- A Mnemonic Internationalized Domain
> > Name Encoding (draft-ietf-idn-step)" has been dropped from the
> > WG pool.
> >
> > The following drafts remain in the WG pool:
> >
> > Internationalizing Host Names In Applications (IDNA)
> > (draft-ietf-idn-idna)
> > Preparation of Internationalized Host Names
> > (draft-ietf-idn-nameprep)
> > Proposal for a determining process of ACE identifier
> > (draft-ietf-idn-aceid)
> > Japanese characters in multilingual domain name label
> > (draft-ietf-idn-jpchar)
> > Traditional and Simplified Chinese Conversion
> > (draft-ietf-idn-tsconv)
> > Hangeul NAMEPREP recommendation version 1.0
> > (draft-ietf-idn-hangeulchar)
> > Improving ACE using code point reordering v1.0
> > (draft-ietf-idn-lsb-ace)
> > AMC-ACE-Z (draft-ietf-idn-amc-ace-z)
> > The Internationalized Domain Name System (draft-hall-dm-idns-00.txt)
> >
> > I would like to remind the working group to remain focused on
> > discussion of the drafts within the WG pool. Therefore, further
> > discussion of drafts outside the pool, such as StepCode, should be
> > taken offline.
> >
> > Thanks.
> >
> > -James Seng
> >
> > ----- Original Message -----
> > From: "liana Ye" <liana.ydisg@juno.com>
> > To: <DougEwell2@cs.com>
> > Cc: <idn@ops.ietf.org>
> > Sent: Tuesday, December 04, 2001 2:18 PM
> > Subject: Re: Layer 2 and "idn identities" (was: Re: [idn] what are
> > the IDN identifiers?)
> >
> > > On Mon, 3 Dec 2001 16:29:57 EST DougEwell2@cs.com writes:
> > > > In a message dated 2001-12-03 2:01:56 Pacific Standard Time,
> > > > liana.ydisg@juno.com writes:
> > > >
> > > > > I see that you have not read the I-D yet, and Deng Xiang
> > > > > has replied to your Chinese vs. Japanese argument. I
> > > > > will wait for your comment on the language-tag issue,
> > > > > or anything not up to your standard.
> > > >
> > > > Here are some specific concerns related to items in
> > > > draft-Liana-idn-map-00.txt.
> > > >
> > > > | The proposed ACE is a mnemonic encoding scheme,
> > > > | and is called StepCode [StepCode].
> > > >
> > > > Hasn't AMC-ACE-Z already been chosen as the standard ACE for
> > > > IDN? I would be surprised if the decision were made to use two
> > > > different ACEs depending on the language, or script, of the
> > > > encoded text.
> > >
> > > The current AMC treats all UCS code points the same. It cannot
> > > solve look-alikes across different languages. It does not make DNS
> > > master records readable for administrators. It does not make
> > > zone files sortable by region or by sensible user group.
> > > It does not help a user who does not read Chinese but communicates
> > > with Chinese partners. But it does compress the data and feed
> > > that into DNS.
> > >
> > > Using StepCode can group users by language, so sorting
> > > the names makes a lot more semantic sense for administrators.
> > > StepCode also allows each character to have its own ID, so it can
> > > be treated differently across different languages.
> > >
> > > But StepCode is used only when the user wants it, so there may
> > > be users who don't want it. Then AMC should be used to cover
> > > such cases. StepCode and AMC are complementary to each other.
> > > This is discussed in Section 4.5.
> > >
> > > > | U-s      U-p      A-p
> > > > | U+0041   U+0061   a    (Latin Letter A case folding)
> > > > | U+2fc2   U+2ee5   yv2  (Han character fish for Chinese case folding)
> > > >
> > > > Several Chinese speakers and other experts have already,
> > > > repeatedly, claimed that SC/TC mapping is NOT a 1-1 operation
> > > > like Latin case folding. If you think your users will be
> > > > satisfied with the 1-1 solution only, go right ahead, but if
> > > > this turns out to be inadequate and you need to propose a fix,
> > > > get ready to hear a lot of people say "I told you so."
> > >
> > > For this, please see my post on data-centric programming
> > > techniques applicable to the SC/TC problem.
> > >
> > > > | To facilitate end users for the speed of IDN access as well as
> > > > | compatibility with existing applications, it is RECOMMENDED
> > > > | that an IDN code exchange table includes applicable local
> > > > | display standards corresponding with each applicable code
> > > > | point in UCS.
> > > >
> > > > Backward mapping tables to convert Unicode to legacy standards,
> > > > for the express purpose of allowing end-user software to delay
> > > > the transition to Unicode? Does this sound like a solution for
> > > > the future?
> > >
> > > These legacy standards have to be on servers in order to switch
> > > a large user base to the new IDN.
> > > After you have switched the users, you can replace
> > > user software and hardware at the suppliers' pace. That
> > > means you can play the price/service game to lure the
> > > users to switch. Without this feature, most users will never
> > > want to change, because change is too expensive for
> > > them and they are happy with what they have:
> > > a reliable connection.
> > >
> > > If we make these legal with the exchange map, then
> > > there will be no need to implement another code form
> > > like UTF-8.
> > >
> > > > | It is REQUIRED to register a language tag with IANA and its
> > > > | associated script range whenever it is modified.
> > > >
> > > > There is already a perfectly good update process in place for
> > > > both ISO 639 and RFC 3066.
> > >
> > > But IDN may not need to implement all of these tags. Each tag
> > > implemented needs script-specific procedures to be deployed.
> > >
> > > > | To use mixed scripts in one IDN label is NOT RECOMMENDED for
> > > > | an early deployment of IDN.
> > > >
> > > > This immediately outcasts the Japanese, who have every reason to
> > > > mix hiragana, katakana, kanji, and romaji.
> > >
> > > Wrong. Japanese and Korean are the primary languages to
> > > be tested in practice.
> > > That is in the C, J, K tags, which are also used
> > > in the I-D to show the feasibility of the implementation.
> > >
> > > The recommendation is there as a warning: although C, J, K
> > > are shown to be doable, no system is installed yet, so an
> > > unrealistic jump-in is not encouraged, especially when
> > > there is possible use of the UCS blanket treatment by AMC.
> > >
> > > > |          Alphabet Sys.   Consonant Sys.   Character Sys.
> > > > |
> > > > | From:    0020            0530             2e80
> > > > | to:      052f            1bff             d7af
> > > > |
> > > > | include: Latin           Armenian         CJK
> > > > |          Greek           Hebrew           Kanji
> > > > |          Cyrillic        Arabic           Kana
> > > > |          IPA             Devanagari       Hangul
> > > > |          Vietnamese      Malayalam        Yi
> > > > |                          Thai
> > > > |                          Lao
> > > > |                          Tibetan
> > > > |                          ...
> > > >
> > > > Sorry, it's just not that simple. There are plenty of alphabets
> > > > and alphabetic characters encoded above U+0530. That's probably
> > > > why the Unicode Consortium, while providing a list of blocks of
> > > > code points like the following:
> > > >
> > > > # Start Code..End Code; Block Name
> > > > 0000..007F; Basic Latin
> > > > 0080..00FF; Latin-1 Supplement
> > > > 0100..017F; Latin Extended-A
> > > >
> > > > is careful not to imply that ranges of code points are
> > > > permanently reserved for *classifications* of scripts like this.
> > > >
> > > > You can tell that the three ranges listed here are arbitrary and
> > > > bogus even for the CJK scripts, by noting that Korean jamos
> > > > (alphabetic) are located in the "consonant system" block, while
> > > > the Japanese syllabaries (kana) and precomposed Korean syllables
> > > > are in the "character system" block.
> > >
> > > These are rough groups for studying different cases,
> > > to cover the broadest language variations. And this grouping
> > > was proposed by a well-known linguist.
> > > While we don't
> > > need to copy their views (just as I am against copying
> > > UTC's recommendation), it is necessary to learn
> > > the different views proposed by linguists before
> > > I feel confident enough to propose a reasonable solution.
> > > No specifics are placed on these groups. The real
> > > terms are in the language tag definition file. As you may see,
> > > there is an indefinite number of code blocks defined
> > > in the data specification format, Section 3.2, and
> > > associated with language-specific procedures.
> > > That is the reason I proposed IANA registration for
> > > the language tags we support.
> > >
> > > > | Some cultures often use more than two scripts within the same
> > > > | group, such as Japanese, but rarely use another script,
> > > > | especially from a different group.
> > > >
> > > > As noted above, the Japanese use four scripts from two different
> > > > groups.
> > > >
> > > > | The main issue in IDN-Map
> > > > | is to identify character equivalent sets, and reduce the
> > > > | number of applicable IDN identifiers by 1) limiting the
> > > > | applicable IDN input code points to Plane 0 of Unicode table,
> > > >
> > > > Has anyone else so far proposed that supplementary characters be
> > > > flat-out prohibited from occurring in IDN identifiers? Why
> > > > should they be singled out as a way to "reduce the number of
> > > > applicable IDN identifiers"?
> > >
> > > This was a statement in an early ACE I-D of this
> > > group. Since UTS released the new case folding map,
> > > [nameprep] took it without questioning, and everyone
> > > dropped this issue.
> > >
> > > No one proposes to prohibit Plane 1 code points.
> > > Here I am proposing to get the equivalence classes
> > > working first, before we allow code points from Plane 1
> > > and above in. In fact, the more you let in, the more it
> > > supports my case for letting TC/SC in.
> > > And this is
> > > the proof:
> > >
> > > in the current [nameprep] specification:
> > >
> > > 0048; 0068; Case map
> > >
> > > 210B; 0068; Additional folding
> > > 210C; 0068; Additional folding
> > > 210D; 0068; Additional folding
> > >
> > > 1D407; 0068; Additional folding
> > > 1D43B; 0068; Additional folding
> > > 1D46F; 0068; Additional folding
> > > 1D4D7; 0068; Additional folding
> > > 1D573; 0068; Additional folding
> > >
> > > You can see this is a 9-to-1 case folding, and how
> > > will you recover the 9 cases?
> > >
> > > > | It is RECOMMENDED that reasonable studies are given to each
> > > > | language to classify script treatment model, and a cost vs.
> > > > | benefit analysis in selecting a long-term script-specific
> > > > | processing protocol to be embedded in IDN language-specific
> > > > | modules.
> > > >
> > > > This won't disrupt the schedule of the working group, will it?
> > >
> > > I don't know what the WG schedule is based on. If it waves
> > > the CJK case away, it already met its schedule last year.
> > > If you mean the current schedule, you have to
> > > ask whether the WG has a clear picture of IDN or not.
> > > If no one knows how to deal with CJK, any schedule is
> > > meaningless. That is the reason I do not comment
> > > on WG milestones.
> > >
> > > > | canonicalization
> > > >
> > > > This word has no clear definition and is carefully avoided by
> > > > Unicode, as Ken Whistler already explained.
> > >
> > > I think we are getting somewhere. We are getting
> > > down to the code points now. When I don't need to use
> > > all these vague terms, we are near the solution.
> > >
> > > > | A string mixed with CJK and Kana is Japanese, CJK and Hangul
> > > > | mix is Korean.
> > > > | However, an all CJK character string MUST be presumed to be
> > > > | in the primary language tag, that is Chinese, and registered
> > > > | as the only IDN name, unless the registrant requests a second
> > > > | and a third language to access the same IDN name.
> > > >
> > > > Nothing prevents an all-Han string of any arbitrary length from
> > > > being Japanese text. The priority given to Chinese here is not
> > > > likely to be well received by other groups.
> > >
> > > Priority is given to Chinese for many reasons:
> > > 1) The majority of these characters originated in China with
> > > semantics and phonetics, and are naturally named and
> > > known to the people who use them. The number is
> > > 100,000 - 20,003 = about 80,000 on the way to being named.
> > > 2) Kanji have more than two phonetics, and one of them
> > > is the Chinese phonetics. So it is not the worst case for Kanji.
> > > 3) Every all-Kanji label automatically gets two registered names,
> > > one in Chinese and the other in Japanese.
> > >
> > > Japanese gets the Chinese registration for free; Chinese
> > > gets the work for nothing. Who do you think is the biggest
> > > beneficiary?
> > >
> > > > | Also, it
> > > > | introduces more policy decisions, for example, an all CJK
> > > > | character trademark registrant may have to register in three
> > > > | languages to ensure the legitimacy of the trademark.
> > > >
> > > > Wait just a minute. Wasn't the whole idea of this
> > > > language-tagging and CJK-folding scheme to PREVENT registrants
> > > > from having to register an IDN identifier more than once?
> > >
> > > This registration is for the different user groups of the same
> > > trade name, like AOL.com and AmericanOnLine.com in DNS,
> > > but in IDN they are the same as <A><O><L>.com
> > >
> > > This is the IDN we have to work with: one match in IDN, one
> > > match in DNS.
> > > If there is more than one access in DNS to
> > > one IDN label, IDN has to block them all in registration unless
> > > they are registered. That is what the Chinese group has been
> > > saying: if we don't implement TC/SC, then there will be
> > > an exponential number of DNS names for the same IDN label.
> > >
> > > > | After all, a useful tool lets its
> > > > | user make decisions.
> > > >
> > > > Some tools are interactive, others are not.
> > >
> > > This depends on which layer of user you have
> > > in mind. I have several of them.
> > >
> > > > Finally, it is not yet clear to me whether the "idn-zh-" tag
> > > > prefix
> > >
> > > Where does the idn- tag come in? The zh-- tag shall be on the
> > > same footing as the AMC tag bq-- and treated within the same
> > > interface. Please look through the idn-map I-D again. If
> > > I did not express that clearly, then tell me how to improve
> > > it.
> > >
> > > > is supposed to be embedded within IDN identifiers or specified
> > > > separately. But between this additional label and the use of
> > > > the less efficient StepCode instead of ACE-Z, it seems that
> > > > several bytes out of the precious 63-byte limit are required as
> > > > overhead to support this tagging scheme. If
> > >
> > > Without a tag there is little chance you can process CJK and
> > > similar problems, for example Latin and Armenian.
> > > The tag takes the same number of bytes as the bq-- used in AMC.
> > >
> > > StepCode is not compressed, is human readable, and is
> > > readable by foreigners. You can compare readability with
> > > code length efficiency, but the judges are the administrators
> > > of zone files, international workers in a foreign land,
> > > and the IDN name owners.
> > >
> > > > I remember correctly, it is CJK users (Soobok Lee is only the
> > > > most vocal) who are most concerned about the space limitation
> > > > and who want to find (or invent) the most efficient encoding
> > > > system possible. Will these other CJK users agree to this
> > > > proposal?
> > > >
> > > > -Doug Ewell
> > > > Fullerton, California
> > >
> > > Each member joins this list independently. You have
> > > to ask them.
> > >
> > > Liana
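
The 9-to-1 folding quoted above from [nameprep] can be sketched as follows. This is only a minimal illustration, assuming nothing beyond the nine table rows quoted in the thread; the names FOLD_TABLE, fold, and preimages are hypothetical and chosen for this example. It shows why the question "how will you recover the 9 cases?" has no answer from the folded value alone: every source code point lands on U+0068, so the inverse is a set of nine candidates rather than a single code point.

```python
# Folding rows copied from the [nameprep] excerpt quoted in the thread:
# source code point -> folded code point. Names here are hypothetical.
FOLD_TABLE = {
    0x0048: 0x0068,   # Case map
    0x210B: 0x0068,   # Additional folding
    0x210C: 0x0068,   # Additional folding
    0x210D: 0x0068,   # Additional folding
    0x1D407: 0x0068,  # Additional folding
    0x1D43B: 0x0068,  # Additional folding
    0x1D46F: 0x0068,  # Additional folding
    0x1D4D7: 0x0068,  # Additional folding
    0x1D573: 0x0068,  # Additional folding
}


def fold(text):
    """Apply the table per code point; unmapped code points pass through."""
    return "".join(chr(FOLD_TABLE.get(ord(ch), ord(ch))) for ch in text)


def preimages(table):
    """Group source code points by the target they fold to."""
    groups = {}
    for src, dst in table.items():
        groups.setdefault(dst, []).append(src)
    return groups


groups = preimages(FOLD_TABLE)
for dst, srcs in groups.items():
    # One folded value, many possible originals: the mapping is not invertible.
    print(f"U+{dst:04X} <- {len(srcs)} source code points")
```

Recovering the original form would need extra information stored alongside the folded name, which the folding step by itself does not keep.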