Re: Layer 2 and "idn identities" (was: Re: [idn] what are the IDN identifiers?)

"James Seng/Personal" <jseng@pobox.org.sg> Tue, 04 December 2001 23:56 UTC

Received: from psg.com (exim@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA18134 for <idn-archive@lists.ietf.org>; Tue, 4 Dec 2001 18:56:02 -0500 (EST)
Received: from lserv by psg.com with local (Exim 3.33 #1) id 16BP6A-0004Ae-00 for idn-data@psg.com; Tue, 04 Dec 2001 15:35:38 -0800
Received: from mail.i-dns.net ([203.126.116.228]) by psg.com with esmtp (Exim 3.33 #1) id 16BP68-00048c-00 for idn@ops.ietf.org; Tue, 04 Dec 2001 15:35:36 -0800
Received: from jamessonyvaio (pl039.nas312.n-yokohama.nttpc.ne.jp [210.165.193.39]) by mail.i-dns.net (Postfix) with SMTP id DF11DFFC17; Wed, 5 Dec 2001 07:34:48 +0800 (SGT)
Message-ID: <0a0101c17d1c$56c45bf0$1119d73d@jamessonyvaio>
From: James Seng/Personal <jseng@pobox.org.sg>
To: liana Ye <liana.ydisg@juno.com>
Cc: DougEwell2@cs.com, liana.ydisg@juno.com, idn@ops.ietf.org
References: <20011204.140704.-319607.1.liana.ydisg@juno.com>
Subject: Re: Layer 2 and "idn identities" (was: Re: [idn] what are the IDN identifiers?)
Date: Wed, 05 Dec 2001 07:35:06 +0800
MIME-Version: 1.0
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4807.1700
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700
Sender: owner-idn@ops.ietf.org
Precedence: bulk

Yes, unless you can demonstrate the necessary 7 supporters. Thank you.
Please respect our need to stay focused.

-James Seng

----- Original Message -----
From: "liana Ye" <liana.ydisg@juno.com>
To: <jseng@pobox.org.sg>
Cc: <DougEwell2@cs.com>; <liana.ydisg@juno.com>; <idn@ops.ietf.org>
Sent: Wednesday, December 05, 2001 3:36 AM
Subject: Re: Layer 2 and "idn identities" (was: Re: [idn] what are the
IDN identifiers?)


> Discussion of StepCode was dropped at the June
> meeting; the stated reason was "It is only
> concerning Chinese character encoding".
> It has never been brought back for discussion
> at all.  This version, 01, was submitted
> in Oct. and includes discussion of worldwide
> scripts.  This version has never been discussed
> or readmitted into the pool.
>
> Does this mean we can not discuss the idea
> on the list?  I am really puzzled.
>
> Liana
>
>
> On Tue, 4 Dec 2001 21:50:32 +0800 "James Seng/Personal"
> <jseng@pobox.org.sg> writes:
> > In my email dated 23rd Oct 2001,
> > http://www.imc.org/idn/mail-archive/msg04363.html
> >
> > I have indicated that "StepCode- A Mnemonic Internationalized Domain
> > Name Encoding
> > (draft-ietf-idn-step)" has been dropped from the WG pool.
> >
> > The following drafts remain in the wg pool:
> >
> > Internationalizing Host Names In Applications (IDNA)
> > (draft-ietf-idn-idna)
> > Preparation of Internationalized Host Names
> > (draft-ietf-idn-nameprep)
> > Proposal for a determining process of ACE identifier
> > (draft-ietf-idn-aceid)
> > Japanese characters in multilingual domain name label
> > (draft-ietf-idn-jpchar)
> > Traditional and Simplified Chinese Conversion
> > (draft-ietf-idn-tsconv)
> > Hangeul NAMEPREP recommendation version
> > 1.0(draft-ietf-idn-hangeulchar)
> > Improving ACE using code point reordering
> > v1.0(draft-ietf-idn-lsb-ace)
> > AMC-ACE-Z (draft-ietf-idn-amc-ace-z)
> > The Internationalized Domain Name System (draft-hall-dm-idns-00.txt)
> >
> > I would like to remind the working group to stay focused on
> > discussion of drafts within the WG pool. Therefore, further
> > discussion of drafts outside the pool, such as StepCode, should be
> > taken offline.
> >
> > Thanks.
> >
> > -James Seng
> >
> > ----- Original Message -----
> > From: "liana Ye" <liana.ydisg@juno.com>
> > To: <DougEwell2@cs.com>
> > Cc: <idn@ops.ietf.org>
> > Sent: Tuesday, December 04, 2001 2:18 PM
> > Subject: Re: Layer 2 and "idn identities" (was: Re: [idn] what are the IDN identifiers?)
> >
> >
> > >
> > >
> > > On Mon, 3 Dec 2001 16:29:57 EST DougEwell2@cs.com writes:
> > > > In a message dated 2001-12-03 2:01:56 Pacific Standard Time,
> > > > liana.ydisg@juno.com writes:
> > > >
> > > > > I see that you have not read the I-D yet, and Deng Xiang
> > > > > has replied to your Chinese vs. Japanese argument.  I
> > > > > will wait for your comment on the language-tag issue,
> > > > > or anything not up to your standard.
> > > >
> > > > Here are some specific concerns related to items in
> > > > draft-Liana-idn-map-00.txt.
> > > >
> > > > | The proposed ACE is a mnemonic encoding scheme,
> > > > | and is called StepCode [StepCode].
> > > >
> > > > Hasn't AMC-ACE-Z already been chosen as the standard ACE for IDN?
> > > > I would be surprised if the decision were made to use two
> > > > different ACEs depending on the language, or script, of the
> > > > encoded text.
> > >
> > > The current AMC treats all UCS codepoints the same. It cannot
> > > resolve look-alikes across different languages.  It does not make DNS
> > > master records readable for administrators.  It does not make
> > > zone files sortable by region or by sensible user group.
> > > It does not help a user who does not read Chinese but communicates
> > > with Chinese partners.  But it does compress the data and feed
> > > that into DNS.
> > >
> > > Using StepCode can group users by language, so sorting
> > > the names makes a lot more semantic sense for administrators.
> > > StepCode also allows each character to have its own ID, so that it
> > > can be treated differently across different languages.
> > >
> > > But StepCode is produced only when the user wants it, so there may
> > > be users who don't want it.  AMC should then be used to cover
> > > such cases.  StepCode and AMC are complementary to each other.
> > > This is discussed in Section 4.5.
> > >
> > >
> > > > | U-s   U-p A-p
> > > > | U+0041  U+0061    a      (Latin Letter A case folding)
> > > > | U+2fc2  U+2ee5    yv2    (Han character fish for Chinese case folding)
> > > >
> > > > Several Chinese speakers and other experts have already, repeatedly,
> > > > claimed that SC/TC mapping is NOT a 1-1 operation like Latin case
> > > > folding.  If you think your users will be satisfied with the 1-1
> > > > solution only, go right ahead, but if this turns out to be
> > > > inadequate and you need to propose a fix, get ready to hear a lot
> > > > of people say "I told you so."
> > >
> > > For this, please see my post on data-centric programming
> > > techniques applicable to the SC/TC problem.
> > >
> > >
> > > >
> > > > | To facilitate end users for the speed of IDN access as well as
> > > > | compatibility with existing applications, it is RECOMMENDED that
> > > > | an IDN code exchange table includes applicable local display
> > > > | standards corresponding with each applicable codepoint in UCS.
> > > >
> > > > Backward mapping tables to convert Unicode to legacy standards, for
> > > > the express purpose of allowing end-user software to delay the
> > > > transition to Unicode?  Does this sound like a solution for the
> > > > future?
> > >
> > > These legacy standards have to be on the servers in order to
> > > switch a large user base to the new IDN.
> > > After you have switched the users, you can replace
> > > user software and hardware at the suppliers' pace.  That
> > > means you can play the price/service game to lure the
> > > users to switch.  Without this feature, most users will never
> > > want to change, because change is too expensive for
> > > them and they are happy with what they have:
> > > a reliable connection.
> > >
> > > If we make these legal with the exchange map, then
> > > there will be no need to implement another code form
> > > like UTF-8.
> > >
> > >
> > > > | It is REQUIRED to register a language tag with IANA and its
> > > > | associated script range whenever it is modified.
> > > >
> > > > There is already a perfectly good update process in place for both
> > > > ISO 639 and RFC 3066.
> > >
> > > But IDN may not need to implement all of these tags.  Each tag
> > > implemented needs script-specific procedures to be deployed.
> > >
> > > >
> > > > | To use mixed scripts in one IDN label is NOT RECOMMENDED for an
> > > > | early deployment of IDN.
> > > >
> > > > This immediately outcasts the Japanese, who have every reason to mix
> > > > hiragana, katakana, kanji, and romaji.
> > >
> > > Wrong.  Japanese and Korean are the primary languages to
> > > be tested in practice.  That is why the C, J, K tags are also used
> > > in the I-D to show the feasibility of the implementation.
> > >
> > > The recommendation is there as a warning: though C, J, K
> > > are shown to be doable, no system has been installed
> > > yet, so an unrealistic jump in is not encouraged, especially when
> > > a blanket treatment of UCS by AMC is possible.
> > >
> > > >
> > > > |         Alphabet Sys.  Consonant Sys.  Character Sys.
> > > > |
> > > > | From: 0020            0530            2e80
> > > > | to:   052f            1bff            d7af
> > > > |
> > > > | include:Latin           Armenian        CJK
> > > > |         Greek           Hebrew          Kanji
> > > > |         Cyrillic        Arabic          Kana
> > > > |         IPA             Devanagari      Hangul
> > > > |         Vietnamese      Malayalam       Yi
> > > > |                         Thai
> > > > |                         Lao
> > > > |                         Tibetan
> > > > |                         ...
> > > >
> > > > Sorry, it's just not that simple.  There are plenty of alphabets and
> > > > alphabetic characters encoded above U+0530.  That's probably why the
> > > > Unicode Consortium, while providing a list of blocks of code points
> > > > like the following:
> > > >
> > > >     # Start Code..End Code; Block Name
> > > >     0000..007F; Basic Latin
> > > >     0080..00FF; Latin-1 Supplement
> > > >     0100..017F; Latin Extended-A
> > > >
> > > > is careful not to imply that ranges of code points are permanently
> > > > reserved for *classifications* of scripts like this.
> > > >
> > > > You can tell that the three ranges listed here are arbitrary and
> > > > bogus even for the CJK scripts, by noting that Korean jamos
> > > > (alphabetic) are located in the "consonant system" block, while the
> > > > Japanese syllabaries (kana) and precomposed Korean syllables are in
> > > > the "character system" block.
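
The objection above can be checked mechanically. The following is only a sketch: the three ranges are copied from the draft's table as quoted in this thread, while the function name `draft_group` and the sample code points are chosen here for illustration and do not come from the draft.

```python
import unicodedata

# Ranges and labels copied from the draft's table as quoted in the thread.
DRAFT_GROUPS = [
    ("Alphabet Sys.", 0x0020, 0x052F),
    ("Consonant Sys.", 0x0530, 0x1BFF),
    ("Character Sys.", 0x2E80, 0xD7AF),
]

def draft_group(cp):
    """Return the draft's group label for a code point, or None if uncovered."""
    for label, lo, hi in DRAFT_GROUPS:
        if lo <= cp <= hi:
            return label
    return None

# Sample code points picked for illustration (not taken from the draft):
# a Latin letter, a Georgian letter, a Hangul jamo, a hiragana letter,
# and a precomposed Hangul syllable.
for cp in (0x0041, 0x10D0, 0x1100, 0x3042, 0xAC00):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {draft_group(cp)}")
```

Under these ranges the Hangul jamo lands in the "consonant" group while kana and precomposed Hangul syllables land in the "character" group, which is the mismatch described above.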
> > >
> > > These are rough groups for studying different cases
> > > so as to cover the broadest language variations.  This grouping
> > > was proposed by a well-known linguist.  While we don't
> > > need to copy their views (just as I am against copying
> > > UTC's recommendation), it is necessary to learn
> > > the different views proposed by linguists before
> > > I feel confident proposing a reasonable solution.
> > > No specifics are placed on these groups.  The real
> > > definition is in the language tag definition file.  As you may see,
> > > an indefinite number of code blocks are defined
> > > in the data specification format, Section 3.2, and
> > > associated with language-specific procedures.
> > > That is the reason I proposed IANA registration for
> > > the language tags we support.
> > >
> > >
> > > >
> > > > | Some cultures often use more than two scripts within the same
> > > > | group, such as Japanese, but rarely using another script
> > > > | especially from a different group.
> > > >
> > > > As noted above, the Japanese use four scripts from two different
> > > > groups.
> > > >
> > > > | The main issue in IDN-Map
> > > > | is to identify character equivalent sets, and reduce the number of
> > > > | applicable IDN identifiers by 1) limiting the applicable IDN input
> > > > | code points to Plane 0 of Unicode table,
> > > >
> > > > Has anyone else so far proposed that supplementary characters be
> > > > flat-out prohibited from occurring in IDN identifiers?  Why should
> > > > they be singled out as a way to "reduce the number of applicable IDN
> > > > identifiers"?
> > >
> > > This was a statement in an early ACE I-D of this
> > > group. Since UTS released a new case folding map,
> > > [nameprep] took it without questioning, and everyone
> > > dropped this issue.
> > >
> > > No one proposes to prohibit Plane 1 codepoints.
> > > Here I am proposing to get the equivalence class
> > > work done first, before we allow Plane 1 and above code
> > > points in.  In fact, the more you let in, the more it
> > > supports my case for letting TC/SC in.  And this is
> > > the proof:
> > >
> > > in the current [nameprep] specification:
> > >
> > > 0048; 0068; Case map
> > >
> > > 210B; 0068; Additional folding
> > > 210C; 0068; Additional folding
> > > 210D; 0068; Additional folding
> > >
> > > 1D407; 0068; Additional folding
> > > 1D43B; 0068; Additional folding
> > > 1D46F; 0068; Additional folding
> > > 1D4D7; 0068; Additional folding
> > > 1D573; 0068; Additional folding
> > >
> > > You can see this is 9 to 1 case folding, and how
> > > will you recover the 9 cases?
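
The many-to-one folding in the table above can be reproduced with Python's standard library. This is a sketch under an assumption: it approximates the [nameprep] mapping with NFKC normalization followed by case folding, which is not necessarily the exact procedure the draft specifies. The code points are the nine listed above; the names `SOURCES` and `fold` are illustrative only.

```python
import unicodedata

# The nine source code points from the [nameprep] excerpt quoted above.
SOURCES = [0x0048, 0x210B, 0x210C, 0x210D,
           0x1D407, 0x1D43B, 0x1D46F, 0x1D4D7, 0x1D573]

def fold(cp):
    """Approximate the mapping: NFKC, then case folding (an assumption)."""
    return unicodedata.normalize("NFKC", chr(cp)).casefold()

# Every input collapses to the same folded character.
folded = {fold(cp) for cp in SOURCES}
print(len(SOURCES), "inputs ->", sorted(folded))
```

Because the set of folded results has a single element, the original code point cannot be recovered from the folded form alone, which is the point of the question above.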
> > >
> > > >
> > > > | It is RECOMMENDED that reasonable studies are given to each
> > > > | language to classify script treatment model, and a cost vs.
> > > > | benefit analysis in select a long term script specific processing
> > > > | protocol to be embedded in IDN language specific modules.
> > > >
> > > > This won't disrupt the schedule of the working group, will it?
> > >
> > > I don't know what the WG schedule is based on.  If it waves
> > > the CJK case away, it already met its schedule last
> > > year.  If you mean the current schedule, you have to
> > > ask whether the WG has a clear picture of the IDN or not.
> > > If no one knows how to deal with CJK, any schedule is
> > > meaningless.  That is the reason I do not comment
> > > on WG milestones.
> > >
> > > >
> > > > | canonicalization
> > > >
> > > > This word has no clear definition and is carefully avoided by
> > > > Unicode, as Ken Whistler already explained.
> > >
> > > I think we are getting somewhere.  We are getting
> > > down to the codepoints now.  When I don't need to use
> > > all these vague terms, we are near the solution.
> > >
> > >
> > > >
> > > > | A string mixed with CJK and Kana is Japanese, CJK and Hangul mix
> > > > | is Korean. However, an all CJK character string MUST be presumed
> > > > | to be in the primary language tag, that is Chinese, and registered
> > > > | as the only IDN name, unless the registrant requests a second and
> > > > | a third language to access the same IDN name.
> > > >
> > > > Nothing prevents an all-Han string of any arbitrary length from
> > > > being Japanese text.  The priority given to Chinese here is not
> > > > likely to be well received by other groups.
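
The quoted rule and the objection to it can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names, the use of Unicode character names to classify scripts, the tag strings "zh", "ja", "ko", and the treatment of kana-only or Hangul-only labels, which the quoted text does not address.

```python
import unicodedata

def scripts_in(label):
    """Collect coarse script classes based on Unicode character names."""
    found = set()
    for ch in label:
        name = unicodedata.name(ch, "")
        if name.startswith("CJK"):
            found.add("han")
        elif name.startswith(("HIRAGANA", "KATAKANA")):
            found.add("kana")
        elif name.startswith("HANGUL"):
            found.add("hangul")
        else:
            found.add("other")
    return found

def presumed_tag(label):
    """Apply the quoted rule; returns None when the rule does not cover the label."""
    found = scripts_in(label)
    if found == {"han"}:
        return "zh"   # all-Han labels are presumed Chinese
    if "kana" in found and found <= {"han", "kana"}:
        return "ja"   # Han mixed with kana (kana-only is an assumption)
    if "hangul" in found and found <= {"han", "hangul"}:
        return "ko"   # Han mixed with Hangul (Hangul-only is an assumption)
    return None

# Illustrative labels: an all-Han Japanese place name, a Han+kana label,
# and a Han+Hangul label.
for label in ("東京", "東京タワー", "大韓민국"):
    print(label, presumed_tag(label))
```

The first label is ordinary Japanese text written only in Han characters, yet the rule tags it as Chinese, which is the case raised in the reply above.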
> > >
> > > Priority is given to Chinese for many reasons:
> > > 1) The majority of these characters originated in China with
> > >  semantics and phonetics, and are naturally named and
> > >  known to the people who use them.  The number is
> > >  100,000 - 20,003 = about 80,000 on the way to be named.
> > > 2) Kanji have more than two phonetics, and one of them
> > >  is the Chinese phonetic.  So it is not the worst case for Kanji.
> > > 3) An all-Kanji label automatically gets two registered names,
> > >  one in Chinese and the other in Japanese.
> > >
> > > Japanese gets the Chinese registration for free; Chinese
> > >  gets the work for nothing.  Who do you think is the biggest
> > > beneficiary?
> > >
> > >
> > > > | Also, it
> > > > | introduces more policy decisions, for example, an all CJK
> > > > | character trademark registrant may have to register in three
> > > > | languages to ensure the legitimacy of the trademark.
> > > >
> > > > Wait just a minute.  Wasn't the whole idea of this language-tagging
> > > > and CJK-folding scheme to PREVENT registrants from having to
> > > > register an IDN identifier more than once?
> > >
> > > This registration is for the different user groups of the same
> > > trade name, like AOL.com and AmericanOnLine.com in DNS,
> > > but in IDN they are the same as <A><O><L>.com
> > >
> > > This is the IDN we have to work with: one match in IDN, one
> > > match in DNS.  If there is more than one access path in DNS to
> > > one IDN label, IDN has to block them all in registration unless
> > > they are registered.  That is what the Chinese group has been
> > > saying: if we don't implement TC/SC, then there will be
> > > exponentially many DNS names for the same IDN label.
> > >
> > > >
> > > > | After all, a useful tool is to let its
> > > > | user to make decisions.
> > > >
> > > > Some tools are interactive, others are not.
> > >
> > > This depends on which layer of user you have
> > > in mind.  I have several of them.
> > >
> > > >
> > > > Finally, it is not yet clear to me whether the "idn-zh-" tag prefix
> > >
> > > Where does the idn- tag come in? The zh-- tag shall be on the
> > > same footing as the AMC tag bq-- and treated within the same
> > > interface.  Please look through the idn-map I-D again.  If
> > > I did not express that clearly, then tell me how to improve
> > > it.
> > >
> > > > is supposed to be embedded within IDN identifiers or specified
> > > > separately.  But between this additional label and the use of the
> > > > less efficient StepCode instead of ACE-Z, it seems that several
> > > > bytes out of the precious 63-byte limit are required as overhead
> > > > to support this tagging scheme.  If
> > >
> > > Without a tag there is little chance you can process CJK and
> > > similar problems, for example Latin and Armenian.
> > > The tag takes the same number of bytes as the bq-- used in AMC.
> > >
> > > StepCode is not compressed, is human readable, and is
> > > readable by foreigners.  You can weigh readability against
> > > code length efficiency, but the judges are administrators
> > > of zone files, international workers in a foreign land,
> > > and the IDN name owners.
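
The prefix-overhead part of this exchange is simple arithmetic, sketched below. The 63-byte label limit and the two prefixes are taken from the thread; the names `MAX_LABEL_OCTETS` and `payload_budget` are illustrative. The sketch covers only the prefix cost; it says nothing about how many bytes StepCode or ACE-Z need per character, which the thread does not quantify.

```python
MAX_LABEL_OCTETS = 63  # DNS label limit cited in the thread

def payload_budget(prefix):
    """Octets left for the encoded name after the ACE-style prefix."""
    return MAX_LABEL_OCTETS - len(prefix.encode("ascii"))

# Both prefixes named in the thread cost the same number of octets.
for prefix in ("bq--", "zh--"):
    print(prefix, payload_budget(prefix))
```

On this count a "zh--" prefix costs no more than "bq--"; any difference in usable length would have to come from the encoding itself.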
> > >
> > >
> > > > I remember correctly, it is CJK users (Soobok Lee is only the most
> > > > vocal) who are most concerned about the space limitation and who
> > > > want to find (or invent) the most efficient encoding system
> > > > possible.  Will these other CJK users agree to this proposal?
> > > >
> > > > -Doug Ewell
> > > >  Fullerton, California
> > >
> > > Each member joins this list independently.  You have
> > > to ask them.
> > >
> > > Liana
> > >
> >