Re: [idn] Intro to my I-D

"xiang deng" <deng@cnnic.net.cn> Sun, 29 July 2001 07:59 UTC

Received: from psg.com (exim@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with SMTP id DAA09716 for <idn-archive@lists.ietf.org>; Sun, 29 Jul 2001 03:59:18 -0400 (EDT)
Received: from lserv by psg.com with local (Exim 3.31 #1) id 15Ql6V-000D3Y-00 for idn-data@psg.com; Sun, 29 Jul 2001 00:35:11 -0700
Received: from [159.226.6.187] (helo=whale.cnnic.net.cn) by psg.com with esmtp (Exim 3.31 #1) id 15Ql6B-000D30-00 for idn@ops.ietf.org; Sun, 29 Jul 2001 00:34:56 -0700
Received: from deng ([159.226.7.68]) by whale.cnnic.net.cn (Netscape Messaging Server 4.15) with SMTP id GH854T00.P7P; Sun, 29 Jul 2001 15:36:29 +0800
Message-ID: <009d01c117ff$d90d9590$4407e29f@deng>
From: xiang deng <deng@cnnic.net.cn>
To: wenhui zhang <zwh6810@yahoo.com>, ben <ben@cc-www.com>, John C Klensin <klensin@jck.com>
Cc: idn@ops.ietf.org
References: <20010729031829.25658.qmail@web5505.mail.yahoo.com>
Subject: Re: [idn] Intro to my I-D
Date: Sun, 29 Jul 2001 15:26:56 +0800
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6700
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Sender: owner-idn@ops.ietf.org
Precedence: bulk
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from base64 to 8bit by ietf.org id DAA09716

Dear Ben:
I apologize for the missing "o" in "Hello".

Best regards.
Deng Xiang

----- Original Message ----- 
From: "wenhui zhang" <zwh6810@yahoo.com>
To: "ben" <ben@cc-www.com>; "John C Klensin" <klensin@jck.com>
Cc: <idn@ops.ietf.org>
Sent: Sunday, July 29, 2001 11:18 AM
Subject: Re: [idn] Intro to my I-D


> hell Ben:
> 
>   I am wenhui from CNNIC, you can reach me via
> zwh6810@yahoo.com or zwh@cnnic.net.cn.
>   Our two drafts are as follows:
> http://www.i-d-n.net/draft/draft-ietf-idn-tsconv-00.txt
> 
>   
>   
> Wenhui
> 
> --- ben <ben@cc-www.com> wrote:
> > Hi John,
> > 
> > Thanks for your suggestion, I will do that.
> > 
> > By the way, fyi, under my supreme CDN system... a
> > registrant has the
> > choice of pointing the Simplified CDN and the
> > Traditional CDN both to
> > the same location OR to different locations.  (If
> > you still don't
> > understand... things will be more detailed / clear
> > when my draft comes
> > out.)
> > 
> > Thanks
> > Ben
> > 
> > ----- Original Message -----
> > From: "John C Klensin" <klensin@jck.com>
> > To: "ben" <ben@cc-www.com>
> > Cc: <idn@ops.ietf.org>
> > Sent: Saturday, July 28, 2001 12:14 PM
> > Subject: Re: [idn] Intro to my I-D
> > 
> > 
> > 
> > Ben (and David and Eric),
> > 
> > It seems to me that a high-level summary of the
> > difficulty here
> > is that you want to treat Simplified and Traditional
> > Chinese as
> > different so that you can assign semantics (e.g.,
> > different web
> > sites written respectively in the two forms) to the
> > two writing
> > styles.  Our CNNIC colleagues believe that
> > Simplified and
> > Traditional writing forms to express the same word
> > should be
> > treated as equivalent and mapped into each other.
> > 
> > That is very fundamental; we can't have it both ways
> > in the DNS
> > (although one can imagine "treat these alike and see
> > what is
> > found" instructions to a search system).  As I
> > understand what
> > they have said, mixtures of simplified and
> > traditional systems
> > within a given phrase are also possible, which
> > eliminates the
> > simplification you propose of automatically
> > registering "both"
> > forms.
> > 
> > I agree with David and Eric that you should
> > carefully examine
> > the Lee and Deng drafts.  But, since there seems to
> > be a more
> > basic philosophical difference here, I suggest that
> > you try to
> > work with them to understand each other's positions
> > and see if
> > some collectively acceptable position can be found.
> > 
> >     john
> > 
> > 
> > 
> > 
> 
> 


--------------------------------------------------------------------------------


Internet Draft                                     Authors: Xiang Deng
<draft-ietf-idn-icdn-00.txt>                                Yan Fang Wang     
July 2001
Expires in six months                                    
                                                           
                                                           
                                                           
                                                              
       The Implementation of Chinese Characters in IDN

Status of this Memo

This document is an Internet-Draft and is in full conformance with all 
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task 
Force (IETF), its areas, and its working groups. Note that other groups 
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months 
and may be updated, replaced, or obsoleted by other documents at any 
time. It is inappropriate to use Internet-Drafts as reference material 
or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html

Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119
[RFC2119].

Abstract

This document describes the characteristics of Chinese characters and 
proposes two implementation schemes based on [IDNREQ] and [NAMEPREP], 
although there are some differences among them. The two schemes differ 
in where the implementation function is placed:
    -- client side processing 
or 
    -- server side processing

In China, the most popular character sets are [GBK], [BIG5], and 
[GB18030]; all examples in this document, however, are based on [UCS].


1.  Characteristics of Chinese characters and the Chinese language
1.1 The context dependent semantics of Chinese characters 
    In [UCS], each Chinese character is assigned a code point, which
    occupies two bytes in the two-octet form (UCS-2).
    
    Chinese characters can be classified into two groups. Each character
    in the first group carries a meaning of its own (notional
    characters), while the characters in the other group do not (empty
    characters). Both notional and empty characters can be combined with
    other characters to form words and even sentences. The notional
    character is the basic meaning-bearing unit of the Chinese language,
    similar to a morpheme.
    
    
1.2 A Chinese character may have several writing forms
    
    Chinese characters have evolved continuously and spread widely over
    the 5,000 years of Chinese history. They were also adopted on a large
    scale by other countries and became a major component of their
    languages. It is therefore inevitable that a Chinese character has
    several writing forms. In the Unicode encoding standard, code points
    are assigned according to the shape of a character, so the different
    glyphs of the same Chinese character have different code points.
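
    The point about shapes and code points can be checked with a short
    sketch. The character pair below is an illustrative choice and does
    not come from this draft; the helper names are hypothetical.

```python
# Illustrative pair, not taken from the draft: the simplified and
# traditional writing forms of the character meaning "country".
SIMPLIFIED = "国"
TRADITIONAL = "國"


def code_point(ch: str) -> str:
    """Format the code point of a single character as U+XXXX."""
    return f"U+{ord(ch):04X}"


def ucs2_octets(ch: str) -> int:
    """Octets the character occupies in big-endian two-octet form."""
    return len(ch.encode("utf-16-be"))


if __name__ == "__main__":
    for ch in (SIMPLIFIED, TRADITIONAL):
        print(ch, code_point(ch), ucs2_octets(ch), "octets")
```

    The two writing forms are distinct code points, so a byte-for-byte
    comparison treats them as different names unless a mapping step is
    added.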
    
    Currently, there are two forms of written Chinese characters:
      -- simplified characters (SC): mainland China
      -- traditional characters (TC): Taiwan, Hong Kong, Macao
    
    Except for some special writing forms of certain characters, whose
    meanings have also changed over this long history, the different
    writing forms of a Chinese character can generally be substituted
    for each other without changing the meaning of the word (phrase).
 
   
1.3 The Usage of Appellations in China
    In China, generally speaking, every company, organization, and person
    has two names: a full name and an abbreviation.
    
    The abbreviated name is easy to remember and to communicate. The full
    name is the formal name, used in formal documents and situations.
    
    To the name owners, the two names are equally necessary and important.
    In domain name registration they therefore usually register both the
    full name and the corresponding abbreviation, so that people can reach
    the same domain by typing either the full name or the abbreviated
    name. Some full names are quite long, which is why the length of a
    domain name is important to Chinese users.
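
    Why length matters can be illustrated with a rough sketch. It
    assumes the 63-octet label limit of [STD13] and uses Punycode with
    an "xn--" prefix as a stand-in ASCII encoding; this draft does not
    name an encoding, so both the prefix and the codec are assumptions,
    as are the sample names and helper functions.

```python
MAX_LABEL_OCTETS = 63   # per-label limit from [STD13]
ACE_PREFIX = "xn--"     # assumption: stand-in prefix, not from this draft

# Illustrative names only.
FULL_NAME = "中国互联网络信息中心"
ABBREVIATION = "cnnic"


def ace_octets(label: str) -> int:
    """Octets the label needs once encoded for the DNS (stand-in encoding)."""
    if label.isascii():
        return len(label)
    return len(ACE_PREFIX) + len(label.encode("punycode"))


def fits_in_label(label: str) -> bool:
    """True if the encoded label stays within the [STD13] limit."""
    return ace_octets(label) <= MAX_LABEL_OCTETS


if __name__ == "__main__":
    for name in (ABBREVIATION, FULL_NAME, FULL_NAME * 6):
        print(len(name), "characters ->", ace_octets(name), "octets,",
              "fits" if fits_in_label(name) else "too long")
```

    Under these assumptions every Chinese character costs more than one
    octet after encoding, so a long full name reaches the label limit
    well before it reaches 63 characters.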


2. Chinese characters in DNS
2.1 Traditional and Simplified Chinese conversion takes three forms:
    1-1 mapping: one traditional character (TC) maps to exactly one
                 simplified character (SC).
    1-n mapping: one TC has several SC writing forms.
    n-1 mapping: one SC has several TC writing forms.
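
    The three forms can be told apart mechanically once a conversion
    table exists. The sketch below uses a tiny hand-picked table; the
    entries are illustrative and not taken from this draft, and a real
    table would follow a specification such as [TSCONV].

```python
# Hypothetical sample table: each TC maps to the set of its SC forms.
TC_TO_SC = {
    "國": {"国"},          # 1-1
    "乾": {"干", "乾"},    # 1-n: one TC with several SC forms
    "發": {"发"},          # n-1 together with the next entry
    "髮": {"发"},
}


def classify(tc: str, table=TC_TO_SC) -> str:
    """Return '1-1', '1-n' or 'n-1' for a traditional character."""
    sc_forms = table[tc]
    if len(sc_forms) > 1:
        return "1-n"
    (sc,) = sc_forms
    # Count how many traditional characters share this simplified form.
    sharing = [t for t, forms in table.items() if sc in forms]
    return "n-1" if len(sharing) > 1 else "1-1"


if __name__ == "__main__":
    for tc in TC_TO_SC:
        print(tc, classify(tc))
```

    Only the 1-1 case can be converted in both directions without extra
    context; the other two cases need a rule for choosing among the
    candidates.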

2.2 Delimiter folding
    The full stop in Chinese is "。". Therefore, in CDNS the "。" is 
    equivalent to the dot "." as the delimiter.
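
    A minimal sketch of delimiter folding, assuming the Chinese full
    stop is the ideographic full stop U+3002 and that the input has
    already been converted to [UCS]. The function name is hypothetical.

```python
IDEOGRAPHIC_FULL_STOP = "\u3002"   # the Chinese full stop


def fold_delimiters(name: str) -> str:
    """Replace every Chinese full stop with the ASCII dot delimiter."""
    return name.replace(IDEOGRAPHIC_FULL_STOP, ".")


if __name__ == "__main__":
    print(fold_delimiters("CDNLabel1\u3002CDNLabel2\u3002cn"))
```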

2.3 Label sequence
   Currently, the labels of an LDH domain name are ordered from left to
   right (e.g., abc.def.ghi.net): the subdomain is on the left and its
   parent domain is on the right.
   
   In China, the language convention is the reverse. Considering the 
   cultural differences between East and West, people need to be able to
   access the Internet using the conventions of their native language.
   For example:
   
        CDNLabel1.CDNLabel2.cn 
   would preferably be written as:
        cn.CDNLabel2.CDNLabel1
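
   One possible form of label sequence normalization is sketched here.
   The draft does not say how reversed input is recognized, so the rule
   used (reverse when the first label is a known top-level label and
   the last one is not) and the names KNOWN_TLDS and to_dns_order are
   assumptions made for illustration.

```python
# Hypothetical set of top-level labels used to detect reversed input.
KNOWN_TLDS = {"cn"}


def to_dns_order(name: str, tlds=KNOWN_TLDS) -> str:
    """Return the name with the most specific label first (DNS order)."""
    labels = name.split(".")
    starts_with_tld = labels[0].lower() in tlds
    ends_with_tld = labels[-1].lower() in tlds
    if starts_with_tld and not ends_with_tld:
        labels.reverse()
    return ".".join(labels)


if __name__ == "__main__":
    print(to_dns_order("cn.CDNLabel2.CDNLabel1"))
```

   Names that are already in DNS order pass through unchanged, so the
   step can be applied to every input.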
         
   


3. Solutions  
3.1 Client side solution 

    +---------------------------------------------------------------+
    |                          user input                           |
    +---------------------------------------------------------------+
                   |                                ^
                   V                                |
    +------------------------------+                |
    |      Delimiter folding       |                |
    |         "。" -> "."          |                |
    +------------------------------+                |
                   |                                |
                   V                                |
    +------------------------------+ +------------------------------+
    | label sequence normalization | | label sequence normalization |
    +------------------------------+ +------------------------------+
                   |                                ^
                   V                                |
    +------------------------------+ +------------------------------+
    |    local encoding -> UCS     | |    UCS -> local encoding     |
    +------------------------------+ +------------------------------+
                   |                                ^
                   V                                |
    +------------------------------+ +------------------------------+
    |   local mapping (TC - SC)    | |   local mapping (TC - SC)    |
    +------------------------------+ +------------------------------+
                   |                                ^
                   V                                |
    +------------------------------+                |
    |           NAMEPREP           |                |
    +------------------------------+                |
                   |                                |
                   V                                |
    +------------------------------+ +------------------------------+
    |          UCS -> MDN          | |          MDN -> UCS          |
    +------------------------------+ +------------------------------+
                   |                                ^
                   V                                |
    +---------------------------------------------------------------+
    |                        local resolver                         |
    +---------------------------------------------------------------+
    |                          DNS server                           |
    +---------------------------------------------------------------+
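
    The downward (query) path of the client side solution could be
    sketched as follows. Several things here are assumptions rather
    than part of this draft: Python's encodings.idna.nameprep stands in
    for the NAMEPREP step, Punycode with an "xn--" prefix stands in for
    the MDN encoding, the mapping table and top-level label set are
    tiny hypothetical samples, and the input is decoded to UCS first so
    that folding works on characters rather than bytes.

```python
from encodings import idna

IDEOGRAPHIC_FULL_STOP = "\u3002"   # the Chinese full stop

# Hypothetical sample table; a real one would follow [TSCONV].
TC_TO_SC = {"國": "国", "網": "网"}

# Hypothetical set of top-level labels used to detect reversed input.
KNOWN_TLDS = {"cn"}


def to_ace(label: str) -> bytes:
    """NAMEPREP, then UCS -> MDN (Punycode is a stand-in encoding)."""
    prepared = idna.nameprep(label)
    if prepared.isascii():
        return prepared.encode("ascii")
    return b"xn--" + prepared.encode("punycode")


def client_side_prepare(raw: bytes, local_encoding: str = "gbk") -> bytes:
    """Turn locally encoded user input into an ASCII name for the resolver."""
    text = raw.decode(local_encoding)                # local encoding -> UCS
    text = text.replace(IDEOGRAPHIC_FULL_STOP, ".")  # delimiter folding
    labels = text.split(".")
    if labels[0].lower() in KNOWN_TLDS:              # label sequence normalization
        labels.reverse()
    labels = ["".join(TC_TO_SC.get(ch, ch) for ch in label)  # local mapping
              for label in labels]
    return b".".join(to_ace(label) for label in labels)


if __name__ == "__main__":
    print(client_side_prepare("cn。中國".encode("gbk")))
```

    Because the TC - SC mapping runs before the name leaves the client,
    the DNS server only ever sees one writing form and needs no change.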


3.2 Server side solution

    +---------------------------------------------------------------+
    |                          user input                           |
    +---------------------------------------------------------------+
                   |                                ^
                   V                                |
    +------------------------------+                |
    |      Delimiter folding       |                |
    |         "。" -> "."          |                |
    +------------------------------+                |
                   |                                |
                   V                                |
    +------------------------------+ +------------------------------+
    | label sequence normalization | | label sequence normalization |
    +------------------------------+ +------------------------------+
                   |                                ^
                   V                                |
    +------------------------------+ +------------------------------+
    |    local encoding -> UCS     | |    UCS -> local encoding     |
    +------------------------------+ +------------------------------+
                   |                                ^
                   V                                |
    +------------------------------+                |
    |           NAMEPREP           |                |
    +------------------------------+                |
                   |                                |
                   V                                |
    +------------------------------+ +------------------------------+
    |          UCS -> MDN          | |          MDN -> UCS          |
    +------------------------------+ +------------------------------+
                   |                                ^
                   V                                |
    +---------------------------------------------------------------+
    |                        local resolver                         |
    +---------------------------------------------------------------+
                   |
                   V
    +---------------------------------------------------------------+
    |                    local mapping (TC - SC)                    |
    |---------------------------------------------------------------|
    |                          DNS server                           |
    +---------------------------------------------------------------+
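
    In the server side solution the TC - SC mapping sits in front of
    the name lookup. The sketch below is one possible reading of that
    box. It assumes that names arrive in an ASCII encoding, for which
    Python's "idna" codec is used as a stand-in, that the zone is a
    plain dictionary, and that the mapping table is a small
    hypothetical sample; none of these details come from this draft.

```python
# Hypothetical sample table; a real one would follow [TSCONV].
TC_TO_SC = {"國": "国", "網": "网"}


def fold_query_name(qname: bytes) -> bytes:
    """Decode the query name, map TC to SC, and encode it again."""
    text = qname.decode("idna")                        # MDN -> UCS
    folded = "".join(TC_TO_SC.get(ch, ch) for ch in text)
    return folded.encode("idna")                       # UCS -> MDN


def server_lookup(zone: dict, qname: bytes):
    """Look the name up after folding it to its simplified form."""
    return zone.get(fold_query_name(qname))


if __name__ == "__main__":
    # The zone stores names in simplified form only (an assumption).
    zone = {"中国.cn".encode("idna"): "192.0.2.1"}
    print(server_lookup(zone, "中國.cn".encode("idna")))
```

    With this arrangement clients stay unchanged, and a query in either
    writing form reaches the same record.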


6 Authors' Addresses
Xiang Deng
China Internet Network Information Center 
NO.4  South 4th ST. Beijing, P.R.China, 100080, PO BOX 349
Tel: +86-10-62619750 

Yan Fang Wang
China Internet Network Information Center 
NO.4  South 4th ST. Beijing, P.R.China, 100080, PO BOX 349
Tel: +86-10-62619750 


7 References

[IDNREQ]   Zita Wenzel & James Seng, Requirements of Internationalized
           Domain Names, draft-ietf-idn-requirements

[NAMEPREP] Paul Hoffman & Marc Blanchet, Preparation of
           Internationalized Host Names, draft-ietf-idn-nameprep

[RFC2119]  Scott Bradner, Key words for use in RFCs to Indicate
           Requirement Levels, March 1997, RFC 2119.

[STD13]    Paul Mockapetris, Domain names - implementation and
           specification, November 1987, STD 13 (RFC 1034 and 1035).

[UNAME]    Li Ming TSENG, Jan Ming HO, Hua Lin QIAN & Kenny HUANG,
           Internationalized Domain Names and Unique
           Identifiers/Names, draft-ietf-idn-uname

[TSCONV]   Xiao Dong Lee, Nai Wen Hsu, Erin Chen & Guo Nian Sun,
           Traditional and Simplified Chinese Conversion,
           draft-ietf-idn-tsconv

[ISO10646] ISO/IEC 10646-1:2000. International Standard -- Information
           technology -- Universal Multiple-Octet Coded Character Set
           (UCS) -- Part 1: Architecture and Basic Multilingual Plane.

[Unicode3] The Unicode Consortium, "The Unicode Standard -- Version 3.0",
           ISBN 0-201-61633-5.