Re: revised "generic syntax" internet draft

Keld Jørn Simonsen <keld@dkuug.dk> Fri, 25 April 1997 13:48 UTC

Message-Id: <199704251215.OAA17664@dkuug.dk>
From: Keld Jørn Simonsen <keld@dkuug.dk>
Date: Fri, 25 Apr 1997 14:15:39 +0200
In-Reply-To: John C Klensin <klensin@mci.net> "Re: revised "generic syntax" internet draft" (Apr 25, 10:40)
X-Charset: ISO-8859-1
X-Char-Esc: 29
Mime-Version: 1.0
Content-Type: Text/Plain; Charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
Mnemonic-Intro: 29
X-Mailer: Mail User's Shell (7.2.2 4/12/91)
To: John C Klensin <klensin@mci.net>, Edward Cherlin <cherlin@newbie.net>
Subject: Re: revised "generic syntax" internet draft
Cc: uri@bunyip.com
Sender: owner-uri@bunyip.com
Precedence: bulk

John C Klensin writes:

> (iv) It is not hard to demonstrate that, in the medium to 
> long term, there are some requirements for character set 
> encoding for which Unicode will not suffice and it will be 
> necessary to go to multi-plane 10646 (which is one of 
> several reasons why IETF recommendation documents have 
> fairly consistently pointed to 10646 and not Unicode).  The 
> two are not the same.  In particular, while the comment in 
> (iii) can easily and correctly be rewritten as a UCS-4 
> statement, UTF-8 becomes, IMO, pathological (and its own 
> excuse for compression) when one starts dealing with plane 
> 3 or 4 much less, should we be unlucky enough to get there, 
> plane 200 or so.

Well, there is some kind of compression in 10646, as the BMP is
designed to contain the most frequently used characters in the world,
and characters outside the BMP are thus meant to be used very rarely
overall. UTF-8 is therefore still an economical encoding of 10646. The
major advantage of UTF-8 is that it preserves the ISO 646 (ASCII)
encoding and the control characters in C0 and C1, and thus provides a
straightforward migration path for systems that support ISO 646.
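
To put rough numbers on the economy argument, here is a minimal sketch
of how many octets UTF-8 needs per character. It assumes the original
31-bit UTF-8 definition (sequences of up to six octets); the helper
names utf8_length and plane_start are made up for illustration and do
not come from any standard or library.

```python
def utf8_length(code_point: int) -> int:
    """Octets needed for one UCS-4 code point, assuming the original
    31-bit UTF-8 definition (1 to 6 octets)."""
    if not 0 <= code_point < 0x80000000:
        raise ValueError("code point outside the 31-bit UCS-4 range")
    # Exclusive upper bounds for 1-, 2-, 3-, 4-, 5- and 6-octet sequences.
    limits = (0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000)
    for octets, limit in enumerate(limits, start=1):
        if code_point < limit:
            return octets
    raise AssertionError("unreachable")


def plane_start(plane: int) -> int:
    """First code point of a 65536-character plane (hypothetical helper)."""
    return plane * 0x10000


samples = {
    "ASCII 'A'": 0x41,
    "Latin-1 o-slash": 0xF8,
    "last BMP code point": 0xFFFF,
    "start of plane 3": plane_start(3),
    "start of plane 200": plane_start(200),
}

for label, cp in samples.items():
    print(f"{label}: UTF-8 {utf8_length(cp)} octets, UCS-4 4 octets")

# Pure ASCII text is octet-for-octet identical in UTF-8.
assert "URI".encode("utf-8") == "URI".encode("ascii")
```

Under these assumptions a BMP character never takes more than three
octets, plane 3 takes four (the same as UCS-4), and only something as
far out as plane 200 takes five.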

Keld Simonsen
Liaison from SC2/WG2 to IETF