Re: revised "generic syntax" internet draft

Edward Cherlin, Sun, 27 April 1997 22:51 UTC

References: "Martin J. Duerst", "Re: revised "generic syntax" internet draft" (Apr 25, 20:23)
Date: Sun, 27 Apr 1997 12:59:01 -0700
From: Edward Cherlin
Subject: Re: revised "generic syntax" internet draft

Keld Jørn Simonsen wrote:

>"Martin J. Duerst" writes:
>> > (iv) It is not hard to demonstrate that, in the medium to
>> > long term, there are some requirements for character set
>> > encoding for which Unicode will not suffice and it will be
>> > necessary to go to multi-plane 10646
>> You are not the first or only one to notice this. Unicode
>> currently can encode planes 0 to 16 (for a total of about
>> one million codepoints) by a mechanism called surrogates
>> or UTF-16. Please check your copy of Unicode vol. 2.
>Surely we are not talking about Unicode (an industry standard) but ISO 10646?
>IETF normally specifies ISO standards when available. 10646 is 32 bits.
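The surrogate mechanism mentioned above works like this: a code point in planes 1 to 16 is reduced by 0x10000 to a 20-bit offset, whose top 10 bits go into a high surrogate (D800-DBFF) and whose bottom 10 bits go into a low surrogate (DC00-DFFF). The minimal Python sketch below assumes that standard UTF-16 bit layout; the function names `to_surrogates` and `from_surrogates` are illustrative only, and the result is cross-checked against Python's built-in `utf-16-be` codec.

```python
from typing import Tuple


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Encode a code point from planes 1-16 as a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only planes 1-16 need surrogates")
    offset = cp - 0x10000             # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low surrogate
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Decode a UTF-16 surrogate pair back to the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    cp = 0x1D11E  # an arbitrary code point in plane 1
    high, low = to_surrogates(cp)
    print(f"U+{cp:X} -> {high:04X} {low:04X}")
    # Cross-check against Python's own UTF-16 codec (big-endian, no BOM).
    assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
    assert from_surrogates(high, low) == cp
```

Running it prints `U+1D11E -> D834 DD1E`: one plane-1 character costs two 16-bit units, while plane-0 characters still take one.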

ISO 10646 specifies Unicode as a 16-bit subset. There is nothing to argue
about here. We will formally specify 10646, but in practice we will use
only Unicode: 10646 defines no characters outside Unicode, and the current
expectation is that it never will, because keeping the two aligned is part
of the current definition of both standards.

Unicode 1.0 encoded plane 0 of ISO 10646, and Unicode 2.0 can address 17
planes of ISO 10646, with room for somewhat more than a million characters.
The most generous estimate of possible future need is a quarter million
characters.
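The figures in the preceding paragraph can be checked with a little arithmetic. This Python sketch counts code positions, not assigned characters; the 250,000 constant is simply the estimate cited in the message, not an independent measurement, and the variable names are illustrative.

```python
# Code-space arithmetic for the "17 planes" of ISO 10646 reachable from UTF-16.
PLANE_SIZE = 0x10000       # 65,536 code positions per plane
PLANES = 17                # plane 0 (the BMP) plus planes 1-16 via surrogates
SURROGATES = 0x800         # U+D800..U+DFFF, reserved for building surrogate pairs
ESTIMATED_NEED = 250_000   # the "quarter million" estimate cited in the message

total = PLANES * PLANE_SIZE      # all positions U+0000..U+10FFFF
via_pairs = 1024 * 1024          # 1,024 high x 1,024 low surrogates
usable = total - SURROGATES      # positions not reserved for surrogates

# Surrogate pairs reach exactly planes 1-16.
assert via_pairs == 16 * PLANE_SIZE

print(f"total code positions:    {total:,}")
print(f"reachable via pairs:     {via_pairs:,}")
print(f"excluding surrogates:    {usable:,}")
print(f"headroom over estimate:  {usable / ESTIMATED_NEED:.1f}x")
```

Run as-is, it reports 1,114,112 total positions, 1,048,576 reachable through surrogate pairs, and 1,112,064 once the 2,048 surrogate codes are excluded, a little over four times the cited estimate.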

Edward Cherlin
Vice President, NewbieNet, Inc.
Ask. Someone knows.

"Everything should be made as simple as possible, __but no simpler__."
    Attributed to Albert Einstein