Re: revised "generic syntax" internet draft

"Martin J. Duerst" <> Sun, 27 April 1997 12:47 UTC

Date: Sun, 27 Apr 1997 14:34:24 +0200
From: "Martin J. Duerst" <>
To: Keld J|rn Simonsen <>
Cc: John C Klensin <>, Edward Cherlin <>,
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <>
Message-Id: <Pine.SUN.3.96.970427141954.245B-100000@enoshima>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Precedence: bulk

On Sat, 26 Apr 1997, Keld J|rn Simonsen wrote:

> "Martin J. Duerst" writes:
> > > (iv) It is not hard to demonstrate that, in the medium to 
> > > long term, there are some requirements for character set 
> > > encoding for which Unicode will not suffice and it will be 
> > > necessary to go to multi-plane 10646
> > 
> > You are not the first or only one to notice this. Unicode
> > currently can encode planes 0 to 16 (for a total of about
> > one million codepoints) by a mechanism called surrogates
> > or UTF-16. Please check your copy of Unicode vol. 2.
> Surely we are not talking Unicode, (an industry standard) but ISO 10646?
> IETF normally specifies ISO standards when available. 10646 is 32 bits.

We are usually (implicitly or explicitly) talking about both ISO 10646
and Unicode, as they are the same for most practical purposes. For the
official specification, I agree that ISO 10646 is to be preferred. On
the other hand, a lot of actual systems (in those cases where the
differences actually matter) are closer to Unicode than to ISO 10646,
and a lot of Unicode/ISO 10646 systems are announced/marketed using
the name "Unicode" rather than the number "10646".

My remark above was meant to point out that if we specify ISO 10646,
but an actual industry-standard system uses Unicode, then not
only are the codepoints in the BMP the same, but both
standards/systems will also have a unified code space of well
over a million codepoints.
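
For readers who have not looked at the surrogate mechanism, the
arithmetic can be sketched as follows. This is only an illustration
in Python; the function and constant names are hypothetical and are
not taken from the text of either standard.

```python
# Minimal sketch of UTF-16 surrogate arithmetic (names are illustrative).

HIGH_START = 0xD800       # first high (leading) surrogate
LOW_START = 0xDC00        # first low (trailing) surrogate
MAX_CODEPOINT = 0x10FFFF  # last codepoint of plane 16

# Planes 0 to 16, each holding 65,536 codepoints.
TOTAL_CODEPOINTS = 17 * 0x10000


def to_surrogates(cp):
    """Split a codepoint from planes 1..16 into a (high, low) pair."""
    if not 0x10000 <= cp <= MAX_CODEPOINT:
        raise ValueError("codepoint is not in planes 1 to 16")
    offset = cp - 0x10000  # a 20-bit value
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into one codepoint."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)
```

With 17 planes of 65,536 codepoints each, the total is 1,114,112,
which is the "well over a million" figure mentioned above.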

In addition, for the whole equivalence/normalization question,
we will have to base our work on the equivalences defined in
Unicode, because there are no such equivalences defined in
ISO 10646.
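
To make the equivalence question concrete, here is a small sketch,
assuming Python's standard unicodedata module. Note that the NFC
normalization form it uses is a later formalization by Unicode of
these equivalences; it serves here purely as an illustration, and the
helper name canonically_equal is hypothetical.

```python
import unicodedata

# Two spellings of the same letter, which Unicode treats as equivalent:
composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT


def canonically_equal(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

The two strings differ codepoint by codepoint, yet compare equal
after normalization, which is the situation a URL comparison would
have to deal with.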

I hope that, in the above sense, an occasional reference to
Unicode in this discussion and in the resulting specs will
be tolerated (:-) even by the strongest ISO 10646 proponents,
and that all of us who know about the usefulness of a Universal
Character Set can work towards making the best use of it
in URLs.

Regards,	Martin.