Re: UTF-8 and URLs

Dan Connolly <> Fri, 25 April 1997 05:45 UTC

Date: Fri, 25 Apr 1997 00:25:57 -0500
From: Dan Connolly <>
Organization: World Wide Web Consortium
X-Mailer: Mozilla 3.01 (X11; I; Linux 2.0.27 i586)
Mime-Version: 1.0
To: Francois Yergeau <>
Subject: Re: UTF-8 and URLs
Content-Type: text/plain; charset="iso-8859-1"
X-MIME-Autoconverted: from quoted-printable to 8bit by id BAA06400
Precedence: bulk
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by id BAA06404

Francois Yergeau wrote:
> At 09:56 24-04-97 PDT, Larry Masinter wrote:
> >I think given its likely controversial nature, we should clearly
> >make these recommendations in a separate RFC, and perhaps with
> >a new working group.
> Meaning what?  Two separate standards?  Or worse, a standard and an
> experimental/informational/BCP?

I see it as a proposed standard and an internet draft. I don't feel
that RFC1738 is ready to go to draft standard without addressing I18N.

> Who wants a two-tier Web, with only the lower tier internationalized, raise
> your hand!

Cut it out with the FUD, OK? It's really counterproductive.

> Let's see: we would have an i18n RFC that would allow URLs to contain most
> any characters, and a (possibly Draft) standard that would say "All URLs
> consist of a restricted set of characters..." (we know which): clear
> contradiction.

Please don't cite out of context or paraphrase wildly. The _existing_
RFC limits the characters in URLs. In fact, the UTF-8-in-%XX encoding
proposal doesn't even change that: it just adds semantics to the syntax.
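
To illustrate that point, here is a minimal sketch, assuming Python's
standard urllib.parse module; the helper name utf8_percent_encode is
hypothetical and not part of any draft. The text is encoded as UTF-8 and
each byte outside the unreserved set is written as %XX, so the result
still uses only the restricted set of characters:

```python
from urllib.parse import quote, unquote


def utf8_percent_encode(text: str) -> str:
    """Encode text as UTF-8, then write each byte outside the
    unreserved set (letters, digits, "-", ".", "_", "~") as %XX.

    Hypothetical helper for illustration only.
    """
    return quote(text, safe="", encoding="utf-8")


encoded = utf8_percent_encode("François")
print(encoded)                              # Fran%C3%A7ois
print(encoded.isascii())                    # True: still plain ASCII
print(unquote(encoded, encoding="utf-8"))   # François
```

The syntax is unchanged; only the interpretation of the %XX octets (as
UTF-8) is new.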

> Please let's drop the separate draft idea for good.

From what I can see, Larry is the only guy around here volunteering
to be editor; as such, it's up to him to decide whether it's more
convenient to present the ideas in one document or two.

Of course, you're welcome to attempt to merge the ideas into one
if you can find the time. I think your work in the HTML I18N draft was
excellent, and I would make every effort to review it.

I wish it were easier to work together on this. I'm willing to
host a face-to-face meeting or teleconference if folks would like me to.

Dan Connolly, W3C Architecture Domain Lead
<> +1 512 310-2971
PGP:EDF8 A8E4 F3BB 0F3C FD1B 7BE0 716C FF21