Re: UTF-8 URL for testing

Chris Newman <Chris.Newman@innosoft.com> Mon, 14 April 1997 21:47 UTC

Received: from cnri by ietf.org id aa10020; 14 Apr 97 17:47 EDT
Received: from services.Bunyip.Com by CNRI.Reston.VA.US id aa21266; 14 Apr 97 17:47 EDT
Received: (from daemon@localhost) by services.bunyip.com (8.8.5/8.8.5) id RAA12427 for uri-out; Mon, 14 Apr 1997 17:19:53 -0400 (EDT)
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1]) by services.bunyip.com (8.8.5/8.8.5) with SMTP id RAA12422 for <uri@services.bunyip.com>; Mon, 14 Apr 1997 17:19:50 -0400 (EDT)
Received: from THOR.INNOSOFT.COM by mocha.bunyip.com with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA07808 (mail destined for uri@services.bunyip.com); Mon, 14 Apr 97 17:19:48 -0400
Received: from eleanor.innosoft.com by INNOSOFT.COM (PMDF V5.1-8 #8694) with SMTP id <01IHP5S28RHA99F9D5@INNOSOFT.COM> for uri@bunyip.com; Mon, 14 Apr 1997 14:18:40 PDT
Date: Mon, 14 Apr 1997 14:20:04 -0700
From: Chris Newman <Chris.Newman@innosoft.com>
Subject: Re: UTF-8 URL for testing
In-Reply-To: <334E7F88.4703@w3.org>
To: IETF URI list <uri@bunyip.com>
Message-Id: <Pine.SOL.3.95.970414140622.21307H-100000@eleanor.innosoft.com>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Sender: owner-uri@bunyip.com
Precedence: bulk

On Fri, 11 Apr 1997, Dan Connolly wrote:
> Francois Yergeau wrote:
> >  What is needed is standardization, and I am *not* satisfied with
> > the current syntax draft, which continues to ignore a very basic need.
> > 
> > If *recommending* UTF-8 means that URL syntax cannot progress to Draft
> > Standard, so be it: recycle to Proposed and come back in 6 months.
> 
> That makes sense to me. I don't consider this a minor cleanup of RFC1738
> and RFC1808, but a fairly substantial re-write where we should take the
> opportunity to make some cost-effective changes like this one.

I do not believe the current URL syntax draft can or should go to Draft
Standard at this point.  At the very least, the relative-URL resolution
syntax has changed completely from RFC 1808 (with respect to parameters).
While this change may better reflect implementations, it is sufficiently
fundamental to the spec that the document should be recycled at Proposed
Standard.

In addition, I urge the URL community not to make the same mistake that
happened in the email community.  RFC 822 was quite clear that 8-bit
characters were not permitted in email and did not specify their meaning.
People implemented, and still implement, unlabelled 8-bit characters in
email with completely non-interoperable behavior, despite MIME's
existence.
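To make the email problem concrete: the octets alone do not say which
character set they belong to, so two readers who guess differently see
different text.  The following is a minimal Python sketch, not taken from
any spec; the byte string and the two guessed charsets (Latin-1 and
KOI8-R) are arbitrary choices for illustration.

```python
# Hypothetical illustration: the same unlabelled 8-bit octets, read under
# two different charset assumptions, yield two different strings.
raw = b"\xe9t\xe9"  # octets with no charset label attached

as_latin1 = raw.decode("latin-1")  # a Western European reader's guess
as_koi8r = raw.decode("koi8_r")    # a Russian reader's guess

print(as_latin1)
print(as_koi8r)
# False: the octets alone do not determine the text
print(as_latin1 == as_koi8r)
```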

For URLs we can come up with all the perfectly valid reasons why
US-ASCII-only URLs would work better in practice, but that does not reflect
reality.  I think it is inappropriate for the next URL spec to fail to
provide an unambiguous, interoperable way to support multilingual
characters.  I see no reasonable alternative on the table to the proposed
UTF-8 language.
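A rough sketch of what the UTF-8 recommendation buys, written in modern
Python rather than anything available when this thread was active: if
sender and receiver both assume that percent-encoded octets are UTF-8, a
non-ASCII path round-trips; if the sender silently uses another charset,
the receiver cannot recover the characters.  The path "/été" is a
hypothetical example; `quote` and `unquote` are the standard
`urllib.parse` functions, whose default encoding is UTF-8.

```python
from urllib.parse import quote, unquote

path = "/été"  # hypothetical path containing non-ASCII characters

# Percent-encode the UTF-8 octets of each character (quote's default encoding).
utf8_url = quote(path)
print(utf8_url)

# A receiver that also assumes UTF-8 recovers the original characters.
assert unquote(utf8_url) == path

# Percent-encoding Latin-1 octets instead gives a different, unlabelled URL...
latin1_url = quote(path, encoding="latin-1")
print(latin1_url)

# ...which a UTF-8 receiver cannot decode; invalid octets become U+FFFD.
print(unquote(latin1_url))
```

The point is not the specific library but the agreement: once every party
assumes UTF-8, the percent-encoded form has exactly one meaning.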