Re: "Difficult Characters" draft

"Martin J. Duerst" <> Tue, 06 May 1997 19:50 UTC

Date: Tue, 06 May 1997 21:20:46 +0200
From: "Martin J. Duerst" <>
To: Alain LaBont/e'/ <>
cc: URI mailing list <>
Subject: Re: "Difficult Characters" draft
In-Reply-To: <>
Message-ID: <Pine.SUN.3.96.970506210330.245U-100000@enoshima>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="ISO-8859-1"
X-MIME-Autoconverted: from QUOTED-PRINTABLE to 8bit by id PAA18971
Precedence: bulk
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by id PAA18976

On Tue, 22 Apr 1997, Alain LaBont/e'/ wrote:

> From a *real user*'s point of view what you say is disconcerting. In fact
> it does not correspond to a reality I experience every day. My insurance
> agent gave me his personal URL last week, for example, URL in which there
> were uppercase letters that were transformed into lower case when Netscape
> displayed the actual URL and in searching with both forms it is all right...

Well, I just tried the URL, and my Netscape didn't do any lowercasing.
But that's a detail.

> Hence in this actual concrete example,
> and
> are totally equivalent. Changing those habits would not be desirable.

These are indeed totally equivalent. But try to write

> or

and you will get a nasty error (all in English, with a pointer
to Some exceptions and surprises to the
contrary notwithstanding, an uninformed user has to be taught to copy an
URL as is, including case. A more informed user may know about parts
of an URL that can be changed in capitalization. Actually, you can write

> http://wWw.lAmUtUeLlE.CoM/agent/home.htm?aid=S200569

and it will still work. But please leave the part after the first
single slash alone.
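
To make the distinction concrete, here is a minimal sketch in Python of
case normalization that touches only the scheme and the host and leaves
everything after the first single slash alone. The function name
normalize_case is hypothetical, and the sketch assumes an http-style URL
of the form scheme://host/path; it is an illustration, not a statement
of what any browser or server actually does.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(url: str) -> str:
    """Lowercase the scheme and host only (hypothetical helper).

    Path, query and fragment are returned exactly as given, because
    the server may treat them as case sensitive.
    """
    parts = urlsplit(url)
    # The network location may carry "user:password@" in front of the
    # host; keep that part untouched and lowercase only host[:port].
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(normalize_case("http://wWw.lAmUtUeLlE.CoM/agent/home.htm?aid=S200569"))
```

With this approach two URLs that differ only in the capitalization of
the host compare equal after normalization, while a difference in the
path or query still counts as a difference.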

> In French at least, case doesn't have in general the importance that has in
> German, for example. For accented and unaccented data, of course minimally
> a lower case accented letter should be equivalent to the upper case
> counterpart, but even in lower case, it is desirable that an unaccented
> letter be equivalent to its accented counterpart (an actual case is that it
> is processed like this since 1981 in DOS on a PC) for searching purposes.

If a lowercase accented letter appears in the later part of an URL,
it won't be equivalent to the corresponding uppercase letter because
there is also no equivalence for nonaccented letters.

In case there is indeed equivalence, as we currently have it in domain
names, it will be the task of domain name internationalization to
decide what to do about it, whether to make the usual domain names
case sensitive or whether to introduce case equivalences for characters
outside ASCII or whatever. There is no problem with any kind of
URL scheme or mechanism introducing additional equivalences where
it sees fit, but we can't introduce them for all URLs.

> What I suggest is that searching be done according to the same spirit as
> ISO/IEC CD 14651 which deals with such equivalences. At the limit (this
> does not have an influence on URLs but it should be considered) in
> searching URLs, expectations could be built on LOCALEs... that is what I
> suggest.

I fully agree for searching. However, what is done usually with URLs
is not searching. It is binary matching. Only things that are absolutely
binary equivalent (after the last step in your sorting standard) match.
The normalization procedures in the draft only increase the level a tiny
bit, to avoid those cases where the binary representation is different,
but the user has absolutely no chance to make a difference.
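
As a rough illustration of such a case, the sketch below compares two
strings that look identical on screen but differ in their binary
representation: one uses a precomposed ö, the other an o followed by a
combining diaeresis. It uses Unicode normalization form NFC from
Python's unicodedata module purely as a stand-in; that the draft's
normalization works this way is an assumption, and the function name
same_url_text is hypothetical.

```python
import unicodedata


def same_url_text(a: str, b: str) -> bool:
    """Binary comparison after a minimal normalization step.

    NFC is used here only as an example of a normalization that removes
    differences the user cannot see; it does not fold case or accents.
    """
    a_bytes = unicodedata.normalize("NFC", a).encode("utf-8")
    b_bytes = unicodedata.normalize("NFC", b).encode("utf-8")
    return a_bytes == b_bytes


precomposed = "k\u00f6ln"   # ö as a single code point
decomposed = "ko\u0308ln"   # o followed by a combining diaeresis

if __name__ == "__main__":
    print(precomposed == decomposed)               # raw strings differ
    print(same_url_text(precomposed, decomposed))  # equal after NFC
    print(same_url_text("k\u00f6ln", "koln"))      # ö and o stay distinct
```

The point of the design is that the comparison stays a binary match:
only invisible differences are removed, while o versus ö or upper versus
lower case remain real differences.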

> For example as was explained, o and ö are not equivalent in Swedish (while
> they are in German),

They are definitely not! Otherwise, we wouldn't need the ö :-).
It's only that we don't consider ö a letter of its own,
but that doesn't mean a German wouldn't be able to know where
to put an o and where to put an ö in an URL (with the exception
of those cases where both possibilities make sense and where it
is all the more important to make the difference :-).

> n and ñ are not equivalent in Spanish while they are
> in French and so on. That has no impact per se on the making of URLs, but
> it has one on their use, that was the only consideration I was trying to
> suggest.

I agree that it should have an impact on the use in searching and such.
But that's not the main function of URLs.
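
For contrast, a search tool could build a loose key along the following
lines. This is only a crude sketch in Python: it is not an
implementation of ISO/IEC CD 14651, it ignores locales entirely (so it
folds ñ to n even where that would be wrong for Spanish), and the
function name search_key is hypothetical.

```python
import unicodedata


def search_key(text: str) -> str:
    """Loose key for searching: drop combining marks, then fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    # After NFD, an accented letter is a base letter followed by one or
    # more combining marks; keep only the non-combining characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


if __name__ == "__main__":
    print(search_key("LaBont\u00e9") == search_key("labonte"))
    print(search_key("Espa\u00f1a") == search_key("espana"))
```

Such a key is useful for finding candidate URLs, but two URLs with the
same search key are still two different URLs when it comes to matching.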

Regards,	Martin.