Re: "Difficult Characters" draft

Alain LaBont/e'/ <> Tue, 06 May 1997 20:34 UTC

Message-Id: <>
X-Mailer: Windows Eudora Pro Version 3.0.1 beta 14 (16) [F]
Date: Mon, 21 Apr 1997 16:10:22 -0000
To: "Martin J. Duerst" <>
From: Alain LaBont/e'/ <>
Subject: Re: "Difficult Characters" draft
Cc: URI mailing list <>
In-Reply-To: <Pine.SUN.3.96.970506210330.245U-100000@enoshima>
References: <>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Precedence: bulk
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by id QAA20367

At 21:20 97-05-06 +0200, Martin J. Duerst wrote:
>On Tue, 22 Apr 1997, Alain LaBont/e'/ wrote:
>> From a *real user*'s point of view, what you say is disconcerting. In fact,
>> it does not correspond to the reality I experience every day. My insurance
>> agent gave me his personal URL last week, for example, a URL in which the
>> uppercase letters were transformed into lower case when Netscape
>> displayed the actual URL; searching with either form works fine...
>Well, I just tried the URL, and my Netscape didn't do any lowercasing.

Something in my environment does (I'm not sure it is Netscape)... but I use
French versions... of Netscape 2 under Win 3.1 and Netscape 3 Gold under

>But that's a detail.
>> Hence in this actual concrete example,
>> and
>> are totally equivalent. Changing those habits would not be desirable.
>These are indeed totally equivalent. But try to write
>> or
>and you will get a nasty error (all in English, with a pointer
>to Some exceptions and surprises to the
>contrary notwithstanding, an uninformed user has to be taught to copy an
>URL as is, including case. A more informed user may know which parts
>of an URL can be changed in capitalization. Actually, you can
>> http://wWw.lAmUtUeLlE.CoM/agent/home.htm?aid=S200569
>and it will still work. But please leave the part after the first
>single slash alone.
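
The distinction drawn above can be sketched in Python. This is a minimal
illustration, not a definitive rule: it lowercases only the scheme and the
host, and leaves everything after the first single slash untouched. The
function name normalize_case and the example.com host are assumptions made
for illustration; whether a given server treats the path as case-sensitive
depends on that server.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(url: str) -> str:
    """Lowercase the scheme and host only (hypothetical helper).

    Path, query, and fragment are returned exactly as given, because
    a server may treat them as case-sensitive.
    """
    parts = urlsplit(url)
    # Keep any "user:password@" prefix as is; lowercase only host[:port].
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # The mixed-case host from the message above collapses to one form.
    print(normalize_case("http://wWw.lAmUtUeLlE.CoM/agent/home.htm?aid=S200569"))
    # A mixed-case path (hypothetical URL) is left alone.
    print(normalize_case("HTTP://Example.COM/Agent/Home.htm"))
```

Two URLs that differ only in the case of the host compare equal after this
step; two URLs that differ in the case of the path still do not.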

All right... That is not very user friendly. Totally inconsistent... from a
user perspective, undesirable...

>> In French at least, case does not in general have the importance it has in
>> German, for example. For accented and unaccented data, of course, at a minimum
>> a lower case accented letter should be equivalent to its upper case
>> counterpart, but even in lower case, it is desirable that an unaccented
>> letter be equivalent to its accented counterpart for searching purposes
>> (a real example: DOS on a PC has processed it this way since 1981).
>If a lowercase accented letter appears in the later part of an URL,
>it won't be equivalent to the corresponding uppercase letter because
>there is also no equivalence for nonaccented letters.

If I understand correctly, there are no equivalences at all, even for case.
But what about the first part? What about user expectations in the face of
such inconsistent behaviour?

>In case there is indeed equivalence, as we currently have it in domain
>names, it will be the task of domain name internationalization to
>decide what to do about it, whether to make the usual domain names
>case sensitive or whether to introduce case equivalences for characters
>outside ASCII or whatever. There is no problem with any kind of
>URL scheme or mechanism introducing additional equivalences where
>they see fit, but we can't introduce them for all URLs.

I'm puzzled that the notion of consistency is neglected... I learned

>> What I suggest is that searching be done according to the same spirit as
>> ISO/IEC CD 14651 which deals with such equivalences. At the limit (this
>> does not have an influence on URLs but it should be considered) in
>> searching URLs, expectations could be built on LOCALEs... that is what I
>> suggest.
>I fully agree for searching. However, what is usually done with URLs
>is not searching. It is binary matching. Only things that are absolutely
>binary equivalent (after the last step in your sorting standard) match.
>The normalization procedures in the draft only raise the level a tiny
>bit, to avoid those cases where the binary representation is different
>but the user has absolutely no chance to tell the difference.
>> For example as was explained, o and ö are not equivalent in Swedish (while
>> they are in German),
>They are definitely not! Otherwise, we wouldn't need the ö :-).
>It's only that we don't consider ö a letter of its own,
>but that doesn't mean a German wouldn't be able to know where
>to put an o and where to put an ö in an URL (with the exception
>of those cases where both possibilities make sense and where it
>is all the more important to make the difference :-).
>> n and ñ are not equivalent in Spanish while they are
>> in French and so on. That has no impact per se on the making of URLs, but
>> it has one on their use, that was the only consideration I was trying to
>> suggest.
>I agree that it should have an impact on the use in searching and such.
>But that's not the main function of URLs.
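
The contrast between binary matching and searching can be sketched in
Python. This is a rough illustration under stated assumptions: Unicode NFC
stands in for the draft's normalization procedures (the draft's actual
procedure is not described in this message), and the accent-stripping fold
is a crude, locale-blind stand-in for the tailorable equivalences of
ISO/IEC 14651. The names nfc_match, fold, and search_match are
hypothetical.

```python
import unicodedata


def nfc_match(a: str, b: str) -> bool:
    """Binary match after NFC (assumed stand-in for the draft's step).

    Only unifies strings that look identical but are encoded differently,
    such as precomposed e-acute versus e followed by a combining acute.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def fold(text: str) -> str:
    """Crude search key: drop combining marks, then ignore case.

    Locale-blind on purpose. It treats o-umlaut as o and n-tilde as n,
    which would be wrong for Swedish and Spanish respectively; a real
    search would tailor this per locale.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def search_match(a: str, b: str) -> bool:
    """Loose, search-style comparison built on fold()."""
    return fold(a) == fold(b)


if __name__ == "__main__":
    precomposed, decomposed = "\u00e9", "e\u0301"  # two encodings of e-acute
    print(precomposed == decomposed)               # raw binary comparison
    print(nfc_match(precomposed, decomposed))      # after normalization
    print(nfc_match("\u00f6", "o"))                # o-umlaut vs o, matching
    print(search_match("\u00f6", "o"))             # o-umlaut vs o, searching
```

Normalization of this kind never makes o-umlaut match o; only the looser
search comparison does, and that is where locale expectations would apply.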

Not the main one, but if it is a function at all, it becomes problematic. I do
not want to be a troublemaker, just to signal problems from a user's point of view.

Alain LaBonté