Re: "Difficult Characters" draft

"Martin J. Duerst" <> Tue, 06 May 1997 10:23 UTC

Date: Tue, 06 May 1997 11:56:45 +0200
From: "Martin J. Duerst" <>
To: Alain LaBont/e'/ <>
cc: URI mailing list <>
Subject: Re: "Difficult Characters" draft
In-Reply-To: <>
Message-ID: <Pine.SUN.3.96.970506111326.245L-100000@enoshima>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Precedence: bulk

On Mon, 5 May 1997, Alain LaBont/e'/ wrote:

> [Martin] :
> >- We are dealing with identifiers, and assuming precise matching up
> >	to the precision a human reader familiar with the script
> >	is able to handle. In this respect, discussions about
> >	imprecise searches are irrelevant.
> [Alain] :
> Really? For historical reasons (fortunately or not), some systems
> transform accented letters into their non-accented forms, and this is also a
> requirement for searches in French, maybe in German too btw, so that might be
> quite relevant.

We are dealing with identifiers and exact matching. Maybe I have to make
this clearer in my draft. As an example, significant parts of a URL are
case-sensitive today. Either you get the case right, or the URL is not
found. That's what an identifier is for. Searching via web search
services, directories, and so on is not our concern.

> I may be wrong, but it might also be that bad habits formed
> expectations about imprecise searches. Do you mean that here we really mean
> precise searches in which even case shall be used as is?

Definitely. That's what happens today with URLs. The intent of
the document is not to define equivalences for search, but to
define normalization at the source so that we can use the binary
comparison of existing software.
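
To make "normalization at the source" concrete, here is a minimal
sketch. It rests on assumptions that are not in the draft: it uses
Unicode NFC as a stand-in for whatever normalization ends up being
defined, it assumes UTF-8 as the byte encoding, and the helper name
normalize_at_source is hypothetical.

```python
import unicodedata


def normalize_at_source(identifier: str) -> bytes:
    """Hypothetical helper: normalize once, when the identifier is created."""
    # Assumption: NFC and UTF-8 stand in for whatever the draft defines.
    return unicodedata.normalize("NFC", identifier).encode("utf-8")


precomposed = "caf\u00e9"   # e-acute as a single code point
decomposed = "cafe\u0301"   # plain 'e' followed by a combining acute accent

# Without normalization, the two spellings differ byte for byte.
raw_equal = precomposed.encode("utf-8") == decomposed.encode("utf-8")

# After normalization at the source, plain binary comparison is enough.
normalized_equal = (
    normalize_at_source(precomposed) == normalize_at_source(decomposed)
)

# Normalization does not fold case or drop accents: these stay distinct.
case_equal = normalize_at_source("Caf\u00e9") == normalize_at_source("caf\u00e9")
accent_equal = normalize_at_source("cafe") == normalize_at_source("caf\u00e9")
```

The point of the sketch is only that the comparison itself stays a
plain byte comparison. Differences in case or in the presence of an
accent are still differences, which is exactly the identifier
behaviour described above.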

> That would really
> be misleading for French-speaking users (I speak from experience, having done
> such tests by accident in an international audience).

ASCII web users have learned that they have to pay attention to case
in URLs. ASCII URL creators have learned that they, too, have to pay
attention to case in URLs, in order to make things easy for the users.
Beyond-ASCII users and URL creators will have to learn similar things
with respect to case and with respect to other distinctions, such as accents.

French URL users may have to learn that in uppercase URLs, they should
not drop the accents that they see. French URL creators may have to learn
that they had better not use uppercase accented characters in their
URLs, so as not to confuse their users. One of these things, or
both, may end up in the current draft. What would you suggest?

Regards,	Martin.