Re: "Difficult Characters" draft

Leslie Daigle <> Thu, 08 May 1997 16:04 UTC

Date: Thu, 08 May 1997 11:06:34 -0400
From: Leslie Daigle <>
To: Keld J|rn Simonsen <>
cc: Alain LaBont/e'/ <>, "Martin J. Duerst" <>, URI mailing list <>
Subject: Re: "Difficult Characters" draft

On Wed, 7 May 1997, Keld J|rn Simonsen wrote:
> > At 11:23 97-05-07 +0200, Martin J. Duerst wrote:
> > >	"Copy it exactly, with case and everything."
> > >is much more user friendly, because it is the only one that
> > >works consistently.
> > 
> > I can agree with that. I think everybody can agree with that.
> I also agree, exact match should work in all cases.

Yes -- the point I was trying to make earlier was that exact match is
about the _only_ thing that can be _mandated_, because there are no rules
for (potential) equivalence of characters that hold globally (across
countries sharing one language, or across languages).

Quite apart from the issue of how individual languages shape expectations
of equivalences between letters (you will find words starting with "W"
under the letter "V" in some Swedish dictionaries), there are matching
conventions that have grown up around specific letters to _accommodate_
the various realities that have been faced in transcribing words.  For
instance, failing to find something under "ström", a Swedish searcher
might expect to search for "strom" as well, or even "stroem" (not because
they are right, but because they are common transcriptions).

These things belong to the realm of applications and services, NOT to
equivalence in URLs.
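
As a rough sketch of what such an application-level convention might look
like, a search service could generate the common transcriptions itself and
try each in turn. The function name and the fallback table below are
illustrative assumptions, not part of any URL specification; a real service
would pick its own language-specific table.

```python
# Hypothetical fallback table covering only "ö" (U+00F6), for illustration.
TRANSCRIPTIONS = {"\u00f6": ["o", "oe"]}


def candidate_spellings(term):
    """Return the exact term first, then its common transcriptions."""
    candidates = [term]
    for letter, replacements in TRANSCRIPTIONS.items():
        if letter in term:
            for replacement in replacements:
                candidates.append(term.replace(letter, replacement))
    return candidates


# "str\u00f6m" is "ström"; the service tries the exact form before the others.
print(candidate_spellings("str\u00f6m"))
```

The exact form always comes first, so this kind of fallback sits on top of
exact match rather than replacing it.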

A URL ending in "ström" is not the same as one ending in "strom" or
"stroem".

If the URL spec says that these are not equivalent URLs, then it is perfectly
valid to have them refer to three different resources.  It might be "good
practice" to suggest people do otherwise, but there are so many such
possibilities that it is well outside the scope of what should be considered
equivalence rules for URLs.
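
To make the distinction concrete, here is a minimal sketch of exact-match
equivalence. It treats URLs as plain strings and sets aside how non-ASCII
characters would actually be encoded; the host (example.org) and the resource
names are invented for illustration, and only the three spellings come from
the discussion above.

```python
# Hypothetical URL-to-resource table; host and resource names are made up.
resources = {
    "http://example.org/str\u00f6m": "resource A",  # "ström"
    "http://example.org/strom": "resource B",
    "http://example.org/stroem": "resource C",
}


def same_url(a, b):
    """Exact match: character for character, case included."""
    return a == b


# No two of the three spellings are equivalent under exact match,
# so each key may legitimately name a different resource.
print(same_url("http://example.org/strom", "http://example.org/stroem"))
print(len(resources))
```

Any looser notion of "sameness" would have to be layered on by the service
answering the request, not assumed by the comparison itself.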



  "_Be_                                           Leslie Daigle
             where  you                           
                          _are_."                 Bunyip Information Systems
                                                  (514) 875-8611
                      -- ThinkingCat