Re: "Difficult Characters" draft

"Martin J. Duerst" <> Mon, 05 May 1997 19:34 UTC

Date: Mon, 05 May 1997 21:09:50 +0200
From: "Martin J. Duerst" <>
To: Alain LaBont/e'/ <>
cc: Leslie Daigle <>, URI mailing list <>
Subject: Re: "Difficult Characters" draft
In-Reply-To: <>
Message-ID: <Pine.SUN.3.96.970505205750.245G-100000@enoshima>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="ISO-8859-1"
X-MIME-Autoconverted: from QUOTED-PRINTABLE to 8bit by id PAA11853
Precedence: bulk
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by id PAA11857

On Mon, 21 Apr 1997, Alain LaBont/e'/ wrote:

> At 12:40 97-05-05 -0400, Leslie Daigle wrote:
> >
> >For example, "o" and "ö" are unrelated characters in Swedish, so it
> >would be erroneous to say that they are equivalent in an accent-insensitive
> >search.  Lexicographically, "ö" is the last character in the alphabet
> >in Swedish.
> >
> >So, "accent-insensitive" matching is pretty well language-dependent.
> [Alain] :
> Of course! Same for ñ which is simply an accented n in French cañon and a
> letter on its own in Spanish cañon... In other words, in Spanish, searching
> on "canon" shall never retrieve "cañon"; in French it could, for imprecise
> searches, as well as the word "canon"...

- What an imprecise search does or does not retrieve may depend
	on many things. It is quite possible that "canon" retrieves
	"cañon" in a Spanish spelling checker, since it is only a
	one-letter substitution.

- We are dealing with identifiers, and we assume precise matching, up
	to the precision that a human reader familiar with the script
	is able to handle. In this respect, discussions about
	imprecise searches are irrelevant.
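
To make the distinction between the two points concrete, here is a
rough Python sketch. It rests on an assumption of mine, not on anything
in the draft: that "the precision a human reader is able to handle" can
be approximated by Unicode canonical equivalence (comparison after NFC
normalization). The function names are hypothetical.

```python
import unicodedata

def identifiers_match(a: str, b: str) -> bool:
    """Precise matching: compare after NFC normalization.

    Assumption: two strings that a reader cannot tell apart differ at
    most in how an accented letter is encoded (precomposed vs. base
    letter plus combining mark).
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def fold_accents(s: str) -> str:
    """Imprecise, language-blind folding: drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

precomposed = "ca\u00f1on"   # ñ as a single code point
decomposed = "can\u0303on"   # n followed by COMBINING TILDE

print(identifiers_match(precomposed, decomposed))  # True
print(identifiers_match("canon", precomposed))     # False
print(fold_accents(precomposed) == "canon")        # True
```

The first comparison treats two encodings of the same visible string as
one identifier. The second keeps "canon" and "cañon" distinct, as a
reader would. The folding function is the kind of imprecise matching
that belongs in a search tool or spelling checker, not in identifier
comparison.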

Regards,	Martin.