Re: "Difficult Characters" draft

Alain LaBont/e'/ <> Mon, 05 May 1997 18:51 UTC

Message-Id: <>
X-Mailer: Windows Eudora Pro Version 3.0.1 beta 14 (16) [F]
Date: Mon, 21 Apr 1997 13:48:57 -0000
To: Leslie Daigle <>
From: Alain LaBont/e'/ <>
Subject: Re: "Difficult Characters" draft
Cc: "Martin J. Duerst" <>, Larry Masinter <>, URI mailing list <>
In-Reply-To: <Pine.SUN.3.95.970505123131.3239E-100000@beethoven.bunyip.com>
References: <>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Precedence: bulk
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by id OAA08087

At 12:40 97-05-05 -0400, Leslie Daigle wrote:
>On Mon, 21 Apr 1997, Alain LaBont/e'/ wrote:
>> At 17:58 97-05-02 +0200, Martin J. Duerst wrote:
>> [Larry] :
>> >> Using UCS in identifiers that are normally "case insensitive"
>> >> in ASCII, and the issues, e.g., similar upper-case forms,
>> >> the role of accents and equivalence.

[Alain] :
>> However, accents normally don't count much for alphabetic order; they are
>> considered only in cases of quasi-homography (cote, côte, coté, côté,
>> pèche, pêche, péché).
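
A minimal sketch of that ordering rule, for illustration only: base letters are compared first, and accents act purely as a tie-breaker between quasi-homographs. The function names are hypothetical. Two assumptions are baked in: accents are compared starting from the end of the word (which is what reproduces the order listed above), and accents are ranked by Unicode code point, which is a simplification rather than a real collation table.

```python
import unicodedata

def split_accents(word):
    """Split a word into its base letters and, per letter, its combining marks."""
    base, marks = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            if marks:                 # attach the mark to the preceding letter
                marks[-1] += ch
        else:
            base.append(ch.lower())
            marks.append("")          # empty string means "no accent"
    return "".join(base), tuple(marks)

def french_like_key(word):
    """Two-level sort key: base letters first, accents only as tie-breaker."""
    base, marks = split_accents(word)
    # Assumption: accent differences are examined from the last letter backwards.
    return (base, tuple(reversed(marks)))

words = ["côté", "pêche", "coté", "péché", "cote", "pèche", "côte"]
ordered = sorted(words, key=french_like_key)
print(ordered)
```

With this key, the unaccented spellings decide the order between "cote" and "peche", and the accents only separate words whose base letters are identical.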
[Leslie] :
>My apologies if this has already been addressed earlier in the thread, but
>this jumped out at me as being a potential point of confusion.
>Namely, while accents don't count for alphabetic order in French, there
>are other languages with characters which can wrongly be perceived as
>"accented characters" by people familiar with only a-z.
>For example, "o" and "ö" are unrelated characters in Swedish, so it
>would be erroneous to say that they are equivalent in an accent-insensitive
>search.  Lexicographically, "ö" is the last character in the alphabet
>in Swedish.
>So, "accent-insensitive" matching is pretty well language-dependent.

[Alain] :
Of course! The same goes for ñ, which is simply an accented n in French cañon
and a letter in its own right in Spanish cañon... In other words, in Spanish,
searching on "canon" should never retrieve "cañon"; in French it could, for
imprecise searches, along with the word "canon" itself...
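
To make the language dependence concrete, here is a rough sketch of accent-insensitive matching that takes the language as a parameter. The table of "letters in their own right" is a hypothetical, hand-written assumption covering only the examples from this thread (ñ in Spanish, å/ä/ö in Swedish); it is not taken from any standard or library.

```python
import unicodedata

# Assumption: per-language letters that must never be folded to a base letter.
DISTINCT_LETTERS = {
    "fr": set(),                              # French: accents are secondary
    "es": {"\u00f1"},                         # Spanish: ñ is its own letter
    "sv": {"\u00e5", "\u00e4", "\u00f6"},     # Swedish: å, ä, ö are own letters
}

def fold(text, lang):
    """Strip accents, except on letters the language treats as distinct."""
    keep = DISTINCT_LETTERS.get(lang, set())
    out = []
    for ch in unicodedata.normalize("NFC", text.lower()):
        if ch in keep:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
    return "".join(out)

def matches(query, candidate, lang):
    """Accent-insensitive comparison under the given language's rules."""
    return fold(query, lang) == fold(candidate, lang)

print(matches("canon", "ca\u00f1on", "fr"))   # French: imprecise search may match
print(matches("canon", "ca\u00f1on", "es"))   # Spanish: different letters
print(matches("o", "\u00f6", "sv"))           # Swedish: unrelated characters
```

The same pair of strings gives a different answer depending on the language argument, so the matcher cannot be chosen without knowing the language of the text.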

Tackså mycket!

Alain LaBonté