Re: "Difficult Characters" draft

Alain LaBont/e'/ <alb@sct.gouv.qc.ca> Tue, 06 May 1997 20:24 UTC

Received: from cnri by ietf.org id aa03888; 6 May 97 16:24 EDT
Received: from services.Bunyip.Com by CNRI.Reston.VA.US id aa18323; 6 May 97 16:24 EDT
Received: (from daemon@localhost) by services.bunyip.com (8.8.5/8.8.5) id PAA19941 for uri-out; Tue, 6 May 1997 15:59:46 -0400 (EDT)
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1]) by services.bunyip.com (8.8.5/8.8.5) with ESMTP id PAA19936 for <uri@services.bunyip.com>; Tue, 6 May 1997 15:59:44 -0400 (EDT)
Received: from socrate.riq.qc.ca (socrate.riq.qc.ca [199.84.128.1]) by mocha.bunyip.com (8.8.5/8.8.5) with SMTP id PAA28056 for <uri@bunyip.com>; Tue, 6 May 1997 15:59:39 -0400 (EDT)
Received: from 506.riq.qc.ca (riq-44-239.riq.qc.ca) by socrate.riq.qc.ca (5.x/SMI-SVR4) id AA09110; Tue, 6 May 1997 16:01:51 -0400
Message-Id: <3.0.1.16.19970421154814.093f3814@riq.qc.ca>
X-Sender: alb@riq.qc.ca
X-Mailer: Windows Eudora Pro Version 3.0.1 beta 14 (16) [F]
Date: Mon, 21 Apr 1997 15:48:14 -0000
To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>, Larry Masinter <masinter@parc.xerox.com>
From: Alain LaBont/e'/ <alb@sct.gouv.qc.ca>
Subject: Re: "Difficult Characters" draft
Cc: URI mailing list <uri@bunyip.com>
In-Reply-To: <Pine.SUN.3.96.970506203138.245T-100000@enoshima>
References: <336F5302.64F7@parc.xerox.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Sender: owner-uri@bunyip.com
Precedence: bulk
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by services.bunyip.com id PAA19941

At 20:50 97-05-06 +0200, Martin J. Duerst wrote:
>Well, there have been some interesting examples. But
>frankly speaking, I don't think that the average
>French speaker should have more difficulties to
>transcribe the correct accents than the average
>US user should have difficulties to get case correct.
>Quite to the contrary, case in URLs is often rather
>random, whereas accents in a known word can easily
>be reconstructed. 

Don't forget that French users in France don't have all uppercase letters on
their PC keyboards... even though there are Canadian (CAN/CSA Z243.200) and
ISO (ISO/IEC 9995-3) standards for providing them. So capitalization remains
a problem in practice for French people where uppercase letters are
concerned. Some French keyboards have them, not all.

>Equivalence matching could save a lot of US typos also.
>But nobody ever cared to do equivalence matching for them.
>It's assumed that the user types things in correctly.

Not so; I demonstrated this in my earlier note about my insurance agent's web
page. People care (of course), servers care, or browsers care, and whoever
or whichever does the correction, the net result is that equivalences are
processed today and end users have gotten used to this... at least some of
them... and likely a great many.

>> Fortunately, it's possible that equivalence-based matching
>> could be deployed for URLs;
>
>That's interesting. But it would be a lot more work than the
>conversions from and to UTF-8 that I have suggested for backwards
>compatibility and that have raised great concerns from Roy.

There exist methods for this in actual practice, and it is about to be
standardized in ISO/IEC 14651, which defines an API for character string
comparisons at different levels of precision.
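
To illustrate what comparison "at different levels of precision" can mean,
here is a minimal Python sketch. It is not the ISO/IEC 14651 API: the
function names and the level numbering below are hypothetical, and it only
approximates the idea using the standard unicodedata module (strip accents
and case at the coarsest level, case only at the middle level, nothing at
the finest level).

```python
import unicodedata


def base_letters(text: str) -> str:
    """Coarsest key: drop combining marks (accents) and fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def accents_kept(text: str) -> str:
    """Middle key: fold case but keep accents."""
    return unicodedata.normalize("NFC", text).casefold()


def matches(a: str, b: str, level: int) -> bool:
    """Compare two strings at a chosen precision (hypothetical numbering).

    level 1: ignore accents and case
    level 2: ignore case, but accents must agree
    level 3: exact match after Unicode normalization
    """
    if level == 1:
        return base_letters(a) == base_letters(b)
    if level == 2:
        return accents_kept(a) == accents_kept(b)
    if level == 3:
        return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
    raise ValueError("level must be 1, 2, or 3")
```

With this sketch, matches("Québec", "quebec", 1) succeeds while the same
pair fails at level 2, which is the "independent of case but dependent on
accents" behaviour discussed in this thread.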

>We don't want to ask the French user more than the US user,
>when compared to his/her language abilities. And up to now,
>we don't.

You do, if equivalences are not processed adequately, given that
equivalence processing exists today. You ask for either an exact match or a
match independent of case but dependent on accents... that's not good
enough... See ISO/IEC CD 14651 or CAN/CSA Z243.4.1 (published in 1992,
revised this year -- characters have been added but the logic is the same)
and CAN/CSA Z243.230 (this one to be published this year)...

Alain LaBonté
Québec