Re: "Difficult Characters" draft

Larry Masinter <> Tue, 06 May 1997 16:22 UTC

Message-ID: <>
Date: Tue, 06 May 1997 08:49:22 -0700
From: Larry Masinter <>
Organization: Xerox PARC
X-Mailer: Mozilla 3.01Gold (Win95; I)
MIME-Version: 1.0
To: "Martin J. Duerst" <>
CC: Alain LaBont/e'/ <>, URI mailing list <>
Subject: Re: "Difficult Characters" draft
References: <Pine.SUN.3.96.970506111326.245L-100000@enoshima>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Precedence: bulk


Perhaps your draft could mention that using characters
outside of ASCII in identifiers is actually problematic.
Some applications do symbol lookup by exact match on
canonical identifiers, which works when symbols are
restricted to ASCII; users of languages other than English
will be ill-served by such a design. In some applications,
a careful language-sensitive equivalence lookup (instead of
exact match) would let the software actually accommodate
the needs and practices of such users.
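
As a rough illustration of the difference between exact-match and
equivalence lookup, here is a minimal Python sketch. The folding rule
(Unicode NFKC normalization followed by case folding) is only an
assumption chosen for the example; it is not language-sensitive, and
a real application would substitute whatever equivalence its users
expect. The names equivalence_key and EquivalenceTable and the sample
file name are hypothetical.

```python
import unicodedata


def equivalence_key(name: str) -> str:
    """Fold a name to a comparison key (assumed rule: NFKC, then casefold)."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    # Case folding can leave text unnormalized in rare cases, so normalize again.
    return unicodedata.normalize("NFKC", folded)


class EquivalenceTable:
    """Maps names to values, matching by equivalence key instead of exact string."""

    def __init__(self):
        self._by_key = {}

    def add(self, canonical_name, value):
        # Store the canonical spelling alongside the value so a lookup can report it.
        self._by_key[equivalence_key(canonical_name)] = (canonical_name, value)

    def lookup(self, requested_name):
        # Returns (canonical_name, value), or None when nothing is equivalent.
        return self._by_key.get(equivalence_key(requested_name))


# Canonical spelling uses precomposed e-acute (U+00E9).
canonical = "R\u00e9sum\u00e9.html"
# The request uses upper case and E + combining acute accent (U+0301).
requested = "RE\u0301SUME\u0301.HTML"

exact_table = {canonical: "entry-1"}
table = EquivalenceTable()
table.add(canonical, "entry-1")

print(exact_table.get(requested))  # exact match misses the entry
print(table.lookup(requested))     # equivalence match finds it
```

The lookup hands back the stored spelling together with the value, so
the side that owns the table decides what the "canonical" name is and
the requester never has to compute it.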

The mail over the past week has been full of good examples
of places where canonicalization is either ill-specified
or context-sensitive, and where "equivalence matching"
would be far more practical.

Fortunately, it's possible that equivalence-based matching
could be deployed for URLs; other kinds of exact-match
names will require a separate analysis. Both DNS and HTTP servers
(if not FTP servers) could be coaxed into doing equivalence matching
instead of exact matching for reference lookup; if they also respond
with the server's view of the "canonical" name, then we
won't be asking clients to do what it seems like is nearly