Re: 8 bit characters in DNS names (and URNs?)

Larry Masinter <masinter@parc.xerox.com> Thu, 07 March 1996 05:40 UTC

To: keld@dkuug.dk
Cc: martin@terena.nl, wg-i18n@terena.nl, uri@bunyip.com
In-Reply-To: Keld Jørn Simonsen's message of Tue, 5 Mar 1996 08:32:40 -0800 <199603051632.RAA27148@dkuug.dk>
Subject: Re: 8 bit characters in DNS names (and URNs?)
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Larry Masinter <masinter@parc.xerox.com>
Fake-Sender: masinter@parc.xerox.com
Message-Id: <96Mar6.211258pst.168963@nebula.parc.xerox.com>
Date: Wed, 6 Mar 1996 21:12:56 PST
X-Orig-Sender: owner-uri@bunyip.com
Precedence: bulk

In ASCII you can define a 'case-independent match' as 'translate both
strings to upper case, then test for string equality'. This does not
work for other character repertoires. JIS, for example, might have
separate codes for the single-width and double-width forms of a
character, yet want to treat them as equivalent for matching.
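
To make the width problem concrete, here is a minimal sketch in present-day
Python. It assumes Unicode strings and uses NFKC compatibility normalization
as a stand-in for 'treat single-width and double-width forms as equivalent';
that choice, and the function names ascii_case_match and
width_insensitive_match, are illustrative assumptions and not something
proposed in this thread.

```python
import unicodedata


def ascii_case_match(a: str, b: str) -> bool:
    """The ASCII recipe: map to upper case, then compare for equality."""
    return a.upper() == b.upper()


def width_insensitive_match(a: str, b: str) -> bool:
    """Assumed alternative: fold width variants with NFKC, then fold case."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFKC", s).casefold()
    return fold(a) == fold(b)


# Full-width Latin letters A, B, C (U+FF21..U+FF23) versus plain ASCII.
fullwidth_abc = "\uff21\uff22\uff23"
# Half-width katakana KA (U+FF76) versus the ordinary katakana KA (U+30AB).
halfwidth_ka, katakana_ka = "\uff76", "\u30ab"

print(ascii_case_match("Example", "EXAMPLE"))
print(ascii_case_match(fullwidth_abc, "abc"))
print(width_insensitive_match(fullwidth_abc, "abc"))
print(width_insensitive_match(halfwidth_ka, katakana_ka))
```

The upper-casing recipe still works for pure ASCII input, but it leaves the
width variants distinct, so only the second function matches them.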

Uppercase mapping is culturally sensitive, but can we not define a
culturally independent 'character matching' algorithm that is good
enough for directory services? Perhaps that means treating the accented
and unaccented versions of French initial capitals as equivalent, even
though this equivalence is not determined by 'canonicalization'?
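
One possible reading of such a 'good enough' matcher, sketched in present-day
Python under the assumption that the names are Unicode strings: decompose each
string, drop the combining marks, and fold case before comparing. The helper
names loose_key and loose_match are hypothetical, and this is only one of many
rules a directory service could adopt.

```python
import unicodedata


def loose_key(s: str) -> str:
    """Build a comparison key: decompose, drop combining marks, fold case."""
    decomposed = unicodedata.normalize("NFD", s)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def loose_match(a: str, b: str) -> bool:
    """Match two names under the deliberately loose key above."""
    return loose_key(a) == loose_key(b)


accented = "\u00c9cole"  # 'Ecole' written with E-acute as the initial capital

# Canonical normalization alone keeps the accented and plain forms distinct.
print(unicodedata.normalize("NFC", accented) == "Ecole")
# The loose key treats them as equivalent, regardless of case.
print(loose_match(accented, "ECOLE"))
# Trade-off: distinct French words also collapse to the same key.
print(loose_match("cote", "c\u00f4t\u00e9"))
```

The last comparison shows the cost of the approach: the matcher is culturally
independent only because it ignores distinctions that some languages treat as
significant.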