Re: 8 bit characters in DNS names (and URNs?)

Peter Paul Sint <sint@oeaw.ac.at> Sat, 09 March 1996 02:04 UTC

X-Sender: sint@lezvax.arz.oeaw.ac.at
Message-Id: <v02130505ad668ac95a8d@[193.170.88.66]>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Date: Sat, 9 Mar 1996 02:34:10 +0100
To: Larry Masinter <masinter@parc.xerox.com>
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Peter Paul Sint <sint@oeaw.ac.at>
Subject: Re: 8 bit characters in DNS names (and URNs?)
Cc: keld@dkuug.dk, martin@terena.nl, wg-i18n@terena.nl, uri@bunyip.com
X-Charset: ASCII
X-Char-Esc: 29
X-Orig-Sender: owner-uri@bunyip.com
Precedence: bulk

At 9:44 08.03.1996, Masataka Ohta wrote:
>> JIS might
>> have separate codes for single and double-wide codes yet want to treat
>> them equivalent for matching.
>JIS does not.
>> While uppercase mapping is culturally sensitive, can we not make a
>> culturally independent 'character matching' algorithm that is good
>> enough for directory services.
>
>Theoretically, it is a union of all the matching rules of all
>the culture. But, in practice, it is hard especially because
>the expected degree of matching differs service by service.
>                                               Masataka Ohta

German has a lower-case letter, ß, that looks like a Greek beta
(tell your software to read this message as Latin-1, quoted-printable;
Swiss German doesn't use the letter).
It is equivalent to ss; its capital form is SS (*two* letters).
Also, the canonical conversion of the
umlauts (vowel + two dots above) is:
ä   is ae
ö   is oe
ü   is ue
capitalised: AE, OE, UE
(historically, the two dots were an e written above the vowel).
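
As a rough sketch of the conversions listed above, in Python. The names
TRANSCRIPTION and transcribe are made up for illustration, and the table
covers only the letters mentioned in this message:

```python
# Hypothetical table: only the letters discussed in this message.
TRANSCRIPTION = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
}

def transcribe(text):
    """Replace each umlaut or ß by its two-letter spelling."""
    return "".join(TRANSCRIPTION.get(ch, ch) for ch in text)
```

With this table, transcribe("Müller") returns "Mueller" and
transcribe("MÜLLER") returns "MUELLER". A capital umlaut in front of
lower-case letters comes out as "AErger" for "Ärger"; writing "Aerger"
there would need an extra rule that this sketch does not attempt.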

You would never write an umlaut A (Ä) as a plain A (only aliens do so, and software).

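The back transformation is not unique! For example, ss may stand for ß
or for a genuine ss: Maße (measures) and Masse (mass) both become MASSE
in capitals.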
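
To make the ambiguity concrete, here is a small Python sketch that lists
the lower-case spellings a transcribed word could have come from. The
names BACK and candidates are hypothetical, and the function only looks
at non-overlapping ae/oe/ue/ss pairs scanned left to right, so it is an
illustration rather than a complete enumeration:

```python
import re
from itertools import product

# Hypothetical reverse table: each pair may be itself or one letter.
BACK = {"ae": "ä", "oe": "ö", "ue": "ü", "ss": "ß"}

def candidates(text):
    """Return possible original spellings of a lower-case transcription."""
    # With a capturing group, re.split keeps the matched pairs
    # at the odd positions of the result list.
    parts = re.split("(ae|oe|ue|ss)", text)
    options = [
        (part, BACK[part]) if index % 2 else (part,)
        for index, part in enumerate(parts)
    ]
    return {"".join(choice) for choice in product(*options)}
```

candidates("masse") yields both "masse" and "maße", and a word with two
such pairs already has four candidates. Nothing in the spelling itself
says which one was meant.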

German matching software handles this (as far as possible).
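
One way such matching might be approximated in Python is to compare
folded keys instead of the raw strings. This is only a sketch under my
own assumptions, not a description of any particular product; UMLAUTS and
match_key are invented names. It relies on str.casefold(), which
lower-cases and also turns ß into ss:

```python
# Hypothetical umlaut table applied after case folding.
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})

def match_key(name):
    """Fold case, ß and umlauts so that variant spellings compare equal."""
    return name.casefold().translate(UMLAUTS)

def same_name(left, right):
    """True if both spellings fold to the same key."""
    return match_key(left) == match_key(right)
```

Here same_name("MÜLLER", "Mueller") and same_name("Straße", "STRASSE")
are both true. The price is that distinct words such as "Maße" and
"Masse" also match, which is the "as far as possible" part.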





Peter Paul Sint    (sint@oeaw.ac.at, http://www.soe.oeaw.ac.at/~sint/)
Research Unit for Socio-Economics, Austrian Academy of Sciences
Kegelgasse 27, A-1030 Wien (=Vienna), Austria.
Phone:(+431) 712 21 40 - 36   Fax: (+431) 712 21 40 - 34