Re: http charset labelling

Keld Jørn Simonsen <keld@dkuug.dk> Tue, 13 February 1996 00:13 UTC

Message-Id: <199602122335.AAA22028@dkuug.dk>
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Keld Jørn Simonsen <keld@dkuug.dk>
Date: Tue, 13 Feb 1996 00:35:21 +0100
In-Reply-To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp> "Re: http charset labelling" (Feb 7, 5:16)
X-Charset: ISO-8859-1
X-Char-Esc: 29
Mime-Version: 1.0
Content-Type: Text/Plain; Charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Mnemonic-Intro: 29
X-Mailer: Mail User's Shell (7.2.2 4/12/91)
To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>, Gavin Nicol <gtn@ebt.com>
Subject: Re: http charset labelling
Cc: masinter@parc.xerox.com, uri@bunyip.com
X-Orig-Sender: owner-uri@bunyip.com
Precedence: bulk

Masataka Ohta writes:

> > The results might
> > vary widely depending on whether the data was transmitted as SJIS,
> > EUC or UTF-8, if there is no encoding information.
> 
> Because of duplicated shape of 'A' for Latin and Greek capital
> letter 'A' and alpha, and because of duplicated encoding of Big5,
> encoding information, in general, is no fix for unique conversion
> from shape on a paper to internal code.
> 
> Don't try to do something proven to be impossible.

Well, Ohta, there are a number of ways to do it, for example
treating Greek capital letter alpha, Latin capital letter A, and
Cyrillic capital letter A as equivalent for matching; similar
equivalence specifications may be available for other characters.
Narrow and full-width letters may also be treated as equivalent.
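
To make the matching idea concrete, here is a minimal sketch in Python.
It is only an illustration: the LOOKALIKES table and the names fold()
and equivalent() are hypothetical, the table covers just the "A" case
discussed here, and a real equivalence spec would need many more entries.
Width folding is delegated to Unicode NFKC normalization, which is one
possible choice, not the only one.

```python
import unicodedata

# Hypothetical, deliberately tiny equivalence table: look-alike capitals
# from other scripts are mapped to the Latin capital of the same shape.
LOOKALIKES = {
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
}

def fold(text):
    """Reduce text to a canonical form used only for comparison."""
    # NFKC maps full-width forms (e.g. U+FF21) to their narrow counterparts.
    narrow = unicodedata.normalize("NFKC", text)
    # Replace each look-alike with its Latin representative.
    return "".join(LOOKALIKES.get(ch, ch) for ch in narrow)

def equivalent(a, b):
    """True if a and b match once look-alikes and widths are folded."""
    return fold(a) == fold(b)
```

With this, equivalent("\u0391BC", "ABC") holds even though the first
string starts with a Greek alpha, and a full-width "A" matches a
Cyrillic one. The folded form is meant for matching only, not for
storage or display.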

Anyway, it should be clear from the context which "A" is
meant: if it appears together with Greek characters it is most
likely an alpha, if with Latin characters it is most likely
a Latin letter, and so on. It is up to the maker of the URL to ensure that
the intended audience will get the message, and some careful choice
can be made there.
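
The context rule can be sketched the same way. This is a rough
heuristic under stated assumptions, not a definitive method: the names
script_of() and guess_script() are hypothetical, only three scripts are
considered, and the script of a character is read off the first word of
its Unicode character name, which is a shortcut rather than a proper
script property lookup.

```python
import unicodedata
from collections import Counter

# Scripts this sketch knows about; anything else is ignored.
SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")

def script_of(ch):
    """Return the script named at the start of ch's Unicode name, or None."""
    name = unicodedata.name(ch, "")  # "" for characters without a name
    first = name.split(" ", 1)[0] if name else ""
    return first if first in SCRIPTS else None

def guess_script(neighbours):
    """Guess the script of an ambiguous letter from the letters around it."""
    counts = Counter(s for s in map(script_of, neighbours) if s)
    # The most frequent script among the neighbours wins; None if no evidence.
    return counts.most_common(1)[0][0] if counts else None
```

An "A" seen next to Greek letters would then be taken as an alpha, and
one seen next to Latin letters as a Latin A. When the neighbours give no
evidence (digits only, say) the function returns None and the choice is
left to the reader.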

keld