Re: http charset labelling

Masataka Ohta <mohta@necom830.cc.titech.ac.jp> Mon, 19 February 1996 03:02 UTC

Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <199602190228.LAA05387@necom830.cc.titech.ac.jp>
Subject: Re: http charset labelling
To: Gavin Nicol <gtn@ebt.com>
Date: Mon, 19 Feb 96 11:28:05 JST
Cc: keld@dkuug.dk, dupuy@cs.columbia.edu, uri@bunyip.com
In-Reply-To: <199602190101.UAA09669@ebt-inc.ebt.com>; from "Gavin Nicol" at Feb 18, 96 8:01 pm
X-Mailer: ELM [version 2.3 PL11]
X-Orig-Sender: owner-uri@bunyip.com
Precedence: bulk

Gavin;

> The actual indication of the encoding should be hidden from the user,
> but it is still important for it to be there,

Could you please remember that, because of duplicated encodings,
the character code itself is necessary?

I have already stated this more than three times.

> because even for ASCII
> names, maybe the user entered it in zenkaku.

Another good example of duplicated encoding. But you misunderstand how
Japanese encoding works.

With RFC 1468 ISO-2022-JP encoding, character 'A' may be represented
with ASCII, JIS X 0201 or JIS X 0208.
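
A minimal sketch of the three representations, assuming Python's built-in
`iso2022_jp` codec. The escape sequences and code points follow RFC 1468 and
JIS X 0208; the name `forms` is purely illustrative. Note that the Unicode
code point reported for the JIS X 0208 form is the codec's mapping choice,
not something this message endorses.

```python
ESC = b"\x1b"

# Three ISO-2022-JP byte sequences that all denote the letter A.
forms = {
    "ASCII": ESC + b"(B" + b"\x41",                              # ESC ( B
    "JIS X 0201-Roman": ESC + b"(J" + b"\x41",                   # ESC ( J
    "JIS X 0208-1983": ESC + b"$B" + b"\x23\x41" + ESC + b"(B",  # ESC $ B, row 3
}

# Decode each form and show which code point the codec assigns to it.
decoded = {name: raw.decode("iso2022_jp") for name, raw in forms.items()}
for name, text in decoded.items():
    print(f"{name:18} {forms[name]!r:24} -> U+{ord(text):04X}")
```

The first two forms decode to the same code point, while the codec keeps the
JIS X 0208 form distinct, so byte-level or code-point-level comparison alone
does not identify the letter.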

But there is no such thing as "zenkaku" in the encoding. It is a display
property and has nothing to do with encoding (though brain-dead, broken
Unicode OPTIONALLY allows some display properties to be encoded, which
properly belong in HTML).

The character 'A', Latin capital letter 'A', in all of ASCII, JIS X 0201
and JIS X 0208, without any ambiguity, has the same name: "LATIN
CAPITAL LETTER A".

That is, as a URL is ASCII only, even if a JIS X 0201 or JIS X 0208 'A'
is entered through ISO-2022-JP encoding, the ASCII code for "LATIN
CAPITAL LETTER A" must be sent.
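
One way this folding could be sketched, assuming Python and using Unicode
NFKC normalization as a stand-in for "map every form of the letter to its
ASCII code". NFKC is a substitute technique chosen for illustration, and
`to_url_ascii` is a hypothetical helper, not part of any URL specification.

```python
import unicodedata


def to_url_ascii(text: str) -> str:
    """Fold compatibility forms of a character to ASCII, or fail loudly."""
    folded = unicodedata.normalize("NFKC", text)
    folded.encode("ascii")  # raises UnicodeEncodeError if anything non-ASCII remains
    return folded


# 'A' entered via the JIS X 0208 part of ISO-2022-JP (ESC $ B, bytes 0x23 0x41).
entered = b"\x1b$B#A\x1b(B".decode("iso2022_jp")
sent = to_url_ascii(entered)
print(repr(entered), "->", repr(sent))
```

Characters with no ASCII equivalent make the helper raise rather than pass
through, which matches the premise that the URL itself carries ASCII only.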

Finally, with ISO-2022-JP encoding, you can insert many escape sequences,
necessary and redundant alike. The sequences are totally invisible.
So, how can you figure out what the correct number of escape sequences
is?
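
A small illustration of the point, again assuming Python's built-in
`iso2022_jp` codec; the variable names are illustrative only. Each byte
string below differs only in how many escape sequences it carries, yet all
of them decode to the same visible text.

```python
ESC_ASCII = b"\x1b(B"  # switch to ASCII
ESC_ROMAN = b"\x1b(J"  # switch to JIS X 0201-Roman

# Byte strings that differ only in invisible escape sequences.
variants = [
    b"A",
    ESC_ASCII + b"A",
    ESC_ASCII * 3 + b"A" + ESC_ASCII,
    ESC_ROMAN + ESC_ASCII + b"A",
]

texts = [v.decode("iso2022_jp") for v in variants]
distinct_bytes = len(set(variants))
distinct_texts = len(set(texts))
print(distinct_bytes, "distinct byte strings,", distinct_texts, "distinct text")
```

Nothing in the decoded text records which variant was typed, so the original
byte sequence cannot be reconstructed from what the user sees.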

						Masataka Ohta