Re: [hybi] US-ASCII vs. ASCII in Web Socket Protocol

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Tue, 02 February 2010 10:19 UTC

Message-ID: <4B67FBF7.3030000@it.aoyama.ac.jp>
Date: Tue, 02 Feb 2010 19:18:31 +0900
From: =?ISO-8859-1?Q?=22Martin_J=2E_D=FCrst=22?= <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.1) Gecko/20090902 Eudora/3.0b3
MIME-Version: 1.0
To: Ian Hickson <ian@hixie.ch>
References: <9124e09b0911052218y5106a2d4qcda01ff67577679b@mail.gmail.com> <Pine.LNX.4.62.0912032337580.15540@hixie.dreamhostps.com> <4B1905FC.1000205@verizon.net> <Pine.LNX.4.64.1001300901270.22027@ps20323.dreamhostps.com> <4B6466EB.2090909@gmx.de> <4B656465.1080005@airemix.jp> <Pine.LNX.4.64.1002020027000.3846@ps20323.dreamhostps.com>
In-Reply-To: <Pine.LNX.4.64.1002020027000.3846@ps20323.dreamhostps.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Cc: "hybi@ietf.org" <hybi@ietf.org>, WeBMartians <webmartians@verizon.net>
Subject: Re: [hybi] US-ASCII vs. ASCII in Web Socket Protocol

On 2010/02/02 9:30, Ian Hickson wrote:
>
> (-cc whatwg to reduce cross-posting)
>
> On Sat, 30 Jan 2010, Julian Reschke wrote:
>> Ian Hickson wrote:
>>> On Fri, 4 Dec 2009, WeBMartians wrote:
>>>> Hmmm... Maybe it would be better to say ISO-646US rather than ASCII. There
>>>> is a lot of impreciseness about the very low value characters (less than
>>>> 0x20 space) in the ASCII "specifications." The same can be said about the
>>>> higher end.

The higher end (0x80-0xFF) is undefined in US-ASCII; anything that defines it is not US-ASCII. 
The lower end is well defined for some cases (HT, CR, LF, FF,...) and 
otherwise unused (or only used based on a private agreement, see e.g. 
section 16.1 of Unicode 5.0) and therefore irrelevant. Replacing 
US-ASCII with something cryptic like "ISO-646US" or "ANSI_X3.4-1968" or 
whatever doesn't change that at all.
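
To make the range argument concrete, here is a minimal Python sketch. The 
helper names is_us_ascii and describe_octet are my own, for illustration 
only; they do not come from any spec or draft. The check only looks at the 
octet value, so it gives the same answer whatever label is used for the 
character set.

```python
def is_us_ascii(data: bytes) -> bool:
    """Return True if every octet is in the 7-bit range 0x00-0x7F."""
    return all(octet < 0x80 for octet in data)


def describe_octet(octet: int) -> str:
    """Roughly classify one octet value (hypothetical helper)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("not an octet")
    if octet >= 0x80:
        # The higher end: not defined by US-ASCII at all.
        return "outside US-ASCII"
    if octet < 0x20 or octet == 0x7F:
        # The lower end (plus DEL): control codes such as HT, LF, FF, CR.
        return "control"
    return "printable"


if __name__ == "__main__":
    print(is_us_ascii(b"Upgrade: WebSocket\r\n"))
    print(is_us_ascii("D\u00fcrst".encode("iso-8859-1")))
    print(describe_octet(0x09), describe_octet(0x41), describe_octet(0xFC))
```

Any octet from 0x80 up makes the check fail, which is all a protocol needs 
to say when it requires text "encoded as US-ASCII".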

>>> Where the interpretation was normative, I've used the term "ANSI_X3.4-1968
>>> (US-ASCII)" and referenced RFC1345.
>>
>> I think you just lost both readability and precision.
>>
>> Please keep saying "ASCII" or "US-ASCII", and then have a reference to the
>> ANSI or ISO spec that actually defines ASCII, such as
>>
>>     [ANSI.X3-4.1986]  American National Standards Institute, "Coded
>>                       Character Set - 7-bit American Standard Code for
>>                       Information Interchange", ANSI X3.4, 1986.
>>
>> (taken from the relatively recent RFC 5322).
>>
>> RFC 1345 is a non-maintained, historic informational RFC that's not
>> really a good definition for ASCII. If you disagree, please name a
>> single RFC that has been published in the last 20 years that uses RFC
>> 1345 to reference ASCII (I just searched, and couldn't find any).
>
> I used "ANSI_X3.4-1968" because that's the canonical name for US-ASCII,

If there's something that says "preferred MIME name" in the IANA 
registry, then that's often a sign that the value in the Name: field is 
almost completely irrelevant. In the specific case at hand, that's 
clearly the case. Actually, for that specific case, even the explanatory 
text at the top of the registry page clearly says
"The use of the name US-ASCII is also encouraged."

> and I used RFC1345 because that's the canonical reference. If you disagree
> with these choices, please update the IANA registry.

It's not the canonical reference, it's a historical reference. RFC 1345 
is the document that was used to register US-ASCII in the IANA registry, 
not the document that defines US-ASCII, even from an IETF or IANA 
viewpoint. If you have any doubts about that, please feel free to 
contact the relevant mailing list (see 
http://www.iana.org/assignments/charset-info), where I can tell you that 
with my charset reviewer hat on.
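
As a small illustration that these labels all name the same thing in 
practice, the sketch below asks Python's standard codecs module which 
codec each label resolves to. This shows Python's own alias table, not 
the IANA registry itself, and the function name python_codec_name is 
made up for this example.

```python
import codecs


def python_codec_name(label: str) -> str:
    """Return the name of the codec Python selects for a charset label."""
    return codecs.lookup(label).name


if __name__ == "__main__":
    for label in ("US-ASCII", "ANSI_X3.4-1968", "ISO646-US"):
        # Each label is only an alias; the decoding behaviour is identical.
        print(label, "->", python_codec_name(label))
```

Whichever label is written in a spec, an implementation ends up with the 
same 7-bit codec, so the choice only affects how readable the text is.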

> On Sun, 31 Jan 2010, NARUSE, Yui wrote:

>> In draft-hixie-thewebsocketprotocol-54, every use of the term "US-ASCII"
>> has the form "encoded as US-ASCII". That is use as an encoding name, so
>> the preferred MIME name, "US-ASCII", is correct.
>
> The term "US-ASCII" caused confusion, unfortunately,

Can you explain? I have quite a bit of experience with character 
encoding, but I don't remember ever having heard the claim that 
"US-ASCII" causes confusion. In an IETF and Internet context, it is very 
well defined and uniformly used.

> which is why I
> changed to the less ambiguous ANSI_X3.4-1968.

Not less ambiguous, but totally obscure for everybody.

Regards,   Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp