Re: Character encodings in headers [i74][was: Straw-man charter for http-bis]

Mark Nottingham <mnot@mnot.net> Mon, 20 August 2007 08:55 UTC

In-Reply-To: <157F4F253535B9C73F8EDC75@p3.JCK.COM>
References: <BA772834-227A-4C1B-9534-070C50DF05B3@mnot.net> <392C98BA-E7B8-44ED-964B-82FC48162924@mnot.net> <p06240843c2833f4d7f2f@[10.20.30.108]> <465D9142.9050506@gmx.de> <6.0.0.20.2.20070610165356.0a69cec0@localhost> <088FB13E-F12F-4BE7-94FB-78B21C51512E@mnot.net> <157F4F253535B9C73F8EDC75@p3.JCK.COM>
Mime-Version: 1.0 (Apple Message framework v752.3)
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Message-Id: <6B8E3D7A-71B8-4B8D-9625-2AB3C74A9072@mnot.net>
Content-Transfer-Encoding: 7bit
From: Mark Nottingham <mnot@mnot.net>
Subject: Re: Character encodings in headers [i74][was: Straw-man charter for http-bis]
Date: Mon, 20 Aug 2007 18:55:10 +1000
To: John C Klensin <john-ietf@jck.com>
Cc: Richard Ishida <ishida@w3.org>, Apps Discuss <discuss@apps.ietf.org>, Felix Sasaki <fsasaki@w3.org>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Paul Hoffman <phoffman@imc.org>

The (potential) problem is that an intermediary, for example, needs
to be able to handle headers that it doesn't understand. If it's been
built to store headers as ISO-8859-1 strings as they pass through (a
reasonable assumption, considering RFC 2616), an unknown header in
another encoding -- no matter how specified or flagged -- may break it.
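
To make the concern concrete, here is a minimal sketch in Python. It is
not taken from any real intermediary: the function names and the sample
value are hypothetical, and it assumes the sender of the unknown header
used UTF-8. It shows that storing the octets as an ISO-8859-1 string is
harmless only as long as the value is written back out with the same
charset; once the stored string is treated as real text and serialized
any other way, the value is corrupted.

```python
# Hypothetical sketch: an intermediary that stores every header value as an
# ISO-8859-1 string, as RFC 2616 would lead an implementer to expect.

def store_header(raw_value: bytes) -> str:
    """Decode raw header octets as ISO-8859-1 (never fails: every octet maps to a character)."""
    return raw_value.decode("iso-8859-1")

def forward_verbatim(stored: str) -> bytes:
    """Re-emit the stored value with the same charset it was read with."""
    return stored.encode("iso-8859-1")

def forward_transcoded(stored: str) -> bytes:
    """Re-emit the stored value as UTF-8, as if it were correctly decoded text."""
    return stored.encode("utf-8")

# An unknown header value whose sender used UTF-8 ("Zo" + e-with-diaeresis).
original = "Zo\u00eb".encode("utf-8")      # octets: 5A 6F C3 AB
stored = store_header(original)            # two characters where the sender meant one

print(len(stored))                             # 4 characters, not the intended 3
print(forward_verbatim(stored) == original)    # octets survive a same-charset round trip
print(forward_transcoded(stored) == original)  # octets do not survive transcoding
```

The verbatim path preserves the octets, so a purely pass-through
intermediary may be unaffected; the breakage appears wherever the
stored string is logged, compared, normalized, or re-encoded.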

So, going forward, I completely agree with you, but in the case of
HTTP, I think the horse has already bolted; it is effectively fixed
to ISO-8859-1, and we can't fix this in the right way without
versioning the protocol.

Or am I missing something?

On 20/08/2007, at 5:22 PM, John C Klensin wrote:

> Sigh.  My own sense is that, going forward, we need to lose
> 8859-N, not make it the default (or only) character set for more
> protocols.  It is, to put it mildly, a little Euro-centric (and
> not even completely suitable for Europe).  Much of the advantage
> of Unicode is that one does not need to designate/nominate a
> particular CCS or encoding and then maintain state for it... and
> that is a fairly large advantage.  See also
> draft-klensin-unicode-escapes-03.txt (probably expired, but you
> should be able to find a copy somewhere -- I'll get back to it
> sometime soon) for a discussion of issues in ASCII encoding of
> multioctet character sets.  The IRI spec may constrain things
> to encoding of octets, but that doesn't make it a good idea.
>
> If we are going to consider changes in this area, let's make
> them improvements.  Locking in 8859-1 is not an improvement: it
> would, IMO, be better to deprecate its use and require explicit
> charset designation always if that is the only choice.


--
Mark Nottingham     http://www.mnot.net/