Re: Character encodings in headers [i74][was: Straw-man charter for http-bis]

Keith Moore <moore@cs.utk.edu> Mon, 20 August 2007 06:57 UTC

Message-ID: <46C93B36.7070503@cs.utk.edu>
Date: Mon, 20 Aug 2007 02:56:54 -0400
From: Keith Moore <moore@cs.utk.edu>
User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728)
MIME-Version: 1.0
To: Mark Nottingham <mnot@mnot.net>
Subject: Re: Character encodings in headers [i74][was: Straw-man charter for http-bis]
References: <BA772834-227A-4C1B-9534-070C50DF05B3@mnot.net> <392C98BA-E7B8-44ED-964B-82FC48162924@mnot.net> <p06240843c2833f4d7f2f@[10.20.30.108]> <465D9142.9050506@gmx.de> <6.0.0.20.2.20070610165356.0a69cec0@localhost> <088FB13E-F12F-4BE7-94FB-78B21C51512E@mnot.net>
In-Reply-To: <088FB13E-F12F-4BE7-94FB-78B21C51512E@mnot.net>
X-Enigmail-Version: 0.95.2
OpenPGP: id=E1473978
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Richard Ishida <ishida@w3.org>, Apps Discuss <discuss@apps.ietf.org>, Felix Sasaki <fsasaki@w3.org>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Paul Hoffman <phoffman@imc.org>

Mark Nottingham wrote:
> On 10/06/2007, at 6:05 PM, Martin Duerst wrote:
>> - RFC 2616 prescribes that headers containing non-ASCII have to use
>>   either iso-8859-1 or RFC 2047. This is unnecessarily complex and
>>   not necessarily followed. At the least, new extensions should be
>>   allowed to specify that UTF-8 is used.
>
> My .02;
>
> I'm concerned about allowing UTF-8; it may break existing
> implementations.
Concur, though at least it is possible (heuristically) to distinguish UTF-8 from 8859-1.
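To illustrate why the distinction is possible but only heuristic: every byte string is valid ISO-8859-1, so the best a receiver can do is ask whether the bytes also form well-formed UTF-8. A minimal Python sketch, assuming the header value is available as raw bytes (the function name `classify_header_bytes` is hypothetical, not part of any HTTP library):

```python
def classify_header_bytes(raw: bytes) -> str:
    """Guess the encoding of a raw header value.

    Heuristic only: any byte sequence decodes as ISO-8859-1, so the
    answer "utf-8" means "is well-formed UTF-8", not "was sent as UTF-8".
    """
    try:
        raw.decode("ascii")          # pure ASCII: no ambiguity at all
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")          # strict decoding rejects malformed sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"          # fall back to the RFC 2616 default


print(classify_header_bytes("café".encode("utf-8")))        # utf-8
print(classify_header_bytes("café".encode("iso-8859-1")))   # iso-8859-1
print(classify_header_bytes("Ã©".encode("iso-8859-1")))     # utf-8 (misclassified)
```

The last call shows the weakness: the 8859-1 text "Ã©" happens to be the bytes C3 A9, which are also well-formed UTF-8. Such byte pairs tend to be uncommon in real 8859-1 text, which is why the check usually works, but it is a guess rather than a label.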

Also, I'll note that supporting UTF-8 in a way that is backward
compatible with existing implementations is almost certainly more
complex (and thus more costly, error-prone, etc.) than supporting RFC 2047.
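For comparison, producing an RFC 2047 encoded-word is mechanical, and the result is pure ASCII, so an existing parser that only understands 8859-1 would at worst show the raw `=?...?=` string rather than break. A minimal Python sketch (the helpers `encode_word` and `decode_words` are hypothetical; decoding relies on the standard library's `email.header.decode_header`):

```python
import base64
from email.header import decode_header


def encode_word(text: str, charset: str = "utf-8") -> str:
    """Wrap text in a single RFC 2047 "B" (base64) encoded-word.

    Sketch only: ignores the 75-character limit on an encoded-word,
    so long values would have to be split into several words.
    """
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def decode_words(value: str) -> str:
    """Decode any encoded-words in value back to a Unicode string."""
    out = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Unlabelled chunks are treated as ISO-8859-1 (an assumption
            # mirroring the RFC 2616 default for header text).
            out.append(chunk.decode(charset or "iso-8859-1"))
        else:
            out.append(chunk)        # no encoded-words were present
    return "".join(out)


word = encode_word("café")
print(word)                  # =?utf-8?B?Y2Fmw6k=?=
print(decode_words(word))    # café
```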
>
> I'd like to see the text just require that the actual character set be
> 8859-1, but to allow individual extensions to nominate encodings
> *like* 2047, without being restricted to it. For example, the encoding
> specified in 3987 is appropriate for URIs. However, it *has* to be
> explicit; I've heard some people read this requirement and think that
> they need to check *every* header for 2047 encoding.
RFC 2047 was specifically not intended for use with protocol elements
that have meaning to protocol engines. How many HTTP headers contain
text that is intended solely for human use?
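For contrast with human-readable text, a protocol element such as a URI carries non-ASCII characters differently: the IRI-to-URI mapping in RFC 3987 (section 3.1) percent-encodes the UTF-8 octets of each non-ASCII character. A rough Python approximation using `urllib.parse.quote`, assuming the reserved delimiters and existing `%` escapes in the input should be left untouched (the name `iri_to_uri` is hypothetical, and this sketch skips IDNA host-name conversion):

```python
from urllib.parse import quote

# RFC 3986 reserved characters plus '%', so that existing delimiters and
# escapes survive; quote() never escapes the unreserved set (A-Z a-z 0-9 - . _ ~).
_KEEP = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters of an IRI as UTF-8 octets.

    Approximation of RFC 3987 section 3.1: it also escapes ASCII
    characters that are not allowed in a URI (such as spaces) and does
    not convert internationalized host names.
    """
    return quote(iri, safe=_KEEP, encoding="utf-8")


print(iri_to_uri("http://example.org/café?q=1"))   # http://example.org/caf%C3%A9?q=1
```

The output is plain ASCII that a URI parser can process without knowing anything about character sets, which is the property a protocol engine needs and an RFC 2047 encoded-word does not provide.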