Re: Character encodings in headers [i74][was: Straw-man charter for http-bis]

Martin Duerst <duerst@it.aoyama.ac.jp> Mon, 20 August 2007 08:21 UTC

Message-Id: <6.0.0.20.2.20070820170314.07449b20@localhost>
Date: Mon, 20 Aug 2007 17:10:27 +0900
To: Keith Moore <moore@cs.utk.edu>, Mark Nottingham <mnot@mnot.net>
From: Martin Duerst <duerst@it.aoyama.ac.jp>
Subject: Re: Character encodings in headers [i74][was: Straw-man charter for http-bis]
In-Reply-To: <46C93B36.7070503@cs.utk.edu>
References: <BA772834-227A-4C1B-9534-070C50DF05B3@mnot.net> <392C98BA-E7B8-44ED-964B-82FC48162924@mnot.net> <p06240843c2833f4d7f2f@[10.20.30.108]> <465D9142.9050506@gmx.de> <6.0.0.20.2.20070610165356.0a69cec0@localhost> <088FB13E-F12F-4BE7-94FB-78B21C51512E@mnot.net> <46C93B36.7070503@cs.utk.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Cc: Paul Hoffman <phoffman@imc.org>, Apps Discuss <discuss@apps.ietf.org>, Felix Sasaki <fsasaki@w3.org>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Richard Ishida <ishida@w3.org>

At 15:56 07/08/20, Keith Moore wrote:
>Mark Nottingham wrote:
>> On 10/06/2007, at 6:05 PM, Martin Duerst wrote:
>>> - RFC 2616 prescribes that headers containing non-ASCII have to use
>>>   either iso-8859-1 or RFC 2047. This is unnecessarily complex and
>>>   not necessarily followed. At the least, new extensions should be
>>>   allowed to specify that UTF-8 is used.
>>
>> My .02;
>>
>> I'm concerned about allowing UTF-8; it may break existing
>> implementations.
>concur.  though at least it is possible to distinguish utf-8 from 8859-1. 

Indeed, in practice this can be done with high reliability; please
see http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/IUC11-UTF-8.pdf
for details. For iso-8859-1, see p. 21 in particular.
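
To illustrate the kind of check meant here, a minimal sketch in Python,
assuming the receiver has the raw header bytes in hand. The function
name guess_header_charset is made up for this example and is not taken
from any specification or library. It simply tries a strict UTF-8
decode and falls back to iso-8859-1 when the bytes are not well-formed
UTF-8:

```python
def guess_header_charset(raw: bytes) -> tuple[str, str]:
    """Return (decoded text, guessed charset) for raw header bytes.

    Hypothetical helper for illustration only: a heuristic, not a
    guarantee.
    """
    if raw.isascii():
        # Pure ASCII is identical in UTF-8 and iso-8859-1.
        return raw.decode("ascii"), "us-ascii"
    try:
        # Strict decoding rejects any ill-formed UTF-8 byte sequence.
        return raw.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        # Every byte value is defined in iso-8859-1, so this cannot fail.
        return raw.decode("iso-8859-1"), "iso-8859-1"


if __name__ == "__main__":
    print(guess_header_charset("Dürst".encode("utf-8")))
    print(guess_header_charset("Dürst".encode("iso-8859-1")))
```

The heuristic works because real iso-8859-1 text rarely happens to form
valid UTF-8 multi-byte sequences. It is not airtight: the two bytes
C3 BC are "ü" in UTF-8 but also a legal (if unlikely) two-character
string in iso-8859-1, and this sketch will always read them as UTF-8.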

>also, I'll note that supporting utf-8 in a way that is backward
>compatible with existing implementations is almost certainly more
>complex (and thus more costly, error-prone, etc) than supporting rfc 2047.

Well, if "backwards compatible" means also supporting RFC 2047,
then that's a tautology: UTF-8 plus RFC 2047 is necessarily more
complex than RFC 2047 alone. If the choice is between UTF-8 and
RFC 2047, however, then I'd take UTF-8 any time, because an RFC 2047
implementation has to handle UTF-8 anyway, as well as many other
encodings.
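
As a rough illustration of that point, here is a small Python sketch
that decodes RFC 2047 encoded-words using decode_header from the
standard email.header module. The wrapper name decode_rfc2047 and the
sample values are made up for this example. Note that the decoder has
to cope with a charset label and a Q or B transfer encoding on every
encoded-word, and UTF-8 is just one of the charsets it must handle:

```python
from email.header import decode_header


def decode_rfc2047(value: str) -> str:
    """Decode a header value that may contain RFC 2047 encoded-words.

    Hypothetical helper for illustration only. An unknown charset label
    raises LookupError here; a real implementation would need a policy
    for that case.
    """
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus their charset label;
            # unencoded runs next to them come back as bytes with None.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # A value without any encoded-word is returned as str.
            parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    # The same name, carried three different ways.
    for sample in (
        "=?utf-8?q?D=C3=BCrst?=",
        "=?utf-8?b?RMO8cnN0?=",
        "=?iso-8859-1?q?D=FCrst?=",
    ):
        print(sample, "->", decode_rfc2047(sample))
```

With plain UTF-8 in the header, all of this collapses to a single
decode step.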

Regards,    Martin.


#-#-#  Martin J. Dürst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp