Re: Character encodings in headers [i74][was: Straw-man charter for http-bis]

Keith Moore <moore@cs.utk.edu> Tue, 21 August 2007 16:51 UTC

Message-ID: <46CB17F5.8070702@cs.utk.edu>
Date: Tue, 21 Aug 2007 12:51:01 -0400
From: Keith Moore <moore@cs.utk.edu>
User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728)
MIME-Version: 1.0
To: Stefanos Harhalakis <v13@priest.com>
Subject: Re: Character encodings in headers [i74][was: Straw-man charter for http-bis]
References: <BA772834-227A-4C1B-9534-070C50DF05B3@mnot.net> <6B8E3D7A-71B8-4B8D-9625-2AB3C74A9072@mnot.net> <6.0.0.20.2.20070820181338.07260770@localhost> <200708201652.26863.v13@priest.com>
In-Reply-To: <200708201652.26863.v13@priest.com>
X-Enigmail-Version: 0.95.2
OpenPGP: id=E1473978
Content-Type: text/plain; charset=ISO-8859-7
Content-Transfer-Encoding: 7bit
Cc: Felix Sasaki <fsasaki@w3.org>, Richard Ishida <ishida@w3.org>, Apps Discuss <discuss@apps.ietf.org>, Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Paul Hoffman <phoffman@imc.org>

>
> My 2c:
>
>   UTF-8 introduces a requirement that the ISO-8859-X encodings don't have: UTF-8 
> strings may be invalid, in which case some action may be needed (drop?). 
> Thus, all UTF-8 strings need to be validated.
>   
no.  the last thing we need in HTTP (or any protocol IMHO) is for
intermediaries to try to be smarter than their endpoints.
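
For reference, the kind of validation the quoted text has in mind can be sketched as below. This is only an illustration of what "validating a UTF-8 string" means, not a suggestion that intermediaries should do it; the function name is_valid_utf8 and the sample byte strings are made up for the example.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 byte sequence.

    Hypothetical helper for illustration: it relies on Python's strict
    decoder, which rejects truncated sequences, stray continuation bytes
    and overlong encodings.
    """
    try:
        raw.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


# Sample header values (invented for this example).
well_formed = b"caf\xc3\xa9"  # "café" encoded as UTF-8
latin1_only = b"caf\xe9"      # "café" encoded as ISO-8859-1, not valid UTF-8
overlong = b"\xc0\xaf"        # overlong encoding of "/", not valid UTF-8
```

Whether a failed check should lead to dropping, rewriting or passing the value through untouched is exactly the policy question under discussion, so the sketch deliberately stops at a yes/no answer.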
>   Apart from that, implementations may do various tricks like logging etc, 
> where:
> a) strlen() is used - not unicode aware
>   
strlen works the same for UTF-8 as for ASCII, as long as what you care
about is the number of bytes in the string rather than, say, the number
of characters or the amount of space it will take up when displayed.
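
A small sketch of that distinction, written in Python rather than C for brevity. Here len() on the encoded bytes stands in for what strlen() would report, and len() on the str counts code points; the sample string and the function names are assumptions made for the example.

```python
def byte_length(text: str) -> int:
    """Number of bytes in the UTF-8 encoding, i.e. what C's strlen()
    would return for the same string (assuming it contains no U+0000)."""
    return len(text.encode("utf-8"))


def code_point_length(text: str) -> int:
    """Number of Unicode code points, which is closer to (but still not
    the same as) the number of characters a reader would see."""
    return len(text)


# "naïve café" written with escapes so the source stays pure ASCII.
sample = "na\u00efve caf\u00e9"

# UTF-8 never uses a zero byte for anything other than U+0000, so a
# NUL-terminated C string holding this value is not cut short.
contains_nul = 0 in sample.encode("utf-8")
```

For this sample the two lengths differ because each accented letter takes two bytes in UTF-8. Code that uses strlen() to size buffers or copy data is unaffected; code that uses it to align columns in a log file is not.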
> b) iconv() is used to convert ISO8859-1 to UTF-8 either for presentation or 
> for internal storage (python or java perhaps?)
valid point.
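
One way to see why, sketched in Python under the assumption that the implementation blindly treats every header value as ISO-8859-1 (the function name latin1_to_utf8 is invented for the example, and Python's codecs stand in for iconv()):

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode raw from ISO-8859-1 to UTF-8.

    Every byte value 0x00-0xFF is a valid ISO-8859-1 character, so this
    conversion never fails, even when the input was not ISO-8859-1.
    """
    return raw.decode("iso-8859-1").encode("utf-8")


# Input that really is ISO-8859-1: converted correctly.
from_latin1 = latin1_to_utf8(b"caf\xe9")

# Input that was already UTF-8: silently encoded a second time, so the
# two bytes of "é" become four bytes that display as "Ã©".
from_utf8 = latin1_to_utf8(b"caf\xc3\xa9")
```

The second case produces no error at all, which is what makes it awkward: existing software that converts on the assumption of ISO-8859-1 would quietly mangle UTF-8 header values instead of rejecting them.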

Keith