Re: Delta Compression and UTF-8 Header Values

"Martin Nilsson" <nilsson@opera.com> Sat, 09 February 2013 14:13 UTC

On Sat, 09 Feb 2013 00:53:10 +0100, Mark Nottingham <mnot@mnot.net> wrote:

> My .02 -
>
> RFC2616 implies that the range of characters available in headers is  
> ISO-8859-1 (while tilting the table heavily towards ASCII), and we've  
> clarified that in bis to recommend ASCII, while telling implementations  
> to handle anything else as opaque bytes.
>
> However, on the wire in HTTP/1, some bits are sent as UTF-8 (in  
> particular, the request-URI, from one or two browsers).
>

I don't see a reason not to UTF-8 encode all text fields. HTTP/1 forced a  
lot of heuristic code that tried to figure out how things were  
transformed on the way, and heuristics for decoders are bad. Though, as  
the world has moved to UTF-8, saying "opaque bytes" means UTF-8 in  
practice for everyone anyway. The problem is fields that ideally should  
be binary, say a hash for ETag. UTF-8 encoding would add about 50% in  
size there, since every byte above 0x7F takes two bytes.
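
The 50% figure can be checked with a small sketch. It assumes the binary  
value is carried by treating each byte as a code point U+0000..U+00FF and  
UTF-8 encoding the result; the helper name utf8_size_of_binary is made up  
for this illustration, and the input is a stand-in for evenly distributed  
hash output, not a real ETag.

```python
def utf8_size_of_binary(data: bytes) -> int:
    """Size in bytes if each input byte is treated as a code point
    U+0000..U+00FF and the resulting text is UTF-8 encoded (assumption)."""
    return len(data.decode("latin-1").encode("utf-8"))


# Every byte value exactly once: a stand-in for uniformly distributed
# binary data such as a hash.
sample = bytes(range(256))

encoded_size = utf8_size_of_binary(sample)
# Bytes 0x00-0x7F stay one byte each, bytes 0x80-0xFF become two bytes each.
print(len(sample), encoded_size, encoded_size / len(sample))  # 256 384 1.5
```

Pure ASCII input is unaffected; only the upper half of the byte range  
doubles, which gives the 1.5x average for evenly distributed bytes.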

Creating a static Huffman code for the ASCII part of Unicode shouldn't be  
a problem, as long as there is a prefix for non-ASCII bytes.
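
A minimal sketch of that idea, not a proposal for the actual table: a  
fixed Huffman code over the 128 ASCII values plus one extra escape symbol,  
where the escape code is followed by the raw 8 bits of a non-ASCII byte.  
The ESC symbol, the weights in static_freqs and all function names are  
assumptions invented for this example, and bits are kept as a string of  
"0"/"1" characters for readability rather than packed.

```python
import heapq
from itertools import count

ESC = 128  # hypothetical escape symbol: "the next 8 bits are a raw byte"


def static_freqs():
    """Made-up weights: favour lowercase letters, then digits and a few
    punctuation marks; everything else, including ESC, gets weight 1."""
    freqs = {sym: 1 for sym in range(128)}
    for ch in "abcdefghijklmnopqrstuvwxyz":
        freqs[ord(ch)] = 8
    for ch in "0123456789-/=.;":
        freqs[ord(ch)] = 4
    freqs[ESC] = 1
    return freqs


def build_code(freqs):
    """Build a prefix code {symbol: bitstring} with Huffman's algorithm."""
    tie = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(w, next(tie), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (a, b)))
    code = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf: an integer symbol
            code[node] = prefix

    walk(heap[0][2], "")
    return code


CODE = build_code(static_freqs())
DECODE = {bits: sym for sym, bits in CODE.items()}


def encode(data: bytes) -> str:
    """ASCII bytes use their Huffman code; others use ESC + 8 raw bits."""
    parts = []
    for b in data:
        if b < 128:
            parts.append(CODE[b])
        else:
            parts.append(CODE[ESC] + format(b, "08b"))
    return "".join(parts)


def decode(bits: str) -> bytes:
    """Inverse of encode(); relies on the code being prefix-free."""
    out = bytearray()
    i, cur = 0, ""
    while i < len(bits):
        cur += bits[i]
        i += 1
        sym = DECODE.get(cur)
        if sym is None:
            continue  # not a complete codeword yet
        if sym == ESC:
            out.append(int(bits[i:i + 8], 2))  # raw byte follows the escape
            i += 8
        else:
            out.append(sym)
        cur = ""
    return bytes(out)


header = "text/html; charset=utf-8 \u00e9".encode("utf-8")
assert decode(encode(header)) == header
```

With this layout a non-ASCII byte costs the escape code plus 8 bits, so  
UTF-8 text outside ASCII still round-trips, just without any compression  
gain on those bytes.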

/Martin Nilsson
