Re: Delta Compression and UTF-8 Header Values

Mark Nottingham <mnot@mnot.net> Fri, 08 February 2013 23:56 UTC

My .02 - 

RFC2616 implies that the range of characters available in headers is ISO-8859-1 (while tilting the table heavily towards ASCII), and we've clarified that in bis to recommend ASCII, while telling implementations to handle anything else as opaque bytes.

However, on the wire in HTTP/1, some bits are sent as UTF-8 (in particular, the request-URI, from one or two browsers).

I think our choices are roughly:

1) everything is opaque bytes
2) default to ASCII, flag headers using non-ASCII bytes to preserve them
3) everything is ASCII, require implementations that receive non-ASCII HTTP/1.1 to translate to ASCII (e.g., convert IRIs to URIs)
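As a rough illustration of the translation step that option 3 would require, here is a minimal Python sketch of turning an IRI path into a URI path by percent-encoding the UTF-8 bytes of non-ASCII characters. The function name and the choice of "safe" characters are assumptions made for this example, not anything defined by the drafts; a real conversion would also have to handle the host part and other components.

```python
from urllib.parse import quote

# Hypothetical helper for illustration only: percent-encode the UTF-8 bytes
# of every non-ASCII character in a path. ASCII reserved characters and
# existing percent-escapes ("%") are left untouched.
def iri_path_to_uri_path(path: str) -> str:
    return quote(path, safe="/:@!$&'()*+,;=%~-._")

if __name__ == "__main__":
    print(iri_path_to_uri_path("/caf\u00e9/menu"))
```

The output is pure ASCII, so it could be carried under option 3, at the cost of every sender having to perform this translation.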

#1 is safest, but you don't get the benefit of re-encoding. The plan for the first implementation draft is to not try to take advantage of encoding, so it's the way we're likely to go -- for now.

#2 starts to walk down the encoding path. There are many variants; we could default to blobs, default to UTF-8, etc. We could just flag "ASCII or blob" or we could define many, many possible encodings, as discussed.
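To make the simplest "ASCII or blob" variant concrete, here is a minimal Python sketch. The flag values, function names, and the one-byte-flag plus two-byte-length layout are all assumptions invented for this example; they are not the wire format of any draft.

```python
from typing import Tuple

# Hypothetical flag values for illustration only.
FLAG_ASCII = 0  # value is pure ASCII and may be re-encoded (e.g. Huffman)
FLAG_BLOB = 1   # value contains non-ASCII bytes and must be preserved as-is

def tag_header_value(raw: bytes) -> Tuple[int, bytes]:
    """Classify a header value as ASCII or opaque blob without altering it."""
    is_ascii = all(b < 0x80 for b in raw)
    return (FLAG_ASCII if is_ascii else FLAG_BLOB, raw)

def encode_header_value(raw: bytes) -> bytes:
    """Serialize as: 1 flag byte, 2-byte big-endian length, then the bytes."""
    flag, payload = tag_header_value(raw)
    return bytes([flag]) + len(payload).to_bytes(2, "big") + payload

if __name__ == "__main__":
    print(tag_header_value(b"text/html"))
    print(tag_header_value("caf\u00e9".encode("utf-8")))
```

A receiver would only attempt text-oriented decoding on values flagged as ASCII and would pass flagged blobs through unchanged.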

#3 seems risky to me.

Cheers, 


On 09/02/2013, at 6:28 AM, James M Snell <jasnell@gmail.com> wrote:

> Just going through more implementation details of the proposed delta
> encoding... one of the items that had come up previously in early
> http/2 discussions was the possibility of allowing for UTF-8 header
> values. Doing so would allow us to move away from things like
> punycode, pct-encoding, Q and B-Codecs, RFC 5987 mechanisms, etc., but it
> would bring along a range of other issues we would need to deal with.
> 
> One key challenge with allowing UTF-8 values, however, is that it
> conflicts with the use of the static huffman encoding in the proposed
> Delta Encoding for header compression. If we allow for non-ascii
> characters, the static huffman coding simply becomes too inefficient
> and unmanageable to be useful. There are a few ways around it but none
> of the strategies are all that attractive.
> 
> So the question is: do we want to allow UTF-8 header values? Is it
> worth the trade-off in less-efficient header compression? Or put
> another way, is increased compression efficiency worth ruling out
> UTF-8 header values?
> 
> (Obviously there are other issues with UTF-8 values we'd need to
> consider, such as http/1 interop)
> 
> - James
> 

--
Mark Nottingham   http://www.mnot.net/