Re: Delta Compression and UTF-8 Header Values

James M Snell <jasnell@gmail.com> Sat, 09 February 2013 01:12 UTC

In-Reply-To: <CE65E38D-A482-4EA9-BAF4-F6498F643A78@mnot.net>
References: <CABP7RbfRLXPpL4=wip=FvqD3DM7BM8PXi7uRswHAusXUmPO_xw@mail.gmail.com> <CE65E38D-A482-4EA9-BAF4-F6498F643A78@mnot.net>
From: James M Snell <jasnell@gmail.com>
Date: Fri, 08 Feb 2013 17:10:02 -0800
Message-ID: <CABP7RbcRrjV7EhwoGbkWbYJEXeWOwH4gQuaCG7N0siQqeMtcag@mail.gmail.com>
To: Mark Nottingham <mnot@mnot.net>
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <http://www.w3.org/mid/CABP7RbcRrjV7EhwoGbkWbYJEXeWOwH4gQuaCG7N0siQqeMtcag@mail.gmail.com>

On Fri, Feb 8, 2013 at 3:53 PM, Mark Nottingham <mnot@mnot.net> wrote:
> My .02 -
>
> RFC2616 implies that the range of characters available in headers is ISO-8859-1 (while tilting the table heavily towards ASCII), and we've clarified that in bis to recommend ASCII, while telling implementations to handle anything else as opaque bytes.
>
> However, on the wire in HTTP/1, some bits are sent as UTF-8 (in particular, the request-URI, from one or two browsers).
>
> I think our choices are roughly:
>
> 1) everything is opaque bytes
> 2) default to ASCII, flag headers using non-ASCII bytes to preserve them
> 3) everything is ASCII, require implementations that receive non-ASCII HTTP/1.1 to translate to ASCII (e.g., convert IRIs to URIs)
>
> #1 is safest, but you don't get the benefit of re-encoding. The plan for the first implementation draft is to not try to take advantage of encoding, so it's the way we're likely to go -- for now.
>
> #2 starts to walk down the encoding path. There are many variants; we could default to blobs, default to UTF-8, etc. We could just flag "ASCII or blob" or we could define many, many possible encodings, as discussed.
>
> #3 seems risky to me.
>

I have the distinct feeling we're going to end up somewhere between #1
and #2, which is bad news for the static Huffman coding. If we end up
with #2, we'll be able to Huffman-code anything that is flagged as
ASCII, and won't be able to touch the rest.
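
To make the #2 case concrete, here is a rough Python sketch of the
flagging step. The names EncodedValue and classify_header_value are
made up for illustration and do not come from any draft; the sketch
only assumes a one-bit "ASCII or blob" flag per header value and shows
which values a static Huffman table could touch.

```python
from typing import NamedTuple


class EncodedValue(NamedTuple):
    is_ascii: bool   # hypothetical per-header flag, as in option #2
    payload: bytes   # the header value's bytes, unchanged


def classify_header_value(raw: bytes) -> EncodedValue:
    """Flag a header value as ASCII (Huffman-eligible) or opaque."""
    # Every ASCII byte is below 0x80; any higher byte means the value
    # would have to travel as an opaque blob that the static table skips.
    return EncodedValue(all(b < 0x80 for b in raw), raw)


ascii_value = classify_header_value(b"text/html; charset=utf-8")
utf8_value = classify_header_value(
    "attachment; filename=résumé.pdf".encode("utf-8")
)
print(ascii_value.is_ascii)  # True
print(utf8_value.is_ascii)   # False
```

Only values with the flag set would go through the Huffman coder; the
rest would be written out byte for byte.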

- James

> Cheers,
>
>
> On 09/02/2013, at 6:28 AM, James M Snell <jasnell@gmail.com> wrote:
>
>> Just going through more implementation details of the proposed delta
>> encoding... one of the items that had come up previously in early
>> http/2 discussions was the possibility of allowing for UTF-8 header
>> values. Doing so would allow us to move away from things like
>> punycode, pct-encoding, Q and B-Codecs, RFC 5987 mechanisms, etc., but
>> it would bring along a range of other issues we would need to deal with.
>>
>> One key challenge with allowing UTF-8 values, however, is that it
>> conflicts with the use of the static Huffman encoding in the proposed
>> Delta Encoding for header compression. If we allow for non-ASCII
>> characters, the static Huffman coding simply becomes too inefficient
>> and unmanageable to be useful. There are a few ways around it, but none
>> of the strategies are all that attractive.
>>
>> So the question is: do we want to allow UTF-8 header values? Is it
>> worth the trade-off in less-efficient header compression? Or put
>> another way, is increased compression efficiency worth ruling out
>> UTF-8 header values?
>>
>> (Obviously there are other issues with UTF-8 values we'd need to
>> consider, such as http/1 interop)
>>
>> - James
>>
>
> --
> Mark Nottingham   http://www.mnot.net/
>
>
>