Re: HTTP/2 Header Encoding Status Update

Mark Nottingham <mnot@mnot.net> Wed, 27 February 2013 22:46 UTC

From: Mark Nottingham <mnot@mnot.net>
In-Reply-To: <CABP7RbfK9jT=-wXqv8wo6fJr8Wg0g9SYTZ3FeXHC=4yhihdsug@mail.gmail.com>
Date: Thu, 28 Feb 2013 09:45:08 +1100
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <0DE7FB38-9484-4FED-84C7-A034AFBFA8E6@mnot.net>
References: <CABP7RbfK9jT=-wXqv8wo6fJr8Wg0g9SYTZ3FeXHC=4yhihdsug@mail.gmail.com>
To: James M Snell <jasnell@gmail.com>
Subject: Re: HTTP/2 Header Encoding Status Update
Archived-At: <http://www.w3.org/mid/0DE7FB38-9484-4FED-84C7-A034AFBFA8E6@mnot.net>

Hi James,

On 28/02/2013, at 8:16 AM, James M Snell <jasnell@gmail.com> wrote:

> As I'm not going to be able to meet y'all in Orlando, I wanted to give
> a quick status update on where I'm at with the Header Encoding.
> 
> After much experimentation, implementation and discussion with
> Roberto, I've got an implementation that is a good balance between
> Delta and BOHE. It uses the fundamental delta encoding mechanism that
> Roberto created (with a few notable changes) but allows for four
> distinct types of header values: text, numeric, date and raw-binary.
> Every header value is encoded with a single additional byte flag that
> indicates the value type. Further, the flags indicate whether text is
> ISO-8859-1 or UTF-8 and whether it is huffman-coded or not.  This
> scheme gives us a good balance and allows us to achieve maximum
> compression ratios and increased long term functionality without
> sacrificing too much in complexity or backwards compatibility.

Thanks. That sounds good.


> Text values:
> 
>  The rules for text values are simple:
> 
>  1. Text can be either UTF-8 or ISO-8859-1 Encoded. A single bit in
> the flags is used to indicate which it is.
>  2. Text can be huffman coded or not. A single bit in the flags is
> used to indicate which.
>  3. A single text header field may contain up to 0xFF separate text
> strings of arbitrary length.
>  4. Each individual text string is prefixed by an unsigned,
> variable-length integer specifying the length of the string.
> 
>  For example, assuming UTF-8 and non-huffman coded values...
> 
>  The string value "Bar" is encoded as:
>    10 00 03 42 61 72
> 
>    10 = Flags byte
>    00 = Number of values encoded (0-based.. 00 == one value)
>    03 = uvarint length of the value
>    42 61 72 = UTF-8 bytes
> 
>  The multiple string values ["Bar", "Baz"] would encode as:
>     10 01 03 42 61 72 03 42 61 7A
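
A minimal sketch of the text layout quoted above, in Python. It assumes the uvarint puts the low 7-bit group first with the high bit as a continuation marker, and that the length prefix counts UTF-8 bytes; neither is stated in the message. The names encode_uvarint and encode_text are made up for illustration, and 0x10 is simply the flags byte from the quoted example.

```python
from typing import Sequence

TEXT_UTF8_PLAIN = 0x10  # flags byte copied from the quoted example (UTF-8, no huffman)


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative int as 7-bit groups, low group first (assumed order)."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # continuation bit: more groups follow
        else:
            out.append(group)
            return bytes(out)


def encode_text(values: Sequence[str]) -> bytes:
    """Flags byte, 0-based value count, then each string as uvarint length + bytes."""
    if not 1 <= len(values) <= 0xFF:
        raise ValueError("a text field carries between 1 and 0xFF strings")
    out = bytearray([TEXT_UTF8_PLAIN, len(values) - 1])
    for value in values:
        raw = value.encode("utf-8")
        out += encode_uvarint(len(raw))  # assumption: length is in bytes, not characters
        out += raw
    return bytes(out)


print(encode_text(["Bar"]).hex(" "))
print(encode_text(["Bar", "Baz"]).hex(" "))
```

Under those assumptions the two calls reproduce the byte sequences quoted above.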

So, the biggest concern here, I think, is that the conversion of a UTF-8 value to ASCII/Latin-1 -- to be able to forward the header on an HTTP/1.x hop -- requires knowledge of the header.

Would you want to define a standard way to encode UTF-8 in Latin-1 (e.g., percent-encoding) for headers that use this? It would constrain the headers (and likely rule out any existing headers from using UTF-8), but I don't see how this is going to be viable otherwise.
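
To make the percent-encoding idea concrete, here is a rough Python sketch of one possible generic mapping. The rule it applies (escape "%", control characters and every non-ASCII byte, leave the rest alone) is an assumption for illustration, not something anyone has proposed on the list, and the function names are hypothetical.

```python
from urllib.parse import unquote


def utf8_to_http1(value: str) -> str:
    """Percent-encode a UTF-8 header value into plain ASCII for an HTTP/1.x hop."""
    out = []
    for byte in value.encode("utf-8"):
        # Escape '%' itself so the mapping stays reversible, plus controls and non-ASCII.
        if byte == 0x25 or byte < 0x20 or byte >= 0x7F:
            out.append(f"%{byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)


def http1_to_utf8(value: str) -> str:
    """Reverse the mapping; raises UnicodeDecodeError on invalid UTF-8."""
    return unquote(value, encoding="utf-8", errors="strict")


print(utf8_to_http1("caf\u00e9"))  # non-ASCII bytes become %XX escapes
```

The catch is the one noted above: a receiver can only undo this if it knows the header uses the convention, which is why it would have to be tied to the header definition.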


> Numeric values:
> 
>  1. Numeric values are encoded as variable-length, unsigned integers.
>  2. Numeric headers can only have a single encoded value (text values
> are the only ones that allow multiples)
> 
>  For example, value "100" is encoded as 24 64
> 
>  24 = Flags byte indicating numeric value
>  64 = uvarint encoded value
> 
> Date values:
> 
>  1. Dates are encoded as the number of seconds since a new epoch
> (Midnight GMT, Jan 1 1990)
>  2. Encoded as uvarints, just like Numeric values, but with an
> additional flag bit set
> 
>  For example, the date "2013-02-27T12:51:19.409-08:00" is encoded as:
>    26 AA A9 BF DC 02
> 
>  26 = Flags byte indicating date value
>  AA A9 BF DC 02 = uvarint indicating the date
> 
> Binary values:
> 
>  1. Binary values are encoded as a raw sequence of octets prefixed by
> the length encoded as a uvarint
> 
>  For example, the byte sequence 01 02 03 is encoded as:
>    20 03 01 02 03
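
A hedged Python sketch of the numeric, date and binary layouts quoted above. The flag values 0x24, 0x26 and 0x20 are copied from the quoted examples; the uvarint byte order (low 7-bit group first, high bit as continuation) and the truncation of sub-second precision are assumptions, and all function names are hypothetical.

```python
from datetime import datetime, timezone

EPOCH_1990 = datetime(1990, 1, 1, tzinfo=timezone.utc)  # "new epoch" from the message
FLAG_BINARY, FLAG_NUMERIC, FLAG_DATE = 0x20, 0x24, 0x26  # taken from the examples


def encode_uvarint(n: int) -> bytes:
    """7-bit groups, low group first, high bit set while more groups follow."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def decode_uvarint(data: bytes) -> int:
    """Inverse of encode_uvarint for a buffer that starts with one uvarint."""
    value = 0
    for position, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * position)
        if not byte & 0x80:
            return value
    raise ValueError("truncated uvarint")


def encode_numeric(n: int) -> bytes:
    return bytes([FLAG_NUMERIC]) + encode_uvarint(n)


def encode_date(when: datetime) -> bytes:
    """`when` must be timezone-aware; fractions of a second are dropped."""
    seconds = int((when - EPOCH_1990).total_seconds())
    return bytes([FLAG_DATE]) + encode_uvarint(seconds)


def encode_binary(payload: bytes) -> bytes:
    return bytes([FLAG_BINARY]) + encode_uvarint(len(payload)) + payload


print(encode_numeric(100).hex(" "))
print(encode_binary(b"\x01\x02\x03").hex(" "))
```

With these assumptions the numeric and binary calls match the quoted bytes. Decoding the quoted date bytes the same way lands within a couple of minutes of the quoted timestamp rather than exactly on it, so the byte order is a plausible reading, not a confirmed one.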

Similar problem as with UTF-8, unless you define a single transformation between binary and ASCII (base64?).
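
For illustration, a small Python sketch of what a single binary-to-ASCII transformation could look like if base64 were the choice. Standard base64 with padding is assumed here; the function names are hypothetical and nothing in the proposal fixes this yet.

```python
import base64


def binary_to_http1(payload: bytes) -> str:
    """Render a binary header value as ASCII for an HTTP/1.x hop."""
    return base64.b64encode(payload).decode("ascii")


def http1_to_binary(value: str) -> bytes:
    """Reverse the mapping; validate=True rejects characters outside the alphabet."""
    return base64.b64decode(value, validate=True)


print(binary_to_http1(b"\x01\x02\x03"))
```

As with UTF-8, an intermediary can only apply the reverse step if the header is known to carry a binary value.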


> Those four encodings would be all we define within the basic header
> encoding. At the HTTP semantics layer, specific headers would be
> required to explicitly declare support for numeric, date or binary
> encoding. This means that all of the existing HTTP header definitions,
> which are currently all text, would be required to use the Text value
> encoding unless they are specifically redefined to use the new type
> options. Specific headers in the core set (:status, content-length,
> date, last-modified, etc) would have updated definitions to allow
> those to use the new more compact encodings right from the start. This
> allows us to get some immediate benefit out of the box and gives us a
> way of smoothly transitioning headers over to more compact and
> optimized encoding options later on without sacrificing backwards
> compatibility.
> 
> Differences from Delta:
> 
>  The scheme I have implemented has a number of technical details that
> are different from Roberto's delta implementation. These details are
> fairly technical and low level so feel free to ignore them unless you
> have a particular interest in implementation details at this point...
> 
>  1. For one, I wrote the thing in Java.. which is good, we now have
> at least three different language implementations of the basic delta
> scheme. (C++, Python and Java)
>  2. It uses uvarints for all length prefixing rather than fixed width
> integer encodings.
>  3. It uses all of the above type encodings
>  4. It uses a slightly different mechanism for managing and
> renumbering the delta storage state index. This is an internal
> difference in implementation but all implementations will be required
> to implement the same basic scheme for reindexing internal state so I
> am evaluating different options to see which is the most efficient.
>  5. It introduces an additional "Ephemeral Clone" opcode which is
> essentially a combination of Clone and Eref (it's like clone but
> doesn't change state).
>  6. It implements additional heuristics for determining when to merge
> multiple adjacent toggl indices into a single trang operation.
> Basically, it just checks to see if converting to a Trang would save
> or cost additional bytes before doing the conversion.
>  7. It uses eref or eclone operations for :path, referer,
> authorization, www-authenticate, proxy-authenticate, date and
> last-modified headers. These tend to change very frequently and take
> up a significant amount of space in the storage context. Given how
> variable these are, it doesn't make sense to store them. The
> likelihood of reuse within a given context is minimal compared to the
> cost of storage.
>  8. It uses a modified version of the default header dictionary used
> by Roberto's delta implementation. I've added additional values and
> changed it to use the new value encodings.
>  9. I have not yet implemented the additional huffman coding for text values.
> 
> I'm still working on making the delta state management as efficient as
> possible in my implementation. The running time for the
> serialize-deserialize roundtrip is still well above what I'm happy
> with. Part of that is Java's fault, part of it is the delta mechanism
> itself.
> 
> Anyway, that's the rundown. Still a ways off from having something
> that I'd be comfortable calling "complete" but looking pretty good so
> far.
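
The merge heuristic in item 6 can be sketched roughly as below, in Python. The per-operation byte costs are placeholder parameters, not the real sizes of the Toggl and Trang opcodes, and plan_ops is a hypothetical name; the only thing taken from the description is the rule "convert a run of adjacent indices to a range only when that saves bytes".

```python
from typing import Iterable, List, Tuple


def plan_ops(indices: Iterable[int], toggl_cost: int = 1,
             trang_cost: int = 2) -> List[Tuple]:
    """Turn indices into toggl/trang operations, merging runs only when cheaper.

    toggl_cost and trang_cost are assumed encoded sizes in bytes (placeholders).
    """
    ops: List[Tuple] = []
    run: List[int] = []

    def flush() -> None:
        if not run:
            return
        if len(run) * toggl_cost > trang_cost:
            ops.append(("trang", run[0], run[-1]))  # one range op covers the run
        else:
            ops.extend(("toggl", i) for i in run)  # range would not save bytes
        run.clear()

    for index in sorted(set(indices)):
        if run and index != run[-1] + 1:
            flush()  # gap found: the current run of adjacent indices ends here
        run.append(index)
    flush()
    return ops


print(plan_ops([1, 2, 3, 7]))
```

With the placeholder costs, a run of three adjacent indices becomes one range operation while a lone index or a pair stays as individual toggles.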


All good stuff. Thanks again,

--
Mark Nottingham   http://www.mnot.net/