Re: HTTP/2 Header Encoding Status Update
Mark Nottingham <mnot@mnot.net> Wed, 27 February 2013 22:46 UTC
From: Mark Nottingham <mnot@mnot.net>
In-Reply-To: <CABP7RbfK9jT=-wXqv8wo6fJr8Wg0g9SYTZ3FeXHC=4yhihdsug@mail.gmail.com>
Date: Thu, 28 Feb 2013 09:45:08 +1100
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-Id: <0DE7FB38-9484-4FED-84C7-A034AFBFA8E6@mnot.net>
References: <CABP7RbfK9jT=-wXqv8wo6fJr8Wg0g9SYTZ3FeXHC=4yhihdsug@mail.gmail.com>
To: James M Snell <jasnell@gmail.com>
Archived-At: <http://www.w3.org/mid/0DE7FB38-9484-4FED-84C7-A034AFBFA8E6@mnot.net>
Hi James,

On 28/02/2013, at 8:16 AM, James M Snell <jasnell@gmail.com> wrote:

> As I'm not going to be able to meet y'all in Orlando, I wanted to give
> a quick status update on where I'm at with the Header Encoding.
>
> After much experimentation, implementation and discussion with
> Roberto, I've got an implementation that is a good balance between
> Delta and BOHE. It uses the fundamental delta encoding mechanism that
> Roberto created (with a few notable changes) but allows for four
> distinct types of header values: text, numeric, date and raw-binary.
> Every header value is encoded with a single additional byte flag that
> indicates the value type. Further, the flags indicate whether text is
> ISO-8859-1 or UTF-8 and whether it is huffman-coded or not. This
> scheme gives us a good balance and allows us to achieve maximum
> compression ratios and increased long term functionality without
> sacrificing too much in complexity or backwards compatibility.

Thanks. That sounds good.

> Text values:
>
> The rules for text values are simple:
>
> 1. Text can be either UTF-8 or ISO-8859-1 Encoded. A single bit in
>    the flags is used to indicate which it is.
> 2. Text can be huffman coded or not. A single bit in the flags is
>    used to indicate which.
> 3. A single text header field may contain up to 0xFF separate text
>    strings of arbitrary length.
> 4. Each individual text string is prefixed by an unsigned, variable
>    length integer specifying the length of the string.
>
> For example, assuming UTF-8 and non-huffman coded values...
>
> The string value "Bar" is encoded as:
>   10 00 03 42 61 72
>
>   10 = Flags byte
>   00 = Number of values encoded (0-based.. 00 == one value)
>   03 = uvarint length of the value
>   42 61 72 = UTF-8 bytes
>
> The multiple string values ["Bar", "Baz"] would encode as:
>   10 01 03 42 61 72 03 42 61 7A

So, the biggest concern here, I think, is that the conversion of a UTF-8 value to ASCII/Latin-1 -- to be able to forward the header on an HTTP/1.x hop -- requires knowledge of the header.

Would you want to define a standard way to encode UTF-8 in Latin-1 (e.g., percent-encoding) for headers that use this? It would constrain the headers (and likely rule out any existing headers from using UTF-8), but I don't see how this is going to be viable otherwise.

> Numeric values:
>
> 1. Numeric values are encoded as variable-length, unsigned integers.
> 2. Numeric headers can only have a single encoded value (text values
>    are the only ones that allow multiples)
>
> For example, value "100" is encoded as 24 64
>
>   24 = Flags byte indicating numeric value
>   64 = uvarint encoded value
>
> Date values:
>
> 1. Dates are encoded as the number of seconds since a new epoch
>    (Midnight GMT, Jan 1 1990)
> 2. Encoded as uvarints, just like Numeric values, but with an
>    additional flag bit set
>
> For example, the date "2013-02-27T12:51:19.409-08:00" is encoded as:
>   26 AA A9 BF DC 02
>
>   26 = Flags byte indicating date value
>   AA A9 BF DC 02 = uvarint indicating the date
>
> Binary values:
>
> 1. Binary values are encoded as a raw sequence of octets prefixed by
>    the length encoded as a uvarint
>
> For example, the byte sequence 01 02 03 is encoded as:
>   20 03 01 02 03

Similar problem as with UTF-8, unless you define a single transformation between binary and ASCII (base64?).

> Those four encodings would be all we define within the basic header
> encoding. At the HTTP semantics layer, specific headers would be
> required to explicitly declare support for numeric, date or binary
> encoding. This means that all of the existing HTTP header definitions,
> which are currently all text, would be required to use the Text value
> encoding unless they are specifically redefined to use the new type
> options. Specific headers in the core set (:status, content-length,
> date, last-modified, etc) would have updated definitions to allow
> those to use the new more compact encodings right from the start. This
> allows us to get some immediate benefit out of the box and gives us a
> way of smoothly transitioning headers over to more compact and
> optimized encoding options later on without sacrificing backwards
> compatibility.
>
> Differences from Delta:
>
> The scheme I have implemented has a number of technical details that
> are different from Roberto's delta implementation. These details are
> fairly technical and low level so feel free to ignore them unless you
> have a particular interest in implementation details at this point...
>
> 1. For one, I wrote the thing in Java.. which is good, we now have
>    at least three different language implementations of the basic delta
>    scheme. (C++, python and Java)
> 2. It uses uvarints for all length prefixing rather than fixed width
>    integer encodings.
> 3. It uses all of the above type encodings
> 4. It uses a slightly different mechanism for managing and
>    renumbering the delta storage state index. This is an internal
>    difference in implementation but all implementations will be required
>    to implement the same basic scheme for reindexing internal state so I
>    am evaluating different options to see which is the most efficient.
> 5. It introduces an additional "Ephemeral Clone" opcode which is
>    essentially a combination of Clone and Eref (it's like clone but
>    doesn't change state).
> 6. It implements additional heuristics for determining when to merge
>    multiple adjacent toggl indices into a single trang operation.
>    Basically, it just checks to see if converting to a Trang would save
>    or cost additional bytes before doing the conversion.
> 7. It uses eref or eclone operations for :path, referer,
>    authorization, www-authenticate, proxy-authenticate, date and
>    last-modified headers. These tend to change very frequently and take
>    up a significant amount of space in the storage context. Given how
>    variable these are, it doesn't make sense to store them. The
>    likelihood of reuse within a given context is minimal compared to the
>    cost of storage.
> 8. It uses a modified version of the default header dictionary used
>    by Roberto's delta implementation. I've added additional values and
>    changed it to use the new value encodings.
> 9. I have not yet implemented the additional huffman coding for text values.
>
> I'm still working on making the delta state management as efficient as
> possible in my implementation. The running time for the
> serialize-deserialize roundtrip is still well above what I'm happy
> with. Part of that is Java's fault, part of it is the delta mechanism
> itself.
>
> Anyway, that's the run down. Still a ways off from having something
> that I'd be comfortable calling "complete" but looking pretty good so
> far.

All good stuff. Thanks again,

--
Mark Nottingham   http://www.mnot.net/
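
For anyone who wants to check the worked byte sequences quoted above, here is a rough Python sketch of the four value encodings, plus the two HTTP/1.x downgrade transformations floated in the reply (percent-encoding for UTF-8 text, base64 for binary). It is not James's Java implementation. All function and constant names are made up for illustration. The flag bytes are copied from the examples and treated as opaque, with 0x24 taken from the "24 64" example bytes. The uvarint byte order (low-order 7-bit group first) is an assumption: the single-byte examples cannot confirm it, and decoding the date example that way appears to land within about two minutes of the stated timestamp rather than exactly on it, so the sketch does not try to reproduce those bytes.

```python
import base64
from datetime import datetime, timezone
from urllib.parse import quote

# Flag bytes copied from the worked examples in the message. The meaning of
# the individual bits is not given there, so they are treated as opaque.
FLAG_TEXT_UTF8 = 0x10   # UTF-8, not huffman-coded
FLAG_BINARY = 0x20
FLAG_NUMERIC = 0x24     # assumption: taken from the example bytes "24 64"
FLAG_DATE = 0x26

# "Midnight GMT, Jan 1 1990", the new epoch described in the message.
EPOCH_1990 = datetime(1990, 1, 1, tzinfo=timezone.utc)

# Printable ASCII except "%": left untouched when downgrading text.
_SAFE = "".join(chr(c) for c in range(0x20, 0x7F) if chr(c) != "%")


def uvarint(n):
    """Base-128 varint, low-order group first (assumed byte order)."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # high bit set: more groups follow
        else:
            out.append(group)
            return bytes(out)


def encode_text(values):
    """Flags, 0-based value count, then length-prefixed UTF-8 strings."""
    if not 1 <= len(values) <= 0xFF:
        raise ValueError("a text field holds between 1 and 0xFF strings")
    out = bytearray([FLAG_TEXT_UTF8, len(values) - 1])
    for value in values:
        raw = value.encode("utf-8")
        out += uvarint(len(raw)) + raw
    return bytes(out)


def encode_numeric(n):
    return bytes([FLAG_NUMERIC]) + uvarint(n)


def encode_date(dt):
    """Whole seconds since the 1990 epoch; dt must be timezone-aware."""
    seconds = int((dt - EPOCH_1990).total_seconds())
    return bytes([FLAG_DATE]) + uvarint(seconds)


def encode_binary(data):
    return bytes([FLAG_BINARY]) + uvarint(len(data)) + data


def text_for_http1(value):
    """One possible generic downgrade: percent-encode "%", controls, non-ASCII."""
    return quote(value, safe=_SAFE)


def binary_for_http1(data):
    """One possible generic downgrade: base64 to plain ASCII."""
    return base64.b64encode(data).decode("ascii")


print(encode_text(["Bar"]).hex(" "))         # 10 00 03 42 61 72
print(encode_text(["Bar", "Baz"]).hex(" "))  # 10 01 03 42 61 72 03 42 61 7a
print(encode_numeric(100).hex(" "))          # 24 64
print(encode_binary(bytes([1, 2, 3])).hex(" "))  # 20 03 01 02 03
print(text_for_http1("caf\u00e9"))           # caf%C3%A9
```

The two downgrade helpers work without knowing which header they are applied to, which is the property asked for in the reply. The cost, as noted there, is that a receiver on the HTTP/1.x side sees the escaped form, so existing headers could probably not adopt UTF-8 or binary values this way.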