Re: Reference set in HPACK
Johnny Graettinger <jgraettinger@chromium.org> Wed, 02 July 2014 16:08 UTC
To: Kaoru Maeda <kaorumaeda.ml@gmail.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>, Roberto Peon <grmocg@gmail.com>, Michael Sweet <msweet@apple.com>, Kazu Yamamoto <kazu@iij.ad.jp>
A couple of comments:

The test set at https://github.com/http2jp/hpack-test-case/ tremendously
under-represents the impact of cookies. It's not *at all* uncommon to see
sites with 50+ cookie crumbs (nytimes.com had 58 at last count).

My own anecdote from browsing around and observing HPACK performance is
that the reference set helped, and that the real problem was that the
default initial table size was too small to make full use of it. Of course,
for obvious privacy reasons it's tough to create a public test set that
fully incorporates natural browsing sessions and cookies.

Chromium is already running a trial where all headers (including HTTP/1
origins) are run through an HPACK encoder (for purposes of identifying a
better Huffman code). It may be possible to extend the trial to examine
compression performance with and without the reference set; I'll look into
it.

On Wed, Jul 2, 2014 at 9:52 AM, Kaoru Maeda <kaorumaeda.ml@gmail.com> wrote:

> Another advantage of removing the reference set is that the order of
> header fields is preserved.
> This removes the need for the following rule:
> - section-8.1.2.3
>     To preserve the order of multiple occurrences of a header field with
>     the same name, its ordered values are concatenated into a single
>     value using a zero-valued octet (0x0) to delimit them.
>
>     After decompression, header fields that have values containing zero
>     octets (0x0) MUST be split into multiple header fields before being
>     processed.
>
> Cookie now preserves the order of key-value pairs even if they are split
> for "better-compression". This might be important with regard to request
> signing.
>
> --
> Kaoru Maeda
>
>
> 2014-07-02 19:52 GMT+09:00 Michael Sweet <msweet@apple.com>:
>
>> Roberto,
>>
>> On Jul 2, 2014, at 1:39 AM, Roberto Peon <grmocg@gmail.com> wrote:
>>
>> You're basing conclusions on today's data, instead of looking forward as
>> to what might happen when the set of headers sent adapts to the compression
>> method, making it significantly more likely for items in the reference set
>> to be emitted.
>>
>>
>> Isn't that basically confirming what Kazu found: the reference set
>> doesn't help with today's headers?
>>
>> Here is running code that demonstrates that the reference set does not
>> contribute significantly to the performance of HPACK. Unless you can
>> demonstrate a significant improvement from (simple) server/client changes,
>> your assertion that things will improve doesn't have any evidence to
>> support it.
>>
>> My observation is that the headers emitted by most web sites are not
>> controlled by the web site developer, they will rely on the underlying web
>> server and scripting engine (PHP, Perl, Python, Ruby, etc.) to do that.
>> The only header they generally do control is Set-Cookie, and then only for
>> their own site (i.e. not for the advertising networks that are used). What
>> changes on the server side would be useful here to get the full benefit of
>> the reference table?
>>
>> (And IMHO if we do have this information then it should be in the HPACK
>> spec...)
>>
>>
>>
>>
>> You may want to look at how many of those entries would be regularized if
>> HPACK was in use and servers/clients intended on sending headers that were
>> similar.
>> -=R
>>
>>
>> On Tue, Jul 1, 2014 at 10:30 PM, Kazu Yamamoto <kazu@iij.ad.jp> wrote:
>>
>>> Hi,
>>>
>>> As you may remember, I implemented several HPACK *encoding* algorithms
>>> and calculated compression ratio. I tried it again based on HPACK
>>> 08. I have 8 algorithms.
>>>
>>> - Naive    -- No compression
>>> - Naive-H  -- Using Huffman only
>>> - Static   -- Using static table only
>>> - Static-H -- Using static table and Huffman
>>> - Linear   -- Using header table
>>> - Linear-H -- Using header table and Huffman
>>> - Diff     -- Using header table and reference set
>>> - Diff-H   -- Using header table, reference set and Huffman
>>>
>>> The implementations above pass all test cases in
>>> https://github.com/http2jp/hpack-test-case/. Using these test cases as
>>> input, I calculated the compression ratio again. The ratio is calculated
>>> by dividing the number of bytes after compression by that before
>>> compression.
>>>
>>> Here are the results:
>>>
>>> Naive    1.10
>>> Naive-H  0.86
>>> Static   0.84
>>> Static-H 0.66
>>> Linear   0.39
>>> Linear-H 0.31
>>> Diff     0.39
>>> Diff-H   0.31
>>>
>>> Linear-H and Diff-H give almost the same result. By my calculation,
>>> Diff-H is only 1.6 bytes shorter than Linear-H on average. This means
>>> that the reference set does NOT contribute much to compressing headers,
>>> although it is very difficult to implement.
>>>
>>> I have NOT seen any header examples for which the reference set works
>>> effectively so far.
>>>
>>> So, if the authors of HPACK want to retain the reference set, I would
>>> like to see evidence that there are some cases in which the reference set
>>> contributes to the compression ratio. HPACK 08 says "Updated Huffman
>>> table, using data set provided by Google". So, I guess that the
>>> authors can calculate the compression ratio based on this data.
>>>
>>> If there is no such evidence, I would like to strongly recommend
>>> removing the reference set from HPACK. This makes HPACK much simpler, so
>>> implementations get fewer bugs and interoperability is improved. Plus,
>>> the order of headers is always preserved.
>>>
>>> Regards,
>>>
>>> --Kazu
>>>
>>
>> _________________________________________________________
>> Michael Sweet, Senior Printing System Engineer, PWG Chair
>>
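
For readers unfamiliar with the section-8.1.2.3 rule quoted above, here is a minimal Python sketch of what it asks an implementation to do: join the values of repeated header fields with a zero octet before compression, and split them again after decompression. The function names `join_repeated` and `split_repeated` are hypothetical, header values are modelled as `str` rather than raw octets for brevity, and this is only an illustration of the quoted text, not code from any HPACK implementation.

```python
from typing import Dict, List, Tuple

Header = Tuple[str, str]  # (name, value); hypothetical representation


def join_repeated(headers: List[Header]) -> List[Header]:
    """Concatenate values of same-named fields with a zero octet (0x0).

    Names keep their first-seen order; values keep their relative order.
    """
    merged: Dict[str, List[str]] = {}
    for name, value in headers:
        merged.setdefault(name, []).append(value)
    return [(name, "\x00".join(values)) for name, values in merged.items()]


def split_repeated(headers: List[Header]) -> List[Header]:
    """Split any value containing zero octets back into separate fields."""
    out: List[Header] = []
    for name, value in headers:
        for part in value.split("\x00"):
            out.append((name, part))
    return out


original = [("cookie", "a=1"), ("accept", "*/*"), ("cookie", "b=2")]
joined = join_repeated(original)
restored = split_repeated(joined)
```

In this sketch the two cookie values come back in their original relative order, but their interleaving with the `accept` field is lost. That is all the quoted rule promises: order among occurrences of the same name, not order across the whole header block.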