RE: Compression analysis of perfect atom-based compressor
RUELLAN Herve <Herve.Ruellan@crf.canon.fr> Fri, 05 April 2013 16:04 UTC
From: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
To: Roberto Peon <grmocg@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Date: Fri, 05 Apr 2013 16:02:43 +0000
> -----Original Message-----
> From: Roberto Peon [mailto:grmocg@gmail.com]
> Sent: vendredi 5 avril 2013 01:56
> To: HTTP Working Group
> Subject: Compression analysis of perfect atom-based compressor
>
> [snip]
>
> Take aways?
>
> * We need a better survey of headers from everywhere :)
>
> * Compression over our corpus should scale favorably with small table
> (and state) size.
>
> * Encoding index as dist-from-newest really works well, and LRU
> appears to be extremely effective as an expiration policy (the attached graph
> looks good).

For HeaderDiff, we changed the expiration policy on the encoder side to use LRU: we found it was more effective than our previous "smart" algorithm.

> * We're getting substantial compression from both key and value
> backreferences/tokenization.
>
> * Algorithmically, there isn't a whole lot to do-- the devil is really in the
> serialization details and the tradeoffs involved in generating/parsing. There
> are obvious tweaks that compressors could do when space constrained (e.g.
> looking at the first table, above, as the likely benefit and making decisions
> based upon that), but the data which suggests that the LRU is so effective
> also suggests that this benefit is likely limited unless they can predict the
> future :)
>

For information, here is the size of the literal values that have to be transmitted.

Requests
---------
header name         | Encoded value size
:path               | 1454063
referer             | 245761
cookie              | 112619
user-agent          | 63628
accept              | 23975
:host               | 22855
accept-language     | 12325
accept-charset      | 11100
accept-encoding     | 9223
if-modified-since   | 3915
nt_w3c              | 2650
:scheme             | 2623
:method             | 1961

Responses
---------
header name         | Encoded value size
last-modified       | 266750
expires             | 199575
date                | 155803
etag                | 114298
set-cookie          | 110527
via                 | 84709
cache-control       | 65506
location            | 61699
content-length      | 49104
x-amz-cf-id         | 43120
x-amz-id-2          | 38960
x-varnish           | 32757
p3p                 | 28416
content-type        | 27362
age                 | 25023
content-disposition | 22147
x-cache             | 15376
x-cache-lookup      | 13911
server              | 13199
x-amz-request-id    | 9168
x-fb-debug          | 7348
vary                | 4688
x-json              | 4384

To get any compression on these values we can use:
- Deflate (I'm not sure it will get any traction in the group ;-)).
- Prefix sharing (we're looking for a way to make it fully secure).
- Static Huffman encoding (it adds some computational costs).
- Typed codec (it should work well with last-modified, expires, date...).

Hervé.
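
As a rough illustration of the typed-codec option for date-valued headers (last-modified, expires, date), here is a minimal Python sketch. It is not taken from HeaderDiff or from any proposal on the list: the function names encode_http_date and decode_http_date and the 4-byte big-endian Unix timestamp layout are assumptions made for this example. It only round-trips values already in the canonical "Day, DD Mon YYYY HH:MM:SS GMT" form; a real codec would presumably need a literal fallback for anything else (for instance an expires value of "0").

```python
import struct
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def encode_http_date(value: str) -> bytes:
    """Pack an HTTP-date string into a 4-byte big-endian Unix timestamp.

    Hypothetical wire layout chosen for illustration only.
    """
    dt = parsedate_to_datetime(value)
    return struct.pack(">I", int(dt.timestamp()))


def decode_http_date(data: bytes) -> str:
    """Rebuild the canonical HTTP-date string from the 4-byte form."""
    (seconds,) = struct.unpack(">I", data)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # usegmt=True emits the "GMT" suffix instead of "+0000".
    return format_datetime(dt, usegmt=True)


literal = "Fri, 05 Apr 2013 16:02:43 GMT"
packed = encode_http_date(literal)
assert decode_http_date(packed) == literal
print(len(literal), "->", len(packed))
```

On this sample value the 29-character literal becomes 4 bytes, before any indexing or delta coding against a previous value.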