Re: bohe implementation for compression tests

James M Snell <jasnell@gmail.com> Tue, 15 January 2013 01:32 UTC

In-Reply-To: <CABP7Rbe-B89vVm8=OnHtAG0Y3G2UOysX+DKaTQ3+rAKBJBJyKA@mail.gmail.com>
References: <CABP7Rbe-B89vVm8=OnHtAG0Y3G2UOysX+DKaTQ3+rAKBJBJyKA@mail.gmail.com>
From: James M Snell <jasnell@gmail.com>
Date: Mon, 14 Jan 2013 17:29:54 -0800
Message-ID: <CABP7Rbc30mXrQyGE_hCQd1ydbNFcOhrGC-Mi32+tX7aqxrEjpw@mail.gmail.com>
To: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Subject: Re: bohe implementation for compression tests
Archived-At: <http://www.w3.org/mid/CABP7Rbc30mXrQyGE_hCQd1ydbNFcOhrGC-Mi32+tX7aqxrEjpw@mail.gmail.com>

Just continuing the investigation into various compression strategies. I
spent part of the day going through delta to make sure I understand it and
how it compares with bohe... I'll have some additional thoughts (and
concerns) on that later on... The other half of the day was spent on
various other bohe variations. Late in the afternoon I hit upon a
particularly interesting variation... I've checked it in here:
https://github.com/jasnell/compression-test/tree/master/compressor/bohe4

This variation encodes the headers and randomly assigns each one to one of
two separate buckets. Each bucket is then randomly ordered and compressed
with its own compressor instance, and both compressed blocks are carried
within the header block...

# +-------------+--------------------------+
# | num_headers |   block 1 len (4 bytes)  |
# +-------------+--------------------------+
# |        compressed header block 1       |
# +----------------------------+-----------+
# |  block 2 len (4 bytes)     |           |
# +----------------------------+           |
# |        compressed header block 2       |
# +----------------------------+-----------+

Because of the randomization, there is no way of determining in advance
which block any individual piece of data will land in... making it much
harder for an attacker to use the compression ratio to reverse engineer any
particular value... every time the information is sent, it can be in a
different location. You can take the exact same request, encode it multiple
times, and end up with a different message size every time (up to a given
limit, of course).
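
To make the layout and the randomization described above concrete, here is a
rough Python sketch. It is not the bohe4 code from the repository: the class
names, the 1-byte num_headers field, the plain "name: value" text encoding
(bohe itself is a binary encoding) and the use of zlib with a sync flush are
all assumptions made for illustration.

```python
import random
import struct
import zlib


class TwoBucketEncoder:
    """Illustrative two-bucket encoder (hypothetical, not the bohe4 source)."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        # One long-lived compressor per bucket (assumed, SPDY-style).
        self.compressors = [zlib.compressobj(), zlib.compressobj()]

    def encode(self, headers):
        buckets = [[], []]
        for name, value in headers:
            # Stand-in text encoding; the real bohe encoding is binary.
            line = f"{name}: {value}\n".encode("utf-8")
            # Randomly pick which of the two buckets this header lands in.
            buckets[self.rng.randrange(2)].append(line)
        out = [struct.pack("!B", len(headers))]  # assumed 1-byte header count
        for bucket, comp in zip(buckets, self.compressors):
            self.rng.shuffle(bucket)  # random order within the bucket
            block = comp.compress(b"".join(bucket))
            block += comp.flush(zlib.Z_SYNC_FLUSH)
            out.append(struct.pack("!I", len(block)))  # 4-byte block length
            out.append(block)
        return b"".join(out)


class TwoBucketDecoder:
    """Reads both blocks back; header order is not preserved."""

    def __init__(self):
        self.decompressors = [zlib.decompressobj(), zlib.decompressobj()]

    def decode(self, data):
        (count,) = struct.unpack_from("!B", data, 0)
        offset = 1
        headers = []
        for decomp in self.decompressors:
            (length,) = struct.unpack_from("!I", data, offset)
            offset += 4
            text = decomp.decompress(data[offset:offset + length])
            offset += length
            for line in text.decode("utf-8").splitlines():
                name, _, value = line.partition(": ")
                headers.append((name, value))
        if len(headers) != count:
            raise ValueError("header count mismatch")
        return headers
```

Encoding the same header list several times with one encoder and printing
len() of each result should show the sizes moving around from message to
message, since the bucket assignment and ordering change on every call.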

Some numbers from various test runs... note how bohe4 produces variable
compression ratios given identical input.

./compare_compressors.py -c bohe -c bohe4 -c delta -t /Users/james/git/http_samples/mnot/wikipedia.org.har
408 req messages processed
             compressed | ratio min   max   std
req  bohe        10,784 | 0.13  0.05  0.65  0.07
req bohe4        13,496 | 0.16  0.05  0.69  0.08
req delta        16,725 | 0.20  0.04  0.72  0.09
req http1        84,388 | 1.00  1.00  1.00  0.00

408 res messages processed
             compressed | ratio min   max   std
res  bohe        19,882 | 0.25  0.06  0.58  0.10
res bohe4        20,610 | 0.26  0.09  0.63  0.09
res delta        24,523 | 0.30  0.04  0.60  0.12
res http1        80,613 | 1.00  1.00  1.00  0.00

./compare_compressors.py -c bohe -c bohe4 -c delta -t /Users/james/git/http_samples/mnot/wikipedia.org.har
408 req messages processed
             compressed | ratio min   max   std
req  bohe        10,784 | 0.13  0.05  0.65  0.07
req bohe4        13,820 | 0.16  0.07  0.67  0.08
req delta        16,725 | 0.20  0.04  0.72  0.09
req http1        84,388 | 1.00  1.00  1.00  0.00

408 res messages processed
             compressed | ratio min   max   std
res  bohe        19,882 | 0.25  0.06  0.58  0.10
res bohe4        21,644 | 0.27  0.09  0.61  0.09
res delta        24,523 | 0.30  0.04  0.60  0.12
res http1        80,613 | 1.00  1.00  1.00  0.00
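
For anyone reading the tables: the ratio column looks consistent with total
compressed bytes divided by the http1 total. That is my reading of the
numbers rather than something checked against the script, and the ratio()
helper below is just for illustration. A quick check on the request totals
from the two runs:

```python
def ratio(compressed_bytes, http1_bytes):
    """Compressed size as a fraction of the http1 baseline, to 2 places."""
    return round(compressed_bytes / http1_bytes, 2)


# Request totals copied from the two wikipedia.org runs.
HTTP1_REQ = 84_388
totals = [
    ("bohe", 10_784),
    ("bohe4 run 1", 13_496),
    ("bohe4 run 2", 13_820),
    ("delta", 16_725),
]
for label, size in totals:
    print(f"{label:12s} {ratio(size, HTTP1_REQ):.2f}")
```

Both bohe4 runs round to 0.16 even though the totals differ by 324 bytes, so
the variation shows up more clearly in the compressed column than in the
ratio.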

Again, this is just intended as fodder for discussion right now. I'll have
some comments specifically on delta encoding tomorrow sometime.

- James


On Thu, Jan 10, 2013 at 11:08 AM, James M Snell <jasnell@gmail.com> wrote:

> I have an initial bohe implementation for the compression tests... it's
> very preliminary and uses the same gzip compression as the current spdy3.
> I'm going to be playing around with the delta compression mechanism as well
> and see how much of an impact that has. Initial results are very promising
> but I haven't done much debugging yet. Just wanted folks to know that this
> work was underway...
>
> https://github.com/jasnell/compression-test/tree/master/compressor/bohe
>
> Some test runs....
>
> ./compare_compressors.py -c bohe -c spdy3 -c delta ../http_samples/mnot/amazon.com.har
> 732 req messages processed
>              compressed | ratio min   max   std
> req  bohe        26,122 | 0.13  0.04  0.70  0.08
> req delta        33,955 | 0.17  0.02  0.71  0.09
> req http1       195,386 | 1.00  1.00  1.00  0.00
> req spdy3        27,238 | 0.14  0.04  0.71  0.08
>
> 732 res messages processed
>              compressed | ratio min   max   std
> res  bohe        39,628 | 0.25  0.04  0.66  0.07
> res delta        44,499 | 0.28  0.02  0.65  0.09
> res http1       159,968 | 1.00  1.00  1.00  0.00
> res spdy3        41,325 | 0.26  0.04  0.67  0.08
>
>
> ./compare_compressors.py -c bohe -c spdy3 -c delta ../http_samples/mnot/craigslist.org.har
> 66 req messages processed
>              compressed | ratio min   max   std
> req  bohe         1,948 | 0.15  0.06  0.73  0.11
> req delta         2,036 | 0.16  0.07  0.71  0.11
> req http1        12,894 | 1.00  1.00  1.00  0.00
> req spdy3         2,016 | 0.16  0.07  0.75  0.11
>
> 66 res messages processed
>              compressed | ratio min   max   std
> res  bohe         1,786 | 0.18  0.07  0.77  0.13
> res delta         2,858 | 0.28  0.08  0.69  0.12
> res http1        10,147 | 1.00  1.00  1.00  0.00
> res spdy3         1,869 | 0.18  0.09  0.78  0.13
>
>
> ./compare_compressors.py -c bohe -c spdy3 -c delta ../http_samples/mnot/flickr.com.har
> 438 req messages processed
>              compressed | ratio min   max   std
> req  bohe        11,988 | 0.10  0.02  0.69  0.07
> req delta        26,372 | 0.22  0.01  0.71  0.14
> req http1       121,854 | 1.00  1.00  1.00  0.00
> req spdy3        12,550 | 0.10  0.02  0.71  0.07
>
> 438 res messages processed
>              compressed | ratio min   max   std
> res  bohe        13,073 | 0.09  0.05  0.66  0.06
> res delta        25,236 | 0.18  0.02  0.70  0.11
> res http1       140,457 | 1.00  1.00  1.00  0.00
> res spdy3        14,142 | 0.10  0.05  0.66  0.06
>
>
> ./compare_compressors.py -c bohe -c spdy3 -c delta ../http_samples/mnot/facebook.com.har
> 234 req messages processed
>              compressed | ratio min   max   std
> req  bohe         6,091 | 0.15  0.06  0.78  0.07
> req delta         7,800 | 0.19  0.02  0.70  0.07
> req http1        41,980 | 1.00  1.00  1.00  0.00
> req spdy3         6,301 | 0.15  0.06  0.77  0.07
>
> 234 res messages processed
>              compressed | ratio min   max   std
> res  bohe         9,458 | 0.23  0.07  0.68  0.07
> res delta        12,045 | 0.30  0.13  0.60  0.08
> res http1        40,252 | 1.00  1.00  1.00  0.00
> res spdy3         9,788 | 0.24  0.07  0.69  0.07
>