Re: bohe implementation for compression tests

James M Snell <jasnell@gmail.com> Tue, 15 January 2013 07:50 UTC

Return-Path: <ietf-http-wg-request@listhub.w3.org>
X-Original-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Delivered-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 182F421F8B3F for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Mon, 14 Jan 2013 23:50:14 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -8.675
X-Spam-Level:
X-Spam-Status: No, score=-8.675 tagged_above=-999 required=5 tests=[AWL=1.923, BAYES_00=-2.599, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_HI=-8]
Received: from mail.ietf.org ([64.170.98.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Zx+7p+FMkRDQ for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Mon, 14 Jan 2013 23:50:12 -0800 (PST)
Received: from frink.w3.org (frink.w3.org [128.30.52.56]) by ietfa.amsl.com (Postfix) with ESMTP id 2F01D21F8B3A for <httpbisa-archive-bis2Juki@lists.ietf.org>; Mon, 14 Jan 2013 23:50:11 -0800 (PST)
Received: from lists by frink.w3.org with local (Exim 4.72) (envelope-from <ietf-http-wg-request@listhub.w3.org>) id 1Tv1Fn-0003kL-RB for ietf-http-wg-dist@listhub.w3.org; Tue, 15 Jan 2013 07:48:15 +0000
Resent-Date: Tue, 15 Jan 2013 07:48:15 +0000
Resent-Message-Id: <E1Tv1Fn-0003kL-RB@frink.w3.org>
Received: from lisa.w3.org ([128.30.52.41]) by frink.w3.org with esmtp (Exim 4.72) (envelope-from <jasnell@gmail.com>) id 1Tv1Fj-0003jI-EN for ietf-http-wg@listhub.w3.org; Tue, 15 Jan 2013 07:48:11 +0000
Received: from mail-ie0-f180.google.com ([209.85.223.180]) by lisa.w3.org with esmtps (TLS1.0:RSA_ARCFOUR_SHA1:16) (Exim 4.72) (envelope-from <jasnell@gmail.com>) id 1Tv1FT-0002lF-28 for ietf-http-wg@w3.org; Tue, 15 Jan 2013 07:48:10 +0000
Received: by mail-ie0-f180.google.com with SMTP id c10so6857097ieb.11 for <ietf-http-wg@w3.org>; Mon, 14 Jan 2013 23:47:29 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=9AUOQVUJvKvxlVRJyw5EF9mdHApx+TKU9noN3u97P68=; b=VSoSQkfKnVcWupeiTXFr+m15YZCUT+p+iItkFBrOirMt476SNxj0HMAnhdid8A/lH0 FRGc7YkmnjTvyi+iSZXYxGmltv/0FBHnI0SDi73jj+44WsCkzG38f7kGQXUkDruktxWy XGebj79PojbMLFOhS9h6BcowF2AzUlOt10wNAUTadxkNKrskpaklQSDRrjiTmYS8SGvl hwq3jJ8OUsF8cTMF6TT9lIYWZoVGKxR9FH8hk5gVi7RrOFXfSpl5XP2HmZGMLEuKOCcs Oex20ISj4dkL9OtZj76SXG+E8ABIIKvmvfenjdFsuBwh0jZGYyyW2xA4qErguQwMFNWr q65Q==
Received: by 10.42.32.71 with SMTP id c7mr66593119icd.35.1358236049316; Mon, 14 Jan 2013 23:47:29 -0800 (PST)
MIME-Version: 1.0
Received: by 10.64.26.137 with HTTP; Mon, 14 Jan 2013 23:47:09 -0800 (PST)
In-Reply-To: <CAP+FsNfUHf6bUBoUF-qUbuadZzvG6BLiTuJUVtFdL3ERiVDOyg@mail.gmail.com>
References: <CABP7Rbe-B89vVm8=OnHtAG0Y3G2UOysX+DKaTQ3+rAKBJBJyKA@mail.gmail.com> <CABP7Rbc30mXrQyGE_hCQd1ydbNFcOhrGC-Mi32+tX7aqxrEjpw@mail.gmail.com> <CAP+FsNcjWOJORRkEUeY6zos9jCc8931yp6g4n0nS-mQq9LVPbg@mail.gmail.com> <CABP7RbdzVuaA0sp+F37+cMehnfKK2qtOSZKMQyKq0xXHpoMRtg@mail.gmail.com> <CAP+FsNfUHf6bUBoUF-qUbuadZzvG6BLiTuJUVtFdL3ERiVDOyg@mail.gmail.com>
From: James M Snell <jasnell@gmail.com>
Date: Mon, 14 Jan 2013 23:47:09 -0800
Message-ID: <CABP7Rbe-WEy3M+kXShwwngbB-wUhQkZjS+9onXGynwsxkDLnaQ@mail.gmail.com>
To: Roberto Peon <grmocg@gmail.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Content-Type: multipart/alternative; boundary="bcaec517cda83d9d6804d34ef926"
Received-SPF: pass client-ip=209.85.223.180; envelope-from=jasnell@gmail.com; helo=mail-ie0-f180.google.com
X-W3C-Hub-Spam-Status: No, score=-3.5
X-W3C-Hub-Spam-Report: AWL=-2.710, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001
X-W3C-Scan-Sig: lisa.w3.org 1Tv1FT-0002lF-28 0763d39cf69ae8ee459233f95a4e3552
X-Original-To: ietf-http-wg@w3.org
Subject: Re: bohe implementation for compression tests
Archived-At: <http://www.w3.org/mid/CABP7Rbe-WEy3M+kXShwwngbB-wUhQkZjS+9onXGynwsxkDLnaQ@mail.gmail.com>
Resent-From: ietf-http-wg@w3.org
X-Mailing-List: <ietf-http-wg@w3.org> archive/latest/15873
X-Loop: ietf-http-wg@w3.org
Resent-Sender: ietf-http-wg-request@w3.org
Precedence: list
List-Id: <ietf-http-wg.w3.org>
List-Help: <http://www.w3.org/Mail/>
List-Post: <mailto:ietf-http-wg@w3.org>
List-Unsubscribe: <mailto:ietf-http-wg-request@w3.org?subject=unsubscribe>

Oh yes, I fully understand... like I said, this is not intended to make it
impossible, just harder, and the idea is to throw some variations on the
table so we can get a good comparison of the options. For this particular
variation, there are only two buckets. I tested with three and the
compression efficiency drops off by an order of magnitude. With two buckets
we are able to maintain a compression ratio roughly on par with http1-gzip
and consistently better than delta. But, like you said, it's not a "fix"..
it just makes attacks like CRIME harder to execute by some degree.


On Mon, Jan 14, 2013 at 9:34 PM, Roberto Peon <grmocg@gmail.com> wrote:

> I'm trying to express, I guess, that as the number of buckets goes up, the
> compression becomes correspondingly less effective, and it has a limited
> (eventually constant) security benefit after enough requests.
>
> I believe this ends up being a binomial distribution thus, for k buckets
> and k requests I believe that we end with a probability of hitting a bucket
> that does not have the sensitive data of only 1/e (increasing security by
> roughly 36%) while decreasing compression effectiveness by strictly worse
> than 54% (i.e. 1-1/e at the limit).
>
> The attackers are better math geeks than I am. This may explain my
> wariness to play with any kind of stream compressor and make assertions
> about safety :)
> -=R
>
>
> On Mon, Jan 14, 2013 at 7:07 PM, James M Snell <jasnell@gmail.com> wrote:
>
>> The randomization is not intended to make reverse engineering impossible,
>> just a large degree more difficult (and potentially more impractical). Used
>> in combination with other strategies -- such as moving away from the use of
>> unauthenticated session ids in cookies -- it could prove effective enough
>> to get the job done.
>>
>>
>> On Mon, Jan 14, 2013 at 7:01 PM, Roberto Peon <grmocg@gmail.com> wrote:
>>
>>> An attacker would run its experiment some increased number of times and
>>> still get information about what is in the context, assuming i am
>>> understanding it correctly. :/
>>>
>>> -=R
>>> On Jan 14, 2013 5:33 PM, "James M Snell" <jasnell@gmail.com> wrote:
>>>
>>> Just continuing the investigation on various compression strategies. I
>>> spent part of the day going through delta to make sure I understand it and
>>> how it compares with bohe... I'll have some additional thoughts (and
>>> concerns) with regards to that later on... The other half of the day has
>>> been spent with various other bohe variations. Late in the after I hit upon
>>> a particularly interesting variation... I've checked it in here:
>>> https://github.com/jasnell/compression-test/tree/master/compressor/bohe4
>>>
>>> This variation encodes headers and randomly assigns them to one of two
>>> separate buckets. Those are then randomly ordered and compressed using two
>>> separate compressor instances within the header block...
>>>
>>> # +-------------+--------------------------+
>>> # | num_headers |   block 1 len (4 bytes)  |
>>> # +-------------+--------------------------+
>>> # |        compressed header block 1       |
>>> # +----------------------------+-----------+
>>> # |  block 2 len (4 bytes)     |           |
>>>  # +----------------------------+           |
>>> # |        compressed header block 2       |
>>> # +----------------------------+-----------+
>>>
>>> Because of the randomization, there is no way of determining in advance
>>> which block any individual piece of data will land... making it much harder
>>> for an attacker to use the compression ratio to reverse engineer any
>>> particular value... every time the information is sent, it can be in a
>>> different location. You can take the exact same request and encode it
>>> multiple times and end up with a different message size every time (up to a
>>> given limit, of course).
>>>
>>> Some numbers from various test runs... note how bohe4 produces variable
>>> compression ratios given identical input.
>>>
>>> ./compare_compressors.py -c bohe -c bohe4 -c delta -t
>>> /Users/james/git/http_samples/mnot/wikipedia.org.har
>>> 408 req messages processed
>>>              compressed | ratio min   max   std
>>> req  bohe        10,784 | 0.13  0.05  0.65  0.07
>>> req bohe4        13,496 | 0.16  0.05  0.69  0.08
>>>  req delta        16,725 | 0.20  0.04  0.72  0.09
>>> req http1        84,388 | 1.00  1.00  1.00  0.00
>>>
>>> 408 res messages processed
>>>              compressed | ratio min   max   std
>>> res  bohe        19,882 | 0.25  0.06  0.58  0.10
>>> res bohe4        20,610 | 0.26  0.09  0.63  0.09
>>> res delta        24,523 | 0.30  0.04  0.60  0.12
>>> res http1        80,613 | 1.00  1.00  1.00  0.00
>>>
>>> ./compare_compressors.py -c bohe -c bohe4 -c delta -t
>>> /Users/james/git/http_samples/mnot/wikipedia.org.har
>>> 408 req messages processed
>>>              compressed | ratio min   max   std
>>> req  bohe        10,784 | 0.13  0.05  0.65  0.07
>>> req bohe4        13,820 | 0.16  0.07  0.67  0.08
>>> req delta        16,725 | 0.20  0.04  0.72  0.09
>>> req http1        84,388 | 1.00  1.00  1.00  0.00
>>>
>>> 408 res messages processed
>>>              compressed | ratio min   max   std
>>> res  bohe        19,882 | 0.25  0.06  0.58  0.10
>>> res bohe4        21,644 | 0.27  0.09  0.61  0.09
>>> res delta        24,523 | 0.30  0.04  0.60  0.12
>>> res http1        80,613 | 1.00  1.00  1.00  0.00
>>>
>>> Again, this is just intended as fodder for discussion right now. I'll
>>> have some comments specifically on delta encoding tomorrow sometime.
>>>
>>> - James
>>>
>>>
>>> On Thu, Jan 10, 2013 at 11:08 AM, James M Snell <jasnell@gmail.com>wrote:
>>>
>>>> I have an initial bohe implementation for the compression tests... it's
>>>> very preliminary and uses the same gzip compression as the current spdy3.
>>>> I'm going to be playing around with the delta compression mechanism as well
>>>> and see how much of an impact that has. Initial results are very promising
>>>> but I haven't done much debugging yet. Just wanted folks to know that this
>>>> work was underway...
>>>>
>>>> https://github.com/jasnell/compression-test/tree/master/compressor/bohe
>>>>
>>>> Some test runs....
>>>>
>>>> ./compare_compressors.py -c bohe -c spdy3 -c delta
>>>> ../http_samples/mnot/amazon.com.har
>>>> 732 req messages processed
>>>>              compressed | ratio min   max   std
>>>> req  bohe        26,122 | 0.13  0.04  0.70  0.08
>>>> req delta        33,955 | 0.17  0.02  0.71  0.09
>>>> req http1       195,386 | 1.00  1.00  1.00  0.00
>>>> req spdy3        27,238 | 0.14  0.04  0.71  0.08
>>>>
>>>> 732 res messages processed
>>>>              compressed | ratio min   max   std
>>>> res  bohe        39,628 | 0.25  0.04  0.66  0.07
>>>> res delta        44,499 | 0.28  0.02  0.65  0.09
>>>> res http1       159,968 | 1.00  1.00  1.00  0.00
>>>> res spdy3        41,325 | 0.26  0.04  0.67  0.08
>>>>
>>>>
>>>> ./compare_compressors.py -c bohe -c spdy3 -c delta
>>>> ../http_samples/mnot/craigslist.org.har
>>>> 66 req messages processed
>>>>              compressed | ratio min   max   std
>>>> req  bohe         1,948 | 0.15  0.06  0.73  0.11
>>>> req delta         2,036 | 0.16  0.07  0.71  0.11
>>>> req http1        12,894 | 1.00  1.00  1.00  0.00
>>>> req spdy3         2,016 | 0.16  0.07  0.75  0.11
>>>>
>>>> 66 res messages processed
>>>>              compressed | ratio min   max   std
>>>> res  bohe         1,786 | 0.18  0.07  0.77  0.13
>>>> res delta         2,858 | 0.28  0.08  0.69  0.12
>>>> res http1        10,147 | 1.00  1.00  1.00  0.00
>>>> res spdy3         1,869 | 0.18  0.09  0.78  0.13
>>>>
>>>>
>>>> ./compare_compressors.py -c bohe -c spdy3 -c delta
>>>> ../http_samples/mnot/flickr.com.har
>>>> 438 req messages processed
>>>>              compressed | ratio min   max   std
>>>> req  bohe        11,988 | 0.10  0.02  0.69  0.07
>>>> req delta        26,372 | 0.22  0.01  0.71  0.14
>>>> req http1       121,854 | 1.00  1.00  1.00  0.00
>>>> req spdy3        12,550 | 0.10  0.02  0.71  0.07
>>>>
>>>> 438 res messages processed
>>>>              compressed | ratio min   max   std
>>>> res  bohe        13,073 | 0.09  0.05  0.66  0.06
>>>> res delta        25,236 | 0.18  0.02  0.70  0.11
>>>> res http1       140,457 | 1.00  1.00  1.00  0.00
>>>> res spdy3        14,142 | 0.10  0.05  0.66  0.06
>>>>
>>>>
>>>> ./compare_compressors.py -c bohe -c spdy3 -c delta
>>>> ../http_samples/mnot/facebook.com.har
>>>> 234 req messages processed
>>>>              compressed | ratio min   max   std
>>>> req  bohe         6,091 | 0.15  0.06  0.78  0.07
>>>> req delta         7,800 | 0.19  0.02  0.70  0.07
>>>> req http1        41,980 | 1.00  1.00  1.00  0.00
>>>> req spdy3         6,301 | 0.15  0.06  0.77  0.07
>>>>
>>>> 234 res messages processed
>>>>              compressed | ratio min   max   std
>>>> res  bohe         9,458 | 0.23  0.07  0.68  0.07
>>>> res delta        12,045 | 0.30  0.13  0.60  0.08
>>>> res http1        40,252 | 1.00  1.00  1.00  0.00
>>>> res spdy3         9,788 | 0.24  0.07  0.69  0.07
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>