Re: Choosing a header compression algorithm

Roberto Peon <grmocg@gmail.com> Thu, 28 March 2013 19:18 UTC

Return-Path: <ietf-http-wg-request@listhub.w3.org>
X-Original-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Delivered-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 273A021F905A for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Thu, 28 Mar 2013 12:18:32 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -11.162
X-Spam-Level:
X-Spam-Status: No, score=-11.162 tagged_above=-999 required=5 tests=[AWL=0.836, BAYES_00=-2.599, GB_I_LETTER=-2, HTML_MESSAGE=0.001, J_CHICKENPOX_62=0.6, RCVD_IN_DNSWL_HI=-8]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id lbq4IurLh8zv for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Thu, 28 Mar 2013 12:18:30 -0700 (PDT)
Received: from frink.w3.org (frink.w3.org [128.30.52.56]) by ietfa.amsl.com (Postfix) with ESMTP id C00AE21F9049 for <httpbisa-archive-bis2Juki@lists.ietf.org>; Thu, 28 Mar 2013 12:18:30 -0700 (PDT)
Received: from lists by frink.w3.org with local (Exim 4.72) (envelope-from <ietf-http-wg-request@listhub.w3.org>) id 1ULIKL-0005Tz-NH for ietf-http-wg-dist@listhub.w3.org; Thu, 28 Mar 2013 19:17:33 +0000
Resent-Date: Thu, 28 Mar 2013 19:17:33 +0000
Resent-Message-Id: <E1ULIKL-0005Tz-NH@frink.w3.org>
Received: from lisa.w3.org ([128.30.52.41]) by frink.w3.org with esmtp (Exim 4.72) (envelope-from <grmocg@gmail.com>) id 1ULIK9-0005Sm-I3 for ietf-http-wg@listhub.w3.org; Thu, 28 Mar 2013 19:17:21 +0000
Received: from mail-ob0-f178.google.com ([209.85.214.178]) by lisa.w3.org with esmtps (TLS1.0:RSA_ARCFOUR_SHA1:16) (Exim 4.72) (envelope-from <grmocg@gmail.com>) id 1ULIK8-0006Rj-1k for ietf-http-wg@w3.org; Thu, 28 Mar 2013 19:17:21 +0000
Received: by mail-ob0-f178.google.com with SMTP id wd20so9420873obb.23 for <ietf-http-wg@w3.org>; Thu, 28 Mar 2013 12:16:54 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=+67b8BcAjGt70YSeDW3/kgFKUvbrNMQb+7utOfZRdUw=; b=j9R0GAEFXsMBW/6VVL2megv59MiXA7xE3uMRX0swCSMK0xKhLnxap3/JkZ75MRtkl+ gaebjHQfvq+ZfSAHfhD5dztcCYDqeK5q271g2QheviTGxFu7BraIq0zNAl7qM19I1tZQ RaZbKOnb0G0ZE85hOm9DCZhtddEw4h/g/cD/0kh8d+1pgyVscs8hX1wkUz1DHh+zLxpF mZmv1rjptGAaF3Fdlh33wwDn24fLpJd4ER+mCEQZS18QeHqCFo/Z8I64/UOlJLb7IrDk sipRuWVwEAD2bQBCwTj31M+c1RQGs5zUqdLpGm8244AyA++5E4mOLGg0OKCOP0z+77LD cQiA==
MIME-Version: 1.0
X-Received: by 10.182.23.101 with SMTP id l5mr6016218obf.16.1364498213969; Thu, 28 Mar 2013 12:16:53 -0700 (PDT)
Received: by 10.76.109.72 with HTTP; Thu, 28 Mar 2013 12:16:53 -0700 (PDT)
In-Reply-To: <6C71876BDCCD01488E70A2399529D5E5163F68FE@ADELE.crf.canon.fr>
References: <254AABEE-22B9-418E-81B0-2729902C4413@mnot.net> <A14105FB-ED1A-4B70-8840-9648847BCC3A@mnot.net> <6C71876BDCCD01488E70A2399529D5E5163F3C67@ADELE.crf.canon.fr> <CAP+FsNfFohSwrX2DxthNcnn+wDj6T5W7xpcg4yA56Gvt_nP3_Q@mail.gmail.com> <6C71876BDCCD01488E70A2399529D5E5163F3D72@ADELE.crf.canon.fr> <7CA7F3EB-A492-471A-8AC4-23293DD10840@mnot.net> <6C71876BDCCD01488E70A2399529D5E5163F4076@ADELE.crf.canon.fr> <CAP+FsNdztfCJjvP58ryVXDRgGyGSPO-37gRMjAuwikz2eviBiw@mail.gmail.com> <CAP+FsNdQ7mNbsaAiUqEF22Oh8KMaK3UWUWFzWE=K0jQbkM7t1Q@mail.gmail.com> <6C71876BDCCD01488E70A2399529D5E5163F4263@ADELE.crf.canon.fr> <CAP+FsNeXvn6UasR6e56A7pHtrkXg6A0NnrVGmRYNW49Qu2vxqg@mail.gmail.com> <CAP+FsNfy0ognTBaF7USBcP9dTL5bQsruav30FmGB70m_0JfjyA@mail.gmail.com> <6C71876BDCCD01488E70A2399529D5E5163F68FE@ADELE.crf.canon.fr>
Date: Thu, 28 Mar 2013 15:16:53 -0400
Message-ID: <CAP+FsNee45EUbzRyF+58gkytX1WvD_cPJu_GW7N7YKE5NhszVw@mail.gmail.com>
From: Roberto Peon <grmocg@gmail.com>
To: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
Cc: "agl@google.com" <agl@google.com>, Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
Content-Type: multipart/alternative; boundary="f46d043749e157761e04d900ffaa"
Received-SPF: pass client-ip=209.85.214.178; envelope-from=grmocg@gmail.com; helo=mail-ob0-f178.google.com
X-W3C-Hub-Spam-Status: No, score=-3.5
X-W3C-Hub-Spam-Report: AWL=-2.680, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001
X-W3C-Scan-Sig: lisa.w3.org 1ULIK8-0006Rj-1k fe9ae704256e4110fef5905b857246d5
X-Original-To: ietf-http-wg@w3.org
Subject: Re: Choosing a header compression algorithm
Archived-At: <http://www.w3.org/mid/CAP+FsNee45EUbzRyF+58gkytX1WvD_cPJu_GW7N7YKE5NhszVw@mail.gmail.com>
Resent-From: ietf-http-wg@w3.org
X-Mailing-List: <ietf-http-wg@w3.org> archive/latest/17166
X-Loop: ietf-http-wg@w3.org
Resent-Sender: ietf-http-wg-request@w3.org
Precedence: list
List-Id: <ietf-http-wg.w3.org>
List-Help: <http://www.w3.org/Mail/>
List-Post: <mailto:ietf-http-wg@w3.org>
List-Unsubscribe: <mailto:ietf-http-wg-request@w3.org?subject=unsubscribe>

The net effect of the character-delimited prefix match you propose is that
an attacker must only brute force each subsequence which matches [^/&=,
]*([^/&=, ]|$) ,

Running the numbers, that is:

k^s * m

where:

k = possible characters
s = subsequence length or characters not ending in [^/&= ,] or end-of-line
m = number of subsequences



instead of full atom matching, which is (is a good special case of above
where m is always 1 and s is always n):

k^n

where:

k = possible characters
n = length of field



In other words, full atom matching is *exponentially* more difficult (a
desirable trait)!

In the example above, assuming 27 commonly used characters per entry, I'm
very likely to completely discover that data in:

http://www.example.com/path/first/myfile

in:

27^4 + 27^5 + 27^6 tries


Note that it is likely that an attacker would be able to do substantially
better than above if they know the letter frequency (very likely) or have
similar data:
In the case of a whole atom match, I'd discover the data in:

27^(17) tries


So, full atom matching is ~ 5 quadrillion (5,353,441,417,462,320) times
more difficult... that is a lot.


When I originally considered prefix matching in the past, this was the math
that I did, and I decided that I was not confident that compressor
implementors would ensure that their encoder prevents prefix matching when
the subsequence lengths are too small (and thus easily attacked). The same
consideration applies to dynamic huffman coding. It *CAN* be safe, but it
requires complexity and computation, and I think there is significant risk
that implementors may knowing or unknowingly sacrifice security. It is
actually more safe to ignore delimiters and allow for a match of a prefix
so long as it is greater than a certain length. At least in that case there
is guarantee of the number of times that it must be brute forced...

Full-atom matching is simply much more difficult to get wrong from a
security perspective, and it results in pretty decent compression...

-=R



On Thu, Mar 28, 2013 at 1:00 PM, RUELLAN Herve
<Herve.Ruellan@crf.canon.fr>wrote:

> > -----Original Message-----
> > From: Roberto Peon [mailto:grmocg@gmail.com]
> > Sent: jeudi 28 mars 2013 01:15
> > To: RUELLAN Herve
> > Cc: agl@google.com; Mark Nottingham; ietf-http-wg@w3.org Group
> > Subject: Re: Choosing a header compression algorithm
> >
> > I've checked in some changes to delta2 which expands and documents
> > various options for delta2 in the README.md.
> >
> > After running a number of variations of delta2, The following defaults
> look
> > good for small buffer sizes:
> >
> >
> > delta2=max_entries=256, small_index=1
> >
> > small_index basically says use a uint8 instead of a uint16 for
> representing
> > indices, and is the kind of thing that could be messaged somewhere
> (opcode,
> > flag, whatever).
> >
> > The best headerdiff option which I believe is safe against CRIME in the
> future
> > is:
> >   headerdiff=delta_type=false,Huffman
>
> I think that for headerdiff, the best option which is safe against CRIME
> is:
> headerdiff=delta_type='/&= \coma',Huffman
>
> >
> > I removed prefix matching from delta some months ago (~6 I think?) after
> > cogitating on it for a while and then speaking with security folks.. I
> just
> > couldn't come up with a way I could prove was safe, unlike the atom-
> > matching, which one can prove is no worse than a brute-force attack.
>
> The limited prefix matching defined above also need a brute-force attack
> to be broken.
>
> Hervé.
>
> >
> > I've appended runs with these values@4k buffer size for delta2 and
> > headerdiff below.
> > -=R
> >
> >
> >
> >
> > * TOTAL: 5949 req messages
> >
>                                               size  time | ratio
> > min   max   std
> >
>                                http1     3,460,925  0.13 |
> > 1.00  1.00  1.00  0.00
> >   delta2 (max_byte_size=4096, max_entries=256, small_index=1,
> > hg_adjust=0, implicit_hg_add=0, refcnt_vals=0)       664,683  4.16 |
> 0.19  0.02
> > 0.83  0.15
> >                                                          headerdiff
> (buffer=4096, delta_type=false,
> > huffman)       759,783  2.03 | 0.22  0.01  0.78  0.18
> >
> >
> > * TOTAL: 5948 res messages
> >
>                                               size  time | ratio
> > min   max   std
> >
>                                http1     2,186,162  0.12 |
> > 1.00  1.00  1.00  0.00
> >   delta2 (max_byte_size=4096, max_entries=256, small_index=1,
> > hg_adjust=0, implicit_hg_add=0, refcnt_vals=0)       585,475  5.32 |
> 0.27  0.02
> > 1.28  0.13
> >                                                          headerdiff
> (buffer=4096, delta_type=false,
> > huffman)       543,047  3.29 | 0.25  0.02  0.73  0.14
> >
> >
> >
> >
>
>