RE: Choosing a header compression algorithm

RUELLAN Herve <Herve.Ruellan@crf.canon.fr> Fri, 29 March 2013 15:27 UTC

Return-Path: <ietf-http-wg-request@listhub.w3.org>
X-Original-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Delivered-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id E6FF321F93E7 for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Fri, 29 Mar 2013 08:27:37 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -10.707
X-Spam-Level:
X-Spam-Status: No, score=-10.707 tagged_above=-999 required=5 tests=[AWL=1.542, BAYES_00=-2.599, GB_I_LETTER=-2, HELO_EQ_FR=0.35, RCVD_IN_DNSWL_HI=-8]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id X24f6Kwhd67r for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Fri, 29 Mar 2013 08:27:37 -0700 (PDT)
Received: from frink.w3.org (frink.w3.org [128.30.52.56]) by ietfa.amsl.com (Postfix) with ESMTP id 1ADFA21F9402 for <httpbisa-archive-bis2Juki@lists.ietf.org>; Fri, 29 Mar 2013 08:27:36 -0700 (PDT)
Received: from lists by frink.w3.org with local (Exim 4.72) (envelope-from <ietf-http-wg-request@listhub.w3.org>) id 1ULbDA-0002XB-PE for ietf-http-wg-dist@listhub.w3.org; Fri, 29 Mar 2013 15:27:24 +0000
Resent-Date: Fri, 29 Mar 2013 15:27:24 +0000
Resent-Message-Id: <E1ULbDA-0002XB-PE@frink.w3.org>
Received: from lisa.w3.org ([128.30.52.41]) by frink.w3.org with esmtp (Exim 4.72) (envelope-from <Herve.Ruellan@crf.canon.fr>) id 1ULbCy-0002WB-F2 for ietf-http-wg@listhub.w3.org; Fri, 29 Mar 2013 15:27:12 +0000
Received: from inari-msr.crf.canon.fr ([194.2.158.67]) by lisa.w3.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.72) (envelope-from <Herve.Ruellan@crf.canon.fr>) id 1ULbCx-0001Gd-B0 for ietf-http-wg@w3.org; Fri, 29 Mar 2013 15:27:12 +0000
Received: from mir-msr.corp.crf.canon.fr (mir-msr.corp.crf.canon.fr [172.19.77.98]) by inari-msr.crf.canon.fr (8.13.8/8.13.8) with ESMTP id r2TFQe59027799; Fri, 29 Mar 2013 16:26:40 +0100
Received: from ADELE.crf.canon.fr (adele.fesl2.crf.canon.fr [172.19.70.17]) by mir-msr.corp.crf.canon.fr (8.13.8/8.13.8) with ESMTP id r2TFQeWX030002; Fri, 29 Mar 2013 16:26:40 +0100
Received: from ADELE.crf.canon.fr ([::1]) by ADELE.crf.canon.fr ([::1]) with mapi id 14.02.0342.003; Fri, 29 Mar 2013 16:26:40 +0100
From: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
To: Roberto Peon <grmocg@gmail.com>
CC: "agl@google.com" <agl@google.com>, Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
Thread-Topic: Choosing a header compression algorithm
Thread-Index: AQHOJgaI2I9gxWQLAUOB6sMRXNhzJpixMSEAgAB5jdCAAEKngIAAFPgQgAPbnwCAAMyn8IAACJEAgABgHoCAAJ7DUIAAi5aAgAH4HwCAASiv0IAAFmKAgAFZ7gA=
Date: Fri, 29 Mar 2013 15:26:39 +0000
Message-ID: <6C71876BDCCD01488E70A2399529D5E5163F6C3B@ADELE.crf.canon.fr>
References: <254AABEE-22B9-418E-81B0-2729902C4413@mnot.net> <A14105FB-ED1A-4B70-8840-9648847BCC3A@mnot.net> <6C71876BDCCD01488E70A2399529D5E5163F3C67@ADELE.crf.canon.fr> <CAP+FsNfFohSwrX2DxthNcnn+wDj6T5W7xpcg4yA56Gvt_nP3_Q@mail.gmail.com> <6C71876BDCCD01488E70A2399529D5E5163F3D72@ADELE.crf.canon.fr> <7CA7F3EB-A492-471A-8AC4-23293DD10840@mnot.net> <6C71876BDCCD01488E70A2399529D5E5163F4076@ADELE.crf.canon.fr> <CAP+FsNdztfCJjvP58ryVXDRgGyGSPO-37gRMjAuwikz2eviBiw@mail.gmail.com> <CAP+FsNdQ7mNbsaAiUqEF22Oh8KMaK3UWUWFzWE=K0jQbkM7t1Q@mail.gmail.com> <6C71876BDCCD01488E70A2399529D5E5163F4263@ADELE.crf.canon.fr> <CAP+FsNeXvn6UasR6e56A7pHtrkXg6A0NnrVGmRYNW49Qu2vxqg@mail.gmail.com> <CAP+FsNfy0ognTBaF7USBcP9dTL5bQsruav30FmGB70m_0JfjyA@mail.gmail.com> <6C71876BDCCD01488E70A2399529D5E5163F68FE@ADELE.crf.canon.fr> <CAP+FsNee45EUbzRyF+58gkytX1WvD_cPJu_GW7N7YKE5NhszVw@mail.gmail.com>
In-Reply-To: <CAP+FsNee45EUbzRyF+58gkytX1WvD_cPJu_GW7N7YKE5NhszVw@mail.gmail.com>
Accept-Language: en-US, fr-FR
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [172.20.8.8]
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Received-SPF: none client-ip=194.2.158.67; envelope-from=Herve.Ruellan@crf.canon.fr; helo=inari-msr.crf.canon.fr
X-W3C-Hub-Spam-Status: No, score=-4.1
X-W3C-Hub-Spam-Report: AWL=-2.777, RP_MATCHES_RCVD=-1.303
X-W3C-Scan-Sig: lisa.w3.org 1ULbCx-0001Gd-B0 9c80c2f6ec5de00b2ec56390c7b9f89d
X-Original-To: ietf-http-wg@w3.org
Subject: RE: Choosing a header compression algorithm
Archived-At: <http://www.w3.org/mid/6C71876BDCCD01488E70A2399529D5E5163F6C3B@ADELE.crf.canon.fr>
Resent-From: ietf-http-wg@w3.org
X-Mailing-List: <ietf-http-wg@w3.org> archive/latest/17172
X-Loop: ietf-http-wg@w3.org
Resent-Sender: ietf-http-wg-request@w3.org
Precedence: list
List-Id: <ietf-http-wg.w3.org>
List-Help: <http://www.w3.org/Mail/>
List-Post: <mailto:ietf-http-wg@w3.org>
List-Unsubscribe: <mailto:ietf-http-wg-request@w3.org?subject=unsubscribe>

> -----Original Message-----
> From: Roberto Peon [mailto:grmocg@gmail.com]
> Sent: jeudi 28 mars 2013 20:17
> To: RUELLAN Herve
> Cc: agl@google.com; Mark Nottingham; ietf-http-wg@w3.org Group
> Subject: Re: Choosing a header compression algorithm
> 
> The net effect of the character-delimited prefix match you propose is that an
> attacker must only brute force each subsequence which matches [^/&=,
> ]*([^/&=, ]|$) ,
> 
> Running the numbers, that is:
> 
> 	k^s * m
> 
> where:
> 
> 	k = possible characters
> 	s = subsequence length or characters not ending in [^/&= ,] or end-
> of-line
> 	m = number of subsequences
> 
> 
> 
> instead of full atom matching, which is (is a good special case of above where
> m is always 1 and s is always n):
> 
> 	k^n
> 
> 
> where:
> 
> 
> 	k = possible characters
> 	n = length of field
> 
> 
> 
> In other words, full atom matching is *exponentially* more difficult (a
> desirable trait)!
> 
> In the example above, assuming 27 commonly used characters per entry, I'm
> very likely to completely discover that data in:
> 
> 	http://www.example.com/path/first/myfile
> 
> in:
> 
> 	27^4 + 27^5 + 27^6 tries

This is roughly 400 million tries. And this figure is obtained using several assumptions that may not hold:
- The number of possible characters is only 27.
- The length of each chunk is known.

> Note that it is likely that an attacker would be able to do substantially better
> than above if they know the letter frequency (very likely) or have similar
> data:
> In the case of a whole atom match, I'd discover the data in:
> 
> 
> 	27^(17) tries
> 
> 
> So, full atom matching is ~ 5 quadrillion (5,353,441,417,462,320) times more
> difficult... that is a lot.

True full atom matching is much harder, but limited prefix matching is already very hard.

We must also keep in mind that each try correspond to a request that the attacker for the client to do. Such a huge number of requests should be detected somewhere, just in order to prevent DoS attacks.

> When I originally considered prefix matching in the past, this was the math
> that I did, and I decided that I was not confident that compressor
> implementors would ensure that their encoder prevents prefix matching
> when the subsequence lengths are too small (and thus easily attacked). The
> same consideration applies to dynamic huffman coding. It *CAN* be safe,
> but it requires complexity and computation, and I think there is significant
> risk that implementors may knowing or unknowingly sacrifice security. It is
> actually more safe to ignore delimiters and allow for a match of a prefix so
> long as it is greater than a certain length. At least in that case there is
> guarantee of the number of times that it must be brute forced...
> 
> Full-atom matching is simply much more difficult to get wrong from a security
> perspective, and it results in pretty decent compression...

I think that prefix matching with constrained ending is fairly secure: it compels an attacker to used brute force to break it. In addition, it provides very interesting results regarding compression.

Hervé.

> -=R