Re: Choosing a header compression algorithm

Roberto Peon <grmocg@gmail.com> Fri, 29 March 2013 18:42 UTC

Resent-Date: Fri, 29 Mar 2013 18:39:55 +0000
Resent-Message-Id: <E1ULeDT-0001wk-76@frink.w3.org>
MIME-Version: 1.0
In-Reply-To: <6C71876BDCCD01488E70A2399529D5E5163F6C3B@ADELE.crf.canon.fr>
References: <254AABEE-22B9-418E-81B0-2729902C4413@mnot.net> <A14105FB-ED1A-4B70-8840-9648847BCC3A@mnot.net> <6C71876BDCCD01488E70A2399529D5E5163F3C67@ADELE.crf.canon.fr> <CAP+FsNfFohSwrX2DxthNcnn+wDj6T5W7xpcg4yA56Gvt_nP3_Q@mail.gmail.com> <6C71876BDCCD01488E70A2399529D5E5163F3D72@ADELE.crf.canon.fr> <7CA7F3EB-A492-471A-8AC4-23293DD10840@mnot.net> <6C71876BDCCD01488E70A2399529D5E5163F4076@ADELE.crf.canon.fr> <CAP+FsNdztfCJjvP58ryVXDRgGyGSPO-37gRMjAuwikz2eviBiw@mail.gmail.com> <CAP+FsNdQ7mNbsaAiUqEF22Oh8KMaK3UWUWFzWE=K0jQbkM7t1Q@mail.gmail.com> <6C71876BDCCD01488E70A2399529D5E5163F4263@ADELE.crf.canon.fr> <CAP+FsNeXvn6UasR6e56A7pHtrkXg6A0NnrVGmRYNW49Qu2vxqg@mail.gmail.com> <CAP+FsNfy0ognTBaF7USBcP9dTL5bQsruav30FmGB70m_0JfjyA@mail.gmail.com> <6C71876BDCCD01488E70A2399529D5E5163F68FE@ADELE.crf.canon.fr> <CAP+FsNee45EUbzRyF+58gkytX1WvD_cPJu_GW7N7YKE5NhszVw@mail.gmail.com> <6C71876BDCCD01488E70A2399529D5E5163F6C3B@ADELE.crf.canon.fr>
Date: Fri, 29 Mar 2013 11:39:13 -0700
Message-ID: <CAP+FsNddv3EHwJ+v+74ycGXc2-8BV-w6KkC1zGDSnaY60uGJow@mail.gmail.com>
From: Roberto Peon <grmocg@gmail.com>
To: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
Cc: "agl@google.com" <agl@google.com>, Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
Content-Type: multipart/alternative; boundary="20cf302239fd764c1204d9149692"
Received-SPF: pass client-ip=209.85.214.52; envelope-from=grmocg@gmail.com; helo=mail-bk0-f52.google.com
Subject: Re: Choosing a header compression algorithm
Archived-At: <http://www.w3.org/mid/CAP+FsNddv3EHwJ+v+74ycGXc2-8BV-w6KkC1zGDSnaY60uGJow@mail.gmail.com>
Resent-From: ietf-http-wg@w3.org
Resent-Sender: ietf-http-wg-request@w3.org
Precedence: list

Herve--

With the way you've implemented prefix matching, there are no guarantees on
security whatsoever.
In order to provide a guarantee of at least a minimum amount of effort on
the part of the attacker, you MUST provide guarantees on the minimum
subsequence sizes, as this relates directly to the amount of tries...
As an example,
  foo.com/a/b/c
will be exceptionally easy to figure out, as each subsequence will be
guessed in a very small amount of time.

There are many caveats and gotchas in this space, and I feel uncomfortable
that you're making assertions about safety unless you've got some security
folks looking at it critically.
I've done that for the delta compressor, and I suggest you do the same. If
it turns out that we do get consensus from security folks that some kind of
prefix matching is safe, I'm happy to add it back into delta. Until then, I
don't want to go down the rabbit hole!
-=R


On Fri, Mar 29, 2013 at 8:26 AM, RUELLAN Herve
<Herve.Ruellan@crf.canon.fr>wrote:

>
>
> > -----Original Message-----
> > From: Roberto Peon [mailto:grmocg@gmail.com]
> > Sent: jeudi 28 mars 2013 20:17
> > To: RUELLAN Herve
> > Cc: agl@google.com; Mark Nottingham; ietf-http-wg@w3.org Group
> > Subject: Re: Choosing a header compression algorithm
> >
> > The net effect of the character-delimited prefix match you propose is
> that an
> > attacker must only brute force each subsequence which matches [^/&=,
> > ]*([^/&=, ]|$) ,
> >
> > Running the numbers, that is:
> >
> >       k^s * m
> >
> > where:
> >
> >       k = possible characters
> >       s = subsequence length or characters not ending in [^/&= ,] or end-
> > of-line
> >       m = number of subsequences
> >
> >
> >
> > instead of full atom matching, which is (is a good special case of above
> where
> > m is always 1 and s is always n):
> >
> >       k^n
> >
> >
> > where:
> >
> >
> >       k = possible characters
> >       n = length of field
> >
> >
> >
> > In other words, full atom matching is *exponentially* more difficult (a
> > desirable trait)!
> >
> > In the example above, assuming 27 commonly used characters per entry, I'm
> > very likely to completely discover that data in:
> >
> >       http://www.example.com/path/first/myfile
> >
> > in:
> >
> >       27^4 + 27^5 + 27^6 tries
>
> This is roughly 400 million tries. And this figure is obtained using
> several assumptions that may not hold:
> - The number of possible characters is only 27.
> - The length of each chunk is known.
>
> > Note that it is likely that an attacker would be able to do
> substantially better
> > than above if they know the letter frequency (very likely) or have
> similar
> > data:
> > In the case of a whole atom match, I'd discover the data in:
> >
> >
> >       27^(17) tries
> >
> >
> > So, full atom matching is ~ 5 quadrillion (5,353,441,417,462,320) times
> more
> > difficult... that is a lot.
>
> True full atom matching is much harder, but limited prefix matching is
> already very hard.
>
> We must also keep in mind that each try correspond to a request that the
> attacker for the client to do. Such a huge number of requests should be
> detected somewhere, just in order to prevent DoS attacks.
>
> > When I originally considered prefix matching in the past, this was the
> math
> > that I did, and I decided that I was not confident that compressor
> > implementors would ensure that their encoder prevents prefix matching
> > when the subsequence lengths are too small (and thus easily attacked).
> The
> > same consideration applies to dynamic huffman coding. It *CAN* be safe,
> > but it requires complexity and computation, and I think there is
> significant
> > risk that implementors may knowing or unknowingly sacrifice security. It
> is
> > actually more safe to ignore delimiters and allow for a match of a
> prefix so
> > long as it is greater than a certain length. At least in that case there
> is
> > guarantee of the number of times that it must be brute forced...
> >
> > Full-atom matching is simply much more difficult to get wrong from a
> security
> > perspective, and it results in pretty decent compression...
>
> I think that prefix matching with constrained ending is fairly secure: it
> compels an attacker to used brute force to break it. In addition, it
> provides very interesting results regarding compression.



>
> Hervé.
>
> > -=R
>
>

Re: Choosing a header compression algorithm Stephen Farrell
Choosing a header compression algorithm Mark Nottingham
RE: Choosing a header compression algorithm RUELLAN Herve
Re: Choosing a header compression algorithm James M Snell
Re: Choosing a header compression algorithm Roberto Peon
Re: Choosing a header compression algorithm Mark Nottingham
Re: Choosing a header compression algorithm Roberto Peon
RE: Choosing a header compression algorithm RUELLAN Herve
Re: Choosing a header compression algorithm Roberto Peon
RE: Choosing a header compression algorithm RUELLAN Herve
Re: Choosing a header compression algorithm Martin Thomson
Re: Choosing a header compression algorithm Roberto Peon
Re: Choosing a header compression algorithm Mark Nottingham
Re: Choosing a header compression algorithm Mark Nottingham
RE: Choosing a header compression algorithm RUELLAN Herve
Re: Choosing a header compression algorithm Roberto Peon
Re: Choosing a header compression algorithm Roberto Peon
RE: Choosing a header compression algorithm RUELLAN Herve
Re: Choosing a header compression algorithm Roberto Peon
Re: Choosing a header compression algorithm Roberto Peon
Re: Choosing a header compression algorithm Roberto Peon
Re: Choosing a header compression algorithm James M Snell
Re: Choosing a header compression algorithm Roberto Peon
RE: Choosing a header compression algorithm RUELLAN Herve
RE: Choosing a header compression algorithm RUELLAN Herve
Re: Choosing a header compression algorithm Roberto Peon
Re: Choosing a header compression algorithm Roberto Peon
RE: Choosing a header compression algorithm RUELLAN Herve
Re: Choosing a header compression algorithm Roberto Peon
Re: Choosing a header compression algorithm James M Snell