Re: Dictionary Compression for HTTP (at Facebook)

Patrick McManus <mcmanus@ducksong.com> Fri, 21 September 2018 22:58 UTC

In-Reply-To: <38bd7ae4-c7f1-f547-029c-139b039d222a@fb.com>
References: <18eb0343-640c-8b95-1cc2-273bc72ec134@fb.com> <CAPapA7RLncAsHH5pr5RJSYjvPiNk8JvgBJ8T-tKebnC1C5ptHw@mail.gmail.com> <ED51E194-503A-4339-B564-A6543F42D0A1@mnot.net> <652edc11-2d19-aef9-e3fd-ecb77ab47c1a@fb.com> <CAErg=HH7bqarp4e=mj_4rSfJwi6ycECOT1Wf1t-HttGAzO8RJw@mail.gmail.com> <38bd7ae4-c7f1-f547-029c-139b039d222a@fb.com>
From: Patrick McManus <mcmanus@ducksong.com>
Date: Fri, 21 Sep 2018 18:55:26 -0400
X-Gmail-Original-Message-ID: <CAOdDvNqU8SGoguH=+j1HqepSqKbK+JnNZ6dN8SKaju=ENimXrg@mail.gmail.com>
Message-ID: <CAOdDvNqU8SGoguH=+j1HqepSqKbK+JnNZ6dN8SKaju=ENimXrg@mail.gmail.com>
To: Felix Handte <felixh@fb.com>
Cc: Ryan Sleevi <ryan-ietf@sleevi.com>, Mark Nottingham <mnot@mnot.net>, "jyrki@google.com" <jyrki@google.com>, "chaals@yandex-team.ru" <chaals@yandex-team.ru>, "eustas@google.com" <eustas@google.com>, Vlad Krasnov <vlad@cloudflare.com>, Nick Terrell <terrelln@fb.com>, Yann Collet <cyan@fb.com>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
Subject: Re: Dictionary Compression for HTTP (at Facebook)
Archived-At: <https://www.w3.org/mid/CAOdDvNqU8SGoguH=+j1HqepSqKbK+JnNZ6dN8SKaju=ENimXrg@mail.gmail.com>

Hi Felix,

On Fri, Sep 21, 2018 at 5:31 PM, Felix Handte <felixh@fb.com> wrote:

> Very well, I will attempt to grab the bull by the horns, then. Let's
> talk security.
>
> I guess my first question is this: What is the acceptance criterion for
> proposals in this space with respect to security? From my survey of
>

You are not going to be able to pre-negotiate working group acceptance
criteria. The criterion is what it always is: rough consensus on a draft
from the working group and the approval of the IESG.

But to help with more background, the past concern has been that there
hasn't been sufficient/proactive analysis of the various proposals. Given
that the mixture of compression and encryption is known to be a problem (as
you mention), a bar of "no known problems" hasn't been enough to get
anywhere near rough consensus. I believe people wanted to see a proactive
analysis of what the concerns of a particular proposal are. At that point
we can debate whether they are reasonable or not for their anticipated
gains.

Make sense? You're certainly going in a reasonable direction considering
the interactions of dictionaries, what attackers control, and the ways in
which public and private data are mixed. Of course, confidentiality can
apply to 'public data' as well, and it's not clear how/if folks would want
to handle that.
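
To make that interaction concrete, here is a minimal sketch (not anyone's
proposal, and not an analysis) of the length side channel that arises when
attacker-controlled data and a secret share a compression window. It uses
the preset-dictionary support in Python's standard zlib module. The page
template, the secret, the guesses, and the static dictionary are all
hypothetical values invented for illustration.

```python
import zlib


def compressed_len(body: bytes, zdict: bytes = b"") -> int:
    """Return the length of the zlib stream for body.

    If zdict is non-empty it is used as a preset dictionary, standing in
    for a statically defined, public compression dictionary.
    """
    kwargs = {"zdict": zdict} if zdict else {}
    comp = zlib.compressobj(level=9, **kwargs)
    return len(comp.compress(body) + comp.flush())


# Hypothetical secret that the server reflects into every response.
SECRET = b"csrf=9f2c47a1e08b5d36c1f4a7e2905bd813"

# Hypothetical public dictionary: just the page boilerplate.
STATIC_DICT = b"<html><body><p>You searched for: </p><input value=''></body></html>"


def response(attacker_input: bytes) -> bytes:
    """Build a toy response that mixes attacker input with the secret."""
    return (
        b"<html><body><p>You searched for: " + attacker_input + b"</p>"
        b"<input value='" + SECRET + b"'></body></html>"
    )


RIGHT_GUESS = SECRET
WRONG_GUESS = b"csrf=5b1e9d04c7a2f6883e0d4b79a1c652fe"

# How many bytes shorter the response is when the guess matches the secret.
gap_plain = compressed_len(response(WRONG_GUESS)) - compressed_len(response(RIGHT_GUESS))
gap_dict = compressed_len(response(WRONG_GUESS), STATIC_DICT) - compressed_len(
    response(RIGHT_GUESS), STATIC_DICT
)

print("length gap without dictionary:", gap_plain)
print("length gap with static dictionary:", gap_dict)
```

In this toy setting a correct guess yields a shorter response both with and
without the dictionary. That shows only that a static dictionary does not
remove the existing signal here; it says nothing about whether a dictionary
makes a real attack easier or harder, which is the kind of question a
proactive analysis would need to answer.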


> previous conversations on this topic, it has sounded like the bar that
> proposals are being held to is that they are expected not to have any
> vulnerabilities. This is of course a reasonable expectation in general.
> However, compression as it exists in HTTP is well known to have security
> flaws (primarily, BREACH and its extensions). Given that flawed status
> quo, in order to clear that bar, a new proposal would not only have to
> avoid introducing new vulnerabilities, it would have to solve existing
> ones.
>
> If we are going to make a serious attempt to fix BREACH et al., let's do
> so. Otherwise, let's hold compression work to a practical bar, which is
> to avoid introducing new security issues and to avoid making existing
> ones worse.
>
> If we accept that criterion, my question becomes whether there are known
> issues that would prevent the use of dictionary compression? Many people
> have invoked the idea of security concerns to explain their hesitancy to
> pursue solutions in this space. Despite the frequency with which they're
> brought up, I haven't seen any specific allegations that describe a
> vulnerability introduced by dictionary-based compression. Are there
> known attacks that are made possible or improved by the use of
> dictionaries?
>
> Obviously the above question is hugely dependent on how dictionaries are
> sourced. Since that's an open question, my sense is that it's probably
> best to look at the narrowest possible scope first and then work our way
> out from there. So I'm particularly curious whether there are known
> issues even when you leave out the challenges of dictionary creation /
> distribution / etc., when you just use statically-defined dictionaries.
>
> In particular, BREACH and friends describe the dangers of mixing private
> data and attacker-controlled data in the same compression window.
> Dictionary-based compression mixes a presumably public dictionary with
> private data. Is that sufficient to enable attacks? Or if you have
> dictionary + private data + attacker data, is that easier to attack than
> in the absence of a dictionary?
>
> I'll follow up with my own impressions of the security concerns and
> possible mitigations soon.
>
> - Felix
>
> On 08/31/2018 07:58 AM, Ryan Sleevi wrote:
> >
> >
> > On Fri, Aug 24, 2018 at 6:24 AM Felix Handte <felixh@fb.com
> > <mailto:felixh@fb.com>> wrote:
> >
> >     For our own part, we find ourselves drawn towards a solution that
> >     makes a lot of the same choices as SDCH. That is, one that treats
> >     dictionaries as explicit resources that can be dynamically
> >     advertised by an origin, fetched and cached by a client, and then
> >     negotiated to be used in requests/responses between the two. The
> >     ability to treat a previous, cached response as a base on which to
> >     apply a "diff" (negotiated by ETag?) is also attractive to us.
> >
> >
> > I would strongly advise against such solutions, as they are a
> > significant part of why SDCH support was removed from browsers.
> >
> > I think, to the set of concerns you need to consider in any such
> > solution (which, in my mind, demonstrating the security concerns can be
> > mitigated is paramount of those), you need to define not only the
> > interaction in the 'simple' HTTP sense of Request/Response pairs, but
> > also in the complexity of those interactions as they apply to browsers,
> > for which concerns like same-origin versus cross-origin apply, the
> > re-ordering of requests, and the potential of multiple requests
> > proceeding simultaneously (which H/2 also has to countenance). This also
> > further interacts with models of cache storage and in-memory
> > representation - challenges such as "What happens if a dictionary
> > expires midway during the processing of a response" were fairly fatal,
> > as were the issues around TOCTOU - that is, advertising a dictionary
> > from a request, making a request with said dictionary, and finding it
> > was evicted from the cache prior to the response.
> >
> > Models such as the approach by vkrasnov h2-compression-dictionaries are
> > substantially superior in these respects, because it more closely models
> > and defines these interactions, through the association with and scoping
> > to a single H/2 resource.
> >
> > It might be that your concern is not the dominant HTTP case of browsers,
> > in which case, it may be fine to ignore these. But I think, from the
> > experiences implementing and maintaining SDCH, models that approximate
> > that space (of resourced dictionaries, advertisements, etc) are likely
> > to be too great an implementation cost, and too great a cognitive cost
> > to the predictability of the platform, to see any meaningful adoption.
> >
> > Of course, this is all after the security concerns are mitigated ;)
>