Re: Dictionary Compression for HTTP (at Facebook)

Ryan Sleevi <ryan-ietf@sleevi.com> Sat, 22 September 2018 03:54 UTC

From: Ryan Sleevi <ryan-ietf@sleevi.com>
Date: Fri, 21 Sep 2018 23:50:31 -0400
Message-ID: <CAErg=HGqHNvbLWwFMUyoEj2wFUYsO5UBvtTi+JfnWGgXJ-TDKQ@mail.gmail.com>
To: Patrick McManus <mcmanus@ducksong.com>
Cc: Felix Handte <felixh@fb.com>, Ryan Sleevi <ryan-ietf@sleevi.com>, Mark Nottingham <mnot@mnot.net>, Jyrki Alakuijala <jyrki@google.com>, Charles McCathie Nevile <chaals@yandex-team.ru>, Evgenii Kliuchnikov <eustas@google.com>, Vlad Krasnov <vlad@cloudflare.com>, Nick Terrell <terrelln@fb.com>, Yann Collet <cyan@fb.com>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
Subject: Re: Dictionary Compression for HTTP (at Facebook)
Archived-At: <https://www.w3.org/mid/CAErg=HGqHNvbLWwFMUyoEj2wFUYsO5UBvtTi+JfnWGgXJ-TDKQ@mail.gmail.com>

On Fri, Sep 21, 2018 at 6:55 PM Patrick McManus <mcmanus@ducksong.com>
wrote:

> Hi Felix,
>
> On Fri, Sep 21, 2018 at 5:31 PM, Felix Handte <felixh@fb.com> wrote:
>
>> Very well, I will attempt to grab the bull by the horns, then. Let's
>> talk security.
>>
>> I guess my first question is this: What is the acceptance criterion for
>> proposals in this space with respect to security? From my survey of
>>
>
> you are not going to be able to pre-negotiate working group acceptance
> criteria. The criteria is what it always is - rough consensus on a draft
> from the working group and the approval of the IESG.
>
> But to help with more background, the past concern has been that there
> hasn't been sufficient/proactive analysis of the various proposals - and
> given that the mixture of compression and encryption is known to be a
> problem (as you mention) a bar of "no known problems" hasn't been enough to
> get anywhere near rough consensus. I believe people wanted to see a
> proactive analysis of what the concerns of a particular proposal are. At
> that point we can debate whether they are reasonable or not for their
> anticipated gains.
>
> make sense? You're certainly going in a reasonable direction considering
> the interactions of dictionaries, what attackers control, and the ways in
> which public and private data are mixed. Of course confidentiality can
> apply to 'public data' as well and it's not clear how/if folks would want to
> handle that.
>

Exactly this. To expand on this a bit more, there are other technical
considerations, such as:
- Are the dictionaries dynamically constructed based on the resources?
Meaning that, unlike gzip-et-al at an HTTP layer, and more like the
now-disabled TLS compression, there are inter-resource implications to the
security model. Does loading A then B reveal different information than
loading B then A? How should that be modeled?
- If dictionaries are static, how are the dictionaries determined? Are they
baked into the specification, or can they be self-declared?
- If the site self-declares/creates its own dictionaries, how will clients
receive these dictionaries? What are the protocol interactions there as it
relates to both 'classic' HTTP cases (for example, a libcurl utility or a
simple proxy) and more complicated cases, like browsers, which have their
own set of loading behaviours?
- Are these things addressable at the HTTP protocol layer, or do they
require being integrated into the application-layer fabric (like a Web
browser)? How will that interact with functionality like CORS and CORB, to
prevent cross-origin leakages?
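
The first question in the list above (does loading A then B reveal
something different from loading B then A?) can be made concrete with a
small Python sketch. It is not taken from any of the proposals: the resource
bodies and the helper names compressed_size and sizes_for_load_order are
invented for illustration, and it assumes the simplest possible dynamic
dictionary, namely the concatenation of everything loaded earlier, handed to
zlib as a preset dictionary.

```python
import zlib


def compressed_size(payload: bytes, dictionary: bytes = b"") -> int:
    """Raw-DEFLATE size of payload, optionally primed with a preset dictionary."""
    if dictionary:
        comp = zlib.compressobj(level=9, wbits=-15, zdict=dictionary)
    else:
        comp = zlib.compressobj(level=9, wbits=-15)
    return len(comp.compress(payload) + comp.flush())


def sizes_for_load_order(resources):
    """Compress each resource against everything that was loaded before it."""
    seen = b""  # assumed dictionary: all earlier resource bodies, concatenated
    sizes = []
    for body in resources:
        sizes.append(compressed_size(body, seen))
        seen += body
    return sizes


# Hypothetical resources: A carries a private token, B happens to repeat it.
RESOURCE_A = b"<input name='csrf' value='9f8b2c7d1e4a6053bd17c2e9a04f6d38'>"
RESOURCE_B = b"<p>You searched for: value='9f8b2c7d1e4a6053bd17c2e9a04f6d38'</p>"

a_then_b = sizes_for_load_order([RESOURCE_A, RESOURCE_B])
b_then_a = sizes_for_load_order([RESOURCE_B, RESOURCE_A])

print("A then B:", a_then_b)
print("B then A:", b_then_a)
```

In this sketch B is much smaller on the wire when A was loaded first,
because the token is already in the dictionary. An observer who sees only
lengths learns that the two resources share content, and what they learn
depends on the load order.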

Felix, you can see that the different approaches to 'add dictionary
compression' have explored some of the design space above and made different
choices, but as Patrick mentioned, there hasn't been a real sit-down analysis
of what the implications of these decisions are, their merits, and their
risks. I think Vlad's work is perhaps the closest one to feeling right
based on 'gut', but even in past IETF discussions, the uncertainty and the
difficulty of reasoning about that gut instinct have meant that a compression
scheme represents a large intellectual investment and time commitment.

To further build on why the status-quo may not be a reasonable bar, given
the profoundly negative interactions compression can have on the
confidentiality of the data being compressed, a parallel might be drawn to
TLS clients supporting 3DES or AES-CBC. These are ciphers or constructions
with known weaknesses and sharp edges, and require extreme care to get
right - but they are (or, at this point, were) widely deployed. Just
because they were widely deployed, however, wouldn't justify making those
same design decisions for new ciphersuites - as the TLS WG demonstrated
through TLS 1.3.

Deprecating HTTP compression support in browsers is, arguably, the right
thing to do. The number of organizations that can and have carried out a
thorough analysis of the relation between 'public' and 'private' data
is likely no more than a handful, given how it keeps biting people. Yet its
widespread use means that, for practical purposes, we're between a rock and
a hard place. Introducing new schemes in this space would have the
(personally) undesirable effect of encouraging more folks to adopt
compression, which would see even greater losses to confidentiality. Of
course, the high-order bit is getting the adoption of better
confidentiality protections in the first place - the adoption of TLS
instead of unencrypted HTTP and the adoption of TLS 1.3 are both more
pressing and relevant to keeping a vibrant and healthy Web ecosystem.
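
To illustrate why the relation between 'public' and 'private' data matters
even for today's per-response compression, here is a minimal Python sketch.
Everything in it is assumed for illustration: the SECRET value, the response
layout, and the response_size helper are hypothetical, and zlib stands in for
whatever content coding a server applies. It shows only the basic length
signal, not a full attack.

```python
import zlib

# Hypothetical private value that the server embeds in every response.
SECRET = b"csrf=9f8b2c7d1e4a6053bd17c2e9a04f6d38"


def response_size(reflected: bytes) -> int:
    """Compressed size of a body that mixes the secret with reflected input."""
    body = b"<form>" + SECRET + b"</form><p>" + reflected + b"</p>"
    return len(zlib.compress(body, 9))


# The attacker-influenced (public) part either repeats the secret or does not.
right_guess = response_size(b"csrf=9f8b2c7d1e4a6053bd17c2e9a04f6d38")
wrong_guess = response_size(b"csrf=5ad03e71c9b2486f0e7d2a19c4b8f365")

print("matching guess:", right_guess, "bytes")
print("wrong guess:   ", wrong_guess, "bytes")
```

The matching guess compresses to fewer bytes because the compressor replaces
the repeated secret with a back-reference. Encryption hides the content but
not the length, so the difference remains visible on the wire.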

Understandably, I'm also biased towards the browser case, which deserves
acknowledging because it may be that the WG does adopt new compression
schemes as a work activity for intra-CDN activity or 'enterprise' or
bespoke applications. However, since most folks seem to be most keen on
HTTP compression due to the ability to save bytes to clients using web
browsers, that use case sets a higher bar than the status quo in order to be
compelling.

Hope that further expands on these concerns.