Re: Dictionary Compression for HTTP (at Facebook)

Felix Handte <felixh@fb.com> Fri, 21 September 2018 21:35 UTC

From: Felix Handte <felixh@fb.com>
To: Ryan Sleevi <ryan-ietf@sleevi.com>
CC: Mark Nottingham <mnot@mnot.net>, "jyrki@google.com" <jyrki@google.com>, "chaals@yandex-team.ru" <chaals@yandex-team.ru>, "eustas@google.com" <eustas@google.com>, Vlad Krasnov <vlad@cloudflare.com>, Nick Terrell <terrelln@fb.com>, Yann Collet <cyan@fb.com>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
Thread-Topic: Dictionary Compression for HTTP (at Facebook)
Date: Fri, 21 Sep 2018 21:31:20 +0000
Message-ID: <38bd7ae4-c7f1-f547-029c-139b039d222a@fb.com>
References: <18eb0343-640c-8b95-1cc2-273bc72ec134@fb.com> <CAPapA7RLncAsHH5pr5RJSYjvPiNk8JvgBJ8T-tKebnC1C5ptHw@mail.gmail.com> <ED51E194-503A-4339-B564-A6543F42D0A1@mnot.net> <652edc11-2d19-aef9-e3fd-ecb77ab47c1a@fb.com> <CAErg=HH7bqarp4e=mj_4rSfJwi6ycECOT1Wf1t-HttGAzO8RJw@mail.gmail.com>
In-Reply-To: <CAErg=HH7bqarp4e=mj_4rSfJwi6ycECOT1Wf1t-HttGAzO8RJw@mail.gmail.com>
Subject: Re: Dictionary Compression for HTTP (at Facebook)
Archived-At: <https://www.w3.org/mid/38bd7ae4-c7f1-f547-029c-139b039d222a@fb.com>

Very well, I will attempt to grab the bull by the horns, then. Let's 
talk security.

I guess my first question is this: What is the acceptance criterion for
proposals in this space with respect to security? From my survey of
previous conversations on this topic, the bar seems to be that proposals
are expected to have no vulnerabilities at all. That is of course a
reasonable expectation in general. However, compression as it exists in
HTTP today is well known to have security flaws (primarily BREACH and its
extensions). Given that flawed status quo, to clear that bar a new
proposal would not only have to avoid introducing new vulnerabilities,
it would also have to solve the existing ones.

If we are going to make a serious attempt to fix BREACH et al., let's do 
so. Otherwise, let's hold compression work to a practical bar, which is 
to avoid introducing new security issues and to avoid making existing 
ones worse.

If we accept that criterion, my question becomes whether there are known
issues that would prevent the use of dictionary compression. Many people
have cited security concerns to explain their hesitancy to pursue
solutions in this space. Despite how often those concerns are raised, I
haven't seen any specific description of a vulnerability introduced by
dictionary-based compression. Are there known attacks that are made
possible or improved by the use of dictionaries?

Obviously the above question is hugely dependent on how dictionaries are 
sourced. Since that's an open question, my sense is that it's probably 
best to look at the narrowest possible scope first and then work our way 
out from there. So I'm particularly curious whether there are known 
issues even when you leave out the challenges of dictionary creation / 
distribution / etc., when you just use statically-defined dictionaries.
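
To make "statically-defined dictionary" concrete, here is a minimal
sketch using the preset-dictionary support in Python's zlib module. The
dictionary contents and the response body are invented for illustration
and are not taken from any real deployment; both sides are simply assumed
to have the same dictionary bytes compiled in.

```python
import zlib

# Hypothetical static dictionary: boilerplate we assume recurs in responses.
STATIC_DICT = b'{"status": "ok", "user": {"id": , "name": "", "locale": "en_US"}}'


def compress(payload: bytes, zdict: bytes = b"") -> bytes:
    """Deflate payload, optionally primed with a preset dictionary."""
    if zdict:
        c = zlib.compressobj(level=9, zdict=zdict)
    else:
        c = zlib.compressobj(level=9)
    return c.compress(payload) + c.flush()


def decompress(blob: bytes, zdict: bytes = b"") -> bytes:
    """Inflate blob; the same dictionary must be supplied if one was used."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


# Illustrative response body (made up for this example).
response = b'{"status": "ok", "user": {"id": 42, "name": "felix", "locale": "en_US"}}'

plain = compress(response)
with_dict = compress(response, STATIC_DICT)

# Round-trips only work when both ends agree on the dictionary.
assert decompress(with_dict, STATIC_DICT) == response
print(len(response), len(plain), len(with_dict))
```

In this setup nothing is advertised, fetched, or cached at runtime, which
is what keeps the scope narrow: the only inputs to the compressor are the
fixed dictionary and the payload.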

In particular, BREACH and friends describe the dangers of mixing private 
data and attacker-controlled data in the same compression window. 
Dictionary-based compression mixes a presumably public dictionary with 
private data. Is that sufficient to enable attacks? Or if you have 
dictionary + private data + attacker data, is that easier to attack than 
in the absence of a dictionary?
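
For readers less familiar with this class of attack, the following is a
deliberately simplified sketch of the length side channel, again using
Python's zlib. The page template, the secret, and the dictionary are all
hypothetical. A real BREACH-style attack recovers the secret a few bytes
at a time over many requests; this sketch only compares one fully correct
guess with one wrong guess, and it does not settle whether a dictionary
makes the attack easier or harder.

```python
import zlib

# Hypothetical secret embedded in every response (the private data).
SECRET = b"csrf_token=9f3a7c21e8b4d605"

# Hypothetical public dictionary containing the page boilerplate.
PUBLIC_DICT = b"<p>query: </p><input value='csrf_token='>"


def page(attacker_input: bytes) -> bytes:
    """Response that reflects attacker input next to the secret."""
    return b"<p>query: " + attacker_input + b"</p><input value='" + SECRET + b"'>"


def compressed_len(data: bytes, zdict: bytes = b"") -> int:
    """Size of the deflated data, which is what an observer can measure."""
    if zdict:
        c = zlib.compressobj(level=9, zdict=zdict)
    else:
        c = zlib.compressobj(level=9)
    return len(c.compress(data) + c.flush())


right_guess = b"csrf_token=9f3a7c21e8b4d605"  # matches the secret
wrong_guess = b"csrf_token=XQZWJKVGPMHYRLNT"  # same length, wrong value

# Private data + attacker data in one window, no dictionary.
right = compressed_len(page(right_guess))
wrong = compressed_len(page(wrong_guess))

# Dictionary + private data + attacker data.
right_d = compressed_len(page(right_guess), PUBLIC_DICT)
wrong_d = compressed_len(page(wrong_guess), PUBLIC_DICT)

# A correct guess lets the compressor encode the secret as a back-reference
# to the reflected input, so the output is shorter in both configurations.
print("no dictionary:  ", right, "vs", wrong)
print("with dictionary:", right_d, "vs", wrong_d)
```

The sketch shows only that the length signal is present in both
configurations for this toy input; how the size of the gap changes with a
dictionary depends on the data and is not something this example measures
in general.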

I'll follow up with my own impressions of the security concerns and 
possible mitigations soon.

- Felix

On 08/31/2018 07:58 AM, Ryan Sleevi wrote:
> 
> 
> On Fri, Aug 24, 2018 at 6:24 AM Felix Handte <felixh@fb.com> wrote:
> 
>     For our own part, we find ourselves drawn towards a solution that
>     makes a lot of the same choices as SDCH. That is, one that treats
>     dictionaries as explicit resources that can be dynamically
>     advertised by an origin, fetched and cached by a client, and then
>     negotiated to be used in requests/responses between the two. The
>     ability to treat a previous, cached response as a base on which to
>     apply a "diff" (negotiated by ETag?) is also attractive to us.
> 
> 
> I would strongly advise against such solutions, as they are a 
> significant part of why SDCH support was removed from browsers.
> 
> I think, among the set of concerns you need to consider in any such
> solution (of which, in my mind, demonstrating that the security concerns
> can be mitigated is paramount), you need to define not only the
> interaction in the 'simple' HTTP sense of Request/Response pairs, but 
> also in the complexity of those interactions as they apply to browsers, 
> for which concerns like same-origin versus cross-origin apply, the 
> re-ordering of requests, and the potential of multiple requests 
> proceeding simultaneously (which H/2 also has to countenance). This also 
> further interacts with models of cache storage and in-memory 
> representation - challenges such as "What happens if a dictionary 
> expires midway during the processing of a response" were fairly fatal, 
> as were the issues around TOCTOU - that is, advertising a dictionary 
> from a request, making a request with said dictionary, and finding it 
> was evicted from the cache prior to the response.
> 
> Models such as the approach by vkrasnov, h2-compression-dictionaries, are
> substantially superior in these respects, because they more closely model
> and define these interactions, through the association with and scoping
> to a single H/2 resource.
> 
> It might be that your concern is not the dominant HTTP case of browsers, 
> in which case, it may be fine to ignore these. But I think, from the 
> experiences implementing and maintaining SDCH, models that approximate 
> that space (of resourced dictionaries, advertisements, etc) are likely 
> to be too great an implementation cost, and too great a cognitive cost 
> to the predictability of the platform, to see any meaningful adoption.
> 
> Of course, this is all after the security concerns are mitigated ;)