Re: HTTP/2 Server Push and solid compression

"W. Felix Handte" <w@felixhandte.com> Fri, 24 May 2019 07:57 UTC

To: ietf-http-wg@w3.org, Alan Egerton <eggyal@gmail.com>
References: <CA+phaedE0m4LniC38GBkJ-M0gAph0LSSGhQ1ZWJE6k0UOFcokw@mail.gmail.com> <CA+phaecOMRAd8R=oEYj+DMVkzaVKq5Qbt9AxECrtofLqMxeKQA@mail.gmail.com>
From: "W. Felix Handte" <w@felixhandte.com>
Message-ID: <9e3a99e6-8c06-7690-119c-dbe7afb1a2ca@felixhandte.com>
Date: Tue, 21 May 2019 16:27:32 -0400
In-Reply-To: <CA+phaecOMRAd8R=oEYj+DMVkzaVKq5Qbt9AxECrtofLqMxeKQA@mail.gmail.com>
Archived-At: <https://www.w3.org/mid/9e3a99e6-8c06-7690-119c-dbe7afb1a2ca@felixhandte.com>

Hi Alan,

I absolutely agree with your premise. I think you've identified a good
example of the kind of model that is not well supported by the existing
options for compression in HTTP (which may help explain why unbundling
resources into server pushes has not seen very wide adoption).

Finding a solution to this problem is extremely desirable, and a number
of attempts have been made to do so: that is, to allow individual
responses to access a shared or external compression context, and
thereby achieve good compression for individually small responses.

For example, see Compression Dictionaries for HTTP/2 [1] and Shared
Dictionary Compression over HTTP [2]. In extremely broad strokes, your
two proposals are similar (respectively) to those two drafts.
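
As a rough illustration of what a shared compression context buys for
small responses, here is a minimal sketch using the preset-dictionary
support in Python's zlib module. It is not the mechanism of either
draft; the dictionary contents, the payload, and the helper names are
made up for the example.

```python
import zlib

# Hypothetical shared dictionary: boilerplate that many small responses
# have in common. In a real deployment both ends would need to agree on it.
SHARED_DICT = (
    b'{"status": "ok", "user": {"id": , "name": "", '
    b'"email": "@example.com"}, "links": {"self": "/users/", '
    b'"next": "/users/"}}'
)

def compress_with_dict(payload: bytes, dictionary: bytes) -> bytes:
    """Compress one response against a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(payload) + comp.flush()

def decompress_with_dict(blob: bytes, dictionary: bytes) -> bytes:
    """Decompress a response; the same dictionary must be supplied."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()

# A small, made-up response that resembles the dictionary.
payload = (
    b'{"status": "ok", "user": {"id": 42, "name": "alan", '
    b'"email": "alan@example.com"}, "links": {"self": "/users/42", '
    b'"next": "/users/43"}}'
)

alone = zlib.compress(payload, 9)
shared = compress_with_dict(payload, SHARED_DICT)
restored = decompress_with_dict(shared, SHARED_DICT)

print(len(payload), len(alone), len(shared))
```

The response compressed on its own barely shrinks, because it contains
little internal repetition; compressed against the dictionary, most of
it becomes back-references into the shared context.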

As Patrick mentions, these solutions have largely fallen victim in the
past to security concerns: mixing different types of data into the same
compression window can provide attackers the ability to exfiltrate
private data by observing overall compression effectiveness (i.e.,
CRIME/BREACH/HEIST). This is already a problem in the existing world of
HTTP compression, when applications allow attacker-controlled and
private user data to intermingle in a single response. Extending that,
by allowing interactions between different responses, would be to throw
gasoline on that fire.
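
To make the concern concrete, here is a deliberately simplified sketch
of the size side channel, again using Python's zlib. The secret, the
page template, and the function name are hypothetical, and real attacks
such as BREACH recover the secret incrementally rather than guessing it
whole; the point is only that compressed length depends on whether
attacker-supplied bytes match private bytes in the same window.

```python
import zlib

# Hypothetical private value embedded in the response by the server.
SECRET = b"session_token=7f3a9c1e5b2d4860"

def response_size(attacker_input: bytes) -> int:
    """Length of a compressed response that mixes the private value
    with attacker-controlled (reflected) input in one compression window."""
    body = (
        b"<html><body><p>account: " + SECRET + b"</p>"
        b"<p>you searched for: " + attacker_input + b"</p></body></html>"
    )
    return len(zlib.compress(body, 9))

# Two guesses of equal length; only one matches the secret.
wrong_guess = b"session_token=0123456789abcdef"
right_guess = b"session_token=7f3a9c1e5b2d4860"

size_wrong = response_size(wrong_guess)
size_right = response_size(right_guess)

# The matching guess is encoded as a back-reference to the secret,
# so the observable response length is smaller.
print(size_wrong, size_right)
```

An observer who sees only the lengths on the wire can tell the two
guesses apart, which is why sharing a compression context across
responses widens the exposure.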

I have been working to make another attempt at addressing these problems
(both the narrow one of resolving the security questions and the broad
one of building a solution overall).

I am working on a draft [3] that discusses the security concerns in this
space, which will hopefully let us chart a path forward. I hope to have
something to circulate in advance of the next meeting in July.

Longer term, I am working on building and deploying a dictionary-based
compression scheme for HTTP (at Facebook initially, with an eye towards
eventual standardization).

Collaboration on either front would be very welcome!

- Felix

[1] https://tools.ietf.org/html/draft-vkrasnov-h2-compression-dictionaries-03
[2] https://tools.ietf.org/html/draft-lee-sdch-spec-00
[3] https://tools.ietf.org/html/draft-handte-httpbis-dict-sec-00

On 5/21/19 11:16 AM, Alan Egerton wrote:
> On Tue, May 21, 2019 at 3:33 PM Alan Egerton <eggyal@gmail.com> wrote:
>> I see two possible solutions:
>>
>> (1) standardise the bundle format in order that caches can separate
>> and store the underlying resources: plenty of hazards here—especially
>> since there will no longer be an HTTP response per resource, requiring
>> metadata (including cache control etc) to be encoded somehow else.  My
>> gut says this is probably a bad idea.
>>
>> (2) use a compression format that produces a separate output file for
>> each input file, yet still achieves better overall compression than
>> compressing the files individually: I imagine that this will produce
>> an additional output file that is common to/referenced by all the
>> compressed files being returned by that single operation;
>> decompression of any of the transmitted resources would be achieved
>> using only the common file and the resource-specific file as input.
> 
> Just following my own thoughts with an observation: in extremis, these
> two approaches can actually become analogous.
> 
> For example, a .tar.gz could serve as both the standardised "bundle"
> format (1) and the common output file (2) with the metadata
> transmitted in the form of separate HTTP responses (1) whose payloads
> reference the relevant constituent of that tarball (2).
> 
> I recognise that such an approach would also be a regression, because
> it defeats the benefits of HTTP/2's multiplexing (the constituents of
> the tarball only become available in sequence); therefore any solution
> of type (2) must balance the competing requirements to minimise both
> the "common file" and the overall size.  Perhaps there is no such
> balance that yields material benefit over the status quo.
> 
> -- Alan
>