Re: delta encoding and state management

William Chan (陈智昌) <willchan@chromium.org> Tue, 22 January 2013 21:50 UTC

From: "William Chan (陈智昌)" <willchan@chromium.org>
To: Willy Tarreau <w@1wt.eu>
Cc: James M Snell <jasnell@gmail.com>, Nico Williams <nico@cryptonector.com>, Roberto Peon <grmocg@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>

On Tue, Jan 22, 2013 at 1:27 PM, Willy Tarreau <w@1wt.eu> wrote:
> Hi William,
>
> On Tue, Jan 22, 2013 at 12:33:37PM -0800, William Chan (陈智昌) wrote:
>> From the SPDY whitepaper
>> (http://www.chromium.org/spdy/spdy-whitepaper), we note that:
>> "Header compression resulted in an ~88% reduction in the size of
>> request headers and an ~85% reduction in the size of response headers.
>> On the lower-bandwidth DSL link, in which the upload link is only 375
>> Kbps, request header compression in particular, led to significant
>> page load time improvements for certain sites (i.e. those that issued
>> large number of resource requests). We found a reduction of 45 - 1142
>> ms in page load time simply due to header compression."
>>
>> That result was using gzip compression, but I don't really think
>> there's a huge difference in PLT between stateful compression
>> algorithms. That you use stateful compression at all is the biggest
>> win, since as Mark already noted, big chunks of the headers are
>> repeated opaque blobs. And I think the wins will only be greater in
>> bandwidth constrained devices like mobile. I think this brings us back
>> to the question, at what point do the wins of stateful compression
>> outweigh the costs? Are implementers satisfied with the rough order of
>> costs of stateful compression of algorithms like the delta encoding or
>> simple compression?
>
> I agree that most of the header overhead is from repeated headers.
> In fact, most of the requests we see for large pages with 100 objects
> contain many similar headers. I could be wrong, but I think that browsers
> are aware of the fact that they're fetching many objects at once in
> most situations (eg: images on an inline catalogue).
>
> Thus maybe we should think in a different way: initially the web was
> designed to retrieve one object at a time and it made sense to have
> one request, one response. Now we have much more content and we
> want many objects at once to load a page. Why not define that as the
> standard way to load pages and bring in the ability to load *groups*
> of objects?
>
> We could then send a request for several objects at once, all using
> the same (encoded) headers, plus maybe additional per-object headers.
> The smallest group is one object and works like today. But when you
> need 10 images, 3 CSS and 2 JS, maybe it makes sense to send 1, 2 or
> 3 requests only. We would also probably find it useful to define
> a base URI for common objects.
>
> We could then see requests like this :
>
>     group 1
>        header fields ...
>        base http://static.example.com/images/articles/20130122/
>        req1: GET corner-left.jpg
>        req2: GET corner-right.jpg
>        req3: GET center-banner.jpg
>        req4: GET company-logo.png
>
> etc...
>
> Another big benefit I'm seeing there is that it's easy to switch between
> 1.1 and this encoding. Intermediaries and servers will also have much
> less to process per request because they don't have to revalidate all
> headers each time. The Host header would only be validated/rewritten
> once per group. Cookies would be matched once per group, etc...
>
> It would be processed exactly like pipelining, with responses delivered
> in the same order as the requests. Intermediaries could even split that
> into multiple streams to forward some of them to some servers and other
> ones to other servers. Having the header fields and base URI before the
> requests makes that easy because once they're passed, you can read all
> requests as they come without the need to additionally buffer.
>
> When you have an ETag or a date for an object, its I-M-S/I-N-M values
> would be passed along with the requests and not the group.
>
> I think this should often be more efficient than brute compression and
> still probably compatible with it.
>
> What do you think ?
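
To make the grouped request above concrete, here is a minimal Python
sketch of how a recipient might expand one group back into ordinary
per-object requests. Nothing here is a proposed wire format: the
function name expand_group, the tuple layout, the Accept header and
the ETag value are all assumptions made up for illustration. Only the
base URI and object names come from the example in the quoted text.

```python
from urllib.parse import urljoin


def expand_group(base, shared_headers, requests):
    """Expand one request group into independent (method, url, headers) tuples.

    base           -- base URI shared by every request in the group
    shared_headers -- header fields sent once for the whole group
    requests       -- list of (method, relative_path, per_object_headers)
    """
    expanded = []
    for method, path, extra in requests:
        headers = dict(shared_headers)  # copy, so requests stay independent
        headers.update(extra)           # per-object fields such as I-N-M
        expanded.append((method, urljoin(base, path), headers))
    return expanded


# Hypothetical group: the header values and the ETag are invented.
group = expand_group(
    "http://static.example.com/images/articles/20130122/",
    {"Host": "static.example.com", "Accept": "image/*"},
    [
        ("GET", "corner-left.jpg", {}),
        ("GET", "corner-right.jpg", {"If-None-Match": '"abc123"'}),
        ("GET", "center-banner.jpg", {}),
        ("GET", "company-logo.png", {}),
    ],
)
```

The shared header fields are stored once and copied per request, while
the conditional header travels with its own request rather than with
the group, as the quoted text suggests for I-M-S/I-N-M values.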

This is an intriguing counterproposal. Perhaps we should fork the
thread to discuss it? I'd still like to get an answer here about what
folks think about the acceptability of the rough costs of stateful
compression.

One issue I see in this proposal is that, as always, it is difficult
to predict the future. While parsing the document, you don't know
when you'll discover the next resource to request. How long do you
delay a resource request in order to consolidate requests into a load
group? The same problem is even worse for response headers.
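
The delay question can be illustrated with a small Python sketch. It
assumes one simple policy that nobody in this thread has proposed: a
group is sent a fixed window after its first resource is discovered.
The function name group_by_window, the timestamps and the file names
are hypothetical.

```python
def group_by_window(discoveries, window_ms):
    """Group resource discoveries into load groups using a fixed delay window.

    discoveries -- list of (time_ms, url), sorted by discovery time
    window_ms   -- how long a group stays open after its first member

    Returns a list of (send_time_ms, urls) tuples, one per group.
    """
    groups = []
    current = []
    deadline = None
    for t, url in discoveries:
        if current and t > deadline:
            # The open group timed out before this resource was found.
            groups.append((deadline, current))
            current = []
        if not current:
            deadline = t + window_ms  # a new group opens at this discovery
        current.append(url)
    if current:
        groups.append((deadline, current))
    return groups


# Hypothetical parse timeline: resources found at 0, 5, 12 and 40 ms.
timeline = [(0, "a.css"), (5, "b.js"), (12, "c.png"), (40, "d.png")]
batched = group_by_window(timeline, window_ms=10)
unbatched = group_by_window(timeline, window_ms=0)
```

With a 10 ms window only the first two resources share a group, and
the first request leaves 10 ms later than it would have without
grouping. With a zero window every resource goes out alone, which is
today's behavior. Any window in between trades added latency for
fewer groups.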

>
> Willy
>