Re: delta encoding and state management

Roberto Peon <grmocg@gmail.com> Tue, 22 January 2013 21:55 UTC

From: Roberto Peon <grmocg@gmail.com>
To: Willy Tarreau <w@1wt.eu>
Cc: William Chan <willchan@chromium.org>, James M Snell <jasnell@gmail.com>, Nico Williams <nico@cryptonector.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Subject: Re: delta encoding and state management
Archived-At: <http://www.w3.org/mid/CAP+FsNcm_VBOsbptkLoOQXfgM-xAfYiZuqZusDm2YkoiszUfxA@mail.gmail.com>

The thing that isn't already in delta, etc. is the idea of 'rooting' the
path space with a single request. I like it, but it is subject to the
CRIME exploit if the path-prefix grouping is done automatically by the
browser instead of being defined by the content developer.

If we take your proposal for eliminating much of the common path prefix and
ensure that it isn't subject to CRIME, that is a winner in any scheme.
-=R
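
To make the 'rooting' idea concrete, here is a minimal sketch of the
encoding side. It assumes the base is declared explicitly (standing in
for the content developer) rather than guessed from the URLs, which is
the distinction raised above with respect to CRIME. The function name
root_paths and the example URLs are hypothetical, and the sketch makes
no claim to be CRIME-safe by itself.

```python
def root_paths(base, urls):
    """Split URLs into names relative to an explicitly declared base.

    The base is supplied by the caller; it is never inferred from the
    URLs themselves. URLs outside the base are returned unchanged.
    """
    rooted, unrooted = [], []
    for url in urls:
        if url.startswith(base) and len(url) > len(base):
            rooted.append(url[len(base):])
        else:
            unrooted.append(url)
    return rooted, unrooted


# Hypothetical example data.
base = "http://static.example.com/images/articles/20130122/"
urls = [
    base + "corner-left.jpg",
    base + "company-logo.png",
    "http://other.example.com/app.js",
]
rooted, unrooted = root_paths(base, urls)

# The base is sent once instead of once per rooted URL.
bytes_saved = len(base) * (len(rooted) - 1)
```

Only URLs under the declared base are shortened; anything else is sent
in full, so a wrong or missing base costs nothing but the saving.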


On Tue, Jan 22, 2013 at 1:51 PM, Roberto Peon <grmocg@gmail.com> wrote:

> You've described server push+stateful compression (delta) pretty closely
> there ('cause that is what we get when we combine them, without requiring
> web page writers to change how they write their pages...)! :)
>
> With server push, you can do one request, many responses (except that you
> can also cache and cancel and prioritize them, unlike bundling or inlining
> or pipelining, which has nasty head-of-line blocking and infinite buffering
> requirements... bleh).
> With the delta compressor, you can define 'header groups', which allow you
> to do exactly what you just described. The default implementation as it
> exists today just guesses at the groupings by examining the hostname, but
> that is a very naive approach-- splitting based on cookies and other
> repeated fields makes the most sense.
>
> The biggest hurdle, at least in my opinion, to usage of the new features
> is how much effort the content writers have to put in to change their
> content (basically never happens), or change their knowledge of the best
> practices (also difficult :( ). The best solution (again, in my opinion) is
> one where the optimizations that can be done automatically (not
> necessarily perfectly, but close enough :) ) are done automatically, thus
> freeing ourselves from both categories...
>
> -=R
>
>
> On Tue, Jan 22, 2013 at 1:27 PM, Willy Tarreau <w@1wt.eu> wrote:
>
>> Hi William,
>>
>> On Tue, Jan 22, 2013 at 12:33:37PM -0800, William Chan wrote:
>> > From the SPDY whitepaper
>> > (http://www.chromium.org/spdy/spdy-whitepaper), we note that:
>> > "Header compression resulted in an ~88% reduction in the size of
>> > request headers and an ~85% reduction in the size of response headers.
>> > On the lower-bandwidth DSL link, in which the upload link is only 375
>> > Kbps, request header compression in particular, led to significant
>> > page load time improvements for certain sites (i.e. those that issued
>> > large number of resource requests). We found a reduction of 45 - 1142
>> > ms in page load time simply due to header compression."
>> >
>> > That result was using gzip compression, but I don't really think
>> > there's a huge difference in PLT between stateful compression
>> > algorithms. That you use stateful compression at all is the biggest
>> > win, since as Mark already noted, big chunks of the headers are
>> > repeated opaque blobs. And I think the wins will only be greater in
>> > bandwidth constrained devices like mobile. I think this brings us back
>> > to the question, at what point do the wins of stateful compression
>> > outweigh the costs? Are implementers satisfied with the rough order of
>> > costs of stateful compression of algorithms like the delta encoding or
>> > simple compression?
>>
>> I agree that most of the header overhead is from repeated headers.
>> In fact, most of the requests we see for large pages with 100 objects
>> contain many similar headers. I could be wrong, but I think that browsers
>> are aware of the fact that they're fetching many objects at once in
>> most situations (eg: images on an inline catalogue).
>>
>> Thus maybe we should think a different way : initially the web was
>> designed to retrieve one object at a time and it made sense to have
>> one request, one response. Now we have much more content and we
>> want many objects at once to load a page. Why not define that as the
>> standard way to load pages and bring in the ability to load *groups*
>> of objects ?
>>
>> We could then send a request for several objects at once, all using
>> the same (encoded) headers, plus maybe additional per-object headers.
>> The smallest group is one object and works like today. But when you
>> need 10 images, 3 CSS and 2 JS, maybe it makes sense to send 1, 2 or
>> 3 requests only. We would also probably find it useful to define
>> a base for common objects.
>>
>> We could then see requests like this :
>>
>>     group 1
>>        header fields ...
>>        base http://static.example.com/images/articles/20130122/
>>        req1: GET corner-left.jpg
>>        req2: GET corner-right.jpg
>>        req3: GET center-banner.jpg
>>        req4: GET company-logo.png
>>
>> etc...
>>
>> Another big benefit I'm seeing there is that it's easy to switch between
>> 1.1 and this encoding. Also, intermediaries and servers will process
>> many fewer requests because they don't have to revalidate all headers each
>> time. The Host header would only be validated/rewritten once per group.
>> Cookies would be matched once per group, etc...
>>
>> It would be processed exactly like pipelining, with responses delivered
>> in the same order as the requests. Intermediaries could even split that
>> into multiple streams to forward some of them to some servers and other
>> ones to other servers. Having the header fields and base URI before the
>> requests makes that easy because once they're passed, you can read all
>> requests as they come without the need to additionally buffer.
>>
>> When you have an ETag or a date for an object, its I-M-S/I-N-M values
>> would be passed along with the requests and not the group.
>>
>> I think this should often be more efficient than brute compression and
>> still probably compatible with it.
>>
>> What do you think ?
>>
>> Willy
>>
>>
>
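
As an illustration of the grouped request quoted above, the following
is a minimal sketch of how a receiver might expand one group back into
ordinary per-object requests. It assumes a group carries shared header
fields, a base URI, and an ordered list of requests, each with optional
per-object headers such as If-None-Match. The function name
expand_group, the header values, and the ETag are hypothetical; nothing
here is a proposed wire format.

```python
from urllib.parse import urljoin


def expand_group(shared_headers, base, requests):
    """Expand one grouped request into ordinary per-object requests.

    `requests` is a list of (method, relative_name, per_object_headers).
    Per-object headers override the shared ones. Order is preserved, so
    responses can be delivered in request order, as with pipelining.
    """
    expanded = []
    for method, name, own_headers in requests:
        headers = dict(shared_headers)   # shared fields, copied per request
        headers.update(own_headers)      # per-object fields win on conflict
        expanded.append((method, urljoin(base, name), headers))
    return expanded


# Hypothetical group modelled on the example in the quoted message.
group = expand_group(
    {"Host": "static.example.com", "Accept": "image/*"},
    "http://static.example.com/images/articles/20130122/",
    [
        ("GET", "corner-left.jpg", {}),
        ("GET", "company-logo.png", {"If-None-Match": '"abc123"'}),
    ],
)
for method, url, headers in group:
    print(method, url, sorted(headers))
```

Because the shared headers and base arrive before the individual
requests, each request can be expanded as soon as it is read, without
buffering the rest of the group.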