Re: delta encoding and state management

Roberto Peon <grmocg@gmail.com> Tue, 22 January 2013 21:52 UTC


You've described server push+stateful compression (delta) pretty closely
there ('cause that is what we get when we combine them, without requiring
web page writers to change how they write their pages...)! :)

With server push, you can do one request, many responses (except that you
can also cache and cancel and prioritize them, unlike bundling or inlining
or pipelining, which has nasty head-of-line blocking and infinite buffering
requirements... bleh).
With the delta compressor, you can define 'header groups', which allow you
to do exactly what you just described. The default implementation as it
exists today just guesses at the groupings by examining the hostname, but
that is a very naive approach-- splitting based on cookies and other
repeated fields makes the most sense.
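
To make the 'header groups' idea concrete, here is a minimal sketch, assuming a group is keyed on hostname plus cookie value and that a delta is just the fields that changed since the last request in that group. The names `group_key` and `DeltaEncoder` are hypothetical and are not taken from the actual delta compressor implementation.

```python
def group_key(headers):
    """Hypothetical grouping rule: hostname plus cookie value."""
    return (headers.get("host", ""), headers.get("cookie", ""))


class DeltaEncoder:
    """Toy encoder: remembers the last header set sent for each group."""

    def __init__(self):
        self.groups = {}  # group key -> last header set seen in that group

    def encode(self, headers):
        key = group_key(headers)
        previous = self.groups.get(key, {})
        # Only fields that are new or whose value changed go on the wire.
        changed = {k: v for k, v in headers.items() if previous.get(k) != v}
        # Fields present last time but absent now must be removed explicitly.
        removed = sorted(k for k in previous if k not in headers)
        self.groups[key] = dict(headers)
        return {"group": key, "set": changed, "remove": removed}


encoder = DeltaEncoder()
first = encoder.encode({"host": "example.com", "cookie": "sid=1",
                        "user-agent": "UA", "path": "/a.css"})
second = encoder.encode({"host": "example.com", "cookie": "sid=1",
                         "user-agent": "UA", "path": "/b.js"})
```

In this sketch the first request in a group carries every field, while the second carries only the changed `path`. A request with a different cookie would start a new group and be sent in full again.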

The biggest hurdle, at least in my opinion, to usage of the new features is
how much effort the content writers have to put in to change their content
(basically never happens), or change their knowledge of the best practices
(also difficult :( ). The best solution (again in my opinion) is one where
the optimizations can be done automatically (not necessarily perfectly,
but close enough :) ), thus freeing ourselves from both categories...

-=R


On Tue, Jan 22, 2013 at 1:27 PM, Willy Tarreau <w@1wt.eu> wrote:

> Hi William,
>
> On Tue, Jan 22, 2013 at 12:33:37PM -0800, William Chan wrote:
> > From the SPDY whitepaper
> > (http://www.chromium.org/spdy/spdy-whitepaper), we note that:
> > "Header compression resulted in an ~88% reduction in the size of
> > request headers and an ~85% reduction in the size of response headers.
> > On the lower-bandwidth DSL link, in which the upload link is only 375
> > Kbps, request header compression in particular, led to significant
> > page load time improvements for certain sites (i.e. those that issued
> > large number of resource requests). We found a reduction of 45 - 1142
> > ms in page load time simply due to header compression."
> >
> > That result was using gzip compression, but I don't really think
> > there's a huge difference in PLT between stateful compression
> > algorithms. That you use stateful compression at all is the biggest
> > win, since as Mark already noted, big chunks of the headers are
> > repeated opaque blobs. And I think the wins will only be greater in
> > bandwidth constrained devices like mobile. I think this brings us back
> > to the question, at what point do the wins of stateful compression
> > outweigh the costs? Are implementers satisfied with the rough order of
> > costs of stateful compression of algorithms like the delta encoding or
> > simple compression?
>
> I agree that most of the header overhead is from repeated headers.
> In fact, most of the requests we see for large pages with 100 objects
> contain many similar headers. I could be wrong, but I think that browsers
> are aware of the fact that they're fetching many objects at once in
> most situations (eg: images on an inline catalogue).
>
> Thus maybe we should think a different way : initially the web was
> designed to retrieve one object at a time and it made sense to have
> one request, one response. Now we have much more content and we
> want many objects at once to load a page. Why not define that as the
> standard way to load pages and bring in the ability to load *groups*
> of objects ?
>
> We could then send a request for several objects at once, all using
> the same (encoded) headers, plus maybe additional per-object headers.
> The smallest group is one object and works like today. But when you
> need 10 images, 3 CSS and 2 JS, maybe it makes sense to send 1, 2 or
> 3 requests only. We would also probably find it useful to define
> a base for common objects.
>
> We could then see requests like this :
>
>     group 1
>        header fields ...
>        base http://static.example.com/images/articles/20130122/
>        req1: GET corner-left.jpg
>        req2: GET corner-right.jpg
>        req3: GET center-banner.jpg
>        req4: GET company-logo.png
>
> etc...
>
> Another big benefit I'm seeing there is that it's easy to switch from 1.1
> to/from this encoding. Also, intermediaries and servers will process
> far fewer requests because they don't have to revalidate all headers each
> time. The Host header would only be validated/rewritten once per group.
> Cookies would be matched once per group, etc...
>
> It would be processed exactly like pipelining, with responses delivered
> in the same order as the requests. Intermediaries could even split that
> into multiple streams to forward some of them to some servers and other
> ones to other servers. Having the header fields and base URI before the
> requests makes that easy because once they're passed, you can read all
> requests as they come without the need to additionally buffer.
>
> When you have an ETag or a date for an object, its I-M-S/I-N-M values
> would be passed along with the requests and not the group.
>
> I think this should often be more efficient than brute compression and
> still probably compatible with it.
>
> What do you think ?
>
> Willy
>
>
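
The grouped request quoted above can be sketched as follows: a minimal illustration, assuming a group carries shared headers, a base URI, and a list of (method, relative path, per-object headers) entries. The function `expand_group` and the tuple layout are hypothetical; the quoted proposal does not define a wire format.

```python
from urllib.parse import urljoin


def expand_group(shared_headers, base, requests):
    """Expand one grouped request into individual HTTP/1.1-style requests.

    Each entry in `requests` is (method, relative_path, per_object_headers).
    Per-object headers (e.g. If-None-Match) override the shared ones.
    """
    expanded = []
    for method, path, extra in requests:
        headers = dict(shared_headers)  # validated once, copied per object
        headers.update(extra)
        expanded.append((method, urljoin(base, path), headers))
    return expanded


group = expand_group(
    {"host": "static.example.com"},
    "http://static.example.com/images/articles/20130122/",
    [
        ("GET", "corner-left.jpg", {}),
        ("GET", "corner-right.jpg", {}),
        ("GET", "center-banner.jpg", {}),
        ("GET", "company-logo.png", {"if-none-match": '"abc"'}),
    ],
)
```

Because the shared headers and base URI come first, an intermediary in this sketch could emit each expanded request as soon as its entry is read, without buffering the rest of the group.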