Re: delta encoding and state management

Willy Tarreau <w@1wt.eu> Tue, 22 January 2013 22:48 UTC

Date: Tue, 22 Jan 2013 23:46:46 +0100
From: Willy Tarreau <w@1wt.eu>
To: William Chan <willchan@chromium.org>
Cc: James M Snell <jasnell@gmail.com>, Nico Williams <nico@cryptonector.com>, Roberto Peon <grmocg@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <20130122224646.GO30692@1wt.eu>
Subject: Re: delta encoding and state management
Archived-At: <http://www.w3.org/mid/20130122224646.GO30692@1wt.eu>

On Tue, Jan 22, 2013 at 01:48:42PM -0800, William Chan wrote:
> This is an intriguing counterproposal. Perhaps we should fork the
> thread to discuss it?

Maybe, yes.

> I'd still like to get an answer here about what
> folks think about the acceptability of the rough costs of stateful
> compression.

It's hard to get opinions on what is considered "heavy"; it depends
a lot on what you're doing. In haproxy we default to 2 x 16 kB of
buffers plus around 1 kB of state per connection. Some people are
already pressuring me to get rid of the 2 x 16 kB for WebSocket
connections so that they don't need 32 GB of RAM per million connections.
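As a back-of-the-envelope check of those figures, here is a small sketch. The function names are made up for illustration, and it assumes the loose convention 1 GB = 1,000,000 kB, which is how 2 x 16 kB per connection over a million connections comes out as "32 GB".

```python
# Per-connection memory, using the figures quoted above:
# two 16 kB I/O buffers plus about 1 kB of connection state.


def per_connection_kb(buffers: int = 2, buffer_kb: int = 16, state_kb: int = 1) -> int:
    """kB held by one connection: I/O buffers plus protocol state."""
    return buffers * buffer_kb + state_kb


def total_gb(connections: int, **kwargs: int) -> float:
    """Memory for `connections` concurrent connections.

    Assumes 1 GB = 1,000,000 kB (the loose convention behind "32 GB").
    """
    return connections * per_connection_kb(**kwargs) / 1_000_000


if __name__ == "__main__":
    n = 1_000_000
    print(f"buffers only:    {total_gb(n, state_kb=0):.1f} GB")
    print(f"buffers + state: {total_gb(n):.1f} GB")
    print(f"state only:      {total_gb(n, buffers=0):.1f} GB")
```

Dropping the two buffers for idle WebSocket connections leaves only the ~1 kB of state, i.e. roughly 1 GB per million connections instead of 33 GB.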

And I know other contexts where completely buffering a 5 MB POST request
before passing it to a server is considered fairly acceptable.

One of the issues comes from the fact that no limit is suggested for how
large a request can be, so we generally have to allocate large amounts
of resources "just in case". This is what makes anything stateful
problematic. However, I do think that being able to buffer a full
request will be acceptable for any agent (client, intermediary, server),
because all of them have to see the full request at least once, and
their buffers are adequately sized for this. If part of this request
is reused for the next requests, there is no need to allocate more
memory, and it's a win at the same time.
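A minimal sketch of that reuse, assuming the only state kept is the previous request's headers (which the agent had to buffer anyway), and that each new request is sent as a delta against them. The function names and the (changed, removed) pair are hypothetical, not a proposed wire format; header order and repeated header names are ignored for simplicity.

```python
from typing import Dict, List, Tuple

Headers = Dict[str, str]


def encode_delta(previous: Headers, current: Headers) -> Tuple[Headers, List[str]]:
    """Return (changed_or_added, removed) relative to the previous request."""
    changed = {name: value for name, value in current.items()
               if previous.get(name) != value}
    removed = [name for name in previous if name not in current]
    return changed, removed


def decode_delta(previous: Headers, changed: Headers, removed: List[str]) -> Headers:
    """Rebuild the full header set from the previous request plus the delta."""
    result = {name: value for name, value in previous.items() if name not in removed}
    result.update(changed)
    return result


if __name__ == "__main__":
    prev = {"path": "/index.html", "user-agent": "demo/1.0", "accept": "*/*", "cookie": "a=1"}
    cur = {"path": "/style.css", "user-agent": "demo/1.0", "accept": "text/css"}
    changed, removed = encode_delta(prev, cur)
    print(changed, removed)
    print(decode_delta(prev, changed, removed) == cur)
```

Only the changed "path" and "accept" values plus the removed "cookie" name travel on the wire; the unchanged "user-agent" is taken from the previous request already held in memory.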

What is difficult to judge is how much compression state we need to
keep in addition to the request itself. As a rule of thumb, I'd guess
that doubling the whole state is quite annoying but still manageable.
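One way to keep that extra compression state predictable is to cap it at a fixed byte budget chosen per connection. The sketch below is a hypothetical bounded header table, not something proposed in this thread: each entry costs its name and value lengths (characters, assuming ASCII) plus an assumed fixed overhead of 32 bytes, and the oldest entries are evicted first when the budget is exceeded.

```python
from collections import deque
from typing import Deque, Tuple

ENTRY_OVERHEAD = 32  # assumed per-entry bookkeeping cost, in bytes


class BoundedHeaderTable:
    """Hypothetical compression state whose size never exceeds `max_bytes`."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.size = 0
        self.entries: Deque[Tuple[str, str]] = deque()

    @staticmethod
    def entry_size(name: str, value: str) -> int:
        return len(name) + len(value) + ENTRY_OVERHEAD

    def add(self, name: str, value: str) -> None:
        cost = self.entry_size(name, value)
        if cost > self.max_bytes:
            # An entry larger than the whole budget simply empties the table.
            self.entries.clear()
            self.size = 0
            return
        # Evict the oldest entries until the new one fits within the budget.
        while self.size + cost > self.max_bytes:
            old_name, old_value = self.entries.popleft()
            self.size -= self.entry_size(old_name, old_value)
        self.entries.append((name, value))
        self.size += cost
```

With the haproxy figures above (about 33 kB per connection), a budget of similar size would correspond to the "doubling the whole state" case; a smaller budget trades compression ratio for memory.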

> One issue I see in this proposal is that, as always, it is difficult
> to predict the future. You don't know when you're parsing the document
> when you'll discover a new resource to request.

I don't understand what you mean here.

> How long do you delay
> the resource request in order to consolidate requests into a load
> group? The same thing is even more true for response headers.

I never want to delay anything; delays only do harm when we are
trying to reduce latency.

In the example I proposed, the recipient receives the full headers
block; from that point on, all requests reuse the same headers and
can be processed immediately (just like pipelining, in fact).
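The receiving side of that scheme can be sketched as follows, assuming a hypothetical framing where a headers block arrives once and each subsequent request carries only its method and path. The class and callback names are illustrative; the point is that every request is dispatched as soon as it arrives, with no batching delay.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Headers = Dict[str, str]


@dataclass
class Request:
    method: str
    path: str
    headers: Headers


class GroupReceiver:
    """Hypothetical receiver for a 'headers block, then short requests' scheme."""

    def __init__(self, handler: Callable[[Request], None]) -> None:
        self.handler = handler
        self.shared: Optional[Headers] = None

    def on_headers_block(self, headers: Headers) -> None:
        # The full header block is received once and kept as shared state.
        self.shared = dict(headers)

    def on_request(self, method: str, path: str) -> None:
        # Each request carries only what differs (here: method and path)
        # and is handed to the handler immediately.
        if self.shared is None:
            raise ValueError("request received before any headers block")
        self.handler(Request(method, path, dict(self.shared)))


if __name__ == "__main__":
    seen = []
    rx = GroupReceiver(seen.append)
    rx.on_headers_block({"host": "example.com", "user-agent": "demo/1.0"})
    rx.on_request("GET", "/a.css")
    rx.on_request("GET", "/b.js")
    for req in seen:
        print(req.method, req.path, req.headers["host"])
```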

Concerning response headers, I'd say that you emit a first response
group with the headers from the first response, followed by the
response itself. When another response comes in, you have two
possibilities: either it shares the same headers and you can add it to
the existing group, or it does not and you open a new group.
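The sender side of that response grouping can be sketched as a generator that emits frames as responses arrive. The ("group", headers) and ("response", body) tuples are a hypothetical framing, and this version only compares against the currently open group rather than searching older ones.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Headers = Dict[str, str]
Frame = Tuple[str, object]


def emit_response_groups(responses: Iterable[Tuple[Headers, bytes]]) -> Iterator[Frame]:
    """Yield frames as responses arrive, opening a new group only when headers change."""
    current: Optional[Headers] = None
    for headers, body in responses:
        if headers != current:
            # Different headers: open a new group carrying them.
            current = dict(headers)
            yield ("group", current)
        # The response joins the currently open group.
        yield ("response", body)


if __name__ == "__main__":
    css = {"content-type": "text/css"}
    png = {"content-type": "image/png"}
    for frame in emit_response_groups([(css, b"a"), (css, b"b"), (png, b"c")]):
        print(frame)
```

Because it is a generator, nothing is held back: each response is emitted as soon as it is produced, and the only extra state is the header set of the open group.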

That said, I would not spend too much energy trying to optimize
response headers. Right now they're commonly less important, both
because they are often accompanied by data and because the downstream
link is generally much bigger than the upstream one. Still, that's not
something to ditch either.

Regards,
Willy