Re: delta encoding and state management

William Chan (陈智昌) <willchan@chromium.org> Tue, 22 January 2013 23:09 UTC

From: "William Chan (陈智昌)" <willchan@chromium.org>
To: Willy Tarreau <w@1wt.eu>
Cc: James M Snell <jasnell@gmail.com>, Nico Williams <nico@cryptonector.com>, Roberto Peon <grmocg@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Archived-At: <http://www.w3.org/mid/CAA4WUYjuCGyJjA2nN_-oh8TunrA7owWFQRhLg-ps+fkp9T47Ew@mail.gmail.com>

On Tue, Jan 22, 2013 at 2:46 PM, Willy Tarreau <w@1wt.eu> wrote:
> On Tue, Jan 22, 2013 at 01:48:42PM -0800, William Chan (陈智昌) wrote:
>> This is an intriguing counterproposal. Perhaps we should fork the
>> thread to discuss it?
>
> maybe, yes.
>
>> I'd still like to get an answer here about what
>> folks think about the acceptability of the rough costs of stateful
>> compression.
>
> It's hard to get opinions on what is considered as "heavy", it depends
> a lot on what you're doing. In haproxy we default to 2*16kB of buffers
> plus around 1 kB of state per connection. Some people are already
> pressuring me to get rid of the 2*16kB for websocket connections so
> that they don't need 32 GB of RAM per million connections.
>
> And I know other contexts where completely buffering a 5 MB POST request
> before passing it to a server is considered fairly acceptable.
>
> One of the issues comes from the fact that no limit is suggested for how
> large a request can be, and due to this we generally have to allocate
> large amounts of resources "just in case". This is what is problematic
> with whatever is stateful. However, I do think that being able to buffer
> a full request will be acceptable for any agent (client, intermediary,
> server), because all of them have to see the full request at least once,
> and its buffers are adequately sized for this. If part of this request
> is reused for next requests, there is no need for allocating more memory
> and it's a win at the same time.
>
> What is difficult to judge is how much we need to store for compression
> states which have to be stored in addition to the request itself. As a
> rule of thumb, I'd guess that doubling the whole state is quite annoying
> but still manageable.
>
>> One issue I see in this proposal is that, as always, it is difficult
>> to predict the future. You don't know when you're parsing the document
>> when you'll discover a new resource to request.
>
> I don't understand what you mean here.
>
>> How long do you delay
>> the resource request in order to consolidate requests into a load
>> group? The same thing is even more true for response headers.
>
> I never want to delay anything, delays only do bad things when we
> try to reduce latency.

One of us has the wrong mental model for how the proposal would work.
Let's figure this out.

Let's say the browser requests foo.html. It receives a response packet
for foo.html, referencing 1.js. 5ms later, it receives packet 2 for
foo.html which references 2.js. 5ms later, it receives packet 3 for foo.html
which references 3.js. And so on. You say no delays. So does this mean
each "group" only includes one object each time?

And now let's ignore the 5ms delays. Consider how WebKit works. Let's
say WebKit has all of foo.html. It starts parsing it. It encounters
1.js. It immediately sends the resource request to the network stack.
It hasn't parsed the full document yet, so it doesn't know if it'll
encounter any more resources. Each time it encounters a resource while
parsing the document, it will send it to the network stack (in
Chromium and latest versions of Safari, this is a separate process).
What is the network stack to do if, as you say, it should never delay
anything? If I understand correctly, each "group" would always only
include one object then.
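
To make the concern concrete, here is a rough sketch of the batching
question, not anything taken from the proposal itself. The function
group_requests and its max_wait_ms parameter are hypothetical names; the
assumption is that a group stays open for max_wait_ms after its first
request is discovered, and that "no delay" means max_wait_ms = 0.

```python
from typing import List, Optional, Tuple


def group_requests(
    discoveries: List[Tuple[float, str]], max_wait_ms: float
) -> List[List[str]]:
    """Batch (time_ms, url) discoveries into groups.

    Hypothetical model: a group is opened by the first request and is
    flushed max_wait_ms later; anything discovered after that opens a
    new group.
    """
    groups: List[List[str]] = []
    deadline: Optional[float] = None  # flush time of the currently open group
    for t, url in sorted(discoveries):
        if deadline is None or t > deadline:
            # No open group, or the open one has already been sent.
            groups.append([url])
            deadline = t + max_wait_ms
        else:
            groups[-1].append(url)
    return groups


# The foo.html example: one new reference every 5ms.
discoveries = [(0.0, "1.js"), (5.0, "2.js"), (10.0, "3.js")]
print(group_requests(discoveries, max_wait_ms=0))   # one object per group
print(group_requests(discoveries, max_wait_ms=10))  # one group, 10ms of added delay
```

Under this model, the only way to get more than one object into a group
is to hold the first request back, which is exactly the delay in
question.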

>
> In the example I proposed, the recipient receives the full headers
> block, then from that point, all requests reuse the same headers
> and can be processed immediately (just like pipelining in fact).
>
> Concerning response headers, I'd say that you emit a first response
> group with the headers from the first response, followed by the
> response. When another response comes in, you have two possibilities,
> either it shares the same headers and you can add a response to the
> existing group, or it does not and you open a new group.

Wait, is this the critical misunderstanding? Are you maintaining state
across requests and responses? Isn't this a minor modification on the
"simple" compressor? I was assuming you were trying to be stateless.
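
For what it's worth, here is how I read the response grouping described
above, as a minimal sketch. The class ResponseGrouper and its fields are
hypothetical, and I am assuming "existing group" means any earlier group
whose headers are identical; nothing here reflects an actual wire
format. Note that the sender has to remember earlier header blocks,
which is the per-connection state I am asking about.

```python
from typing import Dict, List, Tuple

HeaderKey = Tuple[Tuple[str, str], ...]


class ResponseGrouper:
    """Hypothetical sender-side grouping of responses by identical headers."""

    def __init__(self) -> None:
        # State kept across responses on the same connection.
        self.groups: List[Tuple[HeaderKey, List[str]]] = []
        self._index: Dict[HeaderKey, int] = {}

    def add(self, headers: Dict[str, str], body: str) -> Tuple[int, bool]:
        """Return (group_id, headers_must_be_sent) for this response."""
        key: HeaderKey = tuple(sorted(headers.items()))
        if key in self._index:
            # Same headers as an earlier group: reuse it, send only the body.
            gid = self._index[key]
            self.groups[gid][1].append(body)
            return gid, False
        # Different headers: open a new group and send the full header block.
        gid = len(self.groups)
        self.groups.append((key, [body]))
        self._index[key] = gid
        return gid, True


grouper = ResponseGrouper()
js = {"content-type": "application/javascript", "cache-control": "max-age=60"}
css = {"content-type": "text/css", "cache-control": "max-age=60"}
print(grouper.add(js, "body of 1.js"))      # new group, headers sent
print(grouper.add(js, "body of 2.js"))      # reuses the group, no headers sent
print(grouper.add(css, "body of main.css")) # headers differ, new group
```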

>
> But that said, I would not spend too much energy trying to optimize
> response headers. Right now they're commonly less important because
> often accompanied with data and also because the downstream link
> generally is much bigger than the upstream one. Still that's not
> something to ditch either.
>
> Regards,
> Willy
>