Re: HTTP/2 and TCP CWND

Peter Lepeska <bizzbyster@gmail.com> Fri, 26 April 2013 11:58 UTC

From: Peter Lepeska <bizzbyster@gmail.com>
Date: Fri, 26 Apr 2013 07:56:16 -0400
Cc: Roberto Peon <grmocg@gmail.com>, "Eggert, Lars" <lars@netapp.com>, Gabriel Montenegro <Gabriel.Montenegro@microsoft.com>, "Simpson, Robby (GE Energy Management)" <robby.simpson@ge.com>, Eliot Lear <lear@cisco.com>, Robert Collins <robertc@squid-cache.org>, Jitu Padhye <padhye@microsoft.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, "Brian Raymor (MS OPEN TECH)" <Brian.Raymor@microsoft.com>, Rob Trace <Rob.Trace@microsoft.com>, Dave Thaler <dthaler@microsoft.com>, Martin Thomson <martin.thomson@skype.net>, Martin Stiemerling <martin.stiemerling@neclab.eu>
To: William Chan (陈智昌) <willchan@chromium.org>
Subject: Re: HTTP/2 and TCP CWND
Archived-At: <http://www.w3.org/mid/178D206F-D3F4-4A6F-883D-DAADB71DF9D4@gmail.com>

You are right on both counts. First, I was fooled by IE9 firing the load event earlier; I see now that both are fully loaded at about 6.5 seconds, with Chrome even a tiny bit quicker.

Second, I am unaccustomed to looking at captures for bandwidth-limited links, so the evenly spaced ACKs coming up from the browser fooled me into thinking that the sender was waiting on those ACKs before sending its next burst of packets. On closer inspection, that's not always the case, and I do see the cwnd (what I was calling the send window) increasing as expected.
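A rough sketch of why the ACKs in such a capture look evenly spaced even while cwnd grows: the bottleneck link clocks data packets out at a fixed rate, so the receiver's ACKs come back at that same pace. The sketch assumes 1500-byte packets and a 1.5 Mbps downlink as a DSL-like profile (both are assumptions, not values read from the captures), and uses an idealized slow-start model with no delayed ACKs, ssthresh, or loss, in which each ACK releases two new segments.

```python
def serialization_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time for the bottleneck link to clock one packet onto the wire."""
    return packet_bytes * 8 / link_bps * 1000


def slow_start_growth(initial_cwnd: int, acks: int) -> list[int]:
    """Idealized slow start: each ACK for one full segment grows cwnd by one
    segment, so one segment leaves flight and two new ones may be sent.
    Ignores delayed ACKs, ssthresh and loss."""
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(acks):
        cwnd += 1
        history.append(cwnd)
    return history


if __name__ == "__main__":
    # Assumed DSL-like downlink: 1.5 Mbps, 1500-byte packets.
    spacing = serialization_delay_ms(1500, 1.5e6)
    print(f"ACK spacing ~ {spacing:.1f} ms")  # ACK spacing ~ 8.0 ms
    # Starting from 2 segments, 6 evenly spaced ACKs take cwnd from 2 to 8.
    print(slow_start_growth(2, 6))  # [2, 3, 4, 5, 6, 7, 8]
```

The point of the pairing is that constant ACK spacing says more about the bottleneck rate than about the sender stalling: the sender can be growing cwnd on every ACK while the ACK clock itself stays steady.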

Thanks for the clarifications and also the tip on how to enable tcpdumps on WPT.

Peter 
On Apr 25, 2013, at 10:09 PM, William Chan (陈智昌) <willchan@chromium.org> wrote:

> Cwnd is an internal implementation state variable maintained on a per-connection basis. It's never explicitly advertised on the wire; you can only guess at it from the number of in-flight packets. I've already told you that the SSL handshake does not saturate IW10, due to Google's smallish certificate chain. Can you clarify what you mean by send window? Did you mean receive window? And were you applying window scaling? Please clarify further. I don't see the 2*MSS that you're talking about.
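Since cwnd never appears on the wire, a common way to guess at it is to take the largest amount of unacknowledged data in flight on the sender's side of a capture. A minimal sketch, assuming a hypothetical simplified record format (direction plus sequence-end or cumulative ACK number) already extracted from a pcap for one TCP stream, with sequence numbers made relative so the first payload byte is 0. It ignores SACK, retransmissions and sequence wraparound, and the result is only a lower bound on cwnd.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical simplified capture record for one TCP stream.

    direction: "out" for server->client data, "in" for client->server ACKs.
    value: for "out", the relative sequence number just past the last
           payload byte; for "in", the relative cumulative ACK number.
    """
    direction: str
    value: int


def max_bytes_in_flight(records: list[Segment]) -> int:
    """Largest (highest sequence sent - highest cumulative ACK) observed.

    A lower bound on the sender's cwnd in bytes: the sender may also be
    limited by the receive window or simply have had nothing more to send.
    """
    highest_sent = 0
    highest_acked = 0
    peak = 0
    for rec in records:
        if rec.direction == "out":
            highest_sent = max(highest_sent, rec.value)
        else:
            highest_acked = max(highest_acked, rec.value)
        peak = max(peak, highest_sent - highest_acked)
    return peak


if __name__ == "__main__":
    mss = 1460  # assumed MSS
    # Ten full segments before any ACK, then one ACK, then two more segments.
    recs = [Segment("out", mss * i) for i in range(1, 11)]
    recs.append(Segment("in", 2 * mss))
    recs += [Segment("out", mss * 11), Segment("out", mss * 12)]
    print(max_bytes_in_flight(recs) // mss)  # 10
```

Dividing the peak by the MSS gives an estimate in segments; ten segments in flight before the first ACK is what an IW10 sender with enough data to send would show.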
> 
> Why are you saying SPDY is slower? Please, you don't actually think SPDY is slower here, do you? :P Hehe, J/K, Chrome's got some issues here for sure. However, what's probably confusing you is that IE9 fires the load event way earlier, so if you are strictly evaluating by the time the load event fires, IE9 certainly looks faster. But look at when all these resources finish loading. SPDY is rockin' nearly a second earlier (~5.8s compared to 6.5s for IE9). Don't let browsers fool you; look at the waterfalls yourself to see when the resource downloads complete.
> 
> You can look at the tcpdumps if you re-run WPT with tcpdumps enabled. Check out my run: http://www.webpagetest.org/result/130426_NX_1ZA/. I've captured the tcpdump for you.
> 
> Let's not get sidetracked on a tangent analyzing this waterfall. IW10 is a reality, and highly parallelized web sites are out there. Effective initcwnd>=120 is not uncommon. If you need more examples, I can provide more example URLs offlist (I don't want to derail this thread).
> 
> On Thu, Apr 25, 2013 at 12:16 PM, Peter Lepeska <bizzbyster@gmail.com> wrote:
> I don't see evidence of IW10 in these packet captures. In fact, I'm not seeing the send window increase at all from the initial 2 MSS (http://cloudshark.org/captures/35ff23aa38e1?filter=tcp.stream%20eq%202). These connections are running very slowly.
> 
> This effect might be what's killing SPDY (Chrome) performance for the same site over DSL, which is more than 2x slower than IE (http://www.webpagetest.org/result/130425_VZ_ZC1/1/details/). My guess is that because SPDY operates over a single connection, it's SOL when that connection gets stuck in a low gear like these appear to be. IE9 uses too many connections and suffers from retransmissions, but it is more robust to this effect because it has such high TCP connection concurrency. Again, this is just a guess, since I can't look at the WPT captures from the Chrome test.
> 
> Peter
> 
> On Apr 25, 2013, at 12:21 PM, William Chan (陈智昌) <willchan@chromium.org> wrote:
> 
>> There's this small internet company that has a cute kitten photo search product, check it out: https://www.google.com/search?q=kittens&tbm=isch.
>> 
>> I just kicked off a WebPageTest run so you can analyze it for yourself: http://www.webpagetest.org/result/130425_7X_TQT/
>> It shards 4 ways, with 6 connections per host shard. That's 24 connections with IW10. That's effective initcwnd=240. And I'm not even mentioning the other connections it has open. Check out how freakishly long the SSL handshakes take when you model this on a "DSL" connection with dummynet. Ouch.
>> You can see how it triggers wonderful TCP level behavior in http://cloudshark.org/captures/20818577e6b9/graphs/~?filters=tcp.analysis.retransmission%2C%21%28tcp.analysis.retransmission%29+%7Bother%7D. That's a graph of the retransmitted packets vs non-retransmitted.
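The arithmetic behind "effective initcwnd=240" is just shards × connections per shard × initial window. A small sketch, assuming IW10 on every connection and a 1460-byte MSS (the MSS is an assumption); the byte figure is an upper bound, since, as noted here, the SSL handshake messages don't fill the full IW10.

```python
def effective_initcwnd(shards: int, conns_per_shard: int, iw_segments: int = 10) -> int:
    """Aggregate initial window, in segments, across all parallel connections."""
    return shards * conns_per_shard * iw_segments


def initial_burst_bytes(segments: int, mss: int = 1460) -> int:
    """Upper bound on bytes the servers may send before the first ACKs return."""
    return segments * mss


if __name__ == "__main__":
    total = effective_initcwnd(4, 6)
    print(total)                       # 240
    print(initial_burst_bytes(total))  # 350400
    # The same formula for 6 connections on each of 2-4 sharded hosts:
    print([effective_initcwnd(s, 6) for s in (2, 3, 4)])  # [120, 180, 240]
```

For a sense of scale: at an assumed 1.5 Mbps bottleneck, 350,400 bytes would take roughly 1.9 s to drain, far longer than a typical RTT, so much of that burst ends up queued or dropped.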
>> 
>> Despite doing 4-way sharding, this example isn't as bad as it could be, because the SSL SERVER_HELLO and CERTIFICATE messages don't use the full IW10 available cwnd. When you examine sites that don't use HTTPS and blast out an image object, you see this get way worse. I've got a lot of those URLs if you want to see this too.
>> 
>> Patrick and I have been discussing this off and on and how to ameliorate this in our respective browsers. We've got different takes, and I'm watching his work closely to see how it turns out :)
>> 
>> 
>> On Thu, Apr 25, 2013 at 8:58 AM, Peter Lepeska <bizzbyster@gmail.com> wrote:
>> I have not seen initcwnds of 120+. Can you send me a URL that would have that behavior?
>> 
>> Thanks,
>> 
>> Peter
>> 
>> On Apr 24, 2013, at 3:27 PM, William Chan (陈智昌) <willchan@chromium.org> wrote:
>> 
>>> On Wed, Apr 24, 2013 at 11:52 AM, Peter Lepeska <bizzbyster@gmail.com> wrote:
>>> 
>>> On Apr 24, 2013, at 12:36 PM, William Chan (陈智昌) <willchan@chromium.org> wrote:
>>> 
>>>> On Wed, Apr 24, 2013 at 8:40 AM, Peter Lepeska <bizzbyster@gmail.com> wrote:
>>>> Not sure this has been proposed before, but better than caching would be dynamic initial CWND based on web server object size hinting.
>>>> 
>>>> Web servers often know the size of the object that will be sent to the browser. The web server therefore can help the transport make smart initial CWND decisions. For instance, if an object is less than 20KB, which is true for the majority of objects on web pages, the web server could tell the transport to increase the CWND to a size that would allow the object to be sent in the initial window.
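A minimal sketch of the object-size hinting idea, assuming the server knows the response size up front and that the transport offers some way to set a per-connection initial window. The function name, the 1460-byte MSS, the 10-segment default, the 20 KiB reading of "20KB" and the safety cap are all hypothetical illustration values, not an existing API. It picks the smallest window that fits the object, never below the default, and leaves larger objects at the default so slow start ramps normally.

```python
import math

DEFAULT_IW = 10                 # segments; assumed default initial window (IW10)
MAX_IW = 32                     # segments; hypothetical cap, only binds if the limit below is raised
MSS = 1460                      # bytes; assumed maximum segment size
SMALL_OBJECT_LIMIT = 20 * 1024  # bytes; "less than 20KB", read here as 20 KiB


def hinted_initcwnd(object_bytes: int, mss: int = MSS) -> int:
    """Return an initial cwnd (in segments) for a response of object_bytes."""
    if object_bytes >= SMALL_OBJECT_LIMIT:
        # Large objects: keep the default and let the connection ramp up.
        return DEFAULT_IW
    needed = math.ceil(object_bytes / mss)
    return min(max(needed, DEFAULT_IW), MAX_IW)


if __name__ == "__main__":
    print(hinted_initcwnd(20_000))   # 14 segments fit a 20,000-byte object
    print(hinted_initcwnd(5_000))    # 10, already fits in the default window
    print(hinted_initcwnd(100_000))  # 10, large object keeps the default
```

Note that this picks a window purely from the object size, with no congestion information at all, which is exactly the objection raised in the reply.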
>>>> 
>>>> In the HTTP/2 case, where we are often multiplexing, this doesn't seem to make as much sense. Also, I'm not sure that it's a reasonable argument to select initcwnd in the absence of any congestion information...or were you suggesting merely tweaking the initcwnd a little bit if that little bit would make a difference in terms of fitting the whole object in the initcwnd?
>>> 
>>> Right. A small number of multiplexed connections transfer less of a given page's data in slow start, so this will have less impact for those connections. However, it's worth noting that often the first object requested over the multiplexed channel will be the root object alone, and of course the number of round trips to download the root object directly impacts page load time.
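To make the round-trip cost concrete, here is an idealized count of how many round trips slow start needs to deliver one object. This is a sketch under simplifying assumptions: no loss, no delayed ACKs, no receive-window limit, cwnd doubling every RTT, a 1460-byte MSS, and handshake round trips not counted.

```python
import math


def slow_start_rounds(object_bytes: int, iw_segments: int, mss: int = 1460) -> int:
    """Round trips needed to send object_bytes under idealized slow start.

    Round k (starting at 0) carries iw_segments * 2**k segments.
    """
    segments_left = math.ceil(object_bytes / mss)
    cwnd = iw_segments
    rounds = 0
    while segments_left > 0:
        segments_left -= cwnd
        cwnd *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    root_doc = 50 * 1024  # hypothetical 50 KiB root document (36 segments)
    for iw in (2, 4, 10):
        print(iw, slow_start_rounds(root_doc, iw))
    # 2 5
    # 4 4
    # 10 3
```

On an assumed 50 ms RTT path, the two extra round trips between a 2-segment and a 10-segment initial window add about 100 ms before the root document is complete, before any subresource requests even start.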
>>> 
>>> We should move away from this assumption that the first request is for the root object. I've been advising companies on how to do SPDY deployments, and a common scenario is an origin server hosting the root doc plus a SPDY-capable CDN for the subresources (primarily images served on the edge). These CDNs are going to serve a burst of traffic immediately, and those subresources often have a high impact on above-the-fold perceived latency (in many of today's websites, images form a big part of the initial viewport's content, so serving these images quickly is vital). In today's non-SPDY / HTTP2 case, they just domain shard and do 6 * [2-4] sharded hosts, for 12-24 connections with IW10, starting out with effective initcwnds of 120+. They are gaming initcwnd to the benefit of their users that don't have a congested path, and to the severe detriment of users that cannot handle such high bursts. This situation sucks.
>>> 
>>> 
>>>> 
>>>> Caching attempts to reuse old congestion information, although it has been reasonably pointed out that the validity of that information is questionable. It's an open research question as far as I'm concerned, and I'd love to see any data people had.
>>>>  
>>>> 
>>>> For larger objects, the benefit of a large CWND is minimal so the web server could tell the transport to use the default and let the connection ramp slowly. 
>>>> 
>>>> I'm not sure this makes sense. GMail and Google+ and I'm sure other large web apps have rather large scripts and stylesheets, but they still care about their initial page load latency. Perhaps you're making the assumption that large objects imply the user does not have interactivity / low-latency expectations? If so, that's invalid. Those round trips still matter, and I can tell you our Google app teams work very hard to eliminate them. Or maybe your definition of large is larger than what I'm thinking.
>>> 
>>> The threshold is tunable. My point here is that if the TCP connection is going to be used to download a 100 MB file or stream a video, then slow start has a negligible impact on the overall download time for the file.
>>> 
>>> Sure, if you're doing non-interactive large data transfers, then the slow start latency isn't going to matter much. I don't view that conversation as very interesting, and no one's agitating for change there. The contentious and more interesting discussion is how to start up TCP connections safely yet quickly for interactive, bursty traffic like web browsing. I include video websites like YouTube in that, even if their objects are large, since the time to start viewing the video is still important.
>>>  
>>> 
>>>> 
>>>> 
>>>> Peter
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Mon, Apr 15, 2013 at 8:16 PM, Roberto Peon <grmocg@gmail.com> wrote:
>>>> 
>>>> 
>>>> 
>>>> On Mon, Apr 15, 2013 at 4:03 PM, Eggert, Lars <lars@netapp.com> wrote:
>>>> Hi,
>>>> 
>>>> 
>>>> On Apr 15, 2013, at 15:56, Roberto Peon <grmocg@gmail.com> wrote:
>>>> > The interesting thing about the client mucking with this data is that, so
>>>> > long as the server's TCP implementation is smart enough not to kill itself
>>>> > (and some simple limits accomplish that), the only one the client harms is
>>>> > itself...
>>>> 
>>>> I fail to see how you'd be able to achieve this. If the server uses a CWND that is too large, it will inject a burst of packets into the network that will overflow a queue somewhere. Unless you use WFQ or something similar on all bottleneck queues (not generally possible), that burst will likely cause packet loss to other flows, and will therefore impact them.
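Lars's point can be illustrated with a toy drop-tail bottleneck queue: when a burst arrives faster than the link drains it, everything beyond the buffer is dropped. This is a discrete-time sketch with made-up rates and buffer size, not a model of any real network; it only counts the burst's own packets, whereas in a shared FIFO any other flow arriving while the buffer is full would lose packets too.

```python
def droptail_drops(burst_packets: int, arrive_per_tick: int,
                   drain_per_tick: int, buffer_packets: int) -> int:
    """Count packets dropped when a burst hits a drop-tail FIFO queue.

    Each tick, up to arrive_per_tick packets of the burst arrive (packets that
    find the buffer full are dropped), then the link transmits up to
    drain_per_tick queued packets.
    """
    queue = 0
    remaining = burst_packets
    dropped = 0
    while remaining > 0:
        arriving = min(arrive_per_tick, remaining)
        remaining -= arriving
        accepted = min(arriving, buffer_packets - queue)
        queue += accepted
        dropped += arriving - accepted
        queue = max(0, queue - drain_per_tick)
    return dropped


if __name__ == "__main__":
    # Hypothetical: ingress 10x faster than the bottleneck, 50-packet buffer.
    print(droptail_drops(30, 10, 1, 50))   # 0: a small burst fits in the buffer
    print(droptail_drops(240, 10, 1, 50))  # 167: a 240-packet burst overflows it
```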
>>>> 
>>>> The most obvious way is that the server doesn't use a CWND which is larger than the largest currently active window to a similar RTT. The other obvious way is to limit it to something like 32, which is about what we'd see with the opening of a mere 3 regular HTTP connections! This at least makes the one connection competitive with the circumventions that HTTP/1.X currently exhibits.
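The two limits suggested here can be sketched as a simple server-side rule. A minimal sketch, assuming the server tracks (RTT, cwnd) for its currently active connections; the 20% "similar RTT" tolerance and the 10-segment fallback are hypothetical parameters, and applying the cap of 32 on top of the RTT-matched window (rather than using either limit alone) is this sketch's choice, not something stated in the thread.

```python
from typing import Iterable, Tuple

DEFAULT_IW = 10  # segments; assumed fallback when nothing comparable is active
HARD_CAP = 32    # segments; "something like 32"


def clamped_initcwnd(active: Iterable[Tuple[float, int]], new_rtt_ms: float,
                     tolerance: float = 0.2) -> int:
    """Pick an initial cwnd (segments) for a new connection.

    active: (rtt_ms, cwnd_segments) for currently active connections.
    Uses the largest cwnd among connections whose RTT is within
    `tolerance` (as a fraction) of new_rtt_ms, capped at HARD_CAP.
    """
    similar = [cwnd for rtt, cwnd in active
               if abs(rtt - new_rtt_ms) <= tolerance * new_rtt_ms]
    if not similar:
        return DEFAULT_IW
    return min(max(similar), HARD_CAP)


if __name__ == "__main__":
    conns = [(50.0, 20), (52.0, 40), (200.0, 100)]
    print(clamped_initcwnd(conns, 50.0))           # 32: matched 40, capped
    print(clamped_initcwnd([(50.0, 20)], 50.0))    # 20: largest similar window
    print(clamped_initcwnd([(200.0, 100)], 50.0))  # 10: no similar RTT, fallback
```

For comparison, three regular HTTP connections with IW10 already add up to 30 segments, which is why a cap around 32 keeps the single connection roughly competitive with today's multi-connection behavior.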
>>>>  
>>>> TCP is a distributed resource sharing algorithm to allocate capacity throughout a network. Although the rates for all flows are computed in isolation, the effect of that computation is not limited to the flow in question, because all flows share the same queues.
>>>> 
>>>> Yes, that is what I've been arguing w.r.t. the many connections that the application-layer currently opens :)
>>>> It becomes a question of which dragon is actually most dangerous.
>>>> 
>>>> -=R
>>>>  
>>>> 
>>>> Lars
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
>