Re: Fwd: Re: [tcpm] FW: Call for Adoption: TCP Tuning for HTTP

Joe Touch <touch@isi.edu> Thu, 03 March 2016 18:06 UTC

To: Willy Tarreau <w@1wt.eu>
References: <56D74C23.5010705@isi.edu> <56D76A7E.7090507@isi.edu> <20160302232125.GA18275@1wt.eu> <56D77892.2000308@isi.edu> <20160303065545.GA18412@1wt.eu>
Cc: touch@isi.edu, ietf-http-wg@w3.org
From: Joe Touch <touch@isi.edu>
Message-ID: <56D87BAC.4060204@isi.edu>
Date: Thu, 03 Mar 2016 10:00:12 -0800
In-Reply-To: <20160303065545.GA18412@1wt.eu>
Archived-At: <http://www.w3.org/mid/56D87BAC.4060204@isi.edu>


On 3/2/2016 10:55 PM, Willy Tarreau wrote:
> On Wed, Mar 02, 2016 at 03:34:42PM -0800, Joe Touch wrote:
>>
>>
>> On 3/2/2016 3:21 PM, Willy Tarreau wrote:
>>> On Wed, Mar 02, 2016 at 02:34:38PM -0800, Joe Touch wrote:
>>>> - it has significant errors
>>>>
>>>> 	TIME-WAIT issues apply to servers, not clients.
>>>
>>> Sorry but no it's the opposite. 
>>
>> TIME-WAIT is a state caused by the side that closes the connection.
> 
> ... that closes the connection *first* (since both sides close it).

Of course.
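
As a toy illustration of that point (not a TCP implementation), the
sketch below walks the RFC 793 state names for a normal FIN exchange and
reports which endpoint is left holding TIME-WAIT. The function name
final_states is made up for this example, and simultaneous close (where
both sides pass through TIME-WAIT) is deliberately left out.

```python
# Toy model of a normal (non-simultaneous) TCP close, using RFC 793 state names.
# It only answers one question: which endpoint ends up in TIME-WAIT?

ACTIVE_CLOSE = ["ESTABLISHED", "FIN-WAIT-1", "FIN-WAIT-2", "TIME-WAIT"]
PASSIVE_CLOSE = ["ESTABLISHED", "CLOSE-WAIT", "LAST-ACK", "CLOSED"]


def final_states(first_closer: str) -> dict:
    """Return the last state reached by each endpoint after the FIN exchange."""
    if first_closer not in ("client", "server"):
        raise ValueError("first_closer must be 'client' or 'server'")
    other = "server" if first_closer == "client" else "client"
    # The side that sends the first FIN follows the active-close path;
    # the side that receives it follows the passive-close path.
    return {first_closer: ACTIVE_CLOSE[-1], other: PASSIVE_CLOSE[-1]}


if __name__ == "__main__":
    print(final_states("server"))
    print(final_states("client"))
```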

> This point is important because it means some proxies would often do
> better to wait for a passive close from a server than to close
> themselves.

Transparent proxies don't have that choice - they're governed by the
semantics of the connection (whether EOF == close or not).

Non-transparent proxies shouldn't be opening one connection per
transaction anyway; they ought to use one or more persistent connections
and leave them open while they are interacting with the proxy. If they
do this, there won't be an issue with who closes the connection because
the close frequency should be very low.
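
A rough sketch of what that looks like in practice, using only the
Python standard library against a throwaway loopback server. The names
Handler, CountingServer and fetch_many are invented for this example; the
point is only that several transactions ride on one TCP connection, so
there is one close (and at most one TIME-WAIT) instead of one per
transaction.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep the connection open

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


class CountingServer(ThreadingHTTPServer):
    """Loopback server that counts how many TCP connections it accepts."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accepted = 0

    def get_request(self):
        request = super().get_request()
        self.accepted += 1
        return request


def fetch_many(n: int):
    """Issue n GET requests over a single persistent connection."""
    server = CountingServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    statuses = []
    try:
        for _ in range(n):
            conn.request("GET", "/")
            response = conn.getresponse()
            response.read()  # drain the body so the connection can be reused
            statuses.append(response.status)
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return statuses, server.accepted
```

Calling fetch_many(5) returns five 200 statuses and an accepted-connection
count of 1.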

>> In the bulk of HTTP connections, the server closes the connection,
>> either to drop a persistent connection or to indicate "EOF" for a transfer.
> 
> Yes.
> 
>> Clients generally don't enter TIME-WAIT, so reducing the time they spend
>> in a state they don't enter has no effect.
> 
> They can if they close first and that's exactly the problem we absolutely
> want to avoid.

TW buildup has two effects:

	1) limits the connection rate to a given IP address

	2) consumes memory space (and potentially CPU resources)

Neither is typically an issue for HCI-based (human-driven) clients. Servers have much
higher rate requirements for a given address when they act as a proxy
and consume more memory overall because they interact with a much larger
set of addresses.
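
A back-of-the-envelope sketch of both effects. The numbers are
assumptions chosen for illustration, not measurements: a TIME-WAIT of
2*MSL = 240 s (RFC 793's suggested MSL of 2 minutes), the 16384-port
IANA dynamic range (49152-65535), and a hypothetical 100 bytes of state
per TIME-WAIT entry. The function names are made up for this example.

```python
def max_connection_rate(local_ports: int, time_wait_seconds: float) -> float:
    """Upper bound on new connections per second toward one remote
    (address, port) when each closed connection holds its local port in
    TIME-WAIT for time_wait_seconds."""
    if local_ports <= 0 or time_wait_seconds <= 0:
        raise ValueError("both arguments must be positive")
    return local_ports / time_wait_seconds


def time_wait_memory(rate_per_second: float, time_wait_seconds: float,
                     bytes_per_entry: int) -> float:
    """Steady-state memory held by TIME-WAIT entries.
    Entries outstanding = arrival rate * holding time (Little's law)."""
    return rate_per_second * time_wait_seconds * bytes_per_entry


if __name__ == "__main__":
    # Assumed values, see the paragraph above.
    print(round(max_connection_rate(16384, 240.0), 1), "connections/s")
    print(time_wait_memory(1000, 240.0, 100) / 1e6, "MB")
```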

> There are certain cases where we had to put warnings in
> RFC 7230/7540, especially in relation to proxies. The typical case is
> when a client closes a connection to a proxy (eg: a CONNECT tunnel) and
> the proxy is supposed to in turn close the connection to the server. In
> this case the proxy is the connection initiator, and it can very quickly
> condemn all of its source ports by accumulating TIME_WAITs there. 

That speaks to a mismanagement of port resources. If they are allocated
on a per-IP basis, they won't run out. The error is in treating the pool
of source ports as global across all IP addresses, which TW does not
require.
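
To make the per-IP point concrete, here is a minimal sketch of a source
port pool keyed by the remote endpoint rather than shared globally. It
assumes a single local address; PerRemotePortPool and its methods are
hypothetical names for this example and do not describe any particular
OS. Because a connection is identified by the full 4-tuple, the same
local port number can be busy (or in TIME-WAIT) toward one remote and
still free toward another.

```python
from collections import defaultdict


class PerRemotePortPool:
    """Tracks busy local ports separately for each remote (ip, port)."""

    def __init__(self, first_port: int, last_port: int):
        self._ports = range(first_port, last_port + 1)
        # remote endpoint -> local ports in use or held in TIME-WAIT
        self._busy = defaultdict(set)

    def allocate(self, remote: tuple) -> int:
        busy = self._busy[remote]
        for port in self._ports:
            if port not in busy:
                busy.add(port)
                return port
        raise RuntimeError(f"no free local port toward {remote}")

    def release(self, remote: tuple, port: int) -> None:
        """Call when the connection (and any TIME-WAIT) is finished."""
        self._busy[remote].discard(port)
```

With a two-port pool, a third allocation toward the same remote fails,
while allocations toward a different remote still succeed and may reuse
the same port numbers.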

> But the
> same problem exists with idle persistent connections, which clients must
> avoid closing themselves if there's any hope the server will close soon.

I agree - this is the same problem if it exists -- port mismanagement.

>>> A server has no issue with knowing that
>>> a SYN belongs to a new session by seeing its ISN greater than the end
>>> of the previous window. 
>>
>> That's exactly the reason the server keeps information in the TIME-WAIT
>> state.
>>
>>> On the opposite, a client cannot know if the
>>> remote server it wants to connect to is safe for reuse 
>>
>> TIME-WAIT isn't just for new connections; it's to protect against
>> injecting traffic from previous connections that is delayed into new
>> connections...
> 
> Yes I'm well aware of this :-)
> 
>>> and will refrain
>>> from establishing a new connection during the whole TIME_WAIT state,
>>> effectively preventing itself from doing its job.
>>
>> If that's what it does, that's not TIME-WAIT - it's some new state in
>> the OS to avoid the possibility of hitting a TIME-WAIT at the server.
>> That's mislabeled at best, and defeats the entire purpose of the
>> TIME-WAIT at the server anyway.
> 
> No I'm not saying any such thing,

OK - glad to hear that.

> I'm saying that by all means the
> server must close first to keep the TIME_WAIT on its side and never
> on the client side. A TIME_WAIT on a server is very cheap (a few tens
> of bytes of memory at worst) 

It costs exactly the same on the client and the server when implemented
correctly.

> and can be recycled when a new valid SYN
> arrives.

The purpose of TW is to inhibit new SYNs involving the same port. When a
new SYN arrives on another port, that has no impact on existing TWs.
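
A small sketch of the behavior as described here: a TIME-WAIT table
keyed on the full 4-tuple, where only a SYN for the same 4-tuple is
inhibited and only until the hold time runs out. This models the
description in this message, not the code of any particular stack;
TimeWaitTable and its method names are invented for the example, and
time is passed in explicitly to keep it deterministic.

```python
class TimeWaitTable:
    """Maps a connection 4-tuple to the time its TIME-WAIT expires."""

    def __init__(self, hold_seconds: float):
        self.hold_seconds = hold_seconds
        self._expiry = {}

    def enter(self, four_tuple: tuple, now: float) -> None:
        """Record that this connection entered TIME-WAIT at time now."""
        self._expiry[four_tuple] = now + self.hold_seconds

    def blocks_syn(self, four_tuple: tuple, now: float) -> bool:
        """True if a SYN for exactly this 4-tuple is still inhibited."""
        expiry = self._expiry.get(four_tuple)
        if expiry is None:
            return False  # different port or address: unrelated to any TW entry
        if now >= expiry:
            del self._expiry[four_tuple]  # hold time is over, forget the entry
            return False
        return True
```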

> A TIME_WAIT on the client is not recyclable. That's why
> TIME_WAIT is a problem for the client and not for the server.

See above; TW is *never* recyclable.

> The problem is that in some cases it's suggested that the client
> closes first and this causes such problems.

That actually helps the server (see our Infocom '99 paper).

> The only workaround for
> the client is to close with an RST by disabling lingering,

That's not what SO_LINGER does. See:
http://man7.org/linux/man-pages/man7/socket.7.html
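
For reference, a minimal sketch of setting and reading back SO_LINGER
from Python. Per socket(7), the option takes a struct linger (l_onoff,
l_linger), and when it is enabled close(2) or shutdown(2) does not
return until queued data has been sent or the linger timeout has
expired. The sketch assumes the Linux layout of two C ints; set_linger
and get_linger are hypothetical helper names.

```python
import socket
import struct

LINGER_FORMAT = "ii"  # struct linger { int l_onoff; int l_linger; } (assumed layout)


def set_linger(sock: socket.socket, enabled: bool, seconds: int) -> None:
    """Enable or disable lingering on close with the given timeout."""
    value = struct.pack(LINGER_FORMAT, int(enabled), seconds)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, value)


def get_linger(sock: socket.socket) -> tuple:
    """Return (enabled, seconds) as currently set on the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FORMAT))
    onoff, seconds = struct.unpack(LINGER_FORMAT, raw)
    return bool(onoff), seconds


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        set_linger(s, True, 5)
        print(get_linger(s))
```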

> but that's
> really ugly and unreliable : if the RST is lost while the server is
> in LAST_ACK (and chances are that it will happen if the ACK was lost
> already), the new connection will not open until this connection
> expires.

TCP has a significant error regarding RSTs; the side that throws a RST
on an existing connection should really go into TW - for all the same
reasons that TW exists in the first place, to protect new connections
from old data still in the network.

> So by all means we must do whatever we can to avoid accumulating
> TIME_WAITs on the client side, and that was the point mentioned in the
> document, since it's supposed to be used as a reference for future
> protocol designs. 

This is the error I mentioned, and it should not be recommended.

> For example in HTTP/2, the
> GOAWAY frame makes this a bit easier to take care of, since we can
> declare an intent to close that will cause the other side to close.

Receiving a GOAWAY says "don't start new streams on this connection",
not "shut this connection down now". That's the right behavior, because
it says nothing about the semantics of the streams already open.
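
In RFC 7540 terms, GOAWAY carries a last-stream identifier: streams up
to that identifier keep running to completion, streams above it were not
processed by the sender, and nothing new is started on this connection.
A minimal sketch of a client-side model of that rule follows;
Http2ConnectionModel and its methods are invented names for
illustration and are not part of any HTTP/2 library.

```python
class Http2ConnectionModel:
    """Client-side bookkeeping for how GOAWAY affects one connection."""

    def __init__(self):
        self.next_stream_id = 1  # client-initiated stream IDs are odd
        self.open_streams = set()
        self.goaway_received = False

    def open_stream(self) -> int:
        if self.goaway_received:
            raise RuntimeError("GOAWAY received: use a new connection")
        stream_id = self.next_stream_id
        self.next_stream_id += 2
        self.open_streams.add(stream_id)
        return stream_id

    def on_goaway(self, last_stream_id: int) -> list:
        """Handle GOAWAY; return the stream IDs the peer did not process.
        Those requests may be retried on a different connection."""
        self.goaway_received = True
        unprocessed = {s for s in self.open_streams if s > last_stream_id}
        self.open_streams -= unprocessed
        return sorted(unprocessed)

    def finish_stream(self, stream_id: int) -> None:
        self.open_streams.discard(stream_id)

    def can_close(self) -> bool:
        """The connection is closed only once the surviving streams finish."""
        return self.goaway_received and not self.open_streams
```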

> Also, there are people who face this issue and work around it using
> OS-specific tunables that blindly recycle some of these connections,
> and these people don't understand the impacts of doing so.

They really ought to read the literature. It's been out there so long it
can probably apply for a driver's license by now.

> The doc will have to be clear enough to discourage them from doing so,
> and to adapt the client code instead.

We don't need a new doc to address this, especially (IMO) incorrectly.

Joe