Re: Fwd: Re: [tcpm] FW: Call for Adoption: TCP Tuning for HTTP
Willy Tarreau <w@1wt.eu> Thu, 03 March 2016 18:49 UTC
Date: Thu, 03 Mar 2016 19:44:18 +0100
From: Willy Tarreau <w@1wt.eu>
To: Joe Touch <touch@isi.edu>
Cc: ietf-http-wg@w3.org
Message-ID: <20160303184418.GA18774@1wt.eu>
Archived-At: <http://www.w3.org/mid/20160303184418.GA18774@1wt.eu>

On Thu, Mar 03, 2016 at 10:00:12AM -0800, Joe Touch wrote:
> > This point is important because it means some proxies often should
> > better wait for a passive close from a server than deciding to
> > close themselves.
>
> Transparent proxies don't have that choice - they're governed by the
> semantics of the connection (whether EOF == close or not).
>
> Non-transparent proxies shouldn't be opening one connection per
> transaction anyway; they ought to use one or more persistent connections
> and leave them open while they are interacting with the proxy. If they
> do this, there won't be an issue with who closes the connection because
> the close frequency should be very low.

It's not that black and white unfortunately, and in practice it's very
common to see proxies fail in the field above 500 connections per second
because their TCP stack was not appropriately tuned, and with the default
60s TIME_WAIT timeout of their OS, they exhaust the default 28k source
ports. The first thing admins do in this case is to enable tcp_tw_recycle
(which basically causes timewaits to be killed when needed), and this
appears to solve the situation while it actually makes it even worse.

Among the solutions, we can count on:

  - putting idle connections back into pools, hoping that they will be
    reusable. Connection reuse rate still remains low on average.

  - keeping a high enough keep-alive idle timeout on the proxy and a
    smaller one on the server (when the proxy is a gateway installed on
    the server side), hoping for the server to close first

  - appropriately adding "connection: close" to outgoing requests to ask
    the server to close after the response.

  - disabling lingering before closing when the HTTP state indicates
    the proxy has received all data

  - doing whatever is imaginable to avoid closing first

These are just general principles and many derivatives may exist in
various contexts, but these are definitely important points that HTTP
implementors have to be aware of before falling into the same traps as
those who came before them.

> >> In the bulk of HTTP connections, the server closes the connection,
> >> either to drop a persistent connection or to indicate "EOF" for a transfer.
> >
> > Yes.
> >
> >> Clients generally don't enter TIME-WAIT, so reducing the time they spend
> >> in a state they don't enter has no effect.
> >
> > They can if they close first and that's exactly the problem we absolutely
> > want to avoid.
>
> TW buildup has two effects:
>
> 1) limits the number connection rate to a given IP address

Exactly.

> 2) consumes memory space (and potentially CPU resources)

This one is very cheap. A typical TW connection is just a few tens of bytes.

> Neither is typically an issue for HCI-based clients.

I don't know what you call HCI here, I'm sorry.

> Servers have much
> higher rate requirements for a given address when they act as a proxy
> and consume more memory overall because they interact with a much larger
> set of addresses.

Servers are not penalized at all by the connection rate since it only
limits the *outgoing* connection rate and not the incoming one. There's
never any ambiguity when a SYN is received regarding the possibility that
the connection still exists on the other side, which is why TW connections
are recycled when receiving a new SYN. Regarding the memory usage, it
remains very low compared to the memory used by the application itself.
My personal record was 5.5 million timewaits on a server at 90000
connections per second. It was only 300 MB of RAM on a server having
something like 64 GB.
And not everyone needs 90k conns/s but everyone needs more than 500/s
nowadays in any infrastructure.

> > There are certain cases where we had to put warnings in
> > rfc7230/7540, especially in relation with proxies. The typical case is
> > when a client closes a connection to a proxy (eg: a CONNECT tunnel) and
> > the proxy is supposed to in turn close the connection to the server. In
> > this case the proxy is the connection initiator, and it can very quickly
> > condemn all of its source ports by accumulating TIME_WAITs there.
>
> That speaks to a mismanagement of port resources. If they are allocated
> on a per-IP basis, they won't run out.

Yes they do, that's the problem everyone running a load balancer faces!
The highest connection rate you can reach per server is around 1000
connections per second with 64k ports! That started not being enough
15 years ago!

> The error is in treating the pool
> of source ports as global across all IP addresses, which TW does not
> require.

No, the problem is keeping a TW which blocks a precious resource, the
source port, which is only addressed on 16 bits!

> > I'm saying that by all means the
> > server must close first to keep the TIME_WAIT on its side and never
> > on the client side. A TIME_WAIT on a server is very cheap (a few tens
> > of bytes of memory at worst)
>
> It costs exactly the same on the client and the server when implemented
> correctly.

It costs the same except that in one case it prevents a connection from
being established while in the other case it does not. I've seen people
patch their kernels to lower the TIME_WAIT down to 2 seconds to address
such shortcomings! Quite frankly, this workaround *is* causing trouble!

> > and can be recycled when a new valid SYN
> > arrives.
>
> The purpose of TW is to inhibit new SYNs involving the same port. When a
> new SYN arrives on another port, that has no impact on existing TWs.

I'm always talking about the same port.
On today's hardware and real-world workloads, source ports can be reused
every second (60k conns/s). Only the server with TIME_WAIT can tell
whether an incoming SYN is a retransmit or a new one. The client knows
it's a new one but doesn't know if the server is still in LAST_ACK or has
really closed, and due to this uncertainty it refrains from connecting.

> > A TIME_WAIT on the client is not recyclable. That's why
> > TIME_WAIT is a problem for the client and not for the server.
>
> See above; TW is *never* recyclable.

Yes it definitely is on the server side, which is the point. When you
receive a SYN whose ISN is higher than the end of the current window,
it's a new one by definition (as indicated in RFC1122).

> > The problem is that in some cases it's suggested that the client
> > closes first and this causes such problems.
>
> That actually helps the server (see our 99 Infocom paper).

Sure, since the server doesn't receive any more traffic from this client,
that definitely helps, but the point is to ensure traffic flows between
the two hosts, not that one of them refrains from connecting.

> > The only workaround for
> > the client is to close with an RST by disabling lingering,
>
> That's not what SO_LINGER does. See:
> http://man7.org/linux/man-pages/man7/socket.7.html

But in practice it's used for this. When you disable lingering before
closing, you purge any pending data, which has the benefit that the data
you just received from the server that carried an ACK for data you don't
have anymore triggers a reset. Yes it's absolutely ugly but you have no
other option when you are a client and are forced to close first due to
the protocol. Don't forget that we're discussing a document whose outcome
should be that protocols are designed in the future to avoid such horrible
workarounds.

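For illustration only, here is a minimal sketch of the "disable lingering
before closing" workaround mentioned above. It assumes the common
Linux/BSD behaviour where setting SO_LINGER with l_onoff=1 and l_linger=0
makes close() abort the connection with an RST instead of a FIN. The names
LINGER_ZERO, abortive_close() and demo() are made up for this example and
do not come from any proxy's code.

```python
import socket
import struct

# struct linger { int l_onoff; int l_linger; }: lingering on, zero timeout.
LINGER_ZERO = struct.pack("ii", 1, 0)


def abortive_close(sock: socket.socket) -> None:
    """Close with an RST instead of a FIN, so no TIME_WAIT is left locally."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, LINGER_ZERO)
    sock.close()


def demo() -> str:
    """Loopback demo: return what the peer observes after the abortive close."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    try:
        abortive_close(client)  # the client closes first, but with an RST
        try:
            # An orderly close (FIN) would make recv() return b"" here.
            return "fin" if server_side.recv(1) == b"" else "data"
        except ConnectionResetError:
            return "reset"
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    print(demo())
```

On a Linux loopback the peer should see a connection reset rather than an
orderly end of stream. This only shows the mechanism; it does nothing
about the reliability problem of a lost RST.
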
> > but that's
> > really ugly and unreliable : if the RST is lost while the server is
> > in LAST_ACK (and chances are that it will happen if the ACK was lost
> > already), the new connection will not open until this connection
> > expires.
>
> TCP has a significant error regarding RSTs; the side that throws a RST
> on an existing connection should really go into TW - for all the same
> reasons that TW exists in the first place, to protect new connections
> from old data still in the network.

There are many other issues regarding RST. When you send an RST through
a firewall, you'd better cross fingers for it not to be lost between the
firewall and the destination, otherwise chances are that you won't get a
second chance. That's one of the reasons why I'd love to live in a world
where a client never has to close first.

> > Also, there are people who face this issue and work around them using
> > some OS-specific tunables which allow to blindly recycle some of these
> > connections and these people don't understand the impacts of doing so.
>
> They really ought to read the literature. It's been out there so long it
> can probably apply for a driver's license by now.

When people see their production servers stall at 5% CPU because their
LBs or proxies can't open new connections while full of TIME_WAIT, what
they do is ask their preferred search engine, which simply offers them
advice such as:

  - https://ihazem.wordpress.com/2012/02/07/reducing-time_wait-socket-connections-recyclereuse/
  - http://serverfault.com/questions/212093/how-to-reduce-number-of-sockets-in-time-wait
  - http://kaivanov.blogspot.fr/2010/09/linux-tcp-tuning.html
  - http://www.linuxbrigade.com/reduce-time_wait-socket-connections/
  - http://www.stolk.org/debian/timewait.html

Yes, they all involve the wrong and nasty workarounds that consist of
allowing outgoing TIME_WAIT connections to be recycled, which is the
worst thing ever to do (except the last one, which explains how to modify
the TW timeout in the kernel).

This is a *real* problem in the field, and it has been for a while,
because some protocols have been designed for lower loads without
imagining that one day source ports would be reused that often. While we
have to deal with this the best we can, it's important to ensure the same
mistake is not made again in the future.

Regards,
Willy