Re: statement regarding keepalives

"Olle E. Johansson" <> Thu, 16 August 2018 07:44 UTC

From: "Olle E. Johansson" <>
Subject: Re: statement regarding keepalives
Date: Thu, 16 Aug 2018 09:44:20 +0200
In-Reply-To: <>
Cc: Olle E Johansson <>, Kent Watsen <>, "" <>, "" <>, "" <>, "" <>
To: Mikael Abrahamsson <>
List-Id: IETF Transport and Services Area Mailing List <>

> On 16 Aug 2018, at 09:28, Mikael Abrahamsson <> wrote:
> On Wed, 15 Aug 2018, Kent Watsen wrote:
>> You bring up an interesting point; it goes to the motivation for wanting to do keepalives in the first place.  The text doesn't yet mention maintaining flow state as a motivation.
> It's not only to maintain flow state, it's also to close the connection when the network goes down and doesn't work anymore, and to "give up" on connections that don't work anymore (for some definition of "anymore").
> I have operationally been in the situation where a server/client application was implemented so that the server could only handle 256 connections (some file descriptor limit). Every time the firewall was rebooted and lost state, the existing connections hung around forever. So the server administrators had to go in and restart the process to clear these connections; otherwise there were 256 hung connections and no new connections could be established.
> Sometimes the other endpoint goes down, and doesn't come back. We will for instance deploy home gateways probably keeping netconf-call-home sessions to an NMS, and we want them to be around forever, as long as they work. TCP level keepalives would solve this, as if the customer just powers off the device, after a while the session will be cleared. Using TCP keepalives here means you get this kind of behaviour even if the upper-layer application doesn't support it (netconf might have been a bad example here). It's a single socket option to set, so it's very easy to do.
> From knowing approximately what settings people have in their NAT44 and firewalls etc., I'd say the recommendation should be that keepalives are sent at an interval of around 60-300 seconds, and that the connection is killed if no traffic has passed in 3-5 of these intervals. Otherwise TCP will have backed off so far anyway that it's probably faster to just retry the connection instead of waiting for TCP to re-send the packet.
> I have seen so many times in my 20 years working in networking where a lack of keepalives has caused all kinds of problems. I wish everybody would turn them on and keep them on.

As more and more connections flow over mobile networks, keepalives seem more and more important, even for flows where you did not expect to need them. I have to send keepalives over IPv6 connections, not for NAT as on IPv4, but for middlebox devices that have an interesting approach and attitude towards connection management. ;-)

The SIP Outbound RFC has a lot of reasoning behind keep-alives for connection failover and may be good input here. <>
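
As a rough illustration of the "single socket option" and the timing suggested in the quoted text above, here is a minimal Python sketch. The helper names (enable_keepalive, worst_case_detection_seconds) are hypothetical, and the values 60/60/4 are just one pick from the quoted 60-300 second and 3-5 interval ranges. SO_KEEPALIVE is the portable option; the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT tuning knobs are Linux-style and are assumed to be missing on some platforms, so the sketch only sets them where the socket module exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=60, probes=4):
    """Turn on TCP keepalives and tune the timers where the platform allows.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    probes:   unanswered probes before the connection is dropped
    (All three defaults are assumptions taken from the quoted ranges.)
    """
    # The portable part: a single socket option.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Platform-specific tuning; skipped silently where not available.
    tuning = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


def worst_case_detection_seconds(idle, interval, probes):
    """Approximate time until a dead peer is noticed on an idle connection."""
    return idle + interval * probes
```

With these example values, a peer that is simply powered off would be detected after roughly 60 + 4 * 60 = 300 seconds of silence. Without the tuning options, the operating system defaults apply, and those are often far longer than what NAT and firewall state timers tolerate.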