Re: statement regarding keepalives

Mikael Abrahamsson <swmike@swm.pp.se> Thu, 16 August 2018 07:28 UTC

Date: Thu, 16 Aug 2018 09:28:03 +0200 (CEST)
From: Mikael Abrahamsson <swmike@swm.pp.se>
To: Kent Watsen <kwatsen@juniper.net>
cc: Tom Herbert <tom@herbertland.com>, "tsv-area@ietf.org" <tsv-area@ietf.org>, "netconf-chairs@ietf.org" <netconf-chairs@ietf.org>, "tls-ads@ietf.org" <tls-ads@ietf.org>, "tsvwg-ads@tools.ietf.org" <tsvwg-ads@tools.ietf.org>
Subject: Re: statement regarding keepalives
In-Reply-To: <513E9F0D-CFAD-4009-8F86-289D9DC55A79@juniper.net>
Message-ID: <alpine.DEB.2.20.1808160919260.19688@uplift.swm.pp.se>
References: <D3326DE0-3F31-4045-B945-82B3F417BE4B@juniper.net> <alpine.DEB.2.20.1807201340240.14354@uplift.swm.pp.se> <B50DC954-CBB6-41C5-BE3A-F1DECD6046A5@juniper.net> <717202c9c6c6b3d083bfa4c8a9925e45@strayalpha.com> <6377766E-9A03-41BA-A4D4-8796F46278BD@juniper.net> <CALx6S34+rG_rx+79=iaeu5YT4pYUWRqAym6S_CNzJq9-a40Yvw@mail.gmail.com> <513E9F0D-CFAD-4009-8F86-289D9DC55A79@juniper.net>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
Organization: People's Front Against WWW
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-area/EX-Tz3Cc8kNpZz4Smkx11tf9OhI>

On Wed, 15 Aug 2018, Kent Watsen wrote:

> You bring up an interesting point, it goes to the motivation for wanting 
> to do keepalives in the first place.  The text doesn't yet mention 
> maintain flow state as a motivation.

It's not only to maintain flow state, it's also to close the connection
when the network goes down, and to "give up" on connections that don't
work anymore (for some definition of "anymore").

I have operationally been in the situation where a server/client
application was implemented so that the server could only handle 256
connections (some file descriptor limit). Every time the firewall was
rebooted and lost state, the existing connections hung around forever. So
the server administrators had to go in and restart the process to clear
these connections, otherwise there were 256 hung connections and no new
connections could be established.

Sometimes the other endpoint goes down, and doesn't come back. We will for
instance deploy home gateways, probably keeping netconf-call-home sessions
to an NMS, and we want them to be around forever, as long as they work.
TCP-level keepalives would solve this: if the customer just powers off
the device, the session will be cleared after a while. Using TCP
keepalives here means you get this kind of behaviour even if the 
upper-layer application doesn't support it (netconf might have been a bad 
example here). It's a single socket option to set, so it's very easy to 
do.
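
As a minimal sketch of that single socket option, assuming Python's
standard socket module (the helper name enable_keepalive is hypothetical,
not something from this thread): setting SO_KEEPALIVE asks the kernel to
probe an idle connection and tear it down if the peer stops answering.
The probe timing then comes from system defaults, which on many systems
are on the order of hours unless tuned.

```python
import socket

def enable_keepalive(sock):
    """Turn on TCP keepalives for an existing socket (hypothetical helper)."""
    # SO_KEEPALIVE is the one socket-level switch; probe timing is left
    # at whatever the operating system defaults to.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock

if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    # Non-zero means keepalives are enabled on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Because this works at the socket level, it can be applied to any TCP
connection regardless of whether the protocol on top has its own
keepalive mechanism.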

From knowing approximately what settings people have in their NAT44 and
firewalls etc, I'd say the recommendation should be that keepalives are
set to around a 60-300 second interval, and that the connection is killed
if no traffic has passed in 3-5 of these intervals. By then TCP will have
backed off so far anyway that it's probably faster to just re-try the
connection instead of waiting for TCP to re-send the packet.
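
One way to read those numbers, as a rough sketch: the per-socket options
TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT exist on Linux but not on every
platform, so the code below only sets the ones Python exposes. Mapping
"interval" to both the idle time and the probe spacing, and "3-5 intervals"
to the probe count, is an assumption made for illustration, as are the
helper names tune_keepalive and detection_time.

```python
import socket

def tune_keepalive(sock, interval=60, probes=4):
    """Enable keepalives and apply per-socket timing where available."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {}
    # These option names are platform-dependent; skip any that are missing.
    wanted = (
        ("TCP_KEEPIDLE", interval),   # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before giving up
    )
    for name, value in wanted:
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
    return applied

def detection_time(interval, probes):
    """Approximate seconds until a silent peer is declared dead."""
    # One idle period, then `probes` unanswered probes spaced `interval` apart.
    return interval + interval * probes

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(tune_keepalive(s, interval=60, probes=4))
    print(detection_time(60, 4))
    s.close()
```

Under this reading, a 60 second interval with 4 probes gives up on a dead
peer after roughly five minutes, which is in the same range as the idle
timeouts mentioned above for NAT44 and firewalls.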

In my 20 years working in networking I have seen many times how a lack
of keepalives has caused all kinds of problems. I wish everybody would
turn them on and keep them on.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se