Re: statement regarding keepalives

Tom Herbert <tom@herbertland.com> Wed, 15 August 2018 18:45 UTC

In-Reply-To: <6377766E-9A03-41BA-A4D4-8796F46278BD@juniper.net>
References: <D3326DE0-3F31-4045-B945-82B3F417BE4B@juniper.net> <alpine.DEB.2.20.1807201340240.14354@uplift.swm.pp.se> <B50DC954-CBB6-41C5-BE3A-F1DECD6046A5@juniper.net> <717202c9c6c6b3d083bfa4c8a9925e45@strayalpha.com> <6377766E-9A03-41BA-A4D4-8796F46278BD@juniper.net>
From: Tom Herbert <tom@herbertland.com>
Date: Wed, 15 Aug 2018 11:45:07 -0700
Message-ID: <CALx6S34+rG_rx+79=iaeu5YT4pYUWRqAym6S_CNzJq9-a40Yvw@mail.gmail.com>
Subject: Re: statement regarding keepalives
To: Kent Watsen <kwatsen@juniper.net>
Cc: Joe Touch <touch@strayalpha.com>, "netconf-chairs@ietf.org" <netconf-chairs@ietf.org>, "tsv-area@ietf.org" <tsv-area@ietf.org>, "tsvwg-ads@tools.ietf.org" <tsvwg-ads@tools.ietf.org>, "tls-ads@ietf.org" <tls-ads@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-area/ZM3bC09PofJQ6u6_RXjgepFvjNk>

On Wed, Aug 15, 2018 at 10:35 AM, Kent Watsen <kwatsen@juniper.net> wrote:
>
> Below is an updated version of some text that we might roll into
> a statement or an I-D of some sort.  Kindly review and provide
> suggestions for improvement, or support for the text as is, if
> that is the case.  ;)
>
> This update accommodates comments from:
>   - Wesley Eddy & David Black
>      - removed "layers of functionality" verbiage
>      - moved footnote into the body of the document (this had
>        a cascading effect, and why it looks so different now)
>   - Joe Touch
>      - keepalives should occur at *all* layers that benefit
>      - keepalives at a layer should be suppressed in the
>        presence of sufficient traffic from higher layers
>      - keepalives at a layer should not be interpreted as
>        implying state at any other layer
>
> This update does not accommodate comments from:
>   - Michael Abrahamsson & Tom Herbert
>      - no statement added to promote TCP keepalives
>         * note: I believe this to be unnecessary because
>           the current text doesn't ever say to not use TCP.
>      - no statement added for tuning params (e.g., timeouts).
>         * note: we could add this, but it will increase the
>           scope of the document - do we want to do this?
>
> Cheers!
> Kent
>
>
> ===== START =====
>
> # Connection Strategies for Long-lived Connections
>
> A networked device may have an ongoing need to interact with a remote
> device. Sometimes the need arises from wanting to push data to the
> remote device, and sometimes the need arises from wanting to check if
> there is any data the remote device may have pending to deliver to
> it.
>
> There are two fundamental network connection strategies that can be
> used to accomplish this goal: 1) a single long-lived connection and
> 2) a sequence of short-lived connections.
>
> A single long-lived connection is most common, as it is
> straightforward to implement and directly answers the question of
> if the "connection" is established. However, long-lived connections
> require more system resources, which may affect scalability, and
> require the initiator of the connection to periodically test the
> aliveness of the remote device, discussed further in the next
> section.
>
> A sequence of short-lived connections is less common, as there is an
> additional implementation effort, as well as concerns such as: 1) the
> delay of the remote device needing to wait until the connection is
> reestablished in order to deliver pending data, and 2) the additional
> latency incurred from starting new connections, especially when
> cryptology is involved. However, short-lived connections do not
> require keepalives and are arguably more secure, as each device is
> forced to re-authenticate the other and reload all related
> access-control policies on each connection.
>
> For networking sessions that are primarily quiet, and the use case
> can cope with the additional latency of waiting for and starting new
> connections, it is RECOMMENDED to use a sequence of short-lived
> connections, instead of maintaining a single long-lived connection
> using aliveness checks.
>
>
> # Keepalives for Persistent Connections
>
> When the initiator of a networking session needs to maintain a
> long-lived connection, it is necessary for it to periodically test
> the aliveness of the remote device. In such cases, it is RECOMMENDED
> that the aliveness check happens at the highest protocol layer
> possible that is meaningful to the application, in order to maximize
> the depth of the aliveness check.
>
> For example, for an HTTPS connection to a simple webserver,
> HTTP-level keepalives would test more layers of functionality than
> TLS-level keepalives. However, for a webserver that is accessed via a
> load-balancer that terminates TLS connections, TLS-level aliveness
> checks may be the most meaningful check that can be performed.
>
> More generally, it is RECOMMENDED that applications be able to
> perform the aliveness checks at all protocol levels that benefit, but
> suppress the aliveness checks at lower protocol layers from occurring
> when there is sufficient activity at higher protocol layers.
> Keepalives at a layer SHOULD NOT be interpreted as implying state at
> any other layer.
>
> In order to ensure aliveness checks can occur at any given protocol
> layer, it is RECOMMENDED that protocol designers always include an
> aliveness check mechanism in the protocol and, for client/server
> protocols, that the aliveness check can be initiated from either
> device, as sometimes the "server" is the initiator of the underlying
> networking connection (e.g., RFC 8071).
>
> Some protocol stacks have a secure transport protocol layer (e.g.,
> TLS, SSH, DTLS) that sits on top of a cleartext protocol layer (e.g.,
> TCP, UDP). In such cases, it is RECOMMENDED that the aliveness check
> occurs within the protection envelope afforded by the secure transport
> protocol layer; the aliveness checks SHOULD NOT occur via the
> underlying cleartext protocol layer, as an adversary can block
> aliveness check messages in either direction and send fake aliveness
> check messages in either direction.
>
I think the statement is missing a primary purpose of keepalives,
maybe the most important one, which is to maintain flow state in NATs
and firewalls and prevent eviction by timeout or LRU.
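
To make the NAT case concrete, here is a minimal sketch of turning on
TCP keepalives with an idle time chosen to fire before a middlebox
would evict the flow. The helper name and the 60/15/4 defaults are
assumptions for illustration only; as far as I know there is no
standard NAT timeout to derive them from. The per-socket timer options
are the Linux names and are skipped where the platform lacks them.

```python
import socket


def enable_tcp_keepalive(sock, idle_s=60, interval_s=15, probes=4):
    """Enable TCP keepalive; tune timers where the platform exposes them.

    idle_s, interval_s and probes are illustrative guesses, not
    recommended values.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT are Linux option names;
    # other platforms may not define them, so each one is optional here.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # seconds idle before first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_tcp_keepalive(s)
    print("SO_KEEPALIVE:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Note that this only refreshes state along the TCP path; it says nothing
about whether the layers above TCP are alive.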

Also, any meaningful discussion or statement about keepalives should
include considerations on the frequency of keepalives and their cost.

Keepalives themselves carry no meaningful end-user data; they are
purely management overhead. The higher the frequency of keepalives,
the higher the overhead and hence the more network resources they
consume. At some point they can become a source of congestion,
especially when keepalive timers become synchronized across a network,
as I previously pointed out. Unfortunately, there is no standard for
how NAT state eviction is done and no standard NAT timeout, so the
frequency of keepalives used to prevent NAT state eviction is probably
higher than it should be (hence more network overhead).
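
One common way to keep timers from synchronizing is to randomize each
delay. The sketch below is only an illustration under stated
assumptions: the function name, the 0.5 safety factor, and the 20%
jitter are hypothetical choices, and nat_timeout_s is a guess the
sender has to make precisely because no standard timeout exists.

```python
import random


def next_keepalive_delay(nat_timeout_s, safety=0.5, jitter=0.2, rng=random):
    """Return a randomized delay (seconds) before the next keepalive.

    nat_timeout_s is the sender's *assumed* NAT idle timeout. The delay
    is drawn uniformly from [base * (1 - jitter), base], where
    base = nat_timeout_s * safety, so it never exceeds the base interval.
    """
    base = nat_timeout_s * safety
    return rng.uniform(base * (1.0 - jitter), base)


if __name__ == "__main__":
    # With an assumed 120 s timeout, delays fall between 48 s and 60 s.
    for _ in range(3):
        print(round(next_keepalive_delay(120), 1))
```

Jitter spreads the load but does not reduce it; the only way to lower
the overhead is a longer interval, which is exactly what the unknown
NAT timeout makes hard to pick.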

In terms of cost, consider the effects of waking up the transmitter on
a smartphone periodically just for the purpose of keeping connections
up. With a high enough frequency, this will drain the battery quickly.
In fact, one of the touted benefits of IPv6 was supposed to be that
NAT isn't present, so there is no need for periodic keepalives to
maintain NAT state, and hence this would conserve power on mobile
devices. Use of keepalives in power-constrained devices is a real
issue.
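
A back-of-the-envelope count shows how the interval drives the cost.
This sketch only counts transmitter wakeups and bytes per day for one
connection; bytes_per_keepalive is a hypothetical parameter, and the
code deliberately does not model energy per wakeup, which depends on
the radio.

```python
SECONDS_PER_DAY = 86400


def keepalive_cost_per_day(interval_s, bytes_per_keepalive):
    """Return (wakeups, bytes) per day for one connection.

    Counts one keepalive per interval; replies and retransmissions
    are ignored.
    """
    wakeups = SECONDS_PER_DAY // interval_s
    return wakeups, wakeups * bytes_per_keepalive


if __name__ == "__main__":
    print(keepalive_cost_per_day(30, 60))   # (2880, 172800)
    print(keepalive_cost_per_day(300, 60))  # (288, 17280)
```

Going from a 30-second to a 5-minute interval cuts the wakeups tenfold,
which is the kind of tuning guidance I think the statement needs.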

Tom
