Re: [Dots] Improve controllability and predictability of keepalives

Carsten Bormann <> Tue, 12 November 2019 13:11 UTC

Hi Med,

What the text below doesn’t say is what kind of information you want to derive from the heartbeats.  As currently defined (draft -39), the client sends the heartbeat as a GET (*).  GET is not supposed to influence application state, so the server will not learn anything from that heartbeat.  Is the intention that only the client needs to react to heartbeat failures?

RFC 7252 defines PROBING_RATE as 1 B/s.  As long as you get a response to the non-confirmable requests within the heartbeat interval, that limit is not relevant.  If you don’t, your heartbeat interval "MUST be chosen in such a way that an endpoint does not exceed an average data rate of PROBING_RATE in sending to another endpoint that does not respond."
If your interval is intended to be 15 s, that would mean your requests must be ≤ 15 B, or you need to define PROBING_RATE differently for your application.
It seems right now you are not trying to be particularly frugal with the heartbeat message, which is probably OK since most of your networks will be Ethernet and that will expand the frame size to 64 B anyway.  But that means that you need to define PROBING_RATE to be ~ 5 B/s (64 B / 15 s ≈ 4.3 B/s, rounded up) if you don’t want to be slowed down in probing.
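
As a rough illustration of that arithmetic (a sketch only, not taken from RFC 7252 or the draft; the function names are made up for this example, and it assumes one request per heartbeat interval with no response coming back):

```python
import math

# Default PROBING_RATE from RFC 7252, in bytes per second.
RFC7252_PROBING_RATE = 1.0


def max_request_size(interval_s, probing_rate=RFC7252_PROBING_RATE):
    """Largest request size (bytes) that keeps the average sending rate
    within probing_rate when one request is sent every interval_s seconds
    to a peer that does not respond."""
    return math.floor(interval_s * probing_rate)


def required_probing_rate(request_size, interval_s):
    """Average data rate (bytes per second) produced by sending one
    request of request_size bytes every interval_s seconds."""
    return request_size / interval_s


# 15 s interval with the RFC 7252 default: requests must fit in 15 bytes.
print(max_request_size(15))

# 64-byte messages every 15 s: the application would have to define
# PROBING_RATE as at least this value (rounded up to a whole number).
print(math.ceil(required_probing_rate(64, 15)))
```

The 64 B figure here is just the minimum Ethernet frame size mentioned above, used as a stand-in for the message size.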

Grüße, Carsten

(*) And expects a 2.03, which would mean that the server confirms the ETag given in the request.  But there is no ETag in that request, and I really don’t see why a 2.05 with an empty payload wouldn’t also work.  But maybe you want to move to POST anyway (so there can be application semantics, like taking note of the heartbeat, on the server), and 2.04 would fit that very well (see RFC 7252 Section 5.8.2).

> On Nov 12, 2019, at 10:18, <> <> wrote:
> Hi Carsten,
> You indicated the following in an offline message (I’m adding dots mailing list as you were OK; see below):
> > I would have expected using requests sent in non-confirmable messages,
> > requiring more work on the application side (interpreting losses) but also
> > delivering a more regular, predictable, less noisy signal.
> > A little bit of specification text is then needed to ensure that those
> > requests/responses meet the requirements of RFC 8085 and the related
> > specifications in RFC 7252, and that both sides are in a good position to
> > interpret the signal they get.
> …
> > If not, I would like to discuss the issue on the DOTS mailing list, and see
> > whether a small set of changes to the keepalive mechanisms employed can
> > improve its controllability and predictability.
> Assuming that non-confirmable application HBs are used, which changes do you think are needed to enhance the DOTS mechanism to meet 8085/7252 requirements?
> As a reminder we do have the following for setting the hb parameters:
> ======
>       Note: heartbeat-interval should be tweaked to also assist DOTS
>       messages for NAT traversal (SIG-011 of [RFC8612]).  According to
>       [RFC8085], keepalive messages must not be sent more frequently
>       than once every 15 seconds and should use longer intervals when
>       possible.  Furthermore, [RFC4787] recommends NATs to use a state
>       timeout of 2 minutes or longer, but experience shows that sending
>       packets every 15 to 30 seconds is necessary to prevent the
>       majority of middleboxes from losing state for UDP flows.  From
>       that standpoint, the RECOMMENDED minimum heartbeat-interval is 15
>       seconds and the RECOMMENDED maximum heartbeat-interval is 240
>       seconds.  The recommended value of 30 seconds is selected to
>       anticipate the expiry of NAT state.
>       A heartbeat-interval of 30 seconds may be considered as too chatty
>       in some deployments.  For such deployments, DOTS agents may
>       negotiate longer heartbeat-interval values to prevent any network
>       overload with too frequent keepalives.
>       Different heartbeat intervals can be defined for 'mitigating-
>       config' and 'idle-config' to reduce being too chatty during idle
>       times.  If there is an on-path translator between the DOTS client
>       (standalone or part of a DOTS gateway) and the DOTS server, the
>       'mitigating-config' heartbeat-interval has to be smaller than the
>       translator session timeout.  It is recommended that the 'idle-
>       config' heartbeat-interval is also smaller than the translator
>       session timeout to prevent translator traversal issues, or
>       disabled entirely.  Means to discover the lifetime assigned by a
>       translator are out of scope.
> =======
> Thank you.
> Cheers,
> Med
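
For reference, the constraints in the quoted draft text can be sketched as a simple check. This is only an illustration under assumptions: the function name, the use of None for a disabled idle heartbeat, and the idea that the translator session timeout is known are all hypothetical (the draft says discovering that lifetime is out of scope).

```python
# Values taken from the quoted draft text, in seconds.
RECOMMENDED_MIN_S = 15
RECOMMENDED_MAX_S = 240
RECOMMENDED_DEFAULT_S = 30


def check_heartbeat_intervals(mitigating_s, idle_s, translator_timeout_s=None):
    """Return a list of human-readable issues; an empty list means the
    intervals are consistent with the quoted text.

    idle_s may be None, which this sketch uses to mean "disabled".
    translator_timeout_s is None when no on-path translator is assumed.
    """
    issues = []
    for name, value in (("mitigating-config", mitigating_s),
                        ("idle-config", idle_s)):
        if value is None:
            continue
        if not RECOMMENDED_MIN_S <= value <= RECOMMENDED_MAX_S:
            issues.append(
                f"{name}: {value} s is outside the recommended "
                f"{RECOMMENDED_MIN_S}-{RECOMMENDED_MAX_S} s range")
    if translator_timeout_s is not None:
        # The mitigating interval has to be smaller than the timeout.
        if mitigating_s >= translator_timeout_s:
            issues.append("mitigating-config: not smaller than the "
                          "translator session timeout")
        # The idle interval should be smaller too, or disabled entirely.
        if idle_s is not None and idle_s >= translator_timeout_s:
            issues.append("idle-config: not smaller than the translator "
                          "session timeout (and not disabled)")
    return issues


print(check_heartbeat_intervals(RECOMMENDED_DEFAULT_S, None, 120))
print(check_heartbeat_intervals(150, 60, 120))
```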