Re: [Dots] Improve controllability and predictability of keepalives

<mohamed.boucadair@orange.com> Tue, 12 November 2019 16:24 UTC

From: mohamed.boucadair@orange.com
To: Carsten Bormann <cabo@tzi.org>
CC: "dots@ietf.org" <dots@ietf.org>
Date: Tue, 12 Nov 2019 16:24:04 +0000
Message-ID: <787AE7BB302AE849A7480A190F8B933031362BF5@OPEXCAUBMA2.corporate.adroot.infra.ftgroup>
References: <787AE7BB302AE849A7480A190F8B9330313624A5@OPEXCAUBMA2.corporate.adroot.infra.ftgroup> <18CFEB6A-81B8-44A4-B749-DB1689E0B442@tzi.org>
In-Reply-To: <18CFEB6A-81B8-44A4-B749-DB1689E0B442@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/RriJeAHTmSZAzjMLtsctMQhm1eg>
Subject: Re: [Dots] Improve controllability and predictability of keepalives

Re-,

Please see inline. 

Cheers,
Med

> -----Original Message-----
> From: Carsten Bormann [mailto:cabo@tzi.org]
> Sent: Tuesday, 12 November 2019 14:12
> To: BOUCADAIR Mohamed TGI/OLN
> Cc: dots@ietf.org
> Subject: Re: Improve controllability and predictability of keepalives
> 
> Hi Med,
> 
> what the text below doesn’t say is what kind of information you want to
> derive from the heartbeats. 

[Med] How endpoints interpret heartbeats (HBs) is discussed in Section 4.7.

> The way they currently (draft -39) are
> defined, the client uses a GET (*).  GET is not supposed to influence
> application state, so the server will not learn anything from that
> heartbeat.  Is the intention that only the client needs to react to
> heartbeat failures?

[Med] No, the server also needs to react to heartbeat failures. The cases discussed in the spec are as follows:

*  If the DOTS server receives traffic from the peer DOTS client but the
   maximum 'missing-hb-allowed' threshold is reached, the DOTS server
   MUST NOT consider the DOTS signal channel session disconnected.  The
   DOTS server MUST keep on using the current DOTS signal channel
   session so that the DOTS client can send mitigation requests over
   the current DOTS signal channel session.  In this case, the DOTS
   server can identify that the DOTS client is under attack and that
   the inbound link to the DOTS client (domain) is saturated.

*  If the DOTS server does not receive a mitigation request from the
   DOTS client, it implies the DOTS client has not detected the attack
   or, if an attack mitigation is in progress, it implies the applied
   DDoS mitigation actions are not yet effective to handle the DDoS
   attack volume.

*  If the DOTS server does not receive any traffic from the peer DOTS
   client during the time span required to exhaust the maximum
   'missing-hb-allowed' threshold, the DOTS server concludes the session
   is disconnected.  The DOTS server can then trigger pre-configured
   mitigation requests for this DOTS client (if any).
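
As a rough illustration of the session-liveness decision in the cases above, here is a minimal Python sketch. The names `SessionState`, `classify_session`, `missed_hb` and `client_traffic_seen` are hypothetical and do not come from the DOTS specification. The sketch assumes the server counts its own unanswered heartbeats and separately tracks whether any traffic arrived from the peer client during that window.

```python
from enum import Enum


class SessionState(Enum):
    ALIVE = "alive"
    # Threshold reached, but the client is still heard: keep the session.
    CLIENT_INBOUND_SATURATED = "client-inbound-saturated"
    # Threshold reached and nothing heard from the client.
    DISCONNECTED = "disconnected"


def classify_session(missed_hb: int, missing_hb_allowed: int,
                     client_traffic_seen: bool) -> SessionState:
    """Classify a DOTS signal channel session from the server's viewpoint.

    missed_hb: consecutive heartbeats without a reply (assumed counter).
    missing_hb_allowed: the negotiated 'missing-hb-allowed' threshold.
    client_traffic_seen: whether any traffic from the peer DOTS client
        arrived while the threshold was being exhausted (assumed flag).
    """
    if missed_hb < missing_hb_allowed:
        return SessionState.ALIVE
    if client_traffic_seen:
        # The session MUST NOT be considered disconnected in this case.
        return SessionState.CLIENT_INBOUND_SATURATED
    # The server may now trigger pre-configured mitigation requests, if any.
    return SessionState.DISCONNECTED
```

Whether a mitigation request was received is deliberately left out of this sketch; it only covers whether the session is kept or declared disconnected.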

> 
> RFC 7252 defines PROBING_RATE as 1 B/s.  If you get a response within the
> heartbeat interval to the non-confirmable requests, that is not relevant.
> If you don’t, your heartbeat interval "MUST be chosen in
>    such a way that an endpoint does not exceed an average data rate of
>    PROBING_RATE in sending to another endpoint that does not respond."
> If your interval is intended to be 15 s, that would mean your requests must
> be ≤ 15 B, or you need to define PROBING_RATE differently for your
> application.
> It seems right now you are not trying to be particularly frugal with the
> heartbeat message, which is probably OK since most of your networks will be
> Ethernet and that will expand the frame size to 64 B anyway.  But that
> means that you need to define PROBING_RATE to be ~ 5 B/s if you don’t want
> to be slowed down in probing.

[Med] Good point. Will update accordingly. 
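
The arithmetic behind the PROBING_RATE remark can be checked with a small Python sketch. The function names are hypothetical. The 64 B figure is the minimum Ethernet frame size mentioned in the quoted text, and the sketch assumes that the whole frame counts against the rate.

```python
import math


def max_probe_bytes(interval_s: float, probing_rate: float = 1.0) -> float:
    """Largest message (bytes) that stays within probing_rate (bytes/s)
    when one message is sent per interval to a non-responding peer."""
    return interval_s * probing_rate


def required_probing_rate(message_bytes: float, interval_s: float) -> float:
    """Average data rate (bytes/s) produced by one message per interval."""
    return message_bytes / interval_s


# With the RFC 7252 default of 1 B/s and a 15 s interval, the budget is 15 B.
budget = max_probe_bytes(15)

# A 64 B frame every 15 s needs about 4.27 B/s, rounded up to 5 B/s.
needed = math.ceil(required_probing_rate(64, 15))
```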

> 
> Grüße, Carsten
> 
> (*) And expects a 2.03, which would mean that the server confirms the ETag
> given in the request.  But there is no ETag in that request, and I really
> don’t see why a 2.05 with an empty payload wouldn’t also work.  But maybe
> you want to move to POST anyway (so there can be application semantics,
> like taking note of the heartbeat, on the server), and 2.04 would fit that
> very well (see RFC 7252 Section 5.8.2).

[Med] Will consider the use of POST instead of GET. 
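
To make the POST option concrete, here is a minimal Python sketch that encodes a non-confirmable CoAP POST following the RFC 7252 message format and recognises a 2.04 (Changed) response. The function names and the path segments in the usage note are assumptions for illustration only; the draft under discussion may use a different resource. Only short options (delta and length below 13) are handled.

```python
import struct

COAP_VERSION = 1
TYPE_NON = 1          # non-confirmable message type
CODE_POST = 0x02      # request code 0.02
CODE_CHANGED = 0x44   # response code 2.04 = (2 << 5) | 4
OPT_URI_PATH = 11     # Uri-Path option number


def encode_non_post(message_id: int, token: bytes, path_segments) -> bytes:
    """Encode a NON POST with Uri-Path options and no payload."""
    if len(token) > 8:
        raise ValueError("CoAP tokens are at most 8 bytes")
    first = (COAP_VERSION << 6) | (TYPE_NON << 4) | len(token)
    out = bytearray(struct.pack("!BBH", first, CODE_POST, message_id))
    out += token
    previous = 0
    for segment in path_segments:
        value = segment.encode("utf-8")
        delta = OPT_URI_PATH - previous
        if delta > 12 or len(value) > 12:
            raise ValueError("extended option encoding not covered here")
        out.append((delta << 4) | len(value))
        out += value
        previous = OPT_URI_PATH
    return bytes(out)


def is_changed_response(message: bytes) -> bool:
    """True if the CoAP message carries response code 2.04 (Changed)."""
    return len(message) >= 4 and message[1] == CODE_CHANGED
```

For example, `encode_non_post(0x1234, b"\xab\xcd", [".well-known", "dots", "hb"])` yields the CoAP-layer bytes only; DTLS, UDP and IP overhead come on top, which matters when comparing the message size against PROBING_RATE.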

> 
> > On Nov 12, 2019, at 10:18, <mohamed.boucadair@orange.com> <mohamed.boucadair@orange.com> wrote:
> >
> > Hi Carsten,
> >
> > You indicated the following in an offline message (I’m adding the dots
> > mailing list as you were OK; see below):
> >
> > > I would have expected using requests sent in non-confirmable messages,
> > > requiring more work on the application side (interpreting losses) but also
> > > delivering a more regular, predictable, less noisy signal.
> > > A little bit of specification text is then needed to ensure that those
> > > requests/responses meet the requirements of RFC 8085 and the related
> > > specifications in RFC 7252, and that both sides are in a good position to
> > > interpret the signal they get.
> > …
> > > If not, I would like to discuss the issue on the DOTS mailing list, and see
> > > whether a small set of changes to the keepalive mechanisms employed can
> > > improve its controllability and predictability.
> >
> > Assuming that non-confirmable application HBs are used, which changes do
> > you think are needed to enhance the DOTS mechanism to meet 8085/7252
> > requirements?
> >
> > As a reminder we do have the following for setting the hb parameters:
> >
> > ======
> >       Note: heartbeat-interval should be tweaked to also assist DOTS
> >       messages for NAT traversal (SIG-011 of [RFC8612]).  According to
> >       [RFC8085], keepalive messages must not be sent more frequently
> >       than once every 15 seconds and should use longer intervals when
> >       possible.  Furthermore, [RFC4787] recommends NATs to use a state
> >       timeout of 2 minutes or longer, but experience shows that sending
> >       packets every 15 to 30 seconds is necessary to prevent the
> >       majority of middleboxes from losing state for UDP flows.  From
> >       that standpoint, the RECOMMENDED minimum heartbeat-interval is 15
> >       seconds and the RECOMMENDED maximum heartbeat-interval is 240
> >       seconds.  The recommended value of 30 seconds is selected to
> >       anticipate the expiry of NAT state.
> >
> >       A heartbeat-interval of 30 seconds may be considered as too chatty
> >       in some deployments.  For such deployments, DOTS agents may
> >       negotiate longer heartbeat-interval values to prevent any network
> >       overload with too frequent keepalives.
> >
> >       Different heartbeat intervals can be defined for 'mitigating-
> >       config' and 'idle-config' to reduce being too chatty during idle
> >       times.  If there is an on-path translator between the DOTS client
> >       (standalone or part of a DOTS gateway) and the DOTS server, the
> >       'mitigating-config' heartbeat-interval has to be smaller than the
> >       translator session timeout.  It is recommended that the 'idle-
> >       config' heartbeat-interval is also smaller than the translator
> >       session timeout to prevent translator traversal issues, or
> >       disabled entirely.  Means to discover the lifetime assigned by a
> >       translator are out of scope.
> > =======
> >
> > Thank you.
> >
> > Cheers,
> > Med
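
The heartbeat-interval guidance quoted above (recommended minimum 15 s, recommended maximum 240 s, recommended value 30 s, and an interval smaller than any on-path translator's session timeout) could be checked along these lines. This is a minimal Python sketch; the function and parameter names are hypothetical. Because the bounds are RECOMMENDED rather than mandatory, it returns warnings instead of raising errors.

```python
RECOMMENDED_MIN_S = 15
RECOMMENDED_MAX_S = 240
RECOMMENDED_DEFAULT_S = 30


def check_heartbeat_interval(interval_s, translator_timeout_s=None):
    """Return a list of warnings for a proposed heartbeat-interval.

    translator_timeout_s is the session timeout of an on-path translator
    (e.g. a NAT), if known; discovering it is out of scope of the spec,
    so it is treated here as an optional, externally supplied value.
    """
    warnings = []
    if interval_s < RECOMMENDED_MIN_S:
        warnings.append("below the recommended minimum of 15 seconds")
    if interval_s > RECOMMENDED_MAX_S:
        warnings.append("above the recommended maximum of 240 seconds")
    if translator_timeout_s is not None and interval_s >= translator_timeout_s:
        warnings.append("not smaller than the translator session timeout")
    return warnings
```

For instance, `check_heartbeat_interval(RECOMMENDED_DEFAULT_S, translator_timeout_s=120)` returns an empty list, since 120 s is the 2-minute NAT state timeout mentioned in the quoted note.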
