Re: [Dots] Improve controllability and predictability of keepalives

Hi Carsten,

Please see inline. 

Cheers,
Med

> -----Original Message-----
> From: Carsten Bormann [mailto:cabo@tzi.org]
> Sent: Saturday, 16 November 2019 15:34
> To: BOUCADAIR Mohamed TGI/OLN
> Cc: dots@ietf.org
> Subject: Re: Improve controllability and predictability of keepalives
> 
> Thank you.
> I still have a few questions, but I think this looks very good.
> 
> 4.4 says:
>    Mitigation requests MUST NOT be delayed
>    because of other congestion control checks.
> What might those be?

[Med] For example, a probing rate that is too low, as indicated in the sentence right after that excerpt. The risk is that mitigation requests are delayed (or that heartbeats are not sent, which may lead to session liveness checks).

> (You don’t want to be in a situation where you send a ton of mitigation
> requests that are mostly dropped at the tail of a queue.)
> 

[Med] Yes, we don't want that. The authoritative rule is as follows:

   Requests marked by the DOTS client as Non-confirmable messages are
   sent at regular intervals until a response is received from the DOTS
   server.  If the DOTS client cannot maintain an RTT estimate, it MUST
   NOT send more than one Non-confirmable request every 3 seconds, and
   SHOULD use an even less aggressive rate whenever possible (case 2 in
   Section 3.1.3 of [RFC8085]).  
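
As a rough illustration of that rule, here is a minimal sketch of a sender-side pacer for the case where no RTT estimate is available. The class name, the injectable clock, and the choice to reject intervals below 3 seconds are assumptions made for this example; none of them come from the draft.

```python
import time

# Floor from the rule quoted above: without an RTT estimate, no more than
# one Non-confirmable request every 3 seconds.
MIN_NON_INTERVAL_S = 3.0


class NonConfirmablePacer:
    """Hypothetical helper: allows at most one request per `interval_s`."""

    def __init__(self, interval_s=MIN_NON_INTERVAL_S, clock=time.monotonic):
        if interval_s < MIN_NON_INTERVAL_S:
            raise ValueError("interval must be at least 3 seconds without an RTT estimate")
        self.interval_s = interval_s
        self._clock = clock          # injectable so the pacer can be tested
        self._last_sent = None       # time of the last permitted send

    def may_send(self):
        """Return True (and record the send time) if a request may go out now."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self.interval_s:
            return False
        self._last_sent = now
        return True
```

A sender would call `may_send()` before each retransmission and skip the transmission when it returns False. Passing a larger `interval_s` corresponds to the "even less aggressive rate" that the text recommends.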

> Before Figure 27, it says:
> 
>    The PUT request used for DOTS heartbeat MUST NOT have a
>    'cuid', 'cdid,' or 'mid' Uri-Path.  Such PUT requests MUST NOT be
>    relayed by DOTS gateways.
> 
> Why?  (And what should the DOTS gateways do instead?)

[Med] Session loss detection and recovery are local to an agent and its immediate peer. Heartbeats are thus not relayed.

We can update as follows:

"This procedure occurs between a DOTS agent and its immediate peer DOTS agent. As such, this PUT request MUST NOT be relayed by a DOTS gateway. The PUT request used for DOTS heartbeat MUST NOT have a 'cuid', 'cdid,' or 'mid' Uri-Path."

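To make the gateway behaviour concrete, a small sketch follows. It assumes that Uri-Path segments carrying these attributes look like `cuid=<value>`, and that a heartbeat is recognizable by an `hb` path segment. Both are assumptions for illustration, as are the function names and the example paths.

```python
# Attributes that a heartbeat PUT must not carry in its Uri-Path.
FORBIDDEN_HB_PREFIXES = ("cuid=", "cdid=", "mid=")

# Assumed name of the path segment that identifies a heartbeat request.
HB_SEGMENT = "hb"


def is_heartbeat(uri_path_segments):
    """True if the request targets the (assumed) heartbeat resource."""
    return HB_SEGMENT in uri_path_segments


def heartbeat_path_is_valid(uri_path_segments):
    """True if no segment carries 'cuid', 'cdid' or 'mid'."""
    return not any(seg.startswith(FORBIDDEN_HB_PREFIXES) for seg in uri_path_segments)


def gateway_action(uri_path_segments):
    """Heartbeats stay between immediate peers; other requests are relayed."""
    if is_heartbeat(uri_path_segments):
        return "handle-locally"
    return "relay"
```

With these definitions, `gateway_action([".well-known", "dots", "hb"])` yields `"handle-locally"`, while a path that includes `cuid=...` and `mid=...` segments but no `hb` segment yields `"relay"`.
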
> 
> Regards, Carsten
> 
> 
> > On Nov 14, 2019, at 18:17, mohamed.boucadair@orange.com wrote:
> >
> > Hi Carsten, all,
> >
> > As promised, we updated the draft to take into account your inputs. The
> > candidate version is available at (see section 4.7 in particular):
> >
> > https://github.com/boucadair/draft-ietf-dots-signal-channel/blob/master/draft-ietf-dots-signal-channel-39.txt
> >
> > The main changes are:
> > * Use PUT to send heartbeat requests
> > * Use 2.04 instead of 2.03
> > * DOTS agents can negotiate a probing-rate
> > * Provide some guidelines for setting the probing-rate.
> >
> > Do you have any further comment on the new heartbeat mechanism? Thank you.
> >
> > Cheers,
> > Med
> >
> >> -----Original Message-----
> >> From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of
> >> mohamed.boucadair@orange.com
> >> Sent: Tuesday, 12 November 2019 17:24
> >> To: Carsten Bormann
> >> Cc: dots@ietf.org
> >> Subject: Re: [Dots] Improve controllability and predictability of keepalives
> >>
> >> Re-,
> >>
> >> Please see inline.
> >>
> >> Cheers,
> >> Med
> >>
> >>> -----Original Message-----
> >>> From: Carsten Bormann [mailto:cabo@tzi.org]
> >>> Sent: Tuesday, 12 November 2019 14:12
> >>> To: BOUCADAIR Mohamed TGI/OLN
> >>> Cc: dots@ietf.org
> >>> Subject: Re: Improve controllability and predictability of keepalives
> >>>
> >>> Hi Med,
> >>>
> >>> what the text below doesn’t say is what kind of information you want to
> >>> derive from the heartbeats.
> >>
> >> [Med] How to interpret HBs by endpoints is discussed in Section 4.7.
> >>
> >>> The way they currently (draft -39) are
> >>> defined, the client uses a GET (*).  GET is not supposed to influence
> >>> application state, so the server will not learn anything from that
> >>> heartbeat.  Is the intention that only the client needs to react to
> >>> heartbeat failures?
> >>
> >> [Med] No, the server needs to react to heartbeat failures. The cases that
> >> are discussed in the spec are as follows:
> >>
> >> *  If the DOTS server receives traffic from the peer DOTS client but the
> >>   maximum 'missing-hb-allowed' threshold is reached, the DOTS server
> >>   MUST NOT consider the DOTS signal channel session disconnected.  The
> >>   DOTS server MUST keep on using the current DOTS signal channel
> >>   session so that the DOTS client can send mitigation requests over
> >>   the current DOTS signal channel session.  In this case, the DOTS
> >>   server can identify the DOTS client is under attack and the inbound
> >>   link to the DOTS client (domain) is saturated.
> >>
> >> *  If the DOTS server does not receive a mitigation request from the
> >>   DOTS client, it implies the DOTS client has not detected the attack
> >>   or, if an attack mitigation is in progress, it implies the applied
> >>   DDoS mitigation actions are not yet effective to handle the DDoS
> >>   attack volume.
> >>
> >> *  If the DOTS server does not receive any traffic from the peer DOTS
> >>   client during the time span required to exhaust the maximum 'missing-
> >>   hb-allowed' threshold, the DOTS server concludes the session is
> >>   disconnected.  The DOTS server can then trigger pre-configured
> >>   mitigation requests for this DOTS client (if any).
> >>
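
The server-side cases listed above can be summarized in a small decision function. This is only a sketch of the logic as described in this message: the enum, the function name, and the idea of tracking a single "peer traffic seen" flag are assumptions for illustration.

```python
from enum import Enum


class SessionVerdict(Enum):
    ALIVE = "alive"
    # Threshold reached but peer traffic still arrives: keep the session;
    # the client is likely under attack with a saturated inbound link.
    KEEP_CLIENT_LIKELY_UNDER_ATTACK = "keep-session"
    # Threshold reached and no traffic at all: session is disconnected;
    # pre-configured mitigation requests (if any) can be triggered.
    DISCONNECTED = "disconnected"


def interpret_heartbeats(missed_hb, missing_hb_allowed, peer_traffic_seen):
    """Classify the session from the server's point of view."""
    if missed_hb < missing_hb_allowed:
        return SessionVerdict.ALIVE
    if peer_traffic_seen:
        return SessionVerdict.KEEP_CLIENT_LIKELY_UNDER_ATTACK
    return SessionVerdict.DISCONNECTED
```

For example, `interpret_heartbeats(5, 5, True)` keeps the session, while `interpret_heartbeats(5, 5, False)` reports it as disconnected.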
> >>>
> >>> RFC 7252 defines PROBING_RATE as 1 B/s.  If you get a response within the
> >>> heartbeat interval to the non-confirmable requests, that is not relevant.
> >>> If you don’t, your heartbeat interval "MUST be chosen in
> >>>   such a way that an endpoint does not exceed an average data rate of
> >>>   PROBING_RATE in sending to another endpoint that does not respond".
> >>> If your interval is intended to be 15 s, that would mean your requests must
> >>> be ≤ 15 B, or you need to define PROBING_RATE differently for your
> >>> application.
> >>> It seems right now you are not trying to be particularly frugal with the
> >>> heartbeat message, which is probably OK since most of your networks will be
> >>> Ethernet and that will expand the frame size to 64 B anyway.  But that
> >>> means that you need to define PROBING_RATE to be ~ 5 B/s if you don’t want
> >>> to be slowed down in probing.
> >>
> >> [Med] Good point. Will update accordingly.
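
The arithmetic behind the 15 B and ~5 B/s figures above can be checked with a few lines. This is a worked example only; the function names are made up for illustration, and the 64 B frame size is the figure quoted in the message.

```python
import math


def max_request_size(probing_rate_bytes_per_s, interval_s):
    """Largest request (bytes) that keeps the average rate within PROBING_RATE."""
    return probing_rate_bytes_per_s * interval_s


def required_probing_rate(request_size_bytes, interval_s):
    """PROBING_RATE (bytes/s) needed to send one request per interval."""
    return request_size_bytes / interval_s


# PROBING_RATE of 1 B/s and a 15 s interval allow requests of at most 15 B.
limit = max_request_size(1, 15)

# A 64 B frame every 15 s needs roughly 4.27 B/s, i.e. about 5 B/s rounded up.
needed = required_probing_rate(64, 15)
needed_rounded = math.ceil(needed)
```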
> >>
> >>>
> >>> Regards, Carsten
> >>>
> >>> (*) And expects a 2.03, which would mean that the server confirms the ETag
> >>> given in the request.  But there is no ETag in that request, and I really
> >>> don’t see why a 2.05 with an empty payload wouldn’t also work.  But maybe
> >>> you want to move to POST anyway (so there can be application semantics,
> >>> like taking note of the heartbeat, on the server), and 2.04 would fit that
> >>> very well (see RFC 7252 Section 5.8.2).
> >>
> >> [Med] Will consider the use of POST instead of GET.
> >>
> >>>
> >>>> On Nov 12, 2019, at 10:18, <mohamed.boucadair@orange.com> wrote:
> >>>>
> >>>> Hi Carsten,
> >>>>
> >>>> You indicated the following in an offline message (I’m adding dots
> >>>> mailing list as you were OK; see below):
> >>>>
> >>>>> I would have expected using requests sent in non-confirmable messages,
> >>>>> requiring more work on the application side (interpreting losses) but also
> >>>>> delivering a more regular, predictable, less noisy signal.
> >>>>> A little bit of specification text is then needed to ensure that those
> >>>>> requests/responses meet the requirements of RFC 8085 and the related
> >>>>> specifications in RFC 7252, and that both sides are in a good position to
> >>>>> interpret the signal they get.
> >>>> …
> >>>>> If not, I would like to discuss the issue on the DOTS mailing list, and see
> >>>>> whether a small set of changes to the keepalive mechanisms employed can
> >>>>> improve its controllability and predictability.
> >>>>
> >>>> Assuming that non-confirmable application HBs are used, which changes do
> >>>> you think are needed to enhance the DOTS mechanism to meet 8085/7252
> >>>> requirements?
> >>>>
> >>>> As a reminder we do have the following for setting the hb parameters:
> >>>>
> >>>> ======
> >>>>      Note: heartbeat-interval should be tweaked to also assist DOTS
> >>>>      messages for NAT traversal (SIG-011 of [RFC8612]).  According to
> >>>>      [RFC8085], keepalive messages must not be sent more frequently
> >>>>      than once every 15 seconds and should use longer intervals when
> >>>>      possible.  Furthermore, [RFC4787] recommends NATs to use a state
> >>>>      timeout of 2 minutes or longer, but experience shows that sending
> >>>>      packets every 15 to 30 seconds is necessary to prevent the
> >>>>      majority of middleboxes from losing state for UDP flows.  From
> >>>>      that standpoint, the RECOMMENDED minimum heartbeat-interval is 15
> >>>>      seconds and the RECOMMENDED maximum heartbeat-interval is 240
> >>>>      seconds.  The recommended value of 30 seconds is selected to
> >>>>      anticipate the expiry of NAT state.
> >>>>
> >>>>      A heartbeat-interval of 30 seconds may be considered as too chatty
> >>>>      in some deployments.  For such deployments, DOTS agents may
> >>>>      negotiate longer heartbeat-interval values to prevent any network
> >>>>      overload with too frequent keepalives.
> >>>>
> >>>>      Different heartbeat intervals can be defined for 'mitigating-
> >>>>      config' and 'idle-config' to reduce being too chatty during idle
> >>>>      times.  If there is an on-path translator between the DOTS client
> >>>>      (standalone or part of a DOTS gateway) and the DOTS server, the
> >>>>      'mitigating-config' heartbeat-interval has to be smaller than the
> >>>>      translator session timeout.  It is recommended that the 'idle-
> >>>>      config' heartbeat-interval is also smaller than the translator
> >>>>      session timeout to prevent translator traversal issues, or
> >>>>      disabled entirely.  Means to discover the lifetime assigned by a
> >>>>      translator are out of scope.
> >>>> =======
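
The guidance in the note above can be expressed as a simple configuration check. The sketch below only encodes the numbers stated in the note (15 s minimum, 240 s maximum, 30 s recommended value, and an interval smaller than any translator session timeout). The function name and the idea of returning a list of warnings are assumptions for illustration, and the case of a disabled 'idle-config' heartbeat is not modelled.

```python
MIN_HEARTBEAT_INTERVAL_S = 15    # RECOMMENDED minimum from the note
MAX_HEARTBEAT_INTERVAL_S = 240   # RECOMMENDED maximum from the note
DEFAULT_HEARTBEAT_INTERVAL_S = 30  # recommended value from the note


def check_heartbeat_interval(interval_s=DEFAULT_HEARTBEAT_INTERVAL_S,
                             translator_timeout_s=None):
    """Return a list of warnings; an empty list means the value fits the note."""
    warnings = []
    if interval_s < MIN_HEARTBEAT_INTERVAL_S:
        warnings.append("below the recommended minimum of 15 seconds")
    if interval_s > MAX_HEARTBEAT_INTERVAL_S:
        warnings.append("above the recommended maximum of 240 seconds")
    # With an on-path translator, the interval has to be smaller than its
    # session timeout, otherwise the translator state may expire.
    if translator_timeout_s is not None and interval_s >= translator_timeout_s:
        warnings.append("not smaller than the translator session timeout")
    return warnings
```

For instance, `check_heartbeat_interval(30, translator_timeout_s=120)` returns an empty list, while `check_heartbeat_interval(120, translator_timeout_s=120)` reports that the interval is not smaller than the translator timeout.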
> >>>>
> >>>> Thank you.
> >>>>
> >>>> Cheers,
> >>>> Med
> >>