Re: [Dots] Improve controllability and predictability of keepalives

Carsten Bormann <cabo@tzi.org> Sat, 16 November 2019 14:36 UTC


Thank you.
I still have a few questions, but I think this looks very good.

4.4 says:
           Mitigation requests MUST NOT be delayed
           because of other congestion control checks.
What might those be?
(You don’t want to be in a situation where you send a ton of mitigation requests that are mostly dropped at the tail of a queue.)

Before Figure 27, it says:

           The PUT request used for DOTS heartbeat MUST NOT have a
           'cuid', 'cdid,' or 'mid' Uri-Path.  Such PUT requests MUST NOT be
           relayed by DOTS gateways.

Why?  (And what should the DOTS gateways do instead?)

Regards, Carsten


> On Nov 14, 2019, at 18:17, mohamed.boucadair@orange.com wrote:
> 
> Hi Carsten, all, 
> 
> As promised, we updated the draft to take into account your inputs. The candidate version is available at (see section 4.7 in particular): 
> 
> https://github.com/boucadair/draft-ietf-dots-signal-channel/blob/master/draft-ietf-dots-signal-channel-39.txt
> 
> The main changes are:
> * Use PUT to send heartbeat requests
> * Use 2.04 instead of 2.03
> * DOTS agents can negotiate a probing-rate
> * Provide some guidelines for setting the probing-rate.
> 
> Do you have any further comments on the new heartbeat mechanism? Thank you.
> 
> Cheers,
> Med
> 
>> -----Original Message-----
>> From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of
>> mohamed.boucadair@orange.com
>> Sent: Tuesday, 12 November 2019 17:24
>> To: Carsten Bormann
>> Cc: dots@ietf.org
>> Subject: Re: [Dots] Improve controllability and predictability of keepalives
>> 
>> Re-,
>> 
>> Please see inline.
>> 
>> Cheers,
>> Med
>> 
>>> -----Original Message-----
>>> From: Carsten Bormann [mailto:cabo@tzi.org]
>>> Sent: Tuesday, 12 November 2019 14:12
>>> To: BOUCADAIR Mohamed TGI/OLN
>>> Cc: dots@ietf.org
>>> Subject: Re: Improve controllability and predictability of keepalives
>>> 
>>> Hi Med,
>>> 
>>> what the text below doesn’t say is what kind of information you want to
>>> derive from the heartbeats.
>> 
>> [Med] How to interpret HBs by endpoints is discussed in Section 4.7.
>> 
>>> The way they currently (draft -39) are
>>> defined, the client uses a GET (*).  GET is not supposed to influence
>>> application state, so the server will not learn anything from that
>>> heartbeat.  Is the intention that only the client needs to react to
>>> heartbeat failures?
>> 
>> [Med] No, the server needs to react to heartbeat failures. The cases that
>> are discussed in the spec are as follows:
>> 
>> *  If the DOTS server receives traffic from the peer DOTS client but
>>   maximum 'missing-hb-allowed' threshold is reached, the DOTS server
>>   MUST NOT consider the DOTS signal channel session disconnected.  The
>>   DOTS server MUST keep on using the current DOTS signal channel
>>   session so that the DOTS client can send mitigation requests over
>>   the current DOTS signal channel session.  In this case, the DOTS
>>   server can identify the DOTS client is under attack and the inbound
>>   link to the DOTS client (domain) is saturated.
>> 
>> *  If the DOTS server does not receive a mitigation request from the
>>   DOTS client, it implies the DOTS client has not detected the attack
>>   or, if an attack mitigation is in progress, it implies the applied
>>   DDoS mitigation actions are not yet effective to handle the DDoS
>>   attack volume.
>> 
>> *  If the DOTS server does not receive any traffic from the peer DOTS
>>   client during the time span required to exhaust the maximum
>>   'missing-hb-allowed' threshold, the DOTS server concludes the
>>   session is disconnected.  The DOTS server can then trigger
>>   pre-configured mitigation requests for this DOTS client (if any).
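
The first and third cases above can be read as a small decision function on the server side. The following Python sketch is only an illustration of that reading, not text from the draft: the names `SessionState` and `classify_session`, and the idea of counting consecutive unanswered heartbeats, are assumptions made for the example. The second case (no mitigation request received) is not modeled.

```python
from enum import Enum


class SessionState(Enum):
    # Hypothetical labels; the draft does not define these names.
    ACTIVE = "active"
    CLIENT_LIKELY_UNDER_ATTACK = "client-likely-under-attack"
    DISCONNECTED = "disconnected"


def classify_session(missed_heartbeats, missing_hb_allowed, traffic_seen_from_client):
    """Classify a DOTS signal channel session from the server's point of view.

    missed_heartbeats: consecutive heartbeats counted as missing (assumed counter).
    missing_hb_allowed: the negotiated 'missing-hb-allowed' threshold.
    traffic_seen_from_client: True if any traffic arrived from the peer DOTS
        client while the threshold was being exhausted.
    """
    if missed_heartbeats < missing_hb_allowed:
        # Threshold not reached: nothing to conclude yet.
        return SessionState.ACTIVE
    if traffic_seen_from_client:
        # Threshold reached but the client is still sending: the session
        # MUST NOT be considered disconnected; the inbound link to the
        # client (domain) is probably saturated.
        return SessionState.CLIENT_LIKELY_UNDER_ATTACK
    # Threshold reached and no traffic at all: session is disconnected, and
    # pre-configured mitigation requests (if any) can be triggered.
    return SessionState.DISCONNECTED


print(classify_session(5, 5, True).value)
print(classify_session(5, 5, False).value)
```

Keeping the threshold check separate from the traffic check makes it explicit that reaching 'missing-hb-allowed' alone never tears down the session.
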
>> 
>>> 
>>> RFC 7252 defines PROBING_RATE as 1 B/s.  If you get a response within the
>>> heartbeat interval to the non-confirmable requests, that is not relevant.
>>> If you don’t, your heartbeat interval "MUST be chosen in
>>>   such a way that an endpoint does not exceed an average data rate of
>>>   PROBING_RATE in sending to another endpoint that does not respond."
>>> If your interval is intended to be 15 s, that would mean your requests must
>>> be ≤ 15 B, or you need to define PROBING_RATE differently for your
>>> application.
>>> It seems right now you are not trying to be particularly frugal with the
>>> heartbeat message, which is probably OK since most of your networks will be
>>> Ethernet and that will expand the frame size to 64 B anyway.  But that
>>> means that you need to define PROBING_RATE to be ~ 5 B/s if you don’t want
>>> to be slowed down in probing.
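
The arithmetic behind the 15 B and ~5 B/s figures can be checked with a few lines of Python. This is a minimal sketch; the function names are made up for the example, and it assumes one unanswered request per heartbeat interval and a 64 B frame, as in the paragraph above.

```python
import math

PROBING_RATE_RFC7252 = 1.0  # bytes per second, the default in RFC 7252


def max_request_size(probing_rate, interval_s):
    """Largest request (bytes) that keeps the average rate towards a
    non-responding peer at or below probing_rate, with one request sent
    every interval_s seconds."""
    return probing_rate * interval_s


def min_probing_rate(request_size, interval_s):
    """Smallest PROBING_RATE (bytes/second) that allows a request of
    request_size bytes every interval_s seconds without slowing down."""
    return request_size / interval_s


# With the default PROBING_RATE and a 15 s interval, requests must fit in 15 B.
print(max_request_size(PROBING_RATE_RFC7252, 15))  # 15.0

# A 64 B frame every 15 s needs about 4.27 B/s, so roughly 5 B/s.
print(math.ceil(min_probing_rate(64, 15)))  # 5
```
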
>> 
>> [Med] Good point. Will update accordingly.
>> 
>>> 
>>> Regards, Carsten
>>> 
>>> (*) And expects a 2.03, which would mean that the server confirms the ETag
>>> given in the request.  But there is no ETag in that request, and I really
>>> don’t see why a 2.05 with an empty payload wouldn’t also work.  But maybe
>>> you want to move to POST anyway (so there can be application semantics,
>>> like taking note of the heartbeat, on the server), and 2.04 would fit that
>>> very well (see RFC 7252 Section 5.8.2).
>> 
>> [Med] Will consider the use of POST instead of GET.
>> 
>>> 
>>>> On Nov 12, 2019, at 10:18, <mohamed.boucadair@orange.com> wrote:
>>>> 
>>>> Hi Carsten,
>>>> 
>>>> You indicated the following in an offline message (I’m adding dots
>>>> mailing list as you were OK; see below):
>>>> 
>>>>> I would have expected using requests sent in non-confirmable messages,
>>>>> requiring more work on the application side (interpreting losses) but also
>>>>> delivering a more regular, predictable, less noisy signal.
>>>>> A little bit of specification text is then needed to ensure that those
>>>>> requests/responses meet the requirements of RFC 8085 and the related
>>>>> specifications in RFC 7252, and that both sides are in a good position to
>>>>> interpret the signal they get.
>>>> …
>>>>> If not, I would like to discuss the issue on the DOTS mailing list, and see
>>>>> whether a small set of changes to the keepalive mechanisms employed can
>>>>> improve its controllability and predictability.
>>>> 
>>>> Assuming that non-confirmable application HBs are used, which changes do
>>>> you think are needed to enhance the DOTS mechanism to meet 8085/7252
>>>> requirements?
>>>> 
>>>> As a reminder we do have the following for setting the hb parameters:
>>>> 
>>>> ======
>>>>      Note: heartbeat-interval should be tweaked to also assist DOTS
>>>>      messages for NAT traversal (SIG-011 of [RFC8612]).  According to
>>>>      [RFC8085], keepalive messages must not be sent more frequently
>>>>      than once every 15 seconds and should use longer intervals when
>>>>      possible.  Furthermore, [RFC4787] recommends NATs to use a state
>>>>      timeout of 2 minutes or longer, but experience shows that sending
>>>>      packets every 15 to 30 seconds is necessary to prevent the
>>>>      majority of middleboxes from losing state for UDP flows.  From
>>>>      that standpoint, the RECOMMENDED minimum heartbeat-interval is 15
>>>>      seconds and the RECOMMENDED maximum heartbeat-interval is 240
>>>>      seconds.  The recommended value of 30 seconds is selected to
>>>>      anticipate the expiry of NAT state.
>>>> 
>>>>      A heartbeat-interval of 30 seconds may be considered as too chatty
>>>>      in some deployments.  For such deployments, DOTS agents may
>>>>      negotiate longer heartbeat-interval values to prevent any network
>>>>      overload with too frequent keepalives.
>>>> 
>>>>      Different heartbeat intervals can be defined for 'mitigating-
>>>>      config' and 'idle-config' to reduce being too chatty during idle
>>>>      times.  If there is an on-path translator between the DOTS client
>>>>      (standalone or part of a DOTS gateway) and the DOTS server, the
>>>>      'mitigating-config' heartbeat-interval has to be smaller than the
>>>>      translator session timeout.  It is recommended that the 'idle-
>>>>      config' heartbeat-interval is also smaller than the translator
>>>>      session timeout to prevent translator traversal issues, or
>>>>      disabled entirely.  Means to discover the lifetime assigned by a
>>>>      translator are out of scope.
>>>> =======
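
The limits in the note above could be checked along the following lines. This is a hedged Python sketch: the 15 s, 30 s and 240 s values come from the quoted text, while the function name `check_heartbeat_interval`, the `translator_timeout_s` parameter and the warning strings are assumptions made for illustration. The note itself says that discovering the translator lifetime is out of scope, so the timeout is taken here as an externally supplied value.

```python
MIN_HEARTBEAT_INTERVAL_S = 15       # RECOMMENDED minimum in the quoted note
MAX_HEARTBEAT_INTERVAL_S = 240      # RECOMMENDED maximum in the quoted note
DEFAULT_HEARTBEAT_INTERVAL_S = 30   # recommended value in the quoted note


def check_heartbeat_interval(interval_s, translator_timeout_s=None):
    """Return a list of warnings for a proposed heartbeat-interval.

    translator_timeout_s is the session timeout of an on-path translator
    (for example a NAT), if one is known; None means no known translator.
    """
    warnings = []
    if interval_s < MIN_HEARTBEAT_INTERVAL_S:
        warnings.append("below the RECOMMENDED minimum of 15 seconds")
    if interval_s > MAX_HEARTBEAT_INTERVAL_S:
        warnings.append("above the RECOMMENDED maximum of 240 seconds")
    if translator_timeout_s is not None and interval_s >= translator_timeout_s:
        warnings.append("not smaller than the translator session timeout")
    return warnings


print(check_heartbeat_interval(DEFAULT_HEARTBEAT_INTERVAL_S))         # []
print(check_heartbeat_interval(120, translator_timeout_s=60))
```

Returning warnings instead of raising keeps the RECOMMENDED limits advisory, which matches the wording of the note.
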
>>>> 
>>>> Thank you.
>>>> 
>>>> Cheers,
>>>> Med
>> 