Re: [Dots] Behavior when keep-alives fail (RE: Mirja Kühlewind's Discuss on draft-ietf-dots-signal-channel-31: (with DISCUSS and COMMENT)

Mirja Kuehlewind <> Thu, 18 July 2019 15:24 UTC


Hi Med,

Thanks for this nice summary. Please see inline.

> On 17. Jul 2019, at 14:15, wrote:
> Hi Mirja,
> Glad to see that we are progressing. With regards to the remaining issue, let's recap on the intended functionality and general observations.
> * What is the intended functionality?
> (1) What we need is a check to assess the liveness of the peer DOTS agent + maintain NAT/FW state on on-path devices. 
> (2) CoAP does not impose how CoAP Ping messages should be sent, how their response (or lack of) should be interpreted, etc. This is left to the application.
> (3) In a DDoS context, failure to receive heartbeats ** should not ** be systematically interpreted as the session is defunct. To that aim, a specific behavior is followed at the DOTS layer (not CoAP layer). The DOTS layer instructs the CoAP layer (DOTS session, and mitigation operations) accordingly.   
> * Why CoAP Pings?
> CoAP Pings naturally achieve (1). These messages are interesting because they have small message sizes. CoAP Ping uses Empty Confirmable message (no request) and the response is a Reset message (which is empty).
> CoAP Ping can be used ** as-is ** during idle time (no attack).

Agreed.

> Multiple Ping failures and Ping exchange in both directions is required only during the attack time to really detect if the peer is dead or alive (because of (3)).

This is understood but that’s not what ping is designed for.

> This is aligned with (2).
> * Why not DOTS-specific messages?
> If we have to define such a message, it needs to have as small a footprint as practical.

I think I don’t quite understand this requirement. But let’s go on...

> An Empty message is inexpensive, nevertheless empty Non-confirmable messages are not allowed (See rfc7252#section-4.3) and sending the Reset message is optional. 
> Furthermore, if heartbeats were non-confirmable that means that we would need to get the application layer to do any recovery work.

It’s not really recovery work; it’s how ping usually works. Yes, the application would need to take care of this itself. But since you want to keep sending messages frequently even when no confirmation is received, you actually don’t want to send them reliably, and so it’s not really a retransmission. It’s rather sending a new message at a defined interval.

> It is much simpler to make them confirmable and delegate the recovery work to CoAP layer.

If you want a mechanism that checks both sides of the connectivity, using a non-confirmable message with ping and pong at the application layer makes more sense. It also has a “smaller footprint”, because you only send one message in each direction instead of an empty message + ack in both.

> So, even if we did it in the DOTS layer it will be the same as "CoAP ping”.

It’s not, because it will cover both directions and you can directly indicate whether a pong was received or not. I think that’s a functionality that you are actually looking for, and it is not provided by the lower-layer ping.
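
A minimal sketch of the application-level ping/pong described above, assuming a hypothetical heartbeat message that carries the sender's own sequence number plus the highest sequence number it has seen from the peer. None of these names come from the DOTS draft or from CoAP; they only illustrate how one unreliable message per direction can report on both directions of the path.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Hypothetical non-confirmable application-level heartbeat."""
    seq: int        # sender's own sequence number
    last_seen: int  # highest peer sequence number the sender has received (0 = none)


class HeartbeatPeer:
    """Tracks liveness in both directions from unreliable heartbeats."""

    def __init__(self, max_missed: int = 3):
        self.max_missed = max_missed  # illustrative threshold, not a draft parameter
        self.seq = 0            # last sequence number we sent
        self.peer_seq = 0       # highest sequence number received from the peer
        self.acked_seq = 0      # highest of our sequence numbers the peer has seen
        self.silent_rounds = 0  # consecutive intervals with nothing from the peer

    def next_heartbeat(self) -> Heartbeat:
        """Called once per interval: always a new message, never a retransmission."""
        self.seq += 1
        self.silent_rounds += 1
        return Heartbeat(seq=self.seq, last_seen=self.peer_seq)

    def receive(self, hb: Heartbeat) -> None:
        """Record a heartbeat that actually arrived from the peer."""
        self.peer_seq = max(self.peer_seq, hb.seq)
        self.acked_seq = max(self.acked_seq, hb.last_seen)
        self.silent_rounds = 0

    @property
    def inbound_ok(self) -> bool:
        """True while the peer's heartbeats keep reaching us."""
        return self.silent_rounds <= self.max_missed

    @property
    def outbound_ok(self) -> bool:
        """True while the peer keeps reporting that our heartbeats reach it."""
        return self.seq - self.acked_seq <= self.max_missed
```

With this shape, an agent whose inbound link is saturated can still learn from `outbound_ok` on the other side that its own messages are getting through, which a one-directional ping cannot report.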

> The WG did not see a reason to re-do the same in the application layer.

It’s usual to have keep-alive mechanisms at multiple layers to cover different needs. Naturally, if you implement a keep-alive at a higher layer (which sends messages often enough), you can also enable keep-alive at the lower layer(s); it will normally not become active, however, because there are messages on the higher layer before the idle timer expires. Therefore it would be easy to design a more dedicated mechanism for the DDoS case and use CoAP pings for the “normal” case to address your function (1).

> In DOTS, the CoAP ping is initiated by the application (DOTS layer itself) and CoAP layer only handles the retransmissions. The proposed mechanism does not require an update to the CoAP stack other than the assumption that callback handlers are supported to give visibility of when a ping or pong was received by the CoAP layer in the application layer.

Which probably is an update to any CoAP implementation. And it’s usually something that should not be exposed, because in this case it’s solely a lower-layer function. The app should activate keep-alive, configure the idle time, and get notified when the connection has been detected to be no longer alive.
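
The narrow interface described above (activate, configure the idle time, get notified) could look roughly like the following sketch. The class, its parameters, and the probe count are hypothetical and not taken from any CoAP implementation; time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Callable


class KeepAlive:
    """Hypothetical lower-layer keep-alive: the app only sees on_dead()."""

    def __init__(self, idle_time: float, probe_interval: float, max_probes: int,
                 send_probe: Callable[[], None], on_dead: Callable[[], None]):
        self.idle_time = idle_time            # silence tolerated before probing starts
        self.probe_interval = probe_interval  # spacing between probes once idle
        self.max_probes = max_probes          # unanswered probes before giving up
        self.send_probe = send_probe          # lower-layer hook that emits one probe
        self.on_dead = on_dead                # the single notification the app receives
        self.last_activity = 0.0
        self.probes_sent = 0
        self.dead = False

    def on_traffic(self, now: float) -> None:
        """Any message from the peer, on any layer, resets the idle timer."""
        self.last_activity = now
        self.probes_sent = 0

    def tick(self, now: float) -> None:
        """Drive the timer; call periodically with the current time."""
        if self.dead:
            return
        idle_for = now - self.last_activity
        if idle_for < self.idle_time:
            return  # higher-layer traffic is frequent enough, no probe needed
        if idle_for >= self.idle_time + self.max_probes * self.probe_interval:
            self.dead = True
            self.on_dead()
            return
        if idle_for >= self.idle_time + self.probes_sent * self.probe_interval:
            self.probes_sent += 1
            self.send_probe()
```

As long as `on_traffic` is called more often than `idle_time`, `tick` never sends a probe, which matches the earlier point that a lower-layer keep-alive stays dormant while a higher layer is sending regularly.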

> As such, the current design is pragmatic. 
> The DOTS layer has the ** full control required for the intended functionality **.
> * Why separating heartbeats parameters from CoAP transmission ones?
> We need to allow for a differentiated handling of heartbeats during idle/mitigation times. To that aim, we defined specific heartbeat parameters for each of these times without touching the CoAP connection parameters.

If you define a separate parameter that needs to be implemented and handled, you can also implement your own mechanism for the DDoS case.

> All the message transmission parameters, including missing-hb-allowed, are configurable using the DOTS signal channel (see draft-ietf-dots-signal-channel-35#section-4.5), and these message transmission parameters, including missing-hb-allowed, are only used for UDP transport.
> One last comment about adjusting CoAP parameter values: we spent a lot of cycles in the WG discussing this particular point. The conclusion of the WG is that 4 retransmits (which corresponds to the default value of MAX_RETRANSMIT) are not sufficient to detect that the session is disconnected during a volumetric DDoS attack saturating the incoming link, which will lead to high packet loss. This is why multiple heartbeat losses are required to detect that the session is defunct.

I believe you spent so much discussion time because you are misusing a mechanism that was not designed for your purpose instead of designing a proper one.
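
To put rough numbers on the quoted claim that 4 retransmits are not enough under heavy loss, here is a back-of-the-envelope sketch. It assumes that each request/response round trip fails independently with the same probability, which is a simplification chosen for illustration and not a model used by the draft; the function names are hypothetical.

```python
MAX_RETRANSMIT = 4  # CoAP default from RFC 7252


def p_heartbeat_fails(loss: float, max_retransmit: int = MAX_RETRANSMIT) -> float:
    """Probability that one Confirmable heartbeat gets no response at all.

    One heartbeat is the initial transmission plus max_retransmit
    retransmissions; it fails only if every one of those round trips fails.
    `loss` is the assumed per-round-trip failure probability.
    """
    return loss ** (1 + max_retransmit)


def p_declared_dead(loss: float, missed_heartbeats: int) -> float:
    """Probability that `missed_heartbeats` consecutive heartbeats all fail."""
    return p_heartbeat_fails(loss) ** missed_heartbeats


if __name__ == "__main__":
    for loss in (0.5, 0.9):
        one = p_heartbeat_fails(loss)
        three = p_declared_dead(loss, 3)
        print(f"loss={loss}: one heartbeat fails {one:.3f}, three in a row {three:.3f}")
```

Under these assumptions, with 90% round-trip loss a single heartbeat fails about 59% of the time even though the peer is alive, while three consecutive failures happen about 21% of the time.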

> Unless there is a flaw with the current design (fail to achieve the intended functionality, implementation complexity, side effects, ...), we don't see a reason to change the current approach. 

I believe there are flaws in the design. First, it’s a layer violation; that is more of an idealistic concern, but designing in layers is usually a good approach. More importantly, you end up with infrequent messages which may still terminate the connection at some point, while what you want is simply to send messages frequently, in an unreliable fashion but at a low rate, until the attack is over.
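
The "infrequent messages" point can be seen by comparing send times. The sketch below assumes the CoAP defaults from RFC 7252 (ACK_TIMEOUT of 2 seconds, MAX_RETRANSMIT of 4) and ignores the random factor CoAP applies to the initial timeout; the 5-second periodic interval is an arbitrary illustrative value, and both function names are hypothetical.

```python
ACK_TIMEOUT = 2.0   # seconds, CoAP default (RFC 7252)
MAX_RETRANSMIT = 4  # CoAP default (RFC 7252)


def confirmable_send_times(ack_timeout: float = ACK_TIMEOUT,
                           max_retransmit: int = MAX_RETRANSMIT) -> list[float]:
    """Send times of one Confirmable message that never gets a response.

    The timeout doubles after every retransmission (binary exponential
    back-off). Randomization of the initial timeout is ignored here.
    """
    times = [0.0]
    t, timeout = 0.0, ack_timeout
    for _ in range(max_retransmit):
        t += timeout
        times.append(t)
        timeout *= 2
    return times


def periodic_send_times(interval: float, duration: float) -> list[float]:
    """Send times of unreliable heartbeats emitted at a fixed interval."""
    count = int(duration // interval)
    return [i * interval for i in range(count + 1)]


if __name__ == "__main__":
    print("confirmable:", confirmable_send_times())
    print("periodic:   ", periodic_send_times(5.0, 30.0))
```

With back-off, the gaps between transmissions grow from 2 to 16 seconds and then the exchange gives up, whereas the periodic sender keeps a constant low rate for as long as the attack lasts.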

Please have a discussion with the working group about an alternative mechanism. The meeting is next week and I hope it will be possible to get some time for that in the DOTS session.

I unfortunately won’t be able to make it to the session but I will sync up with Ben.


> Thank you. 
> Cheers,
> Jon/Tiru/Med
>> -----Original Message-----
>> From: Mirja Kuehlewind []
>> Sent: Tuesday, 16 July 2019 17:41
>> To: Konda, Tirumaleswar Reddy
>> Cc : Jon Shallow; BOUCADAIR Mohamed TGI/OLN; draft-ietf-dots-signal-
>>;;; The IESG;
>>; Benjamin Kaduk
>> Subject: Re: [Dots] Behavior when keep-alives fail (RE: Mirja Kühlewind's
>> Discuss on draft-ietf-dots-signal-channel-31: (with DISCUSS and COMMENT)
>> Hi Tiru,
>> Thanks for the updates. I think there is one remaining issue on the use of
>> ping/heart-beats (see also my other message). However, I believe all other
>> discuss points have been addressed now. Thanks for that!
>> Mirja