Re: [Dots] Behavior when keep-alives fail (RE: Mirja Kühlewind's Discuss on draft-ietf-dots-signal-channel-31: (with DISCUSS and COMMENT)

Mirja Kuehlewind <ietf@kuehlewind.net> Wed, 03 July 2019 17:12 UTC

From: Mirja Kuehlewind <ietf@kuehlewind.net>
In-Reply-To: <06e901d531a8$38f38d90$aadaa8b0$@jpshallow.com>
Date: Wed, 3 Jul 2019 19:12:13 +0200
Cc: mohamed.boucadair@orange.com, draft-ietf-dots-signal-channel@ietf.org, "Konda, Tirumaleswar Reddy" <TirumaleswarReddy_Konda@mcafee.com>, dots@ietf.org, frank.xialiang@huawei.com, The IESG <iesg@ietf.org>, dots-chairs@ietf.org, Benjamin Kaduk <kaduk@mit.edu>
Message-Id: <2B716406-0554-4DC6-B6F9-057A9D4D85C4@kuehlewind.net>
References: <787AE7BB302AE849A7480A190F8B93302EAB2ABA@OPEXCAUBMA2.corporate.adroot.infra.ftgroup> <F0E0BDCF-9D56-4547-86E3-FEBABD77A6EB@kuehlewind.net> <787AE7BB302AE849A7480A190F8B93302EAB2CC5@OPEXCAUBMA2.corporate.adroot.infra.ftgroup> <AA91C255-9CF2-4016-8538-E634C09C27EE@kuehlewind.net> <787AE7BB302AE849A7480A190F8B93302EAB2F77@OPEXCAUBMA2.corporate.adroot.infra.ftgroup> <06e901d531a8$38f38d90$aadaa8b0$@jpshallow.com>
To: Jon Shallow <supjps-ietf@jpshallow.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/FYqut6DDM0Ive5_nKGG2oHkP-es>
Subject: Re: [Dots] Behavior when keep-alives fail (RE: Mirja Kühlewind's Discuss on draft-ietf-dots-signal-channel-31: (with DISCUSS and COMMENT)

Hi Jon,

Thanks for the extended explanation. Please see my questions inline.

> On 3. Jul 2019, at 16:04, Jon Shallow <supjps-ietf@jpshallow.com> wrote:
> 
> Hi Mirja,
> 
> As an implementer of DOTS I have the following comments to make to try and help understand what is going on with the Heartbeats
> 
> In the peace time scenario, it is assumed that the heartbeats function through the network in both directions and that it is possible to disable one or both of the heartbeat directions.
> 
> Heartbeats can use UDP or TCP depending on the session set up based on the initial connection (UDP is preferred over TCP).  With UDP, there is a CoAP Ping (Empty request CON 0.00) and a CoAP RST response.  With TCP, there is a CoAP Ping (7.02) and a CoAP Pong (7.03) response.
> For UDP, https://tools.ietf.org/html/rfc7252#section-4.8.2 defines the ACK_TIMEOUT, ACK_RANDOM_FACTOR, and MAX_RETRANSMIT which map onto the DOTS heartbeat parameters ack-timeout, ack-random-factor and max-retransmit.  These 3 parameters determine the elapsed time when there has been transmission failure.   There is an additional DOTS parameter missing-hb-allowed to support more than one heartbeat loss should it be needed before determining that a DOTS agent has really gone away (instead of, say, going through a reboot or a restart cycle).

Why do you need this additional parameter? Why is max-retransmit not enough? Isn’t the result the same: you send more ping frames before you finally give up?
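
To make the timing behind this question concrete, here is a minimal sketch of how the three CoAP parameters and missing-hb-allowed combine. The formula is MAX_TRANSMIT_WAIT from RFC 7252, Section 4.8.2. The parameter values (2 s, 1.5, 3 retransmissions, 5 missed heartbeats) are assumptions, picked so that one ping times out after the 45 seconds quoted from the draft further down; the function names are hypothetical.

```python
def max_transmit_wait(ack_timeout, ack_random_factor, max_retransmit):
    """Worst-case seconds until one Confirmable CoAP message is given up on.

    This is MAX_TRANSMIT_WAIT from RFC 7252, Section 4.8.2.
    """
    return ack_timeout * (2 ** (max_retransmit + 1) - 1) * ack_random_factor


def worst_case_detection(per_ping_wait, missing_hb_allowed):
    """Seconds until the session is declared lost, assuming failed pings are
    sent back to back (the idle gap between heartbeats is ignored here)."""
    return per_ping_wait * missing_hb_allowed


# Assumed values, not taken from this thread.
one_ping = max_transmit_wait(ack_timeout=2.0, ack_random_factor=1.5, max_retransmit=3)
print(one_ping)                            # 45.0
print(worst_case_detection(one_ping, 5))   # 225.0
print(max_transmit_wait(2.0, 1.5, 4))      # 93.0
```

Under these assumed values, raising max-retransmit by one stretches a single exchange from 45 to 93 seconds because of the exponential back-off, whereas missing-hb-allowed repeats the 45-second exchange, each time starting again from the short initial timeout.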

> 
> When handling an attack scenario, there is a good chance that the inbound (DOTS server to DOTS client) data path is flooded or overloaded and hence suffers packet loss (though this is not the case in all DDoS scenarios).
> 
> A significant purpose of the DOTS client generating a heartbeat is to make sure that any NAT devices in the path maintain their NAT associations and allow any returning responses (which could be unsolicited if the observe of a mitigation is active).
> 
> Even in the attack scenario, the DOTS server will see these heartbeat messages, but can only deduce that the connection from the DOTS client to the DOTS server is good - but cannot make any assumptions about traffic flowing in the other direction.

However, if the client never receives a CoAP RST or CoAP Pong, it will sooner or later give up and stop sending ping messages. In this case the server will no longer receive pings and can decide to send its own. What is important is that the idle timeout is known to both ends.
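
A minimal sketch of that server-side behaviour, assuming both ends have agreed on the same idle timeout. The class name, the method names, and the 90-second value are all hypothetical and are not taken from the draft.

```python
class IdleWatchdog:
    """Tracks when the peer's last ping arrived (times are in seconds)."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.last_ping_seen = 0.0

    def on_peer_ping(self, now):
        # Any ping from the peer resets the idle period.
        self.last_ping_seen = now

    def should_send_own_ping(self, now):
        # Once the agreed idle timeout has passed without a peer ping,
        # this end starts probing by itself.
        return now - self.last_ping_seen >= self.idle_timeout


watchdog = IdleWatchdog(idle_timeout=90.0)  # assumed value
watchdog.on_peer_ping(now=10.0)
print(watchdog.should_send_own_ping(now=50.0))   # False
print(watchdog.should_send_own_ping(now=100.0))  # True
```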

Further, if you really want to be sure whether the RST was received or not, I’d recommend using your own application-level ping that indicates whether the ping is a retransmission or not.

Detecting one-way congestion is a different function than keep-alive testing, and it is better to use an explicit mechanism for that than to try to infer something from a mechanism that was designed for a different purpose.

> 
> However, the DOTS client may not get a ping response due to the flooded inbound pipe.  If the DOTS client has initiated a mitigation request, then it is unsafe for the DOTS client to close down the session - it will need to refresh the mitigation requests / create new ones even if the mitigation is not being that effective as traffic can still flow to the server.  It is possible that the DOTS server has just restarted - hence the requirement to try and open up a new session in parallel.
> 
> If the DOTS server also initiates heartbeat messages, sees the DOTS client pings, but does not see any response to the DOTS server ping, the DOTS server can now deduce that the outbound pipe is good, but the inbound pipe to the DOTS client is failing.  The DOTS server then does not need to close down the session as it will be expecting additional mitigation requests from the DOTS client - even though the DOTS server CoAP Ping is failing.
> 
> Furthermore, if the DOTS server initiates its CoAP Ping on receipt of the DOTS client CoAP Ping, then there is a good chance that the NAT sessions are "warm" on any intervening NAT devices.  If the DOTS server initiates the CoAP Ping on its own cycle, there is a chance that it may not get through and confuse the logic.

This also suggests to me that you should rather design your own testing during mitigation in the DOTS layer, e.g. don’t use the CoAP Ping, but send a non-confirmable CoAP message which contains a DOTS-layer “ping” and an indication of whether a DOTS-layer “pong” has been received or not.
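
One way such a DOTS-layer ping could look is sketched below. This is only an illustration under stated assumptions: the message fields, the sequence numbering, and the function name are hypothetical and do not appear in the draft. The idea is that every ping carries the sequence number of the last pong its sender saw, so the receiver can tell explicitly whether the reverse direction is working.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DotsPing:
    seq: int             # sequence number of this ping, starting at 1
    last_pong_seen: int  # highest ping seq the sender got a pong for (0 = none)


def reverse_path_ok(ping: DotsPing) -> Optional[bool]:
    """What the receiver of `ping` learns about the path back to the sender.

    Receiving the ping at all shows the sender-to-receiver direction works.
    Returns None for the first ping, since no pong could have been seen yet.
    """
    if ping.seq <= 1:
        return None
    # The pong for the previous ping should have arrived by now.
    return ping.last_pong_seen >= ping.seq - 1


print(reverse_path_ok(DotsPing(seq=1, last_pong_seen=0)))  # None
print(reverse_path_ok(DotsPing(seq=5, last_pong_seen=4)))  # True
print(reverse_path_ok(DotsPing(seq=5, last_pong_seen=2)))  # False
```

Because the message would be sent non-confirmably, a lost pong shows up in the next ping rather than triggering CoAP-layer retransmission.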

However, note that this still might not work with TCP, as messages cannot be transmitted unreliably, and untransmitted or unacknowledged application-layer data will at some point block all other traffic on the same connection, because TCP will keep retransmitting and shrink the congestion window to the minimum.

Mirja


> 
> Regards
> 
> Jon
> 
>> -----Original Message-----
>> From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of supjps-mohamed.boucadair@orange.com
>> Sent: 03 July 2019 14:46
>> To: Mirja Kuehlewind
>> Cc: draft-ietf-dots-signal-channel@ietf.org; Konda, Tirumaleswar Reddy;
>> dots@ietf.org; frank.xialiang@huawei.com; The IESG; dots-chairs@ietf.org;
>> Benjamin Kaduk
>> Subject: Re: [Dots] Behavior when keep-alives fail (RE: Mirja Kühlewind's
>> Discuss on draft-ietf-dots-signal-channel-31: (with DISCUSS and COMMENT)
>> 
>> Re-,
>> 
>> Please see inline.
>> 
>> Cheers,
>> Med
>> 
>>> -----Original Message-----
>>> From: Mirja Kuehlewind [mailto:ietf@kuehlewind.net]
>>> Sent: Wednesday 3 July 2019 14:46
>>> To: BOUCADAIR Mohamed TGI/OLN
>>> Cc: Konda, Tirumaleswar Reddy; Benjamin Kaduk; draft-ietf-dots-signal-channel@ietf.org;
>>> frank.xialiang@huawei.com; dots@ietf.org; The IESG; dots-chairs@ietf.org
>>> Subject: Re: Behavior when keep-alives fail (RE: [Dots] Mirja Kühlewind's
>>> Discuss on draft-ietf-dots-signal-channel-31: (with DISCUSS and COMMENT)
>>> 
>>> Hi Med,
>>> 
>>> See below.
>>> 
>>>> On 3. Jul 2019, at 12:48, mohamed.boucadair@orange.com wrote:
>>>> 
>>>> Mirja,
>>>> 
>>>>> Actually to my understanding this will not work. Both TCP heartbeat and
>>>>> CoAP Ping are transmitted reliably. If you don’t receive an ack for these
>>>>> transmissions you are not able to send any additional messages and can
>>>>> only close the connection.
>>>> 
>>>> This behavior is implemented and tested between two implementations. The
>>>> exact procedure is described in the draft, fwiw:
>>>> 
>>>> ==
>>>>  When a Confirmable "CoAP Ping" is sent, and if there is no response,
>>>>  the "CoAP Ping" is retransmitted max-retransmit number of times by
>>>>  the CoAP layer using an initial timeout set to a random duration
>>>>  between ack-timeout and (ack-timeout*ack-random-factor) and
>>>>  exponential back-off between retransmissions.  By choosing the
>>>>  recommended transmission parameters, the "CoAP Ping" will timeout
>>>>  after 45 seconds.  If the DOTS agent does not receive any response
>>>>  from the peer DOTS agent for 'missing-hb-allowed' number of
>>>>  consecutive "CoAP Ping" Confirmable messages, it concludes that the
>>>>  DOTS signal channel session is disconnected.  A DOTS client MUST NOT
>>>>  transmit a "CoAP Ping" while waiting for the previous "CoAP Ping"
>>>>  response from the same DOTS server.
>>>> ==
>>> 
>>> First, can you explain why you need 'missing-hb-allowed’?
>> 
>> [Med] Because we need to make sure this is a real, durable session failure,
>> not a false positive. For example, this would have implications on the server
>> as it may erroneously start automated mitigations (because it concludes the
>> session is lost).
>> 
>>> If the ping is transmitted reliably, one “missed” should be enough to conclude
>>> that the session is disconnected.
>> 
>> [Med] Hmm, under some DDoS attacks, both endpoints may be
>> sending/replying to confirmable ping messages, but the reply may get
>> dropped. The session is not disconnected in such case.
>> 
>>> 
>>> Yes, as Coap Ping is used, the agent should not only conclude that the
>>> DOTS signal session is disconnected but also the Coap session and not send
>>> any further Coap messages anymore.
>>> 
>>> If you want to send further UDP datagrams you should send them unreliably and
>>> not more often than one per 3 seconds.
>>> 
>>> Mirja
>>> 
>>> 
>>>> 
>>>> Cheers,
>>>> Med
>>>> 
>>>>> -----Original Message-----
>>>>> From: Mirja Kuehlewind [mailto:ietf@kuehlewind.net]
>>>>> Sent: Wednesday 3 July 2019 12:26
>>>>> To: BOUCADAIR Mohamed TGI/OLN
>>>>> Cc: Konda, Tirumaleswar Reddy; Benjamin Kaduk; draft-ietf-dots-signal-channel@ietf.org;
>>>>> frank.xialiang@huawei.com; dots@ietf.org; The IESG; dots-chairs@ietf.org
>>>>> Subject: Re: Behavior when keep-alives fail (RE: [Dots] Mirja Kühlewind's
>>>>> Discuss on draft-ietf-dots-signal-channel-31: (with DISCUSS and COMMENT)
>>>>> 
>>>>> Hi Med,
>>>>> 
>>>>> See below.
>>>>> 
>>>>>> On 3. Jul 2019, at 09:53, mohamed.boucadair@orange.com wrote:
>>>>>> 
>>>>>> Hi Mirja,
>>>>>> 
>>>>>> (Focusing on individual issues)
>>>>>> 
>>>>>> Please see inline.
>>>>>> 
>>>>>> Cheers,
>>>>>> Med
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Mirja Kuehlewind [mailto:ietf@kuehlewind.net]
>>>>>>> Sent: Tuesday 2 July 2019 16:00
>>>>>>> To: BOUCADAIR Mohamed TGI/OLN
>>>>>>> Cc: Konda, Tirumaleswar Reddy; Benjamin Kaduk; draft-ietf-dots-signal-channel@ietf.org;
>>>>>>> frank.xialiang@huawei.com; dots@ietf.org; The IESG; dots-chairs@ietf.org
>>>>>>> Subject: Re: [Dots] Mirja Kühlewind's Discuss on draft-ietf-dots-signal-channel-31:
>>>>>>> (with DISCUSS and COMMENT)
>>>>>>> 
>>>>>> ...
>>>>>>>>>>>> 10) The document should more explicitly provide more guidance about
>>>>>>>>>>>> when a client should start a session and what should be done (from the
>>>>>>>>>>>> client side) if a session is detected as inactive (other than during
>>>>>>>>>>>> migration which is discussed a bit in 4.7). Is the assumption to have
>>>>>>>>>>>> basically permanently an active session or connect for migration and
>>>>>>>>>>>> configuration requests separately at a time?
>>>>>>>>>>> 
>>>>>>>>>>> I think there was some clarifying text added, but please confirm if you
>>>>>>>>>>> think it is sufficient.
>>>>>>>>> 
>>>>>>>>> Sorry, don’t see where text was added. Can you provide a pointer?
>>>>>>>> 
>>>>>>>> [Med] We do have this text, for example:
>>>>>>>> 
>>>>>>>> The DOTS signal channel can be established between two DOTS agents
>>>>>>>> prior or during an attack.  The DOTS signal channel is initiated by
>>>>>>>> the DOTS client.  The DOTS client can then negotiate, configure, and
>>>>>>>> retrieve the DOTS signal channel session behavior with its DOTS peer
>>>>>>>> (Section 4.5).  Once the signal channel is established, the DOTS
>>>>>>>> agents periodically send heartbeats to keep the channel active
>>>>>>>> (Section 4.7).  At any time, the DOTS client may send a mitigation
>>>>>>>> request message (Section 4.4) to a DOTS server over the active signal
>>>>>>>> channel.  While mitigation is active (because of the higher
>>>>>>>> likelihood of packet loss during a DDoS attack), the DOTS server
>>>>>>>> periodically sends status messages to the client, including basic
>>>>>>>> mitigation feedback details.  Mitigation remains active until the
>>>>>>>> DOTS client explicitly terminates mitigation, or the mitigation
>>>>>>>> lifetime expires.  Also, the DOTS server may rely on the signal
>>>>>>>> channel session loss to trigger mitigation for pre-configured
>>>>>>>> mitigation requests (if any).
>>>>>>> 
>>>>>>> Okay thanks for the pointer. What I think is missing are some
>>>>>>> sentences about what the client (or server) should do if the keep-alive
>>>>>>> fails. Try to reconnect directly or just with the next request or
>>>>>>> whatever. Basically who should reconnect and when?
>>>>>> 
>>>>>> [Med] This is discussed in detail in Section 4.7, in particular.
>>>>>> 
>>>>>> As a generic rule, it is always the client who connects (see the excerpt above).
>>>>>> 
>>>>>> The server may use the failure to initiate automated mitigation (see the
>>>>>> excerpt above). More details are provided in other sections.
>>>>>> 
>>>>>> There are several heartbeat failure cases to handle by the client.
>>>>>> Examples from 4.7 are provided below, fwiw:
>>>>>> 
>>>>>>    The DOTS client MUST NOT consider the DOTS signal channel session
>>>>>>    terminated even after a maximum 'missing-hb-allowed' threshold is
>>>>>>    reached.  The DOTS client SHOULD keep on using the current DOTS
>>>>>>    signal channel session to send heartbeat requests over it, so that
>>>>>>    the DOTS server knows the DOTS client has not disconnected the
>>>>>>    DOTS signal channel session.
>>>>>> 
>>>>>>    After the maximum 'missing-hb-allowed' threshold is reached, the
>>>>>>    DOTS client SHOULD try to resume the (D)TLS session.  The DOTS
>>>>>>    client SHOULD send mitigation requests over the current DOTS
>>>>>>    signal channel session, and in parallel, for example, try to
>>>>>>    resume the (D)TLS session or use 0-RTT mode in DTLS 1.3 to
>>>>>>    piggyback the mitigation request in the ClientHello message.
>>>>>> 
>>>>>>    As soon as the link is no longer saturated, if traffic from the
>>>>>>    DOTS server reaches the DOTS client over the current DOTS signal
>>>>>>    channel session, the DOTS client can stop (D)TLS session
>>>>>>    resumption or if (D)TLS session resumption is successful then
>>>>>>    disconnect the current DOTS signal channel session.
>>>>>> 
>>>>>> Do you think additional text is needed?
>>>>> 
>>>>> Actually to my understanding this will not work. Both TCP heartbeat and
>>>>> CoAP Ping are transmitted reliably. If you don’t receive an ack for these
>>>>> transmissions you are not able to send any additional messages and can
>>>>> only close the connection.
>>>>> 
>>>>> Mirja
>>>>> 
>>>>> 
>>>> 
>> 
> 
>