Re: [Dots] Behavior when keep-alives fail (RE: Mirja Kühlewind's Discuss on draft-ietf-dots-signal-channel-31: (with DISCUSS and COMMENT)

"Jon Shallow" <supjps-ietf@jpshallow.com> Wed, 03 July 2019 14:04 UTC

From: Jon Shallow <supjps-ietf@jpshallow.com>
To: mohamed.boucadair@orange.com, 'Mirja Kuehlewind' <ietf@kuehlewind.net>, draft-ietf-dots-signal-channel@ietf.org, "'Konda, Tirumaleswar Reddy'" <TirumaleswarReddy_Konda@mcafee.com>, dots@ietf.org, frank.xialiang@huawei.com, 'The IESG' <iesg@ietf.org>, dots-chairs@ietf.org, 'Benjamin Kaduk' <kaduk@mit.edu>
Date: Wed, 03 Jul 2019 15:04:25 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/RYu9lWP8Od__0h-4lWVIey7b6AE>

Hi Mirja,

As an implementer of DOTS, I have the following comments, which I hope will help explain what is going on with the heartbeats.

In the peacetime scenario, it is assumed that heartbeats get through the network in both directions; it is also possible to disable one or both of the heartbeat directions.

Heartbeats use UDP or TCP, depending on which transport the session was set up over at the initial connection (UDP is preferred over TCP).  With UDP, the heartbeat is a CoAP Ping (an Empty Confirmable message, code 0.00) answered by a CoAP RST.  With TCP, it is a CoAP Ping (7.02) answered by a CoAP Pong (7.03).
For UDP, https://tools.ietf.org/html/rfc7252#section-4.8.2 defines ACK_TIMEOUT, ACK_RANDOM_FACTOR, and MAX_RETRANSMIT, which map onto the DOTS heartbeat parameters ack-timeout, ack-random-factor and max-retransmit.  These three parameters determine how much time elapses before a transmission is declared to have failed.  An additional DOTS parameter, missing-hb-allowed, lets more than one heartbeat be lost, should that be needed, before concluding that a DOTS agent has really gone away (rather than, say, going through a reboot or restart cycle).
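
To make the timing concrete, here is a rough sketch of the arithmetic in Python.  It is only an illustration: the function name is made up, and the default values (ack-timeout 2, ack-random-factor 1.5, max-retransmit 3) are my assumption of the recommended settings, chosen because they reproduce the 45 seconds mentioned in the draft text quoted further down.

```python
def coap_ping_timeout(ack_timeout=2.0, ack_random_factor=1.5, max_retransmit=3):
    """Return (shortest, longest) seconds before a Confirmable CoAP Ping is given up.

    Hypothetical helper; the defaults are assumed, not taken from this thread.
    """
    # One initial transmission plus max_retransmit retransmissions.  The
    # timeout doubles after each one, so the waits add up to
    # (2 ** (max_retransmit + 1) - 1) times the initial timeout.
    doublings = 2 ** (max_retransmit + 1) - 1
    # The initial timeout is picked at random between ack-timeout and
    # ack-timeout * ack-random-factor, which gives the two bounds.
    shortest = ack_timeout * doublings
    longest = ack_timeout * ack_random_factor * doublings
    return shortest, longest


if __name__ == "__main__":
    print(coap_ping_timeout())  # (30.0, 45.0)
```

With the plain RFC 7252 default of MAX_RETRANSMIT = 4, the same formula gives an upper bound of 93 seconds.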
 
In an attack scenario, there is a good chance that the inbound (DOTS server to DOTS client) data path is flooded or overloaded and is therefore suffering packet loss (although this is not the case for every type of DDoS attack).

An important reason for the DOTS client to generate heartbeats is to make sure that any NAT devices in the path maintain their NAT associations and so allow returning responses through (which could be unsolicited if an Observe on a mitigation is active).

Even in the attack scenario, the DOTS server will see these heartbeat messages, but can only deduce that the connection from the DOTS client to the DOTS server is good - but cannot make any assumptions about traffic flowing in the other direction.

However, the DOTS client may not get a ping response because the inbound pipe is flooded.  If the DOTS client has initiated a mitigation request, then it is unsafe for it to close down the session: it will need to refresh its mitigation requests or create new ones, even if the mitigation is not proving very effective, because traffic can still flow to the DOTS server.  It is also possible that the DOTS server has just restarted, hence the requirement to try to open a new session in parallel.
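
As a very small sketch of that client-side reasoning (Python, with a made-up function name and action strings; this is my reading of the behaviour, not text from the draft):

```python
from typing import List


def client_actions_after_missed_replies(mitigation_active: bool) -> List[str]:
    """Hypothetical summary of what a DOTS client does when its pings go unanswered."""
    # The session is never simply dropped: the outbound path may still work,
    # and the heartbeats keep the NAT associations alive.
    actions = [
        "keep the current session open",
        "keep sending heartbeats on the current session",
        # Covers the case where the DOTS server has just restarted.
        "try to open a new session in parallel",
    ]
    if mitigation_active:
        # Requests can still reach the server even though replies are lost.
        actions.insert(1, "refresh existing mitigation requests or create new ones")
    return actions
```

The point of returning a list rather than a single action is that these steps are carried out side by side, not as alternatives.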

If the DOTS server also initiates heartbeat messages, sees the DOTS client pings, but does not see any response to its own pings, it can deduce that the path from the DOTS client is good but the inbound pipe to the DOTS client is failing.  The DOTS server then does not need to close down the session, as it will be expecting additional mitigation requests from the DOTS client, even though the DOTS server's CoAP Ping is failing.
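
The server-side deduction can be written down as a small truth table.  The sketch below is Python with invented names (PathView, server_view, server_keeps_session); the two cases where no client ping is seen are not covered in my description above and are filled in as assumptions.

```python
from enum import Enum


class PathView(Enum):
    """Hypothetical labels for what the DOTS server can infer."""
    BOTH_DIRECTIONS_OK = "client pings seen, server pings answered"
    TO_CLIENT_FAILING = "client pings seen, server pings unanswered"
    CLIENT_PINGS_MISSING = "no client pings seen, server pings answered"  # assumption
    NOTHING_HEARD = "no client pings seen, server pings unanswered"       # assumption


def server_view(saw_client_ping: bool, server_ping_answered: bool) -> PathView:
    if saw_client_ping and server_ping_answered:
        return PathView.BOTH_DIRECTIONS_OK
    if saw_client_ping:
        # Client-to-server path works; the pipe towards the client is failing.
        return PathView.TO_CLIENT_FAILING
    if server_ping_answered:
        # For example, client-initiated heartbeats may have been disabled.
        return PathView.CLIENT_PINGS_MISSING
    return PathView.NOTHING_HEARD


def server_keeps_session(view: PathView) -> bool:
    # Only when nothing at all is heard (for missing-hb-allowed heartbeats)
    # would the server treat the session as lost.
    return view is not PathView.NOTHING_HEARD
```

For example, server_view(True, False) gives TO_CLIENT_FAILING, and the session is kept because mitigation requests can still arrive.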

Furthermore, if the DOTS server initiates its CoAP Ping on receipt of the DOTS client's CoAP Ping, then there is a good chance that the NAT sessions are "warm" on any intervening NAT devices.  If the DOTS server initiates the CoAP Ping on its own cycle, there is a chance that it will not get through, which would confuse the logic.

Regards

Jon

> -----Original Message-----
> From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of supjps-mohamed.boucadair@orange.com
> Sent: 03 July 2019 14:46
> To: Mirja Kuehlewind
> Cc: draft-ietf-dots-signal-channel@ietf.org; Konda, Tirumaleswar Reddy;
> dots@ietf.org; frank.xialiang@huawei.com; The IESG; dots-chairs@ietf.org;
> Benjamin Kaduk
> Subject: Re: [Dots] Behavior when keep-alives fail (RE: Mirja Kühlewind's
> Discuss on draft-ietf-dots-signal-channel-31: (with DISCUSS and COMMENT)
> 
> Re-,
> 
> Please see inline.
> 
> Cheers,
> Med
> 
> > -----Original Message-----
> > From: Mirja Kuehlewind [mailto:ietf@kuehlewind.net]
> > Sent: Wednesday, 3 July 2019 14:46
> > To: BOUCADAIR Mohamed TGI/OLN
> > Cc: Konda, Tirumaleswar Reddy; Benjamin Kaduk;
> > draft-ietf-dots-signal-channel@ietf.org; frank.xialiang@huawei.com;
> > dots@ietf.org; The IESG; dots-chairs@ietf.org
> > Subject: Re: Behavior when keep-alives fail (RE: [Dots] Mirja Kühlewind's
> > Discuss on draft-ietf-dots-signal-channel-31: (with DISCUSS and COMMENT)
> >
> > Hi Med,
> >
> > See below.
> >
> > > On 3. Jul 2019, at 12:48, <mohamed.boucadair@orange.com> wrote:
> > >
> > > Mirja,
> > >
> > >> Actually to my understanding this will not work. Both TCP heartbeat and
> > >> CoAP Ping are transmitted reliably. If you don’t receive an ack for these
> > >> transmissions you are not able to send any additional messages and can
> > >> only close the connection.
> > >
> > > This behavior is implemented and tested between two implementations. The
> > > exact procedure is described in the draft, fwiw:
> > >
> > > ==
> > >   When a Confirmable "CoAP Ping" is sent, and if there is no response,
> > >   the "CoAP Ping" is retransmitted max-retransmit number of times by
> > >   the CoAP layer using an initial timeout set to a random duration
> > >   between ack-timeout and (ack-timeout*ack-random-factor) and
> > >   exponential back-off between retransmissions.  By choosing the
> > >   recommended transmission parameters, the "CoAP Ping" will timeout
> > >   after 45 seconds.  If the DOTS agent does not receive any response
> > >   from the peer DOTS agent for 'missing-hb-allowed' number of
> > >   consecutive "CoAP Ping" Confirmable messages, it concludes that the
> > >   DOTS signal channel session is disconnected.  A DOTS client MUST NOT
> > >   transmit a "CoAP Ping" while waiting for the previous "CoAP Ping"
> > >   response from the same DOTS server.
> > > ==
> >
> > First, can you explain why you need 'missing-hb-allowed’?
> 
> [Med] Because we need to make sure this is a "real/durable" session failure,
> not a false positive. For example, a false positive would have implications
> for the server, as it may erroneously start automated mitigations (because
> it concludes the session is lost).
> 
> > If the ping is
> > transmitted reliably, one “missed” should be enough to conclude that the
> > session is disconnected.
> 
> [Med] Hmm, under some DDoS attacks, both endpoints may be
> sending/replying to confirmable ping messages, but the reply may get
> dropped. The session is not disconnected in such case.
> 
> >
> > Yes, as CoAP Ping is used, the agent should not only conclude that the
> > DOTS signal session is disconnected but also the CoAP session and not send
> > any further CoAP messages.
> >
> > If you want to send further UDP datagrams you should send them unreliably
> > and not more often than one per 3 seconds.
> >
> > Mirja
> >
> >
> > >
> > > Cheers,
> > > Med
> > >
> > >> -----Original Message-----
> > >> From: Mirja Kuehlewind [mailto:ietf@kuehlewind.net]
> > >> Sent: Wednesday, 3 July 2019 12:26
> > >> To: BOUCADAIR Mohamed TGI/OLN
> > >> Cc: Konda, Tirumaleswar Reddy; Benjamin Kaduk;
> > >> draft-ietf-dots-signal-channel@ietf.org; frank.xialiang@huawei.com;
> > >> dots@ietf.org; The IESG; dots-chairs@ietf.org
> > >> Subject: Re: Behavior when keep-alives fail (RE: [Dots] Mirja Kühlewind's
> > >> Discuss on draft-ietf-dots-signal-channel-31: (with DISCUSS and COMMENT)
> > >>
> > >> Hi Med,
> > >>
> > >> See below.
> > >>
> > >>> On 3. Jul 2019, at 09:53, mohamed.boucadair@orange.com wrote:
> > >>>
> > >>> Hi Mirja,
> > >>>
> > >>> (Focusing on individual issues)
> > >>>
> > >>> Please see inline.
> > >>>
> > >>> Cheers,
> > >>> Med
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Mirja Kuehlewind [mailto:ietf@kuehlewind.net]
> > >>>> Sent: Tuesday, 2 July 2019 16:00
> > >>>> To: BOUCADAIR Mohamed TGI/OLN
> > >>>> Cc: Konda, Tirumaleswar Reddy; Benjamin Kaduk;
> > >>>> draft-ietf-dots-signal-channel@ietf.org; frank.xialiang@huawei.com;
> > >>>> dots@ietf.org; The IESG; dots-chairs@ietf.org
> > >>>> Subject: Re: [Dots] Mirja Kühlewind's Discuss on
> > >>>> draft-ietf-dots-signal-channel-31: (with DISCUSS and COMMENT)
> > >>>>
> > >>> ...
> > >>>>>>>>> 10) The document should more explicitly provide more guidance about
> > >>>>>>>>> when a client should start a session and what should be done (from the
> > >>>>>>>>> client side) if a session is detected as inactive (other than during
> > >>>>>>>>> migration which is discussed a bit in 4.7). Is the assumption to have
> > >>>>>>>>> basically permanently an active session or connect for migration and
> > >>>>>>>>> configuration requests separately at a time?
> > >>>>>>>>
> > >>>>>>>> I think there was some clarifying text added, but please confirm if you
> > >>>>>>>> think it is sufficient.
> > >>>>>>
> > >>>>>> Sorry, don’t see where text was added. Can you provide a pointer?
> > >>>>>
> > >>>>> [Med] We do have this text, for example:
> > >>>>>
> > >>>>> The DOTS signal channel can be established between two DOTS agents
> > >>>>> prior or during an attack.  The DOTS signal channel is initiated by
> > >>>>> the DOTS client.  The DOTS client can then negotiate, configure, and
> > >>>>> retrieve the DOTS signal channel session behavior with its DOTS peer
> > >>>>> (Section 4.5).  Once the signal channel is established, the DOTS
> > >>>>> agents periodically send heartbeats to keep the channel active
> > >>>>> (Section 4.7).  At any time, the DOTS client may send a mitigation
> > >>>>> request message (Section 4.4) to a DOTS server over the active signal
> > >>>>> channel.  While mitigation is active (because of the higher
> > >>>>> likelihood of packet loss during a DDoS attack), the DOTS server
> > >>>>> periodically sends status messages to the client, including basic
> > >>>>> mitigation feedback details.  Mitigation remains active until the
> > >>>>> DOTS client explicitly terminates mitigation, or the mitigation
> > >>>>> lifetime expires.  Also, the DOTS server may rely on the signal
> > >>>>> channel session loss to trigger mitigation for pre-configured
> > >>>>> mitigation requests (if any).
> > >>>>
> > >>>> Okay thanks for the pointer. What I think is missing are some
> > >>>> sentences about what the client (or server) should do if the keep-alive
> > >>>> fails. Try to reconnect directly or just with the next request or
> > >>>> whatever. Basically who should reconnect and when?
> > >>>
> > >>> [Med] This is discussed in detail in Section 4.7, in particular.
> > >>>
> > >>> As a generic rule, it is always the client who connects (see the
> > >>> excerpt above).
> > >>>
> > >>> The server may use the failure to initiate automated mitigation (see
> > >>> the excerpt above). More details are provided in other sections.
> > >>>
> > >>> There are several heartbeat failure cases for the client to handle.
> > >>> Examples from 4.7 are provided below, fwiw:
> > >>>
> > >>>     The DOTS client MUST NOT consider the DOTS signal channel session
> > >>>     terminated even after a maximum 'missing-hb-allowed' threshold is
> > >>>     reached.  The DOTS client SHOULD keep on using the current DOTS
> > >>>     signal channel session to send heartbeat requests over it, so that
> > >>>     the DOTS server knows the DOTS client has not disconnected the
> > >>>     DOTS signal channel session.
> > >>>
> > >>>     After the maximum 'missing-hb-allowed' threshold is reached, the
> > >>>     DOTS client SHOULD try to resume the (D)TLS session.  The DOTS
> > >>>     client SHOULD send mitigation requests over the current DOTS
> > >>>     signal channel session, and in parallel, for example, try to
> > >>>     resume the (D)TLS session or use 0-RTT mode in DTLS 1.3 to
> > >>>     piggyback the mitigation request in the ClientHello message.
> > >>>
> > >>>     As soon as the link is no longer saturated, if traffic from the
> > >>>     DOTS server reaches the DOTS client over the current DOTS signal
> > >>>     channel session, the DOTS client can stop (D)TLS session
> > >>>     resumption or if (D)TLS session resumption is successful then
> > >>>     disconnect the current DOTS signal channel session.
> > >>>
> > >>> Do you think additional text is needed?
> > >>
> > >> Actually to my understanding this will not work. Both TCP heartbeat and
> > >> CoAP Ping are transmitted reliably. If you don’t receive an ack for these
> > >> transmissions you are not able to send any additional messages and can
> > >> only close the connection.
> > >>
> > >> Mirja
> > >>
> > >>
> > >