Re: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)

<mohamed.boucadair@orange.com> Tue, 23 July 2019 05:32 UTC

From: mohamed.boucadair@orange.com
To: "Konda, Tirumaleswar Reddy" <TirumaleswarReddy_Konda@McAfee.com>, Benjamin Kaduk <kaduk@mit.edu>, Valery Smyslov <valery@smyslov.net>
CC: "dots-chairs@ietf.org" <dots-chairs@ietf.org>, "dots@ietf.org" <dots@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/F_b5UZzNP_RZu9P-cZe_XL58TNE>

Hi Tiru, all,

Please see inline. 

Cheers,
Med

> -----Original Message-----
> From: Konda, Tirumaleswar Reddy [mailto:TirumaleswarReddy_Konda@McAfee.com]
> Sent: Sunday, July 21, 2019 08:52
> To: Benjamin Kaduk; Valery Smyslov
> Cc: dots-chairs@ietf.org; BOUCADAIR Mohamed TGI/OLN; dots@ietf.org
> Subject: RE: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)
> 
> Hi Ben,
> 
> There seem to be several points of confusion regarding the heartbeat
> mechanism. I will try to address all the comments/DISCUSS points from you,
> Mirja, and Valery below:
> 
> [1] https://tools.ietf.org/html/rfc7252 is specific to UDP transport (and
> does not deal with TCP). Please see the first paragraph in
> https://tools.ietf.org/html/rfc7252#section-3. The message transmission
> parameters (max-retransmit, ack-timeout and ack-random-factor) and
> missing-hb-allowed discussed in DOTS signal channel are specific to UDP
> transport.
> 
> [2] CoAP over TCP is discussed in https://tools.ietf.org/html/rfc8323.
> Please see the following differences between CoAP-over-UDP and
> CoAP-over-TCP relevant to our discussion:
> 
> a) CoAP ping/pong defined in RFC7252 (uses Empty confirmable message and
> reset) will not work for CoAP-over-TCP. As per
> https://tools.ietf.org/html/rfc8323#section-3.4, Empty messages (Code
> 0.00) can always be sent and MUST be ignored by the recipient. CoAP-over-
> TCP defines its own CoAP ping/pong for connection health (see
> https://tools.ietf.org/html/rfc8323#section-5.4).
> 
> b) Confirmable and Non-confirmable message types are specific to UDP, and
> are not supported in CoAP-over-TCP.
> 
> [3] For TCP, if no ack is received for a CoAP ping within a specific
> duration, TCP will close the connection, and the DOTS client will have to
> re-establish the TCP connection. missing-hb-allowed is of no use for TCP.
> We are all on the same page for TCP, and the draft can probably be updated
> for better clarity.
> 
> [4] Now coming to UDP, please see my responses below:
> 
> a) As you already know, DOTS signal channel uses heartbeat exchange in
> both directions, and hence CoAP ping is sent by both DOTS client and
> server.
> b) CoAP ping is a confirmable message and is hence subject to exponential
> back-off, with a default MAX_RETRANSMIT value of 4
> (https://tools.ietf.org/html/rfc7252#section-4.8).
> c) CoAP ping is the only confirmable message exchanged during attack (all
> other messages exchanged during an attack are non-confirmable).  The
> specification allows distinct values for message transmission parameters
> and missing-hb-allowed to be used during attack and peace times.
> 
> To handle congestion conditions during an attack, the specification allows
> two options:
> 
> [Option a] By setting MAX_RETRANSMIT to 1, exponential back-off is avoided,
> and missing-hb-allowed is set to a much higher value (e.g., 20) to handle
> congestion (high packet loss). The draft can be updated to explain
> [Option a] in more detail.
> [Option b] The CoAP MAX_RETRANSMIT default value of 4 is not modified, and,
> for example, missing-hb-allowed can be set to 5 (since 4 retransmissions
> are not sufficient to detect that the peer is not alive during congestion).
> 

[Med] We can add this text to illustrate the configuration flexibility: 

   The specification allows for a flexible retry configuration when an
   unreliable transport is in use.  For example, a server may be
   configured to return a low 'missing-hb-allowed' value (e.g., 5) and
   delegate retransmission to the underlying CoAP library by setting
   'max-retransmit' to a high value (e.g., 3).  Alternatively, the
   server may be configured to return 'max-retransmit' set to 1
   together with a higher 'missing-hb-allowed' value (e.g., 15).
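
To make the trade-off between these two example configurations concrete, here is a rough sketch (not part of the draft). It assumes the CoAP defaults from RFC 7252 (ACK_TIMEOUT = 2 s, ACK_RANDOM_FACTOR = 1.5) and the MAX_TRANSMIT_WAIT formula from Section 4.8.2 of that RFC. The function names are made up for illustration, and the idle time between heartbeats ('heartbeat-interval') is deliberately ignored.

```python
# Illustrative sketch only; names are hypothetical, not from the DOTS draft.

# CoAP transmission parameter defaults (RFC 7252, Section 4.8).
ACK_TIMEOUT = 2.0          # seconds
ACK_RANDOM_FACTOR = 1.5


def max_transmit_wait(max_retransmit,
                      ack_timeout=ACK_TIMEOUT,
                      ack_random_factor=ACK_RANDOM_FACTOR):
    """Worst-case seconds before one Confirmable message is given up on.

    This is MAX_TRANSMIT_WAIT as defined in RFC 7252, Section 4.8.2.
    """
    return ack_timeout * (2 ** (max_retransmit + 1) - 1) * ack_random_factor


def heartbeat_budget(max_retransmit, missing_hb_allowed):
    """Return (datagrams sent, worst-case seconds waiting on pings).

    Assumption: every heartbeat goes unanswered, and the session is
    declared disconnected after 'missing_hb_allowed' consecutive
    failures. Time between heartbeats is not counted.
    """
    # Each Confirmable ping is sent once plus up to max_retransmit times.
    datagrams = missing_hb_allowed * (max_retransmit + 1)
    waiting = missing_hb_allowed * max_transmit_wait(max_retransmit)
    return datagrams, waiting


if __name__ == "__main__":
    for max_rtx, missing_hb in [(3, 5), (1, 15)]:
        sent, waited = heartbeat_budget(max_rtx, missing_hb)
        print(f"max-retransmit={max_rtx} missing-hb-allowed={missing_hb}: "
              f"{sent} datagrams, up to {waited:.0f} s waiting")
```

Under these assumptions, the first configuration (3 and 5) sends 20 datagrams with up to 225 seconds spent waiting on unanswered pings, while the second (1 and 15) sends 30 datagrams with up to 135 seconds of waiting: more probes, spaced more evenly, with shorter back-off tails.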


> The Discuss from Mirja is not to rely on the CoAP ping/pong but to define
> it in the DOTS layer itself (please see
> https://mailarchive.ietf.org/arch/msg/dots/V6vv28zDpdY5eR_kaB7L-60bhkk);
> she suggested an alternate design using non-confirmable messages. Our
> assessment is that the alternate design won't work; please see my
> response:
> https://mailarchive.ietf.org/arch/msg/dots/QRMfsmhPTFksN6a_nBBKimVx-lM
> 
> Cheers,
> -Tiru
> 
> > -----Original Message-----
> > From: Dots <dots-bounces@ietf.org> On Behalf Of Benjamin Kaduk
> > Sent: Sunday, July 21, 2019 9:35 AM
> > To: Valery Smyslov <valery@smyslov.net>
> > Cc: dots-chairs@ietf.org; mohamed.boucadair@orange.com; dots@ietf.org
> > Subject: Re: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)
> >
> > Hi Valery,
> >
> > On Fri, Jul 19, 2019 at 02:42:50PM +0300, Valery Smyslov wrote:
> > > Hi Med,
> > >
> > > I believe Mirja's main point was that if you use a liveness check
> > > mechanism in the transport layer, then if it reports that the liveness
> > > check fails, it _also_ closes the transport session.
> > >
> > > Quotes from her emails:
> > > "Yes, as Coap Ping is used, the agent should not only conclude that
> > > the DOTS signal session is disconnected but also the Coap session and
> > > not send any further Coap messages anymore."
> > >
> > > and
> > >
> > > "Actually to my understanding this will not work. Both TCP heartbeat
> > > and Coap Ping are transmitted reliably. If you don’t receive an ack for
> > > these transmissions you are not able to send any additional messages
> > > and can only close the connection."
> > >
> > > I'm not familiar with CoAP, but I suspect she's right about TCP - if
> > > the TCP layer itself doesn't receive an ACK for the sent data after
> > > several retransmissions, the connection is closed.
> >
> > Thanks for this crisp summary (and thanks Med for the detailed writeup
> > as well)!
> >
> > > As far as I understand, the current draft allows the underlying
> > > liveness check to fail and has a parameter to restart this check
> > > several times if this happens. It seems that a new transport session
> > > will be created in this case (at least if TCP is used). In my reading
> > > of the draft, this does not seem to be assumed; it is assumed that the
> > > session remains the same. So, I think that Mirja's main concern is
> > > that it won't work (at least with TCP).
> >
> > My sense is similar; if I could attempt to summarize Mirja's stance,
> > it's that we're invoking a transport-level feature that does its own
> > retransmit and backoff, but then if the transport comes back and says
> > "the peer is gone", we say "but we're under attack, so I don't believe
> > you; try again". This kicks off another independent set of
> > "retransmits" (I know it's not technically the right word) with a fresh
> > exponential backoff.  There are two complaints about this: (1) we're
> > changing the transport, since if the transport concludes the peer is
> > gone then the transport "normally" tears down the connection (*)
> > entirely, and (2) the assembly of (exponential backoff 1), (exponential
> > backoff 2), (exponential backoff 2) is strange pacing, and might be
> > better served by a similar number of "retransmits" but with different
> > pacing, since the long delay at the end of each backoff period is not
> > expected to add a huge amount of value in terms of letting congestion
> > ease during attack time, and we would be just as well served by capping
> > the delay between retransmits and having more retransmits.
> >
> > The asterisk on (1) is of course because, as is noted later in the
> > thread, only TCP tears down the association when it concludes the peer
> > is gone (assuming I'm reading the right parts of 7252).  Quoting 7252:
> >
> >                                                         If the
> >    retransmission counter reaches MAX_RETRANSMIT on a timeout, or if the
> >    endpoint receives a Reset message, then the attempt to transmit the
> >    message is canceled and the application process informed of failure.
> >    On the other hand, if the endpoint receives an acknowledgement in
> >    time, transmission is considered successful.
> >
> > So all CoAP does is to tell the application "that request didn't work",
> > but CoAP is happy to try additional requests on the connection; the
> > teardown logic is indeed left up to the application.
> >
> > I'm not sure that we've seen much discussion about (2), though (sorry if
> > I missed it) -- why is the repeated backoff-and-restart the right pacing
> > for this purpose?
> >
> > -Ben
> >
> > > I didn't participate in the WG discussion on this, so I don't know
> > > what was discussed regarding this issue. If it was discussed and the
> > > WG has come to the conclusion that this is not an issue, then I believe
> > > more text should be added to the draft so that people like Mirja, who
> > > didn't participate in the discussion, don't have any concerns while
> > > reading the draft.
> > >
> > > Regards,
> > > Valery.
> > >
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: mohamed.boucadair@orange.com
> > > > <mohamed.boucadair@orange.com>
> > > > Sent: Friday, July 19, 2019 9:57 AM
> > > > To: Benjamin Kaduk (kaduk@mit.edu) <kaduk@mit.edu>;
> > > > dots-chairs@ietf.org; dots@ietf.org
> > > > Subject: Mirja's DISCUSS: Pending Point (AD Help Needed)
> > > >
> > > > Hi Ben, chairs, all,
> > > >
> > > > (restricting the discussion to the AD/chairs/WG)
> > > >
> > > > * Status:
> > > >
> > > > All DISCUSS points from Mirja's review were fixed, except the one
> > > > discussed in this message.
> > > >
> > > > * Pending Point:
> > > >
> > > > Rather than going into much detail, I consider the following to be
> > > > the summary of the remaining DISCUSS point from Mirja:
> > > >
> > > > > I believe there are flaws in the design. First it’s a layer
> > > > > violation, but if more an idealistic concern but usually designing
> > > > > in layers is a good approach. But more importantly, you end up
> > > > > with un-frequent messages which may still terminate the connection
> > > > > at some point, while what you want is to simply send messages
> > > > > frequently in an unreliable fashion but a low rate until the
> > > > > attack is over.
> > > >
> > > > * Discussion:
> > > >
> > > > (1) First of all, let's recall that RFC7252 does not define how CoAP
> > > > ping must be used. It only says:
> > > >
> > > > ==
> > > >       Provoking a Reset
> > > >       message (e.g., by sending an Empty Confirmable message) is also
> > > >       useful as an inexpensive check of the liveness of an endpoint
> > > >       ("CoAP ping").
> > > > ==
> > > >
> > > > How the liveness is assessed is left to applications. So, there is
> > > > ** no layer violation **.
> > > >
> > > > (2) What we need isn't (text from Mirja):
> > > >
> > > > > to simply send messages frequently in an unreliable fashion but a
> > > > > low rate until the attack is over
> > > >
> > > > It is actually the other way around. The spec says:
> > > >
> > > >   "... This is particularly useful for DOTS
> > > >    servers that might want to reduce heartbeat frequency or cease
> > > >    heartbeat exchanges when an active DOTS client has not requested
> > > >    mitigation."
> > > >
> > > > What we want can be formalized as:
> > > >  - Taking into account DDoS traffic conditions, a check to assess
> > > > the liveness of the peer DOTS agent + maintain NAT/FW state on
> > > > on-path devices.
> > > >
> > > > A much more elaborate version is documented in SIG-004 of RFC 8612.
> > > >
> > > > * My analysis:
> > > >
> > > > - The intended functionality is naturally provided by existing CoAP
> > > > messages.
> > > > - Informed WG decision: The WG spent a lot of cycles when specifying
> > > > the current behavior to meet the requirements set in RFC8612.
> > > > - Why not an alternative design: We can always define messages with
> > > > duplicated functionality, but that is not a good design approach
> > > > especially when there is no evident benefit.
> > > > - The specification is not broken: it was implemented and tested.
> > > >
> > > > And a logistical comment: this issue fits IMHO under the non-discuss
> > > > criteria in
> > > > https://www.ietf.org/blog/discuss-criteria-iesg-review/#stand-undisc.
> > > >
> > > > * What's Next?
> > > >
> > > > As an editor, I don't think a change is needed but I'd like to hear
> > > > from Ben, chairs, and the WG.
> > > >
> > > > Please share your thoughts and whether you agree/disagree with the
> > > > above analysis.
> > > >
> > > > Cheers,
> > > > Med
> > >
> >