Re: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)

<mohamed.boucadair@orange.com> Tue, 23 July 2019 13:56 UTC

Return-Path: <mohamed.boucadair@orange.com>
X-Original-To: dots@ietfa.amsl.com
Delivered-To: dots@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 85884120119; Tue, 23 Jul 2019 06:56:56 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.598
X-Spam-Level:
X-Spam-Status: No, score=-2.598 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, UNPARSEABLE_RELAY=0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id vmj8b9yxkb8i; Tue, 23 Jul 2019 06:56:52 -0700 (PDT)
Received: from relais-inet.orange.com (relais-inet.orange.com [80.12.70.34]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 85A8A120116; Tue, 23 Jul 2019 06:56:52 -0700 (PDT)
Received: from opfednr06.francetelecom.fr (unknown [xx.xx.xx.70]) by opfednr26.francetelecom.fr (ESMTP service) with ESMTP id 45tKlk55qtzydh; Tue, 23 Jul 2019 15:56:50 +0200 (CEST)
Received: from Exchangemail-eme6.itn.ftgroup (unknown [xx.xx.13.101]) by opfednr06.francetelecom.fr (ESMTP service) with ESMTP id 45tKlk3N5dzDq75; Tue, 23 Jul 2019 15:56:50 +0200 (CEST)
Received: from OPEXCAUBMA2.corporate.adroot.infra.ftgroup ([fe80::e878:bd0:c89e:5b42]) by OPEXCAUBM6F.corporate.adroot.infra.ftgroup ([::1]) with mapi id 14.03.0439.000; Tue, 23 Jul 2019 15:56:50 +0200
From: mohamed.boucadair@orange.com
To: "Konda, Tirumaleswar Reddy" <TirumaleswarReddy_Konda@McAfee.com>, Benjamin Kaduk <kaduk@mit.edu>, Valery Smyslov <valery@smyslov.net>
CC: "dots-chairs@ietf.org" <dots-chairs@ietf.org>, "dots@ietf.org" <dots@ietf.org>
Thread-Topic: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)
Thread-Index: AdU9/zWZw7DhsbF6RNCA2PYrEJMCCAAJ+IgAAFSa1gAAAzHm8ABkbgsAAAIWUGAADlD0QA==
Date: Tue, 23 Jul 2019 13:56:49 +0000
Message-ID: <787AE7BB302AE849A7480A190F8B9330312E604F@OPEXCAUBMA2.corporate.adroot.infra.ftgroup>
References: <787AE7BB302AE849A7480A190F8B93302FA841A9@OPEXCAUBMA2.corporate.adroot.infra.ftgroup> <00c201d53e27$194cfc20$4be6f460$@smyslov.net> <20190721040520.GS23137@kduck.mit.edu> <DM5PR16MB1705B068DCF6AB20658EF826EAC50@DM5PR16MB1705.namprd16.prod.outlook.com> <787AE7BB302AE849A7480A190F8B9330312E57CA@OPEXCAUBMA2.corporate.adroot.infra.ftgroup> <MWHPR16MB17119026CED493164A85FDDBEAC70@MWHPR16MB1711.namprd16.prod.outlook.com>
In-Reply-To: <MWHPR16MB17119026CED493164A85FDDBEAC70@MWHPR16MB1711.namprd16.prod.outlook.com>
Accept-Language: fr-FR, en-US
Content-Language: fr-FR
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.114.13.247]
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
MIME-Version: 1.0
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/xsXuU_VcLpwyiZatsYId2TmbuqA>
Subject: Re: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)
X-BeenThere: dots@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "List for discussion of DDoS Open Threat Signaling \(DOTS\) technology and directions." <dots.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/dots>, <mailto:dots-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/dots/>
List-Post: <mailto:dots@ietf.org>
List-Help: <mailto:dots-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/dots>, <mailto:dots-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 23 Jul 2019 13:56:57 -0000

Re-,

Thank you, Tiru,

All: FWIW, we prepared a set of slides to expose the pending DISCUSS point from Mirja. The slides are available at: 
https://datatracker.ietf.org/meeting/105/materials/slides-105-dots-heartbeat-mechanism-mirjas-discuss-on-the-signal-channel-i-d-00 

Cheers,
Med

> -----Message d'origine-----
> De : Konda, Tirumaleswar Reddy [mailto:TirumaleswarReddy_Konda@McAfee.com]
> Envoyé : mardi 23 juillet 2019 08:41
> À : BOUCADAIR Mohamed TGI/OLN; Benjamin Kaduk; Valery Smyslov
> Cc : dots-chairs@ietf.org; dots@ietf.org
> Objet : RE: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)
> 
> > -----Original Message-----
> > From: mohamed.boucadair@orange.com
> > <mohamed.boucadair@orange.com>
> > Sent: Tuesday, July 23, 2019 11:02 AM
> > To: Konda, Tirumaleswar Reddy
> > <TirumaleswarReddy_Konda@McAfee.com>; Benjamin Kaduk
> > <kaduk@mit.edu>; Valery Smyslov <valery@smyslov.net>
> > Cc: dots-chairs@ietf.org; dots@ietf.org
> > Subject: RE: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)
> >
> >
> >
> > Hi Tiru, all,
> >
> > Please see inline.
> >
> > Cheers,
> > Med
> >
> > > -----Message d'origine-----
> > > De : Konda, Tirumaleswar Reddy
> > > [mailto:TirumaleswarReddy_Konda@McAfee.com]
> > > Envoyé : dimanche 21 juillet 2019 08:52 À : Benjamin Kaduk; Valery
> > > Smyslov Cc : dots-chairs@ietf.org; BOUCADAIR Mohamed TGI/OLN;
> > > dots@ietf.org Objet : RE: [Dots] Mirja's DISCUSS: Pending Point (AD
> > > Help Needed)
> > >
> > > Hi Ben,
> > >
> > > There seems to several confusions regarding the heartbeat mechanism, I
> > > will try to address all the comments/Discuss from you, Mirja and
> > > Valery
> > > below:
> > >
> > > [1] https://tools.ietf.org/html/rfc7252 is specific to UDP transport
> > > (and does not deal with TCP). Please see the first paragraph in
> > > https://tools.ietf.org/html/rfc7252#section-3. The message
> > > transmission parameters (max-retransmit, ack-timeout and
> > > ack-random-factor) and missing-hb-allowed discussed in DOTS signal
> > > channel are specific to UDP transport.
> > >
> > > [2] CoAP over TCP is discussed in https://tools.ietf.org/html/rfc8323.
> > > Please see the following differences b/w CoAP-over UDP and
> > > CoAP-over-TCP relevant to our discussion:
> > >
> > > a) CoAP ping/pong defined in RFC7252 (uses Empty confirmable message
> > > and
> > > reset) will not work for CoAP-over-TCP. As per
> > > https://tools.ietf.org/html/rfc8323#section-3.4, Empty messages (Code
> > > 0.00) can always be sent and MUST be ignored by the recipient.
> > > CoAP-over- TCP defines its own CoAP ping/pong for connection health
> > > (see https://tools.ietf.org/html/rfc8323#section-5.4).
> > >
> > > b)Confirmable  and Non-confirmable message types are specific to UDP,
> > > and are not supported in CoAP-over-TCP.
> > >
> > > [3] For TCP, if no ack is received for CoAP ping for specific
> > > duration, TCP will close the connection, and the DOTS client will have
> > > to re- establish the TCP connection. missing-hb-allowed is of no use
> > > for TCP. We are all in the same page for TCP, and the draft can
> probably
> > >       be updated for better clarity.
> > >
> > > [4] Now coming to UDP, please see my responses below:
> > >
> > > a) As you already know, DOTS signal channel uses heartbeat exchange in
> > > both directions, and hence CoAP ping is sent by both DOTS client and
> > > server.
> > > b) CoAP ping is a confirmable message and hence the exponential
> > > back-off with the default value of MAX_RETRANSMIT is 4
> > > (https://tools.ietf.org/html/rfc7252#section-4.8).
> > > c) CoAP ping is the only confirmable message exchanged during attack
> > > (all other messages exchanged during an attack are non-confirmable).
> > > The specification allows distinct values for message transmission
> > > parameters and missing-hb-allowed to be used during attack and peace
> > times.
> > >
> > > To handle congestion conditions during an attack, the specification
> > > allows two options:
> > >
> > > [Option a] By setting MAX_RETRANSMIT to 1, exponential-back off is
> > > avoided and missing-hb-allowed set to a very higher value (e.g. 20) to
> > > handle congestion (high packet loss). The draft can be updated to
> > > explain [Option a] in more detail.
> > > [Option b] The CoAP MAX_RETRANSMIT default value of 4 is not modified,
> > > and for example, missing-hb-allowed can be set to 5 (since 4 transmits
> > > are not sufficient to detect the peer is not alive during congestion).
> > >
> >
> > [Med] We can add this text to illustrate the configuration flexibility:
> >
> >    The specification allows for a flexible retry configuration when an
> >    unreliable transport is in use.  For example, a server may be tweaked
> >    to return a lower 'missing-hb-allowed' (e.g., 5) value but delegate
> >    the retransmission to the underlying CoAP library by setting 'max-
> >    retransmit' to a high value (e.g., 3).  The server may also be
> >    configured to return a 'max-retransmit' set to '1' together with a
> >    higher 'missing-hb-allowed' value (e.g., 15).
> 
> Looks good, Both these techniques are used by protocols today, I see DTLS
> heartbeat uses retransmit and exponential back-off (see
> https://tools.ietf.org/html/rfc6347#section-4.2.4.1) for liveness check
> and in STUN usage for consent freshness
> (https://tools.ietf.org/html/rfc7675) STUN binding requests are sent
> periodically.
> 
> Cheers,
> -Tiru
> 
> >
> >
> > > The Discuss from Mirja is not to rely on the CoAP ping/pong but to
> > > define it in the DOTS layer itself (please see
> > > https://mailarchive.ietf.org/arch/msg/dots/V6vv28zDpdY5eR_kaB7L-
> > 60bhkk
> > > ) and suggested to go with an alternate design using non-confirmable
> > > messages. The alternate design won't work is our assessment, please
> > > see my response
> > >
> > https://mailarchive.ietf.org/arch/msg/dots/QRMfsmhPTFksN6a_nBBKimVx-
> > lM
> > >
> > > Cheers,
> > > -Tiru
> > >
> > > > -----Original Message-----
> > > > From: Dots <dots-bounces@ietf.org> On Behalf Of Benjamin Kaduk
> > > > Sent: Sunday, July 21, 2019 9:35 AM
> > > > To: Valery Smyslov <valery@smyslov.net>
> > > > Cc: dots-chairs@ietf.org; mohamed.boucadair@orange.com;
> > > > dots@ietf.org
> > > > Subject: Re: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)
> > > >
> > > > This email originated from outside of the organization. Do not click
> > > links or
> > > > open attachments unless you recognize the sender and know the
> > > > content is safe.
> > > >
> > > > Hi Valery,
> > > >
> > > > On Fri, Jul 19, 2019 at 02:42:50PM +0300, Valery Smyslov wrote:
> > > > > Hi Med,
> > > > >
> > > > > I believe Mirja's main point was that if you use liveness check
> > > > > mechanism in the transport layer, then if it reports that liveness
> > > check fails,
> > > > then it _also_ closes the transport session.
> > > > >
> > > > > Quotes from her emails:
> > > > > "Yes, as Coap Ping is used, the agent should not only conclude
> > > > > that
> > > the
> > > > DOTS signal session is disconnected but also the Coap session and
> > > > not
> > > send
> > > > any further Coap messages anymore."
> > > > >
> > > > > and
> > > > >
> > > > > "Actually to my understanding this will not work. Both TCP
> > > > > heartbeat
> > > and
> > > > Coap Ping are transmitted reliably. If you don’t receive an ack for
> > > these
> > > > transmissions you are not able to send any additional messages and
> > > > can
> > > only
> > > > close the connection."
> > > > >
> > > > > I'm not familiar with CoAP, but I suspect she's right about TCP -
> > > > > if TCP layer itself doesn't receive ACK for the sent data after
> > > > > several
> > > > retransmissions, the connection is closed.
> > > >
> > > > Thanks for this crisp summary (and thanks Med for the detailed
> > > > writeup
> > > as
> > > > well)!
> > > >
> > > > > As far as I understand the current draft allows underlying
> > > > > liveness check to fail and has a parameter to restart this check
> > > > > several times if this happens. It seems that a new transport
> > > > > session will be created in this case (at least if TCP is used). In
> > > > > my reading of the draft this seems not been assumed, it is assumed
> > > > > that the session remains
> > > the
> > > > same. So, I think that main Mirja's concern is that it won't work
> > > > (at
> > > least with
> > > > TCP).
> > > >
> > > > My sense is similar; if I could attempt to summarize Mirja's stance,
> > > it's that
> > > > we're invoking a transport-level feature that does its own
> > > > retransmit
> > > and
> > > > backoff, but then if the transport comes back and says "the peer is
> > > gone", we
> > > > say "but we're under attack, so I don't believe you; try again".
> > > > This kicks of another independent set of "retransmits" (I know it's
> > > > not technically the right word) with a fresh exponential backoff.
> > > > There's
> > > two
> > > > complaints about this: (1) we're changing the transport, since if
> > > > the
> > > transport
> > > > concludes the peer is gone then the transport "normally" tears down
> > > > the connection (*) entirely, and (2) the assembly of (exponential
> > > > backoff
> > > 1),
> > > > (exponential backoff 2), (exponential backoff 2) is strange pacing,
> > > > and
> > > might
> > > > be better served by a similar number of "retransmits" but with
> > > > different pacing, since the long delay at the end of each backoff
> > > > period is not
> > > expected
> > > > to add a huge amount of value in terms of letting congestion ease
> > > > during attack time, and we would be just as well served by capping
> > > > the delay between retransmits and having more retransmits.
> > > >
> > > > The asterisk on (1) is of course because, as is noted later in the
> > > thread, only
> > > > TCP tears down the association when it concludes the peer is gone
> > > (assuming
> > > > I'm reading the right parts of 7252).  Quoting 7252:
> > > >
> > > >                                                         If the
> > > >    retransmission counter reaches MAX_RETRANSMIT on a timeout, or if
> > the
> > > >    endpoint receives a Reset message, then the attempt to transmit
> the
> > > >    message is canceled and the application process informed of
> failure.
> > > >    On the other hand, if the endpoint receives an acknowledgement in
> > > >    time, transmission is considered successful.
> > > >
> > > > So all CoAP does is to tell the application "that request didn't
> > > > work",
> > > but CoAP
> > > > is happy to try additional requests on the connection; the teardown
> > > logic is
> > > > indeed left up to the application.
> > > >
> > > > I'm not sure that we've seen much discussion about (2), though
> > > > (sorry if
> > > I
> > > > missed it) -- why is the repeated backoff-and-restart the right
> > > > pacing
> > > for this
> > > > purpose?
> > > >
> > > > -Ben
> > > >
> > > > > I didn't participate in the WG discussion on this, so I don't know
> > > > > what was discussed regarding this issue. If it was discussed and
> > > > > the WG has come to conclusion that this is not an issue, then I
> > > > > believe more text should be added to the draft so, that people
> > > > > like Mirja, who
> > > > didn't participate in the discussion, don't have any concerns while
> > > reading the
> > > > draft.
> > > > >
> > > > > Regards,
> > > > > Valery.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: mohamed.boucadair@orange.com
> > > > > > <mohamed.boucadair@orange.com>
> > > > > > Sent: Friday, July 19, 2019 9:57 AM
> > > > > > To: Benjamin Kaduk (kaduk@mit.edu) <kaduk@mit.edu>; dots-
> > > > > > chairs@ietf.org; dots@ietf.org
> > > > > > Subject: Mirja's DISCUSS: Pending Point (AD Help Needed)
> > > > > >
> > > > > > Hi Ben, chairs, all,
> > > > > >
> > > > > > (restricting the discussion to the AD/chairs/WG)
> > > > > >
> > > > > > * Status:
> > > > > >
> > > > > > All DISCUSS points from Mirja's review were fixed, except the
> > > > > > one discussed in this message.
> > > > > >
> > > > > > * Pending Point:
> > > > > >
> > > > > > Rather than going into much details, I consider the following as
> > > > > > the summary of the remaining DISCUSS point from Mirja:
> > > > > >
> > > > > > > I believe there are flaws in the design. First it’s a layer
> > > > > > > violation, but if more an idealistic concern but usually
> > > > > > > designing in layers is a good approach. But more importantly,
> > > > > > > you end up with un-frequent messages which may still terminate
> > > > > > > the connection at some point, while what you want is to simply
> > > > > > > send messages frequently in an unreliable fashion but a low
> > > > > > > rate until the
> > > attack is over.
> > > > > >
> > > > > > * Discussion:
> > > > > >
> > > > > > (1) First of all, let's remind that RFC7252 does not define how
> > > > > > CoAP ping must be used. It does only say:
> > > > > >
> > > > > > ==
> > > > > >       Provoking a Reset
> > > > > >       message (e.g., by sending an Empty Confirmable message) is
> > > also
> > > > > >       useful as an inexpensive check of the liveness of an
> endpoint
> > > > > >       ("CoAP ping").
> > > > > > ==
> > > > > >
> > > > > > How the liveness is assessed is left to applications. So, there
> > > > > > is
> > > > > > ** no layer violation **.
> > > > > >
> > > > > > (2) What we need isn't (text from Mirja):
> > > > > >
> > > > > > > to simply send messages frequently in an unreliable fashion
> > > > > > > but a low rate until the attack is over "
> > > > > >
> > > > > > It is actually the other way around. The spec says:
> > > > > >
> > > > > >   "... This is particularly useful for DOTS
> > > > > >    servers that might want to reduce heartbeat frequency or
> cease
> > > > > >    heartbeat exchanges when an active DOTS client has not
> requested
> > > > > >    mitigation."
> > > > > >
> > > > > > What we want can be formalized as:
> > > > > >  - Taking into account DDoS traffic conditions, a check to
> > > > > > assess the liveness of the peer DOTS agent + maintain NAT/FW
> > > > > > state on on-
> > > path
> > > > devices.
> > > > > >
> > > > > > An much more elaborated version is documented in SIG-004 of RFC
> > > 8612.
> > > > > >
> > > > > > * My analysis:
> > > > > >
> > > > > > - The intended functionality is naturally provided by existing
> > > > > > CoAP
> > > > messages.
> > > > > > - Informed WG decision: The WG spent a lot of cycles when
> > > > > > specifying the current behavior to be meet the requirements set
> in
> > RFC8612.
> > > > > > - Why not an alternative design: We can always define messages
> > > > > > with duplicated functionality, but that is not a good design
> > > > > > approach especially when there is no evident benefit.
> > > > > > - The specification is not broken: it was implemented and
> tested.
> > > > > >
> > > > > > And a logistic comment: this issue fits IMHO under the
> > > > > > non-discuss criteria in
> > > > > > https://www.ietf.org/blog/discuss-criteria-iesg-
> > > review/#stand-
> > > > undisc.
> > > > > >
> > > > > > * What's Next?
> > > > > >
> > > > > > As an editor, I don't think a change is needed but I'd like to
> > > > > > hear from Ben, chairs, and the WG.
> > > > > >
> > > > > > Please share your thoughts and whether you agree/disagree with
> > > > > > the above analysis.
> > > > > >
> > > > > > Cheers,
> > > > > > Med
> > > > >
> > > >
> > > > _______________________________________________
> > > > Dots mailing list
> > > > Dots@ietf.org
> > > > https://www.ietf.org/mailman/listinfo/dots