Re: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)

Benjamin Kaduk <kaduk@mit.edu> Sun, 21 July 2019 04:05 UTC

Date: Sat, 20 Jul 2019 23:05:20 -0500
From: Benjamin Kaduk <kaduk@mit.edu>
To: Valery Smyslov <valery@smyslov.net>
Cc: mohamed.boucadair@orange.com, dots-chairs@ietf.org, dots@ietf.org
Message-ID: <20190721040520.GS23137@kduck.mit.edu>
References: <787AE7BB302AE849A7480A190F8B93302FA841A9@OPEXCAUBMA2.corporate.adroot.infra.ftgroup> <00c201d53e27$194cfc20$4be6f460$@smyslov.net>
In-Reply-To: <00c201d53e27$194cfc20$4be6f460$@smyslov.net>
User-Agent: Mutt/1.10.1 (2018-07-13)
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/mv9mO1yJj7UxMSeKAdwFRs5qvmk>
Subject: Re: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)

Hi Valery,

On Fri, Jul 19, 2019 at 02:42:50PM +0300, Valery Smyslov wrote:
> Hi Med,
> 
> I believe Mirja's main point was that if you use liveness check mechanism in the transport layer,
> then if it reports that liveness check fails, then it _also_ closes the transport session.
> 
> Quotes from her emails:
> "Yes, as Coap Ping is used, the agent should not only conclude that the DOTS signal session is disconnected but also the Coap session and not send any further Coap messages anymore."
> 
> and
> 
> "Actually to my understanding this will not work. Both TCP heartbeat and Coap Ping are transmitted reliably. If you don’t receive an ack for these transmissions you are not able to send any additional messages and can only close the connection."
> 
> I'm not familiar with CoAP, but I suspect she's right about TCP - if TCP layer itself doesn't receive ACK
> for the sent data after several retransmissions, the connection is closed.

Thanks for this crisp summary (and thanks Med for the detailed writeup as
well)!

> As far as I understand the current draft allows underlying liveness check to fail and has a parameter
> to restart this check several times if this happens. It seems that a new transport session will be 
> created in this case (at least if TCP is used). In my reading of the draft this does not seem to have been assumed;
> it is assumed that the session remains the same. So, I think that Mirja's main concern is that it won't
> work (at least with TCP).

My sense is similar; if I could attempt to summarize Mirja's stance, it's
that we're invoking a transport-level feature that does its own retransmit
and backoff, but then if the transport comes back and says "the peer is
gone", we say "but we're under attack, so I don't believe you; try again".
This kicks off another independent set of "retransmits" (I know it's not
technically the right word) with a fresh exponential backoff.  There are two
complaints about this: (1) we're changing the transport, since if the
transport concludes the peer is gone then the transport "normally" tears down
the connection (*) entirely, and (2) the assembly of (exponential backoff
1), (exponential backoff 2), (exponential backoff 3) is strange pacing, and
might be better served by a similar number of "retransmits" but with
different pacing, since the long delay at the end of each backoff period is
not expected to add a huge amount of value in terms of letting congestion
ease during attack time, and we would be just as well served by capping the
delay between retransmits and having more retransmits.
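
To make (2) concrete, here is a rough sketch comparing the two pacings.
It assumes the RFC 7252 defaults (ACK_TIMEOUT = 2 s, MAX_RETRANSMIT = 4),
ignores ACK_RANDOM_FACTOR, and uses three rounds and an 8 s cap purely as
illustrative values; none of these numbers come from the draft, and the
function names are made up for this example.

```python
def backoff_delays(initial, retransmits):
    """Wait after each transmission in one exponential backoff round."""
    return [initial * 2 ** i for i in range(retransmits + 1)]


def repeated_backoff(initial, retransmits, rounds):
    """Several independent backoff rounds run back to back."""
    return backoff_delays(initial, retransmits) * rounds


def capped_backoff(initial, cap, total_time):
    """Doubling delays that never exceed `cap`, within the same time budget."""
    delays, delay = [], initial
    while sum(delays) + min(delay, cap) <= total_time:
        delays.append(min(delay, cap))
        delay *= 2
    return delays


ACK_TIMEOUT = 2.0    # seconds, RFC 7252 default
MAX_RETRANSMIT = 4   # RFC 7252 default
ROUNDS = 3           # assumed number of restarts, for illustration only
CAP = 8.0            # assumed cap on the delay, for illustration only

repeated = repeated_backoff(ACK_TIMEOUT, MAX_RETRANSMIT, ROUNDS)
capped = capped_backoff(ACK_TIMEOUT, CAP, sum(repeated))

print("repeated:", len(repeated), "sends, longest gap", max(repeated))
print("capped:  ", len(capped), "sends, longest gap", max(capped))
```

Under these assumptions the repeated scheme makes 15 transmissions in
186 seconds with gaps of up to 32 seconds, while the capped scheme fits
24 transmissions into the same window with no gap longer than 8 seconds.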

The asterisk on (1) is of course because, as is noted later in the thread,
only TCP tears down the association when it concludes the peer is gone
(assuming I'm reading the right parts of 7252).  Quoting 7252:

                                                        If the
   retransmission counter reaches MAX_RETRANSMIT on a timeout, or if the
   endpoint receives a Reset message, then the attempt to transmit the
   message is canceled and the application process informed of failure.
   On the other hand, if the endpoint receives an acknowledgement in
   time, transmission is considered successful.

So all CoAP does is to tell the application "that request didn't work", but
CoAP is happy to try additional requests on the connection; the teardown
logic is indeed left up to the application.
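
As a toy model of that division of labor (not the draft's actual state
machine), the sketch below has a simulated CoAP layer that only reports
"that request didn't work", and an application that counts failed
heartbeats and decides for itself when the session is gone.  All names
are hypothetical; `missing_allowed` stands in for the draft parameter
Valery mentions, and the peer is simulated by a callback rather than a
real CoAP library.

```python
MAX_RETRANSMIT = 4  # RFC 7252 default


class TransmitFailed(Exception):
    """The transport gave up on one Confirmable message."""


def send_confirmable(peer_answers):
    """Simulate one Confirmable exchange.

    `peer_answers(attempt)` returns True if that attempt is acknowledged.
    Returns the attempt number that succeeded, or raises TransmitFailed
    once the retransmission counter is exhausted.
    """
    for attempt in range(MAX_RETRANSMIT + 1):
        if peer_answers(attempt):
            return attempt
    raise TransmitFailed("no acknowledgement after MAX_RETRANSMIT")


class HeartbeatMonitor:
    """Application-level liveness logic layered on top of the transport."""

    def __init__(self, missing_allowed):
        self.missing_allowed = missing_allowed
        self.missed = 0
        self.session_up = True

    def heartbeat(self, peer_answers):
        try:
            send_confirmable(peer_answers)
            self.missed = 0           # any success resets the count
        except TransmitFailed:
            self.missed += 1          # the transport only reported failure
            if self.missed >= self.missing_allowed:
                self.session_up = False   # teardown is the app's decision
        return self.session_up


monitor = HeartbeatMonitor(missing_allowed=3)
silent = lambda attempt: False
for _ in range(3):
    print("session up:", monitor.heartbeat(silent))
```

With a TCP-style transport the first failure would already have closed
the connection, which is the difference the asterisk is about.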

I'm not sure that we've seen much discussion about (2), though (sorry if I
missed it) -- why is the repeated backoff-and-restart the right pacing for
this purpose?

-Ben

> I didn't participate in the WG discussion on this, so I don't know what was discussed
> regarding this issue. If it was discussed and the WG has come to conclusion that this is 
> not an issue, then I believe more text should be added to the draft so, that people
> like Mirja, who didn't participate in the discussion, don't have any concerns while reading the draft.
> 
> Regards,
> Valery.
> 
> 
> 
> 
> > -----Original Message-----
> > From: mohamed.boucadair@orange.com
> > <mohamed.boucadair@orange.com>
> > Sent: Friday, July 19, 2019 9:57 AM
> > To: Benjamin Kaduk (kaduk@mit.edu) <kaduk@mit.edu>; dots-
> > chairs@ietf.org; dots@ietf.org
> > Subject: Mirja's DISCUSS: Pending Point (AD Help Needed)
> > 
> > Hi Ben, chairs, all,
> > 
> > (restricting the discussion to the AD/chairs/WG)
> > 
> > * Status:
> > 
> > All DISCUSS points from Mirja's review were fixed, except the one discussed in
> > this message.
> > 
> > * Pending Point:
> > 
> > Rather than going into much detail, I consider the following as the summary
> > of the remaining DISCUSS point from Mirja:
> > 
> > > I believe there are flaws in the design. First it’s a layer violation,
> > > but that's more of an idealistic concern, though usually designing in layers is a
> > > good approach. But more importantly, you end up with infrequent
> > > messages which may still terminate the connection at some point, while
> > > what you want is to simply send messages frequently in an unreliable
> > > fashion but a low rate until the attack is over.
> > 
> > * Discussion:
> > 
> > (1) First of all, let's recall that RFC7252 does not define how CoAP ping must
> > be used. It only says:
> > 
> > ==
> >       Provoking a Reset
> >       message (e.g., by sending an Empty Confirmable message) is also
> >       useful as an inexpensive check of the liveness of an endpoint
> >       ("CoAP ping").
> > ==
> > 
> > How the liveness is assessed is left to applications. So, there is ** no layer
> > violation **.
> > 
> > (2) What we need isn't (text from Mirja):
> > 
> > > to simply send messages frequently in an unreliable fashion but a low
> > > rate until the attack is over
> > 
> > It is actually the other way around. The spec says:
> > 
> >   "... This is particularly useful for DOTS
> >    servers that might want to reduce heartbeat frequency or cease
> >    heartbeat exchanges when an active DOTS client has not requested
> >    mitigation."
> > 
> > What we want can be formalized as:
> >  - Taking into account DDoS traffic conditions, a check to assess the liveness of
> > the peer DOTS agent + maintain NAT/FW state on on-path devices.
> > 
> > A much more elaborate version is documented in SIG-004 of RFC 8612.
> > 
> > * My analysis:
> > 
> > - The intended functionality is naturally provided by existing CoAP messages.
> > - Informed WG decision: The WG spent a lot of cycles when specifying the
> > current behavior to meet the requirements set in RFC8612.
> > - Why not an alternative design: We can always define messages with
> > duplicated functionality, but that is not a good design approach especially
> > when there is no evident benefit.
> > - The specification is not broken: it was implemented and tested.
> > 
> > And a logistical comment: this issue fits IMHO under the non-discuss criteria in
> > https://www.ietf.org/blog/discuss-criteria-iesg-review/#stand-undisc.
> > 
> > * What's Next?
> > 
> > As an editor, I don't think a change is needed but I'd like to hear from Ben,
> > chairs, and the WG.
> > 
> > Please share your thoughts and whether you agree/disagree with the above
> > analysis.
> > 
> > Cheers,
> > Med
>