Re: [btns] Q: How to deal with connection latch breaks?

Nicolas Williams <Nicolas.Williams@sun.com> Fri, 07 August 2009 06:25 UTC

Date: Fri, 07 Aug 2009 01:15:12 -0500
From: Nicolas Williams <Nicolas.Williams@sun.com>
To: btns@ietf.org
Message-ID: <20090807061512.GG1035@Sun.COM>
References: <5D4C4002AC7C3340987C37B1E4B66A54029A9076@SACMVEXC2-PRD.hq.netapp.com> <20090806215107.GB10982@Sun.COM>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <20090624201707.GY1308@Sun.COM>
User-Agent: Mutt/1.5.7i
Cc: "Eisler, Michael" <Michael.Eisler@netapp.com>
Subject: Re: [btns] Q: How to deal with connection latch breaks?

Michael Richardson, Mike Eisler and I have discussed this issue in
e-mail and on the phone and we have come to a consensus.

We believe that the purist approach to latch breaks in the absence of
new APIs (or their non-use) is to treat the situation as packet loss.
However, the pragmatic default is to treat latch breaks as a reset (in
the TCP case) or as an ICMP destination unreachable (in the SCTP case,
when there are multi-homed end-points, so that path failover may take
place sooner).

The pragmatic approach allows an on-path DoS, but not an off-path DoS.
This seems acceptable given that an on-path attacker that can pull off a
connection reset attack by causing a latch break can also cause bits to
stop moving in the "purist" alternative anyway.  I.e., there's an
on-path DoS attack no matter what, and resetting the connection
(or causing failover to another path) sooner seems better than forcing
the ULP or the application to time out first.

So we will say that implementors SHOULD provide APIs and MUST choose a
default behavior in case of latch breaks when no APIs are available (or
can't be used), and we list the set of behaviors that implementors may
choose from.  We also specify a default behavior: the "pragmatic"
one described above.  Finally, we combine the relevant text from
sections 5.1 and 5.4 into a single new section.

The new text, to replace the last paragraph of sections 5.1 and 5.4:

   See section 5.5.

New section 5.5 text (modulo xml2rfc formatting):

5.5 Handling of BROKEN state for TCP and SCTP

   There are several ways to handle connection latch transitions to the
   BROKEN state in the case of connection-oriented ULPs like TCP or
   SCTP:

   a) Wait for a possible future transition back to the ESTABLISHED
      state, until which time the ULP will not move data between the two
      end-points of the connection.  ULP and application timeout
      mechanisms will, of course, trigger in the event of too lengthy a
      stay in the BROKEN state.  SCTP can detect these timeouts and
      initiate failover, in the case of multi-homed associations.

   b) Act as though the connection has been reset (RST message
      received, in TCP, or ABORT message received, in SCTP).

   c) Act as though an ICMP destination unreachable message had been
      received (in SCTP such messages can trigger path failover in the
      case of multi-homed associations).
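To make the three options concrete, here is a rough sketch (not part of the proposed text) of how a stack might map each disposition to ULP behavior.  The names Disposition and handle_latch_break are hypothetical, and the returned strings merely describe the action rather than performing it:

```python
from enum import Enum


class Disposition(Enum):
    # Hypothetical labels for options (a), (b) and (c) above.
    WAIT = "a"
    RESET = "b"
    DEST_UNREACHABLE = "c"


def handle_latch_break(protocol: str, multihomed: bool,
                       disposition: Disposition) -> str:
    """Describe what the ULP does once a connection latch goes BROKEN.

    protocol is "tcp" or "sctp"; multihomed matters only for SCTP.
    """
    if protocol not in ("tcp", "sctp"):
        raise ValueError("only TCP and SCTP are covered")
    if disposition is Disposition.WAIT:
        # (a) No data moves; ordinary ULP/application timers keep running.
        return "stall until ESTABLISHED or timeout"
    if disposition is Disposition.RESET:
        # (b) Same outcome as receiving RST (TCP) or ABORT (SCTP).
        return "close as if RST" if protocol == "tcp" else "close as if ABORT"
    # (c) Treat the current path as unreachable; a multi-homed SCTP
    # association can then fail over to another path.
    if protocol == "sctp" and multihomed:
        return "fail over to another path"
    return "report destination unreachable"
```

For example, handle_latch_break("sctp", True, Disposition.DEST_UNREACHABLE) yields the failover case, which is the reason (c) exists at all.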

   Implementors SHOULD provide APIs either for informing applications
   (asynchronously or otherwise) of latch breaks, so that they may
   choose a disposition (wait, close, or proceed with path failover), or
   for letting applications select a specific disposition a priori
   (before a latch break happens).
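A minimal sketch of what the two API styles might look like, assuming a per-connection latch handle.  ConnectionLatch, set_disposition, on_break and break_latch are all hypothetical names; the notification is delivered synchronously here for simplicity, and letting a registered callback take precedence over an a priori choice is an assumption, not something the text specifies:

```python
from enum import Enum
from typing import Callable, Optional


class Disposition(Enum):
    WAIT = "wait"
    CLOSE = "close"
    FAILOVER = "failover"


class ConnectionLatch:
    """Hypothetical per-connection latch handle exposing both API styles."""

    def __init__(self, default: Disposition) -> None:
        self._default = default  # implementation-chosen default disposition
        self._preset: Optional[Disposition] = None
        self._callback: Optional[Callable[[], Disposition]] = None
        self.state = "ESTABLISHED"

    def set_disposition(self, disposition: Disposition) -> None:
        """A priori selection, made before any latch break happens."""
        self._preset = disposition

    def on_break(self, callback: Callable[[], Disposition]) -> None:
        """Ask to be told of a latch break; the callback picks a disposition."""
        self._callback = callback

    def break_latch(self) -> Disposition:
        """Called by the stack when a conflicting SA breaks the latch."""
        self.state = "BROKEN"
        if self._callback is not None:
            return self._callback()  # application decides at break time
        if self._preset is not None:
            return self._preset      # application decided in advance
        return self._default         # no API used: fall back to the default
```

An application that prefers the purist behavior would call set_disposition(Disposition.WAIT) right after connecting; one that wants to decide case by case would register a callback with on_break().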

   Implementors MUST provide a default disposition in the event of a
   connection latch break.  Though (a) is clearly the purist default, we
   RECOMMEND (b) for TCP connections and for SCTP associations where
   only a single path remains (one 5-tuple), and (c) for multi-homed
   SCTP associations.  The rationale for this recommendation is as
   follows: a conflicting SA most likely indicates that the original
   peer is gone and has been replaced by another, and the original peer
   is unlikely to return, so failing faster seems reasonable.
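The recommended default reduces to a small decision, sketched below under one interpretation: an SCTP association counts as "multi-homed" only while more than one path (5-tuple) remains usable.  The function name default_disposition and the remaining_paths parameter are hypothetical:

```python
from enum import Enum


class Disposition(Enum):
    WAIT = "a"              # purist option; not the recommended default
    RESET = "b"
    DEST_UNREACHABLE = "c"


def default_disposition(protocol: str, remaining_paths: int) -> Disposition:
    """Pick the recommended default when no application API is in use.

    remaining_paths counts the paths (5-tuples) still usable by the
    connection or association; a TCP connection always has exactly one.
    """
    if protocol == "tcp":
        return Disposition.RESET
    if protocol == "sctp":
        if remaining_paths < 1:
            raise ValueError("an association needs at least one path")
        # Multi-homed: let path failover happen; single path: fail fast.
        if remaining_paths > 1:
            return Disposition.DEST_UNREACHABLE
        return Disposition.RESET
    raise ValueError(f"unsupported protocol: {protocol!r}")
```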

   Note that our recommended default behavior does not create off-path
   reset denial-of-service (DoS) attacks.  To break a connection latch,
   an attacker would first have to successfully establish, with one of
   the connection's end-points, an SA that conflicts with the connection
   latch, and that requires multiple messages to be exchanged between
   that end-point and the attacker.  Unless the attacker's chosen victim
   end-point allows the attacker to claim IP address ranges for its SAs,
   the attacker would have to actually take over the other end-point's
   addresses, which rules out off-path attacks.

Comments?

Nico
--