BFD State Machine

Dave Katz <dkatz@juniper.net> Wed, 02 March 2005 20:43 UTC

Received: from ietf-mx.ietf.org (ietf-mx.ietf.org [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id PAA24636; Wed, 2 Mar 2005 15:43:23 -0500 (EST)
Received: from megatron.ietf.org ([132.151.6.71]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1D6ai1-0002CB-GP; Wed, 02 Mar 2005 15:44:41 -0500
Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1D6agY-0007J8-3e; Wed, 02 Mar 2005 15:43:10 -0500
Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1D6agW-0007J3-5F for rtg-bfd@megatron.ietf.org; Wed, 02 Mar 2005 15:43:08 -0500
Received: from ietf-mx.ietf.org (ietf-mx.ietf.org [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id PAA24611 for <rtg-bfd@ietf.org>; Wed, 2 Mar 2005 15:43:06 -0500 (EST)
Received: from colo-dns-ext1.juniper.net ([207.17.137.57]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1D6ahk-0002BZ-8Z for rtg-bfd@ietf.org; Wed, 02 Mar 2005 15:44:24 -0500
Received: from merlot.juniper.net (merlot.juniper.net [172.17.27.10]) by colo-dns-ext1.juniper.net (8.11.3/8.9.3) with ESMTP id j22Kgx960556 for <rtg-bfd@ietf.org>; Wed, 2 Mar 2005 12:42:59 -0800 (PST) (envelope-from dkatz@juniper.net)
Received: from [172.16.12.139] (nimbus-sf.juniper.net [172.16.12.139]) by merlot.juniper.net (8.11.3/8.11.3) with ESMTP id j22Kgre42691 for <rtg-bfd@ietf.org>; Wed, 2 Mar 2005 12:42:53 -0800 (PST) (envelope-from dkatz@juniper.net)
Mime-Version: 1.0 (Apple Message framework v619.2)
Content-Transfer-Encoding: 7bit
Message-Id: <0036571b21c6f5db0c588591b0426735@juniper.net>
Content-Type: text/plain; charset="US-ASCII"; format="flowed"
To: "'rtg-bfd@ietf.org'" <rtg-bfd@ietf.org>
From: Dave Katz <dkatz@juniper.net>
Date: Wed, 02 Mar 2005 13:42:53 -0700
X-Mailer: Apple Mail (2.619.2)
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 73734d43604d52d23b3eba644a169745
Content-Transfer-Encoding: 7bit
Subject: BFD State Machine
X-BeenThere: rtg-bfd@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "RTG Area: Bidirectional Forwarding Detection DT" <rtg-bfd.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/rtg-bfd>, <mailto:rtg-bfd-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:rtg-bfd@ietf.org>
List-Help: <mailto:rtg-bfd-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/rtg-bfd>, <mailto:rtg-bfd-request@ietf.org?subject=subscribe>
Sender: rtg-bfd-bounces@ietf.org
Errors-To: rtg-bfd-bounces@ietf.org
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 10d3e4e3c32e363f129e380e644649be
Content-Transfer-Encoding: 7bit

Richard Spencer's comments made me put on my thinking cap for awhile, 
and I've come to the conclusion that the BFD state machine as specified 
is flawed.  It has (at least) the following issues:

   Sessions could come up faster

   Unidirectional failures require a timeout of INIT state to restore 
service

   Service restoration is probabilistic

   Dissimilar timer values during session establishment can cause 
permanent deadlock

The first is a little unhappy, but not too bad.  The second and third 
are ugly, but not fatal;  the session will come up eventually.  The 
fourth is fatal as currently spec'ed--if one side has a transmit 
interval of one second and the other has a transmit interval of five 
seconds, the session will enter the deadlock loop that Richard pointed 
out.  This is fixable by requiring that the same value be sent for 
Desired TX and Required RX while not in Up state, but that's ugly too.

The heart of the problem is this--there are two symmetric wait states 
in the state machine, but there is insufficient state passed in the 
protocol (only one bit) to communicate what's happening.  The IHU bit 
tells you only that the other system is in one of two states.  The 
symmetry in the protocol, combined with the state ambiguity, causes the 
deadlock opportunity.

After some hemming and hawing, Dave2 and I have decided that we should 
bite the bullet and change the protocol.  This will require an 
incompatible change and a bump of the version number.  The change is 
fairly simple;  rather than communicate a single bit of state 
information (IHU), instead communicate the current state.

As we move an extra bit of state into the protocol, this allows us to 
remove a state from the state machine.  The new scheme fixes all of the 
above problems;  as a matter of fact the unidrectional failure case 
ends up taking one less packet to come up than the bidirectional 
failure case.  Since the state is carried in the protocol, every state 
change causes a change in packet contents, which in turn triggers an 
"extra" packet to be transmitted, so the sessions can come up rapidly, 
but safely, and in as deterministic a manner as is possible in 
networking.  It also takes one less packet in each direction to come 
up.

Given that this document hasn't even made proposed standard yet, and 
given that, as far as I know, I'm the only one with deployed code on 
which customers rely, I am not going to attempt to document any kind of 
interoperability or version negotiation scheme.  Version 0 should 
simply be discarded and replaced with Version 1.

I'm going to spin the document, which will appear after the IETF 
meeting (as the I-D sluice is blocked until after the meeting.)  Dave2 
will make a presentation at the meeting describing the changes.

Below is the new state machine.  It is pretty straightforward;  the 
notations on the arcs are the neighbor's state as signalled in received 
BFD packets (plus the detection timer expiration.)  I think this is 
much more transparent than the last one in terms of being able to 
intuit the mechanism and its side effects.  This should be sufficient 
for the motivated to analyze and pick apart.  Comments and brickbats to 
the list, please.

All this goes to show that protocol design is a subtle art...

--Dave

                                  +--+
                                  |  | UP
                                  |  V
                          DOWN  +------+  INIT
                   +------------|      |------------+
                   |            | DOWN |            |
                   |  +-------->|      |<--------+  |
                   |  |         +------+         |  |
                   |  |                          |  |
                   |  |                          |  |
                   |  |                     DOWN,|  |
                   |  |TIMER                TIMER|  |
                   V  |                          |  V
                 +------+                      +------+
            +----|      |                      |      |----+
        DOWN|    | INIT |--------------------->|  UP  |    |INIT, UP
            +--->|      | INIT, UP             |      |<---+
                 +------+                      +------+