Re: BFD State Machine

Chris Nogradi <cnogradi@laurelnetworks.com> Thu, 03 March 2005 19:09 UTC

From: Chris Nogradi <cnogradi@laurelnetworks.com>
Organization: Laurel Networks
To: Dave Katz <dkatz@juniper.net>
Date: Thu, 03 Mar 2005 14:07:24 -0500
User-Agent: KMail/1.6.2
References: <0036571b21c6f5db0c588591b0426735@juniper.net>
In-Reply-To: <0036571b21c6f5db0c588591b0426735@juniper.net>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200503031407.24669.cnogradi@laurelnetworks.com>
Cc: "'rtg-bfd@ietf.org'" <rtg-bfd@ietf.org>
Subject: Re: BFD State Machine

Dave,

Since all implementations of BFD that I have observed use a one packet per 
second sedate rate, I never noticed the fact that the draft specified this as 
the MINIMUM interval.  In fact, when you described a possible solution for
the deadlock, requiring the Desired TX and Required RX to match during 
negotiations, the thought occurred to me that you were suggesting that the 
Required RX value should be used during negotiations. For instance, suppose 
that an implementation uses a one second sedate rate but the peer 
implementation is advertising a three second Required RX value. Should this 
Required RX value be used instead of the one second interval during 
negotiations? I suppose that since the peer's Desired TX/Multiplier pair is
used to determine the detection time during negotiation while in the
Init state, the Required RX value should also be honored during this time.  I
guess the confusion arises from the fact that when the session is being 
negotiated, the time values are used without requiring a poll sequence 
(poll/final).  However, when the session is up, timer value changes must be 
communicated using a poll sequence (poll/final).
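
To make the three-second example above concrete, here is a minimal sketch of the timer arithmetic in question. It assumes the asynchronous-mode rules as commonly read from the BFD draft: a system sends no faster than the larger of its own Desired TX and the peer's Required RX, and its detection time is the peer's Multiplier times the larger of its own Required RX and the peer's Desired TX. The names TimerParams, tx_interval_us and detection_time_us are hypothetical and appear nowhere in the draft; the multiplier of 3 is an illustrative value only.

```python
from dataclasses import dataclass


@dataclass
class TimerParams:
    """Timer fields one side advertises (intervals in microseconds)."""
    desired_min_tx_us: int
    required_min_rx_us: int
    detect_mult: int


def tx_interval_us(local: TimerParams, remote: TimerParams) -> int:
    # Assumption: never send faster than we want to send,
    # nor faster than the peer says it is willing to receive.
    return max(local.desired_min_tx_us, remote.required_min_rx_us)


def detection_time_us(local: TimerParams, remote: TimerParams) -> int:
    # Assumption: the peer's effective send interval toward us,
    # scaled by the multiplier the peer advertises.
    peer_interval = max(local.required_min_rx_us, remote.desired_min_tx_us)
    return remote.detect_mult * peer_interval


# Local side runs a one-second rate; the peer advertises a
# three-second Required RX (the multiplier of 3 is illustrative).
local = TimerParams(1_000_000, 1_000_000, 3)
remote = TimerParams(1_000_000, 3_000_000, 3)

print(tx_interval_us(local, remote))     # 3000000
print(detection_time_us(local, remote))  # 3000000
```

Under these assumptions, honoring the peer's Required RX during negotiation would slow the local side to one packet every three seconds, which is the behavior the question above is asking about.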

BTW, assuming that an original draft implementation uses a fixed sedate rate, 
it would seem that it would not run into the deadlock problem unless it has 
to interoperate with an implementation using a different rate.  Is this true?

Any clarification on these matters would be much appreciated.

Thanks,

Chris
   
On Wednesday 02 March 2005 15:42, Dave Katz wrote:
> Richard Spencer's comments made me put on my thinking cap for a while,
> and I've come to the conclusion that the BFD state machine as specified
> is flawed.  It has (at least) the following issues:
>
>    Sessions could come up faster
>
>    Unidirectional failures require a timeout of INIT state to restore
> service
>
>    Service restoration is probabilistic
>
>    Dissimilar timer values during session establishment can cause
> permanent deadlock
>
> The first is a little unhappy, but not too bad.  The second and third
> are ugly, but not fatal;  the session will come up eventually.  The
> fourth is fatal as currently spec'ed--if one side has a transmit
> interval of one second and the other has a transmit interval of five
> seconds, the session will enter the deadlock loop that Richard pointed
> out.  This is fixable by requiring that the same value be sent for
> Desired TX and Required RX while not in Up state, but that's ugly too.
>
> The heart of the problem is this--there are two symmetric wait states
> in the state machine, but there is insufficient state passed in the
> protocol (only one bit) to communicate what's happening.  The IHU bit
> tells you only that the other system is in one of two states.  The
> symmetry in the protocol, combined with the state ambiguity, causes the
> deadlock opportunity.
>
> After some hemming and hawing, Dave2 and I have decided that we should
> bite the bullet and change the protocol.  This will require an
> incompatible change and a bump of the version number.  The change is
> fairly simple;  rather than communicate a single bit of state
> information (IHU), instead communicate the current state.
>
> As we move an extra bit of state into the protocol, this allows us to
> remove a state from the state machine.  The new scheme fixes all of the
> above problems;  as a matter of fact the unidirectional failure case
> ends up taking one less packet to come up than the bidirectional
> failure case.  Since the state is carried in the protocol, every state
> change causes a change in packet contents, which in turn triggers an
> "extra" packet to be transmitted, so the sessions can come up rapidly,
> but safely, and in as deterministic a manner as is possible in
> networking.  It also takes one less packet in each direction to come
> up.
>
> Given that this document hasn't even made proposed standard yet, and
> given that, as far as I know, I'm the only one with deployed code on
> which customers rely, I am not going to attempt to document any kind of
> interoperability or version negotiation scheme.  Version 0 should
> simply be discarded and replaced with Version 1.
>
> I'm going to spin the document, which will appear after the IETF
> meeting (as the I-D sluice is blocked until after the meeting.)  Dave2
> will make a presentation at the meeting describing the changes.
>
> Below is the new state machine.  It is pretty straightforward;  the
> notations on the arcs are the neighbor's state as signalled in received
> BFD packets (plus the detection timer expiration.)  I think this is
> much more transparent than the last one in terms of being able to
> intuit the mechanism and its side effects.  This should be sufficient
> for the motivated to analyze and pick apart.  Comments and brickbats to
> the list, please.
>
> All this goes to show that protocol design is a subtle art...
>
> --Dave
>
>                                   +--+
>                                   |  | UP
>                                   |  V
>                           DOWN  +------+  INIT
>                    +------------|      |------------+
>                    |            | DOWN |            |
>                    |  +-------->|      |<--------+  |
>                    |  |         +------+         |  |
>                    |  |                          |  |
>                    |  |                     DOWN,|  |
>                    |  |TIMER                TIMER|  |
>                    V  |                          |  V
>                  +------+                      +------+
>             +----|      |                      |      |----+
>         DOWN|    | INIT |--------------------->|  UP  |    |INIT, UP
>             +--->|      | INIT, UP             |      |<---+
>                  +------+                      +------+
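
For readers who prefer code to arcs, the diagram above can be sketched as a small transition table. This is only one reading of the figure, not text from the draft: the keys are (current state, neighbor state signalled in a received packet), TIMER stands for detection timer expiration, and any event with no arc in the diagram is assumed to leave the state unchanged. The names State, TIMER, TRANSITIONS and next_state are hypothetical.

```python
from enum import Enum


class State(Enum):
    DOWN = "DOWN"
    INIT = "INIT"
    UP = "UP"


TIMER = "TIMER"  # hypothetical marker for detection timer expiration

# (current state, event) -> next state, read off the arcs of the diagram.
TRANSITIONS = {
    (State.DOWN, State.DOWN): State.INIT,
    (State.DOWN, State.INIT): State.UP,
    (State.DOWN, State.UP): State.DOWN,
    (State.INIT, State.DOWN): State.INIT,
    (State.INIT, State.INIT): State.UP,
    (State.INIT, State.UP): State.UP,
    (State.INIT, TIMER): State.DOWN,
    (State.UP, State.INIT): State.UP,
    (State.UP, State.UP): State.UP,
    (State.UP, State.DOWN): State.DOWN,
    (State.UP, TIMER): State.DOWN,
}


def next_state(current, event):
    # Assumption: events without an arc leave the state unchanged.
    return TRANSITIONS.get((current, event), current)


# Both ends start in DOWN and exchange packets carrying their state.
a = b = State.DOWN
b = next_state(b, a)  # B receives DOWN from A and moves to INIT
a = next_state(a, b)  # A receives INIT from B and moves to UP
b = next_state(b, a)  # B receives UP from A and moves to UP
print(a.value, b.value)  # UP UP
```

In this reading, a session starting from DOWN on both ends reaches UP on both ends after three packets, and a side in UP that hears DOWN or times out falls back to DOWN.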