Re: FSM changes for the Draft-15
Susan Hares <skh@nexthop.com> Mon, 12 November 2001 16:16 UTC
Date: Mon, 12 Nov 2001 10:19:15 -0500
To: Enke Chen <enke@redback.com>
From: Susan Hares <skh@nexthop.com>
Subject: Re: FSM changes for the Draft-15
Cc: Susan Hares <skh@nexthop.com>, idr@merit.edu, jeffrey Haas <jhaas@nexthop.com>, Yakov Rekhter <yakov@juniper.net>, enke@redback.com
Enke:

Perhaps you would care to review the additional documents I sent for the specifics of the states and the actions. To answer the question of what's broken, I included two additional bodies of work:

1) Red-line text for the actual text. This shows where the text was changed.
2) State machine table with actions.

I have repeated the highlights here. To summarize:

1) By your own admission and John Scudder's, the state machine text needed work. Your comment was to remove it because fixing it was too difficult.

2) I've not added a new state. I've encoded the two sub-states in Idle as two full states. Mixing the Idle sub-states does not allow the operator to know what's really happening by MIB or SNMP query of the state of the connection. If a machine is in "hold down" due to an automatic condition, the state machine should tell you, not leave the operator guessing which part of the Idle state it is in.

3) Why is the question raised now?

> > I would like to first ask the working group if they want to remove the
> > state machine or take fixes to the State machine. I really think we
> > should take fixes so we do not have interoperability issues.
>
>Not sure why this question is raised as Draft-15 contains a section on
>"BGP Finite State Machine" (Section 8), and no one has suggested removing
>it in the previous or the current Last Call.

As I indicated last August, I felt the state machine should be fixed. As a response to the current Last Call, I submitted alternative text fixing the state machine text.

>Could you what exactly are the problems with the current text of
>Section 8?

As you and John Scudder agreed in August, the text is unclear. The state machine had 23 actions/events allowed with 6 states, one of which (Idle) had two sub-states. The Idle sub-state text waved its hands at the question of what should happen under automatic versus manual operator intervention.
Problems and changes:

1) I simply cleaned up the text by adding something that each implementation with the ability to "automatically" terminate connections would require. The addition of the "Idle Hold" state provides the operator with a clean way to determine which portion of the Idle state they are in.

2) Encoded the concept of "manual" left by the Idle text below.

3) Fixed a problem in the Connect state where the Hold timer was not being set during the "transport connection succeeds" function prior to going to OpenSent.

It used to say:

    If the transport protocol connection succeeds, the local system clears the ConnectRetry timer, completes initialization, sends an OPEN message to its peer, and changes its state to OpenSent.

Fix:

    If the transport connection succeeds, the local system:
    - clears the ConnectRetry timer,
    - completes initialization,
    - sends an OPEN message to its peer,
    - sets the Hold timer to a large value, and
    - changes its state to OpenSent.

4) Made a special exception for OPEN messages with a version error, allowing them to bypass the automatic hold down in the Idle Hold state. It is my understanding that current implementations treat version errors in a slightly different way. Please comment on this action.

5) Allow for "manual" intervention, so that operators and others can get a connection out of the Idle state. The manual intervention can be via SNMP MIB or CLI. This manual intervention is implied, but unclear, in the earlier Idle text.

The text in the Established state of:

    In response to the Stop event (initiated by either system or operator), the local system sends a NOTIFICATION message with Error Code Cease and changes its state to Idle.
Was changed to:

    In response to the stop event initiated by the system (automatic), the local system:
    - sends a NOTIFICATION with Cease,
    - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60 [formula in Idle text],
    - sets the connect retry timer to zero,
    - drops the TCP connection,
    - releases all BGP resources,
    - goes to the IdleHold state, and
    - deletes all routes.

    An example automatic stop event is exceeding the number of prefixes for a given peer and the local system automatically disconnecting the peer.

    In response to a stop event initiated by an operator, the local system:
    - releases all resources (including deleting all routes),
    - sets ConnectRetryCnt to zero (0),
    - sets the connect retry timer to zero (0), and
    - transitions to the Idle state.

6) Edited into each state the specific actions indicated in the "Idle text". The draft-15 section 8 has this text in the Idle state:

    If a BGP speaker detects an error, it shuts down the connection and changes its state to Idle. Getting out of the Idle state requires generation of the Start event. If such an event is generated automatically, then persistent BGP errors may result in persistent flapping of the speaker. To avoid such a condition it is recommended that Start events should not be generated immediately for a peer that was previously transitioned to Idle due to an error. For a peer that was previously transitioned to Idle due to an error, the time between consecutive generation of Start events, if such events are generated automatically, shall exponentially increase. The value of the initial timer shall be 60 seconds. The time shall be doubled for each consecutive retry. An implementation MAY impose a configurable upper bound on that time. Once the upper bound is reached, the speaker shall no longer automatically generate the Start event for the peer.

Why now:
-----------------

>I failed to see why we need the new states (which by the way would make all
>the deployed implementations non compliant).
This standard is at "Draft" and we are entertaining implementation comments on the rest of the draft to match current practice.

The actual operation of the hold-down feature of the Idle state in automatic or manual mode is unclear. Operators have no way to determine what portion of the "Idle" state the connection is in. Specifying that, once the upper bound has been reached, the speaker shall no longer automatically generate the Start event for the peer indicates a state within a state. Any implementation supporting "automatic" transitions must "guess" at whether this state has been reached.

Let me know if you have additional questions,

Sue Hares
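
For readers who want to see the proposed behaviour concretely, the following is a minimal sketch of the IdleHoldTimer formula and the two Stop-event paths out of Established described in this message. It is an illustration under assumptions, not text from the draft or from any implementation: the names `State`, `Peer`, `idle_hold_seconds`, `automatic_stop`, and `manual_stop` are hypothetical, the upper bound is treated as "reached" when the computed delay would exceed it, and the point at which ConnectRetryCnt is incremented is not stated in the message and so is left out.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    IDLE = auto()         # operator-visible: waiting for a manual Start
    IDLE_HOLD = auto()    # operator-visible: automatic hold down in progress
    ESTABLISHED = auto()


def idle_hold_seconds(connect_retry_cnt: int,
                      upper_bound: Optional[int] = None) -> Optional[int]:
    """IdleHoldTimer = 2**(ConnectRetryCnt) * 60, in seconds.

    Returns None when the optional configured bound would be exceeded,
    meaning no further automatic Start should be generated (assumption:
    "reached" is read here as "the computed delay exceeds the bound").
    """
    delay = (2 ** connect_retry_cnt) * 60
    if upper_bound is not None and delay > upper_bound:
        return None
    return delay


@dataclass
class Peer:
    state: State = State.ESTABLISHED
    connect_retry_cnt: int = 0
    connect_retry_timer: int = 0
    idle_hold_timer: Optional[int] = None
    idle_hold_upper_bound: Optional[int] = None
    routes: List[str] = field(default_factory=list)
    sent: List[str] = field(default_factory=list)  # stand-in for the wire

    def automatic_stop(self) -> None:
        """Stop event initiated by the system, e.g. a prefix limit exceeded."""
        self.sent.append("NOTIFICATION/Cease")
        self.idle_hold_timer = idle_hold_seconds(
            self.connect_retry_cnt, self.idle_hold_upper_bound)
        self.connect_retry_timer = 0
        # Dropping the TCP connection and releasing BGP resources are
        # modelled here only by deleting the routes.
        self.routes.clear()
        self.state = State.IDLE_HOLD

    def manual_stop(self) -> None:
        """Stop event initiated by an operator (CLI or SNMP)."""
        self.routes.clear()
        self.connect_retry_cnt = 0
        self.connect_retry_timer = 0
        self.state = State.IDLE


peer = Peer(connect_retry_cnt=2, routes=["192.0.2.0/24"])
peer.automatic_stop()
print(peer.state.name, peer.idle_hold_timer)
```

With the two states separated this way, a query of `peer.state` distinguishes a peer held down automatically (IDLE_HOLD, with a timer running) from one an operator stopped (IDLE), which is the visibility argument made above.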