Comments on FSM
Alex Zinin <azinin@nexsi.com> Thu, 17 January 2002 00:31 UTC
Date: Wed, 16 Jan 2002 16:29:16 -0800
From: Alex Zinin <azinin@nexsi.com>
X-Mailer: The Bat! (v1.51) Personal
Reply-To: Alex Zinin <azinin@nexsi.com>
Organization: Nexsi Systems
X-Priority: 3 (Normal)
Message-ID: <195204309992.20020116162916@nexsi.com>
To: idr@merit.edu
Cc: Susan Hares <skh@nexthop.com>
Subject: Comments on FSM
In-Reply-To: <5.0.0.25.0.20020116181115.03ea46f8@mail.nexthop.com>
References: <5.0.0.25.0.20020116090028.039d2fa8@mail.nexthop.com> <20020115140711.GA23937@opentransit.net> <20020114123700.C7761@nexthop.com> <200201141750.g0EHo3634958@merlot.juniper.net> <87advfjcqi.wl@vaio.zebra.org> <5.0.0.25.0.20020116090028.039d2fa8@mail.nexthop.com> <5.0.0.25.0.20020116181115.03ea46f8@mail.nexthop.com>
Sue,

Some comments on the FSM text below. I constrained myself to mostly
editorial changes, as I'd prefer to see the FSM description in a
different form (the one I'm working on).

Also, I think that the connection collision related issue brought up
by Dennis is still not addressed. Frankly, I'm not sure how well this
can be addressed if the spec continues to treat competing transport
connections for the same peer as separate BGP sessions and FSMs...

Thanks,

--
Alex Zinin

General notes:

- there should be a separate section describing all session
  attributes (such as timers and flags) involved in FSM operation.

- there should be a separate subsection where all events would be
  formally defined and named. Defined names should then be used in
  the text.

- there should be a section specifically describing processing of
  incoming transport connections

- I think IdleHold should go. (stet'ed the text for now)

- Processing of manual Stop in Established lacked a NOTIFICATION
  message.

- Thorough check of FSM text correctness is hard because events are
  not documented and one would need to parse the text and formalize
  the description to do so, i.e., pretty much what I'm doing with
  the other representation.

- I think we need a state transition diagram (not table) in the
  text for better visualization (I'm going to have one). This won't
  be possible without proper event documentation.

- NOTIFICATION codes and subcodes need verification. (some changes
  are in the text.)

- Sessions "killed" due to connection collision should not go to
  Idle, but be destroyed. (Not addressed in the text below.)

Below is corrected text of the section. You'll be able to do a
"diff -u" or something against the original one to see line-by-line
changes.

8. BGP Finite State machine.

This section specifies BGP operation in terms of a Finite State
Machine (FSM) of a peering session. Following is a brief summary and
overview of BGP operations by state as determined by this FSM.

/* a "brief summary" of FSM?
   There should be a complete one */

An instance of an FSM is created by a BGP speaker for each configured
peer. It may also be created dynamically when an incoming transport
connection is reported, there's no matching configured peer and the
speaker is configured to accept such connections. FSMs created for
configured peers are initially put in the Idle state. See section XXX
for more information on incoming transport connection processing.

Each FSM state with corresponding event processing rules is described
below.

Idle state:

   This is the initial state of the FSM. This state is also used to
   keep BGP sessions in the inactive state when necessary. In this
   state BGP refuses all incoming BGP connections from the peer. No
   resources are allocated to the session.

   /* Here, I assume that the IdleHold will be removed and
    * Idle be used for session hold down */

   A manual start event is a start event initiated by an operator to
   initiate BGP session establishment. An automatic start event is a
   start event generated by the system.

   /* The above should go to the event description section */

   In response to a Start event (manual or automatic), the following
   steps are performed:

      - Allocate resources for the session,
      - Start the ConnectRetry timer (see note below),
      - Initiate an outbound transport connection to the peer,
      - Start listening for a connection that may be initiated by
        the peer,
      - Transition to Connect.

   NOTE: The exact value of the ConnectRetry timer is a local matter,
   but it should be sufficiently large to allow TCP initialization.

   Any other event received in the Idle state is ignored.

   /* I have a problem with this. Until all events are properly
    * documented, saying "any other event" is inappropriate. */

Expiration Date July 2002                                   [Page 33]

RFC DRAFT                                                January 2002

IdleHold state:

   /* I think this state should go and Idle be used for the same
    * purpose with local mechanisms used to control how long
    * the session stays there... I'm skipping this state...
    */

   The IdleHold state keeps the system in "Idle" mode until a certain
   time period has passed or an operator intervenes to manually
   restart the connection. This "IdleHold timeout" prevents
   persistent flapping of a BGP peering session.

   Upon entering the IdleHold state, if the IdleHoldtimer exceeds the
   local limit the "Keep Idle" flag is set.

   Upon receiving a manual start, the local system:

      - clears the IdleHoldtimer,
      - clears the "Keep Idle" flag,
      - initializes all BGP resources,
      - starts the ConnectRetry timer,
      - initiates a transport connection to the other BGP peer,
      - listens for a connection that may be initiated by the
        remote BGP peer, and
      - changes its state to Connect.

   Upon receiving an IdleHoldtimer expired event, the local system
   checks whether the "Keep Idle" flag is set. If the "Keep Idle"
   flag is set, the system stays in the IdleHold state. If the
   "Keep Idle" flag is not set, the local system:

      - clears the IdleHoldtimer,
      - and transitions the state to Idle.

   Getting out of the IdleHold state requires either operator
   intervention via a manual start or the IdleHoldtimer to expire
   with the "Keep Idle" flag clear.

   Any other event received in the IdleHold state is ignored.

Connect State:

   In this state, BGP has initiated an outbound transport connection
   and is waiting for it to be completed.

   If the transport connection succeeds, the following steps are
   performed:

      - Clear the ConnectRetry timer,
      - Complete initialization
        /* What does this really mean spec-wise? Just remove ? */
      - Send an Open message to the peer,
      - Set Hold timer to a large value (see note below)
      - Transition to OpenSent.

   /* Stop listening to the incoming connection here ? */

   NOTE: A hold timer value of 4 minutes is suggested.
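
For illustration only, the Start handling in Idle and the successful-connection handling in Connect could be sketched as below. This is a minimal sketch, not text from the draft: the names State, Session, on_start and on_transport_up are hypothetical, side effects are recorded as strings rather than performed, and only the three states touched by these two events are modelled.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECT = auto()
    OPEN_SENT = auto()


# The "large value" for the Hold timer; 4 minutes is the suggestion in the text.
LARGE_HOLD_TIME = 4 * 60  # seconds


@dataclass
class Session:
    state: State = State.IDLE
    connect_retry_running: bool = False
    hold_time: int = 0
    # Side effects are only logged here; a real speaker would perform them.
    actions: list = field(default_factory=list)


def on_start(s: Session) -> None:
    """Start event (manual or automatic)."""
    if s.state is not State.IDLE:
        return  # in this sketch, Start is ignored in Connect and OpenSent
    s.actions += ["allocate_resources", "connect_to_peer", "listen"]
    s.connect_retry_running = True
    s.state = State.CONNECT


def on_transport_up(s: Session) -> None:
    """Outbound transport connection succeeded."""
    if s.state is not State.CONNECT:
        return  # handling in other states is not modelled in this sketch
    s.connect_retry_running = False
    s.actions.append("send_open")
    s.hold_time = LARGE_HOLD_TIME
    s.state = State.OPEN_SENT
```

Driving a fresh Session through on_start and then on_transport_up leaves it in OPEN_SENT with the ConnectRetry timer stopped and the Hold timer set to 240 seconds.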
   If the transport protocol connection fails (e.g., retransmission
   timeout), the following steps are performed:

      - Restart the ConnectRetry timer,
      - Continue to listen for a connection that may be initiated by
        the peer,
        /* Remove this and specify when I should stop
         * listening? */
      - Transition to Active state.

   In response to the ConnectRetry timer expired event, the local
   system performs the following steps:

      - Restart the ConnectRetry timer,
      - Initiate an outbound transport connection to the peer,
      - Continue to listen for a connection that may be initiated by
        the remote BGP peer, and
        /* Remove this and specify when I should stop
         * listening? */
      - Stay in Connect state.

   The start event (manual or automatic) is ignored in the Connect
   state.

   In response to any other event (initiated by the system or
   operator), the following steps are executed:

   /* Again, the same comment about "any other event"...*/

      - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer,
      - Drop the transport connection,
      - Release resources allocated for the session,
      - Transition to IdleHold state /* Idle */

Active State:

   In this state BGP is not actively initiating an outbound transport
   connection, but is trying to acquire a peer by listening for and
   accepting an incoming one.

   If the local system does not allow BGP connections with
   unconfigured peers, then the local system rejects connections from
   IP addresses that are not configured peers, and remains in the
   Active state.

   If the transport connection succeeds, the following steps are
   performed:

      - Stop the ConnectRetry timer,
      - Complete the initialization, /* Remove this ?*/
      - Send the Open message to the peer,
      - Set the Hold timer to a large value (see note below)
      - Transition to the OpenSent state.

   /* Stop listening here? */

   NOTE: A Hold timer value of 4 minutes is suggested.
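
As a worked example of the IdleHoldtimer arithmetic used in the Connect-state failure steps above, here is a small sketch. It assumes the value is in seconds and that ConnectRetryCnt starts at 0; the text states neither, and the function names are made up for illustration.

```python
def idle_hold_seconds(connect_retry_cnt: int) -> int:
    """IdleHoldtimer value, 2**(ConnectRetryCnt)*60 (unit assumed: seconds)."""
    return (2 ** connect_retry_cnt) * 60


def on_failure(connect_retry_cnt: int) -> tuple:
    """Apply the two steps in the order the text lists them:
    set the timer from the current count, then increment the count."""
    return idle_hold_seconds(connect_retry_cnt), connect_retry_cnt + 1


if __name__ == "__main__":
    cnt = 0
    for _ in range(4):
        timer, cnt = on_failure(cnt)
        print(timer)  # 60, 120, 240, 480 on successive failures
```

The value doubles on every failure and the formula itself has no upper bound, which is presumably where the "local limit" and "Keep Idle" flag of the IdleHold state come in.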
   In response to the ConnectRetry timer expired event, the local
   system performs the following:

      - Restart the ConnectRetry timer,
      - Initiate an outbound transport connection to the peer,
      - Continue to listen for a connection that may be initiated by
        the peer, /* Remove this ?*/
      - Transition to Connect.

   The start events (initiated by the system or operator) are ignored
   in the Active state.

   In response to any other event (initiated by the system or
   operator), the following steps are taken:

      - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer
      - Drop the transport connection,
      - Release resources allocated for the session
      - Transition to IdleHold state. /* Idle */

   /* Stop listening here ?*/

OpenSent:

   In this state, the local system has sent out an OPEN message and
   awaits an OPEN message from the peer.

   When an OPEN message is received, all fields are checked for
   correctness. If the BGP message header checking or OPEN message
   check detects an error (see Section 6.2), or a connection
   collision (see Section 6.8) the following steps are performed:

      - Send a NOTIFICATION message /* Need code here */
      - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer
      - Drop the transport connection,
      - Release resources allocated for the session,
      - Transition to the IdleHold state. /* Idle */

   If there are no errors in the OPEN message, the local system does
   the following steps:

      - Send a KEEPALIVE message,
      - Start the KeepAlive timer (see note below)
      - Start the Hold timer according to the negotiated value
        (see section 4.2 and note below),
      - Transition to OpenConfirm.

   NOTE: If the negotiated Hold time value is zero, then the HoldTime
   timer and KeepAlive timers are not started.
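
The timer setup after a valid OPEN might look like the sketch below. The negotiation rule (use the smaller of the configured and received Hold Time) and the one-third KeepAlive interval are assumptions taken from the base BGP-4 specification rather than from the text above, and negotiate_timers is a hypothetical name.

```python
from typing import Optional, Tuple


def negotiate_timers(configured_hold: int,
                     received_hold: int) -> Optional[Tuple[int, int]]:
    """Return (hold_time, keepalive_interval) in seconds, or None when the
    negotiated Hold time is zero and neither timer is started.

    Assumption: the negotiated value is the smaller of the two Hold Times,
    and KEEPALIVEs are sent at one third of it.
    """
    hold = min(configured_hold, received_hold)
    if hold == 0:
        return None  # per the NOTE: Hold and KeepAlive timers are not started
    return hold, hold // 3
```

For example, a local Hold Time of 90 against a received 180 gives a 90 second Hold timer and a 30 second KeepAlive timer, while a zero on either side disables both.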
   If the value of the Autonomous System field is the same as the
   local Autonomous System number, then the connection is an
   "internal" connection; otherwise, it is an "external" connection.
   (This will impact UPDATE processing as described below.)

   /* Does this above para really belong to the FSM description?
    * I think not. Move to OPEN message section? */

   If a disconnect signal is received from the underlying transport
   protocol, the following steps are done:

      - Close the transport connection,
      - Restart the ConnectRetry timer,
      - Listen for a connection that may be initiated by the remote
        peer,
      - Transition to the Active state.

   If the HoldTimer expires:

      - Send a NOTIFICATION message with error code "Hold Timer
        Expired",
      - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer,
      - Drop the transport connection,
      - Release resources allocated for the session
      - Transition to IdleHold state.

   The Start event (manual or automatic) is ignored in the OpenSent
   state.

   If a NOTIFICATION message is received with the "Unsupported
   Version Number" code, the following steps are performed:

      - Close the transport connection
      - Release resources allocated for the session,
      - Reset ConnectRetryCnt
      - Stop the ConnectRetry timer,
      - Transition to Idle state.

   If any other NOTIFICATION is received, the following steps are
   executed:

      - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer,
      - Drop the transport connection,
      - Release resources allocated for the session
      - Transition to IdleHold state.
        /* Idle */

   In response to any other event, the local system performs the
   following steps:

      - Send the NOTIFICATION message with Error Code "Finite State
        Machine Error",
      - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer
      - Drop the transport connection,
      - Release resources allocated for the session
      - Transition to IdleHold state. /* Idle */

OpenConfirm state:

   In this state, the local system has received an OPEN message from
   the peer, confirmed its reception by sending out a KEEPALIVE
   message, and awaits an incoming KEEPALIVE message confirming that
   the remote peer received the OPEN message sent before, or an
   incoming NOTIFICATION message reporting a problem.

   If the local system receives a KEEPALIVE message, the FSM
   transitions to Established state.

   Upon expiration of the HoldTimer:

      - Send the NOTIFICATION message with the error code "Hold
        Timer Expired",
      - Set IdleHoldTimer to 2**(ConnectRetryCnt)*60
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer,
      - Drop the transport connection,
      - Release resources allocated for the session
      - Transition to IdleHold state. /* Idle */

   If the local system receives a NOTIFICATION message or receives a
   disconnect signal from the underlying transport protocol, the
   following steps are done:

      - Set IdleHoldTimer to 2**(ConnectRetryCnt)*60
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer,
      - Drop the transport connection,
      - Release resources allocated for the session,
      - Transition to IdleHold state. /* Idle */

   In response to the automatic Stop event:

      - Send the NOTIFICATION message with "Cease" error code,
      - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer,
      - Drop the transport connection,
      - Release resources allocated for the session,
      - Transition to IdleHold state.
        /* Idle */

   In response to a manual Stop event:

      - Send the NOTIFICATION message with "Cease" error code,
      - Release resources allocated for the session,
      - Reset ConnectRetryCnt
      - Stop the ConnectRetry timer,
      - Transition to Idle state.

   The Start event is ignored in the OpenConfirm state.

   In response to any other event:

      - Send a NOTIFICATION with a code of "Finite State Machine
        Error",
      - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer,
      - Drop the transport connection,
      - Release resources allocated for the session,
      - Transition to IdleHold state. /* Idle */

Established State:

   This is the most advanced state of the FSM. In the Established
   state the peers can exchange UPDATE, NOTIFICATION, and KEEPALIVE
   messages.

   On reception of an UPDATE or KEEPALIVE message:

      - Restart the Hold timer, if the negotiated HoldTime value is
        non-zero,
      - Stay in Established state.

   If the local system receives a NOTIFICATION message or a
   disconnect signal from the underlying transport protocol:

      - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60,
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer,
      - Drop the transport connection,
      - Release resources allocated for the session,
      - Transition to IdleHold state. /* Idle */

   If the local system receives an UPDATE message, and the Update
   message error handling procedure (see Section 6.3) detects an
   error, the following steps are performed:

      - Send a NOTIFICATION message with "Update Message Error"
        error code,
      - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer,
      - Drop the transport connection,
      - Release resources allocated for the session,
      - Transition to IdleHold state.
        /* Idle */

   On expiration of the Hold timer:

      - Send a NOTIFICATION message with error code "Hold Timer
        Expired",
      - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer,
      - Drop the transport connection,
      - Release resources allocated for the session,
      - Transition to IdleHold state. /* Idle */

   On expiration of the KeepAlive timer:

      - Send a KEEPALIVE message,
      - Restart the KeepAlive timer, unless the negotiated Hold Time
        value is zero.

   NOTE: The KeepAlive timer is also restarted each time a KEEPALIVE
   or UPDATE message is sent, unless the negotiated Hold Time value
   is zero.

   In response to an automatic Stop event:

      - Send a NOTIFICATION with "Cease" error code,
      - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer,
      - Drop the transport connection,
      - Release resources allocated for the session, including
        invalidation of all routes possibly received from the peer
      - Transition to IdleHold state. /* Idle */

   NOTE: An example of when an automatic Stop event can be generated
   is exceeding the maximum number of prefixes allowed to be received
   from a given peer.

   In response to a manual Stop event:

      - Send a NOTIFICATION with "Cease" error code,
      - Reset ConnectRetryCnt
      - Stop the ConnectRetry timer,
      - Drop the transport connection,
      - Release resources allocated for the session, including
        invalidation of all routes possibly received from the peer
      - Transition to Idle state. /* Idle */

   The Start event is ignored in the Established state.
   In response to any other event, the local system performs the
   following steps:

      - Send a NOTIFICATION with "Finite State Machine Error" error
        code,
      - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60
      - Increment ConnectRetryCnt by 1,
      - Stop the ConnectRetry timer,
      - Drop the transport connection,
      - Release resources allocated for the session, including
        invalidation of all routes possibly received from the peer
      - Transition to IdleHold state. /* Idle */
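
To make the Established-state rules easier to check against each other, they can be collected into a lookup table, sketched below. The event names are hypothetical, since the text does not formally name events, and the table keeps IdleHold as written rather than applying the suggested /* Idle */ substitution.

```python
# (NOTIFICATION to send or None, ConnectRetryCnt action, next state)
# "backoff" = set IdleHoldtimer to 2**(ConnectRetryCnt)*60, then increment;
# "reset"   = reset ConnectRetryCnt; "keep" = leave the counter untouched.
ESTABLISHED = {
    "update_ok":           (None, "keep", "Established"),
    "keepalive_received":  (None, "keep", "Established"),
    "keepalive_timer_exp": (None, "keep", "Established"),  # sends KEEPALIVE
    "start":               (None, "keep", "Established"),  # ignored
    "notification_rcvd":   (None, "backoff", "IdleHold"),
    "transport_closed":    (None, "backoff", "IdleHold"),
    "update_error":        ("Update Message Error", "backoff", "IdleHold"),
    "hold_timer_exp":      ("Hold Timer Expired", "backoff", "IdleHold"),
    "automatic_stop":      ("Cease", "backoff", "IdleHold"),
    "manual_stop":         ("Cease", "reset", "Idle"),
}

# Fallback for "any other event".
FSM_ERROR = ("Finite State Machine Error", "backoff", "IdleHold")


def handle_established(event: str) -> tuple:
    """Look up the reaction to an event received in Established."""
    return ESTABLISHED.get(event, FSM_ERROR)
```

Laid out this way, manual Stop stands out as the only row that resets the counter and goes to Idle, and the catch-all row shows why "any other event" is hard to verify until the event set is defined.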