Comments on FSM

Alex Zinin <azinin@nexsi.com> Thu, 17 January 2002 00:31 UTC

Date: Wed, 16 Jan 2002 16:29:16 -0800
From: Alex Zinin <azinin@nexsi.com>
Reply-To: Alex Zinin <azinin@nexsi.com>
Organization: Nexsi Systems
Message-ID: <195204309992.20020116162916@nexsi.com>
To: idr@merit.edu
Cc: Susan Hares <skh@nexthop.com>
Subject: Comments on FSM
In-Reply-To: <5.0.0.25.0.20020116181115.03ea46f8@mail.nexthop.com>
References: <5.0.0.25.0.20020116090028.039d2fa8@mail.nexthop.com> <20020115140711.GA23937@opentransit.net> <20020114123700.C7761@nexthop.com> <200201141750.g0EHo3634958@merlot.juniper.net> <87advfjcqi.wl@vaio.zebra.org> <5.0.0.25.0.20020116090028.039d2fa8@mail.nexthop.com> <5.0.0.25.0.20020116181115.03ea46f8@mail.nexthop.com>

Sue,

 Some comments on the FSM text below.
 I constrained myself to mostly editorial changes,
 as I'd prefer to see the FSM description in a
 different form (the one I'm working on).

 Also, I think that the connection collision related
 issue brought up by Dennis is still not addressed.
 Frankly, I'm not sure how well this can be addressed
 if the spec continues to treat competing transport
 connections for the same peer as separate BGP sessions
 and FSMs...

 Thanks,
 
-- 
Alex Zinin


General notes:

 - there should be a separate section describing all
   session attributes (such as timers and flags) involved
   in FSM operation.
 
 - there should be a separate subsection where all events
   would be formally defined and named. Defined names
   should then be used in the text.

 - there should be a section specifically describing 
   processing of incoming transport connections

 - I think IdleHold should go. (stet'ed the text for now)

 - Processing of manual Stop in Established lacked
   a NOTIFICATION message.

 - Thorough check of FSM text correctness is hard because
   events are not documented and one would need to
   parse the text and formalize the description to do so,
   i.e., pretty much what I'm doing with the other
   representation.

 - I think we need a state transition diagram (not table)
   in the text for better visualization (I'm going to
   have one). This won't be possible without proper
   event documentation.

 - NOTIFICATION codes and subcodes need verification.
   (some changes are in the text.)

 - Sessions "killed" due to connection collision should
   not go to Idle, but be destroyed. (Not addressed in
   the text below.)

Below is the corrected text of the section. You'll be able to
do a "diff -u" or something against the original one to
see line-by-line changes.

8. BGP Finite State Machine.


   This section specifies BGP operation in terms of a Finite State
   Machine (FSM) of a peering session. Following is a brief summary
   and overview of BGP operations by state as determined by this FSM.

   /* a "brief summary" of FSM? There should be a complete one */

   An instance of an FSM is created by a BGP speaker for each configured
   peer. It may also be created dynamically when an incoming transport
   connection is reported, no matching configured peer exists, and
   the speaker is configured to accept such connections. FSMs created
   for configured peers are initially put in the Idle state. See section
   XXX for more information on incoming transport connection processing.
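   The creation rule above can be sketched as a small per-speaker
   registry. This is a minimal Python sketch, assuming a hypothetical
   Speaker class keyed by peer address and a hypothetical
   accept_unconfigured knob. The text does not say which state a
   dynamically created FSM starts in, so the sketch simply leaves the
   default.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


@dataclass
class PeerFSM:
    peer_addr: str
    configured: bool
    state: State = State.IDLE


@dataclass
class Speaker:
    # Hypothetical knob: accept connections from unconfigured peers?
    accept_unconfigured: bool = False
    fsms: dict = field(default_factory=dict)

    def configure_peer(self, addr: str) -> PeerFSM:
        # One FSM per configured peer, initially in Idle.
        fsm = PeerFSM(addr, configured=True)
        self.fsms[addr] = fsm
        return fsm

    def on_incoming_connection(self, addr: str):
        """Return the FSM that should handle the connection, or None to refuse it."""
        if addr in self.fsms:
            return self.fsms[addr]
        if not self.accept_unconfigured:
            return None
        # Dynamically created FSM; its initial state is not specified
        # by the text, so this sketch keeps the dataclass default.
        fsm = PeerFSM(addr, configured=False)
        self.fsms[addr] = fsm
        return fsm
```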

   Each FSM state with corresponding event processing rules is described
   below.
      
      Idle state:

         This is the initial state of the FSM. This state is also
         used to keep BGP sessions in the inactive state when    
         necessary. In this state BGP refuses all incoming BGP
         connections from the peer. No resources are allocated to
         the session.

         /* Here, I assume that the IdleHold will be removed and
          * Idle be used for session hold down
          */

         A manual start event is a start event initiated by an operator
         to initiate BGP session establishment. An automatic start event
         is a start event generated by the system.

         /* The above should go to the event description section
          */

         In response to a Start event (manual or automatic), the following
         steps are performed:

            - Allocate resources for the session,

            - Start the ConnectRetry timer (see note below),

            - Initiate an outbound transport connection to the peer,

            - Start listening for a connection that may be initiated
              by the peer,

            - Transition to Connect.

         NOTE: The exact value of the ConnectRetry timer is a local 
               matter, but it should be sufficiently large to allow TCP
               initialization.

         Any other event received in the Idle state is ignored.

         /* I have a problem with this. Until all events are properly
          * documented, saying "any other event" is inappropriate.
          */
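         The Idle-state processing above can be sketched as follows.
         This is a minimal Python sketch: the action strings are
         hypothetical placeholders for the real operations, and the
         120-second ConnectRetry value is only an example of a local
         choice.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"


class Event(Enum):
    MANUAL_START = "ManualStart"
    AUTOMATIC_START = "AutomaticStart"
    MANUAL_STOP = "ManualStop"


@dataclass
class Session:
    # Local matter; only needs to be large enough for TCP setup.
    connect_retry_secs: int = 120
    state: State = State.IDLE
    actions: list = field(default_factory=list)

    def handle_idle(self, event: Event) -> None:
        if self.state is not State.IDLE:
            raise ValueError("this handler only covers the Idle state")
        if event in (Event.MANUAL_START, Event.AUTOMATIC_START):
            self.actions += [
                "allocate_resources",
                f"start_connect_retry_timer({self.connect_retry_secs})",
                "initiate_outbound_connection",
                "listen_for_inbound_connection",
            ]
            self.state = State.CONNECT
        # Any other event is ignored in Idle, as the text currently says.
```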




Expiration Date July 2002                                      [Page 33]





RFC DRAFT                                                   January 2002


      IdleHold state:

      /* I think this state should go and Idle be used for the same
       * purpose with local mechanisms used to control how long
       * the session stays there... I'm skipping this state...
       */

         The IdleHold state keeps the system in "Idle" mode until a
         certain time period has passed or an operator intervenes to
         manually restart the connection.  This "IdleHold timeout"
         prevents persistent flapping of a BGP peering session.

         Upon entering the IdleHold state, if the IdleHoldtimer exceeds
         the local limit, the "Keep Idle" flag is set.

         Upon receiving a manual Start event, the local system:

            - clears the IdleHoldtimer,

            - clears the "Keep Idle" flag,

            - initializes all BGP resources,

            - starts the ConnectRetry timer,

            - initiates a transport connection to the other BGP peer,

            - listens for a connection that may be initiated by the
              remote BGP peer, and

            - changes its state to Connect.

         Upon receiving an IdleHoldtimer expired event, the local system
         checks whether the Keep Idle flag is set. If the Keep Idle
         flag is set, the system stays in the IdleHold state.

         If the Keep Idle flag is not set, the local system:

            - clears the IdleHoldtimer,

            - transitions to the Idle state.

         Leaving the IdleHold state requires either operator
         intervention via a manual Start event or expiration of the
         IdleHoldtimer while the "Keep Idle" flag is clear.

         Any other event received in the IdleHold state is ignored.
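         For reference while the IdleHold state is still in the text,
         its "Keep Idle" logic can be sketched in Python. This is a
         minimal sketch, assuming a hypothetical idle_hold_limit in
         seconds; the resource, timer and connection steps of the
         manual Start case are reduced to the state change.

```python
from dataclasses import dataclass


@dataclass
class IdleHoldSession:
    # Hypothetical local limit on the IdleHold timer, in seconds.
    idle_hold_limit: int = 3600
    idle_hold_timer: int = 0
    keep_idle: bool = False
    state: str = "Idle"

    def enter_idle_hold(self, timer_secs: int) -> None:
        self.state = "IdleHold"
        self.idle_hold_timer = timer_secs
        # "Keep Idle" is set when the hold-down exceeds the local limit.
        self.keep_idle = timer_secs > self.idle_hold_limit

    def on_manual_start(self) -> None:
        if self.state != "IdleHold":
            return
        self.idle_hold_timer = 0
        self.keep_idle = False
        # Resource init, ConnectRetry timer, connect and listen omitted.
        self.state = "Connect"

    def on_idle_hold_timer_expired(self) -> None:
        if self.state != "IdleHold" or self.keep_idle:
            return  # Keep Idle set: stay in IdleHold
        self.idle_hold_timer = 0
        self.state = "Idle"
```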

      Connect State:

         In this state, BGP has initiated an outbound transport
         connection and is waiting for it to complete.

         If the transport connection succeeds, the following steps
         are performed:

            - Clear the ConnectRetry timer,

            - Complete initialization
              /* What does this really mean spec-wise? Just remove ? */

            - Send an Open message to the peer,

            - Set the Hold timer to a large value (see note below),

            - Transition to OpenSent.

            /* Stop listening for incoming connections here? */

         NOTE: A Hold timer value of 4 minutes is suggested.

         If the transport protocol connection fails (e.g.,
         retransmission timeout), the following steps are performed:

            - Restart the ConnectRetry timer,

            - Continue to listen for a connection that may be 
              initiated by the peer,

              /* Remove this and specify when I should stop
               * listening?
               */

            - Transition to Active state.

         In response to the ConnectRetry timer expired event, the local
         system performs the following steps:

            - Restart the ConnectRetry timer,

            - Initiate an outbound transport connection to the peer,

            - Continue to listen for a connection that may be initiated
              by the remote BGP peer, and

              /* Remove this and specify when I should stop
               * listening?
               */

            - Stay in Connect state.

         The start event (manual or automatic) is ignored in the Connect
         state.
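         The explicitly named Connect-state events can be sketched as
         a pure function returning the actions and the next state.
         This is a minimal Python sketch: the event and action names
         are hypothetical (the draft does not yet name its events),
         "Complete initialization" is omitted, 240 seconds is the
         suggested 4-minute Hold timer, and the "any other event"
         branch is deliberately left out.

```python
from enum import Enum, auto


class Ev(Enum):
    TCP_CONNECTED = auto()
    TCP_FAILED = auto()
    CONNECT_RETRY_EXPIRED = auto()
    MANUAL_START = auto()
    AUTOMATIC_START = auto()


def connect_state(event: Ev):
    """Return (actions, next_state) for an event received in Connect."""
    if event is Ev.TCP_CONNECTED:
        return (["clear_connect_retry_timer", "send_open",
                 "set_hold_timer(240)"], "OpenSent")
    if event is Ev.TCP_FAILED:
        return (["restart_connect_retry_timer", "keep_listening"], "Active")
    if event is Ev.CONNECT_RETRY_EXPIRED:
        return (["restart_connect_retry_timer",
                 "initiate_outbound_connection",
                 "keep_listening"], "Connect")
    if event in (Ev.MANUAL_START, Ev.AUTOMATIC_START):
        return ([], "Connect")  # Start events are ignored in Connect
    raise NotImplementedError(f"{event} is not covered by this sketch")
```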

         In response to any other event (initiated by the system or
         operator), the following steps are executed:

         /* Again, the same comment about "any other event"...*/

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer,

            - Drop the transport connection,

            - Release resources allocated for the session,

            - Transition to the IdleHold state. /* Idle */
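         The hold-down step "Set IdleHoldtimer to 2**(ConnectRetryCnt)*60"
         recurs in almost every state. The timer is computed before
         ConnectRetryCnt is incremented, so successive failures starting
         from zero give 60, 120, 240, 480 seconds, and so on. A minimal
         Python sketch, with a hypothetical function name:

```python
def idle_hold_backoff(connect_retry_cnt: int) -> tuple:
    """Return (IdleHold timer in seconds, updated ConnectRetryCnt).

    The timer uses the counter value *before* the increment,
    matching the order of the steps in the text.
    """
    timer = (2 ** connect_retry_cnt) * 60
    return timer, connect_retry_cnt + 1
```

         Nothing in the text bounds the counter, so the timer doubles
         without limit; the local limit checked on entry to IdleHold
         appears to be what bounds it in practice.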

      Active State:

         In this state BGP is not actively initiating an outbound
         transport connection, but is trying to acquire a peer 
         by listening for and accepting an incoming one.

         If the local system does not allow BGP connections with
         unconfigured peers, then the local system rejects connections
         from IP addresses that are not configured peers, and remains
         in the Active state.
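         The acceptance check above can be sketched in Python. This is
         a minimal sketch with a hypothetical function name; it
         normalizes addresses with the standard ipaddress module so
         that different textual forms of the same IPv6 address match.

```python
import ipaddress


def accept_in_active(remote: str, configured_peers: set,
                     allow_unconfigured: bool) -> bool:
    """Decide whether an incoming connection is accepted in Active.

    Connections from non-configured addresses are rejected unless
    the local system allows unconfigured peers; on rejection the
    FSM simply stays in Active.
    """
    addr = ipaddress.ip_address(remote)
    configured = {ipaddress.ip_address(p) for p in configured_peers}
    return addr in configured or allow_unconfigured
```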

         If the transport connection succeeds, the following
         steps are performed:

            - Stop the ConnectRetry timer,

            - Complete the initialization, /* Remove this ?*/

            - Send the Open message to the peer,

            - Set the Hold timer to a large value (see note below)

            - Transition to the OpenSent state.

            /* Stop listening here? */

         NOTE: A Hold timer value of 4 minutes is suggested.

         In response to the ConnectRetry timer expired event, the local
         system performs the following:

            - Restart the ConnectRetry timer,

            - Initiate an outbound transport connection to the peer,

            - Continue to listen for a connection that may be initiated
              by the peer, /* Remove this ?*/

            - Transition to Connect.

         The start events (initiated by the system or operator) are
         ignored in the Active state.
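         The Active-state rules can also be written table-driven, which
         is close to the transition diagram argued for above. This is
         a minimal Python sketch: the event names are hypothetical,
         since the draft does not define them, and unknown events take
         the hold-down path with the 2**(ConnectRetryCnt)*60 timer.

```python
# event -> (actions, next_state) for the Active-state events named in the text.
ACTIVE_TRANSITIONS = {
    "TcpConnected": (["stop_connect_retry_timer", "send_open",
                      "set_hold_timer(240)"], "OpenSent"),
    "ConnectRetryExpired": (["restart_connect_retry_timer",
                             "initiate_outbound_connection",
                             "keep_listening"], "Connect"),
    "ManualStart": ([], "Active"),      # ignored
    "AutomaticStart": ([], "Active"),   # ignored
}


def active_state(event: str, connect_retry_cnt: int):
    """Return (actions, next_state, new_connect_retry_cnt)."""
    if event in ACTIVE_TRANSITIONS:
        actions, nxt = ACTIVE_TRANSITIONS[event]
        return list(actions), nxt, connect_retry_cnt
    # Any other event: hold down with exponential backoff.
    hold = (2 ** connect_retry_cnt) * 60
    return ([f"set_idle_hold_timer({hold})", "stop_connect_retry_timer",
             "drop_transport_connection", "release_resources"],
            "IdleHold", connect_retry_cnt + 1)
```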

         In response to any other event (initiated by the system or
         operator), the following steps are taken:

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer,

            - Drop the transport connection,

            - Release resources allocated for the session,

            - Transition to IdleHold state. /* Idle */

            /* Stop listening here ?*/

      OpenSent:

         In this state, the local system has sent out an OPEN
         message and awaits an OPEN message from the peer.
         When an OPEN message is received, all fields are checked for
         correctness.  If the BGP message header checking or OPEN
         message check detects an error (see Section 6.2), or a
         connection collision (see Section 6.8), the following
         steps are performed:

            - Send a NOTIFICATION message /* Need code here */

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer 

            - Drop the transport connection,

            - Release resources allocated for the session,

            - Transition to the IdleHold state. /* Idle */

         If there are no errors in the OPEN message, the local system
         does the following steps:

            - Send a KEEPALIVE message,

            - Start the KeepAlive timer (see note below)

            - Start the Hold timer according to the negotiated value 
              (see section 4.2 and note below),

            - Transition to OpenConfirm.

         NOTE: If the negotiated Hold Time value is zero, then the Hold
         timer and the KeepAlive timer are not started.
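         The timer setup on a valid OPEN can be sketched in Python.
         This is a minimal sketch, assuming the BGP-4 rule that the
         negotiated Hold Time is the smaller of the locally configured
         value and the one received in the peer's OPEN, that a Hold
         Time must be zero or at least three seconds, and the common
         (not mandated here) choice of a KeepAlive interval of one
         third of the Hold Time. The function name is hypothetical.

```python
def negotiate_timers(local_hold: int, peer_hold: int) -> tuple:
    """Return (hold_time, keepalive_interval) in seconds; (0, 0) disables both."""
    for v in (local_hold, peer_hold):
        if v != 0 and v < 3:
            raise ValueError("Hold Time must be zero or at least three seconds")
    hold = min(local_hold, peer_hold)
    if hold == 0:
        return 0, 0  # neither the Hold nor the KeepAlive timer is started
    return hold, hold // 3
```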
         
         If the value of the Autonomous System field is the same as 
         the local Autonomous System number, then the connection is an 
         "internal" connection; otherwise, it is an "external" connection.
         (This will impact UPDATE processing as described below.)

         /* Does this above para really belong to the FSM description?
          * I think not. Move to OPEN message section?
          */

         If a disconnect signal is received from the underlying
         transport protocol, the following steps are done:

            - Close the transport connection,

            - Restart the ConnectRetry timer,

            - Listen for a connection that may be initiated by 
              the remote peer,
              
            - Transition to the Active state.

         If the Hold timer expires:

            - Send a NOTIFICATION message with error code 
              "Hold Timer Expired",

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer,

            - Drop the transport connection,

            - Release resources allocated for the session

            - Transition to IdleHold state.

         The Start event (manual or automatic) is ignored in the
         OpenSent state.

         If a NOTIFICATION message is received with the "Unsupported
         Version Number" code, the following steps are performed:

            - Close the transport connection

            - Release resources allocated for the session,

            - Reset ConnectRetryCnt

            - Stop the ConnectRetry timer,

            - Transition to Idle state.

         If any other NOTIFICATION is received, the following
         steps are executed:

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer,

            - Drop the transport connection,

            - Release resources allocated for the session

            - Transition to IdleHold state. /* Idle */

         In response to any other event, the local system performs
         the following steps:

            - Send the NOTIFICATION message with Error Code "Finite State
              Machine Error",

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer

            - Drop the transport connection,

            - Release resources allocated for the session

            - Transition to IdleHold state. /* Idle */
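         The two NOTIFICATION cases in OpenSent can be sketched in
         Python. This is a minimal sketch, assuming the BGP-4 values
         of error code 2 (OPEN Message Error) and subcode 1
         (Unsupported Version Number); the function name is
         hypothetical.

```python
OPEN_MESSAGE_ERROR = 2
UNSUPPORTED_VERSION_NUMBER = 1


def opensent_on_notification(code: int, subcode: int, connect_retry_cnt: int):
    """Return (next_state, new_connect_retry_cnt, idle_hold_secs or None)."""
    if code == OPEN_MESSAGE_ERROR and subcode == UNSUPPORTED_VERSION_NUMBER:
        # Version mismatch: reset the counter and go straight to Idle.
        return "Idle", 0, None
    # Any other NOTIFICATION: back off in IdleHold.
    return "IdleHold", connect_retry_cnt + 1, (2 ** connect_retry_cnt) * 60
```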

      OpenConfirm state:

         In this state, the local system has received an OPEN message
         from the peer, confirmed its reception by sending out
         a KEEPALIVE message, and awaits an incoming KEEPALIVE message
         confirming that the remote peer received the OPEN message
         sent before, or an incoming NOTIFICATION message reporting
         a problem.

         If the local system receives a KEEPALIVE message, the FSM
         transitions to Established state.

         Upon expiration of the Hold timer:

            - Send the NOTIFICATION message with the error code "Hold
              Timer Expired",

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer,

            - Drop the transport connection,

            - Release resources allocated for the session,

            - Transition to IdleHold state. /* Idle */

         If the local system receives a NOTIFICATION message or receives
         a disconnect signal from the underlying transport
         protocol, the following steps are done:

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer,

            - Drop the transport connection,

            - Release resources allocated for the session,

            - Transition to IdleHold state. /* Idle */

         In response to the automatic Stop event:

            - Send the NOTIFICATION message with "Cease" error code,

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer,

            - Drop the transport connection,

            - Release resources allocated for the session,

            - Transition to IdleHold state. /* Idle */


         In response to a manual Stop event:

            - Send the NOTIFICATION message with "Cease" error code,

            - Release resources allocated for the session,

            - Reset ConnectRetryCnt 

            - Stop the ConnectRetry timer,

            - Transition to Idle state.

         The Start event is ignored in the OpenConfirm state.

         In response to any other event:

            - Send a NOTIFICATION with a code of "Finite State Machine
              Error",

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer,

            - Drop the transport connection,

            - Release resources allocated for the session,

            - Transition to IdleHold state. /* Idle */
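         The contrast between the manual and automatic Stop events in
         OpenConfirm can be sketched in Python: a manual Stop is not
         penalized, while an automatic Stop is treated like a failure
         with backoff. This is a minimal sketch; the StopResult type
         and on_stop function are hypothetical names.

```python
from typing import NamedTuple, Optional


class StopResult(NamedTuple):
    notification: str
    next_state: str
    connect_retry_cnt: int
    idle_hold_secs: Optional[int]


def on_stop(manual: bool, connect_retry_cnt: int) -> StopResult:
    """Stop handling in OpenConfirm, per the steps above."""
    if manual:
        # Operator-initiated: counter reset, plain Idle, no hold-down.
        return StopResult("Cease", "Idle", 0, None)
    # System-generated: hold down with exponential backoff.
    return StopResult("Cease", "IdleHold", connect_retry_cnt + 1,
                      (2 ** connect_retry_cnt) * 60)
```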

      Established State:

         This is the most advanced state of the FSM. In the Established 
         state the peers can exchange UPDATE, NOTIFICATION, and KEEPALIVE
         messages.

         On reception of an UPDATE or KEEPALIVE message:

            - Restart the Hold timer, if the negotiated Hold Time
              value is non-zero,

            - Stay in Established state.

         If the local system receives a NOTIFICATION message or a
         disconnect signal from the underlying transport protocol:

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60,

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer,

            - Drop the transport connection,

            - Release resources allocated for the session,

            - Transition to the IdleHold state. /* Idle */


         If the local system receives an UPDATE message, and the UPDATE
         message error handling procedure (see Section 6.3) detects an
         error, the following steps are performed:

            - Send a NOTIFICATION message with "Update Message Error"
              error code,

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer,

            - Drop the transport connection,

            - Release resources allocated for the session,

            - Transition to IdleHold state. /* Idle */

         On expiration of the Hold timer:

            - Send a NOTIFICATION message with error code "Hold Timer
              Expired",

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer,

            - Drop the transport connection,

            - Release resources allocated for the session,

            - Transition to IdleHold state. /* Idle */

         On expiration of the KeepAlive timer:

            - Send a KEEPALIVE message,

            - Restart the KeepAlive timer, unless the negotiated
              Hold Time value is zero.

         NOTE: The KeepAlive timer is also restarted each time a
               KEEPALIVE or UPDATE message is sent, unless the
               negotiated Hold Time value is zero.
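         The Established-state timer rules (received UPDATE/KEEPALIVE
         restarts the Hold timer, sent UPDATE/KEEPALIVE restarts the
         KeepAlive timer, and a zero Hold Time disables both) can be
         sketched with deadlines on a caller-supplied clock. This is a
         minimal Python sketch with hypothetical names; tick() sends at
         most one KEEPALIVE per call, which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EstablishedTimers:
    hold_time: int        # negotiated Hold Time in seconds (0 = disabled)
    keepalive_time: int   # KeepAlive interval in seconds
    now: float = 0.0
    hold_deadline: Optional[float] = None
    keepalive_deadline: Optional[float] = None

    def __post_init__(self):
        if self.hold_time:
            self.hold_deadline = self.now + self.hold_time
            self.keepalive_deadline = self.now + self.keepalive_time

    def on_received(self, msg_type: str) -> None:
        # UPDATE or KEEPALIVE from the peer restarts the Hold timer.
        if msg_type in ("UPDATE", "KEEPALIVE") and self.hold_time:
            self.hold_deadline = self.now + self.hold_time

    def on_sent(self, msg_type: str) -> None:
        # Sending UPDATE or KEEPALIVE restarts the KeepAlive timer.
        if msg_type in ("UPDATE", "KEEPALIVE") and self.hold_time:
            self.keepalive_deadline = self.now + self.keepalive_time

    def tick(self, now: float) -> list:
        """Advance the clock and return the actions that became due."""
        self.now = now
        if self.hold_deadline is not None and now >= self.hold_deadline:
            self.hold_deadline = self.keepalive_deadline = None
            return ["send NOTIFICATION Hold Timer Expired; go to IdleHold"]
        if self.keepalive_deadline is not None and now >= self.keepalive_deadline:
            self.on_sent("KEEPALIVE")
            return ["send KEEPALIVE"]
        return []
```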

         In response to an automatic Stop event:

            - Send a NOTIFICATION with "Cease" error code,

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer,

            - Drop the transport connection,

            - Release resources allocated for the session,
              including invalidation of all routes possibly 
              received from the peer

            - Transition to IdleHold state. /* Idle */

         NOTE: An example of when an automatic Stop event can be
               generated is exceeding the maximum number of prefixes
               allowed to be received from a given peer.
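         The maximum-prefix example can be sketched as a per-peer
         counter that raises an automatic Stop event. This is a
         minimal Python sketch: PrefixLimiter and max_prefixes are
         hypothetical names for a local configuration knob, and
         withdrawals are applied before announcements within one
         UPDATE.

```python
class PrefixLimiter:
    """Tracks prefixes accepted from one peer and flags an automatic Stop."""

    def __init__(self, max_prefixes: int):
        self.max_prefixes = max_prefixes
        self.prefixes = set()

    def on_update(self, announced, withdrawn):
        """Apply one UPDATE; return "AutomaticStop" if the limit is exceeded."""
        self.prefixes.difference_update(withdrawn)
        self.prefixes.update(announced)
        if len(self.prefixes) > self.max_prefixes:
            return "AutomaticStop"
        return None
```

         Re-announcing a known prefix does not count twice, since the
         prefixes are kept in a set.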

         In response to a manual Stop event:

            - Send a NOTIFICATION with "Cease" error code,

            - Reset ConnectRetryCnt

            - Stop the ConnectRetry timer,

            - Drop the transport connection,

            - Release resources allocated for the session,
              including invalidation of all routes possibly 
              received from the peer

            - Transition to Idle state. /* Idle */

         The Start event is ignored in the Established state.

         In response to any other event, the local system performs
         the following steps:

            - Send a NOTIFICATION with "Finite State
              Machine Error" error code,

            - Set IdleHoldtimer to 2**(ConnectRetryCnt)*60

            - Increment ConnectRetryCnt by 1,

            - Stop the ConnectRetry timer,

            - Drop the transport connection,

            - Release resources allocated for the session,
              including invalidation of all routes possibly 
              received from the peer

            - Transition to IdleHold state. /* Idle */
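         The failure sequence (optional NOTIFICATION, IdleHoldtimer
         backoff, increment ConnectRetryCnt, stop ConnectRetry timer,
         drop the connection, release resources, go to IdleHold)
         repeats nearly verbatim across the states above, which
         suggests a single helper. This is a minimal Python sketch
         with hypothetical names; cases triggered by a received
         NOTIFICATION or a transport disconnect pass no notification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Session:
    connect_retry_cnt: int = 0
    state: str = "Established"
    routes_from_peer: set = field(default_factory=set)
    log: List[str] = field(default_factory=list)

    def fail(self, notification: Optional[str] = None) -> None:
        """Common error path: optional NOTIFICATION, backoff, teardown, IdleHold."""
        if notification is not None:
            self.log.append(f"send NOTIFICATION {notification}")
        idle_hold = (2 ** self.connect_retry_cnt) * 60
        self.connect_retry_cnt += 1
        self.log += [f"set IdleHold timer {idle_hold}s",
                     "stop ConnectRetry timer",
                     "drop transport connection"]
        # Releasing session resources includes invalidating the peer's routes.
        self.routes_from_peer.clear()
        self.state = "IdleHold"
```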
