Re: bgp4-17 Cease subcode

Alex Zinin <azinin@nexsi.com> Thu, 17 January 2002 17:16 UTC

Date: Thu, 17 Jan 2002 09:14:13 -0800
From: Alex Zinin <azinin@nexsi.com>
X-Mailer: The Bat! (v1.51) Personal
Reply-To: Alex Zinin <azinin@nexsi.com>
Organization: Nexsi Systems
X-Priority: 3 (Normal)
Message-ID: <32264605142.20020117091413@nexsi.com>
To: Susan Hares <skh@nexthop.com>
Cc: randy Bush <randy@psg.com>, idr@merit.edu
Subject: Re: bgp4-17 Cease subcode
In-Reply-To: <5.0.0.25.0.20020117083423.0252ef28@mail.nexthop.com>
References: <5.0.0.25.0.20020116090028.039d2fa8@mail.nexthop.com> <20020115140711.GA23937@opentransit.net> <20020114123700.C7761@nexthop.com> <200201141750.g0EHo3634958@merlot.juniper.net> <20020115140711.GA23937@opentransit.net> <5.0.0.25.0.20020116090028.039d2fa8@mail.nexthop.com> <5.0.0.25.0.20020117083423.0252ef28@mail.nexthop.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-idr@merit.edu
Precedence: bulk

Sue,

 To clarify my position: I think that the recommendation for
 exponential back-off is a good thing and solves a real
 problem. However, I think that the IdleHold state should not
 be in the FSM because:

   1) It seems to be redundant. See (*) below.
   2) It introduces changes to the base FSM to account for
      a feature of an optional nature. People who didn't
      implement it would be put in a non-compliant position,
      and people who won't implement it will have to take
      things out of the FSM.
   3) I am under the impression (though I might be wrong) that
      it does not match the majority of deployed implementations,
      which do not have this state, and hence would go against the
      objective of this spec review round.
 ---
 *) When it comes to FSMs, one can almost always achieve the same
    result either by using an existing state with flags or by adding
    another state. It is often hard to decide which approach to follow.
    One simple test I find useful is to see how different periodic
    activities, packet and event processing are for the two states.
    If there is a considerable difference, separate states are most
    probably a good idea. In our case, the two states are practically
    the same---we don't do anything on a periodic basis (no messages,
    no outbound connections), we do not accept incoming connections,
    we ignore all events but Start. The difference is essentially in
    the reason we get to either state, which affects the name of
    the state and when the Start event is generated. IMHO it does not
    justify a new state, but that's my personal opinion :) I think
    the original text from -12 (maybe revised a bit) should be good
    enough...
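
    To make the "existing state with flags" option concrete, here is a
    minimal sketch in Python. It is only an illustration under stated
    assumptions, not code from any implementation: the names Peer,
    on_error, auto_start_due, handle_event and on_established are
    hypothetical, the 60-second initial value comes from the -12 text,
    and resetting the back-off once a session is established is my
    assumption rather than something the text says.

```python
from dataclasses import dataclass

INITIAL_DELAY = 60.0  # seconds; initial value given in the -12 text


@dataclass
class Peer:
    """Hypothetical peer record: one Idle state plus back-off bookkeeping."""

    state: str = "Idle"
    error_count: int = 0          # consecutive error-driven drops to Idle
    next_auto_start: float = 0.0  # earliest time for an automatic Start

    def on_error(self, now: float) -> None:
        """Drop to Idle after an error and push back the next automatic Start."""
        self.state = "Idle"
        self.error_count += 1
        # 60 s after the first error, doubled for each consecutive one.
        self.next_auto_start = now + INITIAL_DELAY * 2 ** (self.error_count - 1)

    def auto_start_due(self, now: float) -> bool:
        """True when an automatic Start may be generated for this peer."""
        return self.state == "Idle" and now >= self.next_auto_start

    def handle_event(self, event: str) -> None:
        """In Idle, every event except Start is ignored."""
        if self.state == "Idle" and event == "Start":
            self.state = "Connect"

    def on_established(self) -> None:
        """Assumption: a successfully established session resets the back-off."""
        self.state = "Established"
        self.error_count = 0
        self.next_auto_start = 0.0
```

    The state set is unchanged here: only the time at which Start is
    generated depends on how the peer got to Idle, which is the whole
    difference between the two states discussed above.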
    
-- 
Alex Zinin

Thursday, January 17, 2002, 5:51:35 AM, Susan Hares wrote:


> Alex and Randy:

> Let's go back to first principles here on the FSM?

> 1st)  The original text is below from draft-12.

>     This explanation gives "flag" that has implications
>     for the full state machine.   The definition of
>     state machines that Alex proposes, is all inputs
>     and all actions are determined by state machine in
>     clear text.

> 2nd) Is the bug real?

>     The text was fixing a " persistent bgp flapping"
>     bug.  According to the text there is a
>     state within a state with fairly vague
>     descriptions.

> The second question belongs to Randy, since he wears 2 hats (Operations
> and Routing temp AD), was this a real problem?  Has it gone away?

> 1) At least one routing vendor doesn't implement it [cisco]
>     and I know this vendor is utilized in BGP peering sessions
>     in the network.

> 2) What was the operational concern this text implies?

> If there is no operational issue and no operational usage,
> return to the original FSM text and out the comments on
> "hold down."

> So, Randy and all other operators - Is the problem it describes real?
> Does anyone need it?   Let's answer that question first.




> Sue Hares

> ----------------------


>            If a BGP speaker detects an error, it shuts down the connection
>           and changes its state to Idle. Getting out of the Idle state
>           requires generation of the Start event.  If such an event is
>           generated automatically, then persistent BGP errors may result
>           in persistent flapping of the speaker.  To avoid such a
>           condition it is recommended that Start events should not be
>           generated immediately for a peer that was previously
>           transitioned to Idle due to an error. For a peer that was
>           previously transitioned to Idle due to an error, the time
>           between consecutive generation of Start events, if such events
>           are generated automatically, shall exponentially increase. The
>           value of the initial timer shall be 60 seconds. The time shall
>           be doubled for each consecutive retry.

>           Any other event received in the Idle state is ignored.
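
The retry schedule in the quoted -12 text can be sketched as follows. This is an illustration only: the function name start_delays and the variable first_four are hypothetical, and no upper bound is applied because the quoted text does not give one.

```python
from itertools import islice
from typing import Iterator


def start_delays(initial: float = 60.0) -> Iterator[float]:
    """Yield the wait, in seconds, before each consecutive automatic Start."""
    delay = initial
    while True:
        yield delay
        delay *= 2  # doubled for each consecutive retry


# Waits before the first four automatic Start events after repeated errors.
first_four = list(islice(start_delays(), 4))
```

With the defaults above, the waits grow as 60, 120, 240 and 480 seconds.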



> At 11:10 AM 1/16/2002 -0800, you wrote:

>>Sue,
>>
>>  Introduction of the IdleHold does change the FSM,
>>  and I thought we wanted the spec to reflect the current
>>  running code as much as possible.
>>
>>  I agree with Russ and Ishi---the new state does not
>>  seem to be necessary, instead it could be as easy
>>  as holding the session in Idle and giving clue on
>>  how to make the delay exponential. I don't think
>>  there's an interoperability issue if people decide
>>  to keep the session in Idle using different internal
>>  mechanisms.
>>
>>  Regards,
>>
>>--
>>Alex Zinin
>>
>>Wednesday, January 16, 2002, 6:04:36 AM, Susan Hares wrote:
>>
>> > Kunihiro:
>>
>> > We are not changing the FSM so I would be surprised if the change
>> > was anything but modest. Usually specifications with "no big" deal get
>> > interpreted differently.  Inter-operable code means you tie down the
>> > details.
>>
>> > The comment was on the clarity of the specification.
>>
>> > If you have a specific comment on the text of the state machine,
>> > can you propose the concerns you have as a revision to the text.
>>
>> > Sue
>>
>> > PS -- I just love last call on a draft...  It's when
>> >        everyone finally reads a new section  ;-)...
>>
>>
>> > At 05:39 PM 1/15/2002 -0800, Kunihiro Ishiguro wrote:
>> >> >> > Is the exponential backoff in the FSM in current implementations?
>> >> >>
>> >> >> I guess we are going to find this out as part of the implementation
>> >> >> report. And if it is not in (at least two) current implementations,
>> >> >> we'll take it out of the text.
>> >> >
>> >> >I implemented Cease subcode in zebra-0.92a but not exponential backoff.
>> >> >I checked that new subcode does not put other BGP stack in trouble
>> >> >(checked Cisco and Juniper).
>> >>
>> >>First of all, sorry for talking about specific implementation.  In
>> >>Zebra implementation we've implemented Cease subcode.  The code is in
>> >>CVS repository.  I'll prepare a release version for that.
>> >>
>> >>And also we've implemented exponential backoff.  It is done without
>> >>introducing new FSM status.  It is not a big deal.  Just check a flag
>> >>in a few functions.
>>
>>
>> >>Exponential backoff is a good feature.  I can't understand why it
>> >>requires a change to the FSM.