Re: bgp4-17 Cease subcode

Susan Hares <skh@nexthop.com> Tue, 15 January 2002 21:31 UTC

Message-Id: <5.0.0.25.0.20020115155854.04a7cd68@mail.nexthop.com>
X-Sender: skh@mail.nexthop.com
X-Mailer: QUALCOMM Windows Eudora Version 5.0
Date: Tue, 15 Jan 2002 16:30:18 -0500
To: Russ White <riw@cisco.com>
From: Susan Hares <skh@nexthop.com>
Subject: Re: bgp4-17 Cease subcode
Cc: Susan Hares <skh@nexthop.com>, Alex Zinin <azinin@nexsi.com>, Eric Gray <eric.gray@sandburst.com>, Inter-Domain Routing Mailing List <idr@merit.edu>
In-Reply-To: <Pine.GSO.4.21.0201151500510.20905-100000@ruwhite-u10.cisco.com>
References: <5.0.0.25.0.20020115144930.04a32a00@mail.nexthop.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format="flowed"
X-NextHop-MailScanner: Found to be clean
Sender: owner-idr@merit.edu
Precedence: bulk

Russ:

Can I change my mind on opening the FSM conversation?
Two people asking the same question encouraged me to do this.
Thanks for the input.  (keep those email terminals smokin' with
the input ;-))....

How about a 3rd option --- Let's go back to the problem set
and fix the text from there.

         Problem 1  - persistent flapping (bad thing), recommended
                      fix: IDLE hold timer

         Problem 2  - FSM text doesn't specify how this feature
                      works from all states.  Result:
                      interoperability between implementations is
                      problematic.  [another bad thing]

If people are deploying BGP without the recommended (but
not required) IDLE hold timer for persistent flapping,
then this specification should allow it. [filter of what is deployed]


Option 3 - fix both with background text that says:
                 1) recommended to have IDLE Hold timer, and
                 2) Not required.

Now to the text:

Fix to problem 1 -  FSM does not require you to implement
the hold-off.

How about replacing the following text at the IDLE state:

IDLE
=============

 From this text:
 >A manual start event is a start event initiated
 >by an operator. An automatic start event is
 >a start event generated by the system.

The text can be: a manual start event is intended to be
a start event generated by "manual" intervention.  If the
persistent flapping suppression option is not set,
all start events will be manual start events.

How about re-adding this section at the bottom
of the state machine?


Manual versus Automatic start in the FSM

If a BGP speaker detects an error, it shuts down the
connection and changes its state to Idle-hold. Getting out
of the Idle state requires generation of the Start event.
Such a start event can either be manual or automatic.

If this start event is generated automatically, then
persistent BGP errors may result in persistent flapping
of the speaker.  To avoid such a condition it is
recommended that the Start events should not be generated
immediately for a peer that was previously transitioned
to Idle due to an error.   For a peer that was previously
transitioned to Idle due to an error, the time between
consecutive generation of Start events, if such events are
generated automatically, shall exponentially increase. The
value of the initial timer shall be 60 seconds. The time shall
be doubled for each consecutive retry. This exponential
backoff is expressed by the formula:

         IDLE hold time = 2**(connectRetryCnt)*60
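
To make the arithmetic concrete, here is a rough sketch of that formula in
Python. It is only an illustration, not proposed draft text: the function
name idle_hold_time and the base_seconds parameter are made up for this
example, and it assumes connectRetryCnt is 0 for the first automatic retry
after an error, so that the first wait is the 60-second initial value.

```python
def idle_hold_time(connect_retry_cnt: int, base_seconds: int = 60) -> int:
    """Seconds to wait before the next automatic Start event.

    Hypothetical helper: assumes connect_retry_cnt is 0 for the first
    automatic retry after an error, so the first wait equals base_seconds.
    """
    if connect_retry_cnt < 0:
        raise ValueError("connect_retry_cnt must be non-negative")
    # IDLE hold time = 2**(connectRetryCnt) * 60
    return (2 ** connect_retry_cnt) * base_seconds


if __name__ == "__main__":
    # Show how the wait doubles on each consecutive retry.
    for retries in range(5):
        print(retries, idle_hold_time(retries))
```

Under that assumption the waits run 60, 120, 240, 480, 960 seconds for the
first five retries. The text above does not mention an upper bound, so the
sketch does not apply one either.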


----------------

What do you think?


Sue Hares



At 03:01 PM 1/15/2002 -0500, Russ White wrote:

>So, then it sounds like the way we should approach this (the
>idlehold state in the fsm) is to think about it separately from
>the draft 17 issue (?).
>
>Russ
>
>On Tue, 15 Jan 2002, Susan Hares wrote:
>
> >
> > Alex:
> >
> > I know you have been active on the mailing list.
> > And you didn't read it before the last call on the FSM.
> > And this question was asked on the list because I
> > wondered about it.
> >
> > ;-)... <giggle on>  --- I guess not even great people are
> > perfect.  <giggle off>
> >
> > OK, now I've got 2 more against (4 total).  And 15-20 votes
> > for it inside the draft.
> >
> > I'm sure your vote is a bit more specific.  :-)...
> >
> > Is the issue you want to pull exponential out of the draft?
> > This is a change to the BGP specification.  This draft
> > is about corrections not changes in functionality.
> >
> > What do we do?
> >
> > Sue Hares
> >
> >
> >
> > At 10:30 AM 1/15/2002 -0800, Alex Zinin wrote:
> >
> > >Russ, Eric,
> > >
> > >  I tend to agree here...
> > >  I'm currently reviewing the FSM and the impression I'm
> > >  getting is that we can go without the IdleHold state
> > >  and say that implementations may/should use some local mechanisms
> > >  to hold BGP sessions in the *Idle* state to avoid excessive
> > >  session flapping. I think this will be simpler and will
> > >  match current implementations better...
> > >
> > >--
> > >Alex Zinin
> > >
> > >Tuesday, January 15, 2002, 7:21:06 AM, Russ White wrote:
> > >
> > >
> > > > Something like: "BGP implementations can/should/must
> > > > (?) implement some method to prevent continuous flapping of
> > > > peering sessions at a high rate," and then a footnote explaining
> > > > that an exponential backoff is one such possible method?
> > >
> > > > Russ
> > >
> > > > On Tue, 15 Jan 2002, Eric Gray wrote:
> > >
> > > >> Russ,
> > > >>
> > > >>     Very good point.  However, how would you represent "do some
> > > >> private magic here" in an FSM?  That may make it the dreaded ISM.
> > > >> Perhaps it might be sufficient to remove this from the FSM and
> > > >> add a footnote (possibly mentioning an exponential back-off as an
> > > >> example?).
> > > >>
> > > >> You wrote:
> > > >>
> > > >> > Well, I just took it as 'do people do this?' I agree that it
> > > >> > won't cause interop problems either way--it's actually something
> > > >> > that's implementation local, so I'm not certain why the
> > > >> > exponential backoff would be in the fsm (?). There are, in other
> > > >> > words, other ways I could imagine handling this problem that
> > > >> > wouldn't affect interoperability as well....
> > > >> >
> > > >> > :-)
> > > >> >
> > > >> > Russ
> > > >> >
> > > >> > On Tue, 15 Jan 2002, Eric Gray wrote:
> > > >> >
> > > >> > > Russ,
> > > >> > >
> > > >> > >     I don't think that NAKs are in order on this question - even
> > > >> > > from the 1500 pound dragon.  :-)
> > > >> > >
> > > >> > >     The fact that anyone's implementation doesn't do X is
> > > >> > > important only if not doing X causes interoperability problems
> > > >> > > with implementations that do X.   That is not the case here,
> > > >> > > I believe...
> > > >> > >
> > > >> > > You wrote:
> > > >> > >
> > > >> > > > > > On Mon, Jan 14, 2002 at 09:28:53AM -0800, Yakov Rekhter wrote:
> > > >> > > > > > > Please remember that the goal of the draft is to document
> > > >> > > > > > > what is *currently* implemented and deployed, *not* what
> > > >> > > > > > > *could* be implemented and deployed.
> > > >> > > > > >
> > > >> > > > > > Is the exponential backoff in the FSM in current implementations?
> > > >> > > > >
> > > >> > > > > I guess we are going to find this out as part of the
> > > >> > > > > implementation report. And if it is not in (at least two)
> > > >> > > > > current implementations, we'll take it out of the text.
> > > >> > > >
> > > >> > > > Cisco doesn't do this....
> > > >> > > >
> > > >> > > > :-)
> > > >> > > >
> > > >> > > > Russ
> > > >> > > >
> > > >> > > > _____________________________
> > > >> > > > riw@cisco.com <>< Grace Alone
> > > >> > >
> > > >> > > --
> > > >> > > Eric Gray (mailto:eric.gray@sandburst.com)
> > > >> > > http://www.mindspring.com/~ewgray
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> >
> > > >> > _____________________________
> > > >> > riw@cisco.com <>< Grace Alone
> > > >>
> > > >> --
> > > >> Eric Gray (mailto:eric.gray@sandburst.com)
> > > >> http://www.mindspring.com/~ewgray
> > > >>
> > > >>
> > > >>
> > >
> > > > _____________________________
> > > > riw@cisco.com <>< Grace Alone
> >
> >
>
>_____________________________
>riw@cisco.com <>< Grace Alone