Re: bgp4-17 Cease subcode
Susan Hares <skh@nexthop.com> Thu, 17 January 2002 18:29 UTC
Date: Thu, 17 Jan 2002 13:28:40 -0500
To: Alex Zinin <azinin@nexsi.com>
From: Susan Hares <skh@nexthop.com>
Subject: Re: bgp4-17 Cease subcode
Cc: randy Bush <randy@psg.com>, fenner@research.att.com, idr@merit.edu

Alex:

1) Thanks for your note. Glad you think the BGP exponential back-off
   is good. We should get operator input to answer the questions
   in #3. I want to go with whatever the operator community has
   deployed for this draft.

2) State machine text clean-up was more than the Idle Hold state

   Just to let you know, the FSM text was tightened up based on the
   Word document with the diagram I sent out. There were other
   inconsistencies in the FSM that I fixed because the states were
   not consistent with this diagram.

   I will generate the state machine table without the Idle Hold
   state to indicate those differences first.

   [Warning - this plus the 2 other state machines you sent me to
   review will take me a while. Warning -- day turnaround most likely]

3) BGP exponential back-off

   I would still like to hear from operators what the problem was
   and what this is being used for. If it is a serious problem, and
   people need it, we can get feedback on:

   a) Do operators use the flapping bgp speaker upon automatic
      configuration on any router in their network? If so, why?

   b) If they utilize this feature, how do they detect that the
      back-off is going on? If they do want to detect it, what do
      they want to know? How critical is it?

      - if it is critical to know about, then something that can be
        queried in a standard manner is useful in the MIB. The
        specific state may be useful.
      - if the ISPs don't use it, we'll pull the feature.
      - if it is important, but they don't want to have a state to
        manage it, then we can recombine the states.

   When the IESG calls (protocol monitors that they are), we can
   point to the market request on this issue.

   Can anyone ask ISPs about this?

4) Comparing all your input --- still processing

   I am plugging through the 2 versions of the state machine you
   sent. Did you send me a review of the diagram (Word document)?
   I'm attaching it again. I didn't find that review even though you
   mentioned you had done it. Did I miss it? That would speed up
   this process, since a 5-page form is quicker to review and glance
   over.

Sue

At 09:14 AM 1/17/2002 -0800, you wrote:
>Sue,
>
>  To clarify my position: I think that recommendation for an
>  exponential back-off is a good thing and solves a real
>  problem. However, I think that the IdleHold state should not
>  be in the FSM because:
>
>   1) It seems to be redundant. See (*) below.
>   2) It introduces changes to the base FSM to account for
>      a feature of an optional nature. People who didn't
>      implement it would be put in a non-compliant position,
>      people who won't implement it will have to take things
>      out of the FSM.
>   3) I am under impression (though I might be wrong) that
>      it does not match the majority of deployed implementations
>      that do not have this state and hence would go against the
>      objective of this spec review round.
>  ---
>  *) When it comes to FSMs, one almost always can achieve the same
>  results by either using an existing state with flags or by adding
>  another state. It is often hard to decide which approach to follow.
>  One simple test I find useful is to see how different periodic
>  activities, packet and event processing are for the two states.
>  If there is a considerable difference, separate states are most
>  probably a good idea. In our case, the two states are practically
>  the same---we don't do anything on a periodic basis (no messages,
>  no outbound connections), we do not accept incoming connections,
>  we ignore all events but Start. The difference is essentially in
>  the reason we get to either state, which affects the name of
>  the state and when the Start event is generated. IMHO it does not
>  justify a new state, but that's my personal opinion :) I think
>  the original text from -12 (maybe revised a bit) should be good
>  enough...
>
>--
>Alex Zinin
>
>Thursday, January 17, 2002, 5:51:35 AM, Susan Hares wrote:
>
> > Alex and Randy:
>
> > Let's go back to first principles here on the FSM?
>
> > 1st) The original text is below from draft-12.
>
> >      This explanation gives "flag" that has implications
> >      for the full state machine. The definition of
> >      state machines that Alex proposes, is all inputs
> >      and all actions are determined by state machine in
> >      clear text.
>
> > 2nd) Is the bug real?
>
> >      The text was fixing a "persistent bgp flapping"
> >      bug. According to the text there is a
> >      state within a state with fairly vague
> >      descriptions.
>
> > The second question belongs to Randy, since he wears 2 hats (Operations
> > and Routing temp AD), was this a real problem? Has it gone away?
>
> >      1) At least one routing vendor doesn't implement it [cisco]
> >         and I know this vendor is utilized in BGP peering sessions
> >         in the network.
>
> >      2) What was the operational concern this text implies?
>
> >      If there is no operational issue and no operational usage,
> >      return to the original FSM text and take out the comments on
> >      "hold down."
>
> > So, Randy and all other operators - Is the problem it describes real?
> > Does anyone need it? Let's answer that question first.
>
> > Sue Hares
>
> > ----------------------
>
> >    If a BGP speaker detects an error, it shuts down the connection
> >    and changes its state to Idle. Getting out of the Idle state
> >    requires generation of the Start event. If such an event is
> >    generated automatically, then persistent BGP errors may result
> >    in persistent flapping of the speaker. To avoid such a
> >    condition it is recommended that Start events should not be
> >    generated immediately for a peer that was previously
> >    transitioned to Idle due to an error. For a peer that was
> >    previously transitioned to Idle due to an error, the time
> >
> > Expiration Date July 2001                              [Page 31]
> >
> > RFC DRAFT                                           January 2001
> >
> >    between consecutive generation of Start events, if such events
> >    are generated automatically, shall exponentially increase. The
> >    value of the initial timer shall be 60 seconds. The time shall
> >    be doubled for each consecutive retry.
>
> >    Any other event received in the Idle state is ignored.
>
> > At 11:10 AM 1/16/2002 -0800, you wrote:
>
> >>Sue,
> >>
> >>  Introduction of the IdleHold does change the FSM,
> >>  and I thought we wanted the spec to reflect the current
> >>  running code as much as possible.
> >>
> >>  I agree with Russ and Ishi---the new state does not
> >>  seem to be necessary, instead it could be as easy
> >>  as holding the session in Idle and giving clue on
> >>  how to make the delay exponential. I don't think
> >>  there's an interoperability issue if people decide
> >>  to keep the session in Idle using different internal
> >>  mechanisms.
> >>
> >>  Regards,
> >>
> >>--
> >>Alex Zinin
> >>
> >>Wednesday, January 16, 2002, 6:04:36 AM, Susan Hares wrote:
> >>
> >> > Kunihiro:
> >>
> >> > We are not changing the FSM so I would be surprised if the change
> >> > was anything but modest. Usually specifications with "no big" deal get
> >> > interpreted differently. Inter-operable code means you tie down the
> >> > details.
> >>
> >> > The comment was on the clarity of the specification.
> >>
> >> > If you have a specific comment on the text of the state machine,
> >> > can you propose the concerns you have as a revision to the text.
> >>
> >> > Sue
> >>
> >> > PS -- I just love last call on a draft... It's when
> >> > everyone finally reads a new section ;-)...
> >>
> >> > At 05:39 PM 1/15/2002 -0800, Kunihiro Ishiguro wrote:
> >> >> >> > Is the exponential backoff in the FSM in current implementations?
> >> >> >>
> >> >> >> I guess we are going to find this out as part of the implementation
> >> >> >> report. And if it is not in (at least two) current implementations,
> >> >> >> we'll take it out of the text.
> >> >> >
> >> >> >I implemented Cease subcode in zebra-0.92a but not exponential
> >> >> >backoff.
> >> >> >I checked that new subcode does not put other BGP stack in trouble
> >> >> >(checked Cisco and Juniper).
> >> >>
> >> >>First of all, sorry for talking about specific implementation. In
> >> >>Zebra implementation we've implemented Cease subcode. The code is in
> >> >>CVS repository. I'll prepare a release version for that.
> >> >>
> >> >>And also we've implemented exponential backoff. It is done without
> >> >>introducing new FSM status. It is not a big deal. Just check a flag
> >> >>in a few functions.
> >> >>
> >> >>Exponential backoff is a good feature. I can't understand why it
> >> >>requires a change to FSM.
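
For reference, a minimal sketch of the back-off rule in the quoted draft-12 text (initial timer of 60 seconds, doubled for each consecutive retry), done the way Alex and Kunihiro describe: the peer stays in Idle and a counter serves as the flag, with no extra FSM state. The class and method names are hypothetical and not taken from Zebra or any other implementation. Resetting the counter once a session is established is an assumption, and so is the absence of an upper bound on the delay; the quoted text specifies neither.

```python
from dataclasses import dataclass

INITIAL_DELAY = 60  # seconds; "the value of the initial timer shall be 60 seconds"


@dataclass
class IdlePeer:
    """Hypothetical per-peer record kept while the peer sits in Idle."""

    # Nonzero means the peer entered Idle because of an error (the "flag").
    consecutive_errors: int = 0

    def on_error(self) -> None:
        # An error shut the connection down and moved the peer back to Idle.
        self.consecutive_errors += 1

    def on_established(self) -> None:
        # Assumption: a successful session clears the back-off.
        self.consecutive_errors = 0

    def next_start_delay(self) -> int:
        """Seconds to wait before generating the next automatic Start event."""
        if self.consecutive_errors == 0:
            return 0  # no prior error: Start may be generated immediately
        # 60 s after the first error, doubled for each consecutive retry.
        return INITIAL_DELAY * 2 ** (self.consecutive_errors - 1)


if __name__ == "__main__":
    peer = IdlePeer()
    for _ in range(4):
        peer.on_error()
        print(peer.consecutive_errors, peer.next_start_delay())
```

Nothing here is visible to the remote peer, which is the point made above about interoperability: the delay can be computed from a flag or from a separate IdleHold state without changing what goes on the wire.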