Re: OSPF WG Charter Proposal

"Joel M. Halpern" <joel@STEVECROCKER.COM> Thu, 07 November 2002 15:44 UTC

Received: from cherry.ease.lsoft.com (cherry.ease.lsoft.com [209.119.0.109]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id KAA28708 for <ospf-archive@LISTS.IETF.ORG>; Thu, 7 Nov 2002 10:44:45 -0500 (EST)
Received: from walnut (209.119.0.61) by cherry.ease.lsoft.com (LSMTP for Digital Unix v1.1b) with SMTP id <11.007B79E1@cherry.ease.lsoft.com>; Thu, 7 Nov 2002 10:46:47 -0500
Received: from DISCUSS.MICROSOFT.COM by DISCUSS.MICROSOFT.COM (LISTSERV-TCP/IP release 1.8e) with spool id 329696 for OSPF@DISCUSS.MICROSOFT.COM; Thu, 7 Nov 2002 10:46:47 -0500
Received: from 208.184.15.238 by WALNUT.EASE.LSOFT.COM (SMTPL release 1.0f) with TCP; Thu, 7 Nov 2002 10:36:47 -0500
Received: from [63.113.114.131] (HELO JLaptop.stevecrocker.com) by EXECDSL.COM (CommuniGate Pro SMTP 3.3) with ESMTP id 3933007 for OSPF@DISCUSS.MICROSOFT.COM; Thu, 07 Nov 2002 10:36:46 -0500
X-Sender: joel@stevecrocker.com@mail.stevecrocker.com
X-Mailer: QUALCOMM Windows Eudora Version 5.1
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format="flowed"
Message-ID: <5.1.0.14.0.20021107102407.01642ec8@mail.stevecrocker.com>
Date: Thu, 07 Nov 2002 10:34:53 -0500
Reply-To: Mailing List <OSPF@DISCUSS.MICROSOFT.COM>
Sender: Mailing List <OSPF@DISCUSS.MICROSOFT.COM>
From: "Joel M. Halpern" <joel@STEVECROCKER.COM>
Subject: Re: OSPF WG Charter Proposal
To: OSPF@DISCUSS.MICROSOFT.COM
In-Reply-To: <28F05913385EAC43AF019413F674A0170167B229@OCCLUST04EVS1.ugd.att.com>
Precedence: list

There seem to be multiple things mixed together in this discussion, some of
which are nicely separated in the draft below.

One set of things is behaviors like pacing link start, pacing flooding,
marking critical packets, etc.  These have a number of useful
properties.  They improve the overall result without changing the basic
mechanisms.  They do not introduce significant risk of interaction between
mechanisms.
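As an illustration of the kind of low-risk behavior meant here, flood
pacing just spaces out transmissions to a neighbor rather than sending a
burst of LSAs at once.  The sketch below is a hypothetical minimal model
(the class name, the 33 ms gap, and the millisecond-clock interface are
all illustrative choices, not taken from any RFC or implementation):

```python
import collections

class FloodPacer:
    """Hypothetical sketch of LSA flood pacing: enforce a minimum
    inter-packet gap toward a neighbor instead of bursting.  The
    interval and interface are illustrative only."""

    def __init__(self, min_gap_ms=33):
        self.min_gap_ms = min_gap_ms       # e.g. ~30 packets/sec
        self.queue = collections.deque()   # LSAs awaiting transmission
        self.next_send_ms = 0              # earliest allowed send time

    def enqueue(self, lsa):
        self.queue.append(lsa)

    def poll(self, now_ms):
        """Return the next LSA if the pacing gap has elapsed, else None."""
        if self.queue and now_ms >= self.next_send_ms:
            self.next_send_ms = now_ms + self.min_gap_ms
            return self.queue.popleft()
        return None
```

The point of such a mechanism is exactly what the paragraph above says:
the flooding semantics are unchanged, only the timing is smoothed, so
there is little opportunity for interaction with other mechanisms.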

A second set of things mentioned are issues like hitless restart.  While
these have more impact, they have benefits in a number of different
regards, and can be evaluated on their own merits.

Then there are manual mechanisms to help flooding in very meshy
topologies.  While somewhat dangerous, these have proved
necessary.  Because they are manual, they can be applied with sensitivity
to their overall topology impact.

Then there are automatic information distribution change mechanisms.  This
covers such things as automatic area change and alternative flooding
algorithms.  These are extremely dangerous.  They interact very strongly with
the basic robustness mechanisms of OSPF.  They introduce significant
additional complexity in many regards.  I strongly suggest that we stay
away from any such behaviors, and ensure that our charter keeps us away
from them.

Yours,
Joel M. Halpern

At 08:27 AM 11/7/2002 -0500, Ash, Gerald R (Jerry), ALASO wrote:
>Dave,
>
> > > Such failures are not the fault of the service provider
> > > operation or the vendor/equipment implementation.  They are
> > > due to shortcomings in the link-state protocols themselves --
> > > thus the need for the enhancements proposed in the draft.
>
> > I strongly disagree with this statement.  While the design of the
> > protocols can make it challenging, there is ample room in
> > implementation to provide stable and scalable networks.
> >
> > When a network collapses, the fault lies at the feet of the
> > implementers.  In every case I've seen (too many), the collapse was
> > inevitable sooner or later, due to naive design choices in software,
> > but at the same time was quite nonlinear in its onset (making any
> > predictive or self-monitoring approach pretty hopeless.)
> >
> > There are some things that would make the job easier, at the cost
> > of additional complexity, but pointing at network collapses
> > and blaming the protocols is disingenuous.
>
>I think you should review the ample evidence presented in
>http://www.ietf.org/internet-drafts/draft-ash-manral-ospf-congestion-control-00.txt
>that the protocols need to be enhanced to better respond to congestion
>collapse:
>
>- Section 2: documented failures and their root-cause analysis, across
>multiple service provider networks (also review the cited references)
>- Appendix B: vendor analysis of a realistic failure scenario similar to
>one experienced as discussed in Section 2 (perhaps you would like to
>provide your own analysis of this scenario based on your OSPF implementation)
>- Appendix C: simulation analysis of protocol performance (other I-D's
>being discussed provide analysis of proposed protocol extensions)
>
>To say that network collapse in *every* case is due to *naive design
>choices* ignores the evidence/analysis presented.  Based on the
>evidence/analysis, there is clearly room for the protocols to be improved
>to the point where networks *never* go down for hours or days at a time
>(drawing unwanted headlines & business impact).
>
>Jerry