Re: OSPF WG Charter Proposal

"Choudhury, Gagan L, ALASO" <gchoudhury@ATT.COM> Fri, 08 November 2002 00:14 UTC

It is great to see so much discussion of the OSPF Scalability issues (including flooding optimization, prioritized treatment of Hellos so that adjacencies are not lost under congestion, congestion notification to neighbors with control messages slowed down in response, etc.).

It appears that people mostly agree that operational networks do occasionally see large-scale flooding of control messages, or an LSA storm (triggered by hardware failure, software bugs, faulty operational practice, and so on).  (I have personally observed such LSA storms and the resulting CPU/memory congestion.)  In some cases the network recovers from the flooding with little difficulty, but we all know that there are also cases that result in failures of many nodes and trunks and in loss of traffic, which is absolutely unacceptable from an operator's point of view.  People might also agree that there is an LSA storm threshold (or "cliff"), and that a near-simultaneous burst of LSAs exceeding this threshold may cause problems.  Ideally, this threshold should be as large as the LSA database size even in very large networks (with potentially many ASE LSAs, and many Traffic-Engineering-related LSAs in upcoming MPLS networks), but in reality that does not appear to be the case.

The various proposals for OSPF Scalability improvements aim to move the LSA storm threshold (or "cliff") significantly upwards.  Some people feel that this should be done only through smart (proprietary) implementations.  However, the points against that are the following:

1) It may be OK to rely on a vendor's smart proprietary implementation if we run a single-vendor network.  Operators, however, intend to run multi-vendor networks with standard, non-proprietary solutions to scalability problems.  As an example, Dave Katz points out that it is very important not to lose adjacencies during congestion.  We absolutely agree with that and propose either prioritization of Hello messages (facilitated by special marking) or Implicit Hello (treating any message received over an adjacency as evidence that it is alive) to achieve the same goal in a multi-vendor network.  We have also seen a large number of retransmissions (the default retransmission timer is 5 seconds) act as a major cause of congestion, and we propose prioritizing LSA acknowledgments and slowing down retransmissions during congestion to reduce retransmission traffic.  A specific vendor can perhaps achieve this in a proprietary way, but an Operator would like to see a non-proprietary solution to this problem.

2) It has been argued that some of the proposed protocol extensions would be too difficult for vendors to implement.  I don't quite get it.  Isn't it also being said that some vendors are already implementing them in a proprietary way?

3) Some of the proposed protocol extensions are difficult (maybe even impossible) to implement in a purely proprietary way.  One example is congestion notification (already pointed out by Dave Katz).  Another example is flooding over only one of many parallel point-to-point links between neighbors (this is already being done in PNNI in a non-proprietary way).
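
To make the Implicit Hello and retransmission slow-down ideas from point 1 concrete, here is a minimal Python sketch.  It is illustrative only: the class, field, and method names are hypothetical and not taken from any OSPF implementation, the 40-second dead interval is the commonly used default rather than anything mandated here, and the doubling-with-a-cap backoff is just one assumed way of "slowing down" retransmissions, not a mechanism specified in the proposals.

```python
from dataclasses import dataclass


@dataclass
class Adjacency:
    """Hypothetical per-neighbor state (all names are illustrative)."""

    implicit_hello: bool = True      # treat any received packet as a keepalive
    dead_interval: float = 40.0      # seconds; commonly used default (assumption)
    base_rxmt_interval: float = 5.0  # seconds; default retransmission timer
    max_rxmt_interval: float = 40.0  # assumed cap on the slowed-down timer
    last_heard: float = 0.0          # time the adjacency was last refreshed
    congested: bool = False          # set when congestion is detected or signalled
    rxmt_attempts: int = 0           # retransmissions already sent for an LSA

    def on_packet(self, now: float, is_hello: bool) -> None:
        # Without Implicit Hello only Hellos refresh the adjacency;
        # with it, any packet from the neighbor (update, ack, ...) does.
        if is_hello or self.implicit_hello:
            self.last_heard = now

    def is_alive(self, now: float) -> bool:
        # The adjacency is declared down once nothing that counts as a
        # keepalive has been heard for dead_interval seconds.
        return now - self.last_heard < self.dead_interval

    def next_rxmt_interval(self) -> float:
        # Normal operation: fixed retransmission timer.
        if not self.congested:
            return self.base_rxmt_interval
        # Under congestion: double the timer per attempt, up to the cap
        # (an assumed policy chosen only for illustration).
        slowed = self.base_rxmt_interval * (2 ** self.rxmt_attempts)
        return min(slowed, self.max_rxmt_interval)
```

For example, if the last Hello arrived at t=0 and an LSA update arrives at t=35 while Hellos are being dropped, an adjacency with implicit_hello=True is still alive at t=60, whereas one with implicit_hello=False has already timed out.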

        	
                        Gagan Choudhury