Re: Congestion Avoidance & Control for OSPF Networks <draft-ash-manral-ospf-congestion-control-00.txt>

"Cheng, Dean" <DCheng@POLARISNETWORKS.COM> Mon, 08 July 2002 18:08 UTC

MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain; charset="iso-8859-1"
Message-ID: <DFC78D6E417DD411A26F0001021D63869016B0@JUPITER>
Date: Mon, 08 Jul 2002 11:07:57 -0700
Reply-To: Mailing List <OSPF@DISCUSS.MICROSOFT.COM>
Sender: Mailing List <OSPF@DISCUSS.MICROSOFT.COM>
From: "Cheng, Dean" <DCheng@POLARISNETWORKS.COM>
Subject: Re: Congestion Avoidance & Control for OSPF Networks <draft-ash-manral-ospf-congestion-control-00.txt>
Comments: To: "Ash, Gerald R (Jerry), ALASO" <gash@ATT.COM>
To: OSPF@DISCUSS.MICROSOFT.COM
Precedence: list

Jerry et al.,

   I just have a general comment on your I-D.

   Congestion and overload problems in networks where
   LS protocols are used can have quite a few causes
   that have nothing to do with the LS protocols
   themselves, such as:

     1) Implementation faults.

     2) Configuration problems.
        This includes incorrect settings of protocol
        parameters, such as timer values.

     3) Network design/planning problems.
        This includes, for example, too many nodes in
        a single area/peer-group.

   It would be better for your I-D to first list all the
   non-protocol-related problems, along with suggestions
   for addressing them (I believe that would resolve most
   of the problems seen), before looking at the protocols
   themselves. Enhancements to existing protocols, if any,
   may not be able to resolve networking problems
   that are caused by any one of the above.

   Also, the routing protocol used in the Frame Relay
   network where a failure occurred (the example
   given in Section 2) is actually not a link-state
   protocol at all.

Regards,
Dean

> -----Original Message-----
> From: Ash, Gerald R (Jerry), ALASO [mailto:gash@ATT.COM]
> Sent: Wednesday, July 03, 2002 12:20 PM
> To: OSPF@DISCUSS.MICROSOFT.COM
> Subject: Re: Congestion Avoidance & Control for OSPF Networks
> <draft-ash-manral-ospf-congestion-control-00.txt>
>
>
> Hello John, All:
>
> Hopefully by now you've had a chance to review our latest draft
> on OSPF congestion control
> http://search.ietf.org/internet-drafts/draft-ash-manral-ospf-congestion-control-00.txt.
>
> As requested (by John), we've narrowed the scope of the work
> to propose specific congestion control mechanisms not already
> addressed by other documents in the OSPF WG.
>
> These mechanisms will prevent control overloads from bringing
> down a network, as they have in the past (I give a brief
> background below).
>
> We request your comments on our I-D.  We also request your
> suggestions for advancing the work in the working group.
>
> Thanks,
> Jerry Ash
>
> Brief background:
>
> AT&T has suffered a few massive failures of operational
> networks due to control overloads of link-state (LS)
> protocols ('OSPF', 'PNNI', etc.).  These outages are
> documented and referenced in
> http://search.ietf.org/internet-drafts/draft-ash-manral-ospf-congestion-control-00.txt and in
> http://search.ietf.org/internet-drafts/draft-ash-ospf-isis-congestion-control-02.txt.
>
> In the instances cited, the LS protocol overwhelmed the network with a
> control load 'storm' ('LSA overload'), which brought the network down, and
> then prevented its recovery.  Fortunately such failures are very rare;
> however, 'rare' for such events is unacceptable, 'never' is the goal.  Other
> service providers have experienced similar outages caused by similar
> problems.
>
> Such failures are not the fault of the service provider operation or the
> vendor/equipment implementation.  They are due to shortcomings in the
> link-state protocols themselves -- thus the need for the enhancements
> proposed in the draft.
>
> The proposals in the draft will prevent such events from being triggered,
> and/or provide recovery mechanisms in case such events occur.
>
> The problem of control overload is becoming even more acute as LS protocols
> are enhanced to support new capabilities, such as:
>
> MPLS traffic engineering
> http://search.ietf.org/internet-drafts/draft-katz-yeung-ospf-traffic-06.txt,
> GMPLS
> http://www.ietf.org/internet-drafts/draft-ietf-ccamp-ospf-gmpls-extensions-07.txt,
> multi-area TE
> http://www.ietf.org/internet-drafts/draft-cheng-ccamp-ospf-multiarea-te-extensions-00.txt,
> MPLS/DiffServ TE
> http://search.ietf.org/internet-drafts/draft-ietf-tewg-diff-te-reqts-05.txt,
> etc.
>
> With this ever advancing complexity, the need keeps increasing to address
> the stated problem, soon.