Re: OSPF WG Charter Proposal
"Choudhury, Gagan L, ALASO" <gchoudhury@ATT.COM> Mon, 11 November 2002 17:42 UTC
Date: Mon, 11 Nov 2002 12:45:08 -0500
Reply-To: Mailing List <OSPF@DISCUSS.MICROSOFT.COM>
Sender: Mailing List <OSPF@DISCUSS.MICROSOFT.COM>
From: "Choudhury, Gagan L, ALASO" <gchoudhury@ATT.COM>
Subject: Re: OSPF WG Charter Proposal
To: OSPF@DISCUSS.MICROSOFT.COM
Dennis,

Thanks for your message and for taking the time to read our draft. You point out, quite correctly, that adjacencies should not be taken down in the presence of a spike of (non-Hello, non-adjacency-related) database traffic, i.e., LSUs. I suspect you do this by looking at the OSPF packet header: if the packet is a Hello, you process it ahead of other packets (LSUs). This way, even during a large spike of LSUs, you never miss a Hello and the adjacency stays up. We call this "Prioritized Treatment of Hello Packets". We do not doubt that you do it, but since it is not stated explicitly in OSPF, we feel it is important to point out explicitly the importance of prioritizing Hello packets. We suggest a special marking so that Hello packets can easily be separated from LSUs and prioritized; however, even without special marking, if you can easily separate out Hello packets and process them ahead of LSUs, that is perfectly all right.

We did the simulation study to show what happens if Hellos are processed at the same priority as incoming LSUs (and so can get behind many LSUs under congestion); we did not mean to say "implementations always do this". In fact, we say in the abstract that some implementations are perhaps already prioritizing Hellos in order not to lose adjacencies, but we want all implementations to do so. In the simulation, once we prioritized Hellos, adjacencies were never declared down (as expected). However, at a larger LSA-storm threshold there was a large number of retransmissions, which caused sustained heavy congestion at a subset of the node processors (the ones with high adjacency). (We used fixed retransmission timer values of 5 or 10 seconds throughout the simulation.)
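The "Prioritized Treatment of Hello Packets" idea above can be sketched roughly as follows. This is an illustrative Python sketch, not code from the draft or from any vendor implementation; the class and constant names are invented. It classifies each received packet by the OSPF type field (byte 1 of the common header; type 1 is Hello, type 4 is LS Update) and always serves queued Hellos before anything else:

```python
import heapq

# OSPF packet types from the common header (RFC 2328, A.3.1)
OSPF_HELLO = 1
OSPF_LS_UPDATE = 4

HELLO_PRIORITY = 0    # lower value = served first
DEFAULT_PRIORITY = 1

def packet_type(pkt: bytes) -> int:
    """Read the type field from a raw OSPF packet (byte 1 of the header)."""
    return pkt[1]

class PriorityInbox:
    """Hypothetical receive queue that serves Hellos ahead of everything else."""
    def __init__(self):
        self._heap = []
        self._seq = 0    # tie-breaker preserves FIFO order within a priority

    def enqueue(self, pkt: bytes) -> None:
        prio = HELLO_PRIORITY if packet_type(pkt) == OSPF_HELLO else DEFAULT_PRIORITY
        heapq.heappush(self._heap, (prio, self._seq, pkt))
        self._seq += 1

    def dequeue(self) -> bytes:
        # Always returns the oldest queued Hello if one exists,
        # otherwise the oldest queued non-Hello packet.
        return heapq.heappop(self._heap)[2]
```

A real router would presumably do this classification very early (driver or interrupt level), before any per-packet OSPF processing, so that Hellos survive even when the LSU backlog is being dropped.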
In this case, even though links were not being declared down, the sustained heavy congestion due to retransmissions is unacceptable, so we call this a "cliff" even though it is not as harmful as losing adjacencies. We show by simulation that the retransmission traffic may be reduced significantly (and the "cliff" therefore moved to the right significantly) by prioritizing Acknowledgment packets. However, this is not necessarily the only solution; slowing down retransmissions under congestion has beneficial effects as well (those simulation results are not reported yet). Again, we are not saying that implementations take no special measures to reduce retransmission traffic under heavy congestion. However, these measures are not spelled out explicitly in OSPF, and they are important enough to mention explicitly so that all implementations can benefit from them.

You wondered about the cause of the problem, i.e., an LSA storm (the near-synchronous update of a very large number of LSAs). It can be caused by one or more link failures, one or more node failures, or the sudden synchronized generation of a large number of LSAs (by coincidence, or triggered by a software bug or an improper operational procedure). The problem happens very rarely, but it does happen. Also, it is a problem only in large networks with, say, hundreds of nodes, thousands of trunks, very high adjacency at some nodes, and a very large number of LSAs, e.g., many ASE LSAs or many LSAs carrying link reserved-bandwidth information (similar to traffic-engineering LSAs).
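The "slowing down retransmissions under congestion" idea mentioned above could be as simple as exponential backoff of the retransmission interval. The sketch below is our own invented illustration, not a mechanism from the draft (whose simulations used fixed 5- or 10-second timers) or from the OSPF specification:

```python
class RetransmitTimer:
    """Hypothetical per-neighbor LSA retransmission timer with
    exponential backoff: each timeout doubles the interval (up to a cap)
    so retransmissions do not pile onto an already congested neighbor;
    an ack resets it to the base value."""
    def __init__(self, base: float = 5.0, cap: float = 40.0):
        self.base = base
        self.cap = cap
        self.interval = base

    def on_timeout(self) -> float:
        # No ack arrived in time: back off before retransmitting again.
        self.interval = min(self.cap, self.interval * 2)
        return self.interval

    def on_ack(self) -> None:
        # The neighbor is keeping up again; return to the normal interval.
        self.interval = self.base
```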
Gagan Choudhury

-----Original Message-----
From: Dennis Ferguson [mailto:dennis@JUNIPER.NET]
Sent: Saturday, November 09, 2002 12:05 AM
To: OSPF@DISCUSS.MICROSOFT.COM
Subject: Re: OSPF WG Charter Proposal

Hello,

I just read draft-ietf-ospf-scalability-02.txt, along with draft-ash-manral-ospf-congestion-control-00.txt, which seems to be related, and find I am made a little uncomfortable by the way the problem(s) is being described, and hence by the inevitability (or even desirability) of the solutions offered. Please excuse me if this has been discussed before; I skimmed and deleted a bunch of messages about this before actually reading the documents and discovering that I cared.

> The various proposals for OSPF Scalability improvements are to move
> the LSA storm threshold (or "cliff") significantly upwards. Some people
> feel that this should be done only through smart (proprietary)
> implementation. However, the points against that are the following:

I think the LSA storm threshold, or cliff, is not a problem in itself but rather a symptom of an underlying problem. Because the problem was not identified explicitly, however, it took some time to figure out what might be causing this cliff (guessed at mostly by working backwards from the proposed solution(s)). The problem which causes this appears to me to be that the implementation allowed load from non-adjacency-maintenance traffic (i.e., essentially traffic which wasn't Hellos) to affect adjacency maintenance, in particular by incorrectly dropping adjacencies to routers which were in fact still operating.

The problem of erroneously dropping adjacencies to routers which are still up when under (unrelated) stress is the canonical problem which leads to network routing collapse. Just about anyone who has implemented a link state protocol (or any protocol where adjacencies are maintained, for that matter, e.g.
BGP or PPP), deployed it, and watched it break a couple of times knows this by heart, though it is certainly worth writing down for implementors who haven't got this far yet. If you want stable routing, then implementations must be done in a way such that this *does* *not* *happen*; that is, the implementation must ensure that the cost of spikes in database flooding is paid solely in increased convergence time, and is never the cause of adjacencies being incorrectly declared down. If implementations are done this way, then the ultimate scalability of the protocol implementation (beyond database memory requirements, though database memory has gotten inordinately cheap these days) is dictated solely by how long you are willing to wait for your network to converge after spikes, and that is a continuous degradation with no "cliff" at all.

Now I would simply assert that there is nothing about the OSPF protocol, as currently specified, which prevents implementations from achieving the nirvana described in the previous paragraph. Nothing. It may not be easy to get an implementation to work this way, but I think this difficulty is inherent in the problem that needs to be solved within the (inherently proprietary) real-life environment the implementation runs inside, which sometimes leaves it short of resources, and is hence far removed from the protocol definition itself. Errors which cause adjacencies to be incorrectly declared down are implementation errors, not protocol errors, and are probably best addressed in the implementation, which is broken, rather than in the protocol, which isn't. An implementation which doesn't work like the previous paragraph is one which still has bugs in need of fixing.
So while I like the simulations done in the scalability draft a lot, I'll just point out how this draft seems to go given the assumptions and views above:

(1) You did simulations using an OSPF implementation which had a bug, in that it unnecessarily and erroneously declared adjacencies down in the presence of a spike of (non-Hello, non-adjacency-related) database traffic. How you came to the conclusion that implementations which don't make this error are "proprietary", while the particular implementation you simulated which made this error is not, I don't understand.

(2) Rather than addressing the problem within your implementation, you elected to try to solve it by having the implementation's neighbours help out by setting some new bits in packets (i.e., a protocol change) to assist in a prioritization scheme. I don't mind this, except to note that changing an established protocol, and then waiting for everyone to catch up to the change, is usually the least productive, longest-term way to fix bugs in an implementation which has router neighbours from multiple vendors that it is relying on to do this for a "fix".

(3) You then redid the simulations after the "fix" was applied and found that, while there was still a cliff, the "fix" had moved it up and to the right. That is, despite the fact that you even went to the trouble of making a protocol change, the "fix" still didn't actually fix the problem, which is that you have a cliff at all. The routing failure can apparently still occur; the danger of your network routing melting is still there; it just apparently needs a much bigger event to set it off. This tells me that, independent of all this, you really need to look at the implementation you are simulating with and determine what's actually wrong with it, as it is still an accident waiting to happen.
But, more important than this, note that the above comments are still based on speculation about the root cause of the problem, since according to the draft the problem is a "cliff" and the fix is to move the cliff up and to the right, and it is hard to relate any of this to the underlying protocol. You really, really must identify what the routers are actually doing wrong, in words rather than from numerical simulation results, since if you don't understand and present this, your readers will never understand the spectrum of possible fixes and why they should work.

For example, if the problem really is one of routers erroneously dropping adjacencies where the neighbour is still up, then prioritization of Hello packet reception and processing seems like not a bad way to help keep this from occurring (we can debate whether this is better done by a protocol change or by just having the routers look at the OSPF packet type before queuing), since its effect is to keep adjacency errors from occurring by spending CPU keeping this state accurate before spending CPU on anything which can be deferred. It is far from the only way to help, however. For example, another way to help fix this (perhaps in addition to the above) might be for the routers to observe when they've been under load, with a long queue of work to do that they've perhaps been dropping packets off the end of, and to defer making any decisions to drop adjacencies until after this condition clears. This still makes errors about the state of adjacencies, but it changes the nature of the errors: from terminating adjacencies which are really still up, which can melt your network, to concluding adjacencies might still be up when they have in fact gone down, a temporary booboo whose sole effect might be to increase convergence time (which is how you should be paying for load spikes in any event). Note that this adds robustness (i.e.
probably is the best you can do in the circumstance) not only when you are getting too many LSAs, but also when you are temporarily too short of CPU even to keep up with the incoming Hellos.

There are a lot of ways to address this particular problem (if this actually is the "cliff" problem) in implementations, enough that your implementation really shouldn't have "cliff" problems to begin with; you should just keep fixing the implementation until the "cliff" problems go away, so you don't have to live in fear that something big will happen which will cause your network routing to melt and fail to recover. This is so fundamental that I wouldn't wait for the wide deployment of protocol changes to ensure it is fixed (even if protocol changes were exceptionally useful, which I doubt), but would just get it fixed in the implementations you've already got talking to the other routers you've already got.

Once you've got the basic reliability thing out of the way, you can then begin to look at addressing the real scaling problem which remains, which is convergence time: maximizing the size of the network you can have while still having it converge in acceptably short periods. This is the problem which I think might benefit from protocol changes, both related to minimizing flooding load and otherwise. It is just as important to be clear on why these changes are being done, for about the same reasons it is important to understand what a "cliff" problem is; that is, you'll never get the right answer without being very clear about why you're doing it. For example, sending a lot of unnecessary retransmissions to a neighbour should never, ever cause that neighbour to break (if it does, he's the one with the problem needing fixing), but it is still not a good thing to do, since it makes everyone work harder, and that is guaranteed to slow convergence.
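The deferral idea a couple of paragraphs back, postponing any adjacency-down decision while the router knows it has recently been overloaded, can be sketched as follows. This is an invented illustration of the idea, not code from any implementation; the class names, the overload hold time, and the `note_overload` hook are all hypothetical:

```python
ROUTER_DEAD_INTERVAL = 40.0  # seconds; the common OSPF default

class Adjacency:
    def __init__(self, last_hello: float):
        self.last_hello = last_hello

class AdjacencyKeeper:
    """Hypothetical sketch: never declare a neighbour down while we were
    overloaded, since a missing Hello may be our own fault (dropped off a
    full input queue) rather than the neighbour's."""
    def __init__(self):
        self.neighbors = {}
        self.overloaded_until = 0.0

    def hello_received(self, nbr: str, now: float) -> None:
        self.neighbors.setdefault(nbr, Adjacency(now)).last_hello = now

    def note_overload(self, now: float, hold: float = 10.0) -> None:
        # Called when, e.g., the input queue overflowed: distrust our own
        # Hello bookkeeping for a while after the load clears.
        self.overloaded_until = max(self.overloaded_until, now + hold)

    def expired(self, now: float) -> list:
        if now < self.overloaded_until:
            return []  # defer the down decision until the overload clears
        return [n for n, a in self.neighbors.items()
                if now - a.last_hello > ROUTER_DEAD_INTERVAL]
```

The error mode this trades into is exactly the benign one described above: a neighbour that really has gone down is kept a little longer, costing only convergence time.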
The latter thing is still very much worth fixing; you just shouldn't be doing the fixing to hack around implementation-robustness "cliff" problems.

To cut this short, then, I'll just point out what I wanted to say in the first place about the congestion draft, which is that in general I don't think using rate controls ever works right, given the problem I think this should be addressing. The problem with rate controls is that if they are fixed configuration they're always wrong, while if they're automatically adjustable you need to compute adjustments slowly to ensure stability, which in turn means that the rate you compute is usually wrong for current circumstances: it is either too high or too low. In the case of flooding to neighbours, both of these cases are bad. Sending too fast is bad, not because it will break anything (it absolutely shouldn't) but because processing a big queue of stale stuff, perhaps with accompanying spurious retransmissions, makes everyone work harder than they need to and delays convergence. Sending too slowly, however, is also bad, because it puts you in the incredibly silly situation of having an unconverged network where the routers are simultaneously idle some of the time, which slows convergence for no discernible advantage at all. I hence, without having thought enough about it, like schemes which attempt to do window control (i.e., limit the number of outstanding, un-acked LSAs instead), using the delay of and rate at which acks come back as the control signal, if only because there are examples of transport protocols working like this which exhibit both stability and a fairly quick response to changes.
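A window-control scheme of the kind just described might look roughly like the following. This is a hypothetical sketch, not a proposal from either draft: it caps the number of outstanding un-acked LSAs per neighbour and borrows additive increase on acks / multiplicative decrease on timeouts from transport-protocol practice, with all names and constants invented:

```python
from collections import deque

class FloodWindow:
    """Hypothetical window control for flooding to one neighbour: at most
    `window` LSAs may be outstanding (sent but un-acked). The window opens
    by one on each ack and halves on a retransmission timeout, so the
    sending rate adapts to how fast acks actually come back."""
    def __init__(self, initial: int = 4, lo: int = 1, hi: int = 64):
        self.window = initial
        self.lo, self.hi = lo, hi
        self.pending = deque()    # LSAs waiting to be sent
        self.outstanding = set()  # sent, not yet acknowledged

    def queue_lsa(self, lsa_id) -> None:
        self.pending.append(lsa_id)

    def sendable(self) -> list:
        """Pop as many queued LSAs as the current window allows."""
        out = []
        while self.pending and len(self.outstanding) < self.window:
            lsa = self.pending.popleft()
            self.outstanding.add(lsa)
            out.append(lsa)
        return out

    def ack(self, lsa_id) -> None:
        self.outstanding.discard(lsa_id)
        self.window = min(self.hi, self.window + 1)   # open up on progress

    def timeout(self, lsa_id) -> None:
        self.window = max(self.lo, self.window // 2)  # back off sharply
        self.outstanding.discard(lsa_id)
        self.pending.appendleft(lsa_id)               # retransmit this one first
```

The point of the sketch is only that the neighbour's ack stream, not a configured rate, is what throttles the sender, so the link is never idle while the network is unconverged and never buried in stale backlog.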
In the case of the SPF computation, however, imposing standardized rate limits on the maximum frequency with which an implementation may compute this just seems irrational, and makes no sense at all unless the implementation entirely lacks the ability to prioritize its actions, an attribute which is certainly not universal and is unlikely to lead to great success in making any of this stuff work reliably. I mean, if a router is one SPF computation away from converging its routing, and knows it has absolutely nothing else to do at the moment, is there any rational reason to require it to sit idle for the four seconds until some timer expires before actually doing it? I think most implementations should be able to figure out for themselves when it is a good idea to run the Dijkstra and when it isn't, and so should retain the freedom to do this.

So I think the "congestion" draft has some good ideas (I also like other attempts to find ways to minimize flooding work) but seems confused about why they are good ideas, which makes some of its other ideas not-so-good, while the "scalability" draft is proposing a protocol change to relieve (but not entirely fix) a mysterious problem which, if better defined, might turn out to be a classic adjacency-maintenance implementation bug fixable with no protocol changes at all. I think the latter draft would be a whole lot better (in fact, would be really, really great) if, instead of making a protocol change, you did and documented whatever was necessary to make the "cliff" disappear entirely from your simulation implementation without making protocol changes, at which point the (convergence) scaling problem the former draft should probably be aimed at addressing would be a lot clearer.

Dennis Ferguson