Re: [GROW] I-D Action: draft-ietf-grow-diverse-bgp-path-dist-05.txt
Robert Raszuk <robert@raszuk.net> Thu, 22 September 2011 16:52 UTC
Message-ID: <4E7B6872.7010403@raszuk.net>
Date: Thu, 22 Sep 2011 18:55:14 +0200
From: Robert Raszuk <robert@raszuk.net>
To: "George, Wesley" <wesley.george@twcable.com>
References: <20110915135818.19974.94670.idtracker@ietfa.amsl.com> <34E4F50CAFA10349A41E0756550084FB0F8D14C1@PRVPEXVS04.corp.twcable.com> <4E7A5617.2080900@raszuk.net> <34E4F50CAFA10349A41E0756550084FB0F8D19E4@PRVPEXVS04.corp.twcable.com>
In-Reply-To: <34E4F50CAFA10349A41E0756550084FB0F8D19E4@PRVPEXVS04.corp.twcable.com>
Cc: "grow@ietf.org" <grow@ietf.org>
Subject: Re: [GROW] I-D Action: draft-ietf-grow-diverse-bgp-path-dist-05.txt

Hi Wes,

Many thx for your comments. I will clarify the corresponding sections in
the draft.

As to your point about the danger of routing loops caused by advertising
additional paths via IBGP: if you could illustrate an example topology
where such a loop could form, it would help make the spec clearer and
address that concern.

Many thx,
R.

> -----Original Message-----
> From: Robert Raszuk [mailto:robert@raszuk.net]
> Sent: Wednesday, September 21, 2011 5:25 PM
> To: George, Wesley
> Cc: grow@ietf.org
> Subject: Re: [GROW] I-D Action: draft-ietf-grow-diverse-bgp-path-dist-05.txt
>
> Hello Wes,
>
>> Other stuff: 2.1 - when discussing overhead and scale concerns for
>> add paths, perhaps a citation to 4984 would be appropriate?
>
> I would prefer not to mix the growing Internet scale concerns with
> scale concerns that come from operational practices or configuration.
>
> WEG] Understand, but I'm not sure that it's so easy to separate the
> two. You'll find me saying the same thing to anyone suggesting a
> change that has the net effect of significantly increasing the burn
> rate for memory and CPU resources, whether it's a configuration
> change or otherwise, because it still exacerbates the overall issue.
> (more on that in a moment)
>
>> I've made similar comments to the SIDR folks, and I think generally
>> anything that adds a non-trivial amount of impact to the growth
>> curve of the routing system needs to consider this.
>
> I think there is a substantial difference between local and global
> size increases of the routing system. In this work all concerns
> relate to the local one.
>
> WEG] Generally, I'm not sure that I'd make so much of a distinction.
> While yes, in theory changes of this type only impact the ASN that
> chooses to implement it, rather than what it announces to the outside
> world, the global scaling problem is due to the intersection between
> available resources, their growth curve, and the growth curve of the
> routing table.
> Saying that it is only a concern if it contributes to the size of the
> DFZ routing table oversimplifies the root problem, because if
> internal scale problems exhaust the resources available for both
> internal and external routes, you still have the same end state - out
> of resources. In that case, the only difference between a local
> scaling problem and a global problem is the deployment penetration.
> If this is widely deployed, it has now steepened the growth curve
> noted in 4984, because it still is using some of the overall
> available resources. I've said on more than one occasion that the
> iBGP routes carried by an SP are as much or more of a problem than
> the growth of the global table because they don't have nearly as much
> of the aggregation and optimization to reduce their footprint. The
> only difference is the level of administrative control over growth,
> but that's a fairly limited knob to turn - for lots of reasons it may
> not be any more feasible to change things internally to reduce
> internal route growth than it is to change global route growth.
> Besides, I think that your draft is trying to have it both ways - you
> malign Add Paths for having scaling problems, and then seem content
> to gloss over a very similar problem created by your solution simply
> because it appears to be slightly less severe and more localized.
>
>> 4. This asserts that no code changes are necessary to RR clients.
>> I'm not sure I totally agree with that... If the idea is to have a
>> primary (best) RR and then N additional paths, the general
>> assumption is that the N, N1, ... RRs are carrying routes that are
>> less and less preferred. How does this system avoid the same sort
>> of inconsistency of best path choice among different routers in the
>> network if there is no way to identify those paths as secondary? I
>> think you need some way to determine if the alternate routes are
>> intended to be ECMP routes or backup routes...
>> You may be able to cover this without code changes by using
>> alternate configurations of other BGP preference indicators (MED,
>> Localpref, metric, etc), perhaps with inbound route policy on the
>> client or outbound on the RR, but since things like metric may be
>> different based on where something is in the network, that may lead
>> to inconsistency if used by itself. Even then, the draft doesn't
>> discuss how this should be managed.
>
> I stand by the claim that no code change is needed on clients.
> Moreover, not even an additional policy change is required.
>
> The best way to illustrate this is to compare with the presence of
> additional BGP paths on the clients in scenarios where clients are
> interconnected with a full IBGP mesh or would get all paths with
> add-path. In neither case is there a notion of the RR telling the
> client which path is best or which is second best, and there are a
> number of good reasons for that. One is that path numbering on the RR
> can differ from that on the client. Another is that when any
> advertised and ordered path is withdrawn, all remaining paths would
> need to be re-advertised with a new order, and that amount of churn
> is non-negligible.
>
> Each client's BGP best path selection is capable of making a safe
> (loop-free), autonomous choice of paths in PIC/fast connectivity
> restoration/IBGP multipath cases.
>
> WEG] I'm sorry, maybe I'm being thick, but I still don't understand
> how this would work in a way that would always avoid routing loops.
> Under normal state, you have an RR reflecting its best path to the
> client based on the routes it receives from the rest of its
> neighbors, meaning that the clients don't have visibility to the
> candidate alternatives that the RR does, so they're all making the
> same choice at least within the local cone of influence of that RR.
> You add a second set of RRs (rr') that is announcing a second-best
> path as if it was the best path to restore 1 (or more) of the
> candidate alternatives to the client. The client receives the best
> and 2nd-best path and evaluates them using standard methods. If the
> thing that makes one route better than the other is something locally
> interesting like metric, and the client's particular place in the
> universe means that the metric is different as compared to other
> clients, the P routers, and the RRs, it may choose the 2nd-best path
> as best, and this may lead to routing loops if it tries to send the
> route to another router that has a different belief of what the best
> path is. This case is much more likely if the RR and RR' are not
> collocated with all of their clients and/or each other. I think that
> this may also be the case when the tiebreaker is router-id if you're
> not careful of the way that you address your route-reflectors and/or
> are not doing next-hop self at the edges. Only in the case where the
> 2nd-best path is clearly worse to all members of the ASN (lower local
> pref, longer AS-path, etc) are you assured of no possibility for two
> routers each getting a different result when evaluating those two
> different routes. I think that 4.2 covers some part of this case, in
> the way that it documents its assumptions and what must be done to
> enable deployment, especially the references to ignoring IGP metric,
> but IMO it's not clear enough in the explanation why some of these
> things must be done - the failure case isn't discussed.
>
>> 4.1 Also, there's a definite scaling consideration on the RR
>> clients that isn't really discussed here - they are now going to be
>> storing some number of additional routes and paths that is linearly
>> related to the number of additional planes that are implemented.
>> The addition of more RR sessions that presumably carry a portion of
>> the full routing table now drives a non-trivial increase in memory
>> footprint and processing overhead (and potentially convergence time
>> for slower boxes). In the simplest case of 2 primary
>> route-reflectors (for diversity), and 1 2nd-best path RR, you've
>> added one session. If you want to carry a 3rd-best RR or have
>> redundant 2nd-best RRs, you've added 4 sessions. It's fair to say
>> that after a certain number of alternate paths, you start having
>> fewer routes because there are only so many alternative exits, but
>> otherwise there is a potentially large problem even if it's not
>> quite as bad as addpaths. I might recommend that you do some
>> analysis of the routing table to know where this threshold makes a
>> difference, based on how many alternate paths an average route
>> carries. In addition to being a scaling consideration, it also
>> helps to inform what value of N becomes diminishing returns because
>> most networks don't have that many backup paths. I envision this
>> being something like "80% of routes have 4 or fewer paths, so moving
>> beyond 4 planes may add overhead without much benefit..."
>
> It is absolutely correct to say that the more paths a client carries,
> the more CPU cycles and memory will be used to process and store
> them.
>
> However there is one observation to be made ... in 99% of the cases I
> have seen for distributing more than the best path intra-domain, the
> sufficient number of paths per net on each client is 2.
>
> WEG] the document should explicitly state this. That's exactly what I
> was getting at when I mentioned analysis above. If nearly all
> applications only need one alternate to bring the total paths to two,
> and more would be diminishing returns, the document should recommend
> this, and note that more are possible if the operator's situation
> dictates by simply repeating the deployment more times.
> I will note that this guidance as well as the note at the end of 4.2
> that "The additional planes of route reflectors do not need to be
> fully redundant as the primary one does" contradicts your example
> because it has both RR1' and RR2'.
>
> IMHO the cost of bringing additional paths for the control plane is
> quite well understood today. Moreover it is quite implementation
> dependent. Some implementations may use X bytes per path while others
> use Y bytes to store the same path. I think some separate BGP scaling
> document (even as BCP) may be equally useful for any technique to
> advertise more than the best path. I would prefer to keep this
> outside of the solutions work on how to advertise and distribute
> those additional paths.
>
> WEG] I'm not looking for a level of detail that requires you to
> discuss the number of bytes per path. Simply noting that scaling
> issues exist and their general categories is enough. Make the logical
> leap for your reader that implementing this solution brings with it
> the scaling problems inherent with adding an additional route
> reflector (and therefore its additional routes and paths).
>>
>> It may be appropriate to add a separate scaling considerations
>> discussion to your deployment considerations (section 6) to
>> discuss some of the above.
>
> I agree 100% .. but as stated above I do not find this specific to
> diverse-path. It seems a general issue and I would highly encourage
> someone to take a stab at documenting this in IETF/IDR/GROW or maybe
> at the Nanog community repository.
>
> WEG] it may not be specific to diverse-path, but diverse-path is
> specifically advocating doing something that would otherwise not be
> done (adding additional RR<->client BGP peers w/full routes beyond
> what is necessary for simple RR redundancy).
> Therefore I still think that you need to discuss the specific scaling
> concerns that this implementation needs to consider, even if it's at
> a relatively high level and the document notes that these are not
> unique to this implementation. I agree that a general scaling
> considerations document may be appropriate, but since that does not
> exist and I don't want this document to be blocked awaiting
> completion of such, a brief discussion within this document would
> help a lot.
>
>> There may be additional operational considerations from the
>> perspective of route analysis - if you have either a homebuilt or
>> off the shelf set of software that does route analysis for the
>> purpose of event root-cause analysis, anomaly detection, capacity
>> planning/failure analysis, etc, it has to be aware of these
>> additional planes such that it returns the proper response when
>> evaluating the routing table to determine what the expected
>> behavior should be in the real network. This is especially
>> important when it uses the table to determine how traffic will
>> reroute during different failure scenarios. These tools may act
>> like a participant in the mesh rather than a client in order to get
>> a pure view of the table, and that may lead to undesired results if
>> the multiple planes aren't taken into account. There may also be
>> considerations for looking glass implementations and the actual
>> information that is visible on the RRs and RR clients as the result
>> of standard BGP show commands to aid in troubleshooting and
>> verification.
>
> Very good point. Two comments on this ..
>
> - As to the impact on the tools, I am less worried, as the presence
> of additional paths can be a fact today, as already mentioned, with
> full mesh, or as used by some operators who adjust the weight values
> of a pair of RRs on a per net basis.
>
> WEG] sure, but I don't think that it's valid to assume that all
> analysis tools have taken this into account in their implementation,
> so it's worth mentioning as an operational consideration. The comment
> may be helpful to characterize the level of potential impact.
>
> - The use of "planes" in the draft is more of a conceptual nature.
> In practice all paths are still kept in the single table where normal
> best path is calculated. That means that tools like looking glass
> should not observe any changes nor impact.
>
> WEG] a good clarification to add to the document.
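
On the request above for a topology where a loop could form: the sketch
below is a hypothetical illustration only, not a topology supplied by
either author. It assumes plain hop-by-hop IP forwarding with no MPLS or
other encapsulation toward the BGP next hop, four routers in a line
(E1 - A - B - E2) with unit IGP costs, and exits E1 and E2 that each hold
an external path to the same prefix. Every name in it (LINKS,
nearest_exit, forwarding_path) is invented for the example. It checks
whether a given set of per-router exit choices forwards loop-free. It
does not settle whether diverse-path can actually produce the
disagreeing choices in the second case, which is the open question in
this thread.

```python
import heapq

# Hypothetical line topology: E1 - A - B - E2, unit IGP cost per link.
LINKS = {("E1", "A"): 1, ("A", "B"): 1, ("B", "E2"): 1}
EXITS = ["E1", "E2"]  # assumed: both exits hold an external path


def adjacency(links):
    """Build a symmetric adjacency map from the link table."""
    adj = {}
    for (u, v), cost in links.items():
        adj.setdefault(u, {})[v] = cost
        adj.setdefault(v, {})[u] = cost
    return adj


def igp_distances(adj, source):
    """Dijkstra: IGP cost from source to every reachable router."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, cost in adj[u].items():
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist


def nearest_exit(adj, router, exits):
    """Exit with the lowest IGP cost from router (name breaks ties)."""
    dist = igp_distances(adj, router)
    return min(exits, key=lambda e: (dist[e], e))


def next_hop(adj, router, target):
    """Neighbor of router that lies on a shortest IGP path to target."""
    dist = igp_distances(adj, target)
    return min(adj[router], key=lambda n: (adj[router][n] + dist[n], n))


def forwarding_path(adj, start, chosen_exit):
    """Forward hop by hop; every router uses its OWN chosen exit.

    Returns (path, looped). The packet leaves the AS when it reaches a
    router whose chosen exit is itself.
    """
    path, seen, node = [start], {start}, start
    while chosen_exit[node] != node:
        node = next_hop(adj, node, chosen_exit[node])
        path.append(node)
        if node in seen:
            return path, True
        seen.add(node)
    return path, False


ADJ = adjacency(LINKS)

# Case 1: every router picks its IGP-nearest exit.
consistent = {r: nearest_exit(ADJ, r, EXITS) for r in ADJ}

# Case 2 (assumed for illustration): A prefers E2 while B prefers E1.
disagreeing = {"E1": "E1", "E2": "E2", "A": "E2", "B": "E1"}

if __name__ == "__main__":
    print(forwarding_path(ADJ, "A", consistent))
    print(forwarding_path(ADJ, "A", disagreeing))
```

In the first case A hands the packet straight to E1. In the second, A
forwards toward E2 through B, B forwards toward E1 through A, and the
packet returns to A. The checker takes the per-router choices as input
on purpose, so the same function can be run against whatever selection
outcome a proposed deployment is believed to produce.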
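
The routing table analysis suggested in the 4.1 discussion ("80% of
routes have 4 or fewer paths, so moving beyond 4 planes may add overhead
without much benefit") could be approached along these lines. This is a
minimal sketch under stated assumptions: the per-prefix path counts are
invented, the prefixes are documentation ranges, the function names are
hypothetical, and plane k is assumed to distribute the k-th best path
for a prefix whenever one exists.

```python
from collections import Counter

# Hypothetical input: number of distinct exit paths known per prefix.
# Both the prefixes (documentation ranges) and the counts are invented.
PATHS_PER_PREFIX = {
    "192.0.2.0/26": 1,
    "192.0.2.64/26": 2,
    "192.0.2.128/26": 2,
    "192.0.2.192/26": 2,
    "198.51.100.0/25": 2,
    "198.51.100.128/25": 3,
    "203.0.113.0/26": 3,
    "203.0.113.64/26": 4,
    "203.0.113.128/26": 6,
    "203.0.113.192/26": 8,
}


def planes_for_coverage(paths_per_prefix, target_percent=80):
    """Smallest N such that target_percent of prefixes have <= N paths."""
    histogram = Counter(paths_per_prefix.values())
    total = len(paths_per_prefix)
    covered = 0
    for n in sorted(histogram):
        covered += histogram[n]
        # Integer comparison avoids floating-point rounding at the edge.
        if covered * 100 >= target_percent * total:
            return n
    raise ValueError("no prefixes given, or target_percent above 100")


def paths_stored(paths_per_prefix, planes):
    """Paths a client would hold if plane k carries the k-th best path.

    A prefix with fewer paths than planes contributes only what exists.
    """
    return sum(min(count, planes) for count in paths_per_prefix.values())


if __name__ == "__main__":
    n = planes_for_coverage(PATHS_PER_PREFIX)
    print("planes needed for 80% coverage:", n)
    for planes in (1, 2, n):
        print(planes, "plane(s):", paths_stored(PATHS_PER_PREFIX, planes))
```

With this invented data the threshold comes out at 4, and the client's
stored path count grows from 10 with one plane to 19 with two and 27
with four. Real numbers would have to come from an operator's own table.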