Re: [GROW] I-D Action: draft-ietf-grow-diverse-bgp-path-dist-05.txt

Robert Raszuk <robert@raszuk.net> Thu, 22 September 2011 16:52 UTC

Message-ID: <4E7B6872.7010403@raszuk.net>
Date: Thu, 22 Sep 2011 18:55:14 +0200
From: Robert Raszuk <robert@raszuk.net>
To: "George, Wesley" <wesley.george@twcable.com>
Cc: "grow@ietf.org" <grow@ietf.org>
Subject: Re: [GROW] I-D Action: draft-ietf-grow-diverse-bgp-path-dist-05.txt

Hi Wes,

Many thx for your comments. I will clarify the corresponding sections in
the draft.

As to your point about the danger of routing loops caused by advertising
additional paths via IBGP: if you could illustrate an example topology
where such a loop could form, it would help make the spec clearer in
addressing that concern.

Many thx,
R.

> -----Original Message-----
> From: Robert Raszuk [mailto:robert@raszuk.net]
> Sent: Wednesday, September 21, 2011 5:25 PM
> To: George, Wesley
> Cc: grow@ietf.org
> Subject: Re: [GROW] I-D Action: draft-ietf-grow-diverse-bgp-path-dist-05.txt
>
> Hello Wes,
>
>> Other stuff: 2.1 - when discussing overhead and scale concerns for
>> add paths, perhaps a citation to 4984 would be appropriate?
>
> I would prefer not to mix the growing Internet scale concerns with
> scale concerns that stem from particular operational practices or
> configurations.
>
> WEG] Understand, but I'm not sure that it's so easy to separate the
> two. You'll find me saying the same thing to anyone suggesting a
> change that has the net effect of significantly increasing the burn
> rate for memory and CPU resources, whether it's a configuration
> change or otherwise, because it still exacerbates the overall issue.
> (more on that in a moment)
>
>> I've made similar comments to the SIDR folks, and I think generally
>> anything that adds a non-trivial amount of impact to the growth
>> curve of the routing system needs to consider this.
>
> I think there is a substantial difference between a local and a global
> size increase of the routing system. In this work all concerns relate
> to the local one.
>
> WEG] Generally, I'm not sure that I'd make so much of a distinction.
> While yes, in theory changes of this type only impact the ASN that
> chooses to implement it, rather than what it announces to the outside
> world, the global scaling problem is due to the intersection between
> available resources, their growth curve, and the growth curve of the
> routing table. Saying that it only is a concern if it contributes to
> the size of the DFZ routing table is oversimplifying the root
> problem, because if internal scale problems exhaust the resources
> available for both internal and external routes, you still have the
> same end state - out of resources. In that case, the only difference
> between a local scaling problem and a global problem is the
> deployment penetration. If this is widely deployed, it has now
> steepened the growth curve noted in 4984, because it still is using
> some of the overall available resources. I've said on more than one
> occasion that the iBGP routes carried by an SP are as much or more of
> a problem than the growth of the global table because they don't have
> nearly as much of the aggregation and optimization to reduce their
> footprint. The only difference is the level of administrative control
> over growth, but that's a fairly limited knob to turn - for lots of
> reasons it may not be any more feasible to change things internally
> to reduce internal route growth than it is to change global route
> growth. Besides, I think that your draft is trying to have it both
> ways - you malign Add Paths for having scaling problems, and then
> seem content to gloss over a very similar problem created by your
> solution simply because it appears to be slightly less severe and
> more localized.
>
>> 4. This asserts that no code changes are necessary to RR clients.
>> I'm not sure I totally agree with that... If the idea is to have a
>> primary (best) RR and then N additional paths, the general
>> assumption is that the N, N1, ... RRs are carrying routes that are
>> less and less preferred. How does this system avoid the same sort
>> of inconsistency of best path choice among different routers in the
>> network if there is no way to identify those paths as secondary? I
>> think you need some way to determine if the alternate routes are
>> intended to be ECMP routes or backup routes... You may be able to
>> cover this without code changes by using alternate configurations
>> of other BGP preference indicators (MED, Localpref, metric, etc),
>> perhaps with inbound route policy on the client or outbound on the
>> RR, but since things like metric may be different based on where
>> something is in the network, that may lead to inconsistency if used
>> by itself. Even then, the draft doesn't discuss how this should be
>> managed.
>
>
> I stand by the claim that no code change is needed on clients.
> Moreover, no additional policy change is required either.
>
> The best way to illustrate this is to compare it with the presence of
> additional BGP paths on clients that are interconnected with a full
> IBGP mesh or that get all paths with add-path. In neither case is
> there a notion of the RR telling the client which path is best or
> which is second best, and there are a number of good reasons for
> that. One is that the numbering of paths can be different for the RR
> than for the client. Another is that when we withdraw any path that
> was advertised with an order, we would need to re-advertise all
> remaining paths with the new order, and that amount of churn is
> non-negligible.
>
> Each client's BGP best path selection is capable of making a safe
> (loop-free) autonomous choice of paths in PIC/fast connectivity
> restoration/IBGP multipath cases.
>
> WEG] I'm sorry, maybe I'm being thick, but I still don't understand
> how this would work in a way that would always avoid routing loops.
> Under normal state, you have an RR reflecting its best path to
> the client based on the routes it receives from the rest of its
> neighbors, meaning that the clients don't have visibility to
> candidate alternatives that the RR does, so they're all making the
> same choice at least within the local cone of influence of that RR.
> You add a second set of RRs (rr') that is announcing a second-best
> path as if it was the best path to restore 1 (or more) of the
> candidate alternatives to the client. The client receives the best
> and 2nd-best path and evaluates them using standard methods. If the
> thing that makes one route better than the other is something locally
> interesting like metric, and the client's particular place in the
> universe means that the metric is different as compared to other
> clients, the P routers, and the RRs, it may choose the 2nd-best path
> as best, and this may lead to routing loops if it tries to send the
> route to another router that has a different belief of what the best
> path is. This case is much more likely if the RR and RR' are not
> collocated with all of their clients and/or each other. I think that
> this may also be the case when the tiebreaker is router-id if you're
> not careful of the way that you address your route-reflectors and/or
> are not doing next-hop self at the edges. Only in the case where the
> 2nd-best path is clearly worse to all members of the ASN (lower local
> pref, longer AS-path, etc) are you assured of no possibility for two
> routers each getting a different result when evaluating those two
> different routes. I think that 4.2 covers some part of this case, in
> the way that it documents its assumptions and what must be done to
> enable deployment, especially the references to ignoring IGP metric,
> but IMO it's not clear enough in the explanation why some of these
> things must be done - the failure case isn't discussed.
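For illustration only, here is a minimal Python sketch of the kind of check this exchange is about: given each router's chosen exit, does hop-by-hop forwarding reach that exit or come back on itself? Everything in it is an assumption made up for the example and is not taken from the draft: the four-router line topology, the link costs, the function names igp, pick_by_igp and trace, and the premise that packets are forwarded by plain per-hop IP lookup with no tunnel or label path to the BGP next hop. The "crossed" selections are assigned by hand to stand in for whatever policy, tie-breaker or differing path set might produce them.

```python
import heapq

def igp(graph, src):
    """Dijkstra from src: return (cost to each node, first hop toward it)."""
    dist, first_hop, done = {}, {}, set()
    heap = [(0, src, src)]
    while heap:
        cost, node, first = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        dist[node] = cost
        if node != src:
            first_hop[node] = first
        for nbr, link in graph[node].items():
            if nbr not in done:
                # Leaving src, the neighbour itself is the first hop.
                nxt = nbr if node == src else first
                heapq.heappush(heap, (cost + link, nbr, nxt))
    return dist, first_hop

def pick_by_igp(graph, router, exits):
    """Exit chosen when the router's own IGP cost is the tie-breaker."""
    dist, _ = igp(graph, router)
    return min(exits, key=lambda e: dist[e])

def trace(graph, chosen_exit, start):
    """Forward hop by hop; return (routers visited, True if a loop formed)."""
    path, node = [start], start
    while node != chosen_exit[node]:
        _, first_hop = igp(graph, node)
        node = first_hop[chosen_exit[node]]
        if node in path:
            return path + [node], True
        path.append(node)
    return path, False

# Hypothetical line topology: E1 --10-- A --1-- B --10-- E2
graph = {
    "E1": {"A": 10},
    "A": {"E1": 10, "B": 1},
    "B": {"A": 1, "E2": 10},
    "E2": {"B": 10},
}
exits = ["E1", "E2"]

# Case 1: every router holds both paths and uses its own IGP cost.
consistent = {r: pick_by_igp(graph, r, exits) for r in graph}
print(trace(graph, consistent, "A"))

# Case 2: selections assigned by hand, A prefers E2 and B prefers E1.
crossed = {"E1": "E1", "E2": "E2", "A": "E2", "B": "E1"}
print(trace(graph, crossed, "A"))
```

With these made-up inputs, the first trace ends at E1 and the second reports A, B, A as a loop. The sketch does not settle which of the two cases a diverse-path deployment can actually produce; it only makes concrete what a topology example would have to show.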
>
>> 4.1 Also, there's a definite scaling consideration on the RR
>> clients that isn't really discussed here - they are now going to be
>> storing some number of additional routes and paths that is linearly
>> related to the number of additional planes that are implemented.
>> The addition of more RR sessions that presumably carry a portion of
>> the full routing table now drives a non-trivial increase in memory
>> footprint and processing overhead (and potentially convergence time
>> for slower boxes). In the simplest case of 2 primary
>> route-reflectors (for diversity), and 1 2nd-best path RR, you've
>> added one session. If you want to carry a 3rd-best RR or have
>> redundant 2nd-best RRs, you've added 4 sessions. It's fair to say
> that after a certain number of alternate paths, you start having
> fewer routes because there are only so many alternative exits, but
>> otherwise there is a potentially large problem even if it's not
>> quite as bad as addpaths. I might recommend that you do some
>> analysis of the routing table to know where this threshold makes a
>> difference, based on how many alternate paths an average route
>> carries. In addition to being a scaling consideration, it also
>> helps to inform what value of N becomes diminishing returns because
>> most networks don't have that many backup paths. I envision this
> being something like "80% of routes have 4 or fewer paths, so moving
>> beyond 4 planes may add overhead without much benefit..."
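The analysis suggested above could be sketched roughly as follows in Python. This is a hedged illustration, not a method from the draft: the function names coverage and smallest_n, the 80% target, and above all the sample path counts are invented placeholders. A real analysis would take the number of available paths per prefix from an operator's own routing table.

```python
from collections import Counter

def coverage(path_counts):
    """Map N -> fraction of prefixes that have N or fewer paths."""
    total = len(path_counts)
    hist = Counter(path_counts)
    covered = 0
    result = {}
    for n in range(1, max(path_counts) + 1):
        covered += hist[n]
        result[n] = covered / total
    return result

def smallest_n(path_counts, target=0.8):
    """Smallest N whose coverage reaches the target fraction."""
    for n, fraction in coverage(path_counts).items():
        if fraction >= target:
            return n
    return None

# Invented sample: number of paths available for each of 10 prefixes.
sample = [1, 1, 2, 2, 2, 2, 3, 4, 4, 6]
print(coverage(sample))
print(smallest_n(sample))
```

Beyond the N returned by smallest_n, each further plane adds sessions and state on every client while covering only the remaining tail of prefixes, which is the diminishing-returns point the comment describes.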
>
> It is absolutely correct to say that the more paths a client carries,
> the more CPU cycles and memory will be used to process and store them.
>
> However there is one observation to be made ... in 99% of the cases I
> have seen of distributing more than the best path intra-domain, the
> sufficient number of paths per net on each client is 2.
>
> WEG] the document should explicitly state this. That's exactly what I
> was getting at when I mentioned analysis above. If nearly all
> applications only need one alternate to bring the total paths to two,
> and more would be diminishing returns, the document should recommend
> this, and note that more are possible if the operator's situation
> dictates by simply repeating the deployment more times. I will note
> that this guidance as well as the note at the end of 4.2 that "The
> additional planes of route reflectors do not need to be fully
> redundant as the primary one does" contradicts your example because
> it has both RR1' and RR2'.
>
>
> IMHO the cost of bringing additional paths into the control plane is
> quite well understood today. Moreover it is quite implementation
> dependent. One implementation may use X bytes per path while another
> uses Y bytes to store the same path. I think a separate BGP scaling
> document (even as a BCP) may be equally useful for any technique that
> advertises more than the best path. I would prefer to keep this
> outside of the solutions work on how to advertise and distribute
> those additional paths.
>
> WEG] I'm not looking for a level of detail that requires you to
> discuss the number of bytes per path. Simply noting that scaling
> issues exist and their general categories is enough. Make the logical
> leap for your reader that implementing this solution brings with it
> the scaling problems inherent with adding an additional route
> reflector (and therefore its additional routes and paths).
>>
>> It may be appropriate to add a separate scaling considerations
>> discussion to your deployment considerations (section 6) to
>> discuss some of the above.
>
> I agree 100% .. but as stated above I do not find this specific to
> diverse-path. It seems a general issue and I would highly encourage
> someone to take a stab at documenting this in IETF/IDR/GROW or maybe
> in a NANOG community repository.
>
> WEG] it may not be specific to diverse-path, but diverse-path is
> specifically advocating doing something that would otherwise not be
> done (adding additional RR<->client BGP peers w/full routes beyond
> what is necessary for simple RR redundancy). Therefore I still think
> that you need to discuss the specific scaling concerns that this
> implementation needs to consider, even if it's at a relatively high
> level and the document notes that these are not unique to this
> implementation. I agree that a general scaling considerations
> document may be appropriate, but since that does not exist and I
> don't want this document to be blocked awaiting completion of such, a
> brief discussion within this document would help a lot.
>
>> There may be additional operational considerations from the
>> perspective of route analysis - if you have either a homebuilt or
>> off the shelf set of software that does route analysis for the
>> purpose of event root-cause analysis, anomaly detection, capacity
>> planning/failure analysis, etc, it has to be aware of these
>> additional planes such that it returns the proper response when
>> evaluating the routing table to determine what the expected
>> behavior should be in the real network. This is especially
>> important when it uses the table to determine how traffic will
>> reroute during different failure scenarios. These tools may act
>> like a participant in the mesh rather than a client in order to get
>> a pure view of the table, and that may lead to undesired results if
>> the multiple planes aren't taken into account. There may also be
>> considerations for looking glass implementations and the actual
>> information that is visible on the RRs and RR clients as the result
>> of standard BGP show commands to aid in troubleshooting and
>> verification.
>
> Very good point. Two comments on this ..
>
> - As to the impact on the tools I am less worried, as the presence of
> additional paths can be a fact today, as already mentioned, with a
> full mesh, or as done by some operators who adjust different weight
> values on a pair of RRs on a per net basis.
>
> WEG] sure, but I don't think that it's valid to assume that all
> analysis tools have taken this into account in their implementation,
> so it's worth mentioning as an operational consideration. The comment
> may be helpful to characterize the level of potential impact.
>
> - The use of "planes" in the draft is more of a conceptual nature.
> In practice all paths are still kept in the single table where the
> normal best path is calculated. That means that tools like looking
> glass should not observe any change or impact.
>
> WEG] a good clarification to add to the document.
>
>