Re: [GROW] I-D Action: draft-ietf-grow-diverse-bgp-path-dist-05.txt

I pointed this out on March 27 and proposed a fix.
I believe Pradosh had foreseen the same problem when
he proposed the "Edge_Discriminator attribute" in
draft-pmohapat-idr-fast-conn-restore.

To prevent clients in different clusters from choosing
different bestpaths based on the router ID tie breaker (f):

The RR Plane that advertises the best path MUST be configured
with a BGP Identifier higher than that of the RR Plane that
advertises the 2nd best. That identifier, in turn, must be higher
than that of the Plane that advertises the 3rd best, and so on.

Similarly,

To prevent clients in different clusters from choosing
different bestpaths based on the peer address tie breaker (g):

The RR Plane that advertises the best path MUST be configured
with a peer address higher than that of the RR Plane that
advertises the 2nd best. That address, in turn, must be higher
than that of the Plane that advertises the 3rd best, and so on.
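
A provisioning script could check both rules before the planes go
live. The following is only a sketch in Python. The function name,
the dict keys and the addresses are made up for illustration and are
not taken from the draft. It assumes the planes are listed with the
best-path plane first and that all peer addresses belong to one
address family.

```python
import ipaddress


def check_plane_order(planes):
    """Return a list of rule violations; an empty list means both rules hold.

    planes: list of dicts, best-path plane first. The keys 'name',
    'bgp_id' (dotted quad) and 'peer_addr' are hypothetical.
    """
    problems = []
    # Compare each plane with the next less-preferred plane.
    for better, worse in zip(planes, planes[1:]):
        # BGP Identifiers are 4-octet values; compare them numerically.
        if ipaddress.IPv4Address(better["bgp_id"]) <= ipaddress.IPv4Address(worse["bgp_id"]):
            problems.append(
                f"{better['name']}: BGP Identifier not higher than {worse['name']}"
            )
        # Peer addresses are assumed to be in the same address family.
        if ipaddress.ip_address(better["peer_addr"]) <= ipaddress.ip_address(worse["peer_addr"]):
            problems.append(
                f"{better['name']}: peer address not higher than {worse['name']}"
            )
    return problems


# Example addresses only (documentation ranges, not a real deployment).
planes = [
    {"name": "plane-best", "bgp_id": "10.0.0.10", "peer_addr": "192.0.2.30"},
    {"name": "plane-2nd", "bgp_id": "10.0.0.9", "peer_addr": "192.0.2.20"},
    {"name": "plane-3rd", "bgp_id": "10.0.0.2", "peer_addr": "192.0.2.10"},
]
print(check_plane_order(planes))  # prints []
```

The values are compared numerically rather than as strings, so
10.0.0.10 correctly ranks above 10.0.0.9.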

On Thursday, September 22, 2011 9:55 AM, Robert Raszuk wrote:

> Hi Wes,
>
> Many thx for your comments. I will clarify the corresponding sections
> in the draft.
>
> As to your point of routing loop danger caused by advertising
> additional paths via IBGP I think if you could illustrate a topology
> example where such loop could form it would help to perhaps make the
> spec more clear to address such concern.
>
> Many thx,
> R.
>
>> -----Original Message----- From: Robert Raszuk
>> [mailto:robert@raszuk.net] Sent: Wednesday, September 21, 2011 5:25
>> PM To: George, Wesley Cc: grow@ietf.org Subject: Re: [GROW] I-D
>> Action: draft-ietf-grow-diverse-bgp-path-dist-05.txt
>>
>> Hello Wes,
>>
>>> Other stuff: 2.1 - when discussing overhead and scale concerns for
>>> add paths, perhaps a citation to 4984 would be appropriate?
>>
>> I would prefer not to mix the growing Internet scale concerns with
>> scale concerns that stem from operational practices or
>> configuration.
>>
>> WEG] Understand, but I'm not sure that it's so easy to separate the
>> two. You'll find me saying the same thing to anyone suggesting a
>> change that has the net effect of significantly increasing the burn
>> rate for memory and CPU resources, whether it's a configuration
>> change or otherwise, because it still exacerbates the overall issue.
>> (more on that in a moment)
>>
>>> I've made similar comments to the SIDR folks, and I think generally
>>> anything that adds a non-trivial amount of impact to the growth
>>> curve of the routing system needs to consider this.
>>
>> I think there is substantial difference for local vs global size
>> increase of the routing system. Here in this work all concerns are
>> regarding to the local one.
>>
>> WEG] Generally, I'm not sure that I'd make so much of a distinction.
>> While yes, in theory changes of this type only impact the ASN that
>> chooses to implement it, rather than what it announces to the outside
>> world, the global scaling problem is due to the intersection between
>> available resources, their growth curve, and the growth curve of the
>> routing table. Saying that it only is a concern if it contributes to
>> the size of the DFZ routing table is oversimplifying the root
>> problem, because if internal scale problems exhaust the resources
>> available for both internal and external routes, you still have the
>> same end state - out of resources. In that case, the only difference
>> between a local scaling problem and a global problem is the
>> deployment penetration. If this is widely deployed, it has now
>> steepened the growth curve noted in 4984, because it still is using
>> some of the overall available resources. I've said on more than one
>> occasion that the iBGP routes carried by an SP are as much or more of
>> a problem than the growth of the global table because they don't have
>> nearly as much of the aggregation and optimization to reduce their
>> footprint. The only difference is the level of administrative control
>> over growth, but that's a fairly limited knob to turn - for lots of
>> reasons it may not be any more feasible to change things internally
>> to reduce internal route growth than it is to change global route
>> growth. Besides, I think that your draft is trying to have it both
>> ways - you malign Add Paths for having scaling problems, and then
>> seem content to gloss over a very similar problem created by your
>> solution simply because it appears to be slightly less severe and
>> more localized.
>>
>>> 4. This asserts that no code changes are necessary to RR clients.
>>> I'm not sure I totally agree with that... If the idea is to have a
>>> primary (best) RR and then N additional paths, the general
>>> assumption is that the N, N1, ... RRs are carrying routes that are
>>> less and less preferred. How does this system avoid the same sort
>>> of inconsistency of best path choice among different routers in the
>>> network if there is no way to identify those paths as secondary? I
>>> think you need some way to determine if the alternate routes are
>>> intended to be ECMP routes or backup routes... You may be able to
>>> cover this without code changes by using alternate configurations
>>> of other BGP preference indicators (MED, Localpref, metric, etc),
>>> perhaps with inbound route policy on the client or outbound on the
>>> RR, but since things like metric may be different based on where
>>> something is in the network, that may lead to inconsistency if used
>>> by itself. Even then, the draft doesn't discuss how this should be
>>> managed.
>>
>>
>> I stand by the claim that no code change is needed on clients.
>> Moreover, no additional policy change is required either.
>>
>> The best way to illustrate this is to compare the presence of
>> additional BGP paths on the clients in the scenario where clients are
>> interconnected with a full IBGP mesh or would get all paths with
>> add-path. In neither case is there a notion of the RR telling the
>> client which path is best or which is second best, and there are a
>> number of good reasons for that. One is that the RR's numbering of
>> paths can be different than the client's. The other is that when we
>> withdraw any path that was advertised and ordered, we would need to
>> re-advertise all remaining paths with the new order; that amount of
>> churn is non-negligible.
>>
>> Each client's BGP best path selection is capable of making a safe
>> (loop-free) autonomous choice of paths in the PIC/fast connectivity
>> restoration/IBGP multipath cases.
>>
>> WEG] I'm sorry, maybe I'm being thick, but I still don't understand
>> how this would work in a way that would always avoid routing loops.
>> Under normal state, you have an RR reflecting its best path to
>> the client based on the routes it receives from the rest of its
>> neighbors, meaning that the clients don't have visibility to
>> candidate alternatives that the RR does, so they're all making the
>> same choice at least within the local cone of influence of that RR.
>> You add a second set of RRs (RR') that is announcing a second-best
>> path as if it was the best path to restore 1 (or more) of the
>> candidate alternatives to the client. The client receives the best
>> and 2nd-best path and evaluates them using standard methods. If the
>> thing that makes one route better than the other is something locally
>> interesting like metric, and the client's particular place in the
>> universe means that the metric is different as compared to other
>> clients, the P routers, and the RRs, it may choose the 2nd-best path
>> as best, and this may lead to routing loops if it tries to send the
>> route to another router that has a different belief of what the best
>> path is. This case is much more likely if the RR and RR' are not
>> collocated with all of their clients and/or each other. I think that
>> this may also be the case when the tiebreaker is router-id if you're
>> not careful of the way that you address your route-reflectors and/or
>> are not doing next-hop self at the edges. Only in the case where the
>> 2nd-best path is clearly worse to all members of the ASN (lower local
>> pref, longer AS-path, etc) are you assured of no possibility for two
>> routers each getting a different result when evaluating those two
>> different routes. I think that 4.2 covers some part of this case, in
>> the way that it documents its assumptions and what must be done to
>> enable deployment, especially the references to ignoring IGP metric,
>> but IMO it's not clear enough in the explanation why some of these
>> things must be done - the failure case isn't discussed.
>>
>>> 4.1 Also, there's a definite scaling consideration on the RR
>>> clients that isn't really discussed here - they are now going to be
>>> storing some number of additional routes and paths that is linearly
>>> related to the number of additional planes that are implemented.
>>> The addition of more RR sessions that presumably carry a portion of
>>> the full routing table now drives a non-trivial increase in memory
>>> footprint and processing overhead (and potentially convergence time
>>> for slower boxes). In the simplest case of 2 primary
>>> route-reflectors (for diversity), and 1 2nd-best path RR, you've
>>> added one session. If you want to carry a 3rd-best RR or have
>>> redundant 2nd-best RRs, you've added 4 sessions. It's fair to say
>>> that after a certain number of alternate paths, you start having
>>> fewer routes because there are only so many alternative exits, but
>>> otherwise there is a potentially large problem even if it's not
>>> quite as bad as addpaths. I might recommend that you do some
>>> analysis of the routing table to know where this threshold makes a
>>> difference, based on how many alternate paths an average route
>>> carries. In addition to being a scaling consideration, it also
>>> helps to inform what value of N becomes diminishing returns because
>>> most networks don't have that many backup paths. I envision this
>>> being something like "80% of routes have 4 or fewer paths, so moving
>>> beyond 4 planes may add overhead without much benefit..."
>>
>> It is absolutely correct to say that the more paths a client
>> carries, the more CPU cycles and memory will be used to process and
>> store them.
>>
>> However, there is one observation to be made: in 99% of the cases I
>> have seen for distributing more than the best path intra-domain, the
>> sufficient number of paths per net on each client is 2.
>>
>> WEG] the document should explicitly state this. That's exactly what I
>> was getting at when I mentioned analysis above. If nearly all
>> applications only need one alternate to bring the total paths to two,
>> and more would be diminishing returns, the document should recommend
>> this, and note that more are possible if the operator's situation
>> dictates by simply repeating the deployment more times. I will note
>> that this guidance as well as the note at the end of 4.2 that "The
>> additional planes of route reflectors do not need to be fully
>> redundant as the primary one does" contradicts your example because
>> it has both RR1' and RR2'.
>>
>>
>> IMHO the cost of bringing additional paths to the control plane is
>> quite well understood today. Moreover, it is quite implementation
>> dependent. One implementation may use X bytes per path while another
>> uses Y bytes to store the same path. I think a separate BGP scaling
>> document (even as a BCP) may be equally useful for any technique to
>> advertise more than the best path. I would prefer to keep this
>> outside of the solutions work on how to advertise and distribute
>> those additional paths.
>>
>> WEG] I'm not looking for a level of detail that requires you to
>> discuss the number of bytes per path. Simply noting that scaling
>> issues exist and their general categories is enough. Make the logical
>> leap for your reader that implementing this solution brings with it
>> the scaling problems inherent with adding an additional route
>> reflector (and therefore its additional routes and paths).
>>>
>>> It may be appropriate to add a separate scaling considerations
>>> discussion to your deployment considerations (section 6) to
>>> discuss some of the above.
>>
>> I agree 100%, but as stated above I do not find this specific to
>> diverse-path. It seems a general issue and I would highly encourage
>> someone to take a stab at documenting this in IETF/IDR/GROW or maybe
>> in the NANOG community repository.
>>
>> WEG] it may not be specific to diverse-path, but diverse-path is
>> specifically advocating doing something that would otherwise not be
>> done (adding additional RR<->client BGP peers w/full routes beyond
>> what is necessary for simple RR redundancy). Therefore I still think
>> that you need to discuss the specific scaling concerns that this
>> implementation needs to consider, even if it's at a relatively high
>> level and the document notes that these are not unique to this
>> implementation. I agree that a general scaling considerations
>> document may be appropriate, but since that does not exist and I
>> don't want this document to be blocked awaiting completion of such, a
>> brief discussion within this document would help a lot.
>>
>>> There may be additional operational considerations from the
>>> perspective of route analysis - if you have either a homebuilt or
>>> off the shelf set of software that does route analysis for the
>>> purpose of event root-cause analysis, anomaly detection, capacity
>>> planning/failure analysis, etc, it has to be aware of these
>>> additional planes such that it returns the proper response when
>>> evaluating the routing table to determine what the expected
>>> behavior should be in the real network. This is especially
>>> important when it uses the table to determine how traffic will
>>> reroute during different failure scenarios. These tools may act
>>> like a participant in the mesh rather than a client in order to get
>>> a pure view of the table, and that may lead to undesired results if
>>> the multiple planes aren't taken into account. There may also be
>>> considerations for looking glass implementations and the actual
>>> information that is visible on the RRs and RR clients as the result
>>> of standard BGP show commands to aid in troubleshooting and
>>> verification.
>>
>> Very good point. Two comments on this ..
>>
>> - As to the impact on the tools, I am less worried, as the presence
>> of additional paths can be a fact today, as already mentioned, with a
>> full mesh or, as done by some operators, by adjusting the weight
>> values of a pair of RRs on a per-net basis.
>>
>> WEG] sure, but I don't think that it's valid to assume that all
>> analysis tools have taken this into account in their implementation,
>> so it's worth mentioning as an operational consideration. The comment
>> may be helpful to characterize the level of potential impact.
>>
>> - The use of "planes" in the draft is more of a conceptual nature.
>> In practice all paths are still kept in the single table where normal
>> best path is calculated. That means that tools like looking glass
>> should not observe any changes nor impact.
>>
>> WEG] a good clarification to add to the document.
>>
>>
>
> _______________________________________________
> GROW mailing list
> GROW@ietf.org
> https://www.ietf.org/mailman/listinfo/grow

--
Jakob Heitz.