Re: [GROW] I-D Action: draft-ietf-grow-diverse-bgp-path-dist-05.txt

Robert Raszuk <robert@raszuk.net> Wed, 21 September 2011 21:22 UTC

Message-ID: <4E7A5617.2080900@raszuk.net>
Date: Wed, 21 Sep 2011 23:24:39 +0200
From: Robert Raszuk <robert@raszuk.net>
To: "George, Wesley" <wesley.george@twcable.com>
References: <20110915135818.19974.94670.idtracker@ietfa.amsl.com> <34E4F50CAFA10349A41E0756550084FB0F8D14C1@PRVPEXVS04.corp.twcable.com>
In-Reply-To: <34E4F50CAFA10349A41E0756550084FB0F8D14C1@PRVPEXVS04.corp.twcable.com>
Cc: "grow@ietf.org" <grow@ietf.org>
Subject: Re: [GROW] I-D Action: draft-ietf-grow-diverse-bgp-path-dist-05.txt

Hello Wes,

> I've read this draft. I'll note that I'm already in the
> acknowledgements section, but I can't find an on-list review that I
> wrote on this draft prior to this point unless I sent it privately to
> the authors, in which case I apologize if I'm retreading old
> discussion.

That was based on your review and off-list mail of June 23rd 2010.

> A nit regarding the acknowledgement - my first name is
> Wes, not George.

Apologies for that. Will be fixed.

> Some Nits Abstract: %s/build/built in parallel, /co-exit/co-exist Now
> that the draft is a WG doc, "..the authors believe..." (last
> sentence) can probably be removed (from intro section as well).
>
> Intro: "other then best path" - suggest rewording to something like
> "secondary and tertiary paths" or "alternate paths which are not
> considered best path"
>
> 2. %s/"reduction of time of reachability restoration"/faster
> reachability restoration

Will fix.

> Other stuff: 2.1 - when discussing overhead and scale concerns for
> add paths, perhaps a citation to 4984 would be appropriate?

I would prefer not to mix concerns about the growing scale of the
Internet with scale concerns that stem from operational practices or
configuration.

> I've made
> similar comments to the SIDR folks, and I think generally anything
> that adds a non-trivial amount of impact to the growth curve of the
> routing system needs to consider this.

I think there is a substantial difference between a local and a global
size increase of the routing system. In this work all concerns relate
to the local one.

> Also, why is this citing
> implementation data from 2009? Is there now an implementation that
> supports add-paths, or is this just a holdover from earlier versions
> of the draft? If there is now an implementation, it would be good to
> revisit this section.

The point of citing this was to compare with 2002, when add-paths was
originally defined, and so indicate how long it took for implementations
to finally start looking at it. I will reword this section to better
reflect the intention and avoid using absolute dates.


> 4. This asserts that no code changes are necessary to RR clients. I'm
> not sure I totally agree with that... If the idea is to have a
> primary (best) RR and then N additional paths, the general assumption
> is that the N, N1, ... RRs are carrying routes that are less and less
> preferred. How does this system avoid the same sort of inconsistency
> of best path choice among different routers in the network if there
> is no way to identify those paths as secondary? I think you need some
> way to determine if the alternate routes are intended to be ECMP
> routes or backup routes... You may be able to cover this without code
> changes by using alternate configurations of other BGP preference
> indicators (MED, Localpref, metric, etc), perhaps with inbound route
> policy on the client or outbound on the RR, but since things like
> metric may be different based on where something is in the network,
> that may lead to inconsistency if used by itself. Even then, the
> draft doesn't discuss how this should be managed.


I stand by the claim that no code change is needed on clients. Moreover,
not even an additional policy change is required.

The best way to illustrate this is to compare it with the presence of
additional BGP paths on clients that are interconnected with a full IBGP
mesh or that receive all paths with add-path. In neither case is there a
notion of the RR telling the client which path is best or which is
second best, and there are a number of good reasons for that. One is
that the ordering of paths can be different for the RR than for the
client. The other is that if we withdrew any path that had been
advertised with an order, we would need to re-advertise all remaining
paths with the new order, and that amount of churn is non-negligible.
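
To make the churn argument concrete, here is a minimal sketch. It
assumes a hypothetical scheme (not something in the draft) where the RR
attaches an explicit rank to every advertised path. The function name
and the path labels are made up for illustration.

```python
def paths_to_readvertise(ranked_paths, withdrawn):
    """Return the paths whose advertised rank changes after a withdraw.

    ranked_paths is ordered from most to least preferred. This models a
    hypothetical "RR sends explicit order" scheme, not the draft itself.
    """
    idx = ranked_paths.index(withdrawn)
    # Every path ranked below the withdrawn one moves up by one position,
    # so its rank is stale and it has to be advertised again.
    return ranked_paths[idx + 1:]


paths = ["via-ASBR1", "via-ASBR2", "via-ASBR3", "via-ASBR4"]
readvertised = paths_to_readvertise(paths, "via-ASBR1")
print(len(readvertised), "paths re-advertised for one withdraw")
```

Withdrawing the top-ranked path forces every remaining path to be sent
again, per prefix and per client, which is the churn referred to above.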

Each client's BGP best path selection is capable of making a safe
(loop-free), autonomous choice of paths in the PIC, fast connectivity
restoration and IBGP multipath cases.
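
As a rough illustration of that autonomous choice, the sketch below
ranks whatever paths a client has received using a simplified subset of
the usual BGP decision steps. The Path fields, the tie-break on next hop
and the best_and_backup helper are assumptions made for the example.
Real implementations apply more steps, and MED is normally compared only
between paths from the same neighbor AS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int
    as_path_len: int
    med: int
    igp_metric: int  # this client's own IGP cost to the next hop


def preference_key(p):
    # Simplified ordering: higher LOCAL_PREF, shorter AS_PATH, lower MED,
    # lower IGP metric, then next hop as a stand-in for the final tie-break.
    return (-p.local_pref, p.as_path_len, p.med, p.igp_metric, p.next_hop)


def best_and_backup(paths):
    """Pick best and backup locally; no ordering hint from the RR is used."""
    ranked = sorted(paths, key=preference_key)
    best = ranked[0]
    # A backup is only useful if it exits through a different next hop.
    backup = next((p for p in ranked[1:] if p.next_hop != best.next_hop), None)
    return best, backup


received = [
    Path("10.0.0.1", 100, 3, 0, 20),
    Path("10.0.0.2", 100, 3, 0, 10),
    Path("10.0.0.3", 90, 1, 0, 5),
]
best, backup = best_and_backup(received)
print(best.next_hop, backup.next_hop)
```

The order in which the paths arrived, or which RR session delivered
them, plays no part in the result.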


> 4.1 Also, there's a definite scaling consideration on the RR clients
> that isn't really discussed here - they are now going to be storing
> some number of additional routes and paths that is linearly related
> to the number of additional planes that are implemented. The addition
> of more RR sessions that presumably carry a portion of the full
> routing table now drives a non-trivial increase in memory footprint
> and processing overhead (and potentially convergence time for slower
> boxes). In the simplest case of 2 primary route-reflectors (for
> diversity), and 1 2nd-best path RR, you've added one session. If you
> want to carry a 3rd-best RR or have redundant 2nd-best RRs, you've
> added 4 sessions. It's fair to say that after a certain number of
> alternate paths, you start having less routes because there are only
> so many alternative exits, but otherwise there is a potentially large
> problem even if it's not quite as bad as addpaths. I might recommend
> that you do some analysis of the routing table to know where this
> threshold makes a difference, based on how many alternate paths an
> average route carries. In addition to being a scaling consideration,
> it also helps to inform what value of N becomes diminishing returns
> because most networks don't have that many backup paths. I envision
> this being something like "80% of routes have 4 or less paths, so
> moving beyond 4 planes may add overhead without much benefit..."

It is absolutely correct to say that the more paths a client carries,
the more CPU cycles and memory will be used to process and store them.

However, there is one observation to be made: in 99% of the cases I have
seen of distributing more than the best path intra-domain, the
sufficient number of paths per net on each client is 2.

Moreover, this is configurable by the operator, subject to the RR
supporting more than 2 planes.
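
The kind of analysis suggested in the review could be sketched roughly
as follows. The input is a list with the number of available paths for
each prefix. The sample numbers are invented purely for illustration and
are not measurements from any network.

```python
from collections import Counter


def coverage_by_plane_count(paths_per_prefix, max_planes):
    """Fraction of prefixes fully covered when carrying up to N paths."""
    total = len(paths_per_prefix)
    counts = Counter(paths_per_prefix)
    coverage = {}
    for n in range(1, max_planes + 1):
        # A prefix is "covered" by n planes if it has at most n paths.
        covered = sum(c for k, c in counts.items() if k <= n)
        coverage[n] = covered / total
    return coverage


# Hypothetical sample: number of distinct paths seen for ten prefixes.
sample = [1, 2, 2, 2, 3, 2, 4, 2, 1, 2]
coverage = coverage_by_plane_count(sample, 4)
for planes, fraction in coverage.items():
    print(f"{planes} plane(s): {fraction:.0%} of prefixes")
```

Run against a real table, the point where the curve flattens is the
value of N beyond which extra planes bring little benefit.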

IMHO the cost of bringing additional paths into the control plane is
quite well understood today. Moreover, it is quite implementation
dependent: one implementation may use X bytes per path while another
uses Y bytes to store the same path. I think a separate BGP scaling
document (even as a BCP) may be equally useful for any technique that
advertises more than the best path. I would prefer to keep this outside
of the solutions work on how to advertise and distribute those
additional paths.
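
A back-of-the-envelope sketch of why the cost is implementation
dependent: the extra memory is roughly prefixes times extra paths times
bytes per path. The prefix count and the two per-path sizes below are
hypothetical placeholders for X and Y, not figures from any vendor.

```python
def extra_path_memory(prefixes, extra_paths_per_prefix, bytes_per_path):
    """Rough extra memory in bytes for carrying additional paths."""
    return prefixes * extra_paths_per_prefix * bytes_per_path


# Hypothetical inputs: 400k prefixes, one extra (2nd best) path each.
PREFIXES = 400_000
per_path_bytes = {"implementation X": 100, "implementation Y": 250}

for name, size in per_path_bytes.items():
    mib = extra_path_memory(PREFIXES, 1, size) / 2**20
    print(f"{name}: about {mib:.1f} MiB extra")
```

The same arithmetic applies whether the extra paths arrive via diverse
path, add-paths or a full mesh, which is why it fits better in a general
scaling document.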


> Further scaling considerations occur in the core if the P routers
> must know about both the primary and secondary paths (and therefore
> peer/mesh with both R1 and R1'). You may be able to draw the
> conclusion that this is better than full mesh (5.1 #3) and therefore
> superior from a scaling perspective, but this at least needs to be
> discussed. It may be that the only way to avoid that particular issue
> is to operate with a BGP-free core...
>
> It may be appropriate to add a separate scaling considerations
> discussion to your deployment considerations (section 6) to discuss
> some of the above.

I agree 100%, but as stated above I do not find this specific to
diverse-path. It seems to be a general issue, and I would highly
encourage someone to take a stab at documenting it in IETF/IDR/GROW or
maybe in a NANOG community repository.


> There may be additional operational considerations from the
> perspective of route analysis - if you have either a homebuilt or off
> the shelf set of software that does route analysis for the purpose of
> event root-cause analysis, anomaly detection, capacity
> planning/failure analysis, etc, it has to be aware of these
> additional planes such that it returns the proper response when
> evaluating the routing table to determine what the expected behavior
> should be in the real network. This is especially important when it
> uses the table to determine how traffic will reroute during different
> failure scenarios. These tools may act like a participant in the mesh
> rather than a client in order to get a pure view of the table, and
> that may lead to undesired results if the multiple planes aren't
> taken into account. There may also be considerations for looking
> glass implementations and the actual information that is visible on
> the RRs and RR clients as the result of standard BGP show commands
> to aid in troubleshooting and verification.

Very good point. Two comments on this:

- As to the impact on the tools, I am less worried, as the presence of
additional paths can already be a fact today, as mentioned, with a full
mesh, or where some operators adjust the weight values of a pair of RRs
on a per-net basis.

- The use of "planes" in the draft is more of a conceptual nature. In
practice all paths are still kept in a single table where the normal
best path is calculated. That means that tools like looking glasses
should not observe any change or impact.

Once more, many thanks for your review.
Robert.