Re: [GROW] I-D Action:draft-ietf-grow-diverse-bgp-path-dist-01.txt

Robert Raszuk <raszuk@cisco.com> Wed, 23 June 2010 22:28 UTC

Message-ID: <4C228A6E.1020408@cisco.com>
Date: Thu, 24 Jun 2010 00:27:58 +0200
From: Robert Raszuk <raszuk@cisco.com>
To: grow@ietf.org, ju1738@att.com
References: <20100623083005.61D6B3A6A65@core3.amsl.com> <1477DEAE19DD884CB004730D0FD77FD7041A89DA@misout7msgusr7e.ugd.att.com>
In-Reply-To: <1477DEAE19DD884CB004730D0FD77FD7041A89DA@misout7msgusr7e.ugd.att.com>
Cc: hnguyen@att.com, al1457@att.com
Subject: Re: [GROW] I-D Action:draft-ietf-grow-diverse-bgp-path-dist-01.txt
Reply-To: raszuk@cisco.com

Hi Jim,

> The approach as I understand it is to deploy multiple channels to
> disseminate routing state be it the 2,3,...nth path to some dest D..

Not necessarily. Multiple channels are just one of the possible
deployment models.

The primary goal is to observe that, given the pair of best path RRs
deployed today, one could easily turn one of the reflectors within such
a pair into a backup RR and disseminate a backup path to all clients,
without provisioning any new IBGP sessions and without adding any new
RRs.

This provides a very easy to deploy mechanism: without any PE upgrade,
you provide the PEs with additional paths for fast connectivity
restoration, PIC (Prefix Independent Convergence) or load balancing.
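To make the primary/backup RR pair concrete, here is a minimal Python
sketch. It assumes a toy decision process (local preference, AS path
length, router ID only) and hypothetical documentation addresses; it is
not how any particular implementation or the draft specifies the
computation. One RR of the existing pair keeps advertising the best
path while the other advertises the 2nd best, so a PE with its usual
two sessions ends up holding two distinct paths.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Path:
    next_hop: str      # egress ASBR/PE for this path (documentation addresses)
    local_pref: int
    as_path_len: int
    router_id: str     # last-resort tie-breaker, compared as a string for brevity


def rank(paths):
    """Order paths best-first using a heavily simplified subset of the BGP decision process."""
    return sorted(paths, key=lambda p: (-p.local_pref, p.as_path_len, p.router_id))


def rr_advertise(paths, diverse=False) -> Optional[Path]:
    """A best path RR sends the overall best; a diverse-path RR sends the 2nd best instead."""
    ranked = rank(paths)
    idx = 1 if diverse else 0
    return ranked[idx] if len(ranked) > idx else None


# Three paths to the same prefix, as seen by both RRs of an existing pair.
paths = [
    Path("192.0.2.1", 200, 2, "10.0.0.1"),
    Path("192.0.2.2", 100, 2, "10.0.0.2"),
    Path("192.0.2.3", 100, 3, "10.0.0.3"),
]

# An unmodified PE keeps its existing two IBGP sessions, one to each RR of the pair.
from_primary = rr_advertise(paths)
from_backup = rr_advertise(paths, diverse=True)
pe_paths = rank([p for p in (from_primary, from_backup) if p is not None])
active, backup = pe_paths[0], pe_paths[1]
print("active via", active.next_hop, "/ backup via", backup.next_hop)
```

The PE needs nothing new: it simply receives one path per session and
can install the second one as a precomputed backup.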

> Comments follow...

So do further replies ...

> Jim Uttaro
>
> Section 1.0
>
> "The parallel route reflector planes solution brings very significant
> benefits at a negligible capex and opex deployment price as compared to
> the alternative techniques"
>
> A number of points need to be clarified here. The first is the
> SP/Operator needs to deploy n number of RR planes to disseminate N
> paths. Assuming some form of redundancy we would have to of course buy
> the RRs or deploy some type of logical routers. How can this be
> monetized?

Not required. You maintain connectivity redundancy even when turning an
existing pair of RRs into a primary and backup pair. No new purchase
order is required, nor is there any need for logical routers.

The draft describes RR planes to give a formalized description of the
proposal, but I was hoping that it could be easily mapped onto basic
deployment styles.

Adding a new set of RRs is also possible if someone needs to, but it is
not at all necessary for a diverse path deployment.

> Does this approach assume that customers who want fast
> restoration, load balancing, mitigation of oscillation would pay for
> this. Or does the draft assume that the addl RRs are of such negligible
> capex cost that the operator would simply incur the cost.. This model
> does not usually sit well with the folks that write the checks.

Again .. this is not necessary at all. On the contrary, if you need to
swap all your PEs for new ones which support alternative ways to
disseminate more than one BGP path .. this is where the check would be
rather heavy :).

Compared with just a code upgrade on the RRs, it seems clear where the
savings are.

> From an
> opex perspective we are putting in addl planes for each AS that is under
> the operators authority. So we not only need to pay for it we would need
> to establish coherent inter-AS strategies to manage, maintain these addl
> RRs.. Additionally the function of these devices is different than a
> traditional RR which implies that OpS needs to be cognizant of the
> difference and how they should be managed.. As described in the draft
> the 2,3..nth plane need not be as robust as it is not the primary path..
> This needs to be understood by OpS in terms of their response to failure
> or how to perform maintenance.. We are essentially introducing a new
> device from these perspectives...

Again, you are stuck with one particular deployment model. I take this
as a recommendation to clarify in the next version of the draft that a
provider may simply, without adding new RRs and without touching PEs,
turn one of the existing RRs into a diverse-path RR and disseminate
diverse BGP paths to the clients.

All that is required on the provider side is to upgrade the RRs with new
code and enable the new knob on the client sessions.

> Section 2.1
>
> "This new requirement has its own memory and processing cost.  Suffice
> to say that by the middle of 2009 none of the commercial BGP
> implementation can claim to support the new add-path behaviour in
> production code, in part because of this resource overhead."
>
> A bit confused by this statement.. My thoughts on this was add-paths is
> useful for a customer that is advertising multiple paths or at peering
> points. In both cases we would anticipate the use of routing policy to
> only select a subset of these routes. It is impractical to believe that
> we are going to duplicate the same state over and over again on each
> plane.. This is not a function of the draft but how operators deploy the
> functionality..This functionality has been around a long time in VPNV4
> services and I believe it will eventually be used for IPV4 services..

At peering points I find it very common to set next-hop-self when
advertising towards IBGP peers, so all EBGP learned paths from a given
ASBR carry the same next hop and there is practically no reason to
advertise more than one of them towards the core.

VPNv4 PEs require this from day one .. many ISPs do it by default for
IPv4/IPv6 today as well.

One should also note that it is quite a common practice to provision
external peerings on different ASBRs, so that a single ASBR going down
does not disconnect a number of external peering sessions.
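A small Python sketch of the next-hop-self and ASBR placement points,
with hypothetical addresses and AS numbers: once next-hop-self is
applied, the two external paths learned on one ASBR collapse onto a
single next hop from the core's point of view, and the forwarding
diversity comes from peerings on a different ASBR.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Path:
    prefix: str
    next_hop: str
    neighbor_as: int


def advertise_to_ibgp(asbr_address, ebgp_paths):
    """Apply next-hop-self: every EBGP learned path leaves the ASBR with the ASBR as next hop."""
    return [replace(p, next_hop=asbr_address) for p in ebgp_paths]


# Hypothetical topology: ASBR1 holds two external peerings for the prefix, ASBR2 holds one.
asbr1_out = advertise_to_ibgp("198.51.100.1", [
    Path("203.0.113.0/24", "192.0.2.10", 64500),
    Path("203.0.113.0/24", "192.0.2.20", 64501),
])
asbr2_out = advertise_to_ibgp("198.51.100.2", [
    Path("203.0.113.0/24", "192.0.2.30", 64502),
])

# The core can only forward towards distinct next hops, so that is where the useful diversity is.
next_hops_via_asbr1 = {p.next_hop for p in asbr1_out}
next_hops_overall = {p.next_hop for p in asbr1_out + asbr2_out}
print(sorted(next_hops_via_asbr1))
print(sorted(next_hops_overall))
```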


> " The add paths protocol extensions have to be implemented by all the
> routers within an AS in order for the system to work correctly."
>
> Pls explain.. Why do you believe this? It is certainly not practical and
> I never envisioned a full upgrade across thousands of edges in multiple
> AS domains.. The approach we believe we could take is to deploy on a
> subset of edges for some set of routes.

The question one needs to ask is: what is the overall goal? As you have
observed, the primary goals SPs are after are fast connectivity
restoration, load balancing and mitigation of oscillations.

To accomplish this in MPLS networks or in IP encapsulation networks you
need to push additional state to all edges/PEs; otherwise the
alternative paths are missing exactly where they need to be present.

Pushing more than the best path with add-paths requires an upgrade of
the PEs. Distributing additional paths with the diverse-path proposal
does not require touching the PEs at all.
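Why the diverse-path approach needs no PE change can be sketched with a
toy Adj-RIB-In in Python (session names and routes are hypothetical).
Without add-paths, a new advertisement for the same prefix on the same
session replaces the previous one (an implicit withdraw), so a single
session carries one path per prefix. A second session to another RR,
which PEs already have, carries a second path; add-paths instead adds a
path identifier so that one session can carry several.

```python
def receive(adj_rib_in, session, prefix, path, path_id=None):
    """Store a received route.

    Without add-paths (path_id=None) the key is (session, prefix), so a second
    advertisement for the same prefix on the same session replaces the first.
    """
    key = (session, prefix) if path_id is None else (session, prefix, path_id)
    adj_rib_in[key] = path


def paths_for(adj_rib_in, prefix):
    """All paths currently held for a prefix, across all sessions."""
    return sorted(v for k, v in adj_rib_in.items() if k[1] == prefix)


PFX = "203.0.113.0/24"

# Legacy PE, one session to a single RR: the 2nd path overwrites the 1st.
legacy = {}
receive(legacy, "rr1", PFX, "via 192.0.2.1")
receive(legacy, "rr1", PFX, "via 192.0.2.2")

# Legacy PE, one session to each RR of the pair (diverse-path): both paths survive.
diverse = {}
receive(diverse, "rr1", PFX, "via 192.0.2.1")
receive(diverse, "rr2", PFX, "via 192.0.2.2")

# Upgraded PE with add-paths on a single session: path identifiers keep both.
addpath = {}
receive(addpath, "rr1", PFX, "via 192.0.2.1", path_id=1)
receive(addpath, "rr1", PFX, "via 192.0.2.2", path_id=2)

for name, rib in (("legacy", legacy), ("diverse", diverse), ("add-paths", addpath)):
    print(name, paths_for(rib, PFX))
```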

So could you clarify what the point is of using add-paths for a "subset
of edges for some set of routes"?


> " It is intended as a way to buy more time allowing for a smoother and
> gradual migration where router upgrades will be required for perhaps
> different reasons.  It will also allow the time required where standard
> RP/RE memory size can easily accommodate the associated overhead with
> other techniques without any compromises.'
>
> This statement seems to conflict with the one above.. Above you state
> that it is needed everywhere to work correctly here the statement is we
> can buy time to gradually migrate.. Why don't we just gradually migrate
> and eliminate this middle step??

The gradual migration may take 10 years. Do you want to offer inferior
services compared with other SPs over the next 10 years?

In my honest opinion add-paths and diverse-path are complementary to
each other. Yes, they both enable an operator to distribute more than
the overall best path, but they do it differently.

I imagine that RRs could support peers which are upgraded with the
add-paths code as well as clients which are not yet upgraded but would
still benefit from receiving more than just the best path.
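A minimal Python sketch of such a mixed RR, under stated assumptions:
the `add_paths` and `diverse` flags are hypothetical per-peer settings,
path identifiers are simply numbered from 1, and a diverse-path peer
gets nothing from this RR when no 2nd best path exists (the other RR
still supplies the best path).

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    add_paths: bool        # hypothetical: add-paths negotiated on this session
    diverse: bool = False  # hypothetical: legacy peer served the 2nd best path


def updates_for(peer, ranked_paths, max_paths=2):
    """Return (path_id, path) pairs the RR would send this peer for one prefix.

    ranked_paths is ordered best-first; path_id is None when no path identifier is used.
    """
    if peer.add_paths:
        # Upgraded peer: several paths on one session, each tagged with a path identifier.
        return [(i + 1, p) for i, p in enumerate(ranked_paths[:max_paths])]
    if not peer.diverse:
        # Legacy peer served as today: the overall best path only.
        return [(None, ranked_paths[0])] if ranked_paths else []
    # Legacy peer served by the diverse-path function: the 2nd best path only.
    return [(None, ranked_paths[1])] if len(ranked_paths) > 1 else []


ranked = ["via 192.0.2.1", "via 192.0.2.2", "via 192.0.2.3"]  # best first
for peer in (Peer("pe-upgraded", add_paths=True),
             Peer("pe-legacy", add_paths=False),
             Peer("pe-legacy-diverse", add_paths=False, diverse=True)):
    print(peer.name, updates_for(peer, ranked))
```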

In fact I think that most current applications can be easily satisfied
with just the 2nd best path, whose dissemination is what the diverse-path
proposal addresses.


> Section 4
>
> " The proposed solution is based on the use of additional route
> reflectors or new functionality enabled on the existing route reflectors
> that instead of distributing the best path for each route will
> distribute an alternative path other then best. "
>
> Would like to drill down on this a bit..In the first case where addl
> deployment of RRs are done I am assuming that these RRs would somehow
> prefer the second best path of the first.. How would this be done
> customers use many different mechanisms to identify primary, secondary,
> etc... AS-PATH prepend, Local Pref, IGP cost etc... are all used..How is
> this done on the secondary plane? Regardless of either of these
> approaches changes to the BGP implementation to select a different POI
> is needed. But where how do you know how a customer is identifying? Pls
> expand on this. It would seem that although the protocol definition does
> not change the operator needs to ensure that this functionality is
> constructed the same way across all the vendors.. Will this require
> another draft?

The BGP best path algorithm is quite consistent across vendors today.
That means that the best path calculation is consistent, and therefore
the 2nd best path calculation would be consistent as well.
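The argument that a consistent best path calculation also yields a
consistent 2nd best path can be sketched in Python: run the selection,
set the winner aside, and run it again. The tie-break order below is a
simplified assumption (it omits several real steps and treats MED
loosely), not any vendor's exact algorithm.

```python
import ipaddress
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int
    as_path_len: int
    origin: str
    med: int
    ebgp: bool
    igp_metric: int   # IGP cost from the computing router to next_hop
    router_id: str


def decision_key(p):
    """Simplified tie-break order (lower tuple wins).

    MED is compared across all paths here for brevity; by default real implementations
    only compare MED between paths from the same neighboring AS.
    """
    return (-p.local_pref, p.as_path_len, ORIGIN_RANK[p.origin], p.med,
            0 if p.ebgp else 1, p.igp_metric, ipaddress.ip_address(p.router_id))


def nth_best(paths, n):
    """Pick the best path, set it aside, and repeat; the n-th winner is the n-th best path."""
    remaining = list(paths)
    for _ in range(n - 1):
        remaining.remove(min(remaining, key=decision_key))
    return min(remaining, key=decision_key)


paths = [
    Path("192.0.2.1", 100, 2, "igp", 0, False, 10, "10.0.0.1"),
    Path("192.0.2.2", 100, 2, "igp", 0, False, 20, "10.0.0.2"),
    Path("192.0.2.3", 100, 3, "igp", 0, False, 5, "10.0.0.3"),
]

# Same inputs (including the same IGP view) give the same answer in any arrival order.
print(nth_best(paths, 1).next_hop, nth_best(paths, 2).next_hop)
print(nth_best(list(reversed(paths)), 2).next_hop)
```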

So when could this break?

It could break at step 9 of the best path selection on non co-located
RRs, where we consider the IGP metric to the BGP next hop.
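A toy Python illustration of this failure case, with hypothetical IGP
costs: two paths are equal up to the IGP metric step, and because the
primary and backup RRs sit at different points in the IGP, the path the
backup RR ranks 2nd is exactly the path the primary RR already
advertises as best. Clients would then receive the same path twice and
no backup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int
    as_path_len: int


def rank(paths, igp_cost):
    """Simplified decision: local pref, AS path length, then IGP cost as seen by this RR."""
    # next_hop stands in for the remaining deterministic tie-breakers
    return sorted(paths, key=lambda p: (-p.local_pref, p.as_path_len,
                                        igp_cost[p.next_hop], p.next_hop))


paths = [Path("192.0.2.1", 100, 2), Path("192.0.2.2", 100, 2)]  # equal up to the IGP step

# Hypothetical IGP costs: the two RRs sit at different points in the topology.
primary_view = {"192.0.2.1": 10, "192.0.2.2": 30}
backup_view = {"192.0.2.1": 30, "192.0.2.2": 10}

primary_best = rank(paths, primary_view)[0]
backup_second = rank(paths, backup_view)[1]
print(primary_best == backup_second)  # the "backup" duplicates the primary's best
```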

To address this point there are a number of options:

* Make sure that, from the IGP point of view, the primary RR and the
backup RR are at the same point in the network. Nothing additional is
needed, and this is very often already the case for control plane RRs
in tunneled networks.

* Disable the IGP metric check step on the RRs. As a matter of fact, RRs
making the best path decision from their own point of view only makes
sense when the RRs are in the data plane at the POP to core boundaries.
For all other RR placements somewhere in the core it is really not
necessary.

* Do not worry about the IGP metric step at all, but allow the backup RR
to learn the primary RR's best path and take this knowledge into account
when advertising the diverse path towards the clients. Again, there is
no need to add any new RR, nor to modify even a single line of the
clients' configuration.
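Options 2 and 3 above can be sketched in a few lines of Python. The IGP
costs and addresses are hypothetical, and how the backup RR learns the
primary RR's best path is not modelled here. Option 1 needs no code:
co-located RRs simply share the same IGP costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int
    as_path_len: int


def rank(paths, igp_cost=None):
    """Simplified decision; igp_cost=None skips the IGP metric step entirely."""
    def key(p):
        igp = igp_cost[p.next_hop] if igp_cost is not None else 0
        # next_hop stands in for the remaining deterministic tie-breakers
        return (-p.local_pref, p.as_path_len, igp, p.next_hop)
    return sorted(paths, key=key)


def diverse_path(paths, primary_best, own_igp_cost):
    """Option 3: best path other than the primary RR's best, using the backup RR's own IGP view."""
    candidates = [p for p in paths if p != primary_best]
    return rank(candidates, own_igp_cost)[0] if candidates else None


paths = [Path("192.0.2.1", 100, 2), Path("192.0.2.2", 100, 2)]
primary_view = {"192.0.2.1": 10, "192.0.2.2": 30}  # hypothetical IGP costs at the primary RR
backup_view = {"192.0.2.1": 30, "192.0.2.2": 10}   # and at a non co-located backup RR

# Option 2: both RRs skip the IGP metric step, so their rankings agree.
opt2_best, opt2_backup = rank(paths)[0], rank(paths)[1]

# Option 3: the backup RR knows the primary's best path and excludes it.
primary_best = rank(paths, primary_view)[0]
opt3_backup = diverse_path(paths, primary_best, backup_view)

print(opt2_best.next_hop, opt2_backup.next_hop)
print(primary_best.next_hop, opt3_backup.next_hop)
```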

Any other BGP mechanism like AS-PATH prepend, Local Pref etc ... would
be treated identically on both the primary and the backup RR, so there is
no issue.


> " The best path (main) reflector plane distributes the best path for
> each route as it does today.  The second plane distributes the second
> best path for each route and so on.  Distribution of N paths for each
> route can be
> achieved by using N reflector planes."
>
> How is this done when it is the IGP cost that is the deciding factor..
> Will we have to correctly place the Nth plane corresponding to IGP
> correctly in the IGP??

See above.


> " It is easy to observe that the installation of one or more additional
> route reflector control planes is much cheaper and an easier than the
> need of upgrading 100s of routers in the entire network to support
> different protocol encoding."
>
> See Above I do not believe it is all or nothing..

Also see above :) And by "installation" please do not think only of a
physical RR installation. I also meant it to cover turning an existing
set of RRs into a backup RR plane.


> " Diverse path route reflectors need the new ability to calculate and
> propagate the Nth best path instead of the overall best path.  An
> implementation is encouraged to enable this new functionality on a per
> neighbor basis."
>
> Encouraged? I think it would be required..

I agree it is preferred and I support that.

But one could observe that, especially in topologies with very good POP
symmetry towards pairs of RRs, or when you would prefer to add an RR as
a backup, you may want to turn on the diverse-path functionality on such
a backup RR on a per SAFI basis.
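A minimal Python sketch of per-neighbor and per-SAFI enablement on a
backup RR. The table keys and address family names are hypothetical,
not a real configuration syntax; sessions not listed keep receiving the
best path, as they do today.

```python
from typing import Optional

# Hypothetical per-(neighbor, address family) knob on the backup RR; names are illustrative.
DIVERSE_PATH_ENABLED = {
    ("pe1", "ipv4-unicast"): True,
    ("pe1", "vpnv4-unicast"): False,
    ("pe2", "ipv4-unicast"): False,
}


def path_to_send(neighbor: str, afi_safi: str, ranked_paths: list) -> Optional[str]:
    """Backup RR output for one prefix: 2nd best where enabled, otherwise the best path as today."""
    if not ranked_paths:
        return None
    if DIVERSE_PATH_ENABLED.get((neighbor, afi_safi), False):
        # Assumption: send nothing on this session when no 2nd best path exists.
        return ranked_paths[1] if len(ranked_paths) > 1 else None
    return ranked_paths[0]


ranked = ["via 192.0.2.1", "via 192.0.2.2"]  # best first
for key in [("pe1", "ipv4-unicast"), ("pe1", "vpnv4-unicast"),
            ("pe2", "ipv4-unicast"), ("pe3", "ipv4-unicast")]:
    print(key, path_to_send(*key, ranked))
```

Defaulting unlisted sessions to today's behaviour allows the function
to be switched on for a few clients first and extended later.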

> Section 4.1.  Co-located best and backup path RRs
>
> "To simplify the description let's assume that we only use two route
> reflector planes (N=2).  When co-located the additional 2nd best path
> reflectors are connected to the network at the same points from the
> perspective of the IGP as the existing best path RRs.'
>
> Based upon implementation this may require ports on existing core router
> to terminate and a costing paradigm that duplicates the original the
> latter may be simple the former would require that there is availability
> at these locations.. Doesn't this also imply full symmetry? We could not
> deploy a subset for the nth plane and mimic the IGP decision making of
> the first?? The draft states that full symmetry is not needed.. Pls
> Clarify..

As indicated above, full symmetry only matters when you want to make
sure that the IGP position of the primary and backup RRs is the same. As
described earlier, this is just one of 3 ways to make sure the backup RR
calculates correct backup paths towards its clients.

And, as also described above, in this example adding the second plane
may be as simple as upgrading one of your existing RRs and enabling it
to distribute the diverse path towards the clients.

Initially to one or a few clients on a per session basis .. while still
sending a duplicate of the best path to the other clients as today.
Later, with more experience gained, to more and more clients served by
this cluster.

> " One of the deployment model of this scenario can be achieved by simple
> upgrade of the existing route reflectors without the need to deploy any
> new logical or physical platforms.  Such upgrade would allow route
> reflectors to service both upgraded to add-paths peers as well as those
> peers which can not be immediately upgraded while in
> the same time allowing to distribute more than single best path."
>
> The implication here is that the same primary RR would have to "hold"
> and disseminate multiple paths to D.. Would this create a scalability
> problem on this RR as it would have to hold these addl routes. Even
> though the number of BGP routes for the internet is small in comparison
> to VPNV4 this should be accounted for when RR platforms are selected.

I think that for RR platforms, scalability concerns about the number of
routes and the number of sessions are no longer an issue. Talk to your
favorite vendor for up to date RR scalability numbers :) But you are
very correct: those need to be considered when RR platforms are
selected.

> Section 4.2.  Randomly located best and backup path RRs
>
> " The basic premise of this mode of deployment assumes that all
> reflector planes have the same information to choose from which includes
> the same set of BGP paths.  It also requires the ability to skip the
> comparison of the IGP metric to reach the bgp next hop during best-path
> calculation."
>
> Scalability concerns.We would be putting our main primary RRs at risk.

I am not sure what risk you are referring to. As indicated earlier, for
control plane RRs this has really been an unnecessary step in the best
path selection since day one.

> Again I am confused about the IGP metric.. If the paths are equal up to
> the IGP metric how do decide which is primary/secondary.. The secondary
> RR needs to select one of the paths how does it do that??Is it router-id
> or something of that nature..

See above.

> "4.  Fully meshing newly added RRs' with the all other reflectors in
> both planes.  That condition does not apply if the newly added RR'(s)
> already have peering to all ASBRs/PEs."
>
> I cannot see creating BGP sessions to all ASBR/PEs. There are BGP
> session limits that also must be accounted for so I do not see that as a
> viable alternative in a large network.. So I guess we would have to
> fully mesh to all RRs. This is similar to a full mesh of PEs in terms of
> getting all the routes on the secondary to make a decision.

This is the normal process of introducing a new RR into the network.
But as already said a few times, this is optional.
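For the deployment where new RRs are added, the session cost of the two
variants in step 4 is simple arithmetic. The Python sketch below uses
hypothetical counts; it only counts new IBGP sessions and says nothing
about the RIB size each variant implies.

```python
def sessions_full_mesh(existing_reflectors: int, new_reflectors: int) -> int:
    """New IBGP sessions to mesh the new RRs with all existing reflectors and with each other."""
    return new_reflectors * existing_reflectors + new_reflectors * (new_reflectors - 1) // 2


def sessions_to_all_clients(clients: int, new_reflectors: int) -> int:
    """New IBGP sessions if each new RR instead peers directly with every ASBR/PE."""
    return clients * new_reflectors


# Hypothetical network: 4 existing reflectors, 2 new RRs, 1000 ASBRs/PEs.
print(sessions_full_mesh(4, 2), sessions_to_all_clients(1000, 2))
```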

> " Any of the existing routers that are not already members of the best
> path route reflector plane can be easily configured to serve the 2nd
> plane either via using a logical / virtual router partition or by local
> implementation hooks."
>
> The term "Easily" is used too liberally. Getting complex functionality
> configured on our most important parts of the network is never easy. It
> requires a lot of test certification and coordination between OpS,
> maintenance, etc... to get deployed

One needs to pick the right set of tools with which to accomplish the
task in the easiest way. My goal is to deliver various deployment
options and assist in selecting the best set of tools to complete the
job.

That's why I am not saying to do this one way only .. Depending on
network size, scale and complexity, one may find adding a new RR a
trivial exercise; on the other hand, someone else may consider upgrading
an existing RR at the next upgrade window a practically free operation,
since it needs to be performed anyway. Then enabling diverse path
towards some clients and seeing how it works seems like a very smooth
and gradual deployment, much easier than any RR based alternatives I can
think of today.

> " The additional planes of route reflectors do not need to be fully
> redundant as the primary one does.  If we are preparing for a single
> network failure event, a failure of a non backed up N-th best-path route
> reflector would not result in an connectivity outage of the actual data
> plane.  The reason is that this would at most affect the presence of a
> backup path (not an active one) on same parts of the network.  If the
> operator chooses to build the N-th best path plane redundantly by
> installing not one, but two or more route reflectors serving each
> additional plane the additional robustness will be achieved."
>
> Yes that may be true but we envision add-paths as being functionality
> that not only enables fast restoration but the ability to provide
> customer with load balancing. Probably good to be specific about the
> goals of the draft in the intro/abstract.

So do I. Diverse path accommodates both goals just fine.

 > Probably good to be specific about the
 > goals of the draft in the intro/abstract.

Ack.


> Section 4.3.  Multi plane route servers for Internet Exchanges
>
> " In such cases 100s of ISPs are interconnected on a common LAN. Instead
> of having 100s of direct EBGP sessions on each exchange client, a single
> peering is created to the transparent route server. The route server can
> only propagate a single best path.  Mandating the upgrade for 100s of
> different service providers in order to implement add-path may be much
> more difficult as compared to asking them for provisioning one new EBGP
> session to an Nth best-path route server plane."
>
> I do not understand. Are you saying that each eBGP session is nailed up
> to each plane. Are you implying that we deploy 100 planes? If not how do
> we know which one of the 100 ISPs should get the benefit of having
> routes source by them propagated through the network??

No :) I am saying that if you have 100s of IX clients, by default the
route server would send only one overall best path. So if you have two
RSs (and this is common for redundancy), one may send the overall best
path and the other one the diverse path to the IX customers. A 3rd best
path would also be easy to achieve.
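A toy Python model of the two route server setup, assuming documentation
AS numbers, a simplified ranking, and that a route server does not send
a member its own routes back. The first route server sends the overall
best, the second sends the next one, and a third plane would just use
plane=3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Path:
    announcing_as: int   # IX member that announced the route to the route servers
    as_path_len: int


def rank(paths):
    """Toy ordering: shortest AS path first, lowest AS number as a deterministic tie-breaker."""
    return sorted(paths, key=lambda p: (p.as_path_len, p.announcing_as))


def rs_advertisement(plane: int, client_as: int, paths) -> Optional[Path]:
    """Route server plane k (1-based) sends the k-th best path, skipping the client's own routes."""
    candidates = rank([p for p in paths if p.announcing_as != client_as])
    return candidates[plane - 1] if len(candidates) >= plane else None


# Hypothetical announcements of one prefix by three IX members.
paths = [Path(64496, 1), Path(64497, 2), Path(64498, 2)]

# Member AS 64499 has one EBGP session to each of two route servers.
print([rs_advertisement(k, 64499, paths) for k in (1, 2)])
```

Each member still runs plain EBGP sessions; only the route servers need
to know which plane they serve.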

I will clarify this section.


Jim - Many thx for your excellent comments and review,
R.