Re: [Idr] Shortest Path Routing Extensions for BGP Protocol

"Acee Lindem (acee)" <acee@cisco.com> Mon, 11 July 2016 22:59 UTC

From: "Acee Lindem (acee)" <acee@cisco.com>
To: Robert Raszuk <robert@raszuk.net>
Date: Mon, 11 Jul 2016 22:59:08 +0000
Message-ID: <D3A98047.6A7C5%acee@cisco.com>
References: <CA+b+ERnYMUuB7Ps7SKrzQg0QFsPk-g2AdkWcDG+mF-9XVxJh5g@mail.gmail.com> <D3A54785.6931E%acee@cisco.com> <CA+b+ERn_o0Z-FvB=LdBP3kX2xxQioMXE4xyHO_f5AEmGHQ8y5g@mail.gmail.com>
In-Reply-To: <CA+b+ERn_o0Z-FvB=LdBP3kX2xxQioMXE4xyHO_f5AEmGHQ8y5g@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/iUpcHFfX5pbMyYJj8yo3rGzfp5Y>
Cc: "Keyur Patel (keyupate)" <keyupate@cisco.com>, "Derek Man-Kit Yeung (myeung)" <myeung@cisco.com>, idr wg <idr@ietf.org>
Subject: Re: [Idr] Shortest Path Routing Extensions for BGP Protocol

Hi Robert,
See inline.

From: <rraszuk@gmail.com> on behalf of Robert Raszuk <robert@raszuk.net>
Date: Saturday, July 9, 2016 at 3:49 PM
To: Acee Lindem <acee@cisco.com>
Cc: "Keyur Patel (keyupate)" <keyupate@cisco.com>, "Abhay Roy (akr)" <akr@cisco.com>, "Derek Man-Kit Yeung (myeung)" <myeung@cisco.com>, "Venu Venugopal (venuv)" <venuv@cisco.com>, IDR List <idr@ietf.org>
Subject: Re: Shortest Path Routing Extensions for BGP Protocol

Hi AC,

Many thx for responding.

As for your question, the 100K+ figure was the number of compute nodes, virtual compute nodes in flat routing (enabled via SR-IOV), and switches together, since in your proposal each of them is a node in the SPT. Unless you are planning to introduce an analogy to OSPF areas here :)

No areas here… If you are running BGP on the 100K+ compute nodes, as I believe you are suggesting with the reference to SR-IOV, you would probably not want flat routing. In many of today's DCs, the compute nodes are merely hosts on ToR subnets, and these prefixes can be easily computed using SPF optimizations.


In any case, the most powerful property of BGP is hierarchy and (multi-level) indirection via next hop. In normal BGP deployments, next hops carried in BGP are resolved in the IGP. In DC cases where eBGP is used between all fabric stages, next hops resolve to connected routes.

In your case in flat routing you no longer have such recursion.

However, I assume your proposal can coexist with other SAFIs, so that, say, destinations external to the DC will still be carried in 1/1 or 2/1 and their next hops will be resolved by routes inserted into the RIB by running SPF, correct?

As Keyur responded, SPF routes could be used for resolution of other SAFIs.
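
A minimal sketch of that resolution step, assuming a RIB keyed by prefix and populated by SPF. The RIB layout, interface strings, addresses, and the `resolve` helper are all hypothetical illustrations, not anything defined in the draft:

```python
import ipaddress

def resolve(next_hop, rib):
    """Longest-prefix match of a BGP next hop against RIB entries.

    rib maps prefix strings to forwarding info (hypothetical layout).
    Returns the forwarding info of the most specific covering route,
    or None if the next hop is unresolvable.
    """
    addr = ipaddress.ip_address(next_hop)
    best = None
    for prefix, via in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, via)
    return best[1] if best else None

# Hypothetical routes installed in the RIB by the SPF computation.
spf_rib = {
    "192.0.2.1/32": "eth0 via spine1",
    "192.0.2.0/24": "eth1 via spine2",
}

# Hypothetical AFI/SAFI 1/1 route for a destination external to the DC.
bgp_route = {"prefix": "203.0.113.0/24", "next_hop": "192.0.2.1"}

forwarding = resolve(bgp_route["next_hop"], spf_rib)
```

Here the external prefix inherits the forwarding entry of the most specific SPF route covering its next hop; a next hop with no covering route stays unresolved.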


Last: LFA is a great technology; however, I am not sure that enabling it in topologies where you have tens of active ECMP paths, and where you are ready by design for failure handling, is the right thing to do. What are the other real use cases for giving up pure BGP best path selection and normal BGP multipath?

I agree that the value of LFA is limited if you have prevalent ECMP and a data plane that does fast ECMP rehashing in the event of failure. The other inherent advantage of SPF-based computation is that a single failure only results in the directly impacted NLRI being advertised as opposed to all the impacted prefixes having to propagate hop-by-hop. This should improve convergence and even scaling.
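
To make the convergence point above concrete, here is a toy sketch under stated assumptions: a hypothetical four-node fabric, 1000 made-up prefixes attached to one leaf, and each link modeled as two half-links. It only illustrates that the number of link withdrawals is independent of the number of prefixes whose path changes; it is not a model of real BGP update behavior.

```python
import heapq

def spf(links, root):
    """Dijkstra over directed links {(src, dst): cost}; returns {node: distance}."""
    adj = {}
    for (src, dst), cost in links.items():
        adj.setdefault(src, []).append((dst, cost))
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in adj.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

links = {}

def add_link(a, b, cost):
    """Store a bidirectional link as two half-links."""
    links[(a, b)] = cost
    links[(b, a)] = cost

# Hypothetical topology: leaf1 reaches leaf2 via spine1 (cost 2) or spine2 (cost 4).
add_link("leaf1", "spine1", 1)
add_link("spine1", "leaf2", 1)
add_link("leaf1", "spine2", 2)
add_link("spine2", "leaf2", 2)

# 1000 hypothetical host prefixes attached to leaf2.
prefixes = {f"10.0.{i // 256}.{i % 256}/32": "leaf2" for i in range(1000)}

before = spf(links, "leaf1")

# The spine1-leaf2 link fails: only its two half-links are withdrawn.
withdrawn = [("spine1", "leaf2"), ("leaf2", "spine1")]
for key in withdrawn:
    del links[key]

# leaf1 recomputes locally; no per-prefix message is needed.
after = spf(links, "leaf1")
changed = [p for p, node in prefixes.items() if before[node] != after[node]]
```

In this sketch two link withdrawals are enough for leaf1 to recompute the paths for all 1000 prefixes behind leaf2.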

Thanks,
Acee



Best regards,
R.


On Fri, Jul 8, 2016 at 6:22 PM, Acee Lindem (acee) <acee@cisco.com> wrote:
Hi Robert,

Thanks for engaging…

From: <rraszuk@gmail.com> on behalf of Robert Raszuk <robert@raszuk.net>
Date: Friday, July 8, 2016 at 11:51 AM
To: "Keyur Patel (keyupate)" <keyupate@cisco.com>, Acee Lindem <acee@cisco.com>, "Abhay Roy (akr)" <akr@cisco.com>, "Derek Man-Kit Yeung (myeung)" <myeung@cisco.com>, "Venu Venugopal (venuv)" <venuv@cisco.com>
Cc: IDR List <idr@ietf.org>
Subject: Shortest Path Routing Extensions for BGP Protocol

Hi,

I have reviewed your proposal.

Turning a path vector or distance vector protocol into a link state carrier is no doubt a bold idea :).

“Fortune befriends the bold.” -Emily Dickinson


Effectively, what you are proposing is to use BGP TCP sessions to propagate a link state database, creating the first "link vector protocol"!

To start, I have a few questions:

Q1:

RFC7752 has gone to a lot of effort (especially Section 3.2 and onward) to include a number of OSPF- or IS-IS-specific encodings. In your proposal you mention OSPF twice and IS-IS not even once. Does this mean that you are not going to use all the IGP-specific encodings defined in RFC7752?

We are using the BGP Protocol-ID defined for BGP-EPE. The IANA request will be generalized from “BGP-EPE” to “BGP” in support of Segment Routing and other enhancements. The BGP-LS NLRI are specific to BGP, not to IS-IS or OSPF.


Q2:

Who creates and maintains the LSDB in each BGP speaker? Are you planning to run OSPF and/or IS-IS, but with adjacency establishment disabled?

I’ve seen designs like this in my time but have never been a fan of them ;^). BGP will do the SPF directly and maintain the SPT. You’ll note that a simplified SPF is already done for ORR.
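
As an illustration only, the SPF step could look like the following textbook Dijkstra run over a link-state database. The `lsdb` dictionary layout and the two-spine, two-leaf fabric are assumptions made for this sketch; in the proposal the link state would come from BGP-LS NLRI, and nothing here reflects an actual implementation.

```python
import heapq

def spf(lsdb, root):
    """Dijkstra over lsdb = {node: {neighbor: cost}}.

    Returns (dist, parent); parent encodes the shortest path tree
    rooted at `root`.
    """
    dist = {root: 0}
    parent = {root: None}
    heap = [(0, root)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # already settled with a shorter distance
        done.add(node)
        for nbr, cost in lsdb.get(node, {}).items():
            nd = d + cost
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                parent[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, parent

# Hypothetical two-spine, two-leaf fabric with unit link costs.
lsdb = {
    "leaf1": {"spine1": 1, "spine2": 1},
    "leaf2": {"spine1": 1, "spine2": 1},
    "spine1": {"leaf1": 1, "leaf2": 1},
    "spine2": {"leaf1": 1, "leaf2": 1},
}

dist, parent = spf(lsdb, "leaf1")
```

This version keeps a single parent per node; tracking ECMP would mean recording every equal-cost parent instead.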


Q3:

Currently there are already two models for building DCs with BGP: one uses BGP to create only a lean underlay; the other uses BGP for both underlay and tenants (Project Calico is an example of the latter). Scale-wise, I think your proposal will work great for the former. However, I do have concerns about using your model for the latter, where, say, 10,000 or 100,000 /32s or /128s from VMs are injected and you need to construct the SPT with all of those.

Similar to those designs, the SPT could be limited to the underlay. However, if there is no requirement for the benefits of L2 or L3 VPNs, I see no reason why these 100Ks of leaf VM prefixes in a DC couldn’t be supported.


Q4:

Related to Q3: in your model, with, say, flat DC routing, each compute node, rather than just injecting tens of /32s and being "done", now becomes an IGP node. Since your document explicitly targets Massively Scaled Data Centers (MSDCs), I am concerned that having 100,000+ IGP nodes, and in many cases many more, is not the best idea.

100K+ switches in a single DC fabric (i.e., BGP routing domain)? I have some experience in link-state protocols and I can tell you that OSPF is I/O bound mainly due to the flooding. If done right, the SPF calculation can be done with minimal computation. While BGP-LS isn’t the world's most compact encoding, it is completely incremental.


Q5:

Have you considered just proposing an OSPF route reflector instead without stuffing BGP into the mix ? As some of you perhaps remember the work on this started around year 2000 to optimize PE-CE CSC deployments :) It seems to me very reusable for this goal.

We looked at lots of alternatives and this one seemed like the best one. Please pass me a pointer to the work you mention.

Thanks,
Acee



Best regards,
Robert.