Re: [Idr] Shortest Path Routing Extensions for BGP Protocol

"Keyur Patel (keyupate)" <keyupate@cisco.com> Mon, 11 July 2016 18:24 UTC

From: "Keyur Patel (keyupate)" <keyupate@cisco.com>
To: Robert Raszuk <robert@raszuk.net>, "Acee Lindem (acee)" <acee@cisco.com>
Date: Mon, 11 Jul 2016 18:24:17 +0000
Message-ID: <D3A931B4.465B6%keyupate@cisco.com>
References: <CA+b+ERnYMUuB7Ps7SKrzQg0QFsPk-g2AdkWcDG+mF-9XVxJh5g@mail.gmail.com> <D3A54785.6931E%acee@cisco.com> <CA+b+ERn_o0Z-FvB=LdBP3kX2xxQioMXE4xyHO_f5AEmGHQ8y5g@mail.gmail.com>
In-Reply-To: <CA+b+ERn_o0Z-FvB=LdBP3kX2xxQioMXE4xyHO_f5AEmGHQ8y5g@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/QTVcTHlgmagwNBbZDvOqMcBt9Ao>
Cc: "Derek Man-Kit Yeung (myeung)" <myeung@cisco.com>, idr wg <idr@ietf.org>
Subject: Re: [Idr] Shortest Path Routing Extensions for BGP Protocol

Hi Robert,

Comments inline, marked #Keyur.

From: Robert Raszuk <robert@raszuk.net>
Date: Saturday, July 9, 2016 at 12:49 PM
To: "Acee Lindem (acee)" <acee@cisco.com>
Cc: Keyur Patel <keyupate@cisco.com>, "Abhay Roy (akr)" <akr@cisco.com>, "Derek Man-Kit Yeung (myeung)" <myeung@cisco.com>, "Venu Venugopal (venuv)" <venuv@cisco.com>, idr wg <idr@ietf.org>
Subject: Re: Shortest Path Routing Extensions for BGP Protocol

Hi AC,

Many thx for responding.

As for your question, the 100K+ figure was the number of compute nodes, virtual compute nodes in flat routing (enabled via SR-IOV), and switches taken together, since in your proposal each of them is a node in the SPT. Unless you are planning to introduce an analogy to OSPF areas here :)

In any case, the most powerful property of BGP is hierarchy and (multi-level) indirection via the next hop. In normal BGP deployments, the next hops carried in BGP are resolved in the IGP. In DC cases where eBGP is used between all fabric stages, next hops resolve to connected routes.

In your case, with flat routing, you no longer have such recursion.

However, I assume your proposal can coexist with other SAFIs, so that, say, destinations external to the DC will still be carried in 1/1 or 2/1 and their next hops will be resolved by routes inserted into the RIB by running SPF, correct?

#Keyur: Yep. Other SAFIs can totally use the draft extensions to recurse their next hops.

Regards,
Keyur
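
To illustrate the recursion discussed above, here is a minimal sketch of a next hop being resolved through a route that SPF installed in the RIB. It is not taken from the draft: the RIB layout, the "bgp-spf" source label, the addresses, and the function names are all assumptions made for the example.

```python
import ipaddress

def longest_match(rib, address):
    """Return the most specific RIB entry covering address, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, entry in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, entry)
    return best[1] if best else None

def resolve(rib, next_hop, max_depth=8):
    """Follow recursive next hops until an entry with an outgoing interface is found."""
    for _ in range(max_depth):
        entry = longest_match(rib, next_hop)
        if entry is None:
            return None  # unresolvable next hop
        if entry["interface"] is not None:
            return entry["interface"]
        next_hop = entry["next_hop"]  # recurse one more level
    return None  # give up rather than loop forever

# Hypothetical RIB: one route installed by the SPF computation, one connected route.
rib = {
    "10.0.0.1/32": {"source": "bgp-spf", "next_hop": "192.0.2.2", "interface": None},
    "192.0.2.0/30": {"source": "connected", "next_hop": None, "interface": "eth0"},
}

# A 1/1 route for an external destination whose BGP next hop is 10.0.0.1
# resolves first through the SPF route, then through the connected route.
print(resolve(rib, "10.0.0.1"))
```

The depth limit is only a guard against resolution loops in this toy model.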

Last: LFA is a great technology; however, I am not sure that enabling it in topologies where you have tens of active ECMP paths, and where you are prepared by design for failure handling, is the right thing to do. What are the other real use cases for giving up pure BGP best-path selection and normal BGP multipath?

Best regards,
R.


On Fri, Jul 8, 2016 at 6:22 PM, Acee Lindem (acee) <acee@cisco.com> wrote:
Hi Robert,

Thanks for engaging…

From: <rraszuk@gmail.com> on behalf of Robert Raszuk <robert@raszuk.net>
Date: Friday, July 8, 2016 at 11:51 AM
To: "Keyur Patel (keyupate)" <keyupate@cisco.com>, Acee Lindem <acee@cisco.com>, "Abhay Roy (akr)" <akr@cisco.com>, "Derek Man-Kit Yeung (myeung)" <myeung@cisco.com>, "Venu Venugopal (venuv)" <venuv@cisco.com>
Cc: IDR List <idr@ietf.org>
Subject: Shortest Path Routing Extensions for BGP Protocol

Hi,

I have reviewed your proposal.

Turning a path-vector or distance-vector protocol into a link-state carrier is no doubt a bold idea :).

“Fortune befriends the bold.” -Emily Dickinson


Effectively, what you are proposing is to use BGP TCP sessions to propagate a link-state database, creating the first "link vector protocol"!

For a start, I have a few questions:

Q1:

RFC 7752 went to a lot of effort (especially Section 3.2 and onward) to include a number of OSPF- or IS-IS-specific encodings. In your proposal you mention OSPF twice and IS-IS not even once. Does that mean you are not going to use all of the IGP-specific encodings defined in RFC 7752?

We are using the BGP Protocol-ID defined for BGP-EPE. The IANA request will be generalized from “BGP-EPE” to “BGP” in support of Segment Routing and other enhancements. The BGP-LS NLRI are specific to BGP, not to IS-IS or OSPF.


Q2:

Who creates and maintains the LSDB in each BGP speaker? Are you planning to run OSPF and/or IS-IS but prevent it from establishing any adjacencies?

I’ve seen designs like this in my time but have never been a fan of them ;^). BGP will do the SPF directly and maintain the SPT. You’ll note that a simplified SPF is already done for ORR.
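
As a rough illustration of a BGP speaker running SPF directly over its link-state database, the sketch below runs Dijkstra's algorithm and keeps every equal-cost first hop, so the ECMP paths of a leaf-spine fabric are preserved. It is not the procedure specified in the draft: the LSDB shape ({node: {neighbor: cost}}), the node names, the two-way link check, and the requirement that all costs be positive are assumptions of this example.

```python
import heapq

def spf(lsdb, root):
    """Dijkstra over lsdb = {node: {neighbor: cost}} with positive costs.

    Returns {node: (distance, set_of_first_hops_from_root)}.
    """
    dist = {root: 0}
    first_hops = {root: set()}
    heap = [(0, root)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for nbr, cost in lsdb.get(node, {}).items():
            # Assumed two-way check: use a link only if both ends advertise it.
            if node not in lsdb.get(nbr, {}):
                continue
            nd = d + cost
            hops = {nbr} if node == root else first_hops[node]
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                first_hops[nbr] = set(hops)
                heapq.heappush(heap, (nd, nbr))
            elif nd == dist[nbr]:
                first_hops[nbr] |= hops  # equal-cost path: merge first hops
    return {n: (dist[n], first_hops[n]) for n in dist}

# Hypothetical two-leaf, two-spine fabric with unit link costs.
lsdb = {
    "leaf1": {"spine1": 1, "spine2": 1},
    "leaf2": {"spine1": 1, "spine2": 1},
    "spine1": {"leaf1": 1, "leaf2": 1},
    "spine2": {"leaf1": 1, "leaf2": 1},
}
spt = spf(lsdb, "leaf1")
print(spt["leaf2"])  # distance 2, reachable via both spines
```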


Q3:

Currently there are already two models for building DCs with BGP: one uses BGP to create only a lean underlay; the other uses BGP for both the underlay and tenants (Project Calico is an example of the latter). Scale-wise, I think your proposal will work great for the former. However, I do have concerns about using your model for the latter, where, say, 10,000 or 100,000 /32s or /128s from the VMs are injected and you need to construct an SPT with all of them.

Similar to those designs, the SPT could be limited to the underlay. However, if there is no requirement for the benefits of L2 or L3 VPNs, I see no reason why these 100Ks of leaf VM prefixes in the data center couldn’t be supported.


Q4:

Related to Q3: in your model, with, say, flat DC routing, each compute node, rather than just injecting tens of /32s and being "done", now becomes an IGP node. Since your document explicitly targets Massively Scaled Data Centers (MSDCs), I am concerned that having 100,000+ IGP nodes, and in many cases many more, is not the best idea.

100K+ switches in a single DC fabric (i.e., BGP routing domain)? I have some experience in link-state protocols and I can tell you that OSPF is I/O bound mainly due to the flooding. If done right, the SPF calculation can be done with minimal computation. While BGP-LS isn’t the world's most compact encoding, it is completely incremental.
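
One way to picture the "completely incremental" property: a BGP UPDATE carries only the link-state entries that were added, changed, or withdrawn, so a receiver can patch its database and rerun SPF only when the topology actually changed. The sketch below is a toy model of that idea, not the BGP-LS encoding; the tuple formats and the function name are assumptions of this example.

```python
def apply_update(lsdb, advertised=(), withdrawn=()):
    """Apply one incremental update to lsdb = {node: {neighbor: cost}}.

    advertised: iterable of (local, remote, cost) link entries.
    withdrawn:  iterable of (local, remote) link entries.
    Returns True if the topology changed and SPF should be rerun.
    """
    changed = False
    for local, remote in withdrawn:
        if lsdb.get(local, {}).pop(remote, None) is not None:
            changed = True
    for local, remote, cost in advertised:
        links = lsdb.setdefault(local, {})
        if links.get(remote) != cost:
            links[remote] = cost
            changed = True
    return changed

lsdb = {}
print(apply_update(lsdb, advertised=[("leaf1", "spine1", 1)]))  # new link
print(apply_update(lsdb, advertised=[("leaf1", "spine1", 1)]))  # duplicate, no change
print(apply_update(lsdb, withdrawn=[("leaf1", "spine1")]))      # link removed
```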


Q5:

Have you considered just proposing an OSPF route reflector instead, without stuffing BGP into the mix? As some of you perhaps remember, work on this started around the year 2000 to optimize PE-CE CSC deployments :) It seems to me very reusable for this goal.

We looked at lots of alternatives and this one seemed like the best one. Please pass me a pointer to the work you mention.

Thanks,
Acee



Best regards,
Robert.