Re: [Rift] [RIFT][Non equal cost anycast]

<xu.benchong@zte.com.cn> Mon, 29 July 2019 03:41 UTC

Date: Mon, 29 Jul 2019 11:40:35 +0800
From: xu.benchong@zte.com.cn
To: tonysietf@gmail.com
Cc: prz@juniper.net, rift@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/rrH-LXJ8If5vJhgVegxdpf1QTgQ>
Subject: Re: [Rift] [RIFT][Non equal cost anycast]

Tony,

The description is very clear, and I got more than I expected!

Thank you!

Benchong

Original Message



From: Tony Przygienda <tonysietf@gmail.com>
To: 徐本崇10065053 (Xu Benchong);
Cc: Antoni Przygienda <prz@juniper.net>; rift@ietf.org <rift@ietf.org>;
Date: 29 Jul 2019 10:49
Subject: Re: [Rift] [RIFT][Non equal cost anycast]

Hey Benchong, observe that RIFT only gives suggested computations and does not use prescriptive language for them. In simpler terms, as long as we follow valley-free routing (the packet goes up until it turns down), we can use any available next-hop that gives reachability to the prefix in the desired direction (and in both directions you must do LPM based on the information provided to prevent blackholing). Then, in both directions, you can load-balance as you please. This allows us to saturate the whole fabric while balancing on any metric. BAD northbound is such a metric; it got a good response from people who reviewed how they would like an IP fabric protocol to work, and that's why it's included. One could define something like BAD in the southbound direction as well, but it's not that clear what makes sense there: the simplest is SPF, next would be all-paths, then balancing on some kind of bandwidth southbound towards the leaf (but that's where it gets hard if you think about it).
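A minimal sketch of the valley-free rule above, assuming each node carries a RIFT-style level (leaves at 0, increasing northbound) and ignoring east-west links for simplicity. The function `is_valley_free` and the node names are hypothetical, for illustration only, and not taken from the spec.

```python
from typing import Dict, List


def is_valley_free(path: List[str], level: Dict[str, int]) -> bool:
    """Return True if the path only goes north (up) and then only south (down).

    Assumptions for this sketch: levels grow northbound (leaf = 0) and
    same-level (east-west) hops are not allowed.
    """
    turned_south = False
    for here, nxt in zip(path, path[1:]):
        if level[nxt] > level[here]:      # northbound hop
            if turned_south:
                return False              # going up again after going down: a "valley"
        elif level[nxt] < level[here]:    # southbound hop
            turned_south = True
        else:
            return False                  # east-west hop: excluded in this sketch
    return True


levels = {"leaf1": 0, "leaf2": 0, "spine1": 1, "spine2": 1, "tof1": 2}
print(is_valley_free(["leaf1", "spine1", "tof1", "spine2", "leaf2"], levels))  # True
print(is_valley_free(["spine1", "leaf1", "spine2"], levels))                   # False
```

Any next-hop that keeps the path valley-free and still reaches the prefix by LPM is usable, which is what leaves room to load-balance on whatever metric one prefers.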


In any case, if we want anycast to work properly, we have to build a next-hop that includes all nodes that advertised the prefix, yes. But obviously we can also just include the shortest-SPF node and it's still anycast, just one that balances very badly if distances are uneven ;-) ...
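A small sketch of the two choices just described: a next-hop set covering every neighbor that leads to any advertiser of the anycast prefix, versus only the shortest one(s) as plain SPF would pick. The `Route` tuple layout and the function name are hypothetical assumptions, not RIFT data structures.

```python
from typing import Dict, List, Tuple

# (next-hop neighbor, node that advertised the anycast prefix, distance via that neighbor)
Route = Tuple[str, str, int]


def anycast_nexthops(routes: List[Route], include_all: bool = True) -> Dict[str, int]:
    """Return {next-hop: best distance} for an anycast prefix.

    include_all=True keeps every neighbor leading to any advertiser;
    include_all=False keeps only the shortest-distance neighbor(s), i.e. plain SPF.
    """
    best: Dict[str, int] = {}
    for nexthop, _origin, dist in routes:
        if nexthop not in best or dist < best[nexthop]:
            best[nexthop] = dist
    if include_all or not best:
        return best
    shortest = min(best.values())
    return {nh: d for nh, d in best.items() if d == shortest}


routes = [("spine1", "leafA", 2), ("spine2", "leafB", 3), ("spine3", "leafB", 3)]
print(anycast_nexthops(routes))                     # {'spine1': 2, 'spine2': 3, 'spine3': 3}
print(anycast_nexthops(routes, include_all=False))  # {'spine1': 2}
```

Keeping the distances next to the next-hops lets the forwarding plane later decide whether to use them for balancing or ignore them.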


Valley-free routing in RIFT (obviously possible only because we have a rank ordering of the nodes, i.e. a top and a bottom) frees us from the shackles of SPF and, in a simple way, moves things closer to https://en.wikipedia.org/wiki/Maximum_flow_problem. In traditional routing, employing secondary techniques such as CSPF presupposes much more complex machinery and source-routing mechanisms like MPLS/SR in the fabric. That would obviously make it more expensive (the finer we want to control our commodity flow, the more state we stick on the packet and/or into the network) and less reactive to changes (due to the necessary state distribution to intermediate points once the topology has converged). Plain RIFT, in contrast, sticks to valley-free hop-by-hop routing, which does not allow much granularity beyond the prefix but makes up for it by being pliable, simple, and hence cheap and robust. That seems very desirable in a fabric, where bandwidth substitutes for smarts (as it did for IP through most of its history). And yes, SR/MPLS can be made to work on RIFT, but that's a different discussion altogether. In a sense, if we advertise unsolicited label bindings on LIEs (which we optionally include), we already have a flavor of LDP for free, if I'm not mistaken ...


--- tony 

On Sun, Jul 28, 2019 at 7:01 PM <xu.benchong@zte.com.cn> wrote:

Tony

Does this mean that when S-SPF calculates, we cannot just select the best next-hop with the smallest distance, but must add all the next-hops to the forwarding table, with distance used as the basis for “Non equal cost anycast” in the forwarding plane?

N-SPF uses the method of section 5.3.6.1, with BAD instead of the distance.




--Benchong

Original Message



From: Tony Przygienda <tonysietf@gmail.com>
To: 徐本崇10065053 (Xu Benchong);
Cc: Antoni Przygienda <prz@juniper.net>; rift@ietf.org <rift@ietf.org>;
Date: 27 Jul 2019 03:34
Subject: Re: [Rift] [RIFT][Non equal cost anycast]



Benchong, as always, when people start implementing, they start asking the real questions ;-) Yes, anycast in RIFT is much closer to what you would consider “true anycast” than in IP, where it is really just ECMP on the same address. In RIFT, anycast on nodes at different distances is a normal thing.

First, it is important to understand the difference between mobility and anycast on the fabric. If a prefix moves on the fabric without using the mobility attributes, it can of course appear in two locations at once (if the new TIE floods faster than the previous location manages to purge the prefix). That's not proper anycast, of course; it's just an artefact. If the prefix properly attaches timestamps by some means (such as 6lo), it will be understood as having moved; otherwise it will be anycast for a bit.
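A deliberately simplified sketch of the mobility-versus-anycast distinction: if every copy of a prefix carries a timestamp, the newest one wins (a move); otherwise all copies are kept and the prefix behaves as anycast until the stale copy is purged. The `PrefixAdvert` type, its fields, and `active_origins` are hypothetical; RIFT's actual mobility attributes and their comparison rules are richer than this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrefixAdvert:
    origin: str                 # node advertising the prefix
    timestamp: Optional[float]  # hypothetical mobility timestamp; None if not attached


def active_origins(adverts: List[PrefixAdvert]) -> List[str]:
    """Decide which advertisers of the same prefix to keep (simplified).

    If every copy carries a timestamp, treat it as a move and keep only the newest.
    Otherwise keep all copies, i.e. the prefix behaves as anycast (at least until
    the stale copy is purged).
    """
    if adverts and all(a.timestamp is not None for a in adverts):
        newest = max(adverts, key=lambda a: a.timestamp)
        return [newest.origin]
    return [a.origin for a in adverts]


print(active_origins([PrefixAdvert("leaf1", 100.0), PrefixAdvert("leaf2", 105.0)]))  # ['leaf2']
print(active_origins([PrefixAdvert("leaf1", None), PrefixAdvert("leaf2", None)]))    # ['leaf1', 'leaf2']
```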


And then, of course, there is true anycast, which is simply two equal prefixes advertised from two nodes. RIFT is loop-free, which means it doesn't really care all that much about distance, so if a packet enters from the ToF it can be forwarded to any leaf advertising the anycast prefix. That allows a true “service on anycast” architecture. When you route from the leaf, the packet will use the default route (unless an implementation does its own things that the spec doesn't mention but doesn't suppress either) until it climbs high enough to see the anycast prefix, at which point it will be turned south (assuming all anycast is on leaves). If balancing of anycast over the whole metric is desired, the packet needs to be pushed all the way to the ToF using tunnels or some other solution.
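To illustrate the leaf-originated case, here is a sketch of longest-prefix-match lookups on two hypothetical forwarding tables: the leaf only has a default route and sends the packet north, while a spine that has learned the anycast /32 from its leaves turns it south. The tables, node names, and the address 192.0.2.10 (a documentation address) are assumptions for illustration; `lpm` uses only Python's standard `ipaddress` module.

```python
import ipaddress
from typing import Dict, List


def lpm(table: Dict[str, List[str]], dst: str) -> List[str]:
    """Longest-prefix match: return the next-hops of the most specific prefix covering dst."""
    addr = ipaddress.ip_address(dst)
    best_len, best_nhs = -1, []
    for prefix, nexthops in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_nhs = net.prefixlen, nexthops
    return best_nhs


ANYCAST = "192.0.2.10"  # hypothetical anycast service address advertised by leafA and leafB

# A leaf sees only the default route: everything goes north.
leaf_table = {"0.0.0.0/0": ["spine1", "spine2"]}
# A spine also learned the anycast /32 from its leaves, so the packet turns south here.
spine_table = {"0.0.0.0/0": ["tof1", "tof2"], "192.0.2.10/32": ["leafA", "leafB"]}

print(lpm(leaf_table, ANYCAST))   # ['spine1', 'spine2']
print(lpm(spine_table, ANYCAST))  # ['leafA', 'leafB']
```

Only nodes that actually see the anycast prefix can spread traffic across its advertisers, which is why balancing over all of them requires getting the packet up to the ToF.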


So, basically, anycast is just a funky next-hop on a prefix that can point to two different nodes, and the metric can either be used for balancing or ignored; the spec does not need to say more, I think.
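A sketch of such a "funky next-hop" in the forwarding plane: each flow is hashed onto one member of the anycast next-hop set, either weighted by the metric or with the metric ignored. The 1/distance weighting and the function `pick_nexthop` are hypothetical choices for illustration; the spec does not define a balancing formula.

```python
import hashlib
from typing import Dict, Tuple


def pick_nexthop(nexthops: Dict[str, int], flow: Tuple, use_metric: bool = True) -> str:
    """Pick one next-hop for a flow from an anycast next-hop set {name: distance}.

    use_metric=True: weight each next-hop by 1/distance (a hypothetical policy).
    use_metric=False: ignore the metric and spread flows evenly.
    """
    if not nexthops:
        raise ValueError("empty next-hop set")
    names = sorted(nexthops)  # fixed order so the choice is stable for a flow
    if use_metric:
        weights = [1.0 / max(nexthops[n], 1) for n in names]
    else:
        weights = [1.0] * len(names)
    # Stable hash of the flow key (Python's built-in hash() is salted per process).
    digest = hashlib.sha256(repr(flow).encode()).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64 * sum(weights)
    running = 0.0
    for name, weight in zip(names, weights):
        running += weight
        if point < running:
            return name
    return names[-1]  # guard against floating-point rounding at the upper edge


nhs = {"spine1": 1, "spine2": 3}
flows = [("10.0.0.1", "192.0.2.10", port, 80, 6) for port in range(2000)]
weighted = sum(pick_nexthop(nhs, f) == "spine1" for f in flows)
even = sum(pick_nexthop(nhs, f, use_metric=False) == "spine1" for f in flows)
print(weighted, even)
```

With these weights spine1 should receive about three quarters of the flows in the weighted case and about half when the metric is ignored; a given flow always maps to the same next-hop, so packets within a flow are not reordered.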


This little diatribe should make it into the RIFT applicability statement in some form, I think ...


-- tony 

On Fri, Jul 26, 2019 at 5:57 AM <xu.benchong@zte.com.cn> wrote:



Hi Tony,

Can you talk about REQ6? How does RIFT support it? Does it benefit from the default route?

"  REQ6:    Non equal cost anycast must be supported to allow for easy
            and robust multi-homing of services without regressing to
            careful balancing of link costs."

Thank you!


Benchong

_______________________________________________
RIFT mailing list
RIFT@ietf.org
https://www.ietf.org/mailman/listinfo/rift