Re: [Rift] Device restart problem

<xu.benchong@zte.com.cn> Fri, 25 October 2019 07:44 UTC

Date: Fri, 25 Oct 2019 15:43:06 +0800 (CST)
Message-ID: <201910251543067579596@zte.com.cn>
In-Reply-To: <CA+wi2hNN9JrRft2_n0eHmWq4+p2KHdBH3dwQ6pat8Ri02FTrHQ@mail.gmail.com>
References: 201910241652249011192@zte.com.cn, CA+wi2hNN9JrRft2_n0eHmWq4+p2KHdBH3dwQ6pat8Ri02FTrHQ@mail.gmail.com
From: <xu.benchong@zte.com.cn>
To: <tonysietf@gmail.com>
Cc: <rift@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/oFJgZdaTuEg1wO73H_Yf0h7QE8o>
Subject: Re: [Rift] Device restart problem
List-Id: Discussion of Routing in Fat Trees <rift.ietf.org>

Hi Tony,

Thanks for your reply.

When Leaf111 and Spine111 are restarted at the same time (assuming there are no other spines), Spine111 cannot flood the TIDE it received from ToF21 down to Leaf111, so Leaf111's Seq NR cannot be updated. Is this the second layer of the onion? ;-)

Could we refresh the lifetime of a TIE via TIDEs, to avoid having to re-send TIEs periodically?

We are developing a verification implementation and have not yet tested it against Bruno's.

Benchong

Original Message

From: Tony Przygienda <tonysietf@gmail.com>
To: Xu Benchong 10065053
Cc: rift@ietf.org <rift@ietf.org>
Date: 2019-10-24 23:59
Subject: Re: Device restart problem



Hey Xu, I see you're getting deeper and deeper into the implementation; you just found the first layer of the onion here, BTW ;-) Sequence number (seqnr#) handling has been, since time immemorial, one of the trickier parts of IGP implementation (and not only there: the same problems exist in other places, but there the information is not persistent, so the problem is not as pressing).


Multiple mechanisms kick in here:

a) The seqnr# space is circular, which is a very important piece of the puzzle: you cannot generate a "biggest" number that no one can override. The math is explained in lots of detail in the appendix. BTW, not my invention; smarter people than me worked this out a long time ago, but AFAIK there was never a full, detailed, easy-to-implement write-up.

b) Yes, the fact that we flood N-TIEs only northbound prevents Leaf111 from "getting" its old TIE with a higher sequence number via normal "flat flooding". Flat flooding southbound would, of course, largely kill the scalability of the protocol and make it equivalent to OSPF, IS-IS, or any other "normal" link-state approach in terms of flooding complexity (well, flood reduction would still work ;-)

c) However, observe that Table 3 holds the key to the solution. The TIDE/South entry tells you what to include when describing your database to a southern neighbor. The description Spine111 sends includes Leaf111's own N-TIEs, and with that Leaf111 can realize that there is a stale N-TIE it originated before the reboot and re-issue it with a higher sequence number (that's where a) comes into play).
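
The interplay of a) and c) can be sketched roughly as follows: a circular comparison in the style of RFC 1982 serial number arithmetic, plus a leaf that scans a TIDE received from its northern neighbor for its own stale TIEs and re-originates them with the next sequence number. This is a simplified sketch, not the draft's procedure: it assumes a 64-bit sequence space and TIDE entries reduced to (originator, TIE id, sequence number). `TieHeader`, `OwnTie`, `seq_newer`, and `process_tide_from_north` are hypothetical names; real TIDEs also carry header ranges, directions, and lifetimes, which are ignored here, and the draft's appendix remains the authoritative definition of the arithmetic. The 1-week default and the 300-second lifetime for emptied TIEs are the values mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List

SEQ_BITS = 64                       # assumed width of the sequence number space
DEFAULT_LIFETIME_S = 7 * 24 * 3600  # 1-week default lifetime
PURGE_LIFETIME_S = 300              # lifetime used when emptying a TIE


def seq_newer(a: int, b: int, bits: int = SEQ_BITS) -> bool:
    """True if a is newer than b in RFC 1982-style circular arithmetic.

    A distance of exactly half the space counts as "not newer" here;
    RFC 1982 leaves that case undefined.
    """
    half = 1 << (bits - 1)
    return (a < b and b - a > half) or (a > b and a - b < half)


@dataclass(frozen=True)
class TieHeader:       # hypothetical, heavily reduced TIDE entry
    originator: int    # system ID of the node that originated the TIE
    tie_id: int        # TIE number within that originator
    seq_nr: int


@dataclass
class OwnTie:          # hypothetical record of a TIE this node originates
    seq_nr: int
    lifetime_s: int = DEFAULT_LIFETIME_S
    empty: bool = False


def process_tide_from_north(my_id: int, own_ties: Dict[int, OwnTie],
                            tide: List[TieHeader],
                            bits: int = SEQ_BITS) -> List[int]:
    """Re-originate own TIEs that the northern neighbor holds in a newer
    (e.g. pre-reboot) version; return the tie_ids that were re-issued."""
    reissued = []
    for hdr in tide:
        if hdr.originator != my_id:
            continue  # only our own TIEs matter here
        # Circular successor: there is no maximum value to get stuck at.
        next_seq = (hdr.seq_nr + 1) % (1 << bits)
        mine = own_ties.get(hdr.tie_id)
        if mine is None:
            # Stale TIE we no longer want: flush it with an empty copy.
            own_ties[hdr.tie_id] = OwnTie(next_seq, PURGE_LIFETIME_S, empty=True)
            reissued.append(hdr.tie_id)
        elif seq_newer(hdr.seq_nr, mine.seq_nr, bits):
            # Neighbor's copy beats ours: jump past it, keep our content.
            mine.seq_nr = next_seq
            mine.lifetime_s = DEFAULT_LIFETIME_S
            reissued.append(hdr.tie_id)
    return reissued


if __name__ == "__main__":
    # Leaf111 (id 111) rebooted and restarted TIE 1 at Seq NR 3, but
    # Spine111's TIDE still lists the pre-reboot copy with Seq NR 1000.
    own = {1: OwnTie(seq_nr=3)}
    tide = [TieHeader(originator=111, tie_id=1, seq_nr=1000)]
    print(process_tide_from_north(111, own, tide), own[1].seq_nr)  # [1] 1001
```

A second TIDE listing the same old header no longer triggers anything, because the re-issued Seq NR 1001 already compares as newer than 1000.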


As you keep implementing and testing, you'll find another very interesting, far more complex case that we solved, but I'll keep the suspense going ;-)


Your observation about the 1-week lifetime is also correct, and that was done very purposefully. Let's say RIFT runs on 0.5M devices (a scale we aim at, given that multi-homed/overlay-originating servers can run it as well). If you assume 5 TIEs per device, that's 2.5M TIEs at the top of the fabric (large, but not a scary number compared to what we do with BGP and add-path on a daily basis in the world's most scalable implementations ;-). If we had something like daily (24h) reorigination, we'd be talking about 2.5M/24 ≈ 100K re-originations per hour. That gives you a flooding rate into the ToFs of about 30 TIEs/sec (assuming perfect flood reduction). All that disregards things like servers rebooting or container architectures, which may inject lots of prefixes on moves/boots and so on. So frequent refresh is just unnecessary churn. With a 1-week lifetime we're talking about 2.5M/168 ≈ 15K TIE refreshes per hour, which is a manageable number given we're talking about 0.5M devices.
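
The refresh arithmetic in this paragraph can be reproduced with a few lines of Python. The device and TIE counts are the example figures from the email, and `refresh_load` is a hypothetical helper; the per-second figure is what reaches the ToFs only under the paragraph's assumption of perfect flood reduction. Refreshing every hour would mean all 2.5M TIEs per hour, daily refresh about 104K per hour (roughly 29/sec), and a 1-week lifetime about 14.9K per hour.

```python
DEVICES = 500_000    # example fabric size from the email
TIES_PER_DEVICE = 5  # example TIE count per device from the email


def refresh_load(devices: int, ties_per_device: int, refresh_hours: float):
    """Return (total TIEs, re-originations per hour, re-originations per second)."""
    total = devices * ties_per_device
    per_hour = total / refresh_hours
    return total, per_hour, per_hour / 3600


if __name__ == "__main__":
    for label, hours in [("1 hour", 1), ("1 day", 24), ("1 week", 7 * 24)]:
        total, per_hour, per_sec = refresh_load(DEVICES, TIES_PER_DEVICE, hours)
        print(f"{label:>7}: {total:,} TIEs, {per_hour:,.0f}/hour, {per_sec:.1f}/sec")
```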


Observe, however, that as a device you can originate TIEs with any lifetime you choose and RIFT will still work (and when emptying TIEs you are supposed to originate them with a lifetime of only 300 seconds). So the 1 week is basically a protocol constant that can be changed with a knob.
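
A minimal sketch of the lifetime choice described above, assuming a per-device configuration knob. `origination_lifetime` and the constant names are hypothetical; only the 1-week default and the 300-second lifetime for emptied TIEs come from this email.

```python
from typing import Optional

DEFAULT_LIFETIME_S = 7 * 24 * 3600  # 1 week = 604,800 seconds
PURGE_LIFETIME_S = 300              # lifetime for emptied TIEs


def origination_lifetime(empty: bool, configured_s: Optional[int] = None) -> int:
    """Pick the lifetime to put on a TIE being originated.

    Emptied TIEs always get the short purge lifetime; otherwise a
    configured value (the "knob") overrides the 1-week default.
    """
    if empty:
        return PURGE_LIFETIME_S
    if configured_s is not None:
        if configured_s <= 0:
            raise ValueError("lifetime must be positive")
        return configured_s
    return DEFAULT_LIFETIME_S
```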


BTW, let us know when you've got the first pieces interop'ed with Bruno's open-source implementation. Things always become much clearer when implementations are bashed against each other ;-)


--- tony 

On Thu, Oct 24, 2019 at 1:52 AM <xu.benchong@zte.com.cn> wrote:


Hi, Tony


There is a device restart problem:

In draft-ietf-rift-rift-08, Figure 2, the N-TIE of Leaf111 is flooded to ToF21 via Spine111. The Seq NR of that N-TIE may be large.

Leaf111 restarts and regenerates its N-TIE. The new, randomly chosen Seq NR may be smaller, so when Spine111 receives it, Spine111 compares the Seq NRs and discards the new message.

According to Appendix C.3.4 b.3, Spine111 should then send its DBTIE to Leaf111 so that Leaf111 can update its Seq NR.

However, given the flooding scope of N-TIEs, this message cannot be sent to Leaf111.

As a result, a large number of stale N-TIEs can remain in the network for a long time (the protocol's default lifetime is 1 week).
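
To make the question concrete, here is a simplified model of Spine111's decision as described above. It assumes RFC 1982-style circular comparison of Seq NRs and reduces the N-TIE flooding scope to a single boolean; `Action`, `on_ntie_from_south`, and `may_send_ntie_south` are hypothetical names, not taken from the draft.

```python
from enum import Enum, auto
from typing import Optional

SEQ_BITS = 64  # assumed width of the sequence number space (illustrative)


def seq_newer(a: int, b: int, bits: int = SEQ_BITS) -> bool:
    """RFC 1982-style check: is sequence number a newer than b?"""
    half = 1 << (bits - 1)
    return (a < b and b - a > half) or (a > b and a - b < half)


class Action(Enum):
    ACCEPT = auto()     # store the received N-TIE and flood it further north
    SEND_BACK = auto()  # answer with our newer database copy (Appendix C.3.4 b.3)
    IGNORE = auto()     # same version, or the flooding scope forbids a reply


def on_ntie_from_south(rx_seq: int, db_seq: Optional[int],
                       may_send_ntie_south: bool) -> Action:
    """Simplified spine-side handling of an N-TIE arriving from a leaf."""
    if db_seq is None or seq_newer(rx_seq, db_seq):
        return Action.ACCEPT
    if seq_newer(db_seq, rx_seq) and may_send_ntie_south:
        return Action.SEND_BACK
    return Action.IGNORE


if __name__ == "__main__":
    # Leaf111 restarted with a small Seq NR; Spine111 still holds a larger one,
    # and N-TIEs may not be flooded south, so the new N-TIE is simply ignored.
    print(on_ntie_from_south(5, 1000, may_send_ntie_south=False))
```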

Is this understanding correct? How does RIFT solve this problem?

Thank you!

Benchong