Re: [Rift] RIFT strange behaviours discussed today
Tony Przygienda <tonysietf@gmail.com> Sun, 17 May 2020 17:35 UTC
From: Tony Przygienda <tonysietf@gmail.com>
Date: Sun, 17 May 2020 10:34:06 -0700
Message-ID: <CA+wi2hMLsqN7pQ8qSTiwrXwt=qf86t=uuaRf7ibmxiBrG=iMtQ@mail.gmail.com>
To: Bruno Rijsman <brunorijsman@gmail.com>
Cc: Mariano Scazzariello <mscazzariello@os.uniroma3.it>, "tommasocaiazzi@gmail.com" <tommasocaiazzi@gmail.com>, rift@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/xQM9gw0eYgVveCsJq1g91uOca7Q>

Nice drilling, but once it comes to convergence speed/scalability, Python
is really not that much your friend. You'd need a proper systems language.
Mariano, once you start to push that kind of envelope you probably want to
slowly test our scaled-up version ...

-- tony

On Sat, May 16, 2020 at 3:51 PM Bruno Rijsman <brunorijsman@gmail.com> wrote:

> Hi Mariano and Tommaso,
>
> I have made two big changes to the way messages are queued in
> RIFT-Python.
>
> This code has been committed to the master branch.
>
> So it is ready for you to test, just in time for the paper deadline.
> It is, however, a bit risky to introduce such a large change with not
> much time to fully “soak it in” before the paper deadline. Also,
> starting Monday morning I will be unreachable for a few days. I leave it
> up to you to decide whether or not you want to use this new code for the
> paper.
>
>
> Big change #1
>
> In the old code, the TIES_TX, TIES_ACK, and TIES_REQ queues were all
> serviced only once per second, no matter what. The TIES_RTX queue didn’t
> really do anything.
>
> TIDEs were sent once every 2 seconds (that did not change).
>
> In the new code, there are two queues for TIEs, TIE ACKs, and TIE REQs:
>
> (a) A fast tx_queue for the initial transmission after the item is
> enqueued. It is serviced every 50ms. (With an optimization to make sure
> we don’t run such a fast timer unless it is really needed, i.e. unless
> there is at least one entry in some fast queue.)
>
> (b) A slow rtx_queue for subsequent retransmission if needed. It is
> serviced every 1000ms.
>
> See the new file msg_queues.py for details.
>
>
> Big change #2
>
> In the old code, if a node regenerated a local TIE for any reason, it
> was not immediately reflooded.
>
> Instead we would have to wait up to 2 seconds for the regenerated TIE to
> be advertised in a TIDE, then up to 1 second for the other side to
> request it, and then up to another 1 second for this node to react to
> the request.
>
> I changed that: if a local TIE is regenerated for any reason, the TIE
> itself is immediately put on the fast TIE tx_queue, so it will be sent
> after 50ms.
>
>
> These two changes combined make most of the problems that you saw go
> away.
>
> You used to see that a node would send a TIE, and then a second later
> *the exact same* version of that TIE (same tie-nr, same seq-nr).
>
> This was because the TIRE was not sent fast enough to ack the TIE.
>
> That behavior should go away; the TIRE is now sent much faster.
>
> In general, convergence should be much faster.
>
>
> You may see some other behavior that you did not see before.
>
> If you kill a node in the topology, you may see some “additional
> intermediate states” because reconvergence is so much faster.
>
> Consider, for example, that node X is killed, and node X had adjacencies
> with Y1, Y2, Y3, Y4, ..., Yn.
>
> Each of those neighbor nodes Y1, Y2, ..., Yn will lose its adjacency
> with X, and reflood its local node TIE.
>
> Now, consider that Y1, Y2, ..., Yn are also all adjacent to node Z (this
> is quite common in Clos topologies).
>
> So, Z is going to receive updated node TIEs from Y1, Y2, Y3, ..., Yn.
>
> And (this is the important part), Z may ALSO receive pos-disagg-prefix
> and neg-disagg-prefix TIEs from Y1, Y2, ..., Yn.
>
> Z receives these multiple pos-disagg-prefix and neg-disagg-prefix
> messages asynchronously.
>
> That may cause Z to “change its mind” a few times about whether and what
> it should disaggregate itself, and that may cause Z to originate
> multiple different versions (i.e. sequence numbers) of its own
> pos-disagg-prefix or neg-disagg-prefix message in quick succession.
>
> This will also have the effect of sending more messages than you might
> expect, but this is really different from the behavior you were seeing
> before.
>
> Before, you were seeing the identical TIE (same seq-nr) being sent
> multiple times. I would consider that to be a real bug.
>
> Now you might see multiple versions of the same TIE (same tie-nr,
> different seq-nr) being sent multiple times. I don’t consider that to be
> a bug. Not a bug in the code at least. It is just a consequence of the
> RIFT protocol reacting quickly to multiple adjacencies going down in
> quick succession after a node failure.
>
>
> I have a gut feeling that this “hunting” behavior will be reduced if we
> do negative disaggregation everywhere (and no positive disaggregation).
>
> I am adding a “disaggregation: negative-only” configuration knob to test
> that hypothesis.
>
> Don’t try it out yet; the code for this new knob is not finished.
>
> I will update you when it is.
>
>
> -- Bruno
>
>
> On May 15, 2020, at 10:43 AM, Mariano Scazzariello
> <mscazzariello@os.uniroma3.it> wrote:
>
> Hi Bruno,
> any news on the duplicated packets issue?
>
> We are asking because the 19th of May is the abstract submission
> deadline, so we need to know whether we can meet it in time.
> Our "internal deadline" is Sunday 17th of May.
>
> Thanks,
> Mariano and Tommaso.
>
> On 12/05/2020 13:14, Mariano Scazzariello wrote:
>
> Hi Bruno,
> sorry for the spamming :D.
>
> Today Tommy and I further investigated the problem that I reported
> yesterday.
> It seems that it is caused by TIEs sent multiple times by the same node.
> We are not sure, and we'll keep investigating; however, we would like to
> have feedback from you.
>
> EXAMPLE OF MULTIPLE NEG DISAGG TIES
> This example highlights what we saw yesterday during the call.
> The first two packets are correct, since the ToFs send them to
> spine_2_1_1. However, after some time, the same ToFs resend the same neg
> disagg TIEs; in fact the comparison is 0 (in red).
>
> Sender           Receiver     Originator  TIE Type        In DB? If so, comparison result
> tof_1_2_1:if1    spine_2_1_1  121         Neg-Dis-Prefix  No
> tof_1_2_2:if1    spine_2_1_1  122         Neg-Dis-Prefix  No
> ....
> tof_1_2_1:if1    spine_2_1_1  121         Neg-Dis-Prefix  Yes, comparison is 0
> tof_1_2_2:if1    spine_2_1_1  122         Neg-Dis-Prefix  Yes, comparison is 0
> ...
> tof_1_2_2:if1    spine_2_1_1  122         Neg-Dis-Prefix  Yes, comparison is 0
>
>
> EXAMPLE OF MULTIPLE NODE TIES
> Here we can see that the ToFs send to spine_2_1_1 a new node TIE that is
> stored and reflected correctly (in green). Strange TIEs are highlighted
> in red.
>
> Sender           Receiver     Originator  TIE Type  In DB? If so, comparison result
> tof_1_2_1:if1    spine_2_1_1  121         Node      Yes, comparison is -1
> tof_1_2_2:if1    spine_2_1_1  122         Node      Yes, comparison is -1
> spine_2_1_1:if2  tof_1_2_1    122         Node      Yes, comparison is -1
>     <- CORRECT REFLECTION
> spine_2_1_1:if3  tof_1_2_2    121         Node      Yes, comparison is -1
>     <- CORRECT REFLECTION
> ...
> tof_1_2_1:if1    spine_2_1_1  121         Node      Yes, comparison is -1
>     <- Spine receives Node TIE from tof_1_2_1 different from the one
>        stored in the DB, should reflect to tof_1_2_2
> tof_1_2_2:if1    spine_2_1_1  122         Node      Yes, comparison is -1
>     <- Spine receives Node TIE from tof_1_2_2 different from the one
>        stored in the DB, should reflect to tof_1_2_1
> spine_2_1_1:if2  tof_1_2_1    122         Node      Yes, comparison is 0
>     <- Reflection of tof_1_2_2 TIE to tof_1_2_1. Why is the comparison 0
>        on the ToF, if it is -1 on the spine (does it reflect the wrong
>        TIE?)?
> spine_2_1_1:if3  tof_1_2_2    121         Node      Yes, comparison is 0
>     <- Reflection of tof_1_2_1 TIE to tof_1_2_2. Why is the comparison 0
>        on the ToF, if it is -1 on the spine (does it reflect the wrong
>        TIE?)?
> ...
> tof_1_2_1:if1    spine_2_1_1  121         Node      Yes, comparison is 0
>     <- tof_1_2_1 resends to spine_2_1_1 its own node TIE
> tof_1_2_2:if1    spine_2_1_1  122         Node      Yes, comparison is 0
>     <- tof_1_2_2 resends to spine_2_1_1 its own node TIE
>
> Hope this is useful to hunt the problem!
> Mariano and Tommaso.
>
> On 12/05/2020 00:55, Mariano Scazzariello wrote:
>
> Another little update, the last for today since it's 1AM :D
>
> The final scenario is:
> tof_1_2_2 sends its node TIE to its southbound adjacencies
> (spine_2_1_1/spine_3_1_1/spine_4_1_1). Each spine reflects it to
> tof_1_2_1. tof_1_2_1 resends this TIE to
> spine_2_1_1/spine_3_1_1/spine_4_1_1. Spines bounce it back to tof_1_2_2.
>
> So, the final question is: is it correct that the ToF re-sends
> southbound the node TIE received via southern reflection? Or is this a
> strange behaviour?
>
> Good night,
> Mariano and Tommaso.
>
> On 12/05/2020 00:06, Mariano Scazzariello wrote:
>
> Little update:
> the southern reflection is working properly. What we're seeing is this
> behaviour (similar to n.3 of the previous mail). As an example:
>
> After spine_1_1_1 failure, tof_1_2_1 sends a node TIE (with seq n.7,
> originated by tof_1_2_2) to spine_2_1_1/spine_3_1_1/spine_4_1_1. Spines
> bounce it back to tof_1_2_2.
>
> It seems that tof_1_2_1 reflects something that is already coming from a
> reflection, since the originator is tof_1_2_2 (?) Is that possible?
> Also, is it correct that a TIE is reflected to the same node that
> originated it?
>
> We'll keep you updated.
> Mariano and Tommaso.
>
> On 11/05/2020 23:09, Mariano Scazzariello wrote:
>
> Hi Bruno,
> as discussed today, we'll report the strange behaviours found in RIFT.
>
> 1. *ToFs/Spines sending more than one neg disagg TIE*
>
> Steps to reproduce: build an FT with K=2 and R=1 and, after convergence,
> destroy spine_1_1_1. This is the state after the failure (don't mind the
> numbers :D).
> <jdcclkbcdlgpfggo.png>
>
> In the figure below, spine_3_1_1 interface 0 (connected to leaf_3_0_1)
> sends 2 neg disagg TIEs after failure time.
>
> <jnilpccgldahgdmk.png>
>
> 2. *Southern Reflection bounces PDUs back to the sender*
>
> Same scenario as before: spine_4_1_1 interface 3 (connected to
> tof_1_2_2) bounces packets back to tof_1_2_2 instead of sending them to
> tof_1_2_1.
> <fgffmbbocoomlknk.png>
>
> *UPDATE:* We found that, for some reason, spine_3_1_1 sends the
> reflection correctly. Interface 3 (connected to tof_1_2_2) sends TIEs
> coming from tof_1_2_1:
> <hbodnfifioifnhng.png>
> Tommaso and I are investigating that; however, we have some difficulty
> finding the code that reflects TIEs. It is
> `unsol_flood_tie_packet_info` in the Node class, right?
>
> 3. *Another strange behaviour?*
>
> tof_1_2_2 sends to spine_2_1_1 some Node TIEs originated by tof_1_2_1,
> why?
> <poionlajflebeigb.png>
> This screenshot shows both the duplicated packets problem (229-233) and
> packet 228, which is the "strange one".
>
> Hope to hear from you soon,
> Mariano and Tommaso.
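
For reference, the two-tier queueing described in the quoted message under
"Big change #1" (a fast queue serviced every 50 ms for the first
transmission, a slow queue serviced every 1000 ms for retransmissions, and
a fast timer that only runs while a fast queue is non-empty) can be
sketched roughly as below. This is a minimal illustrative model under
stated assumptions, not the code in msg_queues.py: the class name
`TwoTierQueue`, its methods, the `simulate` helper, and the choice to stop
retransmitting on an explicit `ack` call are all hypothetical. Only the
50 ms and 1000 ms intervals come from the message.

```python
from dataclasses import dataclass, field

FAST_INTERVAL_MS = 50     # fast queue service interval (from the message)
SLOW_INTERVAL_MS = 1000   # retransmit queue service interval (from the message)


@dataclass
class TwoTierQueue:
    """Hypothetical model of a fast tx queue plus a slow retransmit queue."""
    tx_queue: dict = field(default_factory=dict)    # waiting for first send
    rtx_queue: dict = field(default_factory=dict)   # sent, waiting for an ack
    sent_log: list = field(default_factory=list)    # (time_ms, key) per send

    def enqueue(self, key, msg):
        # A new or regenerated item always goes on the fast queue, replacing
        # any older copy that is still waiting to be retransmitted.
        self.rtx_queue.pop(key, None)
        self.tx_queue[key] = msg

    def ack(self, key):
        # Assumption: an acknowledged item is dropped from both queues.
        self.tx_queue.pop(key, None)
        self.rtx_queue.pop(key, None)

    def fast_timer_needed(self):
        # The fast timer only needs to run while the fast queue has entries.
        return bool(self.tx_queue)

    def service_fast(self, now_ms):
        # First transmission; the item then moves to the retransmit queue.
        for key, msg in list(self.tx_queue.items()):
            self.sent_log.append((now_ms, key))
            self.rtx_queue[key] = msg
        self.tx_queue.clear()

    def service_slow(self, now_ms):
        # Retransmit everything that has not been acknowledged yet.
        for key in self.rtx_queue:
            self.sent_log.append((now_ms, key))


def simulate(queue, duration_ms, acks=None):
    """Drive both timers on a 50 ms tick; acks maps time_ms -> acked key."""
    acks = acks or {}
    for now in range(FAST_INTERVAL_MS, duration_ms + 1, FAST_INTERVAL_MS):
        if now in acks:
            queue.ack(acks[now])
        if now % SLOW_INTERVAL_MS == 0:
            queue.service_slow(now)
        if queue.fast_timer_needed():
            queue.service_fast(now)
    return queue.sent_log
```

In this model, an item that is acknowledged shortly after its first send
is transmitted exactly once, while an unacknowledged item is repeated with
the same key on every slow tick, which corresponds to the "same tie-nr,
same seq-nr" duplicates discussed in the thread.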