Re: [Rift] RIFT strange behaviours discussed today

Tony Przygienda <tonysietf@gmail.com> Sun, 17 May 2020 17:35 UTC

To: Bruno Rijsman <brunorijsman@gmail.com>
Cc: Mariano Scazzariello <mscazzariello@os.uniroma3.it>, "tommasocaiazzi@gmail.com" <tommasocaiazzi@gmail.com>, rift@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/xQM9gw0eYgVveCsJq1g91uOca7Q>

nice drilling, but once it starts to be about convergence speed and
scalability, Python is really not that much your friend. You'd need a
proper systems language.

Mariano, once you start to push that kind of envelope, you probably want to
slowly start testing our scaled-up version ...

-- tony

On Sat, May 16, 2020 at 3:51 PM Bruno Rijsman <brunorijsman@gmail.com>
wrote:

> Hi Mariano and Tommaso,
>
>
> I have made two big changes to the way messages are queued in RIFT-Python.
>
> This code has been committed to the master branch.
>
> So it is ready for you to test, just in time for the paper deadline,
> although it is a bit risky to introduce such a large change with not much
> time to fully “soak it in” first. Also, starting Monday morning I will be
> unreachable for a few days. I leave it up to you to decide whether or not
> you want to use this new code for the paper.
>
>
> Big change #1
>
> In the old code, the TIES_TX, TIES_ACK, and TIES_REQ queues were all
> serviced only once per second, no matter what. The TIES_RTX queue didn’t
> really do anything.
>
> TIDEs were sent once every 2 seconds (that did not change).
>
> In the new code, there are two queues for TIEs, TIE ACKs, and TIE REQs:
>
> (a) A fast tx_queue for the initial transmission after the item is
> enqueued. It is serviced every 50ms. (With an optimization to make sure we
> don’t run such a fast timer unless it is really needed - i.e. unless there
> is at least one entry in some fast queue.)
>
> (b) A slow rtx_queue for subsequent retransmissions if needed. It is
> serviced every 1000ms.
>
> See the new file msg_queues.py for details.
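>
> To make the idea concrete, here is a heavily simplified sketch of the
> two-speed queue. The names and structure are illustrative only; the
> actual code in msg_queues.py differs in the details:
>
>     # Simplified sketch, not the actual msg_queues.py code.
>     FAST_INTERVAL = 0.05   # 50 ms: initial transmission
>     SLOW_INTERVAL = 1.0    # 1000 ms: retransmission
>
>     class MsgQueue:
>         def __init__(self, send_fn):
>             self._send_fn = send_fn
>             self._tx_queue = {}    # tie_id -> header, serviced every 50 ms
>             self._rtx_queue = {}   # tie_id -> header, serviced every 1000 ms
>
>         def enqueue(self, tie_id, header):
>             # New items go on the fast queue for a quick initial transmission.
>             self._tx_queue[tie_id] = header
>
>         def ack(self, tie_id):
>             # An ack (TIRE) stops any further (re)transmission.
>             self._tx_queue.pop(tie_id, None)
>             self._rtx_queue.pop(tie_id, None)
>
>         def service_fast(self):
>             # Runs every FAST_INTERVAL, but only while the queue is non-empty.
>             for tie_id, header in self._tx_queue.items():
>                 self._send_fn(header)
>                 # After the initial send, move the item to the slow queue
>                 # in case no ack arrives and we need to retransmit.
>                 self._rtx_queue[tie_id] = header
>             self._tx_queue.clear()
>
>         def service_slow(self):
>             # Runs every SLOW_INTERVAL for anything still unacked.
>             for header in self._rtx_queue.values():
>                 self._send_fn(header)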
>
>
> Big change #2
>
> In the old code, if a node regenerated a local TIE for any reason, it was
> not immediately reflooded.
>
> Instead we would have to wait up to 2 seconds for the regenerated TIE to
> be advertised in a TIDE, and then up to 1 second for the other side to
> request it, and then up to another 1 second for this node to react to the
> request.
>
> I changed that: if a local TIE is regenerated for any reason, the TIE
> itself is immediately put on the fast TIE tx_queue, so it will be sent
> after 50ms.
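>
> Roughly sketched (with made-up names, just to show where the hook sits):
>
>     # When a local TIE is regenerated, put it straight on the fast
>     # tx_queue instead of waiting for the TIDE/TIRE exchange.
>     def regenerate_local_tie(node, tie):
>         tie.header.seq_nr += 1               # new version of our own TIE
>         node.tie_db[tie.header.tie_id] = tie
>         for intf in node.interfaces:         # flooded within ~50 ms
>             intf.msg_queue.enqueue(tie.header.tie_id, tie.header)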
>
>
> These two changes combined make most of the problems that you saw go away.
>
> You used to see that a node would send a TIE, and then a second later *the
> exact same* version of that TIE (same tie-nr, same seq-nr).
>
> This was because the TIRE was not sent fast enough to ack the TIE.
>
> That behavior should go away — the TIRE is now sent much faster.
>
> In general, convergence should be much faster.
>
>
> You may see some other behavior that you did not see before.
>
> If you kill a node in the topology, you may see some “additional
> intermediate states” because reconvergence is so much faster.
>
> Consider, for example, that node X is killed, and that node X had
> adjacencies with Y1, Y2, Y3, Y4, …, Yn.
>
> Each of those neighbor nodes Y1, Y2, …, Yn will lose its adjacency with
> X and reflood its local node TIE.
>
> Now, consider that Y1, Y2, …, Yn are also all adjacent to node Z (this is
> quite common in Clos topologies).
>
> So, Z is going to receive updated node TIEs from Y1, Y2, Y3, …, Yn.
>
> And (this is the important part), Z may ALSO receive pos-disagg-prefix and
> neg-disagg-prefix TIEs from Y1, Y2, …, Yn.
>
> Z receives these multiple pos-disagg-prefix and neg-disagg-prefix messages
> asynchronously.
>
> That may cause Z to “change its mind” a few times about whether and what
> it should itself disaggregate, and that may cause Z to originate multiple
> different versions (i.e. sequence numbers) of its own pos-disagg-prefix or
> neg-disagg-prefix message in quick succession.
>
> This will also have the effect of sending more messages than you might
> expect, but this is really different from the behavior you were seeing
> before.
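>
> In rough pseudo-Python (all names made up, this is not the actual
> RIFT-Python code):
>
>     # Z recomputes its disaggregation decision on every incoming TIE and
>     # re-originates its own disagg TIE (same tie-nr, bumped seq-nr)
>     # whenever the decision changes.
>     def on_received_disagg_tie(z, tie):
>         z.tie_db[tie.header.tie_id] = tie
>         new_decision = compute_disagg_prefixes(z)   # hypothetical recompute
>         if new_decision != z.current_disagg:
>             z.current_disagg = new_decision
>             z.my_disagg_tie.header.seq_nr += 1
>             flood_south(z, z.my_disagg_tie)         # hypothetical flood call
>
> With n asynchronous arrivals, this can produce up to n versions of Z’s own
> TIE in quick succession: same tie-nr, increasing seq-nrs.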
>
> Before, you were seeing the identical TIE (same seq-nr) being sent multiple
> times.  I would consider that to be a real bug.
>
> Now you might see multiple versions of the same TIE (same tie-nr,
> different seq-nr) being sent multiple times.  I don’t consider that to be a
> bug.  Not a bug in the code at least.  It is just a consequence of the RIFT
> protocol reacting quickly to multiple adjacencies going down in quick
> succession after a node failure.
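>
> As a reference for the “comparison” columns in your tables: the value is
> essentially a seq-nr comparison along these lines (a sketch of the
> semantics, not the exact RIFT-Python code):
>
>     # compare_tie_headers(stored, received):
>     #   -1: received TIE is newer -> store it and reflood
>     #    0: identical version (same tie-nr AND seq-nr) -> just ack
>     #   +1: received TIE is older -> send back our newer version
>     def compare_tie_headers(stored, received):
>         if stored.seq_nr < received.seq_nr:
>             return -1
>         if stored.seq_nr > received.seq_nr:
>             return 1
>         return 0
>
> So “same seq-nr sent again” (comparison 0) was the bug; “newer seq-nr
> received” (comparison -1) is normal operation.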
>
>
> I have a gut feeling that this “hunting” behavior will be less pronounced
> if we do negative disaggregation everywhere (and no positive
> disaggregation).
>
> I am adding a “disaggregation: negative-only” configuration knob to test
> that hypothesis.
>
> Don’t try it out yet; the code for this new knob is not finished.
>
> I will update you when it is.
>
>
> — Bruno
>
>
>
>
>
>
>
>
> On May 15, 2020, at 10:43 AM, Mariano Scazzariello <
> mscazzariello@os.uniroma3.it> wrote:
>
> Hi Bruno,
> any news on the duplicated packets issue?
>
> We just ask because the 19th of May is the abstract submission deadline,
> so we need to know whether we can make it in time.
> Our "internal deadline" is Sunday, the 17th of May.
>
> Thanks,
> Mariano and Tommaso.
> On 12/05/2020 13:14, Mariano Scazzariello wrote:
>
> Hi Bruno,
> sorry for the spamming :D.
>
> Today Tommy and I further investigated the problem that I reported
> yesterday.
> It seems that it is caused by TIEs sent multiple times by the same node.
> We are not sure and we'll keep investigating, but we would like to get
> some feedback from you.
>
> EXAMPLE OF MULTIPLE NEG DISAGG TIES
> This example highlights what we saw yesterday during the call.
> The first two packets are correct, since the ToFs send them to
> spine_2_1_1. However, after some time, the same ToFs resend the same neg
> disagg TIEs; in fact, the comparison is 0 (marked in red below).
>
> Sender         Receiver     Originator  TIE Type        In DB? If so, comparison result
> tof_1_2_1:if1  spine_2_1_1  121         Neg-Dis-Prefix  No
> tof_1_2_2:if1  spine_2_1_1  122         Neg-Dis-Prefix  No
> ...
> tof_1_2_1:if1  spine_2_1_1  121         Neg-Dis-Prefix  Yes, comparison is 0
> tof_1_2_2:if1  spine_2_1_1  122         Neg-Dis-Prefix  Yes, comparison is 0
> ...
> tof_1_2_2:if1  spine_2_1_1  122         Neg-Dis-Prefix  Yes, comparison is 0
>
>
> EXAMPLE OF MULTIPLE NODE TIES
> Here we can see that the ToFs send spine_2_1_1 a new node TIE that is
> stored and reflected correctly (marked in green). The strange TIEs are
> marked in red.
>
> Sender           Receiver     Originator  TIE Type  In DB? If so, comparison result
> tof_1_2_1:if1    spine_2_1_1  121         Node      Yes, comparison is -1
> tof_1_2_2:if1    spine_2_1_1  122         Node      Yes, comparison is -1
> spine_2_1_1:if2  tof_1_2_1    122         Node      Yes, comparison is -1  <- CORRECT REFLECTION
> spine_2_1_1:if3  tof_1_2_2    121         Node      Yes, comparison is -1  <- CORRECT REFLECTION
> ...
> tof_1_2_1:if1    spine_2_1_1  121         Node      Yes, comparison is -1
>     <- The spine receives a Node TIE from tof_1_2_1 that differs from the
>        one stored in the DB; it should reflect it to tof_1_2_2.
> tof_1_2_2:if1    spine_2_1_1  122         Node      Yes, comparison is -1
>     <- The spine receives a Node TIE from tof_1_2_2 that differs from the
>        one stored in the DB; it should reflect it to tof_1_2_1.
> spine_2_1_1:if2  tof_1_2_1    122         Node      Yes, comparison is 0
>     <- Reflection of the tof_1_2_2 TIE to tof_1_2_1. Why is the comparison
>        0 on the ToF if it is -1 on the spine (does it reflect the wrong TIE?)
> spine_2_1_1:if3  tof_1_2_2    121         Node      Yes, comparison is 0
>     <- Reflection of the tof_1_2_1 TIE to tof_1_2_2. Why is the comparison
>        0 on the ToF if it is -1 on the spine (does it reflect the wrong TIE?)
> ...
> tof_1_2_1:if1    spine_2_1_1  121         Node      Yes, comparison is 0
>     <- tof_1_2_1 resends its own node TIE to spine_2_1_1.
> tof_1_2_2:if1    spine_2_1_1  122         Node      Yes, comparison is 0
>     <- tof_1_2_2 resends its own node TIE to spine_2_1_1.
>
> Hope this is useful for hunting down the problem!
> Mariano and Tommaso.
> On 12/05/2020 00:55, Mariano Scazzariello wrote:
>
> Another little update, the last for today since it's 1AM :D
>
> The final scenario is:
> tof_1_2_2 sends its node TIE to its southbound adjacencies
> (spine_2_1_1/spine_3_1_1/spine_4_1_1). Each spine reflects it to tof_1_2_1.
> tof_1_2_1 resends this TIE to spine_2_1_1/spine_3_1_1/spine_4_1_1. Spines
> bounce it back to tof_1_2_2.
>
> So, the final question is: is it correct that the ToF re-sends southbound
> a node TIE received via southern reflection? Or is this a strange
> behaviour?
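>
> Our expectation would be some kind of split-horizon check on the
> reflection, something like this sketch (just our guess at the intended
> behaviour, not a claim about what the code does today):
>
>     # Never reflect a node TIE back to the neighbor it came from,
>     # nor to the node that originated it.
>     def reflection_targets(node, tie_header, rx_intf):
>         targets = []
>         for intf in node.interfaces:
>             if intf is rx_intf:
>                 continue   # don't bounce back to the sender
>             if intf.neighbor_system_id == tie_header.originator:
>                 continue   # don't send the originator its own TIE
>             targets.append(intf)
>         return targets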
>
> Good night,
> Mariano and Tommaso.
> On 12/05/2020 00:06, Mariano Scazzariello wrote:
>
> Little update:
> the southern reflection is working properly. What we're seeing is this
> behaviour (similar to n.3 of the previous mail). As an example:
>
> After the spine_1_1_1 failure, tof_1_2_1 sends a node TIE (with seq n.7,
> originated by tof_1_2_2) to spine_2_1_1/spine_3_1_1/spine_4_1_1. Spines
> bounce it back to tof_1_2_2.
>
> It seems that tof_1_2_1 reflects something that already came from a
> reflection, since the originator is tof_1_2_2 (?). Is that possible? Also,
> is it correct that a TIE is reflected to the same node that originated it?
>
> We'll keep you updated.
> Mariano and Tommaso.
> On 11/05/2020 23:09, Mariano Scazzariello wrote:
>
> Hi Bruno,
> as discussed today, we'll report the strange behaviours found in RIFT.
>
> 1. *ToFs/Spines sending more than one neg disagg TIE*
>
> Steps to reproduce: build a fat tree with K=2 and R=1; after convergence,
> destroy spine_1_1_1. This is the state after the failure (don't mind the
> numbers :D).
> <jdcclkbcdlgpfggo.png>
>
> In the figure below, spine_3_1_1 interface 0 (connected to leaf_3_0_1)
> sends 2 neg disagg TIEs after the failure.
>
> <jnilpccgldahgdmk.png>
>
> 2. *Southern Reflection bounces PDUs back to the sender*
> Same scenario as before: spine_4_1_1 interface 3 (connected to tof_1_2_2)
> bounces packets back to tof_1_2_2 instead of sending them to tof_1_2_1.
> <fgffmbbocoomlknk.png>
>
> *UPDATE:* We found that, for some reason, spine_3_1_1 performs the
> reflection correctly. Interface 3 (connected to tof_1_2_2) sends TIEs
> coming from tof_1_2_1:
> <hbodnfifioifnhng.png>
> Tommaso and I are investigating that; however, we are having some
> difficulty finding the code that reflects TIEs. It's
> `unsol_flood_tie_packet_info` in the Node class, right?
>
> 3. *Another strange behaviour?*
>
> tof_1_2_2 sends spine_2_1_1 some Node TIEs originated by tof_1_2_1. Why?
> <poionlajflebeigb.png>
> This screenshot shows both the duplicated packets problem (packets
> 229-233) and packet 228, which is the "strange one".
>
> Hope to hear from you soon,
> Mariano and Tommaso.
>
>