Re: [Rift] RIFT strange behaviours discussed today

Bruno Rijsman <brunorijsman@gmail.com> Sat, 16 May 2020 22:51 UTC

From: Bruno Rijsman <brunorijsman@gmail.com>
Date: Sat, 16 May 2020 16:51:37 -0600
Cc: "tommasocaiazzi@gmail.com" <tommasocaiazzi@gmail.com>, rift@ietf.org
To: Mariano Scazzariello <mscazzariello@os.uniroma3.it>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/lMXuTkIDepi_81k25b7mKEyY50o>
Subject: Re: [Rift] RIFT strange behaviours discussed today

Hi Mariano and Tommaso,


I have made two big changes to the way messages are queued in RIFT-Python. 

This code has been committed to the master branch.

So it is ready for you to test, just in time for the paper deadline, although it is a bit risky to introduce such a large change with so little time to fully “soak it in” before then.  Also, starting Monday morning I will be unreachable for a few days.  I leave it up to you to decide whether or not you want to use this new code for the paper.


Big change #1

In the old code, the TIES_TX, TIES_ACK, and TIES_REQ queues were all serviced only once per second, no matter what.  The TIES_RTX queue didn’t really do anything.

TIDEs were sent once per 2 seconds (that did not change).

In the new code, there are two queues for TIEs, TIE ACKs, and TIE REQs:

(a) A fast tx_queue for the initial transmission after the item is enqueued.  It is serviced every 50 ms.  (With an optimization to make sure we don’t run such a fast timer unless it is really needed, i.e. unless there is at least one entry in some fast queue.)

(b) A slow rtx_queue for subsequent retransmissions, if needed.  It is serviced every 1000 ms.

See the new file msg_queues.py for details.
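
As a rough illustration of the idea (shown here just for TIEs; the helper names are made up, and the actual msg_queues.py implementation differs in detail):

    FAST_INTERVAL = 0.05   # 50 ms: initial transmission of newly enqueued items
    SLOW_INTERVAL = 1.0    # 1000 ms: retransmission of items that have not been acked

    class MsgQueue:
        """Sketch of one per-neighbor queue pair: a fast tx_queue for the first
        transmission and a slow rtx_queue for retransmissions until acked."""

        def __init__(self, send_fn):
            self.send_fn = send_fn
            self.tx_queue = {}    # tie_id -> tie, awaiting initial transmission
            self.rtx_queue = {}   # tie_id -> tie, sent but not yet acked

        def enqueue(self, tie_id, tie):
            # New (or regenerated) items always go on the fast queue first.
            self.tx_queue[tie_id] = tie

        def ack(self, tie_id):
            # A received ack removes the item from both queues.
            self.tx_queue.pop(tie_id, None)
            self.rtx_queue.pop(tie_id, None)

        def service_fast(self):
            # Runs every FAST_INTERVAL, armed only while some fast queue is non-empty.
            for tie_id, tie in list(self.tx_queue.items()):
                self.send_fn(tie)
                del self.tx_queue[tie_id]
                self.rtx_queue[tie_id] = tie   # keep for retransmission until acked

        def service_slow(self):
            # Runs every SLOW_INTERVAL.
            for tie in list(self.rtx_queue.values()):
                self.send_fn(tie)

The important difference from the old code is that a newly enqueued item now goes out within 50 ms instead of waiting for the once-per-second pass.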


Big change #2

In the old code, if a node regenerated a local TIE for any reason, it was not immediately reflooded.

Instead we would have to wait up to 2 seconds for the regenerated TIE to be advertised in a TIDE, then up to 1 second for the other side to request it, and then up to another 1 second for this node to react to the request.

I changed that: if a local TIE is regenerated for any reason, the TIE itself is immediately put on the fast TIE tx_queue, so it will be sent within 50 ms.
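
In other words, the regeneration path now does something like the following (purely a sketch with made-up names, not the actual RIFT-Python code):

    def regenerate_local_tie(node, new_tie):
        # Sketch only: store the regenerated TIE and immediately enqueue it on the
        # fast tx_queue of every neighbor in its flooding scope, instead of waiting
        # for the TIDE / TIE request / TIE response cycle described above.
        node.tie_db[new_tie.tie_id] = new_tie
        for neighbor in node.flooding_scope(new_tie):
            neighbor.msg_queue.enqueue(new_tie.tie_id, new_tie)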


These two changes combined make most of the problems that you saw go away.

You used to see that a node would send a TIE, and then a second later send the exact same version of that TIE (same tie-nr, same seq-nr).

This was because the TIRE was not sent fast enough to ack the TIE.

That behavior should go away — the TIRE is now sent much faster.

In general, convergence should be much faster.


You may see some other behavior that you did not see before.

If you kill a node in the topology, you may see some “additional intermediate states” because reconvergence is so much faster.

Consider for example, that node X is killed, and node X had adjacencies with Y1, Y2, Y3, Y4, …., Yn

Each of those neighbor nodes Y1, Y2, …., Yn will lose their adjacency with X, and reflood their local node TIE.

Now, consider that Y1, Y2, …., Yn are also all adjacent to node Z (this is quite common in Clos topologies).

So, Z is going to receive updated node TIEs from Y1, Y2, Y3, …., Yn.

And (this is the important part), Z may ALSO receive pos-disagg-prefix and neg-disagg-prefix TIEs from Y1, Y2, …., Yn.

Z receives these multiple pos-disagg-prefix and neg-disagg-prefix messages asynchronously.

That may cause Z to “change its mind” a few times about whether, and what, it should itself disaggregate, and that may cause Z to originate multiple different versions (i.e. sequence numbers) of its own pos-disagg-prefix or neg-disagg-prefix message in quick succession.

This will also have the effect of sending more messages than you might expect, but this is really different from the behavior you were seeing before.

Before, you were seeing the identical TIE (same seq-nr) being sent multiple times.  I would consider that to be a real bug.

Now you might see the same TIE (same tie-nr) being sent multiple times, but with a different seq-nr each time.  I don’t consider that to be a bug, at least not a bug in the code.  It is just a consequence of the RIFT protocol reacting quickly to multiple adjacencies going down in quick succession after a node failure.
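
To make the distinction concrete: a receiver can tell the two cases apart from the TIE header alone. The same originator, tie-nr, and seq-nr means an identical retransmission (the old bug); the same originator and tie-nr with a higher seq-nr is just a newer version (expected protocol behavior). A hypothetical sketch of that check:

    def classify_received_tie(tie_db, originator, tie_nr, seq_nr):
        # Sketch only: classify a received TIE against the local TIE database,
        # which here is just a dict of (originator, tie_nr) -> stored seq_nr.
        stored_seq_nr = tie_db.get((originator, tie_nr))
        if stored_seq_nr is None:
            return "new"              # never seen this TIE before
        if seq_nr == stored_seq_nr:
            return "duplicate"        # identical retransmission (the old bug)
        if seq_nr > stored_seq_nr:
            return "newer-version"    # expected during rapid reconvergence
        return "older-version"        # we already have a newer copy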


I have a gut feeling that this “hunting” behavior will be reduced if we do negative disaggregation everywhere (and no positive disaggregation).

I am adding a “disaggregation: negative-only” configuration knob to test that hypothesis.
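
Purely as an illustration of what I have in mind (the knob does not exist in the code yet, and the exact key name and placement in the topology file may well change), the per-node configuration might end up looking something like this:

    nodes:
      - name: tof_1_2_1
        level: 2
        disaggregation: negative-only   # never originate positive disagg TIEs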

Don’t try it out yet; the code for this new knob is not finished.

I will update you when it is.


— Bruno

> On May 15, 2020, at 10:43 AM, Mariano Scazzariello <mscazzariello@os.uniroma3.it> wrote:
> 
> Hi Bruno,
> any news on the duplicated packets issue?
> 
> We just ask because the 19th of May is the abstract submission deadline, so we need to know whether we can make it in time.
> Our "internal deadline" is Sunday, the 17th of May.
> 
> Thanks,
> Mariano and Tommaso.
> 
> On 12/05/2020 13:14, Mariano Scazzariello wrote:
>> Hi Bruno,
>> sorry for the spamming :D.
>> 
>> Today Tommy and I further investigated the problem that I reported yesterday. 
>> It seems that it is caused by TIEs being sent multiple times by the same node.
>> We are not sure and we'll keep investigating; however, we would like to have feedback from you.
>> 
>> 
>> EXAMPLE OF MULTIPLE NEG DISAGG TIES
>> This example highlights what we saw yesterday during the call.
>> The first two packets are correct, since the ToFs send them to spine_2_1_1.  However, after some time, the same ToFs resend the same neg disagg TIEs; in fact, the comparison is 0 (in red).
>> 
>> Sender                    Receiver                 Originator     TIE Type                    In DB? If so, comparison result.
>> tof_1_2_1:if1         spine_2_1_1         121                 Neg-Dis-Prefix          No
>> tof_1_2_2:if1         spine_2_1_1         122                 Neg-Dis-Prefix          No
>> ....
>> tof_1_2_1:if1         spine_2_1_1         121                 Neg-Dis-Prefix         Yes, comparison is 0
>> tof_1_2_2:if1         spine_2_1_1         122                 Neg-Dis-Prefix         Yes, comparison is 0
>> ...
>> tof_1_2_2:if1         spine_2_1_1         122                 Neg-Dis-Prefix         Yes, comparison is 0
>> 
>> EXAMPLE OF MULTIPLE NODE TIES
>> Here we can see that the ToFs send to spine_2_1_1 a new node TIE that is stored and reflected correctly (in green). Strange TIEs are highlighted in red.
>> 
>> Sender                     Receiver                    Originator     TIE Type                   In DB? If so, comparison result.
>> tof_1_2_1:if1         spine_2_1_1             121                 Node                         Yes, comparison is -1
>> tof_1_2_2:if1         spine_2_1_1             122                 Node                         Yes, comparison is -1
>> spine_2_1_1:if2     tof_1_2_1                 122                 Node                         Yes, comparison is -1                        <- CORRECT REFLECTION
>> spine_2_1_1:if3     tof_1_2_2                 121                 Node                         Yes, comparison is -1                        <- CORRECT REFLECTION
>> ...
>> tof_1_2_1:if1         spine_2_1_1             121                 Node                         Yes, comparison is -1                       <- Spine receives Node TIE from tof_1_2_1 different from the one stored in the DB, should reflect to tof_1_2_2
>> tof_1_2_2:if1         spine_2_1_1             122                 Node                         Yes, comparison is -1                       <- Spine receives Node TIE from tof_1_2_2 different from the one stored in the DB, should reflect to tof_1_2_1
>> spine_2_1_1:if2     tof_1_2_1                 122                 Node                         Yes, comparison is 0                        <- Reflection of tof_1_2_2 TIE to tof_1_2_1. Why comparison is 0 on ToF, if it is -1 on spine (it reflects wrong TIE?)?
>> spine_2_1_1:if3     tof_1_2_2                 121                 Node                         Yes, comparison is 0                        <- Reflection of tof_1_2_1 TIE to tof_1_2_2. Why comparison is 0 on ToF, if it is -1 on spine (it reflects wrong TIE?)?
>> ...
>> tof_1_2_1:if1         spine_2_1_1             121                 Node                         Yes, comparison is 0                        <- tof_1_2_1 resends to spine_2_1_1 its own node TIE
>> tof_1_2_2:if1         spine_2_1_1             122                 Node                         Yes, comparison is 0                        <- tof_1_2_2 resends to spine_2_1_1 its own node TIE
>> 
>> Hope this is useful to hunt down the problem!
>> Mariano and Tommaso.
>> 
>> On 12/05/2020 00:55, Mariano Scazzariello wrote:
>>> Another little update, the last for today since it's 1AM :D
>>> 
>>> The final scenario is:
>>> tof_1_2_2 sends its node TIE to its southbound adjacencies (spine_2_1_1/spine_3_1_1/spine_4_1_1). Each spine reflects it to tof_1_2_1.
>>> tof_1_2_1 resends this TIE to spine_2_1_1/spine_3_1_1/spine_4_1_1. Spines bounce it back to tof_1_2_2.
>>> 
>>> So, the final question is: is it correct that the ToF re-sends southbound the node TIE received via southern reflection? Or is this a strange behaviour?
>>> 
>>> Good night,
>>> Mariano and Tommaso.
>>> 
>>> On 12/05/2020 00:06, Mariano Scazzariello wrote:
>>>> Little update:
>>>> the southern reflection is working properly. What we're seeing is this behaviour (similar to no. 3 of the previous mail). For example:
>>>> 
>>>> After spine_1_1_1 failure, tof_1_2_1 sends a node TIE (with seq n.7, originated by tof_1_2_2) to spine_2_1_1/spine_3_1_1/spine_4_1_1. Spines bounce it back to tof_1_2_2.
>>>> 
>>>> It seems that tof_1_2_1 reflects something that is already coming from a reflection, since the originator is tof_1_2_2 (?). Is that possible? Also, is it correct that a TIE is reflected to the same node that originated it?
>>>> 
>>>> We'll keep you updated.
>>>> Mariano and Tommaso.
>>>> 
>>>> On 11/05/2020 23:09, Mariano Scazzariello wrote:
>>>>> Hi Bruno,
>>>>> as discussed today, we'll report the strange behaviours we found in RIFT.
>>>>> 
>>>>> 1. ToFs/Spines sending more than one neg disagg TIE
>>>>> Steps to reproduce: build a FT with K=2 and R=1, and after convergence destroy spine_1_1_1. This is the state after the failure (don't mind the numbers :D).
>>>>> <jdcclkbcdlgpfggo.png>
>>>>> 
>>>>> In the figure below, spine_3_1_1 interface 0 (connected to leaf_3_0_1) sends 2 neg disagg TIEs after failure time.
>>>>> 
>>>>> <jnilpccgldahgdmk.png>
>>>>> 
>>>>> 2. Southern Reflection bounces PDUs back to the sender
>>>>> 
>>>>> Same scenario as before: spine_4_1_1 interface 3 (connected to tof_1_2_2) bounces packets back to tof_1_2_2 instead of sending them to tof_1_2_1.
>>>>> <fgffmbbocoomlknk.png>
>>>>> UPDATE: We found that, for some reason, spine_3_1_1 performs the reflection correctly. Interface 3 (connected to tof_1_2_2) sends TIEs coming from tof_1_2_1:
>>>>> <hbodnfifioifnhng.png>
>>>>> Tommaso and I are investigating that; however, we are having some difficulty finding the code that reflects TIEs. It is `unsol_flood_tie_packet_info` in the Node class, right?
>>>>> 
>>>>> 3. Another strange behaviour?
>>>>> 
>>>>> tof_1_2_2 sends to spine_2_1_1 some Node TIEs originated by tof_1_2_1. Why?
>>>>> <poionlajflebeigb.png>
>>>>> This screenshot covers both the duplicated packets problem (packets 229-233), and packet 228 is the "strange one".
>>>>> 
>>>>> Hope to hear from you soon,
>>>>> Mariano and Tommaso.
>>>>>