Re: [Rift] Negative disaggregation feature guide

Bruno Rijsman <brunorijsman@hotmail.com> Sun, 17 May 2020 23:02 UTC

From: Bruno Rijsman <brunorijsman@hotmail.com>
To: Antoni Przygienda <prz=40juniper.net@dmarc.ietf.org>
CC: Tony Przygienda <tonysietf@gmail.com>, "rift@ietf.org" <rift@ietf.org>
Thread-Topic: [Rift] Negative disaggregation feature guide
Date: Sun, 17 May 2020 23:02:04 +0000
Message-ID: <4E305EBE-1287-44AA-AF26-66152576E56E@hotmail.com>
References: <068412A2-1E85-4327-A50E-F6138C6D7EC0@hotmail.com> <9E819710-00BC-4285-9146-F655CAA7E1CA@hotmail.com> <MN2PR11MB3565EF85007A2737846E962CD8BC0@MN2PR11MB3565.namprd11.prod.outlook.com> <MN2PR11MB35653D1AB0505A0D36522E74D8BC0@MN2PR11MB3565.namprd11.prod.outlook.com> <432D1BE9-8BC3-4627-ACA5-72AA52A79C8C@hotmail.com> <MN2PR11MB3565DB833B52E9C0381343FCD8BC0@MN2PR11MB3565.namprd11.prod.outlook.com> <4272B942-2C6F-4974-8515-E295A6BFF757@hotmail.com> <CA+wi2hO=3U==_xLLhuCyTAKCa7cNqWRHzgwXgJVJSTOvemOnMw@mail.gmail.com> <A967D4CC-107D-4707-A277-0BE8394576FB@juniper.net>
In-Reply-To: <A967D4CC-107D-4707-A277-0BE8394576FB@juniper.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/SEYbhGCM7BU7Uks352qSzkLZY5M>
Subject: Re: [Rift] Negative disaggregation feature guide

Hi Tony,

I ran into various complications and a road-block while working on the “disaggregation negative-only” knob in RIFT-python, so I will be ripping that knob out of my code.

Instead I will stick to the three commandments as handed down on the stone tablet by the wise RIFT gods from above :-)

On May 14, 2020, at 12:26 PM, Antoni Przygienda <prz=40juniper.net@dmarc.ietf.org> wrote:

In a probably more detailed way with a little stone tablet I suggest 😉


  1.  You SHALL negatively disaggregate only if you’re ToF and have horizontal links (ring)
  2.  You SHALL propagate negative disaggregation transitively
  3.  You SHALL use positive if you’re not ToF or have no horizontal links
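
As a minimal sketch, rules 1 and 3 above can be read as a single decision function. The names `DisaggregationMode`, `choose_disaggregation`, `is_top_of_fabric` and `has_horizontal_links` are hypothetical, chosen for illustration only; they are not taken from RIFT-Python or from the specification. Rule 2 (transitive propagation) is about what a node does with negative disaggregation it receives and is not modeled here.

```python
from enum import Enum


class DisaggregationMode(Enum):
    """Kind of disaggregation a node originates (illustrative names)."""
    POSITIVE = "positive"
    NEGATIVE = "negative"


def choose_disaggregation(is_top_of_fabric: bool,
                          has_horizontal_links: bool) -> DisaggregationMode:
    """Apply rules 1 and 3 of the stone tablet.

    Rule 1: negative only if the node is ToF and has horizontal (ring) links.
    Rule 3: positive if the node is not ToF or has no horizontal links.
    """
    if is_top_of_fabric and has_horizontal_links:
        return DisaggregationMode.NEGATIVE
    return DisaggregationMode.POSITIVE
```

Under these assumptions a ToF node without a ring, and any node below the ToF level, falls back to positive disaggregation.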


Of course, any vendor, even an open-source one, can add any flavor of knobs that makes their dish unique even if it breaks the spec strictly speaking … So did I 😉

--- tony

From: Tony Przygienda <tonysietf@gmail.com>
Date: Thursday, May 14, 2020 at 11:12 AM
To: Bruno Rijsman <brunorijsman@hotmail.com>
Cc: "Pascal Thubert (pthubert)" <pthubert@cisco.com>, "rift@ietf.org" <rift@ietf.org>, Melchior Aelmans <maelmans@juniper.net>, Christian Graf <cgraf@juniper.net>, Oliver Steudler <osteudler@juniper.net>, Olivier Vandezande <ovandezande@juniper.net>, Antoni Przygienda <prz@juniper.net>, "tommasocaiazzi@gmail.com" <tommasocaiazzi@gmail.com>, Jeff Tantsura <jefftant.ietf@gmail.com>, Zhaohui Zhang <zzhang@juniper.net>, Jeffrey Zhang <zzhang2003@gmail.com>
Subject: Re: Negative disaggregation feature guide


 my thoughts:

negative is seriously more complex to implement and understand operationally, and only needed on multi-plane fabrics; that's why positive is KISS ;-) Also, negative always forces you to ring the top of the fabric, which in a single-plane design is an unnecessary requirement.

I foresee vendors not implementing negative for the sake of simplicity, e.g. in very small footprint fabrics.

the discussion about capability advertisement is interesting, we can just add one bit on the node capabilities element, make it optional and default true.

Of course, an implementation that knows that all involved nodes understand negative is then free to use negative instead of positive.
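
A minimal sketch of how such an optional, default-true capability bit could be interpreted, assuming a hypothetical `NodeCapabilities` record with a `negative_disaggregation` field. Neither name comes from the RIFT schema or from RIFT-Python; the actual encoding would be whatever the specification defines.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class NodeCapabilities:
    # Hypothetical optional bit: None means the node did not advertise it.
    negative_disaggregation: Optional[bool] = None


def supports_negative(caps: NodeCapabilities) -> bool:
    """Treat an absent bit as True ("optional and default true")."""
    if caps.negative_disaggregation is None:
        return True
    return caps.negative_disaggregation


def fabric_may_use_negative(all_caps: Iterable[NodeCapabilities]) -> bool:
    """Negative may replace positive only if every involved node supports it."""
    return all(supports_negative(caps) for caps in all_caps)
```

With a default of true, nodes that predate the bit are assumed to understand negative disaggregation, so only an explicit `False` from some node forces the fabric back to positive.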

-- tony

On Thu, May 14, 2020 at 11:05 AM Bruno Rijsman <brunorijsman@hotmail.com> wrote:
Broadening a discussion on negative disaggregation to the RIFT mailing list.

Now that we have implemented negative disaggregation in RIFT-Python, and now that we are getting some operational experience with it, the following is becoming more and more evident:

It seems to us (Bruno and Pascal, for now) that once negative disaggregation is implemented, there is really no need for positive disaggregation any more.

Every use case that can be solved by positive disaggregation can (as far as we can currently tell) also be solved by negative disaggregation.

Furthermore, negative disaggregation solves the problems in a far simpler and more elegant way: there are fewer advertising nodes and fewer advertised TIEs involved, and we don’t have the synchronization issue that positive disaggregation has that potentially causes incast problems.

Thus, at the very least, it makes sense to recommend that in a given fabric we use either negative disaggregation only or positive disaggregation only, with negative disaggregation being the default.

If positive and negative disaggregation are enabled simultaneously in one and the same fabric, everything still works fine as far as we can see. But there are some “interesting” interactions that make things unnecessarily complex and potentially fragile.

If support for negative disaggregation is mandatory, and if our assessment that negative disaggregation can solve all use cases is correct, then we could go one bold step further and completely remove positive disaggregation from the specification.

Thoughts?

— Bruno


On May 14, 2020, at 10:48 AM, Pascal Thubert (pthubert) <pthubert@cisco.com> wrote:

Hello Bruno:

I like your negative-only default because it is probably the safest. Let me elaborate.

Partitioning saves a lot more ports per ToF node than the ring costs, so the cost of the ring does not matter.

If you enable negative, there’s no point in doing any positive at all, is there? So what do we try to save by combining? In your example, I guess that super 1 2 had to retract the positive disag and then do negative instead when the link to spine 1 1 fell. Was that not a bit complex and error prone?

Also I’m not clear how we can always decide that dynamically and I do not believe that we specified that. So I understand that it’s more like a use case thingy, like a configuration that would be adapted to the use case.

=> Do not configure both at the same time; use either positive or negative disag.

We do not try to do positive disag transitively. So if there is a need for transitive, you have to use negative. In a multiplane case, there’s usually (as in your picture) one ToP per plane in each PoD, and a first link failure (say Spine 1 1 to leaf 1 1) can already cause a fallen leaf: leaf 1 1 is no longer reachable within plane 1. This tells you that all leaves in the other PoDs must avoid that plane, and that’s a leaf decision, which tells you that you need to recurse transitively down. This indicates that negative must be activated in a highly partitioned (meaning low redundancy) multiplane.

Say we try to use positive disaggregation transitively in a ToF that is partitioned, as in planes: you’d run the risk that none of the ToF nodes that can still reach the leaf can see (through south reflection) the ToF node that cannot, so they do not know they need to disag. So you end up needing the same ring as for negative, the southern reflection becomes mostly useless, and the benefits of positive are gone.

Partitioned ToF => negative

I trust that we can use positive in a highly redundant and symmetrical non-partitioned ToF. If no one can reach the leaf anymore (like it’s dead) we have no issue since it should not be disaggregated; just use the discard route at any ToF node. If you have enough redundancy and no partition, there will always be nodes that can both reach a leaf and discover that a peer does not.
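
The condition described above (a node that can reach a leaf and discovers that a peer cannot) can be sketched as a set difference. This is a deliberately simplified illustration, assuming each ToF node already knows, for example through south reflection, which leaves each peer can reach; the function name and the input shapes are hypothetical and do not reflect RIFT-Python's actual data structures.

```python
from typing import Dict, Set


def leaves_to_positively_disaggregate(
        my_reachable: Set[str],
        peer_reachable: Dict[str, Set[str]]) -> Set[str]:
    """Return the leaves this node can reach but at least one peer cannot.

    A leaf that nobody can reach never appears in my_reachable, so it is
    never disaggregated, matching the "dead leaf" case in the text.
    """
    result: Set[str] = set()
    for reachable in peer_reachable.values():
        # Leaves I reach that this particular peer has lost.
        result |= my_reachable - reachable
    return result
```

For example, if this node reaches `leaf-1-1` and `leaf-1-2` while a peer only reaches `leaf-1-2`, the function selects `leaf-1-1`. If the peer is not visible at all (the partitioned case discussed earlier), it is simply missing from `peer_reachable` and nothing is triggered, which is the failure mode that motivates the ring.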

=> Positive is indicated for a very specific use case, ideally 2 levels and fully meshed. Maybe that’s enough deployments to justify the feature in the code.

So the question is really: in that blurry zone between highly redundant single plane and highly partitioned multiplane, what should one do? It becomes a risk/benefit judgement… If negative is implemented and the ring is there, I’d use it; better safe than sorry.

Take care

Pascal



From: Bruno Rijsman <brunorijsman@hotmail.com>
Sent: Thursday, May 14, 2020 17:36
To: Pascal Thubert (pthubert) <pthubert@cisco.com>
Cc: Melchior Aelmans <maelmans@juniper.net>; Christian Graf <cgraf@juniper.net>; Oliver Steudler <osteudler@juniper.net>; Olivier Vandezande <ovandezande@juniper.net>; Tony Przygienda <tonysietf@gmail.com>; Antoni Przygienda <prz=40juniper.net@dmarc.ietf.org>; tommasocaiazzi@gmail.com; Jeff Tantsura <jefftant.ietf@gmail.com>; Jeffrey (Zhaohui) Zhang <zzhang@juniper.net>; Jeffrey (Zhaohui) Zhang <zzhang=40juniper.net@dmarc.ietf.org>; Jeffrey Zhang <zzhang2003@gmail.com>
Subject: Re: Negative disaggregation feature guide


On May 14, 2020, at 8:53 AM, Pascal Thubert (pthubert) <pthubert@cisco.com> wrote:

This does not show because on your first breakage you use the positive disag. If you stick to the logic you used at the beginning, that is pick an example where positive applies but use negative, then on your first breakage you’d show that the negative does not need to go transitively to the leaf, because the spine nodes still have solutions. Is that a lot of work to change?

This is exactly what I was thinking as well.

Maybe I need to go through the current specification with a fine-toothed comb again, but my (possibly incorrect) understanding of the current version of the specification is that positive disaggregation is used (MUST be used? SHOULD be used?) in the “first failure” scenario.

Personally, I see no reason why we would not be able to use negative disaggregation instead of positive disaggregation in the first failure scenario as well.

In the context of the tutorial, this would have the advantage of demonstrating the propagation logic more clearly.

But more importantly, it seems to me that once negative disaggregation is implemented, there is really no reason to use positive disaggregation anywhere. It would be “cleaner” (less signaling) and “better” (no incast) to simply use negative disaggregation everywhere.

So… I was thinking of adding a configuration knob to my code with the values “positive-only”, “negative-only”, and “positive-and-negative”, with “negative-only” being the default? (Seems like a waste to have coded up the positive disaggregation.)
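
A minimal sketch of what such a three-valued knob could look like. The value strings are the ones proposed above; the names `DisaggregationKnob`, `DEFAULT_KNOB` and `parse_knob` are hypothetical and are not the actual RIFT-Python configuration schema.

```python
from enum import Enum
from typing import Optional


class DisaggregationKnob(Enum):
    POSITIVE_ONLY = "positive-only"
    NEGATIVE_ONLY = "negative-only"
    POSITIVE_AND_NEGATIVE = "positive-and-negative"


# Proposed default from the message above.
DEFAULT_KNOB = DisaggregationKnob.NEGATIVE_ONLY


def parse_knob(value: Optional[str]) -> DisaggregationKnob:
    """Map a configured string to a knob value; None selects the default.

    An unknown string raises ValueError (standard Enum lookup behavior).
    """
    if value is None:
        return DEFAULT_KNOB
    return DisaggregationKnob(value)
```

Rejecting unknown strings rather than silently falling back keeps a typo in the configuration from quietly changing which kind of disaggregation the fabric uses.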

What does everyone think of that?

Should we go even further and simply remove positive disaggregation from the spec?

Should we have negative disaggregation as a capability in the capability negotiation? (Seems nonsensical, given it is something that the whole fabric needs to support or not.)

Should we take this to the RIFT mailing list?

— Bruno

_______________________________________________
RIFT mailing list
RIFT@ietf.org
https://www.ietf.org/mailman/listinfo/rift