Re: [Rift] Why negative disaggregation cannot replace positive disaggregation in all cases.

Tony Przygienda <tonysietf@gmail.com> Mon, 18 May 2020 15:47 UTC

From: Tony Przygienda <tonysietf@gmail.com>
Date: Mon, 18 May 2020 08:45:46 -0700
Message-ID: <CA+wi2hNuWKM-6fXTTDeUfcA2Y_8DQB7P9ffRF9B2BMjnkaaJsg@mail.gmail.com>
To: Bruno Rijsman <brunorijsman@hotmail.com>
Cc: "Pascal Thubert (pthubert)" <pthubert=40cisco.com@dmarc.ietf.org>, "rift@ietf.org" <rift@ietf.org>, Antoni Przygienda <prz@juniper.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/yVgeWf1c5xUYiLk3MXyffpvJBr8>
Subject: Re: [Rift] Why negative disaggregation cannot replace positive disaggregation in all cases.

Chairs, if we start a monthly again, I think we need the interim framework to be
in line here ...

-- tony

On Mon, May 18, 2020 at 7:25 AM Bruno Rijsman <brunorijsman@hotmail.com>
wrote:

> Hi Pascal,
>
> Yes, we can do a monthly.
>
> For now, here is the issue that I ran into that makes me believe that
> negative disaggregation cannot replace positive disaggregation in all cases
> (specifically not for the “normal” case of non-ToF routers).
>
> Here is a simplified summary of how positive disaggregation works (see
> section 4.2.5.1 for details):
>
> If this router R1 has a "direct route" for prefix P, and that route P has
> a set of next-hops {NH}, and there is some other router R2 at the same level
> as R1 that does not have NH as a south-adjacency for each NH in {NH}, then
> this router R1 must positively disaggregate prefix P: this router R1 must
> "attract" the traffic so that it doesn't go to the other router R2 (which
> cannot deliver the traffic to P).
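
The simplified positive rule above can be sketched roughly as follows. This is only an illustration of the summary in this message, not the procedure from the specification: the function name and the `peer_south_adjacencies` input (the south adjacencies R1 believes each same-level router has) are hypothetical.

```python
from typing import Dict, Set


def must_positively_disaggregate(
    next_hops: Set[str],
    peer_south_adjacencies: Dict[str, Set[str]],
) -> bool:
    """Decide whether this router (R1) should attract traffic for prefix P.

    next_hops: the next-hop set {NH} of R1's direct route for P.
    peer_south_adjacencies: south adjacencies of each same-level router R2
        (hypothetical input for this sketch).
    """
    if not next_hops:
        return False  # no direct route to protect
    # R2 cannot deliver to P if none of the next-hops is one of its
    # south adjacencies.
    return any(
        next_hops.isdisjoint(adjacencies)
        for adjacencies in peer_south_adjacencies.values()
    )
```

Note that every input here is either R1's own route or adjacency information about its peers; nothing about the peers' routes is needed.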
>
> Now let's try to do the reverse thing with negative disaggregation:
>
> If there is some other router R2 at the same level as R1, and R2 has a
> "direct route" for prefix P, and that route P has some set of next-hops {NH}, and
> this router R1 does not have NH as a south-adjacency for each NH in {NH},
> then this router R1 must negatively disaggregate prefix P: this router R1
> must "repel" the traffic so that it goes to R2 instead of R1.
>
> The problem is that this requires router R1 to “know” what the set of routes
> is that router R2 has, and to know the next-hops for those routes. But R1
> cannot know/calculate the "direct routes" that R2 has because it doesn't
> know which north-TIEs R2 has in its database. So, I don’t think we can do
> negative disaggregation here...
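
A rough sketch of the reversed rule makes the missing input visible. The names below are hypothetical, and `peer_direct_routes` (prefix to next-hop set for R2's direct routes) is exactly the information that, per the argument above, R1 does not have; it is modelled as `None` when unknown.

```python
from typing import Dict, Optional, Set


def prefixes_to_repel(
    my_south_adjacencies: Set[str],
    peer_direct_routes: Optional[Dict[str, Set[str]]],
) -> Set[str]:
    """Return the prefixes R1 would have to negatively disaggregate.

    peer_direct_routes: R2's direct routes as prefix -> {NH}; None models
        the case where R1 cannot know them (hypothetical input).
    """
    if peer_direct_routes is None:
        # R1 does not hold R2's north-TIEs, so it cannot compute this.
        raise ValueError("R2's direct routes are unknown to R1")
    # Repel a prefix when R1 has none of R2's next-hops as south adjacency.
    return {
        prefix
        for prefix, next_hops in peer_direct_routes.items()
        if next_hops and next_hops.isdisjoint(my_south_adjacencies)
    }
```

Unlike the positive rule, the decision depends on another router's route table rather than on R1's own routes.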
>
> Am I correct or am I missing something?
>
> — Bruno
>
>
> On May 18, 2020, at 12:28 AM, Antoni Przygienda <prz@juniper.net> wrote:
>
> Agreed. Let’s set up a monthly
>
>
>    - Tony
>
>
> *From: *"Pascal Thubert (pthubert)" <pthubert=40cisco.com@dmarc.ietf.org>
> *Date: *Sunday, May 17, 2020 at 10:15 PM
> *To: *Bruno Rijsman <brunorijsman@hotmail.com>
> *Cc: *Antoni Przygienda <prz@juniper.net>, "rift@ietf.org" <rift@ietf.org>,
> Tony Przygienda <tonysietf@gmail.com>
> *Subject: *Re: [Rift] Negative disaggregation feature guide
>
> *[External Email. Be cautious of content]*
>
> This is quite interesting, Bruno.
>
>
> Does that mean that there are issues running negative in standalone? I’d
> be really interested if we have a monthly or something one of these days.
> Also there’s the section on disaggregation in the applicability draft that
> we may enrich.
> Many thanks for all your efforts!
>
> Pascal
>
>
> Le 18 mai 2020 à 01:02, Bruno Rijsman <brunorijsman@hotmail.com> a écrit :
>
> Hi Tony,
>
> I ran into various complications and a road-block while working on the
> “disaggregation negative-only” knob in RIFT-python, so I will be ripping
> that knob out of my code.
>
> Instead I will stick to the three commandments as handed down on the stone
> tablet by the wise RIFT gods from above :-)
>
>
> On May 14, 2020, at 12:26 PM, Antoni Przygienda <
> prz=40juniper.net@dmarc.ietf.org> wrote:
>
> In a probably more detailed way with a little stone tablet I suggest 😉
>
>
>    1. You SHALL negatively disaggregate only if you’re ToF and have
>    horizontal links (ring)
>    2. You SHALL propagate transitively negative disaggregation
>    3. You SHALL use positive if you’re not ToF or have no horizontal links
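
Rules 1 and 3 of the tablet reduce to a small decision function. A minimal sketch, assuming two hypothetical boolean inputs; rule 2 (transitive propagation of received negative disaggregation) is not modelled here.

```python
from enum import Enum


class Disaggregation(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


def originate_mode(is_tof: bool, has_horizontal_links: bool) -> Disaggregation:
    """Pick which kind of disaggregation a node may originate.

    Rule 1: negative only if the node is ToF and has horizontal links (ring).
    Rule 3: positive otherwise.
    """
    if is_tof and has_horizontal_links:
        return Disaggregation.NEGATIVE
    return Disaggregation.POSITIVE
```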
>
>
> Of course, any vendor, even an open-source one, can add any flavor of knobs
> that make their dish unique, even if it breaks the spec strictly speaking …
> So did I 😉
>
> --- tony
>
> *From: *Tony Przygienda <tonysietf@gmail.com>
> *Date: *Thursday, May 14, 2020 at 11:12 AM
> *To: *Bruno Rijsman <brunorijsman@hotmail.com>
> *Cc: *"Pascal Thubert (pthubert)" <pthubert@cisco.com>, "rift@ietf.org" <
> rift@ietf.org>, Melchior Aelmans <maelmans@juniper.net>, Christian Graf <
> cgraf@juniper.net>, Oliver Steudler <osteudler@juniper.net>, Olivier
> Vandezande <ovandezande@juniper.net>, Antoni Przygienda <prz@juniper.net>,
> "tommasocaiazzi@gmail.com" <tommasocaiazzi@gmail.com>, Jeff Tantsura <
> jefftant.ietf@gmail.com>, Zhaohui Zhang <zzhang@juniper.net>, Zhaohui
> Zhang <zzhang@juniper.net>, Jeffrey Zhang <zzhang2003@gmail.com>
> *Subject: *Re: Negative disaggregation feature guide
>
>
>  my thoughts:
>
> negative is seriously more complex to implement and understand
> operationally and only needed on multi-plane fabrics, that's why positive
> is KISS ;-) Also, negative always forces you to ring the top of the fabric
> which in single plane design is an unnecessary requirement.
>
> I foresee vendors not implementing negative, for simplicity, in very small
> footprint fabrics.
>
> the discussion about capability advertisement is interesting, we can just
> add one bit on the node capabilities element, make it optional and default
> true.
>
> Of course an implementation that understands that all involved nodes
> understand negative is free to use negative instead of positive then.
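
The capability idea above could look roughly like this. It is a sketch only: the field name `negative_disaggregation_support` and the class are hypothetical stand-ins for the node capabilities element, with `None` modelling "bit not advertised" and therefore treated as the default true.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class NodeCapabilities:
    # Hypothetical optional bit; None means the node did not advertise it.
    negative_disaggregation_support: Optional[bool] = None


def supports_negative(caps: NodeCapabilities) -> bool:
    """Optional bit with default true: absent is read as supported."""
    if caps.negative_disaggregation_support is None:
        return True
    return caps.negative_disaggregation_support


def may_use_negative_instead_of_positive(
    all_caps: Iterable[NodeCapabilities],
) -> bool:
    """Negative may replace positive only if every involved node supports it."""
    return all(supports_negative(caps) for caps in all_caps)
```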
>
> -- tony
>
> On Thu, May 14, 2020 at 11:05 AM Bruno Rijsman <brunorijsman@hotmail.com>
> wrote:
>
> Broadening a discussion on negative disaggregation to the RIFT mailing
> list.
>
> Now that we have implemented negative disaggregation in RIFT-Python, and
> now that we are getting some operational experience with it, the following
> is becoming more and more evident:
>
> It seems to us (Bruno and Pascal, for now) that once negative
> disaggregation is implemented, there is really no need for positive
> disaggregation any more.
>
> Every use case that can be solved by positive disaggregation can (as far
> as we can currently tell) also be solved by negative disaggregation.
>
> Furthermore, negative disaggregation solves the problems in a far simpler
> and more elegant way: there are fewer advertising nodes and fewer
> advertised TIEs involved, and we don’t have the synchronization issue that
> positive disaggregation has that potentially causes incast problems.
>
> Thus, at the very least, it makes sense to recommend that in a given
> fabric we use either negative disaggregation only or positive
> disaggregation only, with negative disaggregation being the default.
>
> If positive and negative disaggregation are enabled simultaneously in one
> and the same fabric, everything still works fine as far as we can see. But
> there are some “interesting” interactions that make things unnecessarily
> complex and potentially fragile.
>
> If support for negative disaggregation is mandatory, and if our assessment
> that negative disaggregation can solve all use cases is correct, then we
> could go one bold step further and completely remove positive
> disaggregation from the specification.
>
> Thoughts?
>
> — Bruno
>
>
>
> On May 14, 2020, at 10:48 AM, Pascal Thubert (pthubert) <
> pthubert@cisco.com> wrote:
>
> Hello Bruno:
>
> I like your negative-only default because it is probably the safest. Let
> me elaborate.
>
> partitioning saves a lot more ports per ToF node than the ring cost so it
> does not matter.
>
> If you enable negative, there’s no point in doing any positive at all, is
> there? So what do we try to save by combining? In your example, I guess
> that super 1 2 had to retract the positive disag and then do negative
> instead when the link to spine 1 1 fell. Was that not a bit complex and
> error prone?
>
> Also I’m not clear how we can always decide that dynamically and I do not
> believe that we specified that. So I understand that it’s more like a use
> case thingy, like a configuration that would be adapted to the use case.
>
> => do not configure both at the same time; use either positive or negative
> disag.
>
> We do not try to do positive disag transitively. So if there is a need for
> transitive, you have to use negative. In a multiplane case, there’s
> usually (as in your picture) one ToP per plane in each PoD, and a first
> link failure (say Spine 1 1 to leaf 1 1) can already cause a fallen leaf:
> leaf 1 1 is no longer reachable within plane 1. This tells you that all
> leaves in the other PoDs must avoid that plane, and that’s a leaf decision;
> which tells you that you need to recurse transitively down. This indicates
> that negative must be activated in a highly partitioned (meaning low
> redundancy) multiplane.
>
> Say we try to use positive disaggregation transitively in a ToF that is
> partitioned like in planes, you’d run a risk that none of the ToF nodes
> that can still reach the leaf can see (through south reflection) the ToF
> node that cannot so they do not know they need to disag. So you end up
> needing the same ring as for negative, the southern reflection becomes
> mostly useless and the benefits of positive are gone.
>
> Partitioned ToF => negative
>
> I trust that we can use positive in a highly redundant and symmetrical
> non-partitioned ToF. If no one can reach the leaf anymore (like it’s dead)
> we have no issue since it should not be disaggregated, just use the
> discard route at any ToF node. If you have enough redundancy and no
> partition, there will always be nodes that can both reach a leaf and
> discover that a peer does not.
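
The condition described above (some node must both reach the leaf and discover that a peer does not) can be expressed as a small check. This is an illustrative sketch with hypothetical inputs: `reachable_leaves` maps each ToF node to the leaves it can reach, and `reflected_peers` maps each ToF node to the peers it can see through south reflection.

```python
from typing import Dict, Set


def positive_can_cover(
    reachable_leaves: Dict[str, Set[str]],
    reflected_peers: Dict[str, Set[str]],
    leaf: str,
) -> bool:
    """Check whether positive disaggregation can repair reachability to leaf."""
    can_reach = {
        node for node, leaves in reachable_leaves.items() if leaf in leaves
    }
    cannot_reach = set(reachable_leaves) - can_reach
    if not can_reach:
        # Nobody reaches the leaf: nothing to disaggregate, discard applies.
        return True
    # Every node that lost the leaf must be visible to at least one node
    # that still reaches it, otherwise no one knows it has to disaggregate.
    return all(
        any(blind in reflected_peers.get(node, set()) for node in can_reach)
        for blind in cannot_reach
    )
```

In a partitioned ToF the second argument tends to be empty across planes, which is the case where the check fails.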
>
> => Positive is indicated for a very specific use case, ideally 2 levels
> and fully meshed. Maybe that’s enough deployments to justify the feature in
> the code.
>
> So the question is really about that blurry zone between highly redundant single
> plane and highly partitioned multiplane: what should one do? It becomes a
> risk/benefit judgement… If negative is implemented and the ring is there,
> I’d use it, better safe than sorry.
>
> Take care
>
> Pascal
>
>
>
> *From:* Bruno Rijsman <brunorijsman@hotmail.com>
> *Sent:* jeudi 14 mai 2020 17:36
> *To:* Pascal Thubert (pthubert) <pthubert@cisco.com>
> *Cc:* Melchior Aelmans <maelmans@juniper.net>; Christian Graf <
> cgraf@juniper.net>; Oliver Steudler <osteudler@juniper.net>; Olivier
> Vandezande <ovandezande@juniper.net>; Tony Przygienda <tonysietf@gmail.com>;
> Antoni Przygienda <prz=40juniper.net@dmarc.ietf.org>;
> tommasocaiazzi@gmail.com; Jeff Tantsura <jefftant.ietf@gmail.com>;
> Jeffrey (Zhaohui) Zhang <zzhang@juniper.net>; Jeffrey (Zhaohui) Zhang <
> zzhang=40juniper.net@dmarc.ietf.org>;
> Jeffrey Zhang <zzhang2003@gmail.com>
> *Subject:* Re: Negative disaggregation feature guide
>
>
>
> On May 14, 2020, at 8:53 AM, Pascal Thubert (pthubert) <pthubert@cisco.com>
> wrote:
>
> This does not show because on your first breakage you use the positive
> disag. If you stick to the logic you used at the beginning, that is pick an
> example where positive applies but use negative, then on your first
> breakage you’d show that the negative does not need to go transitively to
> the leaf, because the spine nodes still have solutions. Is that a lot of
> work to change?
>
>
> This is exactly what I was thinking as well.
>
> Maybe I need to go through the current specification with a fine-toothed
> comb again, but my (possibly incorrect) understanding of the current
> version of the specification is that positive disaggregation is used
> (MUST be used? SHOULD be used?) in the “first failure” scenario.
>
> Personally, I see no reason why we would not be able to use negative
> disaggregation instead of positive disaggregation in the first failure
> scenario as well.
>
> In the context of the tutorial, this would have the advantage of
> demonstrating the propagation logic more clearly.
>
> But more importantly, it seems to me that once negative disaggregation is
> implemented, there is really no reason to use positive disaggregation
> anywhere. It would be “cleaner” (less signaling) and “better” (no incast)
> to simply use negative disaggregation everywhere.
>
> So… I was thinking of adding a configuration knob to my code:
> “positive-only”, “negative-only”, “positive-and-negative”, with
> “negative-only” being the default? (Seems like a waste to have coded up the
> positive disaggregation.)
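
Such a knob might be modelled as below. This is a hypothetical sketch built only from the three values named above; it is not taken from the RIFT-Python code base, and `parse_knob` and `DEFAULT_KNOB` are illustrative names.

```python
from enum import Enum
from typing import Optional


class DisaggregationKnob(Enum):
    POSITIVE_ONLY = "positive-only"
    NEGATIVE_ONLY = "negative-only"
    POSITIVE_AND_NEGATIVE = "positive-and-negative"


# Proposed default from the message above.
DEFAULT_KNOB = DisaggregationKnob.NEGATIVE_ONLY


def parse_knob(value: Optional[str]) -> DisaggregationKnob:
    """Map a configured string to a knob value; unset means the default."""
    if value is None:
        return DEFAULT_KNOB
    # Enum lookup by value raises ValueError for unknown strings.
    return DisaggregationKnob(value)
```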
>
> What does everyone think of that?
>
> Should we go even further and simply remove positive disaggregation from
> the spec?
>
> Should we have negative disaggregation as a capability in the capability
> negotiation? (Seems non-sensical, given it is something that the whole
> fabric needs to support or not.)
>
> Should we take this to the RIFT mailing list?
>
> — Bruno
>
>
>
> _______________________________________________
> RIFT mailing list
> RIFT@ietf.org
> https://www.ietf.org/mailman/listinfo/rift
>
>
>
>
>
>
>