Re: [Rift] Why negative disaggregation cannot replace positive disaggregation in all cases.
Tony Przygienda <tonysietf@gmail.com> Mon, 18 May 2020 15:47 UTC
From: Tony Przygienda <tonysietf@gmail.com>
Date: Mon, 18 May 2020 08:45:46 -0700
Message-ID: <CA+wi2hNuWKM-6fXTTDeUfcA2Y_8DQB7P9ffRF9B2BMjnkaaJsg@mail.gmail.com>
To: Bruno Rijsman <brunorijsman@hotmail.com>
Cc: "Pascal Thubert (pthubert)" <pthubert=40cisco.com@dmarc.ietf.org>, "rift@ietf.org" <rift@ietf.org>, Antoni Przygienda <prz@juniper.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/yVgeWf1c5xUYiLk3MXyffpvJBr8>
Subject: Re: [Rift] Why negative disaggregation cannot replace positive disaggregation in all cases.
chairs, if we start a monthly again I think we need interim framework to be
in line here ...

-- tony

On Mon, May 18, 2020 at 7:25 AM Bruno Rijsman <brunorijsman@hotmail.com> wrote:

> Hi Pascal,
>
> Yes, we can do a monthly.
>
> For now, here is the issue that I ran into that makes me believe that
> negative disaggregation cannot replace positive disaggregation in all cases
> (specifically not for the “normal” case of non-ToF routers).
>
> Here is a simplified summary of how positive disaggregation works (see
> section 4.2.5.1 for details):
>
> If this router R1 has a "direct route" for prefix P, and that route P has
> a set of next-hops {NH}, and there is some other router R2 at the same level
> as R1 that does not have NH as a south-adjacency for each NH in {NH}, then
> this router R1 must positively disaggregate prefix P: this router R1 must
> "attract" the traffic so that it doesn't go to the other router R2 (which
> cannot deliver the traffic to P).
>
> Now let's try to do the reverse thing with negative disaggregation:
>
> If there is some other router R2 at the same level as R1, and R2 has a
> "direct route" for prefix P, and that route P has some set of next-hops
> {NH}, and this router R1 does not have NH as a south-adjacency for each NH
> in {NH}, then this router R1 must negatively disaggregate prefix P: this
> router R1 must "repel" the traffic so that it goes to R2 instead of R1.
>
> The problem is that this requires router R1 to “know” what the set of
> routes is that router R2 has, and to know the next-hops for those routes.
> But R1 cannot know/calculate the "direct routes" that R2 has because it
> doesn't know which north-TIEs R2 has in its database. So, I don’t think we
> can do negative disaggregation here...
>
> Am I correct or am I missing something?
>
> — Bruno
>
> On May 18, 2020, at 12:28 AM, Antoni Przygienda <prz@juniper.net> wrote:
>
> Agreed.
> Let’s set up a monthly
>
> - Tony
>
> From: "Pascal Thubert (pthubert)" <pthubert=40cisco.com@dmarc.ietf.org>
> Date: Sunday, May 17, 2020 at 10:15 PM
> To: Bruno Rijsman <brunorijsman@hotmail.com>
> Cc: Antoni Przygienda <prz@juniper.net>, "rift@ietf.org" <rift@ietf.org>,
> Tony Przygienda <tonysietf@gmail.com>
> Subject: Re: [Rift] Negative disaggregation feature guide
>
> This is quite interesting, Bruno.
>
> Does that mean that there are issues running negative standalone? I’d
> be really interested if we have a monthly or something one of these days.
> Also there’s the section on disaggregation in the applicability draft that
> we may enrich.
>
> Many thanks for all your efforts!
>
> Pascal
>
> On May 18, 2020, at 01:02, Bruno Rijsman <brunorijsman@hotmail.com> wrote:
>
> Hi Tony,
>
> I ran into various complications and a road-block while working on the
> “disaggregation negative-only” knob in RIFT-python, so I will be ripping
> that knob out of my code.
>
> Instead I will stick to the three commandments as handed down on the stone
> tablet by the wise RIFT gods from above :-)
>
> On May 14, 2020, at 12:26 PM, Antoni Przygienda <
> prz=40juniper.net@dmarc.ietf.org> wrote:
>
> In a probably more detailed way with a little stone tablet I suggest 😉
>
>    1. You SHALL negatively disaggregate only if you’re ToF and have
>    horizontal links (ring)
>    2. You SHALL propagate transitively negative disaggregation
>    3. You SHALL use positive if you’re not ToF or have no horizontal
>
> Of course, any vendor, even an open-sourced one, can add any flavor of
> knobs that makes their dish unique even if it breaks the spec strictly
> speaking … So did I 😉
>
> --- tony
>
> From: Tony Przygienda <tonysietf@gmail.com>
> Date: Thursday, May 14, 2020 at 11:12 AM
> To: Bruno Rijsman <brunorijsman@hotmail.com>
> Cc: "Pascal Thubert (pthubert)" <pthubert@cisco.com>, "rift@ietf.org" <
> rift@ietf.org>, Melchior Aelmans <maelmans@juniper.net>, Christian Graf <
> cgraf@juniper.net>, Oliver Steudler <osteudler@juniper.net>, Olivier
> Vandezande <ovandezande@juniper.net>, Antoni Przygienda <prz@juniper.net>,
> "tommasocaiazzi@gmail.com" <tommasocaiazzi@gmail.com>, Jeff Tantsura <
> jefftant.ietf@gmail.com>, Zhaohui Zhang <zzhang@juniper.net>, Zhaohui
> Zhang <zzhang@juniper.net>, Jeffrey Zhang <zzhang2003@gmail.com>
> Subject: Re: Negative disaggregation feature guide
>
> my thoughts:
>
> negative is seriously more complex to implement and understand
> operationally and only needed on multi-plane fabrics, that's why positive
> is KISS ;-) Also, negative always forces you to ring the top of the fabric
> which in single plane design is an unnecessary requirement.
>
> I foresee vendors not implementing negative for simplicity, very small
> footprint fabrics.
>
> the discussion about capability advertisement is interesting, we can just
> add one bit on the node capabilities element, make it optional and default
> true.
>
> Of course an implementation that understands that all involved nodes
> understand negative is free to use negative instead of positive then.
>
> -- tony
>
> On Thu, May 14, 2020 at 11:05 AM Bruno Rijsman <brunorijsman@hotmail.com>
> wrote:
>
> Broadening a discussion on negative disaggregation to the RIFT mailing
> list.
>
> Now that we have implemented negative disaggregation in RIFT-Python, and
> now that we are getting some operational experience with it, the following
> is becoming more and more evident:
>
> It seems to us (Bruno and Pascal, for now) that once negative
> disaggregation is implemented, there is really no need for positive
> disaggregation any more.
>
> Every use case that can be solved by positive disaggregation can (as far
> as we can currently tell) also be solved by negative disaggregation.
>
> Furthermore, negative disaggregation solves the problems in a far simpler
> and more elegant way: there are fewer advertising nodes and fewer
> advertised TIEs involved, and we don’t have the synchronization issue that
> positive disaggregation has that potentially causes incast problems.
>
> Thus, at the very least, it makes sense to recommend that in a given
> fabric we use either negative disaggregation only or positive
> disaggregation only, with negative disaggregation being the default.
>
> If positive and negative disaggregation are enabled simultaneously in one
> and the same fabric, everything still works fine as far as we can see. But
> there are some “interesting” interactions that make things unnecessarily
> complex and potentially fragile.
>
> If support for negative disaggregation is mandatory, and if our assessment
> that negative disaggregation can solve all use cases is correct, then we
> could go one bold step further and completely remove positive
> disaggregation from the specification.
>
> Thoughts?
>
> — Bruno
>
> On May 14, 2020, at 10:48 AM, Pascal Thubert (pthubert) <
> pthubert@cisco.com> wrote:
>
> Hello Bruno:
>
> I like your negative-only default because it is probably the safest. Let
> me elaborate.
>
> Partitioning saves a lot more ports per ToF node than the ring costs, so
> it does not matter.
>
> If you enable negative, there’s no point in doing any positive at all, is
> there? So what do we try to save by combining?
> In your example, I guess
> that super 1 2 had to retract the positive disag and then do negative
> instead when the link to spine 1 1 fell. Was that not a bit complex and
> error prone?
>
> Also I’m not clear how we can always decide that dynamically and I do not
> believe that we specified that. So I understand that it’s more like a use
> case thingy, like a configuration that would be adapted to the use case.
>
> => do not configure both at the same time; use either positive or negative
> disag.
>
> We do not try to do positive disag transitively. So if there is a need for
> transitive, you have to use negative. In a multiplane case, there’s
> usually (as in your picture) one ToP per plane in each PoD, and a first
> link failure (say Spine 1 1 to leaf 1 1) can already cause a fallen leaf:
> leaf 1 1 is no longer reachable within plane 1. This tells you that all
> leaves in the other PoDs must avoid that plane, and that’s a leaf decision,
> which tells you that you need to recurse transitively down. This indicates
> that negative must be activated in a highly partitioned (meaning low
> redundancy) multiplane.
>
> Say we try to use positive disaggregation transitively in a ToF that is
> partitioned like in planes, you’d run a risk that none of the ToF nodes
> that can still reach the leaf can see (through south reflection) the ToF
> node that cannot, so they do not know they need to disag. So you end up
> needing the same ring as for negative, the southern reflection becomes
> mostly useless and the benefits of positive are gone.
>
> Partitioned ToF => negative
>
> I trust that we can use positive in a highly redundant and symmetrical
> non-partitioned ToF. If no one can reach the leaf anymore (like it’s dead)
> we have no issue since it should not be disaggregated, just use the
> discard route at any ToF node. If you have enough redundancy and no
> partition, there will always be nodes that can both reach a leaf and
> discover that a peer does not.
>
> => Positive is indicated for a very specific use case, ideally 2 levels
> and fully meshed. Maybe that’s enough deployments to justify the feature in
> the code.
>
> So the question is really: in that blurry zone between highly redundant
> single plane and highly partitioned multiplane, what should one do? It
> becomes a risk/benefit judgement… If negative is implemented and the ring
> is there, I’d use it, better safe than sorry.
>
> Take care
>
> Pascal
>
> From: Bruno Rijsman <brunorijsman@hotmail.com>
> Sent: Thursday, May 14, 2020 17:36
> To: Pascal Thubert (pthubert) <pthubert@cisco.com>
> Cc: Melchior Aelmans <maelmans@juniper.net>; Christian Graf <
> cgraf@juniper.net>; Oliver Steudler <osteudler@juniper.net>; Olivier
> Vandezande <ovandezande@juniper.net>; Tony Przygienda <tonysietf@gmail.com>;
> Antoni Przygienda <prz=40juniper.net@dmarc.ietf.org>;
> tommasocaiazzi@gmail.com; Jeff Tantsura <jefftant.ietf@gmail.com>;
> Jeffrey (Zhaohui) Zhang <zzhang@juniper.net>; Jeffrey (Zhaohui) Zhang <
> zzhang=40juniper.net@dmarc.ietf.org>;
> Jeffrey Zhang <zzhang2003@gmail.com>
> Subject: Re: Negative disaggregation feature guide
>
> On May 14, 2020, at 8:53 AM, Pascal Thubert (pthubert) <pthubert@cisco.com>
> wrote:
>
> This does not show because on your first breakage you use the positive
> disag. If you stick to the logic you used at the beginning, that is pick an
> example where positive applies but use negative, then on your first
> breakage you’d show that the negative does not need to go transitively to
> the leaf, because the spine nodes still have solutions. Is that a lot of
> work to change?
>
> This is exactly what I was thinking as well.
>
> Maybe I need to go through the current specification with a fine-toothed
> comb again, but my (possibly incorrect) understanding of the current
> version of the specification is that positive disaggregation is used
> (MUST be used? SHOULD be used?)
> in the “first failure” scenario.
>
> Personally, I see no reason why we would not be able to use negative
> disaggregation instead of positive disaggregation in the first failure
> scenario as well.
>
> In the context of the tutorial, this would have the advantage of
> demonstrating the propagation logic more clearly.
>
> But more importantly, it seems to me that once negative disaggregation is
> implemented, there is really no reason to use positive disaggregation
> anywhere. It would be “cleaner” (less signaling) and “better” (no incast)
> to simply use negative disaggregation everywhere.
>
> So… I was thinking to add a configuration knob to my code:
> “positive-only”, “negative-only”, “positive-and-negative”, with
> “negative-only” being the default? (Seems like a waste to have coded up the
> positive disaggregation).
>
> What does everyone think of that?
>
> Should we go even further and simply remove positive disaggregation from
> the spec?
>
> Should we have negative disaggregation as a capability in the capability
> negotiation? (Seems nonsensical, given it is something that the whole
> fabric needs to support or not.)
>
> Should we take this to the RIFT mailing list?
>
> — Bruno
>
> _______________________________________________
> RIFT mailing list
> RIFT@ietf.org
> https://www.ietf.org/mailman/listinfo/rift
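
A minimal sketch of the two rules as Bruno summarizes them in the quoted message, not the RIFT-Python implementation or the text of the specification. All function, parameter, node, and prefix names here are hypothetical. Routes are modeled as a mapping from prefix to a set of next-hop node names, and "does not have NH as a south-adjacency for each NH in {NH}" is read as "the next-hop set and the south-adjacency set are disjoint". The second function takes the peers' direct routes as a parameter only to make the objection visible: per the message, R1 has no way to obtain that input.

```python
from typing import Dict, Set

# Hypothetical model: prefix -> set of next-hop node names.
Routes = Dict[str, Set[str]]


def positive_disaggregation(direct_routes: Routes,
                            peer_south_adjacencies: Dict[str, Set[str]]) -> Set[str]:
    """Prefixes that this router (R1) must positively disaggregate.

    A prefix qualifies when some same-level peer has none of the route's
    next-hops among its south adjacencies, so that peer cannot deliver
    traffic to the prefix and R1 must "attract" it.
    """
    to_disaggregate: Set[str] = set()
    for prefix, next_hops in direct_routes.items():
        for south_adjacencies in peer_south_adjacencies.values():
            if next_hops and next_hops.isdisjoint(south_adjacencies):
                to_disaggregate.add(prefix)
                break
    return to_disaggregate


def negative_disaggregation(own_south_adjacencies: Set[str],
                            peer_direct_routes: Dict[str, Routes]) -> Set[str]:
    """Mirror-image rule: prefixes that this router (R1) would have to "repel".

    Assumption for illustration only: peer_direct_routes is handed in.
    The quoted message argues that R1 cannot compute it, because R1 does
    not know which north-TIEs the peer holds in its database.
    """
    to_repel: Set[str] = set()
    for routes in peer_direct_routes.values():
        for prefix, next_hops in routes.items():
            if next_hops and next_hops.isdisjoint(own_south_adjacencies):
                to_repel.add(prefix)
    return to_repel


# R1 reaches both leaves; peer R2 has lost its adjacency to leaf1.
r1_routes = {"10.0.1.0/24": {"leaf1"}, "10.0.2.0/24": {"leaf2"}}
attract = positive_disaggregation(r1_routes, {"R2": {"leaf2"}})

# Reverse view: R1 only has leaf2 below it, R2 has a route via leaf1.
repel = negative_disaggregation({"leaf2"}, {"R2": {"10.0.1.0/24": {"leaf1"}}})
print(sorted(attract), sorted(repel))
```

The positive check needs only R1's own routes plus the peers' south adjacencies, whereas the negative check needs the peers' routes themselves, which is the asymmetry the message points to.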