Re: [Lsr] Open issues with Dynamic Flooding

Robert Raszuk <robert@raszuk.net> Tue, 05 March 2019 21:06 UTC

References: <AAD29CF0-F0CA-4C3C-B73A-78CD2573C446@tony.li> <c1adac3a-cd4b-130e-d225-a5f40bf0ef55@cisco.com> <F3C4B9B2-F101-4E28-8928-9208D5EBAF99@tony.li> <be28dbcf-8382-329a-229f-5b146538fabe@cisco.com> <CA+wi2hPt-UrekyA9LpCWJHo9KyaOR1=eVQD29y54sciv3zh10A@mail.gmail.com> <CAOj+MMGPp=DffEw7vS4PH_vDtmYL5y2Xxgx2utNt4R6cxsCiwg@mail.gmail.com> <41bd7097-0d25-a2e0-843d-cb25fd13a84f@cisco.com>
In-Reply-To: <41bd7097-0d25-a2e0-843d-cb25fd13a84f@cisco.com>
From: Robert Raszuk <robert@raszuk.net>
Date: Tue, 05 Mar 2019 22:06:41 +0100
Message-ID: <CAOj+MMHmi-Ch43=YJ=LphPxiJmoHyg1fovqnT5iJxB6ASPfUXQ@mail.gmail.com>
To: Peter Psenak <ppsenak@cisco.com>
Cc: Tony Przygienda <tonysietf@gmail.com>, lsr@ietf.org, Tony Li <tony.li@tony.li>
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/ZKjvmQVqrEq9DdjxQa3IMlagtQQ>
Subject: Re: [Lsr] Open issues with Dynamic Flooding

Peter,

> you only have two paths to reach any node.

Who says that you must be limited to only two paths?

Why not create a flooding graph such that flooding happens over 4 paths, as
opposed to the 16 or 32 paths it would happen over today without optimization?

And if you are worried that you might lose all 4 *wisely selected* paths
before you manage to distribute a new flooding topology, you can always flood
over 6 :)
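
To make this concrete, here is a rough sketch in plain Python (a toy
leaf/spine fabric with invented names and sizes, not anything from the
draft): each leaf keeps only 4 of its 16 uplinks in the flooding topology,
and a simple BFS shows the reduced graph stays connected even after losing
3 of a leaf's 4 selected links.

# Rough sketch, toy numbers: a 16-spine / 16-leaf full mesh (256 links),
# from which every leaf keeps only 4 uplinks for flooding (64 links),
# chosen round-robin so every spine stays on the flooding graph.
from collections import deque

NUM_SPINES, NUM_LEAVES, K = 16, 16, 4

spines = [f"spine{i}" for i in range(NUM_SPINES)]
leaves = [f"leaf{i}" for i in range(NUM_LEAVES)]

# Real topology: full mesh between the two tiers.
full = {s: set(leaves) for s in spines}
full.update({l: set(spines) for l in leaves})

# Flooding topology: leaf i floods only towards K spines.
flood = {n: set() for n in full}
for i, leaf in enumerate(leaves):
    for j in range(K):
        spine = spines[(i + j) % NUM_SPINES]
        flood[leaf].add(spine)
        flood[spine].add(leaf)

def connected(graph, dead=frozenset()):
    """BFS over the graph, skipping links listed (in either direction) in `dead`."""
    start = next(iter(graph))
    seen, todo = {start}, deque([start])
    while todo:
        cur = todo.popleft()
        for nbr in graph[cur]:
            if nbr in seen or {(cur, nbr), (nbr, cur)} & dead:
                continue
            seen.add(nbr)
            todo.append(nbr)
    return len(seen) == len(graph)

print(sum(len(v) for v in full.values()) // 2)    # 256 links in the real topology
print(sum(len(v) for v in flood.values()) // 2)   # 64 links in the flooding topology
print(connected(flood))                           # True
# Losing 3 of leaf0's 4 selected uplinks still leaves the flooding graph whole:
print(connected(flood, dead={("leaf0", "spine0"),
                             ("leaf0", "spine1"),
                             ("leaf0", "spine2")}))   # True

The real selection obviously needs to be smarter than a round-robin toy, but
the point is simply that the amount of redundancy in the flooding graph is a
knob; it does not have to stop at 2.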

Best,
R.







On Tue, Mar 5, 2019 at 9:17 PM Peter Psenak <ppsenak@cisco.com> wrote:

> Robert,
>
> On 05/03/2019 20:12, Robert Raszuk wrote:
> >
> >> Slow convergence is obviously not a good thing
> >
> > Could you please kindly elaborate why?
> >
> > With tons of ECMP in DCs, or with a number of mechanisms for very fast data
> > plane repairs in the WAN (well beyond FRR), IMHO *fast convergence* of any
> > protocol is no longer a necessity. Yet many folks still talk about
> > it as if it were the only possible rescue ...
>
> we are talking about the control plane convergence, not the data plane one.
> If the flooding topology is a subset of the real topology, then at the
> flooding level you don't have all the ECMPs available - you only have
> two paths to reach any node. In such a case it is possible that the
> flooding topology gets partitioned, and you want to get out of that state
> quickly, as you may get out of sync with reality and eventually
> lose all the data plane ECMPs as a consequence.
>
> thanks,
> Peter
>
> >
> >
> > On Tue, Mar 5, 2019 at 5:42 PM Tony Przygienda <tonysietf@gmail.com> wrote:
> >
> >     in practical terms +1 to Peter's take here ... Unless we're talking
> >     tons of failures simultaneously (which AFAI talked to folks are not
> >     that common but can sometimes happen in DCs BTW due to weird things)
> >     smaller scale failures with few links would cause potentially
> >     diffused "chaining" of convergence behavior rather than IGP-style
> >     fast healing (and on top of that I didn't see a lot of interest in
> >     formalizing a rigorous distributed algorithm which IMO would be
> >     necessary to ensure ultimate convergence when only one/subset of
> >     links is used). Slow convergence is obviously not a good thing
> >     unless we assume people will run FRR with its complexity in DC
> >     and/or no more than one link ever fails, which seems to me bending
> >     assumptions to whatever solution is available/preferred. To Tony's
> >     point though, on large scale failures enabling all links would cause
> >     heavy flood load, yes, but in a sense it's the "initial bootup" case
> >     anyway (especially in centralized case) since nodes need all
> >     topology to make informed correct decisions about what the FT should
> >     be if they don't rely on whatever the centralized instance thinks
> >     (which they won't be able to do given the FT from centralized
> >     instance will indicate lots of links that are "gone" due to failure).
> >     As to p2p, I suggest agreeing on whether you target the dense mesh (DC)
> >     case, the sparse mesh (WAN) case, or "every topology imaginable", since
> >     that drives lots of design trade-offs.
> >
> >     my 2.71828182 cents ;-)
> >
> >     --- tony
> >
> >     On Tue, Mar 5, 2019 at 8:27 AM Peter Psenak <ppsenak@cisco.com> wrote:
> >
> >         Hi Tony,
> >
> >         On 05/03/2019 17:16, tony.li@tony.li wrote:
> >         >
> >         > Peter,
> >         >
> >         >>>    (a) Temporarily add all of the links that would appear to
> >         remedy the partition. This has the advantage that it is very
> >         likely to heal the partition and will do so in the minimal
> >         amount of convergence time.
> >         >>
> >         >> I prefer (a) because of the faster convergence.
> >         >> Adding all links on a single node to the flooding topology is
> >         not going to cause issues for flooding IMHO.
> >         >
> >         >
> >         > Could you (or John) please explain your rationale behind that?
> >         It seems counter-intuitive.
> >
> >         it's limited to the links on a single node. For all practical
> >         purposes I don't expect a single node to have thousands of
> >         adjacencies, at least not in the DC topologies for which dynamic
> >         flooding is primarily being invented.
> >
> >         In environments with a large number of adjacencies (e.g.
> >         hub-and-spoke) it is likely that we would have to make all these
> >         links part of the flooding topology anyway, because the spoke is
> >         typically dual-attached to two hubs only. And the incremental
> >         adjacency bringup is something that an implementation may already
> >         support.
> >
> >         >
> >         >
> >         >
> >         >> given that the flooding on the LAN in both OSPF and ISIS is
> >         done as multicast, there is currently no way to enable flooding,
> >         either permanently or temporarily, towards a subset of the neighbors
> >         on the LAN. So if the flooding is enabled on a LAN it is done
> >         towards all routers connected to it.
> >         >
> >         >
> >         > Agreed.
> >         >
> >         >
> >         >> Given that all links between routers are p2p these days, I
> >         would vote for simplicity and make the LAN always part of the FT.
> >         >
> >         >
> >         > I’m not on board with this yet.  Our simulations suggest that
> >         this is not necessarily optimal.  There are lots of topologies
> >         (e.g., parallel LANs) where this blanket approach is suboptimal.
> >
> >         the question is how much true LANs are used as transit links in
> >         today's networks.
> >
> >         thanks,
> >         Peter
> >
> >         >
> >         > Tony
> >         >
> >         > .
> >         >
> >
> >         _______________________________________________
> >         Lsr mailing list
> >         Lsr@ietf.org
> >         https://www.ietf.org/mailman/listinfo/lsr
> >
> >     _______________________________________________
> >     Lsr mailing list
> >     Lsr@ietf.org
> >     https://www.ietf.org/mailman/listinfo/lsr
> >
>
>
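
For reference, a minimal sketch of the option (a) behaviour discussed in the
quoted thread (plain Python over a toy adjacency map; the node names, helper
functions and repair policy are all invented for illustration, this is not the
procedure from the draft): detect that the flooding topology has partitioned
and temporarily put all local links of one suitably placed node back into it.

def components(graph):
    """Return the connected components of an adjacency-set graph."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(graph[cur] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def heal_partition(flood, full):
    """Option (a), roughly: pick a node that, in the real topology, can see a
    node outside its own flooding fragment and temporarily add *all* of that
    node's local links to the flooding topology."""
    comps = components(flood)
    if len(comps) == 1:
        return flood, None                      # nothing to heal
    mine = comps[0]
    for node in mine:
        if full[node] - flood[node] - mine:     # real neighbour in another fragment
            healed = {n: set(nbrs) for n, nbrs in flood.items()}
            for nbr in full[node]:              # temporarily enable every local link
                healed[node].add(nbr)
                healed[nbr].add(node)
            return healed, node
    return flood, None                          # this fragment cannot reach the rest

# Toy check: a 4-node ring whose flooding topology lost two opposite links.
full  = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
flood = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}}   # two fragments
healed, via = heal_partition(flood, full)
print(via, len(components(healed)))    # one of a/b, and 1 fragment afterwards

The appeal of (a) in the thread is exactly that this local, temporary step
reconnects the flooding graph without waiting for a new flooding topology to
be computed and distributed; if the break leaves more than two fragments the
step may simply have to be repeated.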