Re: [Lsr] Open issues with Dynamic Flooding

Peter Psenak <ppsenak@cisco.com> Tue, 05 March 2019 20:18 UTC

To: Robert Raszuk <robert@raszuk.net>, Tony Przygienda <tonysietf@gmail.com>
References: <AAD29CF0-F0CA-4C3C-B73A-78CD2573C446@tony.li> <c1adac3a-cd4b-130e-d225-a5f40bf0ef55@cisco.com> <F3C4B9B2-F101-4E28-8928-9208D5EBAF99@tony.li> <be28dbcf-8382-329a-229f-5b146538fabe@cisco.com> <CA+wi2hPt-UrekyA9LpCWJHo9KyaOR1=eVQD29y54sciv3zh10A@mail.gmail.com> <CAOj+MMGPp=DffEw7vS4PH_vDtmYL5y2Xxgx2utNt4R6cxsCiwg@mail.gmail.com>
Cc: lsr@ietf.org, Tony Li <tony.li@tony.li>
From: Peter Psenak <ppsenak@cisco.com>
Message-ID: <41bd7097-0d25-a2e0-843d-cb25fd13a84f@cisco.com>
Date: Tue, 05 Mar 2019 21:17:55 +0100
In-Reply-To: <CAOj+MMGPp=DffEw7vS4PH_vDtmYL5y2Xxgx2utNt4R6cxsCiwg@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/JNX4gzgljRPIVIj-l_4-jVkxDls>
Subject: Re: [Lsr] Open issues with Dynamic Flooding

Robert,

On 05/03/2019 20:12 , Robert Raszuk wrote:
>
>> Slow convergence is obviously not a good thing
>
> Could you please kindly elaborate why ?
>
> With tons of ECMP in DCs or with number of mechanism for very fast data
> plane repairs in WAN (well beyond FRR) IMHO any protocol *fast
> convergence* is no longer a necessity. Yet many folks still talk about
> it like the only possible rescue ...

We are talking about control plane convergence, not data plane convergence.
If the flooding topology is a subset of the real topology, then at the
flooding level you don't have all the ECMPs available - you only have
two paths to reach any node. In such a case it is possible that the
flooding topology gets partitioned, and you want to get out of that state
quickly, as you may get out of sync with reality and eventually
lose all the data plane ECMPs as a consequence.
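
To illustrate the point, here is a minimal sketch in Python. The 3-spine/3-leaf fabric, the node names, and the ring-shaped flooding topology are assumptions made up for this example; they are not taken from the draft. The ring gives every node exactly two flooding paths, so two link failures are enough to partition the flooding topology even though the real topology stays connected.

```python
from collections import deque


def reachable(nodes, links, start):
    """Return the set of nodes reachable from start over undirected links."""
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_partitioned(nodes, links):
    """True if at least one node cannot be reached from the first node."""
    return len(reachable(nodes, links, nodes[0])) != len(nodes)


# Hypothetical fabric: every leaf is connected to every spine (9 links).
spines = ["S1", "S2", "S3"]
leaves = ["L1", "L2", "L3"]
nodes = spines + leaves
real_links = {(s, l) for s in spines for l in leaves}

# Assumed flooding topology: a ring S1-L1-S2-L2-S3-L3-S1 (6 of the 9 links).
flooding_links = {
    ("S1", "L1"), ("S2", "L1"), ("S2", "L2"),
    ("S3", "L2"), ("S3", "L3"), ("S1", "L3"),
}

# Two links that happen to be on the flooding topology fail.
failed = {("S1", "L1"), ("S2", "L2")}
real_after = real_links - failed
flooding_after = flooding_links - failed

print("flooding partitioned before failure:", is_partitioned(nodes, flooding_links))
print("flooding partitioned after failure: ", is_partitioned(nodes, flooding_after))
print("real topology partitioned after:    ", is_partitioned(nodes, real_after))
```

In this sketch the nodes L1 and S2 are cut off from the rest of the flooding topology after the two failures, while the real topology still connects all six nodes. Until the flooding topology is repaired, those two nodes would not receive updates flooded by the others.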

thanks,
Peter

>
>
> On Tue, Mar 5, 2019 at 5:42 PM Tony Przygienda <tonysietf@gmail.com
> <mailto:tonysietf@gmail.com>> wrote:
>
>     in practical terms +1 to Peter's take here ... Unless we're talking
>     tons of failures simultaneously (which AFAI talked to folks are not
>     that common but can sometimes happen in DCs BTW due to weird things)
>     smaller scale failures with few links would cause potentially
>     diffused "chaining" of convergence behavior rather than IGP-style
>     fast healing (and on top of that I didn't see a lot of interest in
>     formalizing a rigorous distributed algorithm which IMO would be
>     necessary to ensure ultimate convergence when only one/subset of
>     links is used). Slow convergence is obviously not a good thing
>     unless we assume people will run FRR with its complexity in DC
>     and/or no more than one link ever fails which seems to me bending
>     assumptions to whatever solution is available/preferred. To Tony's
>     point though, on large scale failures enabling all links would cause
>     heavy flood load, yes, but in a sense it's the "initial bootup" case
>     anyway (especially in centralized case) since nodes need all
>     topology to make informed correct decisions about what the FT should
>     be if they don't rely on whatever the centralized instance thinks
>     (which they won't be able to do given the FT from centralized
>     instance will indicate lots links that are "gone" due to failure).
>     As to p2p, I suggest to agree whether you use dense mesh (DC) case
>     or sparse mesh (WAN) case or "every topology imaginable" since that
>     drives lots design trade-offs.
>
>     my 2.71828182 cents ;-)
>
>     --- tony
>
>     On Tue, Mar 5, 2019 at 8:27 AM Peter Psenak <ppsenak@cisco.com
>     <mailto:ppsenak@cisco.com>> wrote:
>
>         Hi Tony,
>
>         On 05/03/2019 17:16 , tony.li@tony.li <mailto:tony.li@tony.li>
>         wrote:
>         >
>         > Peter,
>         >
>         >>>    (a) Temporarily add all of the links that would appear to
>         >>>    remedy the partition. This has the advantage that it is very
>         >>>    likely to heal the partition and will do so in the minimal
>         >>>    amount of convergence time.
>         >>
>         >> I prefer (a) because of the faster convergence.
>         >> Adding all links on a single node to the flooding topology is
>         >> not going to cause issues to flooding IMHO.
>         >
>         >
>         > Could you (or John) please explain your rationale behind that?
>         > It seems counter-intuitive.
>
>         it's limited to the links on a single node. For all practical
>         purposes I don't expect a single node to have thousands of
>         adjacencies, at least not in the DC topologies for which dynamic
>         flooding is primarily being invented.
>
>         In environments with a large number of adjacencies (e.g.
>         hub-and-spoke) it is likely that we would have to make all these
>         links part of the flooding topology anyway, because the spoke is
>         typically dual attached to two hubs only. And the incremental
>         adjacency bringup is something that an implementation may
>         already support.
>
>         >
>         >
>         >
>         >> given that the flooding on the LAN in both OSPF and ISIS is
>         >> done as multicast, there is currently no way to enable flooding,
>         >> either permanent or temporary, towards a subset of the neighbors
>         >> on the LAN. So if the flooding is enabled on a LAN it is done
>         >> towards all routers connected to it.
>         >
>         >
>         > Agreed.
>         >
>         >
>         >> Given that all links between routers are p2p these days, I
>         >> would vote for simplicity and make the LAN always part of the FT.
>         >
>         >
>         > I’m not on board with this yet.  Our simulations suggest that
>         > this is not necessarily optimal.  There are lots of topologies
>         > (e.g., parallel LANs) where this blanket approach is suboptimal.
>
>         the question is how much true LANs are used as transit links in
>         today's networks.
>
>         thanks,
>         Peter
>
>         >
>         > Tony
>         >
>         > .
>         >
>
>         _______________________________________________
>         Lsr mailing list
>         Lsr@ietf.org <mailto:Lsr@ietf.org>
>         https://www.ietf.org/mailman/listinfo/lsr
>
>