Re: [Lsr] Open issues with Dynamic Flooding

Peter Psenak <ppsenak@cisco.com> Tue, 05 March 2019 21:10 UTC

To: Robert Raszuk <robert@raszuk.net>
References: <AAD29CF0-F0CA-4C3C-B73A-78CD2573C446@tony.li> <c1adac3a-cd4b-130e-d225-a5f40bf0ef55@cisco.com> <F3C4B9B2-F101-4E28-8928-9208D5EBAF99@tony.li> <be28dbcf-8382-329a-229f-5b146538fabe@cisco.com> <CA+wi2hPt-UrekyA9LpCWJHo9KyaOR1=eVQD29y54sciv3zh10A@mail.gmail.com> <CAOj+MMGPp=DffEw7vS4PH_vDtmYL5y2Xxgx2utNt4R6cxsCiwg@mail.gmail.com> <41bd7097-0d25-a2e0-843d-cb25fd13a84f@cisco.com> <CAOj+MMHmi-Ch43=YJ=LphPxiJmoHyg1fovqnT5iJxB6ASPfUXQ@mail.gmail.com>
Cc: Tony Przygienda <tonysietf@gmail.com>, lsr@ietf.org, Tony Li <tony.li@tony.li>
From: Peter Psenak <ppsenak@cisco.com>
Message-ID: <f1888bd8-e54e-2a3b-57a4-a05c632c6c7e@cisco.com>
Date: Tue, 05 Mar 2019 22:10:13 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/xaDETfzrcuDYz54DaYfHVHzIGSs>
Subject: Re: [Lsr] Open issues with Dynamic Flooding

Robert,

On 05/03/2019 22:06 , Robert Raszuk wrote:
> Peter,
>
>> you only have two paths to reach any node.
>
> Who says that you must be limited to two paths only ?
>
> Why not create a flooding graph such that flooding happens over 4
> paths, as opposed to flooding over 16 or 32 today without optimization?
>
> And if you are worried that you will lose all 4 *wisely selected* paths
> before you manage to distribute the new flooding topology, you can always
> flood over 6 :)

We want to limit flooding to the minimum, which is 2.
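
To illustrate the trade-off, here is a minimal sketch that is not taken from the draft. It assumes a hypothetical 3-spine / 3-leaf fabric whose flooding topology is a single cycle, so every node has exactly two flooding paths to any other node. The node names and the helpers `link`, `reachable` and `is_partitioned` are made up for this example. It checks that any single flooding-link failure is survivable, that one particular pair of failures partitions the flooding topology, and that temporarily enabling flooding on all remaining links of one node (option (a) quoted below) heals it.

```python
from collections import deque


def link(a, b):
    """An undirected link, represented as an unordered pair of node names."""
    return frozenset((a, b))


def reachable(nodes, links, start):
    """Return the set of nodes reachable from start over the given links (BFS)."""
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        n = queue.popleft()
        for m in adj[n] - seen:
            seen.add(m)
            queue.append(m)
    return seen


def is_partitioned(nodes, links):
    """True if the links do not connect all nodes."""
    return len(reachable(nodes, links, nodes[0])) != len(nodes)


# Hypothetical fabric (an assumption for illustration): 3 spines, 3 leaves,
# every spine connected to every leaf, i.e. 9 physical links.
spines = ["s1", "s2", "s3"]
leaves = ["l1", "l2", "l3"]
nodes = spines + leaves
physical = {link(s, l) for s in spines for l in leaves}

# Assumed flooding topology: one cycle through all nodes (6 of the 9 links),
# which gives each node two flooding paths to every other node.
cycle = ["s1", "l1", "s2", "l2", "s3", "l3"]
flooding = {link(cycle[i], cycle[(i + 1) % len(cycle)]) for i in range(len(cycle))}

# Any single flooding-link failure leaves the flooding topology connected.
single_ok = all(not is_partitioned(nodes, flooding - {f}) for f in flooding)

# Two simultaneous failures on the cycle split it into two pieces.
failed = {link("s1", "l1"), link("s2", "l2")}
surviving = flooding - failed

# Option (a): temporarily flood on all links of one node (here s2) that are still up.
temporary = {l for l in physical - failed if "s2" in l}
healed = surviving | temporary

print("survives any single failure:", single_ok)
print("partitioned after two failures:", is_partitioned(nodes, surviving))
print("partitioned after temporary links on s2:", is_partitioned(nodes, healed))
```

A flooding topology built with more than two paths per node, as Robert suggests, would tolerate more simultaneous failures at the cost of more redundant flooding; the sketch only shows the behaviour at the minimum.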

thanks,
Peter

>
> Best,
> R.
>
> On Tue, Mar 5, 2019 at 9:17 PM Peter Psenak <ppsenak@cisco.com> wrote:
>
>     Robert,
>
>     On 05/03/2019 20:12 , Robert Raszuk wrote:
>     >
>     >> Slow convergence is obviously not a good thing
>     >
>     > Could you please kindly elaborate why ?
>     >
>     > With tons of ECMP in DCs, or with a number of mechanisms for very fast
>     > data plane repair in the WAN (well beyond FRR), IMHO any protocol *fast
>     > convergence* is no longer a necessity. Yet many folks still talk about
>     > it as the only possible rescue ...
>
>     We are talking about control plane convergence, not data plane convergence.
>     If the flooding topology is a subset of the real topology, then at the
>     flooding level you don't have all the ECMPs available - you only have
>     two paths to reach any node. In such a case it is possible that the
>     flooding topology gets partitioned, and you want to get out of that
>     state quickly, as you may get out of sync with reality and eventually
>     lose all the data plane ECMPs as a consequence.
>
>     thanks,
>     Peter
>
>     >
>     >
>     > On Tue, Mar 5, 2019 at 5:42 PM Tony Przygienda <tonysietf@gmail.com> wrote:
>     >
>     >     in practical terms +1 to Peter's take here ... Unless we're talking
>     >     tons of failures simultaneously (which, as far as I've talked to
>     >     folks, are not that common but can sometimes happen in DCs BTW due
>     >     to weird things), smaller scale failures with few links would cause
>     >     potentially diffused "chaining" of convergence behavior rather than
>     >     IGP-style fast healing (and on top of that I didn't see a lot of
>     >     interest in formalizing a rigorous distributed algorithm, which IMO
>     >     would be necessary to ensure ultimate convergence when only one
>     >     link or a subset of links is used). Slow convergence is obviously
>     >     not a good thing unless we assume people will run FRR with its
>     >     complexity in DC and/or that no more than one link ever fails, which
>     >     seems to me bending assumptions to whatever solution is
>     >     available/preferred. To Tony's point though, on large scale failures
>     >     enabling all links would cause heavy flood load, yes, but in a sense
>     >     it's the "initial bootup" case anyway (especially in the centralized
>     >     case) since nodes need all topology to make informed correct
>     >     decisions about what the FT should be if they don't rely on whatever
>     >     the centralized instance thinks (which they won't be able to do
>     >     given the FT from the centralized instance will indicate lots of
>     >     links that are "gone" due to failure). As to p2p, I suggest to agree
>     >     whether you use dense mesh (DC) case or sparse mesh (WAN) case or
>     >     "every topology imaginable" since that drives lots of design
>     >     trade-offs.
>     >
>     >     my 2.71828182 cents ;-)
>     >
>     >     --- tony
>     >
>     >     On Tue, Mar 5, 2019 at 8:27 AM Peter Psenak <ppsenak@cisco.com> wrote:
>     >
>     >         Hi Tony,
>     >
>     >         On 05/03/2019 17:16 , tony.li@tony.li wrote:
>     >         >
>     >         > Peter,
>     >         >
>     >         >>>    (a) Temporarily add all of the links that would appear to
>     >         >>>    remedy the partition. This has the advantage that it is very
>     >         >>>    likely to heal the partition and will do so in the minimal
>     >         >>>    amount of convergence time.
>     >         >>
>     >         >> I prefer (a) because of the faster convergence.
>     >         >> Adding all links on a single node to the flooding topology is
>     >         >> not going to cause issues to flooding IMHO.
>     >         >
>     >         >
>     >         > Could you (or John) please explain your rationale behind that?
>     >         > It seems counter-intuitive.
>     >
>     >         It's limited to the links on a single node. For all practical
>     >         purposes I don't expect a single node to have thousands of
>     >         adjacencies, at least not in the DC topologies for which dynamic
>     >         flooding is primarily being invented.
>     >
>     >         In environments with a large number of adjacencies (e.g.
>     >         hub-and-spoke) it is likely that we would have to make all these
>     >         links part of the flooding topology anyway, because a spoke is
>     >         typically dual attached to two hubs only. And incremental adjacency
>     >         bringup is something that an implementation may already support.
>     >
>     >         >
>     >         >
>     >         >
>     >         >> Given that flooding on a LAN in both OSPF and ISIS is
>     >         >> done as multicast, there is currently no way to enable flooding,
>     >         >> either permanent or temporary, towards a subset of the neighbors
>     >         >> on the LAN. So if flooding is enabled on a LAN, it is done
>     >         >> towards all routers connected to it.
>     >         >
>     >         >
>     >         > Agreed.
>     >         >
>     >         >
>     >         >> Given that all links between routers are p2p these days, I
>     >         >> would vote for simplicity and make the LAN always part of the FT.
>     >         >
>     >         >
>     >         > I’m not on board with this yet.  Our simulations suggest that
>     >         > this is not necessarily optimal.  There are lots of topologies
>     >         > (e.g., parallel LANs) where this blanket approach is suboptimal.
>     >
>     >         The question is how much true LANs are used as transit links in
>     >         today's networks.
>     >
>     >         thanks,
>     >         Peter
>     >
>     >         >
>     >         > Tony
>     >         >
>     >         > .
>     >         >
>     >
>     >         _______________________________________________
>     >         Lsr mailing list
>     >         Lsr@ietf.org
>     >         https://www.ietf.org/mailman/listinfo/lsr
>     >
>