Re: [Lsr] LSR Flooding Reduction Drafts - Moving Forward

tony.li@tony.li Fri, 24 August 2018 16:29 UTC

From: tony.li@tony.li
In-Reply-To: <CA+wi2hNv8AVyR81LRmJ=Pd5_p5rS2djCOjY9YDgKxG=KEO_MkA@mail.gmail.com>
Date: Fri, 24 Aug 2018 09:29:02 -0700
Cc: "Acee Lindem (acee)" <acee=40cisco.com@dmarc.ietf.org>, Peter Psenak <ppsenak@cisco.com>, lsr@ietf.org, Jeff Tantsura <jefftant.ietf@gmail.com>
Content-Transfer-Encoding: quoted-printable
Message-Id: <39509D13-4D2D-49A9-8738-C9D1F7C54223@tony.li>
References: <8F5D2891-2DD1-4E51-9617-C30FF716E9FB@cisco.com> <C64E476F-1C00-435E-9C74-BEC3053377E8@gmail.com> <2F5FDB3F-ADCA-4DB4-83DA-D2BC3129D2F2@gmail.com> <5B7E78DD.90302@cisco.com> <172728E8-49E6-4F43-9356-815E1F4C22E7@gmail.com> <5B7FCAB3.6040600@cisco.com> <3D1DEC37-ACE7-4412-BB2E-4C441A4E7455@tony.li> <CCF220A3-8308-47B8-8CC6-1989705FF05C@cisco.com> <CA+wi2hNv8AVyR81LRmJ=Pd5_p5rS2djCOjY9YDgKxG=KEO_MkA@mail.gmail.com>
To: Tony Przygienda <tonysietf@gmail.com>
X-Mailer: Apple Mail (2.3445.9.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/wIyVk8-Fh9qklDl9Cy7XpbrJLKU>
Subject: Re: [Lsr] LSR Flooding Reduction Drafts - Moving Forward

> a) we are talking any kind of topology for the solution, i.e. generic graph? 


Well, the problem with a topology restriction is that mistakes happen.  If we have a solution for a pure bipartite graph (i.e., a leaf-spine topology) and someone mistakenly inserts a leaf to leaf link, what happens?  Having the entire DC implode would be a Bad Thing, IMHO.
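
To make the leaf-to-leaf mistake concrete, here is a minimal sketch that checks whether a topology is still bipartite by two-coloring it with a breadth-first search. The node names and the `is_bipartite` helper are hypothetical and only illustrate the point; they are not taken from any of the drafts.

```python
from collections import deque


def is_bipartite(links):
    """Return True if the undirected graph given by `links` can be two-colored."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    side = {}  # node -> 0 or 1 (e.g. "leaf side" / "spine side")
    for start in adj:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in side:
                    side[nbr] = 1 - side[node]
                    queue.append(nbr)
                elif side[nbr] == side[node]:
                    # Two neighbors on the same side: an odd cycle exists.
                    return False
    return True


# Hypothetical two-leaf, two-spine fabric.
clean = [("leaf1", "spine1"), ("leaf1", "spine2"),
         ("leaf2", "spine1"), ("leaf2", "spine2")]
# The same fabric with one mistaken leaf-to-leaf link.
miswired = clean + [("leaf1", "leaf2")]

print(is_bipartite(clean))     # True
print(is_bipartite(miswired))  # False
```

A single extra link is enough to violate the assumption, so a solution that depends on it has to define what happens in that case.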


> and then suggestion for IME realistic, operational MUST requirements 
> 
> b) req a): the solution should support distributed and centralized algorithm to compute/signal reduced mesh(es). I personally think distributed is the more practical choice for something like this but it's my 2c from having lived the telephony controller fashion, the distributed fashion and the controller fashion now again ;-)


Well, I did think long and hard about this.

Being distributed would be very nice.  However, that requires all nodes to arrive at exactly the same solution, which in turn implies that they all must execute the same algorithm, presumably with the same inputs.

That’s all well and good, but we don’t have an algorithm to really put on the table yet.  We need experience with one.  We know we will want to tweak things based on biconnectivity, performance, and degree, because getting it right on day one seems unlikely.  Changing algorithms is going to be VERY painful if it’s distributed.

However, if it’s centralized, changing the algorithm is completely trivial.

So, my strong preference is to start centralized.  Iterate on the algorithm until we have it where we want it.  And then take it distributed if there’s a point to it.  However, at that point, we have something working.  So why fix it?


> c) req b): the solution should include redundancy (i.e. @ least 2 maximally disjoint vertex covers/lifts) to deal with single link failure (unless the link is unavoidably a minimal cut on the graph) 


Not everyone agrees with this, but some do.  This seems like one possible input to the algorithm.


> d) req c): the solution should guarantee disruption free flooding in case of 
>   i) single link failure
>  ii) single node failure
>  iii) change in one of the vertex lifts 


Sorry, I don’t understand point iii).


> e) the solution should not lead to "hot-spot" or "minimal-cut" links which will disrupt flooding between two partitions on failure or lead to flood throughput bottlenecks 


Agreed.
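
As a rough illustration of what a "minimal-cut" link means for a flooding topology, the sketch below lists every link whose removal on its own would partition the topology. It is a brute-force check for small examples, not a proposed algorithm; `is_connected`, `cut_links`, and the node names are hypothetical.

```python
def is_connected(nodes, links):
    """Return True if every node in `nodes` is reachable over `links`."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return seen == nodes


def cut_links(nodes, links):
    """Return the links whose individual failure partitions the topology."""
    return [
        link
        for i, link in enumerate(links)
        if not is_connected(nodes, links[:i] + links[i + 1:])
    ]


nodes = ["a", "b", "c", "d"]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
chain = ring[:3]  # drop one link: the flooding topology becomes a tree

print(cut_links(nodes, ring))   # []
print(cut_links(nodes, chain))  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

In the ring no single link failure disrupts flooding, while in the chain every link is a cut.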

> I am agnostic to Tony L's thinking about diameter and so on. It makes sense but is not necessarily easy to pull into the solution. 


It all boils down to the point that Peter just made about performance.  A topology with a high diameter is going to require many flooding hops and hurt performance.  To be avoided...
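
To put a number on this, here is a minimal sketch that computes the diameter of a flooding topology in hops, using a breadth-first search from every node. The `diameter` helper and the six-node examples are assumptions made purely for illustration.

```python
from collections import deque


def diameter(links):
    """Return the largest shortest-path hop count between any two nodes."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        if len(dist) != len(adj):
            raise ValueError("flooding topology is not connected")
        worst = max(worst, max(dist.values()))
    return worst


# Six nodes flooding along a chain versus along a ring.
chain = [(i, i + 1) for i in range(5)]
ring = [(i, (i + 1) % 6) for i in range(6)]

print(diameter(chain))  # 5
print(diameter(ring))   # 3
```

With the same six nodes, closing the chain into a ring cuts the worst-case number of flooding hops from 5 to 3.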


> moreover, I observe that IME ISIS is much more robust under such optimizations since the CSNPs catch (@ a somehow ugly delay cost) any corner cases whereas OSPF after IDBE will happily stay out of sync forever if flooding skips something (that may actually become a reason to introduce periodic stuff on OSPF to do CSNP equivalent albeit it won't be trivial in backwards compatible way on my first thought, I was thinking a bit about cut/snapshot consistency check [in practical terms OSPF hellos could carry a checksum on the DB headers] but we never have that luxury on a link-state routing protocol [i.e. we always operate under unbounded epsilon consistency ;-) and in case of BGP stable oscillations BTW not even that ;^} ]). 

Emacs

Tony