Re: [Lsr] Thoughts on the area proxy and flood reflector drafts.

Christian Hopps <chopps@chopps.org> Wed, 10 June 2020 11:27 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/eU4yjcoiySR6vo0dG7j81M6n6oI>


> On Jun 9, 2020, at 10:01 PM, Tony Przygienda <tonysietf@gmail.com> wrote:
> 
> Chris (addressing in WG member context you declared), I reply tersely since we will put more work into the draft once it's adopted (for which I think you saw a good amount of support in two threads already). 
> 
> I deferred from your email since the chain-Terastream topology you're showing is simply not what we are dealing with in any operational, successful networks today AFAIK frankly and I saw lots of "assume complexity" and "dislikes" in your email which I didn't read as technical arguments but mental attitudes. Likes or dislikes and assumptions are fine but we should probably focus on existing network/customer technical + operational arguments & requirements when building solutions and not what you or we like first.

Both Area Proxy and Flood Reflector are proposing to use L1 areas as transit to connect L2, isn't that chaining? It seemed like a decent way to help visualize the proposals along with some numbers, perhaps you have something better...

The Area Proxy draft is making everything L2 and using the L1 areas to redefine the advertised topology information to allow it to scale. Because everything is ultimately L2 nothing changes in the data plane to provide this transit.
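
As a rough check on the scaling claim, the LSP counts from the earlier mail quoted below can be reproduced with a few lines of arithmetic. This is only a sketch under that mail's assumptions (100 routers per area, 2 or 10 L2 edge routers per area, one proxy LSP per area); the function and parameter names are hypothetical and come from neither draft.

```python
# Illustrative LSP-count arithmetic for the designs in the quoted mail.
# All names are hypothetical; the per-area numbers are the quoted mail's assumptions.

def edge_only_l2_lsps(areas: int, edge_per_area: int) -> int:
    """L2 LSPs when only the edge routers of each area run L2
    (the "Telco" design with 2 edges, the "Natural Design" with 10)."""
    return areas * edge_per_area


def all_in_l2_lsps(areas: int, routers_per_area: int = 100) -> int:
    """L2 LSPs when every router runs L2 so that everything can be transit."""
    return areas * routers_per_area


def area_proxy_l2_lsps(areas: int, routers_per_area: int = 100,
                       edge_per_area: int = 10) -> int:
    """L2 LSPs held by a router inside a proxied area, following the quoted count:
    inner L2 LSPs of its own area + outside LSPs + one proxy LSP per area."""
    inner = routers_per_area
    outside = areas * edge_per_area
    proxy = areas
    return inner + outside + proxy


if __name__ == "__main__":
    print("Telco, 8 areas:         ", edge_only_l2_lsps(8, 2))
    print("Natural Design, 8 areas:", edge_only_l2_lsps(8, 10))
    print("All In, 8 areas:        ", all_in_l2_lsps(8))
    print("Area Proxy, 8 areas:    ", area_proxy_l2_lsps(8))
    print("Area Proxy, 16 areas:   ", area_proxy_l2_lsps(16))
```

Under these assumptions only the outside and proxy terms grow with the number of areas, which is why doubling from 8 to 16 areas lands at 276 L2 LSPs rather than 1600.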

The flood reflector draft is keeping the L1-only abstraction so it has to provide for transit some other way.

> So trying to extract the technical point you seem to be making in between all that
> 
> a) I see how you can try to have a mental model of "virtual links". What we suggest are not virtual links (I implemented VL @ least once but it's so long ago I forgot pretty much all the details so had to look stuff up :-) Virtual links in OSPF were "magic bits on LSAs" that kind of computed "SPF reachability through the area to change SPF" edge-to-edge and the asynchronicity of all that flooding-being-tunnels-being-SPF was playing havoc on us @ this time of 300MHz CPUs + frankly, the complexity of that was not needed @ this time just as partition healing was never implemented in ISIS.. That's why it never went anywhere much, my take, others may correct. Saying "virtual links are bad" and "this is virtual links to me so it's bad" is simply a "strawman fallacy" to me frankly. This draft suggests (but as I wrote Bruno as answer to his fairly deep email) to run proper flooding over proper tunnels (we run routing over tunnels all the time in all kind of scenarios be it BGP proper or SD-WAN or overlays obviously) but if you choose FRs to be one hop away you can get away without any "L2 tunneled adjacencies", that's deployment choice. 

If you are not using tunnels, but are still trying to provide transit through the L1 area for L2, that is exactly what OSPF virtual links are doing. Part of making those work is the advertisement of reachability into the area from another, changes to router advertisements (to indicate if the area is transit capable), and changes to the SPF calculation to modify route choices based on whether they are based on virtual links or not.

The complexity of OSPF virtual links is there to make them work, right? I'm not trying to make any argument, strawman or otherwise; I'm just trying to understand what is being proposed as a replacement for the full-mesh of tunnels.

It's nice if it reduces to OSPF virtual links b/c we then have an example of how to actually implement it, and years of experience to understand it. If that highlights that getting the non-tunneled choice right isn't easy, well I guess that's important, right?

> b) Generally we may seem a bit muddled between different types of "tunnels" and "tunnels are bad" and "lots of tunnels". The draft talks about 2 types of tunnels and it seemed to be written clearly enough to distinguish that easily based on feedback I got so far
> 
> i) control plane tunnels are proper L2 entities (again, if your FR is one hop away from leafs then you don't need any tunneling but can run normal L2 adjacencies which I hope are not too scary; whole thing is really equivalent to BGP RR, do you put it in path, do you want to run multi-hop and how confident are you in your lower level infra not dropping TCP "tunnels" under you; every day's business since years really). So, no, there is no magic and hidden complexity and whatever not, you may or may not use auto-discovery the draft provides (you can just build something completely statically configured and in fact several customers told me they'd prefer it that way just as they don't auto-discover RRs normally) to build a bunch of tunnels towards your 2-3 FRs from your edges and you're in business after L2 adjacency comes up. Looks like any old ISIS + some optional TLVs you can ignore that indicate for some smart future folks to know it's a FR adjacency and not "real L2 adjacency". No fork-lifting of whole cluster, no fork-lifting routers outside a cluster, no single point of failure under some fancy new name, barely any protocol changes (in fact I didn't see any proposal to run anything more minimal than that except maybe very simple version of TTZ which however has too many L2 adjacencies exposed @ any reasonable cluster size to really solve the problem of amount of L2 information sloshed around)

I think I understand the flood reflector bit; it's replacing the network of (now data-plane only) tunnels in the flooding graph and the advertised topology.

> ii) data plane tunnels. The draft basically explains that for the solution to work in a simple fashion, a full-mesh of data forwarding tunnels can be established (which are NOT visible in L2) as shortcuts that allow to utilize all paths through L1 and that will work fine since it doesn't spill into L2. You want to run L1 adjacencies over those tunnels if you care whether they are up but you could do just BFD e.g. and use them as forwarding next-hops in the computation without them being visible in L1 ISIS. The other option is to not use such a data plane mesh and use reachability instead and we can explain that further in detail after adoption and we get more people talking through that etc. or you can look @ my preso @ last IETF where I kind of quickly ran through that (and it seemed relatively obvious to me how it works). In summary, we chose to do real work rather than polish optional points in individual drafts because, frankly, customers are not interested all that much often whether IETF WG feels like working on it while they have a pressing problem and need solutions in a timely manner. And AFAIR the chairs guided the group multiple times towards "ignore the problem, of no importance" and now with a certain urgency want to have everything @ the same time.

So this is creating a full-mesh overlay network of tunnels between the edge routers on the L1 area. I don't think you would want to advertise that overlay network back into the underlay network (L1 area) so you can't just form up L1 adjacencies (unmodified) to determine if they are up or not, some other mechanism would have to be used. OSPF virtual links use the intra-area connectivity to "v-bit" routers to determine if the area is transit capable -- or perhaps BFD as you suggest.
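
To put rough numbers on the two kinds of tunnels, here is a minimal sketch. It assumes a bidirectional tunnel is counted once, that every edge router forms exactly one control-plane adjacency to each flood reflector, and that the reflectors are not themselves edge routers. The function names are made up for illustration and the draft defines neither quantity this way.

```python
# Hypothetical counting sketch: data-plane full mesh vs. hub-and-spoke
# control-plane adjacencies to flood reflectors within one L1 area.

def full_mesh_tunnels(edge_routers: int) -> int:
    """Tunnels needed to connect every pair of edge routers directly."""
    return edge_routers * (edge_routers - 1) // 2


def reflector_adjacencies(edge_routers: int, reflectors: int) -> int:
    """L2 adjacencies when each edge router peers only with each reflector."""
    return edge_routers * reflectors


if __name__ == "__main__":
    edges = 10  # L2 edge routers per area, as in the quoted DC-style example
    for frs in (2, 3):  # "2-3 FRs" as mentioned in the quoted reply
        print(f"{edges} edges, {frs} FRs: "
              f"{reflector_adjacencies(edges, frs)} reflector adjacencies, "
              f"{full_mesh_tunnels(edges)} full-mesh forwarding tunnels")
```

With 10 edge routers the forwarding mesh needs 45 tunnels in this model regardless of how many reflectors are used. The reflectors shrink the flooding graph and the advertised topology, not the forwarding overlay.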

I guess the question is can there be topological control-plane connectivity to the flood reflector, but not data-plane overlay network tunnel connectivity?

The draft is also saying you suppress advertising the overlay network in L2 as well b/c it is represented instead by the advertised topology created by using the flood reflector. So the overlay network is represented, but is itself unadvertised. This suppression of advertising the overlay tunnel network seems similar to how the L12 topology inside an area proxy is also suppressed, except ... tunnels. :)

Thanks,
Chris.
[as WG member]

> So, I look forward to this draft being adopted as WG item given the support seen and urgency of having this in the field and obviously, being running, working code since a bit ... 
> 
> -- tony 
> 
> 
> On Tue, Jun 9, 2020 at 5:46 PM Christian Hopps <chopps@chopps.org> wrote:
> Hi,
> 
> Given that the flood reflector authors have asked the chairs to do a call for adoption, might someone from that group be able to talk to a couple of the points/questions from my mail? I think it would help me (and maybe others) in making informed responses to any adoption call.
> 
> - Is this proposing to add "virtual links" to IS-IS (in addition to the flood reflector part)?
> 
> - Is there a general-purpose non-complex non-tunneling solution possible?
> 
> Thanks,
> Chris.
> [as WG member]
> 
> 
> > On Jun 6, 2020, at 7:18 AM, Christian Hopps <chopps@chopps.org> wrote:
> > 
> > [ all the following is as a WG member ]
> > 
> > I've been thinking a lot about the proposed flooding reduction drafts currently vying for adoption by the WG. I've been doing this thinking in the context of wearing an operator hat (i.e., trying to leverage my prior experience working for DT on Terastream). In the abstract the Terastream architecture presented a good way to visualize and compare the benefits of using one of these solutions w/o getting too complex. A simplified model of this design can be seen as a horseshoe of horseshoes. The major horseshoe is L2 and the minor horseshoes are L1. Each L1 area has 2 L2 routers for redundancy (I'll consider more though), and all L2 routers are full-mesh connected to support that redundancy.
> > 
> > Telco
> > 
> > <PastedGraphic-4.png>
> > 
> > But let's map this to a more DC centric view (I guess?) where each L1 area now has 10 L2 routers instead of 2 (i.e., 10%, but that could be changed later if need be).
> > 
> > Natural Design
> > 
> > 
> > <PastedGraphic-5.png>
> > 
> > Now for whatever reason some operators do not want to provision high-bandwidth transit links between their L2 routers in their L1 areas. This is critically important b/c otherwise you would simply use the above Natural Design. I'd like to better understand why that isn't just the answer here.
> > 
> > Anyway, forging ahead, here's what we get with unchanged IS-IS to support "use everything for transit"
> > 
> > All In
> > 
> > <PastedGraphic-6.png>
> > 
> > So the 800 L2 LSPs, and the impact on flooding dynamics, are what these drafts are trying to reduce or avoid.
> > 
> > Area Proxy
> > 
> > First I'll look at area proxy. This seems a fairly simple idea, basically it's taking the now L1L2 areas and advertising them externally as a single LSP so the impact is very similar to if they were L1 only. This maps fairly closely to the Telco and Natural Design from above. Each L1 router in the Telco design would have 100 LSPs. The L12 routers would have 100 L1 + 16 L2 LSPs. In the Natural Design each L1 router has 100 L1 and each L12 router would have 100 L1 and 80 L2. With Area Proxy each router has 100 L1 and 100 "Inner L2 LSPs" and 80 "Outer LSPs" + 8 "Outer L2 LSPs".
> > 
> > The key thing to note here is that if you double the number of areas you only add to the Outside LSP and Proxy count just as it would scale in the Natural Design, so going from 8 to 16 areas here adds 80 more "Outside LSPs" and 8 more L2 Proxy LSPs for a total of 276 L2 LSPs even though you've added 800 more routers to your network.
> > 
> > 
> > <PastedGraphic-10.png>
> > 
> > Flood Reflectors
> > 
> > I'm less sure I understand this draft so please forgive me if I get this wrong. After reading this draft a few times I believe it is basically providing "L2 virtual links" over an L1 area between the area's L2 edge routers and then using a "flood reflector" to reduce the number of required "virtual links" by creating a hub-and-spoke topology of them.
> > 
> > The draft does a bit of hand-waving at "judicious scoping of reachability" being used in place of tunneling. I could guess at what this might mean, but I shouldn't have to; the draft should spell it out so it can be judged as a viable option or not. So, the only choice I see presented is to use a full-mesh of L1 tunnels between the L12 edge routers.
> > 
> > Anyway, here's a picture
> > 
> > <PastedGraphic-9.png>
> > 
> > If, in fact, the draft is talking about adding virtual links to IS-IS I think it should say that. There's a lot of experience in OSPF with virtual links and some of the trouble they have caused. There are also important details in making them work right in OSPF that don't seem to be in the current draft, so it's not clear what actual level of complexity is going to be required to make this all work, and importantly, if that would then be palatable.
> > 
> > I also do not like the idea of all of these automatic "forwarding-tunnels". They would be disjoint from the advertised "flood reflector" tunnel topology (right?) -- it seems like a management/debuggability nightmare to me, and I'm curious which operators are keen to have a bunch of unadvertised tunnels added to their IGP network. :) If the draft really can work in general w/o the tunnels I would appreciate seeing those mechanics described rather than just hinted at.
> > 
> > The ideas are interesting for sure, but the draft doesn't seem fully fleshed out, and I'm worried it'll be overly complex when it is.
> > 
> > Thanks,
> > Chris.
> > [as WG member]
> > 
> > _______________________________________________
> > Lsr mailing list
> > Lsr@ietf.org
> > https://www.ietf.org/mailman/listinfo/lsr
> 