Re: [pim] Adoption call for draft-mcbride-mboned-lessons-learned-02
Toerless Eckert <tte@cs.fau.de> Tue, 14 March 2023 03:33 UTC
Date: Tue, 14 Mar 2023 04:33:35 +0100
From: Toerless Eckert <tte@cs.fau.de>
To: Stig Venaas <stig@venaas.com>
Cc: pim@ietf.org, draft-mcbride-mboned-lessons-learned@ietf.org
Message-ID: <ZA/rD50/SPIPmOYr@faui48e.informatik.uni-erlangen.de>
References: <CAHANBtKAoPquU4Eq73PNmnq_U+mdfgcCLXVZxaLBkTQkcCWwsA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <CAHANBtKAoPquU4Eq73PNmnq_U+mdfgcCLXVZxaLBkTQkcCWwsA@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/pim/L-6CiLGaAm_oZ_AjvMMJ40sarKc>
Subject: Re: [pim] Adoption call for draft-mcbride-mboned-lessons-learned-02
Dear co-authors of that draft:

I think this is a very laudable topic to write down. Thank you for starting the effort.

However, at the stage the text is in, I have trouble deciding whether to support adoption or not. So, just to keep this discussion open, I fear I would have to say that this document is, primarily for process reasons, not ready for adoption now.

The main reason is that this topic is very important to me, but I am quite unclear how editing of the document as a working group draft would work out: judgements and analysis are a lot more subject to the opinions and experiences of individuals, and much less easily handled by the usual rough consensus process of the IETF.

Therefore it would be great to see if or how the authors are willing to open up the editing of this document to text input from other members of the working group. For example, it could be put onto GitHub and be collaboratively edited there. And if that doesn't end up, after one IETF cycle, in a Frankenstein document nobody likes, then we know it's a good go for WG adoption.

But if the authors feel that they would like to be in more control of the text and would rather reject or strip text from other WG participants (which is of course fully within their rights), then it might be better to go for the individual submission track with this work. (BTW: I am in exactly the opposite seat with another draft of mine, so I am also not sure whether WG adoption would make sense for that draft.)

To give you an idea of what I am thinking about, please find below some text that I would find very valuable to have included in principle in the document (or at least the thoughts covered by the text, at that level of detail). I never quarrel over the actual wording of anything I propose being corrected by someone who actually speaks, writes and understands actual English instead of just faking it like me and ChatGPT ;-).
I quickly wrote this down after just reading the "DVMRP" section, which I felt was a lot of deep "inside baseball". While nothing in the section is wrong, I think it completely lacks the background setting that would let anybody except those of us who lived through the experience know what we're talking about. I also think it fails to highlight really important insights that transcend the pure routing-protocol perspective of IP Multicast. Luckily we do not only own a routing protocol (like those poor unicast routing protocol folks ;-), but we own a whole network service - and I think that context needs to be given credit in such an experience report.

As you can see, I did not hold back on my personal analysis, and much of that I find a lot more important to understand than some of the more obvious aspects already covered by the document. Which is where a contention of scope is easily possible. But obviously I think that if we do such a document, then the ultimate goal is to work out the best possible guidance that others can use - both for proceeding with IP multicast and even way beyond it - because, as I think some of my proposed text lays out, when it comes to experience from technology, IP Multicast is not even second to IPv6, but very much unique in how it expands all the way into the application space.

Cheers
    Toerless

----

# The MBone and DVMRP

At the end of the 1980s, DVMRP was the first IP Multicast routing protocol deployed across the Internet, because it had freely available and production-ready implementations for various flavors of Unix and directly supported establishing adjacencies across tunnels, which made it easy to build an overlay network of DVMRP routers. This overlay network came to be known as the MBone (Multicast Backbone).
DVMRP also performed all functions of IP Multicast routing exclusively by itself, as did all other prior IP Multicast routing protocols (CBT, MOSPF, ...), so deployment was extremely easy.

Lesson/Analysis:

The MBone was (arguably) the most important tool for the subsequent success of IP Multicast, not because of DVMRP (quite the opposite), but because IP Multicast (RFC1112) was a new network layer service that required applications to support it. So the first problem to solve was the chicken-and-egg problem: why would anybody want to develop applications against a new network service when there is no network of interest that supports the service? The aforementioned properties of the DVMRP software allowed researchers who (worst case) had never before bothered with networking to collaboratively build a network amongst themselves to enable this network service - and (best case) to continue to develop, deploy and experiment with the applications that became crucial for the success of IP multicast.

In return, this interest of application researchers in the service and their use of the research network provided the insight into how to further develop IP multicast routing protocols, and ultimately also the incentive for commercial network equipment vendors not only to invest in the technology, but also to successfully find interest in adoption of their commercial implementations from far more than just the research community that needed something better than the open source / DVMRP based solution.

One can observe that, except for similar research networks about a decade later with the introduction of IPv6 (6Bone and others), this fundamental lesson has not been well learned or taken into account in the plans for any other proposed network service enhancement. In that respect, everybody who wants to introduce a new network service should carefully analyze all the aspects one could and should copy from the MBone effort, only some of which are described here.
# Go Native

When the campus networks of research organizations that used IP Multicast with DVMRP/MBone expanded, they started to deploy commercial IP routers (initially not supporting IP Multicast) and often ended up having hundreds of IP subnets connected through many such routers. It became infeasible to put researcher-operated Unix workstations onto each of these IP subnets to enable support for IP multicast (which was the original MBone approach). Instead, those organizations started to ask for IP Multicast support in the commercial routers they used. Many vendors simply offered any of the campus-scale IP multicast routing protocols that had been developed, such as MOSPF or CBT, but of course also DVMRP. These deployments of IP Multicast at campus scale, without tunnels, on the network's IP (unicast) routers themselves were back then often called "Native IP Multicast".

These commercial IP multicast router implementations saw a much wider proliferation than just to those research organizations that participated in the MBone, because the development that had led to the MBone had also led to the proliferation of the IP Multicast host stack (as specified in {{RFC1112}}) across the host industry, mostly based around BSD Unix and AT&T SysV, but quickly also Linux and many proprietary, closed-source host operating systems used in commercial environments.

It was this availability of the IP Multicast host stack together with commercial IP multicast router implementations that first allowed IP multicast in commercial applications to offer services beyond those of pure Ethernet broadcast/multicast, and many different types of commercial applications, often business critical, started to depend on functioning IP multicast in the network.
Lesson/Analysis:

Commercially viable support for IP Multicast routing in networks ultimately depended on commercial application development, which in itself required seeding through the open source implementation in BSD Unix much more than through the IETF standard {{RFC1112}} specifying it. But the wide proliferation of IP Multicast applications in the 1990s was also due to a large amount of research funding for applications as well as a lot of enthusiasm among commercial application developers and startups for utilizing the new technology.

Unfortunately, as will be described later in this memo, while the enthusiasm of application developers would have endured, given how IP Multicast was (and still is) very simple for applications to use, it was the ensuing problems and complexities of actually scaling IP Multicast routing that made operators more and more careful about relying on IP Multicast. The result is today's policy of most often agreeing to the use of IP Multicast only if there is no other possible choice, and no longer merely because it helps applications to be simpler.

# Protocol Independent Multicast

The "Protocol Independent Multicast" (PIM) family of IP Multicast routing protocols was the most fundamental architectural change to IP Multicast, happening in the early 1990s (ref to arch draft here..). Its main intention was to solve the problem of how IP multicast routing could not only co-exist with the variety of IP (unicast) routing protocols deployed in networks, all the way from RIP, OSPF and ISIS over to EGP and BGP, but actually leverage them instead of running IP multicast routing as a complete "ships-in-the-night" solution alongside IP unicast routing.
While ships-in-the-night was perfect and simple for the MBone, it was (as PIM proponents will argue) not the correct choice for native IP multicast deployments, because it meant either that network operators would have had to learn routing all over again wherever the IP multicast routing protocol did things differently from the deployed IP (unicast) routing protocols, or that the IP multicast routing protocol would have had to duplicate all the features of all the possible IP unicast routing protocols - something even less feasible.

As a result, the architecture of PIM was based on splitting the task of IP Multicast routing into two big blocks: the actual IP Multicast tree building performed by one of the PIM routing protocols, and the so-called "RPF-selection" performed by PIM against any pre-existing (unicast) routing protocol so as not to duplicate that functionality. As a consequence of this architecture, PIM should not really be called the "PIM routing protocol", but rather the "PIM tree building protocol".

In addition to this fundamental architectural aspect, PIM was also designed to allow different tree building mechanisms while still using a common set of PIM message headers. Today, there are three such tree building mechanisms in PIM: dense-mode (which is the PIM equivalent of DVMRP), sparse-mode (a subset of which was later called source-specific-mode), and bidir-mode.

Lessons/Analysis:

Back when the selection of PIM as the primary IP multicast routing protocol to be standardized in the IETF was still contentious, many proponents of the competing protocols such as MOSPF, CBT or DVMRP argued that their protocols had benefits over PIM (and its proposed tree building options, back then dense-mode and sparse-mode), and pointed to well working campus-level deployments of their protocols.
Ultimately, all the protocols worked fine "enough" at campus level, and only the desire of customers to at least have the option of supporting IP multicast at larger scale, and ideally interdomain, was the technically winning argument. Likely equally important was that PIM was the choice promoted by vendors that then happened to become dominant in the market.

Extensibility and wide applicability, even beyond what is immediately known to be required, is an important safety net when making long-term technology choices.

In hindsight, the core architectural aspect of PIM doing RPF-selection to leverage unicast routing protocols was a very good choice, and in almost all IP multicast deployments it did result in the desired simplification of operations and in easier deployment across a wide variety of network designs. In fact, for many more IP multicast architecture design questions it established the paradigm (for better or worse) of trying to do things as much as possible the way IP (unicast) does them - unless there was a (perceived) good reason not to.

# RPF-selection

For all the good that RPF-selection did (and does) bring with it, the combination of PIM with the IP unicast routing protocols and routing tables that PIM draws its RPF information from ended up becoming a convoluted dependency, maybe similar only to how LDP became intertwined with IP unicast routing protocols in MPLS a decade later. Unfortunately, by the time MPLS/LDP were designed, the insights into this problem with IP Multicast and PIM (where it was invented first) were not well-known enough, and even today they are not widely enough experienced for significant enhancements to be considered.

## Asymmetric Paths

One fundamental issue of RPF-selection in PIM is that it needs to know routes towards sources of traffic, whereas IP unicast needs routes towards destinations.
In networks with all symmetric paths and no otherwise convoluted policies, there is no difference: the path onto which you would send packets if the address of interest was the destination is also the path from which you should expect to receive traffic if the address of interest was the source - and hence you will send PIM packets along that path for that source. Unfortunately, this is not the case when you have asymmetric paths.

This first hit PIM-based solutions in the 1990s with unidirectional paths, and a range of clunky workarounds was built. There was never a really good solution to the unidirectional path problem, but the clunky solutions seemed to get the job done for the deployments that needed them. And none of this ever had to touch the IETF.

## IGP metric engineering

Likely the most common case of asymmetric routing is when networks with a large number of non-(physically)-equal-cost paths use IGP metric engineering as a form of path engineering. In those deployments, the two directions of a (p2p) link will often have different IGP metrics, and the reverse direction of some IGP-shortest path is definitely not the desired reverse shortest path; instead, the IP multicast traffic created by PIM easily puts load onto the more loaded paths.

The solution for this problem offered today by the IETF consists of several mechanisms that allow operators to configure a separate set of link metrics that will only be used to calculate forward paths that are then only used for IP multicast, but not for IP unicast traffic (multi-topology, flex-topologies). Nevertheless, the operator still needs to understand that the metric assignments on the links need to be reversed if the IGP is to calculate paths that are usable for RPF-selection and not for forwarding traffic.
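To make the asymmetry described above concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the four-router topology (`S`, `A`, `B`, `R`), the per-direction metrics, and the helper names `dijkstra`, `path` and `reverse`. It compares three things: the unicast path from the source-side router `S` to the receiver-side router `R`, the path multicast traffic takes when `R` picks its RPF neighbor from its own unicast route towards `S`, and the path that results when `R` instead runs the shortest-path calculation over the reversed graph.

```python
import heapq

def dijkstra(graph, root):
    """Shortest paths from root over directed edges; returns (dist, prev)."""
    dist = {root: 0}
    prev = {}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def path(prev, root, target):
    """Walk the predecessor map back from target to root."""
    hops = [target]
    while hops[-1] != root:
        hops.append(prev[hops[-1]])
    return hops[::-1]

def reverse(graph):
    """Flip every edge, keeping the metric of the original direction."""
    rev = {}
    for u, nbrs in graph.items():
        rev.setdefault(u, {})
        for v, cost in nbrs.items():
            rev.setdefault(v, {})[u] = cost
    return rev

# Hypothetical metrics: metric[u][v] is the cost of sending from u to v.
# The operator wants S->R traffic via A and R->S traffic via B.
metric = {
    "S": {"A": 1, "B": 10},
    "A": {"S": 10, "R": 1},
    "B": {"S": 1, "R": 10},
    "R": {"A": 10, "B": 1},
}

# Unicast forwarding from S to R follows the forward metrics.
_, prev_s = dijkstra(metric, "S")
unicast_forward = path(prev_s, "S", "R")

# RPF-selection from unicast routes: R joins along its own route towards S,
# and the multicast traffic then flows along that path in reverse.
_, prev_r = dijkstra(metric, "R")
multicast_today = path(prev_r, "R", "S")[::-1]

# Shortest-path calculation on the reversed graph, rooted at R: the result
# is the metric-preferred path for traffic flowing from S to R.
_, prev_rev = dijkstra(reverse(metric), "R")
multicast_reverse_spf = path(prev_rev, "R", "S")[::-1]

print(unicast_forward, multicast_today, multicast_reverse_spf)
```

With these made-up metrics, unicast from `S` to `R` goes via `A`, while the tree built from `R`'s unicast route towards `S` pulls the multicast traffic via `B`, over exactly the link directions whose metrics were raised to keep traffic away. The calculation on the reversed graph puts the multicast traffic back onto the path via `A`.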
In link-state protocols such as ISIS or OSPF, it would of course not be necessary at all to configure such a secondary set of forward metrics that are really RPF-selection metrics in disguise; the IGP could simply do a reverse-SPF calculation and spare the operator the trouble of any additional calculation. One of the reasons why such an operator-friendly solution has never happened is likely that solving the problem within the IGP implementation is more work for the IGP implementation than letting the operator do more configuration work, which allows the IGP to implement nothing specific to IP multicast, but only technologies that can equally be used for IP unicast when different sets of paths are required.

This type of problem analysis is of course one example of the generic problem of re-using IP unicast technologies through IP multicast solutions such as RPF-selection: instead of solving the problem through code that can be written specifically for IP multicast itself, and whose development costs always have to be justified by the IP multicast business case, the solution to the IP multicast requirement now lies in a piece of code whose development business criteria are completely different, and where requirements for IP multicast are but a small piece of a much larger set of competing requirements.

# Partial PIM deployments

Beyond those differences between forward and backward paths, the coupling between IGP and PIM also caused, and still continues to cause, confusion among customers. They continue to believe what seems logical when you are paying money for a solution you do not even want to understand: they want to enable PIM only where they think they need it, and the network will take care of the rest. And actually, this is how DVMRP or any of the other "all-in-one" IP Multicast routing protocols work. In PIM, on the other hand, configuring PIM does not (usually) impact the unicast routing.
Not even when there is a dedicated topology for IP multicast through mechanisms such as multi or flex topologies. At least there is no IETF specification defining those interactions, so any deployment of only IETF-defined mechanisms is left with the most cumbersome experience for operators.

This is to a good extent caused by work in the IETF being primarily performed by vendors whose main desire is to get only the interoperability-impacting aspects of specifications right, whereas they are quite happy for any aspects that impact operator experience to be left to competition between them. And operators in the IETF who would want a good integration of functions are then often deterred by vendors when they ask for functionality that reduces their OPEX, because it has to compete with feature requests that can instead be tied directly to new revenue.

# Unicast/PIM Synchronization

The non-existent synchronization between PIM and the routing protocols it relies on for RPF-selection is not only relevant for partial deployments. Even more important is the dynamic behavior under failure and recovery scenarios - very much like in IGP/LDP situations. When the IGP converges faster than the PIM Hello signalling, there is unnecessary interruption of traffic. When a recovering PIM-DR starts to take responsibility for serving IGMP/MLD joined IP multicast traffic, it may create an unnecessary blackhole for several minutes or more, because it also happens to be a router that gets its routes from BGP, and that may take several minutes to recover all the necessary routes.

These types of problems are not really difficult to solve, but solutions have for an astoundingly long time not been standardized, resulting in a degree of fragility of IP Multicast solutions in redundant networks that makes it unnecessarily easy to be concerned about the complexity of an IP Multicast deployment.
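As an illustration of why the recovering-DR blackhole described above is not a hard problem in principle, here is a minimal Python sketch of one possible readiness check. It is not a standardized or vendor mechanism. The function names `rpf_lookup` and `may_assume_dr_role`, the route table layout, and all addresses and next-hop labels are hypothetical. The idea is simply that a router defers taking over the DR role until its unicast routing table can resolve an RPF route for every address it needs to join towards.

```python
import ipaddress

def rpf_lookup(rib, address):
    """Longest-prefix match of address in rib (prefix string -> next hop).

    Returns the next hop of the most specific matching route, or None.
    """
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, next_hop in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else best[1]

def may_assume_dr_role(rib, required_addresses):
    """Hypothetical gate: only become DR once every required source or RP
    address has a usable RPF route."""
    return all(rpf_lookup(rib, a) is not None for a in required_addresses)

# Made-up example: the sources the attached receivers have joined.
sources = ["198.51.100.10"]

# Right after a restart only the connected route is present.
rib_after_restart = {"192.0.2.0/24": "connected"}

# Minutes later, BGP has re-learned the route towards the source.
rib_converged = {
    "192.0.2.0/24": "connected",
    "198.51.100.0/24": "bgp-peer-1",
}

print(may_assume_dr_role(rib_after_restart, sources))
print(may_assume_dr_role(rib_converged, sources))
```

In this sketch the router would leave the DR role with the currently serving router while the first check fails, and only take over once the second succeeds.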
If IP Multicast is deployed at all, it typically is for mission critical purposes, and then it needs to work automatically in all corner cases instead of requiring operators to have an advanced degree in IP multicast with a PhD in nerd-knob-tuning.

takes se

Even in simple environments such as routers with large BGP routing tables is it easy for a PIM router to become active and

of multi or flex topologies

On Mon, Mar 13, 2023 at 12:05:49PM -0700, Stig Venaas wrote:
> Dear pim wg
>
> This draft was presented at our last meeting. There seemed to be
> interest in this in the meeting, but we did not do a poll.
>
> This starts an adoption call to see if we have enough support to adopt
> the draft.
> Please review and let us know by Friday 24th whether you support
> adoption or not.
>
> Regards,
> Stig
>

--
---
tte@cs.fau.de