Re: [pim] Adoption call for draft-mcbride-mboned-lessons-learned-02

Toerless Eckert <tte@cs.fau.de> Tue, 14 March 2023 03:33 UTC

Date: Tue, 14 Mar 2023 04:33:35 +0100
From: Toerless Eckert <tte@cs.fau.de>
To: Stig Venaas <stig@venaas.com>
Cc: pim@ietf.org, draft-mcbride-mboned-lessons-learned@ietf.org
Message-ID: <ZA/rD50/SPIPmOYr@faui48e.informatik.uni-erlangen.de>
References: <CAHANBtKAoPquU4Eq73PNmnq_U+mdfgcCLXVZxaLBkTQkcCWwsA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <CAHANBtKAoPquU4Eq73PNmnq_U+mdfgcCLXVZxaLBkTQkcCWwsA@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/pim/L-6CiLGaAm_oZ_AjvMMJ40sarKc>
Subject: Re: [pim] Adoption call for draft-mcbride-mboned-lessons-learned-02
Precedence: list

Dear co-authors of that draft:

I think this is a very laudable topic to write down. Thank you for starting the effort.

However, at the text's current stage, I have trouble deciding whether to support
adoption or not. So, just to keep this discussion open, I fear I would have
to say that this document is not ready for adoption now, primarily for process reasons:

The main reason is that this topic is very important to me,
but I am quite unclear how editing the document as a working group draft would
work out: judgements and analysis are a lot more subject to the opinions and
experiences of individuals, and much less easily handled by the usual rough consensus
process of the IETF. Therefore it would be great to see if or how the authors
are willing to open up the editing of this document to text input from
other members of the working group.

For example, it could be put onto GitHub and be collaboratively edited there.
And if that doesn't end up, after one IETF cycle, as a Frankenstein document
nobody likes, then we know it's a good go for WG adoption. But if the authors feel
that they would like to be in more control of the text and would rather reject or strip
text from other WG participants (which is of course fully within their rights), then it might be better
to take the individual submission track with this work.

(btw: I am in exactly the opposite seat with another draft of mine, so I am also
 not sure whether WG adoption would make sense for that draft).

To give you an idea of what I am thinking about, please find below some text that
I would find very valuable to have included in the document in principle (or at least
the thoughts covered by the text, at that level of detail). I never quarrel about
having the actual wording of anything I propose corrected by someone who actually
speaks, writes and understands English instead of just faking it like me and ChatGPT ;-).

I quickly wrote this down after just reading the "DVMRP" section, which I felt was a lot of
deep "inside baseball". While nothing in the section is wrong, I think it
completely lacks the background that anybody except those of us who lived through
the experience would need to know what we're talking about. I also think it misses
highlighting really important insights that transcend the pure routing-protocol
perspective of IP Multicast. And luckily we do not only own a routing protocol
(like those poor unicast routing protocol folks ;-), but a whole
network service, and I think that context needs to be given credit in such an experience report.

As you can see, I did not hold back on much of my personal analysis,
and I find much of it a lot more important to understand than some of the
more obvious aspects already covered by the document. This is where a
contention of scope easily arises. But obviously I think that if we do such a
document, then the ultimate goal is to work out guidance, as good as possible, that
others can use, both for proceeding with IP multicast and even way beyond it. As
I think some of my proposed text lays out, when it comes to experience
from technology, IP Multicast is not even second to IPv6, but very much unique
in how it expands all the way into the application space.

Cheers
    Toerless

----

# The MBone and DVMRP

DVMRP was, at the end of the 1980s, the first IP Multicast routing protocol to be
deployed across the Internet, because it had freely available, production-ready
implementations for various flavors of Unix and directly supported establishing
adjacencies across tunnels, so that it was easy to build an overlay network of
DVMRP routers. This overlay network came to be known as the MBone (Multicast Backbone).
DVMRP also performed all functions of IP Multicast routing by itself,
as did all other pre-PIM IP Multicast routing protocols (CBT, MOSPF, ...), so deployment
was extremely easy.

Lesson/Analysis:

The MBone was (arguably) the most important tool for the subsequent success of IP Multicast,
not because of DVMRP (quite the opposite), but because IP Multicast (RFC1112)
was a new network layer service that required applications to support it. So the
first problem to solve was the chicken-and-egg problem: why would anybody want to
develop applications against a new network service when there is no network of
interest that supports the service? The aforementioned properties
of the DVMRP software allowed researchers who (worst case) had never before
bothered with networking to collaboratively build a network amongst themselves to
enable this network service, and (best case) to continue to develop, deploy and
experiment with the applications that became crucial for the success of IP multicast.
In return, these application researchers' interest in the service and their use of the
research network provided the insight into how to further develop IP multicast routing
protocols, and ultimately also the incentive for commercial network equipment
vendors not only to invest in the technology, but also to successfully find interest
in adopting their commercial implementations well beyond the research community,
from users who needed more than the open source, DVMRP-based solution.

One can observe that, except for similar research networks about a decade later with
the introduction of IPv6 (6Bone and others), this fundamental lesson has not been well
learned or taken into account in the plans for any other proposed network service
enhancement. In that respect, everybody who wants to introduce a new network service should
carefully analyze all the aspects one could and should copy from the MBone effort, only some
of which are described here.

# Go Native

When the campus networks of research organizations that used IP Multicast with DVMRP/MBone
expanded, they started to deploy commercial IP routers (initially not supporting IP Multicast)
and often ended up having hundreds of IP subnets connected through many
such routers. It became infeasible to put researcher-operated Unix workstations onto
each of these IP subnets to enable support for IP multicast (which was the
original MBone approach). Instead, those organizations started to ask for
IP Multicast support in the commercial routers they used. Many vendors simply offered
one of the campus-scale IP multicast routing protocols that had been developed, such as MOSPF
or CBT, but of course also DVMRP. These deployments of IP Multicast at campus scale, without tunnels,
on the network's IP (unicast) routers themselves were back then often called "Native IP Multicast".

These commercial IP multicast router implementations saw a much wider proliferation
than just the research organizations that participated in the MBone, because the
development that had led to the MBone had also led to the proliferation of the
IP Multicast host stack (as specified in {{RFC1112}}) across the industry of hosts,
mostly based on BSD Unix and AT&T SysV, but quickly also Linux and many proprietary,
closed-source host operating systems used in commercial environments. It was
this availability of the IP Multicast host stack, together with commercial IP multicast
router implementations, that first allowed commercial applications to use IP multicast
to offer services beyond those of pure Ethernet broadcast/multicast, and many
different types of commercial applications, often business-critical ones, started
to depend on functioning IP multicast in the network.
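To illustrate how little the host stack asks of an application, here is a minimal sketch in Python of a receiver joining a group through the BSD sockets API that grew out of this work. The group address 239.1.2.3, the port, and the helper names `make_mreq` and `open_receiver` are illustrative assumptions, not taken from any specification.

```python
import socket
import struct


def make_mreq(group: str, iface_addr: str = "0.0.0.0") -> bytes:
    """Build the struct ip_mreq (group address, local interface address)
    that IP_ADD_MEMBERSHIP expects; 0.0.0.0 lets the kernel pick the interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_addr))


def open_receiver(group: str, port: int) -> socket.socket:
    """Open a UDP socket that receives datagrams sent to group:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group is a single call: the host stack sends the IGMP
    # membership report, and multicast routing in the network does the rest.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group))
    return sock
```

For example, `open_receiver("239.1.2.3", 5000).recvfrom(1500)` would block until a packet for that group arrives; the application never deals with IGMP or multicast routing itself.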

Lesson/Analysis:

Commercially viable support for IP Multicast routing in networks ultimately
depended on commercial application development, which itself required seeding through
the open source implementation in BSD Unix much more than through the IETF standard {{RFC1112}}
specifying it. But the wide proliferation of IP Multicast applications in the 1990s
was also due to a high degree of research funding into applications, as well
as a lot of enthusiasm among commercial application developers and startups for utilizing
the new technology. Unfortunately, as will be described later in this memo, while
the enthusiasm of application developers would have endured, given how simple IP Multicast
was (and still is) for applications to use, it was the ensuing problems
and complexities of actually scaling IP Multicast routing that made operators more
and more careful about relying on IP Multicast. This has resulted in today's policy of most
often only agreeing to the use of IP Multicast if there is no other possible choice, and
no longer because it helps applications to be simpler.

# Protocol Independent Multicast

The "Protocol Independent Multicast" (PIM) family of IP Multicast routing protocols
was the most fundamental architectural change to IP Multicast, happening in the
early 1990s (ref to arch draft here..). Its main intention was to solve the problem
of how IP multicast routing could not only co-exist with the variety of IP (unicast)
routing protocols deployed in networks, all the way from RIP, OSPF and IS-IS over to
EGP and BGP, but actually leverage them, instead of making IP multicast routing run
completely as a "ships-in-the-night" solution alongside IP unicast routing.

While ships-in-the-night was perfect and simple for the MBone, it was (as PIM proponents
will argue) not the correct choice for native IP multicast deployments, because
it would have meant either that network operators had to learn routing anew
wherever the IP multicast routing protocol did things differently from the deployed
IP (unicast) routing protocols, or that the IP multicast routing protocol had
to duplicate all the features of all the possible IP unicast routing protocols, something
even less feasible.

As a result, the architecture of PIM was based on splitting the task of IP Multicast
routing into two big blocks: the actual IP Multicast tree building, performed by
one of the PIM routing protocols, and the so-called "RPF-selection" (Reverse Path
Forwarding), performed by PIM against any pre-existing (unicast) routing protocol
so as not to duplicate that functionality. Given this architecture, PIM should not
really be called the "PIM routing protocol", but rather the "PIM tree building protocol".
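A minimal sketch of what RPF-selection boils down to, assuming a toy unicast routing table: the router performs a longest-prefix match on the *source* address and accepts a multicast packet only if it arrives on the interface it would use to reach that source; PIM joins for that source are sent out of the same interface. The table contents, interface names and function names are hypothetical.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical unicast RIB (prefix -> outgoing interface), as populated by
# whatever unicast routing protocol runs in the network (RIP, OSPF, IS-IS, BGP...).
UNICAST_RIB: Dict[ipaddress.IPv4Network, str] = {
    ipaddress.ip_network("10.0.0.0/8"): "eth0",
    ipaddress.ip_network("10.1.0.0/16"): "eth1",
    ipaddress.ip_network("0.0.0.0/0"): "eth2",
}


def rpf_interface(source: str, rib: Dict = UNICAST_RIB) -> Optional[str]:
    """RPF-selection: longest-prefix match on the *source* address, reusing the
    unicast RIB instead of running a separate multicast routing protocol."""
    addr = ipaddress.ip_address(source)
    matches = [net for net in rib if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return rib[best]


def rpf_check(source: str, arrival_iface: str, rib: Dict = UNICAST_RIB) -> bool:
    """Accept a multicast packet only if it arrived on the interface this router
    would use to send unicast traffic back towards the packet's source."""
    return rpf_interface(source, rib) == arrival_iface
```

The tree building part of PIM only needs the result of `rpf_interface`; it does not care which unicast protocol filled the table, which is exactly the "protocol independence".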

In addition to this fundamental architectural aspect, PIM was also designed to
allow different tree building mechanisms while still using a common set of
PIM message headers. Today, there are three such tree building mechanisms in PIM:
dense-mode (the PIM equivalent of DVMRP), sparse-mode (a subset of which was later
called source-specific mode), and bidir-mode.

Lessons/Analysis:

Back when the choice of PIM as the primary IP multicast routing protocol
to be standardized in the IETF was still contentious, many proponents of competing
protocols such as MOSPF, CBT or DVMRP argued that their protocols had benefits
over PIM (and its proposed tree building options, back then dense-mode and sparse-mode),
and pointed to well-working campus-level deployments of their protocols. Ultimately,
all the protocols worked "well enough" at campus level, and only the desire of customers
to at least have the option of supporting IP multicast at larger scale, and
ideally interdomain, was the technically winning argument. Likely equally important
was that PIM was the choice promoted by the vendors that happened to become
dominant in the market. Extensibility and wide applicability, even beyond what is immediately
known to be required, is an important safety net when making long-term technology choices.

In hindsight, the core architectural aspect of PIM, doing RPF-selection to leverage
unicast routing protocols, was a very good choice, and in almost all IP multicast
deployments it did result in the desired simplification of operations and easier
deployment in a wide variety of network designs. In fact, it established (for better
or worse) the paradigm for many more IP multicast architecture design questions:
try to do things as much as possible the way IP (unicast) does them, unless there was a
(perceived) good reason not to.

# RPF-selection

For all the good that RPF-selection did (and does) bring with it, the combination of
PIM with the IP unicast routing protocols and routing tables that PIM draws its RPF information
from ended up becoming a convoluted dependency, perhaps similar only to how LDP
also became intertwined with IP unicast routing protocols a decade later in MPLS.
Unfortunately, by the time MPLS/LDP were designed, the insights into this problem
with IP Multicast and PIM (where it first appeared) were not well-known enough,
and even today they are not widely enough experienced to consider significant enhancements.

## Asymmetric Paths

One fundamental issue of RPF-selection in PIM is that it needs to know routes
towards sources of traffic, whereas IP unicast needs routes towards destinations.
In networks with only symmetric paths and no otherwise convoluted policies, there
is no difference: the path over which you would send packets if the address of interest
was the destination is also the path from which you should expect to receive traffic
if the address of interest was the source, and hence you will send PIM packets
along that path for that source.

Unfortunately, this is not the case when you have asymmetric paths. This first
hit PIM-based solutions in the 1990s with unidirectional paths, and a range of clunky
workarounds was built. There was never a really good solution to the unidirectional
path problem, but the clunky solutions seemed to get the job done for the deployments
that needed them. And none of this ever had to touch the IETF.

## IGP metric engineering

Likely the most common case of asymmetric routing is when networks with a large
number of non-(physically)-equal-cost paths use IGP metric engineering as a
form of path engineering. In those deployments, the two sides of a (p2p) link
will often have different IGP metrics, and the reverse direction of an
IGP-shortest path is then definitely not the desired shortest path in the reverse
direction. Instead, the IP multicast traffic following the trees built by PIM easily
ends up on the more loaded paths.
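A small worked example, with hypothetical routers and metrics, of how this goes wrong: receiver-side router R reaches source-side router S either via A or via B, and the metrics are engineered so that unicast R->S prefers A while unicast S->R prefers B. PIM takes R's unicast path towards S and multicast traffic S->R then flows along its reverse.

```python
# Hypothetical topology and metrics. METRIC[(u, v)] is the IGP metric that u
# uses for the link u->v, so each direction of a p2p link can differ.
METRIC = {
    ("R", "A"): 10, ("A", "S"): 10,    # engineered: unicast R->S prefers A
    ("R", "B"): 50, ("B", "S"): 50,
    ("S", "A"): 100, ("A", "R"): 100,  # engineered: unicast S->R prefers B
    ("S", "B"): 10, ("B", "R"): 10,
}
PATHS_R_TO_S = [["R", "A", "S"], ["R", "B", "S"]]


def path_cost(path):
    """Sum the directional metrics along a path, in the direction travelled."""
    return sum(METRIC[(u, v)] for u, v in zip(path, path[1:]))


# What RPF-selection does: take R's unicast shortest path towards S and use its
# first hop as RPF neighbor; multicast then flows S->...->R along its reverse.
unicast_path = min(PATHS_R_TO_S, key=path_cost)
rpf_neighbor = unicast_path[1]
multicast_path = list(reversed(unicast_path))

# What the operator engineered for traffic flowing S->R.
desired_path = min((list(reversed(p)) for p in PATHS_R_TO_S), key=path_cost)
```

Here `path_cost(multicast_path)` evaluates to 200 while `path_cost(desired_path)` is 20: the multicast traffic lands on exactly the path the operator engineered traffic away from.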

The solution to this problem offered by the IETF today consists
of several mechanisms that allow operators to configure another, separate set
of link metrics that is only used to calculate forward paths that are
then used only for IP multicast, but not for IP unicast traffic (multi-topology,
flex-topologies). Nevertheless, the operator still needs to understand that
they have to reverse the metric assignments on the links if they want
the IGP to calculate paths that are usable for RPF-selection rather than for
forwarding traffic.

In link-state protocols such as IS-IS or OSPF, it would of course not
be necessary at all to configure such a secondary set of forward metrics
that are really disguised RPF-selection metrics: the IGP could simply perform
a reverse-SPF calculation and spare the operator the trouble of any
additional calculation.
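As a sketch of what such a reverse-SPF could look like (Python, hypothetical function names, link metrics given per direction): running the same Dijkstra SPF over the transposed link-state graph, where every link u->v becomes v->u with the same metric, yields for each source the true source->root cost and the neighbor through which that source's traffic arrives, i.e. the RPF neighbor. On a network of bidirectional links this gives the same result as the operator configuring a second topology with each link's two directional metrics swapped, only computed automatically.

```python
import heapq
from collections import defaultdict


def spf_first_hops(links, root):
    """Plain Dijkstra SPF. links: {(u, v): metric for the direction u->v}.
    Returns {node: (cost, first_hop)}, first_hop being root's neighbor on the
    shortest path root->node (None for root itself)."""
    adj = defaultdict(list)
    for (u, v), metric in links.items():
        adj[u].append((v, metric))
    result = {}
    heap = [(0, root, None)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in result:
            continue  # already settled with a lower (or equal) cost
        result[node] = (cost, first_hop)
        for nbr, metric in adj[node]:
            if nbr not in result:
                # Leaving root, the neighbor itself becomes the first hop.
                heapq.heappush(heap, (cost + metric, nbr, first_hop or nbr))
    return result


def reverse_spf_rpf(links, root):
    """Reverse SPF: the same SPF over the transposed graph. For each source x,
    cost is the true x->root cost and first_hop is the neighbor through which
    x's traffic reaches root, i.e. root's RPF neighbor for x."""
    transposed = {(v, u): metric for (u, v), metric in links.items()}
    return spf_first_hops(transposed, root)
```

For example, with `links = {("R","A"): 10, ("A","S"): 10, ("R","B"): 50, ("B","S"): 50, ("S","A"): 100, ("A","R"): 100, ("S","B"): 10, ("B","R"): 10}`, `spf_first_hops(links, "R")["S"]` is `(20, "A")`, the unicast next hop that plain RPF-selection would use, whereas `reverse_spf_rpf(links, "R")["S"]` is `(20, "B")`, the neighbor on the cheapest S->R path.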

One of the reasons why such an operator-friendly solution has never happened
is likely that solving the problem within the IGP means more work for the IGP
implementation, whereas pushing more configuration work onto the operator lets the
IGP implement nothing specific to IP multicast, only technologies that can equally
be used for IP unicast when a different set of paths is required.

This type of problem analysis is one example of the generic
problem of re-using IP unicast technologies in IP multicast solutions
such as RPF-selection: instead of solving the problem through code that can
be written specifically for IP multicast itself, and whose development costs
always have to be justified by the IP multicast business case, the solution
to the IP multicast requirement now lies in a piece of code whose development
business criteria are completely different, and where the requirements of
IP multicast are but a small subset of a much larger set of competing
requirements.

# Partial PIM deployments

Beyond those differences between forward and backward paths, the coupling
between IGP and PIM also caused, and still continues to cause, confusion
among customers: they continue to believe what seems logical when you are
paying money for a solution you do not even want to understand, namely that they can
enable PIM only where they think they need it and the network will take
care of the rest. And actually, this is how DVMRP or any of the other
"all-in-one" IP Multicast routing protocols work.

In PIM, on the other hand, configuring PIM does not (usually) impact
unicast routing, not even when there is a dedicated topology for
IP multicast through mechanisms such as multi-topology or flex-topologies.
So if the unicast path towards a source crosses a router on which PIM is not
enabled, joins cannot be propagated along that path and the receivers behind it
get no traffic. At least there is no IETF specification defining those
interactions, so any deployment using only IETF-defined mechanisms is left with
the most cumbersome experience for operators. This is to a good extent caused by
work in the IETF being primarily performed by vendors whose main desire is to
get only the interoperability-impacting aspects of specifications right,
whereas they are quite happy to leave any aspects impacting operator experience
open for competition. And operators in the IETF who would want a good integration
of functions are then often deterred by vendors when they ask for
functionality that reduces their OPEX, because it has to compete with feature
requests that can instead be tied directly to new revenue.
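A minimal sketch of this partial-deployment trap, using a hypothetical model in which each router's unicast next hop towards the source is known and PIM is enabled only on some routers: a join travels hop by hop to the RPF neighbor and stops as soon as that neighbor does not run PIM, no matter where else PIM happens to be enabled. Router names and the function are illustrative only.

```python
from typing import Dict, List, Optional, Set, Tuple


def propagate_join(start: str, source_fhr: str, next_hop: Dict[str, str],
                   pim_enabled: Set[str]) -> Tuple[List[str], Optional[str]]:
    """Follow a PIM join from the receiver's DR ('start') towards the source's
    first-hop router, always to the unicast next hop (the RPF neighbor).
    Returns the routers that got join state and the router where the join got
    stuck (None if it reached the source side)."""
    reached = [start]
    router = start
    while router != source_fhr:
        upstream = next_hop.get(router)
        if upstream is None or upstream not in pim_enabled:
            # The RPF neighbor is not a PIM neighbor: the join cannot be sent.
            return reached, router
        reached.append(upstream)
        router = upstream
    return reached, None
```

With `next_hop = {"DR": "P1", "P1": "P2", "P2": "FHR"}`, enabling PIM on DR, P2 and FHR still leaves the join stuck at DR, because unicast routing sends it via P1.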

# Unicast/PIM Synchronization

The non-existent synchronization between PIM and the routing protocols it
relies on for RPF-selection is not only relevant for partial deployments.
Even more important is the dynamic behavior under failure and recovery
scenarios, very much like in IGP/LDP situations. When the IGP
converges faster than PIM Hello signalling, there is unnecessary
interruption of traffic. When a recovering PIM-DR starts to take
responsibility for serving IGMP/MLD-joined IP multicast traffic, it
may create an unnecessary blackhole for several minutes or more, because
it also happens to be a router that gets its routes from BGP and that
may take several minutes to recover all the necessary routes.

These types of problems are not really difficult to solve, but they have
remained unstandardized for an astoundingly long time, resulting in a degree of
fragility in IP Multicast solutions in redundant networks that
makes it unnecessarily easy to be concerned about the complexity
of an IP Multicast deployment. If IP Multicast is deployed at all,
it typically is for mission-critical purposes, and then it needs
to work automatically in all corner cases instead of requiring
operators to have an advanced degree in IP multicast with a PhD
in nerd-knob-tuning.
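One possible mitigation for the recovering-DR case, sketched in Python under explicit assumptions (this is not an IETF-specified mechanism, and the class and field names are hypothetical): the recovering router advertises a PIM DR priority of 0 in its Hellos, so that a converged peer with a higher priority keeps the DR role, until unicast routing is considered converged, here signalled by the BGP End-of-RIB marker or, failing that, an assumed maximum hold time.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RecoveringPimRouter:
    """Hypothetical hold-off logic for a router that has just come back up."""
    configured_dr_priority: int         # DR priority the operator configured
    max_hold_s: float = 300.0           # assumed upper bound on waiting for BGP
    started_at: float = field(default_factory=time.monotonic)
    bgp_end_of_rib: bool = False        # set once BGP peers have sent End-of-RIB

    def routing_converged(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return self.bgp_end_of_rib or (now - self.started_at) >= self.max_hold_s

    def hello_dr_priority(self, now: Optional[float] = None) -> int:
        # 0 is the lowest DR priority, so a converged peer with a higher
        # priority wins the DR election while this router is still recovering.
        return self.configured_dr_priority if self.routing_converged(now) else 0
```

The point of the sketch is that the logic is a few lines of code; what is missing is an agreed, specified behavior.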

Even in simple environments, such as routers with large BGP routing tables,
it is easy for a PIM router to become active and

of multi or flex topologies

On Mon, Mar 13, 2023 at 12:05:49PM -0700, Stig Venaas wrote:
> Dear pim wg
> 
> This draft was presented at our last meeting. There seemed to be
> interest in this in the meeting, but we did not do a poll.
> 
> This starts an adoption call to see if we have enough support to adopt
> the draft.
> Please review and let us know by Friday 24th whether you support
> adoption or not.
> 
> Regards,
> Stig
> 

-- 
---
tte@cs.fau.de
