Re: [pim] Q on the congestion awareness of routing protocols

Toerless Eckert <tte@cs.fau.de> Fri, 02 December 2022 22:36 UTC

Date: Fri, 02 Dec 2022 23:35:58 +0100
From: Toerless Eckert <tte@cs.fau.de>
To: Greg Shepherd <gjshep@gmail.com>
Cc: routing-discussion@ietf.org, tsv-area@ietf.org, pim@ietf.org, bier@ietf.org
Subject: Re: [pim] Q on the congestion awareness of routing protocols
Message-ID: <Y4p9zpBGhPb9HEy9@faui48e.informatik.uni-erlangen.de>
References: <Y4ovyV074qa3gLSu@faui48e.informatik.uni-erlangen.de> <CABFReBqjYjYwjBYay1fXi+b5Nh=5DoWYny34urbp_zQCmQkbqw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <CABFReBqjYjYwjBYay1fXi+b5Nh=5DoWYny34urbp_zQCmQkbqw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-area/Z8gLyuGxi-lp1mWmfbXu1PLPM9Y>

On Fri, Dec 02, 2022 at 09:37:03AM -0800, Greg Shepherd wrote:
> First, a nit: I don't think it is accurate to classify a control-plane
> scaling failure as congestion control failure.

I am not talking about scale in general, but only about the problems caused by
(PIM) signaling packet loss, because PIM sends packets without any specification
of how to react to congestion on the underlying network path. Should i not call
this something like "PIM congestion control awareness"?

> This does happen quite often in the field. In some very public cases,
> resulting in a PIM network that never converges post link/node failure.
> Non-affected traffic (unicast/BIER) continues to be forwarded without issue.
> 
> I agree that, with PIM signalling for BIER, PIM-over-TCP (RFC6559) would be a
> better fit than RFC7761.
> 
> However, you left out some very relevant work in the BIER WG which has been
> a part of the solution space since the beginning.

;-) Yes, mail was already long enough. Ultimately BIER is just the "use case"
for PIM lite, so i did not want to elaborate more than i felt necessary.

> See https://datatracker.ietf.org/doc/draft-ietf-bier-idr-extensions/
> 
> BIER, being a forwarding plane architecture, is agnostic to the other
> layers, allowing operators to pick their favorite/appropriate poison for
> signaling as well as UFIB propagation.

Right. And the congestion/scale requirements of BIER support in the IGPs
carrying BIER extensions are, i think, well in line with well-understood
pre-existing IGP data. And it only needs to scale to the size of the topology
(primarily number of BFERs == number of PEs in the most common case). Aka: I
don't think IGP signaling for BIER has problems where the IGPs don't have
problems with unicast. Aka: when a DCN can not use ISIS/OSPF but needs to use
lsvr or rift, then we'd obviously also have to look at what to do for BIER, but
i am not sure if i would call these issues "congestion control" related (and
that's maybe what you were trying to allude to initially).

But the PIM lite solution is intended to be used as overlay signaling
on top of an underlying BIER network, and because BIER can scale to support
an arbitrary number of overlay (S,G), it is important that the overlay
signaling does not start to mess up even with a medium number of (S,G). And i
think, given the way networks evolve, PIM lite on a BFIR would be able to scale
to a really high number of (S,G), but not if we run PIM as datagram bursts.
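
To put rough numbers on such a datagram burst, here is a minimal sketch that
redoes the arithmetic from the example in my original mail quoted below. The
constants are the estimates from that example (35 joins per 1500-byte packet),
and the function names are made up for illustration. Rounding up gives 2858
packets, one more than the roughly 2857 quoted.

```python
import math

# Figures taken from the example in the quoted mail; all are rough estimates.
JOINS_PER_PACKET = 35   # (S,G) joins assumed to fit into one packet
PACKET_BYTES = 1500     # assumed size of a full PIM join/prune packet


def packets_for_states(num_states: int) -> int:
    """Packets needed to carry one join (or prune) per (S,G) state."""
    return math.ceil(num_states / JOINS_PER_PACKET)


def burst_at_p_router(num_states: int, num_pe: int) -> tuple[int, int]:
    """Packets and bytes arriving at one P router when every PE re-signals."""
    packets = packets_for_states(num_states) * num_pe
    return packets, packets * PACKET_BYTES


if __name__ == "__main__":
    print(packets_for_states(100_000))
    print(burst_at_p_router(100_000, 100))
```

With 100,000 (S,G) per PE and 100 PEs behind one P router, this comes to
285,800 packets, or roughly 429 MB, sent back-to-back at that router.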

> https://datatracker.ietf.org/doc/draft-ietf-bier-pim-signaling/ should
> include RFC6559 as well as RFC7761. Please take this up with the authors and
> the BIER WG for input.

Sure, will try to find time to also take a look at this. But PIM lite looked
like the simplest starting point for mandating what we already know is
required to make PIM reliable at large numbers of (S,G), i.e. with respect to
PIM message congestion issues. And of course this is not only relevant for
BIER deployments, but it would likely hurt BIER deployments the most
(eliminating benefits we would otherwise get from BIER).
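
As a rough illustration of why the transport matters here, the following
sketch compares the packet volume of periodic datagram soft-state (every join
re-sent each 60 seconds, as described in the quoted mail) with a reliable
transport where state is sent once and only changes follow. It is a
back-of-the-envelope model under stated assumptions, not a model of the actual
RFC6559 message formats: it counts 1500-byte packet equivalents, ignores TCP
segmentation and ACK overhead, and the function names are hypothetical.

```python
import math

JOINS_PER_PACKET = 35   # estimate from the quoted example
REFRESH_SECONDS = 60    # periodic refresh interval from the quoted example


def datagram_packets(num_states: int, duration_s: int) -> int:
    """Packets sent when all joins are re-sent at every refresh interval."""
    # One initial transmission plus one full refresh per elapsed interval.
    rounds = duration_s // REFRESH_SECONDS + 1
    return rounds * math.ceil(num_states / JOINS_PER_PACKET)


def reliable_packets(num_states: int, changed_states: int) -> int:
    """Packets sent when state is sent once and only changes follow."""
    initial = math.ceil(num_states / JOINS_PER_PACKET)
    updates = math.ceil(changed_states / JOINS_PER_PACKET)
    return initial + updates


if __name__ == "__main__":
    print(datagram_packets(100_000, 3600))
    print(reliable_packets(100_000, 350))
```

Under these assumptions, 100,000 (S,G) held for one hour cost about 174,000
packets with periodic refresh, versus fewer than 2,900 when only the initial
state and 350 changed (S,G) are sent.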

Cheers
    Toerless

> - Shep
> 
> On Fri, Dec 2, 2022 at 9:03 AM Toerless Eckert <tte@cs.fau.de> wrote:
> 
> > Dear routing-discussion / TSV folks
> > (sorry for escalating this, but it really bugs me - Cc'ing PIM/BIER)
> >
> > What are the expectations these days for, let's say, a full Internet
> > Standard routing protocol in terms of congestion-safe behavior? And what
> > are the congestion control expectations for new routing protocol RFCs, even
> > if just Proposed Standard?
> >
> > I am asking, because i think that our core IP multicast routing protocol
> > fails miserably on this end, and quite frankly i do not understand how
> > PIM-SM (RFC7761) could have become a full Internet standard given how it
> > has zilch discussion about congestion or loss handling.
> >
> > [ Especially in comparison with a protocol like RFC7450, where TSV did
> >   raise concerns about multicast data plane congestion awareness, and it
> >   was held up for years, and GregS, as the WG chair for the WG responsible
> >   for RFC7450, even had to help co-author RFC8085 to cut through the
> >   congestion control concern-cord. But likely all for the better! ]
> >
> > To quickly summarize the issue with PIM-SM to those who do not know it:
> >
> >                  /- R2 -------- R6 -\
> >      Rcvrs ... R1                    R7 ... Senders
> >                  \- R3 -- R4 -- R5 -/
> >
> >         CE ... PE .. P    P     P    PE  CE ...
> >
> > R1 has, let's say, 100,000 multicast/PIM (S,G) states with sources behind
> > R7, so it has to maintain 100,000 so-called PIM (S,G) joins across the path
> > R2, R6, R7. Let's say an (S,G) join for IPv6 is roughly 38 bytes, so maybe
> > 35 (S,G) per 1500-byte packet, hence about 2857 packets of 1500 bytes to
> > carry all 100,000 (S,G).
> >
> > Assume link R6/R7 fails, the IGP reconverges, and R1 recognizes that it
> > needs to change path, so it sends 2857 PIM-SM packets with prunes to R2 and
> > 2857 PIM-SM packets with joins to R3.
> >
> > Assume R1 is a PE, R2 and R3 are P routers in an SP, and R2/R3 actually
> > connect to, let's say, 100 routers like R1. Now R2 and R3 each get
> > 100 x 2857 1500-byte packets.
> > And there is nothing in the PIM-SM spec that talks about how to throttle
> > this heap of PIM-SM packets. Typically, routers would just send them
> > back-to-back. And those packets repeat every 60 seconds, given how PIM-SM
> > is datagram / periodic soft-state. In fact, if you try to scale this in
> > production networks, you will most likely break a lot more than IP
> > multicast in those routers, because PIM will not only compete badly for
> > control-plane CPU time, but even more so for control-plane to
> > hardware-forwarding time when updating the 100,000 (S,G) hardware
> > forwarding entries.
> >
> > Correct me if i am wrong, but did not the same type of issues with
> > ISIS/OSPF in DCs, caused by the many parallel paths and hence duplication
> > of LSAs, recently lead to the creation of multiple IETF working groups in
> > RTG to solve these issues?
> >
> > In IP multicast, we were well aware of these issues, and they were a core
> > reason not to build a PIM-based MPLS multicast protocol, but to use the
> > TCP-based LDP to specify mLDP (RFC6388). Same thing when various BGP
> > multicast work was done as an alternative to PIM for SPs (BGP also being
> > TCP-based).
> >
> > We did even fix this problem in PIM by specifying RFC6559 (PIM over TCP),
> > but instead of making that mechanism mandatory and the only option
> > for PIM when moving PIM up the IETF standards ladder to RFC7761, that
> > RFC had seemingly fallen into oblivion in the IP Multicast community,
> > because most IP multicast deployments are small enough that these issues
> > do not occur.
> >
> > So, why do i escalate this issue now ?
> >
> > We have a great new multicast architecture called BIER that eliminates
> > all these PIM multicast state issues from the P routers of such large
> > service provider networks by being stateless. But it still leaves the
> > need for overlay signaling, such as PIM operating between the PEs, for
> > example, in the above picture, between the hundreds if not thousands
> > of receiver PEs R1' and sender PEs R7'. In that case you would have
> > PIM directly between those R1'/R7' across multihop paths, leading
> > to even more congestion considerations. And in support of such BIER
> > networks, there is a draft, draft-hb-pim-light, proposed to the PIM WG to
> > optimize PIM explicitly for this type of deployment. And when i said at
> > PIM@IETF115 that such a draft IMHO should only be allowed to proceed when
> > it is written to say it MUST be based on PIM over TCP (RFC6559), all other
> > people responding on the thread said that at best it could be a MAY. Aka:
> > Congestion control optional.
> >
> > Am i a congestion control extremist? I really only want to have
> > scalable, reliable multicast RFCs, especially when they aspire to and
> > go to full IETF standard and are meant to support our next-gen IP Multicast
> > architectures (BIER). I do fully understand that there is a lot
> > of cost pressure on vendor development, and that having procrastinated
> > on implementing, proliferating and deploying PIM over TCP so far (almost a
> > decade!) does make this a less attractive choice in the short term. And the
> > whole purpose of the PIM light draft is of course to reduce the amount of
> > development needed by making PIM more "light" (which is a good thing). But
> > when it carries forward the problems of PIM to another generation of
> > networks (using BIER) that was built especially to scale better, then one
> > should IMHO really become worried. At least i do. But i also struggled to
> > implement datagram PIM processing for 100,000 states in a prior life
> > and then pushed for PIM over TCP...
> >
> > Thanks!
> >     Toerless
> >

-- 
---
tte@cs.fau.de