Re: [Bier] [pim] Q on the congestion awareness of routing protocols

Toerless Eckert <tte@cs.fau.de> Fri, 03 March 2023 02:35 UTC

Date: Fri, 03 Mar 2023 03:35:00 +0100
From: Toerless Eckert <tte@cs.fau.de>
To: "Jeffrey (Zhaohui) Zhang" <zzhang@juniper.net>
Cc: Jon Crowcroft <Jon.Crowcroft@cl.cam.ac.uk>, BIER WG <bier@ietf.org>, "routing-discussion@ietf.org" <routing-discussion@ietf.org>, Matt Mathis <mattmathis=40google.com@dmarc.ietf.org>, "tsv-area@ietf.org" <tsv-area@ietf.org>, Stewart Bryant <stewart.bryant@gmail.com>, pim <pim@ietf.org>
Subject: Re: [Bier] [pim] Q on the congestion awareness of routing protocols
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-area/_DdrixIzMLl9PCvN75xVVCaGASU>

Jeff, *

Sorry, i forgot about this thread because of Christmas and am only remembering it now ;-)

I think your observations are all spot on, but the causality that is being implied is not correct.

I will claim that we will have a very hard time making PIM any faster without TCP
than with TCP. Even getting close to the performance of TCP will be a lot of
work, replicating work that was done so many times before in TCP, or worse yet
coming up with PIM-specific optimizations (not enough of a market to invest a lot in that).

Sure, we can and should also spend some cycles asking TCP experts which TCP feature/profile/CC
we would recommend for use with PORT (RFC 6559), but that's only a fraction of the work we would
need otherwise if we wanted to re-invent the wheel or figure out which non-TCP
alternative reliability sublayer we wanted to recommend/use for PIM.

I remember having had the discussion about TCP in routers something like 10 or
more years ago, and indeed, earlier implementations were very bad and often
sucked up CPU, but even back then, the then-current TCP was a minor consumer of CPU
compared to the actual BGP work of best path calculation and dealing with state.

What can be a typical issue of non-ideal TCP implementations is the incast issue,
when you have a single node (BGP/PIM, whatever) that simultaneously receives
traffic from multiple senders. And i am sure that not all TCP CC options will
perform equally well. But i am quite sure that all of them perform better than
datagram PIM, where we simply get packet loss because of queue overrun on
the receiving PIM router, and then either no forwarding of traffic for 60 seconds
or continued sending of unwanted traffic.
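
To make the queue overrun point concrete, here is a toy model in Python. It is
only a sketch under stated assumptions, not a simulation of any real router: in
every round each sender emits one datagram, the receiver drains a fixed number
of messages per round, and anything that does not fit into the bounded input
queue is dropped. The function name and all parameters are hypothetical.

```python
def datagram_burst_drops(senders: int, msgs_per_sender: int,
                         queue_limit: int, drained_per_round: int) -> int:
    """Count datagrams dropped at a bounded receive queue (toy model).

    Assumptions (all illustrative): synchronized senders, one datagram per
    sender per round, fixed receiver service rate, no retransmission.
    """
    queued = 0
    dropped = 0
    for _ in range(msgs_per_sender):
        # All senders transmit at once; only what fits into the queue is kept.
        space = queue_limit - queued
        accepted = min(senders, space)
        dropped += senders - accepted
        queued += accepted
        # The receiving process works off a fixed number of messages per round.
        queued -= min(queued, drained_per_round)
    return dropped


if __name__ == "__main__":
    # 20 senders bursting into a 50-slot queue that is drained slowly.
    print(datagram_burst_drops(20, 100, 50, 5))
```

With datagrams, every dropped message in this model is simply gone until the
next periodic refresh. A reliable transport would instead slow the senders down
or retransmit, which is the comparison being made above.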

The other part of course is that it's easy to misimplement how the application
(PIM/BGP) interacts with TCP: If the application is not written so that it will
accept traffic from the TCP socket arbitrarily fast, then you easily run into
non-ideal flow control at the TCP level because you exhaust the TCP socket buffer.
So at least you need to make the TCP socket buffer sufficiently larger than
the maximum amount of data you may get from your neighbors so that TCP flow
control can operate at its fastest. Of course, when you have a big LAN with
20 downstream routers sending you, after a reconvergence, PIM joins for the same
100,000 groups, that can be a good amount of buffer memory, but in today's big
routers IMHO not an issue anymore. And that will also ensure that even if the
TCP implementations used are not ideal for incast, at least they will
do retransmissions as fast as possible, because they never need to wait for the
app (PIM). Good incast-friendly TCP implementations would of course share
buffers across sockets and not require memory based on how many TCP connections
you have, but only based on what your aggregate incast bandwidth is.
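
A rough sketch of the buffer arithmetic in Python, for illustration only. The
16 bytes per join is an assumed placeholder, not a number from PIM or RFC 6559,
and the helper names are hypothetical. The socket calls are the standard
SO_RCVBUF option; the kernel is free to clamp or adjust the requested size, so
the code reads back what was actually granted.

```python
import socket


def per_neighbor_buffer_bytes(groups: int, bytes_per_join: int) -> int:
    """Worst-case bytes one neighbor may send in a single burst."""
    return groups * bytes_per_join


def aggregate_buffer_bytes(neighbors: int, groups: int,
                           bytes_per_join: int) -> int:
    """Memory needed if every neighbor socket gets its own full-size buffer."""
    return neighbors * per_neighbor_buffer_bytes(groups, bytes_per_join)


def request_receive_buffer(sock: socket.socket, wanted: int) -> int:
    """Ask for a receive buffer of `wanted` bytes and return what was granted.

    Should be called before connect()/listen() so the size can influence the
    TCP window scaling negotiated for the connection.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, wanted)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    BYTES_PER_JOIN = 16  # assumption for illustration only
    per_peer = per_neighbor_buffer_bytes(100_000, BYTES_PER_JOIN)
    total = aggregate_buffer_bytes(20, 100_000, BYTES_PER_JOIN)
    print(f"per neighbor: {per_peer} bytes, all 20 neighbors: {total} bytes")

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        granted = request_receive_buffer(s, per_peer)
        print(f"kernel granted {granted} bytes")
    finally:
        s.close()
```

Under that assumed message size, the per-socket approach costs tens of
megabytes for the 20-router LAN case, which is the "good amount of buffer
memory" that is no longer a problem on big routers. A shared buffer sized by
aggregate incast bandwidth would need less.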

There are of course optimizations that can be done within PIM itself
to save even more memory, but i really don't want to start explaining those
details unless i am really persuaded it's necessary. Which i am totally not
right now.

So, in summary: I'd rather go for PORT, which gives me reliability instead of issues
caused by random burst join loss, even if vendors then may take one or two releases to
optimize convergence speed under high load and/or lots of downstream routers! It's just
going to be so much less work for years to come than tinkering on a PIM-datagram
specific solution.

Cheers
    Toerless

P.S.: The one past datapoint of interest you did not mention:

When we designed mLDP as part of MPLS for multicast in the 2000s, we initially
looked at Dino's 1998 implementation of MPLS for PIM, which actually was released
to customers but only in software routers of course, so very few people had
actually looked at it. We decided against it and for LDP, to a large
extent because of TCP.

[ Of course now the use of LDP in mLDP hurts given how customers want to get rid
  of LDP and think that mLDP must also go because it's LDP. If you jump on
  a protocol as a buzzword train you thrive and perish with it i guess, but
  back then if it had been just PIM over TCP with MPLS labels, it would
  have been less well accepted *sigh* ;-)]


On Sat, Dec 17, 2022 at 02:32:36PM +0000, Jeffrey (Zhaohui) Zhang wrote:
> Hi Toerless,
> 
> Some late comments - first specifically on the PIM topic and then extend to the general point of congestion aware routing protocols.
> 
> The TCP-based PIM protocol RFC6559 was designed to handle the congestion-on-scale problem. However, most PIM deployments have not come to the point where scaling becomes an acute problem for which the RFC6559 solution must be used, so its deployment has been limited.
> 
> The congestion-on-scale point was also taken when BGP-MVPN (RFC 6514) was developed. The Rosen/PIM-MVPN was very popular and there was a big debate when BGP-MVPN was proposed. Good that it eventually got standardized and became mainstream (at least for new deployments).
> 
> Someone already brought up a point of BGP updates being potentially slow. I've also heard about that many times (sometimes from known BGP experts), including when I work on BGP based multicast (beyond RFC 6559).
> 
> However, there are also protocols that rely on fast convergence even though they use BGP. EVPN is one example.
> 
> Then mobile network's control plane relies on UDP-based GTP-C. I wonder why they're not concerned with congestion in scaled situations.
> 
> For some 5G use cases I was proposing to use BGP to propagate routing information in place of some mobile user session information, and I often get asked "can you do that very fast"?
> 
> So, I am struggling with these two things:
> 
> - TCP-based solutions reduce protocol messages, but BGP may be deemed slow (or should I say with uncontrolled delay), though BGP-based EVPN actually relies on fast exchange of (at least some) BGP routes (e.g., for DF election).
> - Other solutions may lead to lots of protocol messages including refreshes, but the mobile operators seem to have been fine with UDP-based control plane.
> 
> As for the "a totally non-congestion aware sending of protocol packets should not be permitted anymore for new RFC IMHO and i am just baffled how this is permitted anymore by the IETF. Where is adult supervision by TSV when we need it" comment below, I have the following view:
> 
> - I am not sure if this involves TSV. A protocol sending lots of protocol packets is no different from an application sending lots of application traffic as far as transport is concerned. It is ultimately an issue with protocol design itself.
> - There are situations where a non-TCP based solution is needed even when a parallel TCP-based option is also present, so we can not simply disallow the former. We can discuss examples separately (one example is actually PIM as BIER overlay vs mLDP/BGP as BIER overlay).
> 
> Thanks.
> 
> Jeffrey
> 
> 
> -----Original Message-----
> From: pim <pim-bounces@ietf.org> On Behalf Of Toerless Eckert
> Sent: Friday, December 9, 2022 8:47 AM
> To: Jon Crowcroft <Jon.Crowcroft@cl.cam.ac.uk>
> Cc: BIER WG <bier@ietf.org>; routing-discussion@ietf.org; Matt Mathis <mattmathis=40google.com@dmarc.ietf.org>; tsv-area@ietf.org; Stewart Bryant <stewart.bryant@gmail.com>; pim <pim@ietf.org>
> Subject: Re: [pim] Q on the congestion awareness of routing protocols
> 
> 
> On Tue, Dec 06, 2022 at 07:15:31AM +0000, Jon Crowcroft wrote:
> > path exploration? but consider the shadow pricing...
> >
> > the tradeoff between convergence rate and congestion control seems to 
> > be something that ought to be put on a more systematic grounding
> 
> You folks are all thinking way beyond the point i was making and looking for support:
> 
> In PIM, we have potentially gigantic bursts of datagrams without any specification of pacing sent to routers across a network core (where path congestion is easily likely). Such a totally non-congestion aware sending of protocol packets should not be permitted anymore for new RFC IMHO and i am just baffled how this is permitted anymore by the IETF. Where is adult supervision by TSV when we need it ;-)
> 
> Yes, the incast issue is an interesting aspect, but i have not seen good simulations of whether / to what extent it would happen in the PIM/BGP cases, but i would bet any sum that a TCP solution, as bad as it may be, will outperform the no-congestion-control periodic burst solution of (datagram) PIM.
> 
> Cheers
>     Toerless
> 

-- 
---
tte@cs.fau.de