Re: Q on the congestion awareness of routing protocols

Jon Crowcroft <jon.crowcroft@cl.cam.ac.uk> Fri, 02 December 2022 17:56 UTC

MIME-Version: 1.0
References: <Y4ovyV074qa3gLSu@faui48e.informatik.uni-erlangen.de>
In-Reply-To: <Y4ovyV074qa3gLSu@faui48e.informatik.uni-erlangen.de>
From: Jon Crowcroft <jon.crowcroft@cl.cam.ac.uk>
Date: Fri, 02 Dec 2022 17:56:35 +0000
Message-ID: <CAEeTejLa8sdJVU_2OfTo=ZgWRY-kv_7M=xiR-bLyBEXhSDP=Eg@mail.gmail.com>
Subject: Re: Q on the congestion awareness of routing protocols
To: Toerless Eckert <tte@cs.fau.de>
Cc: routing-discussion@ietf.org, tsv-area@ietf.org, pim@ietf.org, bier@ietf.org
Content-Type: multipart/alternative; boundary="000000000000aa991f05eedc0f97"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-area/4SaKnju-wUSRYVnHoEJuQDEqE5Q>

Gonna say, ironically, one early use of multicast was a proposal to use SRM
instead of a mesh of TCP connections for iBGP... so some people do think
about scaling control plane traffic in the presence of congestion,
sometimes :-)

On Fri, 2 Dec 2022, 17:03 Toerless Eckert, <tte@cs.fau.de> wrote:

> Dear routing-discussion / TSV folks
> (sorry for escalating this, but it really bugs me - Cc'ing PIM/BIER)
>
> What are the expectations these days for, let's say, a full Internet
> Standard routing protocol in terms of congestion-safe behavior? And what
> are the congestion control expectations for new routing protocol RFCs,
> even if just Proposed Standard?
>
> I am asking because I think that our core IP multicast routing protocol
> fails miserably on this end, and quite frankly I do not understand how
> PIM-SM (RFC7761) could have become a full Internet Standard given that it
> has zilch discussion about congestion or loss handling.
>
> [ Especially when compared with a protocol like RFC7450, where TSV did
>   raise concerns about multicast data plane congestion awareness, and it
>   was held up for years, and GregS, as the WG chair for the WG responsible
>   for RFC7450, even had to help co-author RFC8085 to cut through the
>   congestion control concern-cord. But likely all for the better! ]
>
> To quickly summarize the issue with PIM-SM to those who do not know it:
>
>                  /- R2 -------- R6 -\
>      Rcvrs ... R1                    R7 ... Senders
>                  \- R3 -- R4 -- R5 -/
>
>         CE ... PE .. P    P     P    PE  CE ...
>
> R1 has, let's say, 100,000 multicast/PIM (S,G) states with sources behind
> R7, so it has to maintain 100,000 so-called PIM (S,G) joins across the
> path R2, R6, R7. Let's say an (S,G) join for IPv6 is roughly 38 bytes, so
> maybe 35 (S,G) per 1500-byte packet, and about 2857 packets of 1500 bytes
> to carry all 100,000 (S,G).
>
> Assume link R6/R7 fails, IGP reconverges, R1 recognizes that it needs to
> change path, so it sends 2857 PIM-SM packets with prunes to R2 and 2857
> PIM-SM packets with joins to R3.
>
> Assume R1 is a PE, R2 and R3 are P routers in an SP, and actually R2/R3
> connect to, let's say, 100 routers like R1. Now R2 and R3 get 100 x 2857
> 1500-byte packets.
>
> And there is nothing in the PIM-SM spec that talks about how to throttle
> this heap of PIM-SM packets. Typically, routers would just send them
> back-to-back. And those packets repeat every 60 seconds, given how PIM-SM
> is datagram / periodic soft-state. In fact, if you try to scale this in
> production networks, you will most likely fail a lot more than IP
> multicast in those routers, because PIM will compete badly not only for
> control-plane CPU time, but even more so for control-plane to
> hardware-forwarding time when updating the 100,000 (S,G) hardware
> forwarding entries.
>
> Correct me if I am wrong, but didn't the same type of issue in IS-IS/OSPF
> in DCs, caused by so many parallel paths and hence duplication of LSAs,
> recently lead to the creation of multiple IETF working groups in RTG to
> solve it?
>
> In IP multicast, we were well aware of these issues, and they were a core
> reason not to build a PIM-based MPLS multicast protocol, but to use the
> TCP-based LDP to specify mLDP (RFC6388). Same thing when various BGP
> multicast work was done as an alternative to PIM for SPs (BGP also being
> TCP based).
>
> We did even fix this problem in PIM by specifying RFC6559 (PIM over TCP),
> but instead of that mechanism being made mandatory and the only option
> for PIM when moving PIM up the IETF standards ladder to RFC7761, that
> RFC had seemingly fallen into ignorance in the IP Multicast community,
> because most IP multicast deployments are small enough that these issues
> do not occur.
>
> So, why do I escalate this issue now?
>
> We have a great new multicast architecture called BIER that eliminates
> all these PIM multicast state issues from the P routers of such large
> service provider networks by being stateless. But it still leaves the
> need for overlay signaling, such as with PIM, to operate between the
> PEs, such as, in the above picture, the hundreds if not thousands
> of receiver PEs R1' and sender PEs R7'. In which case you would have
> PIM directly between those R1'/R7' across multihop paths, leading
> to even more congestion considerations. And in support of such BIER
> networks, there is a draft, draft-hb-pim-light, proposed to PIM-WG to
> optimize PIM explicitly for this type of deployment. And when I said in
> PIM@IETF115 that such a draft IMHO should only be allowed to proceed when
> it is written to say it MUST be based on PIM over TCP (RFC6559), all
> other people responding on the thread said at best it could be a MAY.
> Aka: congestion control optional.
>
> Am I a congestion control extremist? I really only want to have
> scalable, reliable multicast RFCs, especially when they aspire and
> go to full IETF standard and are meant to support our next-gen IP
> Multicast architectures (BIER). I do fully understand that there is a lot
> of cost pressure on vendor development, and having procrastinated
> on implementing, proliferating and deploying PIM over TCP so far (almost
> a decade!) does make this a less attractive choice short term. And the
> whole purpose of the PIM light draft of course is to reduce the amount of
> development needed by making PIM more "light" (which is a good thing).
> But when it carries forward the problems of PIM to another generation of
> networks (using BIER) that was especially built to scale better, then one
> should IMHO really become worried. At least I do. But I also struggled to
> implement datagram PIM processing for 100,000 states in a prior life
> and then pushed for PIM over TCP...
>
> Thanks!
>     Toerless
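
The back-of-envelope numbers in the quoted mail can be reproduced with a
short Python sketch. It uses only the figures given there (35 (S,G) per
1500-byte packet, 100,000 states per PE, 100 PEs per P router, 60-second
refresh). The constant and function names are made up for illustration,
every packet is assumed to be a full 1500 bytes, and rounding up gives 2858
packets per PE rather than the rounded 2857 quoted in the mail.

```python
import math

# Figures taken from the quoted mail; the names are illustrative only.
SG_PER_PACKET = 35        # (S,G) entries per 1500-byte PIM packet
STATES_PER_PE = 100_000   # (S,G) states held by one PE such as R1
PE_COUNT = 100            # PEs attached to each P router (R2/R3)
PACKET_BYTES = 1500       # assumption: every packet is full size
REFRESH_SECONDS = 60      # periodic soft-state refresh interval


def packets_for(states, per_packet=SG_PER_PACKET):
    """Packets needed to carry `states` (S,G) entries, rounded up."""
    return math.ceil(states / per_packet)


def burst_summary(states=STATES_PER_PE, pe_count=PE_COUNT):
    """Load arriving at one P router when every PE signals all its state."""
    per_pe = packets_for(states)
    fan_in = per_pe * pe_count
    burst_bytes = fan_in * PACKET_BYTES
    return {
        "packets_per_pe": per_pe,
        "fan_in_packets": fan_in,
        "burst_bytes": burst_bytes,
        # Average rate if the refresh were spread evenly over the interval;
        # the mail's point is that routers send back-to-back instead.
        "avg_refresh_mbps": burst_bytes * 8 / REFRESH_SECONDS / 1e6,
    }


if __name__ == "__main__":
    for key, value in burst_summary().items():
        print(f"{key}: {value}")
```

Under these assumptions each P router sees 285,800 packets (about 429 MB)
per round, which would average roughly 57 Mbit/s if paced evenly over the 60
seconds. Nothing in the scenario described in the mail paces it that way.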