Re: [pim] Q on the congestion awareness of routing protocols

Greg Shepherd <gjshep@gmail.com> Fri, 02 December 2022 17:37 UTC

References: <Y4ovyV074qa3gLSu@faui48e.informatik.uni-erlangen.de>
In-Reply-To: <Y4ovyV074qa3gLSu@faui48e.informatik.uni-erlangen.de>
Reply-To: gjshep@gmail.com
From: Greg Shepherd <gjshep@gmail.com>
Date: Fri, 02 Dec 2022 09:37:03 -0800
Message-ID: <CABFReBqjYjYwjBYay1fXi+b5Nh=5DoWYny34urbp_zQCmQkbqw@mail.gmail.com>
Subject: Re: [pim] Q on the congestion awareness of routing protocols
To: Toerless Eckert <tte@cs.fau.de>
Cc: routing-discussion@ietf.org, tsv-area@ietf.org, pim@ietf.org, bier@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-area/zRSFz8tSJ3Ko1AjJ1vCCWSFC1q8>

First, a nit: I don't think it is accurate to classify a control-plane
scaling failure as a congestion control failure.

This does happen quite often in the field. In some very public cases, it
has resulted in a PIM network that never converges after a link/node
failure. Unaffected traffic (unicast/BIER) continues to be forwarded
without issue.

I agree that, for BIER signaling with PIM, PIM over TCP (RFC6559) would be
a good (better) fit than RFC7761.

However, you left out some very relevant work in the BIER WG which has been
a part of the solution space since the beginning.

See https://datatracker.ietf.org/doc/draft-ietf-bier-idr-extensions/

BIER, being a forwarding plane architecture, is agnostic to the other
layers, allowing operators to pick their favorite/appropriate poison for
signaling as well as UFIB propagation.

https://datatracker.ietf.org/doc/draft-ietf-bier-pim-signaling/ should
include RFC6559 as well as RFC7761. Please take this up with the authors and
the BIER WG for input.

- Shep

On Fri, Dec 2, 2022 at 9:03 AM Toerless Eckert <tte@cs.fau.de> wrote:

> Dear routing-discussion / TSV folks
> (sorry for escalating this, but it really bugs me - Cc'ing PIM/BIER)
>
> These days, what are the expectations of, let's say, a full Internet
> Standard routing protocol in terms of congestion-safe behavior? And what
> are the congestion control expectations for new routing protocol RFCs,
> even if they are just Proposed Standards?
>
> I am asking because I think that our core IP multicast routing protocol
> fails miserably on this front, and quite frankly I do not understand how
> PIM-SM (RFC7761) could have become a full Internet Standard given that it
> has zilch discussion of congestion or loss handling.
>
> [Especially when compared with a protocol like RFC7450, where TSV did
>   raise concerns about multicast data-plane congestion awareness and it
>   was held up for years, and GregS, as chair of the WG responsible for
>   RFC7450, even had to help co-author RFC8085 to cut through the
>   congestion control concern-cord. But likely all for the better!]
>
> To quickly summarize the issue with PIM-SM for those who do not know it:
>
>                  /- R2 -------- R6 -\
>      Rcvrs ... R1                    R7 ... Senders
>                  \- R3 -- R4 -- R5 -/
>
>         CE ... PE .. P    P     P    PE  CE ...
>
> R1 has, let's say, 100,000 multicast/PIM (S,G) states with sources
> behind R7, so it has to maintain 100,000 so-called PIM (S,G) joins across
> the path R2, R6, R7. Let's say an (S,G) join for IPv6 is roughly 38
> bytes, so maybe 35 (S,G) fit in a 1500-byte packet, which means roughly
> 2,857 packets of 1500 bytes to carry all 100,000 (S,G).
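As a quick check of this arithmetic, here is a minimal Python sketch that uses only the figures assumed in the text (about 38 bytes per IPv6 (S,G) entry, 35 entries per 1500-byte packet); the constant names and the `packets_needed` helper are illustrative. Rounding 100,000 / 35 up gives 2,858 packets, so the 2857 in the text is the rounded-down quotient; the order of magnitude is what matters.

```python
import math

# Figures from the example in the text; rough estimates, not protocol constants.
NUM_SG_STATES = 100_000   # (S,G) states on R1 with sources behind R7
BYTES_PER_SG = 38         # rough size of one IPv6 (S,G) entry in a Join/Prune
PACKET_SIZE = 1500        # bytes per PIM Join/Prune packet
SG_PER_PACKET = 35        # the text's allowance once IPv6/PIM headers are subtracted


def packets_needed(states: int, per_packet: int) -> int:
    """Packets required to carry `states` entries, `per_packet` entries at a time."""
    return math.ceil(states / per_packet)


# Upper bound if the entire packet held (S,G) entries and nothing else.
max_sg_per_packet = PACKET_SIZE // BYTES_PER_SG               # 39
total_packets = packets_needed(NUM_SG_STATES, SG_PER_PACKET)  # 2858
print(max_sg_per_packet, total_packets)                       # 39 2858
```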
>
> Assume link R6/R7 fails: the IGP reconverges and R1 recognizes that it
> needs to change path, so it sends 2,857 PIM-SM packets with prunes to R2
> and 2,857 PIM-SM packets with joins to R3.
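The path change boils down to R1's RPF neighbor toward the sources changing from R2 to R3. A minimal sketch, assuming every link in the diagram has equal IGP cost and using a plain breadth-first search as a stand-in for the IGP's SPF computation; the `LINKS` list and the `rpf_neighbor` helper are illustrative, not taken from any PIM implementation.

```python
from collections import deque

# Illustrative model of the diagram above; every link is assumed to have the
# same IGP cost, since the text gives no metrics.
LINKS = [("R1", "R2"), ("R2", "R6"), ("R6", "R7"),
         ("R1", "R3"), ("R3", "R4"), ("R4", "R5"), ("R5", "R7")]


def rpf_neighbor(links, src, dst):
    """Return src's next hop on a fewest-hops path to dst (None if unreachable)."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    # BFS outward from dst: parent[x] ends up being x's next hop toward dst.
    parent = {dst: None}
    queue = deque([dst])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, []):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return parent.get(src)


before = rpf_neighbor(LINKS, "R1", "R7")
# Link R6/R7 fails: drop it and recompute, as the IGP would after reconverging.
after = rpf_neighbor([link for link in LINKS if link != ("R6", "R7")], "R1", "R7")

if before != after:
    # Every (S,G) join moves: prunes toward the old neighbor, joins toward the new one.
    print(f"send prunes to {before}, joins to {after}")  # send prunes to R2, joins to R3
```

Because the same neighbor change applies to all 100,000 (S,G) states at once, the whole join/prune set is re-signaled in one burst.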
>
> Assume R1 is a PE, R2 and R3 are P routers in an SP, and R2/R3 actually
> connect to, let's say, 100 routers like R1. Now R2 and R3 each get
> 100 x 2,857 packets of 1500 bytes.
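The resulting burst at R2 (prunes) and at R3 (joins) follows directly. Again this is a sketch using only the example's figures (100 downstream routers, 2857 packets each, 1500 bytes per packet), with illustrative names. The totals apply to each of R2 and R3.

```python
# Figures from the example in the text; none of these are protocol constants.
DOWNSTREAM_ROUTERS = 100    # routers like R1 attached to R2/R3
PACKETS_PER_ROUTER = 2857   # Join/Prune packets each one sends after the path change
PACKET_SIZE = 1500          # bytes

burst_packets = DOWNSTREAM_ROUTERS * PACKETS_PER_ROUTER  # 285,700 packets
burst_bytes = burst_packets * PACKET_SIZE                # 428,550,000 bytes
burst_gbit = burst_bytes * 8 / 1e9                       # ~3.43 Gbit

print(f"{burst_packets:,} packets, {burst_bytes:,} bytes, {burst_gbit:.2f} Gbit")
# 285,700 packets, 428,550,000 bytes, 3.43 Gbit
```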
>
> And there is nothing in the PIM-SM spec that talks about how to throttle
> this heap of PIM-SM packets. Typically, routers would just send them
> back-to-back. And those packets repeat every 60 seconds, given that PIM-SM
> is datagram-based, periodic soft-state. In fact, if you try to scale this
> in production networks, you will most likely break a lot more than IP
> multicast in those routers, because PIM will not only compete badly for
> control-plane CPU time, but even more so for control-plane to
> hardware-forwarding time when updating the 100,000 (S,G) hardware
> forwarding entries.
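To put the refresh and throttling point in numbers, the sketch below computes the steady-state Join/Prune refresh load implied by the example, and how long a 2857-packet burst would take under a few per-router pacing caps. The caps, constant names and helpers are hypothetical; as the text notes, the PIM-SM spec says nothing about throttling.

```python
# Figures from the example in the text; the pacing caps below are hypothetical.
REFRESH_PERIOD_S = 60       # periodic Join/Prune refresh interval cited in the text
PACKETS_PER_ROUTER = 2857   # Join/Prune packets per router like R1
DOWNSTREAM_ROUTERS = 100    # routers like R1 behind R2/R3
PACKET_SIZE = 1500          # bytes


def refresh_rate_pps(packets: int, period_s: float) -> float:
    """Average packets/s needed just to re-send `packets` every `period_s`."""
    return packets / period_s


def drain_time_s(packets: int, max_pps: float) -> float:
    """Seconds to send `packets` when the sender is paced at `max_pps`."""
    return packets / max_pps


per_router = refresh_rate_pps(PACKETS_PER_ROUTER, REFRESH_PERIOD_S)  # ~47.6 pkt/s
at_r2 = per_router * DOWNSTREAM_ROUTERS                             # ~4,762 pkt/s
mbit_s = at_r2 * PACKET_SIZE * 8 / 1e6                              # ~57.1 Mbit/s
print(f"steady refresh at R2/R3: {at_r2:,.0f} pkt/s, {mbit_s:.1f} Mbit/s")

for cap in (20, 100, 1000):  # hypothetical per-router pacing caps, pkt/s
    t = drain_time_s(PACKETS_PER_ROUTER, cap)
    print(f"cap {cap:>4} pkt/s: burst drains in {t:6.1f} s, "
          f"keeps up with refresh: {cap > per_router}")
```

Any cap below about 47.6 pkt/s per router cannot even sustain the periodic refresh, so for datagram soft state, pacing alone trades bursts for stale state.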
>
> Correct me if I am wrong, but didn't the same type of issue in IS-IS/OSPF
> in data centers, caused by so many parallel paths and hence duplicated
> LSA flooding, recently lead to the creation of multiple IETF working
> groups in RTG to solve these issues?
>
> In IP multicast, we were well aware of these issues, and they were a core
> reason not to build a PIM-based MPLS multicast protocol, but to use the
> TCP-based LDP to specify mLDP (RFC6388). Same thing when various BGP
> multicast work was done as an alternative to PIM for SPs (BGP also being
> TCP-based).
>
> We even fixed this problem in PIM by specifying RFC6559 (PIM over TCP),
> but instead of that mechanism being made mandatory, and the only option
> for PIM, when PIM moved up the IETF standards ladder to RFC7761, that RFC
> seemingly fell into obscurity in the IP multicast community, because
> most IP multicast deployments are small enough that these issues do not
> occur.
>
> So, why do I escalate this issue now?
>
> We have a great new multicast architecture called BIER that eliminates
> all these PIM multicast state issues from the P routers of such large
> service provider networks by being stateless. But it still leaves the
> need for overlay signaling, such as PIM, operating between the PEs, such
> as, in the picture above, the hundreds if not thousands of receiver PEs
> R1' and sender PEs R7'. In that case you would have PIM directly between
> those R1'/R7' across multihop paths, leading to even more congestion
> considerations. And in support of such BIER networks, there is a draft,
> draft-hb-pim-light, proposed to the PIM WG to optimize PIM explicitly for
> this type of deployment. And when I said in PIM@IETF115 that such a draft
> IMHO should only be allowed to proceed when it is written to say it MUST
> be based on PIM over TCP (RFC6559), all other people responding on the
> thread said at best it could be a MAY. Aka: congestion control optional.
>
> Am I a congestion control extremist? I really only want scalable,
> reliable multicast RFCs, especially when they aspire to and go to full
> IETF Standard and are meant to support our next-gen IP multicast
> architectures (BIER). I do fully understand that there is a lot of cost
> pressure on vendor development, and having procrastinated on
> implementing, proliferating and deploying PIM over TCP so far (almost a
> decade!) does make this a less attractive choice short term. And the
> whole purpose of the PIM light draft, of course, is to reduce the amount
> of development needed by making PIM more "light" (which is a good
> thing). But when it carries forward the problems of PIM to another
> generation of networks (using BIER) that was especially built to scale
> better, then one should IMHO really become worried. At least I do. But I
> also struggled to implement datagram PIM processing for 100,000 states
> in a prior life and then pushed for PIM over TCP...
>
> Thanks!
>     Toerless