Re: [RTG-DIR] [Last-Call] Rtgdir last call review of draft-ietf-mpls-p2mp-bfd-06

loa@pi.nu Sun, 25 February 2024 04:53 UTC

Message-ID: <b56587b0d21440ab06bc5dfd7a1621ca.squirrel@pi.nu>
In-Reply-To: <131d1ad4-7210-4aad-bd4c-81ffa29f50f3@joelhalpern.com>
References: <170864700898.14065.4946299905740369098@ietfa.amsl.com> <CA+RyBmXitJr-57P3y_=pYEqwoHeMo4HKqPKOud-ZZ2dQQb_gGQ@mail.gmail.com> <176e1397-5b01-487f-8ae0-078bfe2f8ee7@joelhalpern.com> <CA+RyBmUMit0oc1MZTnQ0apTM8Wj_ra7Tna5JCwwMbtbKOfgyCQ@mail.gmail.com> <ca4d0846-9ac9-4846-8bf6-f2e68787c9c8@joelhalpern.com> <CA+RyBmWUgge9E28Y_CCF1_EQB1YzchWXzDK9P4qYxozmR7KFyw@mail.gmail.com> <52902652-167a-414f-8ca6-c13c80504829@joelhalpern.com> <6ab7c53107501d33f468282fd8a7523d.squirrel@pi.nu> <131d1ad4-7210-4aad-bd4c-81ffa29f50f3@joelhalpern.com>
Date: Sun, 25 Feb 2024 05:53:36 +0100
From: loa@pi.nu
To: Joel Halpern <jmh@joelhalpern.com>
Cc: loa@pi.nu, Greg Mirsky <gregimirsky@gmail.com>, rtg-dir@ietf.org, draft-ietf-mpls-p2mp-bfd.all@ietf.org, last-call@ietf.org, mpls@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-dir/9a62cYJPRW7-948cZL21SfTYRNA>

Joel,

> In my experience (and I did charter the PIM and MPLS working groups in
> ancient history),

I think you are forgiven by now :).

> the distinction between pt2mp for MPLS and IP
> Multicast is not about scale.

Admittedly, but judging from experience with existing networks, I'd say
that there are many more "leaves" in a multicast service network than in
an MPLS p2mp network.

I still think it would be good to know how many MPLS "leaves" it takes to
cause control plane overload, but since BFD control messages might not be
the only factor, a definitive answer might not be possible.
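
To put rough numbers on the control plane question, here is a minimal
sketch, assuming the rate limiter Greg proposes further down is a simple
token bucket (neither the thread nor the proposed text says which kind).
The limiter rate, the burst size and the function names are hypothetical.
It also assumes every leaf retransmits once per second until the root has
processed one of its notifications, and that every admitted packet comes
from a leaf not yet heard from, which is optimistic.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float          # packets admitted per second (hypothetical value)
    burst: float         # bucket capacity in packets (hypothetical value)
    tokens: float = 0.0  # tokens currently available
    last: float = 0.0    # time of the previous refill, in seconds

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then admit one packet if possible."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def seconds_until_all_leaves_heard(leaves: int, rate: float, burst: float) -> int:
    """Seconds until the root has processed one notification per leaf.

    Assumes all pending leaves send at the same instant once per second,
    and that a leaf stops sending once one of its packets is admitted.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    bucket = TokenBucket(rate=rate, burst=burst, tokens=burst)
    pending = leaves
    second = 0
    while pending > 0:
        second += 1
        admitted = sum(1 for _ in range(pending) if bucket.allow(float(second)))
        pending -= admitted
    return second


if __name__ == "__main__":
    for n in (50, 1_000, 10_000):
        print(n, "leaves:", seconds_until_all_leaves_heard(n, rate=100, burst=100), "s")
```

Under these assumptions the limiter caps the control plane load at the
configured rate, and the price is convergence time that grows roughly as
leaves divided by rate. That is the trade-off Greg describes below.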

>  I do not think any of the defining
> documents deal with what scale they aim at.  IP Multicast includes SSM,
> which is pt-2-mpt, and ASM, which is mp-2-mp.
>
> I actually tend to doubt that the BFD error indications will cause data
> plane congestion, given the relative rates and expected scales.  But it
> is our job to say so.  Then other folks can decide whether we have
> sufficiently addressed the questions.

I don't see that we disagree.
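
As a back-of-the-envelope check on the data plane side, the aggregate
inbound load is just leaves times packet rate times packet size. The
sketch below assumes each notification is a BFD Control packet without
authentication (24 octets) carried over UDP and IPv4 with no further
encapsulation; the function name and the optional layer 2 overhead
parameter are hypothetical.

```python
BFD_CONTROL_BYTES = 24   # mandatory section of a BFD Control packet, no auth
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20   # assumes IPv4 without options


def inbound_notification_load_bps(leaves, packets_per_second=1.0,
                                  l2_overhead_bytes=0):
    """Aggregate bit rate arriving at the root if every leaf sends at once."""
    packet_bytes = (BFD_CONTROL_BYTES + UDP_HEADER_BYTES
                    + IPV4_HEADER_BYTES + l2_overhead_bytes)
    return leaves * packets_per_second * packet_bytes * 8


if __name__ == "__main__":
    for leaves in (100, 1_000, 10_000, 100_000):
        bps = inbound_notification_load_bps(leaves)
        print(f"{leaves:>7} leaves: {bps / 1e6:.3f} Mbit/s")
```

With these assumptions, 10,000 leaves at 1 packet per second come to
roughly 4 Mbit/s at the root, which is small next to typical core link
rates but not small as a packet rate punted to a control plane.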
>
> Yours,
>
> Joel
>
> On 2/24/2024 10:56 PM, loa@pi.nu wrote:
>> Greg, Joel, all
>>
>> Traditionally we have distinguished between "p2mp" for MPLS, and
>> "multicast" for IP. An IP multicast service might easily reach a "large
>> number of leaves", while MPLS p2mp is more of a "transport" service
>> where the number of leaves is moderate.
>>
>> I'm not saying that that "moderate number" might not cause the problems
>> Greg and Joel discuss, but it might be an idea to think a bit about
>> the scale. How many leaves are required to cause:
>>
>> - data plane congestion?
>> - control plane overload?
>>
>> Currently I don't see any data plane problems (correct me if I'm wrong),
>> while control plane overload is a possibility.
>>
>> /Loa
>>
>>
>>> Mostly.  There is one other aspect.  You may consider it irrelevant, in
>>> which case we can simply say so.  Can the inbound notifications coming
>>> from a large number of leaves at the same time cause data plane
>>> congestion?
>>>
>>> Yours,
>>>
>>> Joel
>>>
>>> On 2/24/2024 8:44 PM, Greg Mirsky wrote:
>>>> Hi Joel,
>>>> thank you for your quick response. I consider two risks that may
>>>> stress the root's control plane:
>>>>
>>>>    * notifications transmitted by the leaves reporting the failure of
>>>>      the p2mp LSP
>>>>    * notifications transmitted by the root to every leaf closing the
>>>>      Poll sequence
>>>>
>>>> As I understand it, you refer to the former as inbound congestion. The
>>>> latter - outbound. Is that correct? I agree that even the inbound
>>>> stream of notifications may overload the root's control plane. And the
>>>> outbound process further increases the probability of the congestion
>>>> in the control plane. My proposal is to apply a rate limiter to
>>>> control inbound flow of BFD Control messages punted to the control
>>>> plane.
>>>> What would you suggest in addition to the proposed text?
>>>>
>>>> Best regards,
>>>> Greg
>>>>
>>>> On Sat, Feb 24, 2024 at 3:28 PM Joel Halpern
>>>> <jmh.direct@joelhalpern.com> wrote:
>>>>
>>>>      What you say makes sense.  I think we need to acknowledge the
>>>>      inbound congestion risk, even if we choose not to try to
>>>>      ameliorate it.  Your approach seems to address the outbound
>>>>      congestion risk from the root.
>>>>
>>>>      Yours,
>>>>
>>>>      Joel
>>>>
>>>>      On 2/24/2024 6:25 PM, Greg Mirsky wrote:
>>>>>      Hi Joel,
>>>>>      thank you for the clarification. My idea is to use a rate limiter
>>>>>      at the root of the p2mp LSP that may receive notifications from
>>>>>      the leaves affected by the failure. I imagine that the threshold
>>>>>      of the rate limiter might be exceeded and the excess
>>>>>      notifications will be discarded. As a result, some notifications
>>>>>      will be processed by the headend of the p2mp BFD session later,
>>>>>      as the tails transmit notifications periodically until they
>>>>>      receive the BFD Control message with the Final flag set.  Thus,
>>>>>      we cannot avoid the congestion, but we can mitigate the negative
>>>>>      effect it might cause by extending the convergence. Does that
>>>>>      make sense?
>>>>>
>>>>>      Regards,
>>>>>      Greg
>>>>>
>>>>>      On Sat, Feb 24, 2024 at 2:39 PM Joel Halpern
>>>>>      <jmh@joelhalpern.com> wrote:
>>>>>
>>>>>          That covers part of my concern.  But....  A failure near the
>>>>>          root means that a lot of leaves will see failure, and they
>>>>>          will all send notifications converging on the root.  Those
>>>>>          notifications themselves, not just the final messages, seem
>>>>>          able to cause congestion.  I am not sure what can be done
>>>>>          about it, but we aren't allowed to ignore it.
>>>>>
>>>>>          Yours,
>>>>>
>>>>>          Joel
>>>>>
>>>>>          On 2/24/2024 3:34 PM, Greg Mirsky wrote:
>>>>>>          Hi Joel,
>>>>>>          thank you for your support of this work and the suggestion.
>>>>>>          Would the following update of the last paragraph of Section
>>>>>>          5 help:
>>>>>>          OLD TEXT:
>>>>>>             An ingress LSR that has received the BFD Control packet,
>>>>>>             as described above, sends the unicast IP/UDP encapsulated
>>>>>>             BFD Control packet with the Final (F) bit set to the
>>>>>>             egress LSR.
>>>>>>          NEW TEXT:
>>>>>>             As described above, an ingress LSR that has received the
>>>>>>             BFD Control packet sends the unicast IP/UDP encapsulated
>>>>>>             BFD Control packet with the Final (F) bit set to the
>>>>>>             egress LSR.  In some scenarios, e.g., when a p2mp LSP is
>>>>>>             broken close to its root, and the number of egress LSRs
>>>>>>             is significantly large, the control plane of the ingress
>>>>>>             LSR might be congested by the BFD Control packets
>>>>>>             transmitted by egress LSRs and the process of generating
>>>>>>             unicast BFD Control packets, as noted above.  To mitigate
>>>>>>             that, a BFD implementation that supports this
>>>>>>             specification is RECOMMENDED to use a rate limiter of
>>>>>>             received BFD Control packets passed to processing in the
>>>>>>             control plane of the ingress LSR.
>>>>>>
>>>>>>          Regards,
>>>>>>          Greg
>>>>>>
>>>>>>          On Thu, Feb 22, 2024 at 4:10 PM Joel Halpern via Datatracker
>>>>>>          <noreply@ietf.org> wrote:
>>>>>>
>>>>>>              Reviewer: Joel Halpern
>>>>>>              Review result: Ready
>>>>>>
>>>>>>              Hello,
>>>>>>
>>>>>>              I have been selected as the Routing Directorate reviewer
>>>>>>              for this draft. The Routing Directorate seeks to review
>>>>>>              all routing or routing-related drafts as they pass
>>>>>>              through IETF last call and IESG review, and sometimes on
>>>>>>              special request. The purpose of the review is to provide
>>>>>>              assistance to the Routing ADs. For more information about
>>>>>>              the Routing Directorate, please see
>>>>>>              https://wiki.ietf.org/en/group/rtg/RtgDir
>>>>>>
>>>>>>              Although these comments are primarily for the use of the
>>>>>>              Routing ADs, it would be helpful if you could consider
>>>>>>              them along with any other IETF Last Call comments that
>>>>>>              you receive, and strive to resolve them through
>>>>>>              discussion or by updating the draft.
>>>>>>
>>>>>>              Document: draft-name-version
>>>>>>              Reviewer: your-name
>>>>>>              Review Date: date
>>>>>>              IETF LC End Date: date-if-known
>>>>>>              Intended Status: copy-from-I-D
>>>>>>
>>>>>>              Summary:  This document is ready for publication as a
>>>>>>              Proposed Standard.
>>>>>>                  I do have one question that I would appreciate being
>>>>>>                  considered.
>>>>>>
>>>>>>              Comments:
>>>>>>                  The document is clear and readable, with careful
>>>>>>                  references for those needing additional details.
>>>>>>
>>>>>>              Major Issues: None
>>>>>>
>>>>>>              Minor Issues:
>>>>>>                  I note that the security considerations (section 6)
>>>>>>                  do refer to congestion issues caused by excessive
>>>>>>                  transmission of BFD requests.  I wonder if section 5
>>>>>>                  ("Operation of Multipoint BFD with Active Tail over
>>>>>>                  P2MP MPLS LSP") should include a discussion of the
>>>>>>                  congestion implications of multiple tails sending
>>>>>>                  notifications at the rate of 1 per second to the
>>>>>>                  head end, particularly if the failure is near the
>>>>>>                  head end.  While I suspect that the 1 / second rate
>>>>>>                  is low enough for this to be safe, discussion in the
>>>>>>                  document would be helpful.
>>>>>>
>>>>>>
>>
>