Re: [RTG-DIR] Rtgdir last call review of draft-ietf-mpls-p2mp-bfd-06

Joel Halpern <jmh.direct@joelhalpern.com> Sun, 25 February 2024 04:29 UTC

To: Greg Mirsky <gregimirsky@gmail.com>
Cc: rtg-dir@ietf.org, draft-ietf-mpls-p2mp-bfd.all@ietf.org, last-call@ietf.org, mpls@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-dir/o-zZJ8VSEGBCa8vDkNTa5QOGUBs>

That seems to state the situation fairly.  I can live with it.
Yours,
Joel

On 2/24/2024 9:24 PM, Greg Mirsky wrote:
> Now I see where the disconnect was, thank you for pointing it out to me.
> As I understand it, the notifications from leaves to the root will not 
> use DetNet resources and, as a result, would not congest DetNet flows 
> although they may have a negative effect on other flows. I've updated the text as
> follows:
> NEW TEXT:
>    As described above, an ingress LSR that has received the BFD Control
>    packet sends the unicast IP/UDP encapsulated BFD Control packet with
>    the Final (F) bit set to the egress LSR.  In some scenarios, e.g.,
>    when a p2mp LSP is broken close to its root, and the number of egress
>    LSRs is significantly large, the root might receive a large number of
>    notifications.  The notifications from leaves to the root will not
>    use DetNet resources and, as a result, will not congest DetNet flows,
>    although they may negatively affect other flows. However, the
>    control plane of the ingress LSR might be congested by the BFD
>    Control packets transmitted by egress LSRs and the process of
>    generating unicast BFD Control packets, as noted above.  To mitigate
>    that, a BFD implementation that supports this specification is
>    RECOMMENDED to use a rate limiter of received BFD Control packets
>    passed to the ingress LSR’s control plane for processing.
>
> What are your thoughts?
>
> Regards,
> Greg
>
> On Sat, Feb 24, 2024 at 5:56 PM Joel Halpern <jmh@joelhalpern.com> wrote:
>
>     Mostly.  There is one other aspect.  You may consider it
>     irrelevant, in which case we can simply say so.  Can the inbound
>     notifications coming from a large number of leaves at the same
>     time cause data plane congestion?
>
>     Yours,
>
>     Joel
>
>     On 2/24/2024 8:44 PM, Greg Mirsky wrote:
>>     Hi Joel,
>>     thank you for your quick response. I consider two risks that may
>>     stress the root's control plane:
>>
>>       * notifications transmitted by the leaves reporting the failure
>>         of the p2mp LSP
>>       * notifications transmitted by the root to every leaf closing
>>         the Poll sequence
>>
>>     As I understand it, you refer to the former as inbound
>>     congestion. The latter - outbound. Is that correct? I agree that
>>     even the inbound stream of notifications may overload the root's
>>     control plane. And the outbound process further increases the
>>     probability of the congestion in the control plane. My proposal
>>     is to apply a rate limiter to control inbound flow of BFD Control
>>     messages punted to the control plane.
>>     What would you suggest in addition to the proposed text?
>>
>>     Best regards,
>>     Greg
>>
>>     On Sat, Feb 24, 2024 at 3:28 PM Joel Halpern
>>     <jmh.direct@joelhalpern.com> wrote:
>>
>>         What you say makes sense.  I think we need to acknowledge the
>>         inbound congestion risk, even if we choose not to try to
>>         ameliorate it.  Your approach seems to address the outbound
>>         congestion risk from the root.
>>
>>         Yours,
>>
>>         Joel
>>
>>         On 2/24/2024 6:25 PM, Greg Mirsky wrote:
>>>         Hi Joel,
>>>         thank you for the clarification. My idea is to use a rate
>>>         limiter at the root of the p2mp LSP that may
>>>         receive notifications from the leaves affected by the
>>>         failure. I imagine that the threshold of the rate limiter
>>>         might be exceeded and the notifications will be discarded.
>>>         As a result, some notifications will be processed by the
>>>         headend of the p2mp BFD session later, as the tails transmit
>>>         notifications periodically until they receive the BFD Control
>>>         message with the Final flag set.  Thus, we cannot avoid the
>>>         congestion, but we can mitigate its negative effect at the
>>>         cost of extending the convergence. Does that make sense?
>>>
>>>         Regards,
>>>         Greg
>>>
>>>         On Sat, Feb 24, 2024 at 2:39 PM Joel Halpern
>>>         <jmh@joelhalpern.com> wrote:
>>>
>>>             That covers part of my concern.  But.... A failure near
>>>             the root means that a lot of leaves will see failure,
>>>             and they will all send notifications converging on the
>>>             root.  Those notifications themselves, not just the
>>>             final messages, seem able to cause congestion.  I am not
>>>             sure what can be done about it, but we aren't allowed to
>>>             ignore it.
>>>
>>>             Yours,
>>>
>>>             Joel
>>>
>>>             On 2/24/2024 3:34 PM, Greg Mirsky wrote:
>>>>             Hi Joel,
>>>>             thank you for your support of this work and the
>>>>             suggestion. Would the following update of the last
>>>>             paragraph of Section 5 help:
>>>>             OLD TEXT:
>>>>                An ingress LSR that has received the BFD Control packet,
>>>>                as described above, sends the unicast IP/UDP
>>>>                encapsulated BFD Control packet with the Final (F) bit
>>>>                set to the egress LSR.
>>>>             NEW TEXT:
>>>>                As described above, an ingress LSR that has received the
>>>>                BFD Control packet sends the unicast IP/UDP encapsulated
>>>>                BFD Control packet with the Final (F) bit set to the
>>>>                egress LSR.  In some scenarios, e.g., when a p2mp LSP is
>>>>                broken close to its root, and the number of egress LSRs
>>>>                is significantly large, the control plane of the ingress
>>>>                LSR might be congested by the BFD Control packets
>>>>                transmitted by egress LSRs and the process of generating
>>>>                unicast BFD Control packets, as noted above.  To
>>>>                mitigate that, a BFD implementation that supports this
>>>>                specification is RECOMMENDED to use a rate limiter of
>>>>                received BFD Control packets passed to processing in the
>>>>                control plane of the ingress LSR.
>>>>
>>>>             Regards,
>>>>             Greg
>>>>
>>>>             On Thu, Feb 22, 2024 at 4:10 PM Joel Halpern via
>>>>             Datatracker <noreply@ietf.org> wrote:
>>>>
>>>>                 Reviewer: Joel Halpern
>>>>                 Review result: Ready
>>>>
>>>>                 Hello,
>>>>
>>>>                 I have been selected as the Routing Directorate
>>>>                 reviewer for this draft. The Routing Directorate seeks
>>>>                 to review all routing or routing-related drafts as they
>>>>                 pass through IETF last call and IESG review, and
>>>>                 sometimes on special request. The purpose of the review
>>>>                 is to provide assistance to the Routing ADs. For more
>>>>                 information about the Routing Directorate, please see
>>>>                 https://wiki.ietf.org/en/group/rtg/RtgDir
>>>>
>>>>                 Although these comments are primarily for the use of
>>>>                 the Routing ADs, it would be helpful if you could
>>>>                 consider them along with any other IETF Last Call
>>>>                 comments that you receive, and strive to resolve them
>>>>                 through discussion or by updating the draft.
>>>>
>>>>                 Document: draft-name-version
>>>>                 Reviewer: your-name
>>>>                 Review Date: date
>>>>                 IETF LC End Date: date-if-known
>>>>                 Intended Status: copy-from-I-D
>>>>
>>>>                 Summary:  This document is ready for publication as a
>>>>                 Proposed Standard.
>>>>                     I do have one question that I would appreciate
>>>>                     being considered.
>>>>
>>>>                 Comments:
>>>>                     The document is clear and readable, with careful
>>>>                     references for those needing additional details.
>>>>
>>>>                 Major Issues: None
>>>>
>>>>                 Minor Issues:
>>>>                     I note that the security considerations (section
>>>>                     6) does refer to congestion issues caused by
>>>>                     excessive transmission of BFD requests.  I wonder
>>>>                     if section 5 ("Operation of Multipoint BFD with
>>>>                     Active Tail over P2MP MPLS LSP") should include a
>>>>                     discussion of the congestion implications of
>>>>                     multiple tails sending notifications at the rate of
>>>>                     1 per second to the head end, particularly if the
>>>>                     failure is near the head end.  While I suspect that
>>>>                     the 1 / second rate is low enough for this to be
>>>>                     safe, discussion in the document would be helpful.
>>>>
>>>>
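
Illustrative sketch (not part of the thread or the draft): the trade-off
discussed above, where a rate limiter at the root discards excess
notifications and the tails retransmit until they receive a packet with
the Final (F) bit set, can be approximated with a small simulation. The
token-bucket design, the names TokenBucket and seconds_to_converge, and
the numbers used are all assumptions chosen for illustration; the draft
only recommends "a rate limiter" and does not specify an algorithm. The
model also assumes no packet loss and that each admitted notification is
answered before the tail's next retransmission.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical limiter for BFD Control packets punted to the control plane."""
    rate: float          # tokens added per second (average packets/s admitted)
    burst: float         # maximum number of tokens the bucket can hold
    tokens: float = 0.0  # bucket starts empty
    last: float = 0.0    # time of the previous refill, in seconds

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # packet is passed to the control plane
        return False      # packet is discarded; the tail will retransmit


def seconds_to_converge(leaves: int, limiter: TokenBucket, interval: float = 1.0) -> int:
    """Count retransmission rounds until every leaf's notification is admitted.

    Assumes each leaf sends one notification per `interval` seconds and stops
    once its notification has been admitted (i.e. it then receives the Final).
    """
    if limiter.rate <= 0:
        raise ValueError("rate must be positive, or no notification is ever admitted")
    pending, now, rounds = leaves, 0.0, 0
    while pending > 0:
        now += interval
        rounds += 1
        admitted = sum(1 for _ in range(pending) if limiter.allow(now))
        pending -= admitted
    return rounds


if __name__ == "__main__":
    # 1000 affected leaves, limiter admitting 100 packets/s: prints 10
    print(seconds_to_converge(1000, TokenBucket(rate=100, burst=100)))
```

Under these assumptions the root's control plane never sees more than
`rate` notifications per second, and the time for all leaves to be
acknowledged grows roughly as leaves / rate, which is the "extended
convergence" described in the thread.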