Re: [RTG-DIR] Rtgdir last call review of draft-ietf-mpls-p2mp-bfd-06

Greg Mirsky <gregimirsky@gmail.com> Sun, 25 February 2024 01:44 UTC

References: <170864700898.14065.4946299905740369098@ietfa.amsl.com> <CA+RyBmXitJr-57P3y_=pYEqwoHeMo4HKqPKOud-ZZ2dQQb_gGQ@mail.gmail.com> <176e1397-5b01-487f-8ae0-078bfe2f8ee7@joelhalpern.com> <CA+RyBmUMit0oc1MZTnQ0apTM8Wj_ra7Tna5JCwwMbtbKOfgyCQ@mail.gmail.com> <ca4d0846-9ac9-4846-8bf6-f2e68787c9c8@joelhalpern.com>
In-Reply-To: <ca4d0846-9ac9-4846-8bf6-f2e68787c9c8@joelhalpern.com>
From: Greg Mirsky <gregimirsky@gmail.com>
Date: Sat, 24 Feb 2024 17:44:18 -0800
Message-ID: <CA+RyBmWUgge9E28Y_CCF1_EQB1YzchWXzDK9P4qYxozmR7KFyw@mail.gmail.com>
To: Joel Halpern <jmh.direct@joelhalpern.com>
Cc: rtg-dir@ietf.org, draft-ietf-mpls-p2mp-bfd.all@ietf.org, last-call@ietf.org, mpls@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-dir/mig_pk3rEbWfbYDsjpNz3SMR2ic>
Subject: Re: [RTG-DIR] Rtgdir last call review of draft-ietf-mpls-p2mp-bfd-06

Hi Joel,
thank you for your quick response. I see two risks that may stress the
root's control plane:

   - notifications transmitted by the leaves reporting the failure of the
   p2mp LSP
   - notifications transmitted by the root to every leaf, closing the Poll
   sequence

As I understand it, you refer to the former as inbound congestion and to the
latter as outbound congestion. Is that correct? I agree that the inbound
stream of notifications alone may overload the root's control plane, and
that the outbound process further increases the probability of congestion
there. My proposal is to apply a rate limiter to the inbound flow of
BFD Control messages punted to the control plane.
What would you suggest in addition to the proposed text?
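
To make the proposal concrete, here is a minimal sketch of such a rate
limiter as a token bucket, together with a toy model of how it stretches
convergence. Nothing here comes from the draft: the names TokenBucket and
seconds_to_converge, the rate and burst parameters, and the assumptions
that all leaves detect the failure at the same instant and that the Final
reaches a leaf as soon as its notification is accepted are all hypothetical.
The one-second retransmission interval follows the rate mentioned in the
review.

```python
class TokenBucket:
    """Hypothetical limiter for BFD Control packets punted to the control plane."""

    def __init__(self, rate, burst):
        if rate <= 0 or burst < 1:
            raise ValueError("rate must be > 0 and burst must be >= 1")
        self.rate = float(rate)      # tokens (packets) replenished per second
        self.burst = float(burst)    # bucket capacity
        self.tokens = float(burst)   # start with a full bucket
        self.last = 0.0              # time of the previous refill, in seconds

    def allow(self, now):
        """Return True if a packet arriving at time `now` may be punted."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                 # over the limit: the packet is discarded


def seconds_to_converge(num_leaves, rate, burst, notify_interval=1.0):
    """Toy model: time until the root has accepted one notification per leaf.

    Assumes every leaf detects the failure at t=0 and retransmits every
    `notify_interval` seconds until its notification is accepted, at which
    point the root's Final is assumed to reach it immediately.
    """
    limiter = TokenBucket(rate, burst)
    pending = num_leaves
    now = 0.0
    while pending:
        # Each pending leaf sends one notification in this round.
        accepted = sum(1 for _ in range(pending) if limiter.allow(now))
        pending -= accepted
        if pending:
            now += notify_interval
    return now
```

Under these assumptions, seconds_to_converge(100, rate=10, burst=10)
returns 9.0: the root never processes more than 10 notifications in a round,
and the last leaf is served nine seconds after the failure. With the limit
set above the number of leaves, the result is 0.0, i.e. no added delay but
no protection either.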

Best regards,
Greg

On Sat, Feb 24, 2024 at 3:28 PM Joel Halpern <jmh.direct@joelhalpern.com>
wrote:

> What you say makes sense.  I think we need to acknowledge the inbound
> congestion risk, even if we choose not to try to ameliorate it.  Your
> approach seems to address the outbound congestion risk from the root.
>
> Yours,
>
> Joel
> On 2/24/2024 6:25 PM, Greg Mirsky wrote:
>
> Hi Joel,
> thank you for the clarification. My idea is to use a rate limiter at the
> root of the p2mp LSP, which may receive notifications from the leaves
> affected by the failure. I imagine that the threshold of the rate limiter
> might be exceeded, and the excess notifications will be discarded. As a
> result, some notifications will be processed by the headend of the p2mp BFD
> session later, as the tails transmit notifications periodically until they
> receive the BFD Control message with the Final flag set.  Thus, we cannot
> avoid the congestion, but we can mitigate its negative effect at the cost of
> a longer convergence time. Does that make sense?
>
> Regards,
> Greg
>
> On Sat, Feb 24, 2024 at 2:39 PM Joel Halpern <jmh@joelhalpern.com> wrote:
>
>> That covers part of my concern.  But....  A failure near the root means
>> that a lot of leaves will see failure, and they will all send notifications
>> converging on the root.  Those notifications themselves, not just the final
>> messages, seem able to cause congestion.  I am not sure what can be done
>> about it, but we aren't allowed to ignore it.
>>
>> Yours,
>>
>> Joel
>> On 2/24/2024 3:34 PM, Greg Mirsky wrote:
>>
>> Hi Joel,
>> thank you for your support of this work and the suggestion. Would the
>> following update of the last paragraph of Section 5 help:
>> OLD TEXT:
>>    An ingress LSR that has received the BFD Control packet, as described
>>    above, sends the unicast IP/UDP encapsulated BFD Control packet with
>>    the Final (F) bit set to the egress LSR.
>> NEW TEXT:
>>    As described above, an ingress LSR that has received the BFD Control
>>    packet sends the unicast IP/UDP encapsulated BFD Control packet with
>>    the Final (F) bit set to the egress LSR.  In some scenarios, e.g.,
>>    when a p2mp LSP is broken close to its root, and the number of egress
>>    LSRs is significantly large, the control plane of the ingress LSR
>>    might be congested by the BFD Control packets transmitted by egress
>>    LSRs and the process of generating unicast BFD Control packets, as
>>    noted above.  To mitigate that, a BFD implementation that supports
>>    this specification is RECOMMENDED to use a rate limiter of received
>>    BFD Control packets passed to processing in the control plane of the
>>    ingress LSR.
>>
>> Regards,
>> Greg
>>
>> On Thu, Feb 22, 2024 at 4:10 PM Joel Halpern via Datatracker <
>> noreply@ietf.org> wrote:
>>
>>> Reviewer: Joel Halpern
>>> Review result: Ready
>>>
>>> Hello,
>>>
>>> I have been selected as the Routing Directorate reviewer for this draft.
>>> The Routing Directorate seeks to review all routing or routing-related
>>> drafts as they pass through IETF last call and IESG review, and sometimes
>>> on special request. The purpose of the review is to provide assistance to
>>> the Routing ADs. For more information about the Routing Directorate,
>>> please see https://wiki.ietf.org/en/group/rtg/RtgDir
>>>
>>> Although these comments are primarily for the use of the Routing ADs, it
>>> would be helpful if you could consider them along with any other IETF Last
>>> Call comments that you receive, and strive to resolve them through
>>> discussion or by updating the draft.
>>>
>>> Document: draft-name-version
>>> Reviewer: your-name
>>> Review Date: date
>>> IETF LC End Date: date-if-known
>>> Intended Status: copy-from-I-D
>>>
>>> Summary:  This document is ready for publication as a Proposed Standard.
>>>     I do have one question that I would appreciate being considered.
>>>
>>> Comments:
>>>     The document is clear and readable, with careful references for those
>>>     needing additional details.
>>>
>>> Major Issues: None
>>>
>>> Minor Issues:
>>>     I note that the security considerations (section 6) does refer to
>>>     congestion issues caused by excessive transmission of BFD requests.  I
>>>     wonder if section 5 ("Operation of Multipoint BFD with Active Tail
>>>     over P2MP MPLS LSP") should include a discussion of the congestion
>>>     implications of multiple tails sending notifications at the rate of
>>>     1 per second to the head end, particularly if the failure is near the
>>>     head end.  While I suspect that the 1 / second rate is low enough for
>>>     this to be safe, discussion in the document would be helpful.
>>>
>>>
>>>