Re: [pim] AD Review of draft-ietf-pim-bfd-p2mp-use-case-05

Jeffrey Haas <jhaas@pfrc.org> Tue, 31 August 2021 20:50 UTC

Date: Tue, 31 Aug 2021 16:50:40 -0400
From: Jeffrey Haas <jhaas@pfrc.org>
To: Reshad Rahman <reshad@yahoo.com>
Cc: Michael McBride <michael.mcbride@futurewei.com>, Rtg-bfd WG <rtg-bfd@ietf.org>, "pim-chairs@ietf.org" <pim-chairs@ietf.org>
Subject: Re: [pim] AD Review of draft-ietf-pim-bfd-p2mp-use-case-05
Message-ID: <20210831205040.GC2820@pfrc.org>
References: <202107220742327030208@zte.com.cn> <CAMMESsznPjjXD44S5gc=QeAEdZA4cEOwPJJcxPbgxxiktiOoaA@mail.gmail.com> <BYAPR13MB258298BD266B52AA6F0110EFF4FF9@BYAPR13MB2582.namprd13.prod.outlook.com> <BYAPR13MB2582DC846916E97FDED192AAF4FF9@BYAPR13MB2582.namprd13.prod.outlook.com> <1867869445.417157.1629489910471@mail.yahoo.com>
In-Reply-To: <1867869445.417157.1629489910471@mail.yahoo.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/hZGZ_GIuxUWkR6BUvCI6RN5kKiM>

Mike,

Thanks for bringing this to the BFD Working Group.  I've had a chance to
read through the draft.  In spite of it being several years since I've
worked on/with PIM, I think I've understood it. :-)

If I were to re-state the longer version of the draft's name, this is
effectively "using BFD-MP to let PIM hello procedures fail faster".  If
that's an adequate assessment, perhaps the title might be adjusted a bit.
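
To put rough numbers on "fail faster", here is a minimal sketch comparing
the default PIM Hello holdtime with a p2mp BFD detection time.  The PIM
defaults (30 s Hello period, holdtime of 3.5 times that) are from RFC 7761,
and the tail's detection time follows RFC 8562 (the head's Detect Mult times
its Desired Min TX Interval).  The BFD values used (Detect Mult 3, 300 ms)
and the function name are illustrative assumptions, not taken from the draft.

```python
# Rough comparison of the PIM Hello timeout and p2mp BFD detection time.
# PIM defaults are from RFC 7761; the BFD values are illustrative assumptions.

HELLO_PERIOD_S = 30.0                      # RFC 7761 default Hello_Period
HELLO_HOLDTIME_S = 3.5 * HELLO_PERIOD_S    # RFC 7761 default holdtime (105 s)


def bfd_tail_detection_time_s(detect_mult: int, desired_min_tx_us: int) -> float:
    """Detection time at a MultipointTail (RFC 8562): the head's Detect Mult
    times the head's Desired Min TX Interval (carried in microseconds)."""
    return detect_mult * desired_min_tx_us / 1_000_000


if __name__ == "__main__":
    # Assumed head settings: Detect Mult 3, 300 ms transmit interval.
    bfd_s = bfd_tail_detection_time_s(detect_mult=3, desired_min_tx_us=300_000)
    print(f"PIM Hello holdtime : {HELLO_HOLDTIME_S:.1f} s")
    print(f"p2mp BFD detection : {bfd_s:.1f} s")
```

With these assumed values the tail would notice a dead head in under a
second instead of waiting out a 105 s holdtime.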

I think Alvaro's point is that this really covers only a small bit of
PIM procedure, while the document title makes it sound much broader.

The procedures seem to be largely straightforward, as long as you're
comfortable with the error handling procedures for the wrong OptionLength.
Alternatively, if this field has the incorrect length, you could simply
ignore the option rather than stop the processing of PIM.  That said,
do whatever is normative within the protocol for such cases.
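
The two error-handling choices can be contrasted with a minimal sketch of a
Hello option parser.  It assumes the usual PIM Hello option layout (16-bit
OptionType, 16-bit OptionLength, then the value).  The option type constant
is a placeholder because the draft's value is still "TBA", and the function
name and the "stop"/"ignore" switch are hypothetical, not from the draft.

```python
import struct
from typing import Dict, Optional, Tuple

BFD_DISC_OPTION_TYPE = 65001  # placeholder only: the draft's value is "TBA"


def parse_hello_options(data: bytes, on_bad_length: str = "ignore"
                        ) -> Tuple[Dict[int, bytes], Optional[int]]:
    """Walk the Type/Length/Value options of a PIM Hello body.

    on_bad_length selects what happens when the BFD Discriminator option
    does not have OptionLength 4:
      "stop"   - stop processing the remaining options (draft -06 text)
      "ignore" - skip only that option and keep going
    Returns the other options by type, plus the discriminator (or None).
    """
    options: Dict[int, bytes] = {}
    discriminator: Optional[int] = None
    offset = 0
    while offset + 4 <= len(data):
        opt_type, opt_len = struct.unpack_from("!HH", data, offset)
        value = data[offset + 4: offset + 4 + opt_len]
        if len(value) < opt_len:
            break  # truncated option: nothing more can be parsed safely
        offset += 4 + opt_len
        if opt_type == BFD_DISC_OPTION_TYPE:
            if opt_len != 4:
                if on_bad_length == "stop":
                    break
                continue
            disc = struct.unpack("!I", value)[0]
            if disc != 0:  # a zero My Discriminator is invalid and ignored
                discriminator = disc
            continue
        options[opt_type] = value
    return options, discriminator


if __name__ == "__main__":
    # Holdtime (type 1), a malformed BFD option, then DR Priority (type 19).
    hello = (struct.pack("!HHH", 1, 2, 105)
             + struct.pack("!HHH", BFD_DISC_OPTION_TYPE, 2, 7)
             + struct.pack("!HHI", 19, 4, 1))
    for policy in ("stop", "ignore"):
        opts, disc = parse_hello_options(hello, policy)
        print(policy, sorted(opts), disc)
```

Under "stop", the DR Priority option that follows the malformed option is
never seen; under "ignore", only the BFD option is lost.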

My two remaining comments are an operational one and one related to
forwarding:

As written, any PIM speaker could become a BFD MultiPointHead.  This
theoretically means that on a busy LAN segment you can get a lot of chatty
multi-point BFD traffic.  I realize that typical PIM deployments mean that
this is less likely to happen.

Since PIM is used to build multicast trees, realistically you only need the
MultiPointHead procedures to be running on upstream nodes.  At typical
multicast endpoints, it's usually clear what the upstreams will be.  At
other arbitrary points in the network, this is less clear but those are
typically not LAN segments.

If there's text to operationally suggest that you only do the MultiPointHead
configuration on appropriate nodes, that may be helpful.

My second concern is shorter.  Section 2.3 recommends that the p2mp BFD
sessions use a TTL of 255 and references the GTSM procedures in RFC 5881.
However, since the destination address is a multicast group and the
underlying PIM protocol uses a TTL of 1 for its messages, I'm not sure that
255 is appropriate for this use case.

If there is documentation in PIM for the use of GTSM procedures, I'd suggest
referencing those procedures in this draft.
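
For context, the difference between the two TTL conventions can be shown
with a small sketch.  It models only the generic behaviour: each forwarding
router decrements the TTL and drops the packet when it reaches zero, and a
GTSM-style receiver (as in RFC 5881) accepts only packets arriving with TTL
255.  The function names are hypothetical, and the sketch takes no position
on which value the draft should use.

```python
from typing import Optional


def ttl_on_arrival(sent_ttl: int, router_hops: int) -> Optional[int]:
    """TTL seen by the receiver after `router_hops` forwarding routers,
    or None if a router dropped the packet because the TTL ran out."""
    ttl = sent_ttl
    for _ in range(router_hops):
        ttl -= 1
        if ttl <= 0:
            return None
    return ttl


def gtsm_accept(received_ttl: Optional[int]) -> bool:
    """GTSM-style check: only a packet that crossed no router still has 255."""
    return received_ttl == 255


if __name__ == "__main__":
    # TTL 255 from an on-link sender passes; one hop away it fails the check.
    print(gtsm_accept(ttl_on_arrival(255, 0)), gtsm_accept(ttl_on_arrival(255, 1)))
    # TTL 1 never survives a router, but the receiver cannot tell an on-link
    # TTL 1 packet from one sent with TTL 2 from one hop away.
    print(ttl_on_arrival(1, 1), ttl_on_arrival(1, 0), ttl_on_arrival(2, 1))
```

A TTL of 1 keeps the packet from leaving the link, while a TTL of 255 lets
the receiver verify that the packet originated on the link.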

-- Jeff


On Fri, Aug 20, 2021 at 08:05:10PM +0000, Reshad Rahman wrote:
>  Thanks Michael.  Adding BFD WG.
> 
> Regards, Reshad.
> -----Original Message-----
> From: pim <pim-bounces@ietf.org> On Behalf Of Michael McBride
> Sent: Tuesday, August 17, 2021 10:53 PM
> To: Alvaro Retana <aretana.ietf@gmail.com>; gregory.mirsky@ztetx.com
> Cc: draft-ietf-pim-bfd-p2mp-use-case@ietf.org; mmcbride7@gmail.com; pim-chairs@ietf.org; pim@ietf.org
> Subject: Re: [pim] AD Review of draft-ietf-pim-bfd-p2mp-use-case-05
> 
> Hi Greg,
> 
> Please consider either 1) dropping the "use-case" from the draft name and keeping it a specific, less generalized pim-bfd-p2mp solution, or 2) keeping the existing name and making it more generalized with more use cases, as Alvaro suggests.
> 
> I'll ping the bfd chairs and cc you to ensure they are in the loop.
> 
> Thanks,
> mike
> 
> -----Original Message-----
> From: pim <pim-bounces@ietf.org> On Behalf Of Alvaro Retana
> Sent: Tuesday, August 17, 2021 8:19 AM
> To: gregory.mirsky@ztetx.com
> Cc: draft-ietf-pim-bfd-p2mp-use-case@ietf.org; mmcbride7@gmail.com; pim-chairs@ietf.org; pim@ietf.org
> Subject: Re: [pim] AD Review of draft-ietf-pim-bfd-p2mp-use-case-05
> 
> On July 21, 2021 at 7:42:42 PM, gregory.mirsky@ztetx.com wrote:
> 
> 
> Greg:
> 
> Hi!
> 
> Thanks for the update!!
> 
> I have some comments in-line.  I am also attaching (below) a review of -06, which mostly contains minor issues and nits related to new/updated text.
> 
> 
> The main item I would like WG input on is the generalization of the mechanism.  In short, bootstrapping the BFD session is the main focus of this document; I don't see a reason to avoid generalizing its use to other scenarios.  We already agree on the wider applicability -- the incremental changes needed to generalize the text now are far less work than the process it will take to do it later.
> 
> Mike: as Shepherd/Chair, please start a conversation in the WG as needed.  Please take a look at the discussion below.
> 
> 
> Thanks!
> 
> Alvaro.
> 
> 
> 
> ...
> > Sender: AlvaroRetana
> ...
> > This document specifies two things: (1) a Hello option to help the 
> > tail bootstrap the BFD session, and, (2) the actions that the tail 
> > takes when a failure is detected. The former is common to all cases, 
> > while the latter depends on the role of the head with respect to the 
> > tail. The actions are basically an acceleration of what would 
> > naturally happen (if BFD was not used and the failure detection was 
> > "slow"). Is this a fair characterization of the document?
> 
> GIM>> Yes, absolutely correct.
> 
> ...
> > Are there other cases that could use this mechanism to track? I can 
> > think of a couple of cases: monitor PIM neighbors that send Joins, RPF 
> > neighbors, Assert winners... As with the DR case, for example, these 
> > cases don't require actions beyond an acceleration. It would be ideal 
> > if the document could cover these cases, and possibly others, in a 
> > generic way -- I can't think of good phrasing right now, but I'm sure 
> > you can. ;-)
> 
> GIM>> I agree that the defined PIM Hello BFD Discriminator option can be 
> GIM>> used by not only PIM DR/BDR nodes. Indeed, there are other use 
> GIM>> cases where a faster detection will improve the convergence in the 
> GIM>> control plane and minimize the negative impact on the multicast 
> GIM>> data plane. These use cases may be covered in the future.
> 
> Maybe I'm missing something obvious, and would like to understand what. :-)
> 
> The Hello option helps the tail bootstrap a BFD session, correct?  If so, there's nothing in that description about the function of the tail (or the head).  The point that I'm trying to make above is that the only thing that BFD is providing in any use case is faster detection of a failure (which is very significant, of course!), so regardless of the application or function of the tail/head, the operation can be generalized.  Is this not true?  What am I missing?
> 
> To be more specific, the description in §2.1 (Using P2MP BFD in PIM DR/BDR Monitoring) is a generic description, and not one that only applies to a DR/BDR.  In fact, the text says that any node "regardless of its role, MAY become a head of a p2mp BFD session" -- which means that it is up to the tail to monitor it or not.
> 
> The last two paragraphs in §2.1 do mention the DR/BDR function, but they could easily be generalized:
> 
> OLD>
>    If the tail detects a MultipointHead failure [RFC8562], it MUST
>    delete the corresponding neighbor state.  If the failed head was the
>    DR (or BDR), the DR (or BDR) election mechanism in [RFC7761] or
>    [I-D.ietf-pim-dr-improvement] is followed.
> 
>    If the head ceases to include the BFD Discriminator PIM Hello option
>    in its PIM-Hello message, tails MUST close the corresponding
>    MultipointTail BFD session.  Thus the tail stops using BFD to monitor
>    the head and reverts to the procedures defined in [RFC7761] and
>    [I-D.ietf-pim-dr-improvement].
> 
> 
> NEW>
>    If the tail detects a MultipointHead failure [RFC8562], it MUST
>    delete the corresponding neighbor state.  If the head ceases to
>    include the BFD Discriminator PIM Hello option in its PIM-Hello
>    message, tails MUST close the corresponding MultipointTail BFD
>    session.  In both cases, the tail continues to follow the
>    specification related to the function of the head.
> 
> 
> 
> GIM>> As I think of it, one aspect would be homogeneity of p2mp BFD 
> GIM>> capability throughout the domain. In other words, what happens if 
> GIM>> some PIM nodes don't support the BFD Discriminator option and do 
> GIM>> not use p2mp BFD? Will their slow (regular) convergence impact 
> GIM>> other nodes? But that, I think, is for further discussion, work. Would you agree?
> 
> I don't.
> 
> In fact, this is a very important deployment point that I had overlooked.  If the support is not homogeneous, then some parts of the network will converge (to the new state) faster than others. As with unicast routing, I think the main effect may be longer-than-expected inconsistency, but not worse than without BFD.  The specific effect relates to the use case...but because the deployment would be localized (the DR and other routers on the LAN, for example), any negative effect of not supporting BFD (i.e. behaving as today) would be localized.
> 
> Thank you for bringing this up.  If you consider the effect significant for a specific case then please add a couple of sentences.
> 
> 
> ...
> > 183 3.1. Using P2MP BFD in PIM DR/BDR Monitoring
> ...
> > 229 If the head ceased to include BFD TLV in its PIM-Hello message, 
> > tails
> > 230 MUST close the corresponding MultipointTail BFD session. Thus the
> > 231 tail stops using BFD to monitor the head and reverts to the
> > 232 procedures defined in [RFC7761] and [I-D.ietf-pim-dr-improvement].
> ...
> > [major] Let me see if I understand: if the head doesn't use the BFD 
> > hello option anymore then the tail can gracefully stop using BFD.
> > IOW, this way the BFD session does not expire and result in the DR 
> > being declared dead. Is that it?
> 
> GIM>> Yes, that is what we've intended - revert to "slow" detection.
> 
> > Given that the BFD session can be bootstrapped at the tail by manually 
> > configuring the corresponding discriminator, it seems that stopping 
> > the use of the BFD hello option may not result in the expected 
> > outcome. ???
> 
> GIM>> Yes, the head's My Discriminator value can be provisioned using 
> GIM>> the management plane. If that is the case, then I think this 
> GIM>> document is not applicable as the head and leaves use RFC 8562 without any additions.
> 
> The problem is that the text now says this:
> 
>    If the head ceases to include the BFD Discriminator PIM Hello option
>    in its PIM-Hello message, tails MUST close the corresponding
>    MultipointTail BFD session.  Thus the tail stops using BFD to monitor
>    the head...
> 
> s/MUST/SHOULD   Provisioning the node is the exception, so the action should be recommended and not required.
> 
> 
> ...
> > 288 5. Security Considerations
> ...
> > 299 An implementation that supports this specification SHOULD use a
> > 300 mechanism to control the maximum number of BFD sessions that can 
> > be
> > 301 active at the same time.
> >
> > [major] rfc8562 already requires "protective measures to prevent an 
> > infinite number of MultipointTail sessions from being created". It is 
> > then not needed for this document to recommend anything that is 
> > required elsewhere.
> 
> GIM>> Done.
> 
> ??  You left the paragraph in.
> 
> 
> > [major] What new security risks are introduced by the mechanism in 
> > this draft? In general, a rogue node can stop sending or delay BFD 
> > packets causing the tail to conclude that the head is down: the DR/BDR 
> > may change causing instability. I was surprised that rfc8562 did not 
> > mention the interaction risk, but rfc5880 already does. I feel that 
> > something needs to be mentioned specific to this document, even if it 
> > is highlighted that the risk is not new.
> 
> GIM>> Then that "rogue" node is the PIM DR/BDR, not a man-in-the-middle. 
> GIM>> And the attack is by making the p2mp BFD session expire on leaves 
> GIM>> while still periodically sending PIM Hello. AFAIK, since PIM 
> GIM>> DR/BDR election takes several Hello cycles, I don't think that 
> GIM>> that behavior will affect the multicast service. Perhaps I'm missing something, please advise.
> 
> Yes, the problem is that §2.1 says that the tail "MUST delete the corresponding neighbor state".  This results in the DR not being elected for a while.
> 
> I see what you mean: if the DR is still there then the election will probably elect it again.  However, in the meantime the DR may not think it is the DR anymore if the other routers in the LAN start a new DR election. (??)   Please add the explanation (or something like it) to make it clear that the risk is mitigated by the "double set of hellos".
> 
> In the general case...  If tracking the sender of a Join, for example, the effect would be more significant: an outage would exist until the next Join is received.
> 
> 
> ...
> > 310 7.1. Normative References
> > ..
> > 327 [RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding 
> > Detection
> > 328 (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881,
> > 329 DOI 10.17487/RFC5881, June 2010,
> > 330 .
> >
> > [minor] This reference can be Informative.
> 
> I know you moved it...but now that you added a reference to it in §2.3 then we need it to be Normative.  Sorry.
> 
> 
> ...
> > [End of review -05.]
> 
> 
> 
> [Start -06]
> [Line numbers from idnits.]
> 
> ...
> 129    2.  BFD Discriminator PIM Hello Option
> ...
> 153      If the value of the OptionLength field is not equal to 4, the BFD
> 154      Discriminator PIM Hello option is considered malformed, and the
> 155      receiver MUST stop processing PIM Hello options.  If the value of the
> 156      My Discriminator field equals zero, then the BFD Discriminator PIM
> 157      Hello option MUST be considered invalid, and the receiver MUST ignore
> 158      it.  The receiver SHOULD log the notification regarding the malformed
> 159      or invalid BFD Discriminator Hello option under the control of a
> 160      throttling logging mechanism.
> 
> [major] "MUST stop processing PIM Hello options"
> 
> Stop in the current Hello message?  Should it ignore all the options or just the ones after this one?  In all future Hello messages?
> 
> I haven't thought about this enough, but there could be an effect on other functionality.  What is that effect?  I couldn't find anywhere a general way to handle malformed Hello options -- did I miss it?
> 
> 
> [nit] s/log the notification/log a notification
> 
> 
> 162    2.1.  Using P2MP BFD in PIM DR/BDR Monitoring
> ...
> 169      If a PIM-SM router is configured to monitor the head by using p2mp
> 170      BFD, referred to through this document as 'tail', receives PIM-Hello
> 171      packet with BFD Discriminator PIM Hello option, the tail MAY create a
> 172      p2mp BFD session of type MultipointTail, as defined in [RFC8562].
> 
> [minor] s/router is configured/router that is configured
> 
> [nit] s/receives PIM-Hello packet with BFD Discriminator/receives a PIM-Hello packet with the BFD Discriminator
> 
> 
> ...
> 188      If the head ceases to include the BFD Discriminator PIM Hello option
> 189      in its PIM-Hello message, tails MUST close the corresponding
> 190      MultipointTail BFD session.  Thus the tail stops using BFD to monitor
> 191      the head and reverts to the procedures defined in [RFC7761] and
> 192      [I-D.ietf-pim-dr-improvement].
> 
> [minor] "...MUST close the corresponding MultipointTail BFD session"
> 
> It might be good to add that the PIM state is not affected by this action.
> 
> 
> 194    2.2.  P2MP BFD in PIM DR Load Balancing
> 
> 196      [RFC8775] specifies the PIM Designated Router Load Balancing (DRLB)
> 197      functionality.  Any PIM router that advertises the DRLB-Cap Hello
> 198      Option can become the head of a p2mp BFD session, as specified in
> 199      Section 2.1.  The head router administratively sets the
> 200      bfd.SessionState to Up in the MultipointHead session [RFC8562] only
> 201      if it is a Group Designated Router (GDR) Candidate, as specified in
> 202      Sections 5.5 and 5.6 of [RFC8775].  If the router is no longer the
> 203      GDR, then it MUST shut down following the procedures described in
> 204      Section 5.9 [RFC8562].  For each GDR Candidate that includes BFD
> 205      Discriminator option in its PIM Hello, the PIM DR creates a
> 206      MultipointTail session [RFC8562].  PIM DR demultiplexes BFD sessions
> 207      based on the value of the My Discriminator field and the source IP
> 208      address.  If PIM DR detects a failure of one of the sessions, it MUST
> 209      remove that router from the GDR Candidate list and immediately
> 210      transmit a new DRLB-List option.
> 
> [] Continuing with my theme of generalizing this specification...
> This section says everything that the last section already specified in a generic way.  IOW, it is not really needed.
> 
> There is one thing that this paragraph adds: "If the router is no longer the GDR, then it MUST shut down following the procedures described in Section 5.9 [RFC8562]."   Yes, shutting down the BFD session is important, but so is not including the BFD Discriminator option in the Hello anymore.  As with everything else, this part can also be generalized:
> 
>    If the head is no longer serving the function that prompted it
>    to be monitored, then it MUST cease including the BFD Discriminator
>    PIM Hello option in its PIM-Hello message, and it MUST shut down
>    the BFD session following the procedures described in Section 5.9
>    [RFC8562].
> 
> 
> 212    2.3.  Multipoint BFD Encapsulation
> 
> 214      The MultipointHead of a p2mp BFD session when transmitting BFD
> 215      Control packet:
> 
> [nit] s/packet/packets
> 
> 
> 217         MUST set TTL or Hop Limit value to 255 (Section 5 [RFC5881]);
> 
> [major] "MUST set...RFC5881"   This action is already required in an RFC that this document depends on, please don't specify the behavior again.  I understand that rfc5682 can be used in multi-hop scenarios, but rfc5881 is the source here.    s/MUST/must
> 
> 
> ...
> 222    3.  IANA Considerations
> ...
> 227      +=============+================+===================+===============+
> 228      | Value Name  | Length Number  | Name Protocol     | Reference     |
> 229      +=============+================+===================+===============+
> 230      | TBA         | 4              | BFD Discriminator | This document |
> 231      |             |                | Option            |               |
> 232      +-------------+----------------+-------------------+---------------+
> 
> [major] Please use the same field names as in the registry: Value, Length, Name, Reference
> 
> [EoR -06]
> 
> _______________________________________________
> pim mailing list
> pim@ietf.org
> https://www.ietf.org/mailman/listinfo/pim