Re: Can a BFD session change its source port to facilitate auto recovery

Jeffrey Haas <jhaas@pfrc.org> Thu, 23 March 2023 18:36 UTC

Subject: Re: Can a BFD session change its source port to facilitate auto recovery
From: Jeffrey Haas <jhaas@pfrc.org>
In-Reply-To: <1269529512.2412873.1679595445738@mail.yahoo.com>
Date: Thu, 23 Mar 2023 14:36:51 -0400
Cc: Abhinav Srivastava <absrivas@gmail.com>, Jeff Tantsura <jefftant.ietf@gmail.com>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Message-Id: <4F6F3693-6B8B-4F16-BCAA-410558609248@pfrc.org>
References: <CAL9v8R2iYMGjxF-A9SuDMcu2EF6h0isquTxjuAtNdqFwv_6etg@mail.gmail.com> <6DE166F3-5E02-446B-A105-0C6E2CC4E448@gmail.com> <1269529512.2412873.1679595445738@mail.yahoo.com>
To: Reshad Rahman <reshad@yahoo.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/fdkjvsUkq37dKglRe5Y7buuhoWs>


> On Mar 23, 2023, at 2:17 PM, Reshad Rahman <reshad=40yahoo.com@dmarc.ietf.org> wrote:
> 
> Hi all,
> 
> +1 to Jeff's comment on not wanting to pretend that everything is fine.
> 
> And if we're running BFD single-hop and BFDoLAG where needed, this is a non-issue right?

Not quite.

In theory, if we had a full set of link tests from A..Z, including exercising each LAG member, one would think everything should be fine.  This is the ideal base case.

In practice, what's often seen is that, even with full coverage of the paths, there are still end-to-end forwarding faults for various reasons.  In at least some of these cases it's because BFD is implemented in a layer that isn't exercising the full data path.  To pick a somewhat vendor-neutral example, consider BFD implemented directly on the line card but not participating in the layer 3 ECMP load balancer, or at the LAG level not participating in the layer 2 equivalent.

It's for reasons like this that we have discussions about whether it makes sense to run single-hop BFD in addition to BFD-on-LAG covering the same link.

(It's also worth reminding the Working Group that these types of discussions were a motivation for the LIME Working Group we had some years ago.  It very much covered this space, but didn't come to successful outcomes.)

Going back to Abhinav's original question, here are my own observations:

RFC 5880 tells us that once a session is Up, we should demultiplex solely based on the Discriminators.  (RFC 5880, §6.3)
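
As a rough illustration of that demultiplexing rule, here is a minimal sketch assuming a simple in-memory session table.  The names (Session, Demux, lookup) are hypothetical and not taken from any implementation; the fallback for a zero Your Discriminator uses the source address only, which is just one of the combinations RFC 5880 permits.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Session:
    local_discr: int   # our My Discriminator for this session
    remote_addr: str   # peer address, used only during bring-up


class Demux:
    """Hypothetical session table; not any vendor's data structure."""

    def __init__(self) -> None:
        self.by_discr: Dict[int, Session] = {}
        self.by_peer: Dict[str, Session] = {}

    def add(self, session: Session) -> None:
        self.by_discr[session.local_discr] = session
        self.by_peer[session.remote_addr] = session

    def lookup(self, your_discr: int, pkt_state: str,
               src_addr: str, src_port: int) -> Optional[Session]:
        if your_discr != 0:
            # Nonzero Your Discriminator: it alone selects the session.
            # src_addr and src_port are deliberately ignored here.
            return self.by_discr.get(your_discr)
        # Zero Your Discriminator is only acceptable while the sender
        # reports Down or AdminDown; otherwise the packet is discarded.
        if pkt_state not in ("Down", "AdminDown"):
            return None
        # Assumption: fall back to the source address alone.
        return self.by_peer.get(src_addr)
```

With a table like this, a packet carrying the right Your Discriminator finds the same session whatever its UDP source port is, which is why a port change looks harmless on paper.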

RFC 5881, used by RFC 5883, tells us that we MUST NOT change the source port: the same UDP source port MUST be used for all BFD Control packets of a session.  (RFC 5881, §4)  However, it doesn't provide a lot of justification for the WHY of that.  Given the prior point, what is the harm?  Some speculation:

- Even if you MUST demux based on Discriminators, I wouldn't wager that no implementation looks at the full layer-4 signature as part of its procedures.  In particular, middlebox steering may get in the way.
- It's often necessary for hardware-based BFD implementations to put in exceptions to rate policers to permit BFD to work.  If those exceptions match on the full layer-4 tuple, a changed source port could fall outside them.
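
To make the rate-policer speculation concrete, here is a small sketch of an exception table keyed on the flow tuple.  Everything in it (Flow, PolicerExceptions, the match_sport switch) is hypothetical; whether real hardware matches on the source port at all is exactly the unknown being speculated about.

```python
from typing import NamedTuple, Set


class Flow(NamedTuple):
    src: str
    dst: str
    proto: int
    sport: int
    dport: int


class PolicerExceptions:
    """Hypothetical table of flows exempted from a rate policer."""

    def __init__(self, match_sport: bool = True) -> None:
        # Assumption: some implementations key on the full tuple,
        # others wildcard the source port.
        self.match_sport = match_sport
        self._entries: Set[Flow] = set()

    def _key(self, flow: Flow) -> Flow:
        # Wildcard the source port by normalising it to 0.
        return flow if self.match_sport else flow._replace(sport=0)

    def permit(self, flow: Flow) -> None:
        self._entries.add(self._key(flow))

    def is_exempt(self, flow: Flow) -> bool:
        return self._key(flow) in self._entries


# Exception installed when the session came up (single-hop port 3784).
original = Flow("192.0.2.1", "192.0.2.2", 17, 49152, 3784)
strict = PolicerExceptions(match_sport=True)
strict.permit(original)

# Same session after the sender picks a new source port.
moved = original._replace(sport=49153)
strict_still_exempt = strict.is_exempt(moved)   # False: now policed
```

In the strict case the session's packets quietly become subject to the policer after the port change, even though discriminator-based demultiplexing would still accept them.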

Speculation aside, changing the source port most likely would work.

Is it a good idea?  Probably not.  

Is it a great tool to try to exercise specific legs of an ECMP?  Almost certainly not at high rates.  It'd also be clumsy.
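
For a sense of what exercising specific ECMP legs by source port would involve, here is a sketch assuming the load balancer hashes the 5-tuple and takes it modulo the number of legs.  The hash below (SHA-256 over a string key) is a stand-in chosen only to be deterministic; real forwarding hardware uses its own, usually undisclosed, function and may include other fields.  The function names are hypothetical.

```python
import hashlib
from typing import Optional

BFD_MULTIHOP_PORT = 4784      # RFC 5883 destination port
SPORT_RANGE = range(49152, 65536)  # source port range from RFC 5881


def ecmp_leg(src: str, dst: str, sport: int, dport: int,
             n_legs: int, proto: int = 17) -> int:
    """Stand-in ECMP hash: 5-tuple -> leg index in [0, n_legs)."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_legs


def find_port_for_leg(src: str, dst: str, dport: int,
                      n_legs: int, leg: int) -> Optional[int]:
    """Search the allowed source ports for one that lands on `leg`."""
    for sport in SPORT_RANGE:
        if ecmp_leg(src, dst, sport, dport, n_legs) == leg:
            return sport
    return None  # no port in range maps to this leg


# One candidate source port per leg of a 4-way ECMP.
ports = {
    leg: find_port_for_leg("192.0.2.1", "198.51.100.1",
                           BFD_MULTIHOP_PORT, 4, leg)
    for leg in range(4)
}
```

The catch is that the sender has to know, or guess, the hash at every ECMP hop along the path, and has to redo the search whenever the set of legs changes.  That is a large part of why this would be clumsy.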

Could you do this with some level of success?  Probably.

Would I want to support debugging issues with this as a vendor?  No.

-- Jeff