Re: Rtg-bfd Digest, Vol 164, Issue 4

"Albert Fu (BLOOMBERG/ 120 PARK)" <> Fri, 04 October 2019 15:49 UTC

Date: Fri, 4 Oct 2019 15:49:37 -0000
From: "Albert Fu (BLOOMBERG/ 120 PARK)" <>
Reply-To: "Albert Fu" <>

> Reason 1 -
> BFD works well to quickly detect failures. Loading on it more stuff
> compromises it. Moreover other vendors already have shipping tools which
> already can detect issues due to changes in MTU of the paths: Example:

A few points on why BFD is better:
1) Most of the tools available today, like the one you mentioned, are control-plane dependent and may be subject to false alarms. Most modern hardware supports control-plane-independent BFD implementations, which have proven extremely reliable regardless of control-plane activity (network churn, etc.).

2) Because of 1), you will likely need to use relatively conservative timers (seconds) with those tools, whereas BFD can reliably use sub-second timers.

3) You also need an additional script/action profile to disable the interface/peer once the issue is detected. I would hesitate to deploy such a mechanism unless the detection mechanism itself is reliable.

BFD has the advantage that it is tied to the routing protocols, so traffic is diverted automatically as soon as the issue is detected; in my experience this has been very reliable.

> Reason 2 -

> As the draft states that the idea is to automate use of this extension by
> client protocols. I do not agree with such deployment model of this
> enhancement. At most if frequency of MTU probing would be 100-1000 times
> less frequent then up/down link detection it would serve its purpose - yet
> there is no word in the draft about such option. Essentially instead of
> replacing current tiny BFD packets one could use bfd-large as different
> sessions with completely different timers. Maybe even end to end instead of
> link by link.
I know there is a concern about scaling with BFD Large Packet. I will add that over the years most vendors have improved their BFD scaling (in the early days, I remember some vendors quoting a maximum interval of 300ms to avoid false alarms). I expect vendors' BFD scaling to continue to improve.

Also, for WAN deployments with many links in the IGP, some designers like us chose relatively conservative timers to reduce network noise, since flapping links can cause network-wide events that affect convergence, e.g. by triggering SPF hold-down unnecessarily. We have found that even with a conservative BFD timer of 150ms we can achieve sub-second convergence with protection. That is only about 7 packets per second, and I would be surprised if it posed a scaling issue. Jeff has already highlighted that increasing the BFD interval is one way to help with scaling (e.g. 200ms means only 5 packets per second). Most network designers will validate scaling capability in the lab and choose appropriate timers for their particular deployment scenario.
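To make the arithmetic above concrete, here is a small sketch of per-session BFD load and detection time (detection time being interval times the detect multiplier, per RFC 5880; the 150ms and 200ms figures are the ones discussed above, and the multiplier of 3 is a common default, not something mandated by this thread):

```python
# Per-session BFD transmit rate and detection time for a given
# interval/multiplier. Interval values match the figures discussed above.

def bfd_load(interval_ms: float, multiplier: int = 3):
    """Return (packets_per_second, detection_time_ms) for one session."""
    pps = 1000.0 / interval_ms
    detection_ms = interval_ms * multiplier
    return pps, detection_ms

for interval in (150, 200):
    pps, detect = bfd_load(interval)
    print(f"{interval} ms interval: ~{pps:.1f} pkt/s, {detect:.0f} ms detection")
```

With a 150ms interval this gives roughly 7 packets per second and 450ms detection; at 200ms it is exactly 5 packets per second, both comfortably sub-second.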

Another thing that helps with scaling is that BFD Large Packet can be enabled on a per-interface/peer basis. Network designers can choose not to enable the feature in DC/LAN-type environments where the issue is less likely to occur (especially if they have good automation/provisioning tools).

> Reason 3 -
> As we know BFD is very often used between ASes. How do you signal over p2p
> link willingness to now encapsulate BFD in stuffed UDP ? Email ? Phone ?
> Text ? Note that with mentioned icmp pathecho I can seamlessly detect issue
> with MTU of the link to my peer without telling anyone or asking for
> support of the other side.

I think this will depend on the application. In our case, we currently deploy BFD on our inter-AS eBGP WAN links (150ms timer), and we would enable the Large Packet feature on those links if it were available today.

Our objective is quite simple. We have a lot of bandwidth/redundancy in our network. If a link cannot carry the expected payload size for whatever reason (1512 bytes in our case), we want to divert traffic away from that link as quickly as possible (with our 150ms timer, convergence is sub-second), minimizing the impact on our critical applications. Over the years we have seen various causes for this: hardware faults, config issues, power resets, and most recently a software issue.
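As a rough illustration of what a padded probe implies, the sketch below computes how much padding a BFD Control packet would need to exercise a given IP datagram size. The header sizes are the usual IPv4/UDP minimums plus the 24-byte mandatory BFD Control section from RFC 5880; treating 1512 as the target IP datagram length is my assumption here, purely for illustration:

```python
# Rough sketch: padding needed so a padded BFD Control packet exercises a
# target IP datagram size. Header sizes are the usual minimums -- these are
# assumptions for illustration, not values taken from the draft.

IPV4_HDR = 20   # IPv4 header, no options
UDP_HDR = 8     # UDP header
BFD_CTRL = 24   # mandatory section of a BFD Control packet (RFC 5880)

def pad_bytes(target_ip_len: int) -> int:
    """Padding to append after the BFD Control PDU to reach target_ip_len."""
    pad = target_ip_len - (IPV4_HDR + UDP_HDR + BFD_CTRL)
    if pad < 0:
        raise ValueError("target smaller than minimum BFD packet")
    return pad

print(pad_bytes(1512))
```

Any link that silently drops datagrams of that size would then take the padded session down and let routing divert traffic, which is the behavior described above.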

> Thx,
> Robert.



On Thu, Oct 3, 2019 at 9:34 PM Jeffrey Haas <> wrote:

> On Tue, Oct 01, 2019 at 11:11:13PM -0000, Albert Fu (BLOOMBERG/ 120 PARK)
> wrote:
> > There are well known cases, including those you mentioned, where BFD has
> > limitations in deterministically detecting data plane issue, and not
> > specific with the BFD Large Packet Draft. I am a novice to the IETF
> > process, and not sure if we need to mention them here, but shall discuss
> > with Jeff if it is worth highlighting them.
> It's reasonable to make note of issues where common operational scenarios
> will complicate the solution.  But it's not up to a draft carried on top of
> an RFC with that core issue to try to solve the issue in that core RFC.
> So, trying to solve "BFD doesn't work perfectly in the presence of LAGs" in
> bfd-large is the wrong place to do it. :-)
> That said, Robert, there's room for you to work on that if you want to kick
> off a draft on the topic.
> > > We won't have control over how the Provider maps our traffic
> (BFD/data).
> >
> > > Well of course you do :)  Just imagine if your BFD packets (in set
> equal to configured multiplier) would start
> > > using random UDP source port which then would be mapped to different
> ECMP buckets along the way in provider's
> > > underlay ?
> And that's an example of possible solution space for such a draft on the
> underlying issue.
> That said, LAG fan-out issues are a massive operational pain.  While it's
> likely that varying L3 PDU fields for entropy to distribute traffic across
> the LAG may work (and we have any number of customers who rely on this for
> UDP especially), it starts getting very problematic when you have multiple
> LAGs in a path.  I have a vague memory that someone had started some
> discussions with IEEE to try to figure out what OAM mechanisms would look
> like for such scenarios, but that's very much out of normal BFD scope.
> -- Jeff


Message: 4
Date: Fri, 4 Oct 2019 00:02:04 +0000
From: "Les Ginsberg (ginsberg)" <>
To: Jeffrey Haas <>
Cc: "Ketan Talaulikar (ketant)" <>, "Reshad Rahman
        (rrahman)" <>, "" <>
Subject: RE: WGLC for draft-ietf-bfd-large-packets
Content-Type: text/plain; charset="us-ascii"

Jeff -

For some reason this is proving to be harder than I think it should be.

I keep thinking I am being transparent - yet you keep reading "ulterior 
motives" into what I say.
There are no ulterior motives.

Let me try again...inline...

> -----Original Message-----
> From: Jeffrey Haas <>
> Sent: Thursday, October 03, 2019 1:13 PM
> To: Les Ginsberg (ginsberg) <>
> Cc: Ketan Talaulikar (ketant) <>; Reshad Rahman
> (rrahman) <>;
> Subject: Re: WGLC for draft-ietf-bfd-large-packets
> Les,
> On Fri, Sep 27, 2019 at 09:14:08PM +0000, Les Ginsberg (ginsberg) wrote:
> > > The primary reason this is a "may" in the non-RFC 2119 sense is that our
> > > experience also suggests that when the scaling impacts are primarily pps
> > > rather than bps that this feature will likely have no major impact on
> > > implementations beyond your valid concerns about exercising bugs.
> > >
> > > I suspect had this not been mentioned at all, you would have been
> happier.
> > > But you're not the target audience for this weak caveat.
> > >
> >
> > [Les:] I am not opposed to a discussion of potential issues in the draft -
> > rather I am encouraging it. But the current text isn't really on the mark
> > as far as potential issues - and we seem to agree on that. It also
> > suggests lengthening detection time to compensate - which I think is not
> > at all what you want to suggest as it diminishes the value of the
> > extension. It also isn't likely to address a real problem.
> I think what I'm seeing from you is roughly:
> - Note that larger MTUs may have impact on some implementations for BFD
>   throughput.
> - And simply stop there.
[Les:] What I would like to see discussed are points "a" and "b" below.
This is a section on deployment issues - not a normative part of the spec.

> > For me, the potential issues are:
> >
> > a)Some BFD implementations might not be able to handle MTU sized BFD
> > packets - not because of performance - but because they did not expect
> > packets to be full size and therefore might have issues passing a large
> > packet through the local processing engine.
> In such cases, the BFD session wouldn't be able to come up.  Are you
> picturing a problem more dire than that?

[Les:] No. Again, as this is a discussion of deployment considerations I see 
this as an aid to indicate what problems may be seen.
I am not asking you to "fix" the extension to overcome this.

> > b)Accepted MTU is impacted by encapsulations and what layer is being
> > considered (L2 or L3). And oftentimes link MTUs do not match on both
> ends
> > ("shudder"), so you might end up with unidirectional connectivity.
> Did you mean for BFD or more in the general sense?

[Les:] It is a problem in the general sense, but it is relevant here because 
the extension proposes to send large packets. Absent that, MTU mismatches would 
be very unlikely to affect BFD since the BFD packet size is small.

> For BFD, if you have one side testing for large MTU but not the other, we
> can still have a Up BFD session with possible packet drop for large packets
> on the opposite side.  But there's the chance in some paths that MTU may be
> unidirectionally different - e.g. satellite down vs. land up.[1]
> In such cases, configuring BFD large on both sides would be the right
> answer.  But it's also possible that large packets may only need to be
> unidirectionally delivered.

[Les:] I agree - and I think it is valid to use the extension unidirectionally 
in such cases.

> > I
> > appreciate that this is exactly the problem that the extensions are
> > designed to detect. I am just asking that these issues be discussed more
> > explicitly as an aid to the implementor. If that also makes Transports ADs
> > happier that is a side benefit - but that's not my motivation.
> We're happy to have that in the document.

[Les:] Great!!

> > > > What might be better?
> > > >
> > > > 1)Some statement that MTU isn't necessarily a consistent value for all
> > > > systems connected to an interface - which can impact the results when
> large
> > > > BFD packets are used. Implementations might then want to consider
> > > > supporting "bfd-mtu" configuration and/or iterating across a range of
> packet
> > > > sizes to determine what works and what doesn't.
> > >
> > > I'm not clear what you intend by this statement.
> > >
> > > Are you asking that we emphasize the use case in a different way?  The
> > > Introduction currently states:
> > >   "However,
> > >    some applications may require that the Path MTU [RFC1191] between
> > >    those two systems meets a certain minimum criteria.  When the Path
> > >    MTU decreases below the minimum threshold, those applications may
> > >    wish to consider the path unusable."
> > >
> > > I'm also unclear what "Implementations" may refer to here.  BFD?  An
> > > arbitrary user application?  If the latter, the application may not have
> > > strict control over the generation of a given PDU size; e.g. TCP
> > > applications.
> > >
> >
> > [Les:] I am talking about BFD implementations.
> > I suppose one can imagine each BFD client requesting a certain MTU value -
> > but that wouldn't be my choice.
> BFD conversations happen between pairs of devices.  In the case that you
> have multiple devices connected to a network segment, each conversation
> could (and may intentionally) have different properties.
> An easy example of this is two devices running an IGP may want fast failure
> and two other devices running BGP may be happy with just under second-level
> failure.
>  So too could some device decide that it cares about bi-directional
> path MTU while the others may not.

[Les:] I agree. My point was BFD sessions are requested by clients (such as a 
routing protocol). That client may/may not care about MTU e.g., a routing 
protocol may not use MTU sized packets.
But if the goal is to validate that MTU sized data traffic can successfully be 
sent then "someone" has to enable that. And I would argue that the most logical 
place to enable the feature is under BFD itself since a routing protocol 
(BGP/OSPF) won't necessarily care about MTU.
Since you are speaking at the "device" level (not BFD client level) I think we 
are in agreement.

> Given prior BFD documents' lack of discussion about such multi-access
> network considerations, I'm not sure it's in character to have it just for
> such a case, if that's what you're concerned with.
> > I would think the value we want is really the maximum L3 payload that the
> > link is intended to support - which should be independent of the BFD
> > client. This might be larger than any client actually uses - but that
> > seems like a good thing.
> In this case we have actual existence proof of desired behavior.  The links
> may be 9k but the user cares only about 1500 bytes end to end. If 1500 bytes
> for BFD large works but 9k doesn't, we've not tested what the user actually
> desired.

[Les:] This is fine. This is consistent with my suggestion that an 
implementation supports a "bfd-mtu" knob. This value can be <= link_mtu.
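The suggested knob could be as simple as a bounds check at configuration time. A minimal sketch, with invented names (no implementation actually exposes `validate_bfd_mtu`; the only constraint taken from the discussion above is bfd-mtu <= link MTU):

```python
# Hypothetical "bfd-mtu" configuration knob, as suggested above: the
# configured probe size is validated against the interface link MTU.
# Function and parameter names are invented for illustration.

def validate_bfd_mtu(bfd_mtu: int, link_mtu: int) -> int:
    """Accept a bfd-mtu value only if it does not exceed the link MTU."""
    if not (0 < bfd_mtu <= link_mtu):
        raise ValueError(f"bfd-mtu {bfd_mtu} must be in (0, {link_mtu}]")
    return bfd_mtu

# Links may be 9k while the user only cares about 1500 bytes end to end.
print(validate_bfd_mtu(1500, 9000))
```

This matches the existence-proof case above: the probe size tracks what the user wants verified, not the raw link capability.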


> -- Jeff


Subject: Digest Footer

Rtg-bfd mailing list


End of Rtg-bfd Digest, Vol 164, Issue 4