Re: WGLC for draft-ietf-bfd-large-packets

Jeffrey Haas <> Fri, 13 September 2019 12:33 UTC

Subject: Re: WGLC for draft-ietf-bfd-large-packets
From: Jeffrey Haas <>
In-Reply-To: <>
Date: Fri, 13 Sep 2019 08:33:50 -0400
Cc: Reshad Rahman <>, "" <>
To: "Ketan Talaulikar (ketant)" <>


Thanks for your comments.

> On Sep 12, 2019, at 11:55 PM, Ketan Talaulikar (ketant) <> wrote:
> Hi All,
> I would like to ask some questions and seek clarifications on this draft.
> 	• I am aware that this draft originates from practical pain points at a specific operator. During the adoption calls, the scenarios were debated in detail. It was basically a L2 WAN circuit service over a provider network and the challenge was that the PMTU of the underlying path in the SP network changes dynamically. However, from the Enterprise POV, the L2 circuit is seen as a single link and the BFD runs in directly connected mode. The draft however, discusses BFD multi-hop which is an entirely different use-case.

This is perhaps a poor wording choice in the Introduction section.  The intent is really that this is usable in all cases where BFD may be used.  The Introduction text was intended to note that for single-hop scenarios, BFD Echo procedures may be sufficient to solve the problem.  (See my comment to Carlos elsewhere in the thread.)  However, for any form of multi-hop, you can't solve this via Echo.

General note about the "pain points at a specific operator": While Albert certainly was the first customer I'd had who was willing to put their name on such a draft, I'd been approached many times over the years, both in my chair capacity in IETF and at my employer, about similar headaches.  This is sadly not a rare problem.

> When doing multi-hop, the BFD packet could go over entirely different paths in the network with different PMTUs (especially different from the application sending large packets) – this makes things flaky? So shouldn’t this mechanism be actually focussed on the single hop/directly connected mode instead?

Even for single-hop this may be slightly flaky.  Consider "directly connected" links that are composed from LAGs.

The draft doesn't try to completely address all such forms of multipath problems, even for directly connected solutions.  When we were doing BFD for LAGs, we realized that trying to get overly specific about the implementation not only drew us perilously into the specifics of a given vendor, but potentially into conflict with IEEE over discussions about distributing traffic.  To that end, we've been silent on the matter in this draft.

It's reasonable for any BFD implementation dealing with multipath issues to potentially interact with the load balancer and decide to exercise specific links.  I believe, based on some old mailing list discussion, that your own employer may do this for single-hop BFD in some scenarios, in implementations that predate BFD on LAG.  However, it's proven unreasonable over the life of the BFD protocol to try to get too pedantic over how to do such a thing; some hardware may simply not support it.

> 	• There are implementations with BFD offload to H/W out there. What happens when a particular implementation cannot handle such large packet sizes (and I am not specifically aware of one such)? Like other aspects of monitoring like the intervals, would there be a value in indicating a support for large packets during the signalling? The draft does raise the need for it but doesn’t seem to do anything about it – why? Either we need it and this draft should define it. Or we don’t need it and it would placing the onus on the operator to enable this when they know both ends support it. Then it is something for operational consideration section perhaps?

This is effectively a criticism of BFD in general rather than of this feature specifically.  Part of the motivation for RFC 7419, Common Interval Support in Bidirectional Forwarding Detection, was to specify a number of common implementation timings.  But even then, that document doesn't say much about scaling considerations for a given set of intervals.  Common considerations for implementors and operators tend to be "my implementation can support X sessions at 3.3ms, Y at 10ms, and Z at much slower rates if we leave the fully-distributed-to-the-line-card mode".  This simply means the product sheet gets another axis in its listings.
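
To make the "another axis" point concrete, here is a rough back-of-the-envelope sketch of per-session load as a function of interval and packet size.  The sizes are assumptions for illustration only: 52 bytes is an unauthenticated 24-byte BFD control packet plus IPv4 and UDP headers, and 1500 bytes is an arbitrary padded size that nobody in this thread has specified.  The function name session_load is likewise made up.

```python
def session_load(interval_ms: float, packet_bytes: int) -> tuple[float, float]:
    """Return (packets per second, bits per second) for one direction of one session."""
    pps = 1000.0 / interval_ms
    return pps, pps * packet_bytes * 8


REGULAR = 52   # assumed: 20-byte IPv4 + 8-byte UDP + 24-byte BFD control packet, no auth
LARGE = 1500   # assumed padded size, chosen only for illustration

if __name__ == "__main__":
    # Tabulate the two intervals mentioned above against the two packet sizes.
    for interval in (3.3, 10.0):
        for size in (REGULAR, LARGE):
            pps, bps = session_load(interval, size)
            print(f"{interval:>4} ms, {size:>4} B: {pps:6.1f} pps, {bps / 1e6:5.2f} Mbit/s")
```

Multiply the per-session figures by the session count to get a feel for how packet size interacts with the existing "X sessions at interval Y" claims.  This says nothing about what any particular hardware can actually sustain.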

What you're asking for in the protocol otherwise is a mechanism to negotiate or renegotiate a set of timers over a given set of interfaces based on existing or future re-provisioning of session quantity and parameters.  I think you'll find operators aren't particularly supportive of that.

FWIW, we expect exactly these same questions during IESG review.  BFD faced exactly this type of question even during the base RFC 5880 work.  (BFD tends to make the current Transport AD unhappy.)

> 	• There was a discussion on the list about whether this needs to be done for every packet or not. I don’t find that discussion or the result captured in the draft. The draft just says that perhaps longer intervals should be applied to BFD when doing large packets – but this will defeat the purpose of fast-detection. What would be good is we have both fast-detection and slow PMTU validation? Perhaps we need some analysis of whether large packets should be used always or intermittently and the pros/cons or guidelines for the same?

See my comment to Carlos in-thread about why this makes the timers tricky.  See also how BFD Echo can solve this problem in some circumstances without negotiation.

It's worth noting that BFD is quite capable of dynamically re-negotiating its timers.  This means an implementation could have a behavior such as:

1. Start by bringing up its session with regular-sized packets and 3.3ms timers.
2. Decide to shift to large packets, but renegotiate to 10ms timers first.

The "signaling" that a system is willing to accept the large packets is the shift to a longer timer.
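
The two steps above could be sketched roughly as follows.  This is a toy model, not a protocol implementation: the class, its fields, and the 1500-byte padded length are all hypothetical, and renegotiate() merely stands in for the normal BFD timer change procedure (the Poll Sequence in RFC 5880).

```python
from dataclasses import dataclass


@dataclass
class BfdSession:
    """Toy model of one end of a BFD session (hypothetical, for illustration)."""
    tx_interval_us: int = 3300   # step 1: session comes up at 3.3ms
    padded_len: int = 0          # 0 means regular-sized packets

    def renegotiate(self, new_interval_us: int) -> None:
        # Placeholder for the real timer renegotiation exchange.
        self.tx_interval_us = new_interval_us

    def enable_large_packets(self, padded_len: int,
                             min_interval_us: int = 10_000) -> None:
        # Step 2: move to the longer timer first, and only then start padding.
        if self.tx_interval_us < min_interval_us:
            self.renegotiate(min_interval_us)
        self.padded_len = padded_len


if __name__ == "__main__":
    session = BfdSession()
    session.enable_large_packets(padded_len=1500)  # 1500 is an assumed size
    print(session)
```

The ordering is the point of the sketch: the timer change happens before any large packet is sent, so the slower interval is already in effect by the time the padding starts.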

> 	• The draft is missing an operational considerations and manageability considerations sections. Some of this info is placed in sec 3, but would help if things were elaborated in their individual sections. It would provide more insight into how exactly this mechanism in BFD is envisaged to be actually deployed and used. More importantly, perhaps how it should NOT be used?

Is there specific information you think should go into such a section that isn't otherwise present in the draft?  A "how it should NOT be used" section is challenging to write without specific content - somewhat similar to proving a negative.

> Can the authors and WG discuss the above? I think it seems too rushed to go for WGLC just as yet?

The purpose of WGLC is to provide the working group an opportunity to drive out commentary like that above when things are otherwise quiet. :-)  Thanks for providing it.

-- Jeff