Re: Zaheduzzaman Sarker's Discuss on draft-ietf-bfd-unsolicited-11: (with DISCUSS and COMMENT)

Jeffrey Haas <jhaas@pfrc.org> Thu, 15 December 2022 22:39 UTC

Date: Thu, 15 Dec 2022 17:39:22 -0500
From: Jeffrey Haas <jhaas@pfrc.org>
To: Zaheduzzaman Sarker <Zaheduzzaman.Sarker@ericsson.com>
Cc: The IESG <iesg@ietf.org>, draft-ietf-bfd-unsolicited@ietf.org, bfd-chairs@ietf.org, rtg-bfd@ietf.org
Subject: Re: Zaheduzzaman Sarker's Discuss on draft-ietf-bfd-unsolicited-11: (with DISCUSS and COMMENT)
Message-ID: <20221215223922.GD23286@pfrc.org>
References: <167104636614.47387.14544637650303450586@ietfa.amsl.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <167104636614.47387.14544637650303450586@ietfa.amsl.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/EQpbik4X9LtnLcsdFQui8Zly1Gc>

Zahed,

[Speaking as chair/shepherd, not an author.]

On Wed, Dec 14, 2022 at 11:32:46AM -0800, Zaheduzzaman Sarker via Datatracker wrote:
> DISCUSS:
> ----------------------------------------------------------------------
> 
> Thanks for working on this specification.
> 
> Thanks to Magnus Westerlund for the TSVART review, based on that review and my
> own read, I am supporting both Lars's and Roman's discuss.
> 
> On top of that, as this document claims - "with "unsolicited BFD" there is
> potential risk for excessive resource usage by BFD from "unexpected" remote
> systems". This translates to me as potential injection of huge amount of
> traffic which is lacking a self-regulation mechanism in this specification. To

I suspect it's an unfamiliarity with core RFC 5880 behaviors that's led you
to that incorrect observation.  BFD sessions negotiate the least aggressive
timer in each direction based on the timers present in each BFD PDU.

See RFC 5880 §6.8.7 for details.
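
As a rough illustration of that negotiation, here is a minimal Python sketch.
The function and parameter names are made up for this example; the arithmetic
follows RFC 5880 (the greater of the local Desired Min TX and the remote
Required Min RX, with a floor of one second while the session is not Up).
Jitter and the special case of a remote Required Min RX of zero are left out.

```python
# Illustrative sketch only; names are hypothetical, values are in microseconds.

SLOW_TX_US = 1_000_000  # RFC 5880: Desired Min TX is at least 1 s when not Up


def tx_interval_us(local_desired_min_tx_us, remote_required_min_rx_us,
                   session_up=True):
    """Negotiated transmit interval for the local system, before jitter."""
    if not session_up:
        # A session that is not Up falls back to slow transmission.
        local_desired_min_tx_us = max(local_desired_min_tx_us, SLOW_TX_US)
    # The slower (less aggressive) of the two values wins.
    return max(local_desired_min_tx_us, remote_required_min_rx_us)


def detection_time_us(remote_detect_mult, local_required_min_rx_us,
                      remote_desired_min_tx_us):
    """Asynchronous-mode detection time as computed by the local system."""
    return remote_detect_mult * max(local_required_min_rx_us,
                                    remote_desired_min_tx_us)
```

With these definitions, a remote system asking for 50 ms cannot force a local
system that only accepts 300 ms to receive any faster than 300 ms.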

It's also not uncommon for implementations to dynamically adjust their
timers based on load within some constraints.  When that's not possible,
BFD traffic that becomes unsustainable causes the BFD sessions to start
losing packets, which in many cases will cause the session to transition to
the Down state - and thus back to slow PDU transmission.

The caveat in this draft is related to an unexpected number of BFD sessions.
Operators, who are already generally aware of BFD session and timer scaling
for their systems, need to plan within the bounds of their deployment.  For
example, if a /24 interface is permitted, it's not unreasonable for 255
sessions to be possible.  If scaling requires fewer to be guaranteed... then
configure that in the ACL.
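
For planning purposes, the upper bound implied by an allowed prefix can be
estimated with a short Python sketch. This assumes one unsolicited session
per peer address and uses a hypothetical helper name; it excludes the network
and broadcast addresses and the local interface address, so the exact count
for a /24 comes out slightly below the round figure above.

```python
import ipaddress


def max_unsolicited_peers(allowed_prefix, local_address):
    """Count addresses in allowed_prefix that could originate a session.

    Assumption for this sketch: one session per peer address, and the
    local interface address itself is not a peer.
    """
    net = ipaddress.ip_network(allowed_prefix)
    local = ipaddress.ip_address(local_address)
    # hosts() skips the network and broadcast addresses for prefixes like /24.
    return sum(1 for host in net.hosts() if host != local)
```

Narrowing the permitted prefix (for example, from a /24 to a /28) reduces the
worst case accordingly.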

"If it hurts, don't do that."

> large degrees the traffic volume could have random effects on the routing plane
> and what links are considered up etc. We can hide all these by saying "Deploy
> the feature only in certain "trustworthy" environment"", then I am completely
> missing the definition of "trustworthy" environment". I would like to discuss
> that.

The environment must be under reasonable operational control to satisfy the
scaling of the impacted system.  What words would you prefer to have there
instead?  How would those words change if you want to permit this feature to
be utilized when the operational environment spans multiple entities, such
as at an exchange point (IXP)?


"If it hurts, don't do that."

And note: You can happily end up with a badly behaved environment through
configuration as well.

> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> Additional comments -
> 
> * This document also says - "When an Unsolicited BFD session goes down, an
> implementation MAY retain the session state for a period of time. Retaining
> this state can be useful for operational purposes." I am missing any discussion
> on the reduced functionality or any indication if the selected period time has
> any advantages or disadvantages. To be honest, without proper discussion or
> indication of some default values, I would remove the entire sentence or if
> this is just an additional implementation advice, I would drop the normative
> MAY.

The desired behavior here is that the operational state associated with a
session, which is visible in management infrastructure (such as YANG
modules), does not disappear immediately when the session goes down.
Immediate removal would be indistinguishable from a very fast delete of
configuration state.

The reason the MAY was originally chosen was to counter possible arguments
that the state MUST be deleted immediately.

My recommendation would be to retain the MAY, but moving to the non-RFC 2119
"may" would still satisfy the desired behavior. This is to inform
implementors that immediate deletion is not the obvious implementation
requirement.
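
A minimal sketch of what retaining the state could look like, in Python. The
class name, the retention period, and the shape of the stored state are all
assumptions made for illustration; neither the draft nor RFC 5880 specifies
them.

```python
import time


class DownSessionCache:
    """Keeps state for sessions that went down, for a limited time.

    Hypothetical helper: retain_seconds is an implementation choice.
    """

    def __init__(self, retain_seconds, clock=time.monotonic):
        self._retain = retain_seconds
        self._clock = clock
        self._down = {}  # peer -> (state, time the session went down)

    def session_down(self, peer, state):
        # Record the final state instead of deleting it right away.
        self._down[peer] = (state, self._clock())

    def visible_sessions(self):
        """State a management interface would still report."""
        now = self._clock()
        # Drop entries whose retention period has expired.
        self._down = {p: (s, t) for p, (s, t) in self._down.items()
                      if now - t < self._retain}
        return {p: s for p, (s, _) in self._down.items()}
```

An operator polling shortly after a failure would still see the session and
its last state; after the retention period it is gone.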

-- Jeff