Re: [Idr] WG LC for draft-ietf-idr-bgp-sendholdtimer-03 (3/23 to 4/12/2024)

Jeffrey Haas <jhaas@pfrc.org> Mon, 25 March 2024 15:35 UTC

From: Jeffrey Haas <jhaas@pfrc.org>
In-Reply-To: <z7xyp2afi6eqobgl5vpvl6as2yit6lyv3ye4so727gzobl4gdt@jp4akmw5lh7b>
Date: Mon, 25 Mar 2024 11:30:24 -0400
Cc: Sue Hares <shares@ndzh.com>, "idr@ietf.org" <idr@ietf.org>
Message-Id: <09C3AD6C-E195-454B-8DBB-81EFE0E01E22@pfrc.org>
References: <DM6PR08MB48574BAABAAC203EA2F9F139B3312@DM6PR08MB4857.namprd08.prod.outlook.com> <z7xyp2afi6eqobgl5vpvl6as2yit6lyv3ye4so727gzobl4gdt@jp4akmw5lh7b>
To: Ben Maddison <benm=40workonline.africa@dmarc.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/EngMLlWJgNDDr8HMkDcUy4EgPBM>

Ben,


> On Mar 25, 2024, at 6:44 AM, Ben Maddison <benm=40workonline.africa@dmarc.ietf.org> wrote:
> The only remaining issue, which I believe warrants further discussion in
> the WG, is the guidance that BGP speakers should send a NOTIFICATION in
> response to the SendHoldTimer_Expires event.
> 
> Although I appreciate that for the purposes of internal consistency it
> is desirable that the FSM mandates a consistent set of steps to take
> when tearing down a session, I think in this case it results in further
> contradiction.
> 
> Consider the possible outcomes for an implementation responding to a
> SendHoldTimer_Expires event by sending a NOTIFICATION to its peer, as
> written currently. Either:
> 
> 1. The local speaker waits indefinitely for confirmation that the
>   message has been sent on the socket, delaying the closure of the TCP
>   connection, and defeating the objective of the timer entirely; or
> 
> 2. The local speaker successfully sends the message to its peer, and
>   finds itself in the contradictory position of having successfully
>   sent a message to the peer, whilst handling a condition that has
>   arisen due to its supposed inability to do exactly that!
> 
> I believe that the intention here is that the local speaker performs its
> internal bookkeeping and event logging using the "Send Hold Timer
> Expired" error code, but only attempts to send the NOTIFICATION to its
> peer if it can do so without blocking the remainder of the session
> clean-up.

The intention overlaps with the text I wrote for the BFD Cease subcode, now RFC 9384.  In some circumstances, the connection may effectively be gone anyway, but we still need diagnostic information as to why.  In the case of that RFC, we're not guaranteed to be send blocked, but the TCP connection may well be gone by that point for other reasons.

As you note in 2 above, successfully sending the notification would be perverse since we should only enter this situation because we're send blocked.

When you examine the remainder of the sendholdtimer text in section 3.3, it follows the usual pattern for sending a notification: we send it and then tear down the TCP connection anyway.  The point I think you're highlighting is whether "send a message" should be considered a synchronous operation or not.  See at least one prior comment I'd left in the thread for this feature suggesting that it shouldn't be synchronous.

The desire, as you note above, is that we still do all of the work we'd have done had we sent the message on the wire, even if we can't successfully enqueue the message to TCP.  The implementation would do its diagnostics, track the last sent code/subcode, etc.
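
To make that concrete, here is a rough sketch of the behavior I have in mind, not text for the draft.  All names (Session, on_send_hold_timer_expires, the placeholder NOTIFICATION bytes) are made up for illustration, and a real implementation would hang this off its own FSM and I/O layer.  The bookkeeping happens first, the NOTIFICATION gets a single non-blocking attempt, and the teardown proceeds regardless of whether that attempt worked.

```python
import socket
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Session:
    """Hypothetical per-peer state; field names are illustrative only."""
    sock: socket.socket
    state: str = "Established"
    last_error_sent: Optional[str] = None
    log: List[str] = field(default_factory=list)


def on_send_hold_timer_expires(session: Session, notification: bytes) -> bool:
    """Handle SendHoldTimer_Expires without ever blocking on the socket.

    `notification` is treated as opaque, already-encoded bytes (assumption).
    Returns True only if the whole message was handed to TCP.
    """
    # 1. Diagnostics and bookkeeping happen unconditionally.
    session.last_error_sent = "Send Hold Timer Expired"
    session.log.append("Send Hold Timer Expired: tearing down session")

    # 2. Best-effort, non-blocking attempt to enqueue the NOTIFICATION.
    sent = False
    try:
        session.sock.setblocking(False)
        sent = session.sock.send(notification) == len(notification)
    except OSError:
        # Includes BlockingIOError: we are send blocked, as expected.
        sent = False
    finally:
        # 3. Teardown never waits on the outcome of step 2.
        session.sock.close()
        session.state = "Idle"
    return sent
```

The return value is only there for logging; nothing in the teardown depends on it.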

Is there any particular text you'd want in order to indicate that this should be done asynchronously, in spite of the fact that nowhere else in the BGP FSM text do we discuss how such enqueueing is done?

> Similarly, it seems unnecessary and confusing to restart the timer when
> a NOTIFICATION is sent.

That's a reasonable exception case.  
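
As a sketch of what that exception could look like (again, illustrative names only, not proposed draft text): the timer is restarted for every message type sent except NOTIFICATION.  The numeric message types are the ones from RFC 4271 and RFC 2918; the helper name is hypothetical.

```python
from enum import IntEnum


class BgpMessageType(IntEnum):
    """BGP message type codes (RFC 4271, plus ROUTE-REFRESH from RFC 2918)."""
    OPEN = 1
    UPDATE = 2
    NOTIFICATION = 3
    KEEPALIVE = 4
    ROUTE_REFRESH = 5


def should_restart_send_hold_timer(msg_type: BgpMessageType) -> bool:
    """Hypothetical helper: restart on any sent message except NOTIFICATION,
    since the session is being torn down once a NOTIFICATION goes out."""
    return msg_type != BgpMessageType.NOTIFICATION
```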

> 
> I have attempted some wording changes to address this, and submitted
> them for review in [1]. I would be glad to hear the thoughts of
> participants that are more practised in FSM surgery on this.

The current pull 4 diff is, I think, overly prescriptive, but it is iterating in the right direction.

-- Jeff