[Idr] Some comments on draft-spaghetti-idr-bgp-sendholdtimer-09

Jeffrey Haas <jhaas@pfrc.org> Tue, 07 March 2023 22:23 UTC

Date: Tue, 07 Mar 2023 17:22:40 -0500
From: Jeffrey Haas <jhaas@pfrc.org>
To: draft-spaghetti-idr-bgp-sendholdtimer@ietf.org, idr@ietf.org
Message-ID: <20230307222240.GA16033@pfrc.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/u_1u4fa3JpHHvwBzVYTEde8IDFg>

Authors,

Here's some comments on your proposal:

In general, the draft takes the tone that the sendhold-timer feature is
required because of "attacks".  I don't think that framing serves to
illustrate the scenario.  It is also likely to bring undesired and poorly
focused attention from our security-minded people in the IETF.  I'd suggest
restricting your comparisons to clogged pipes at most. :-)

Another general thought: the document is currently written on the assumption
that the only desired action is to bounce the session.  While I agree that this
is probably the most desirable behavior for long-stalled sessions, have you
given any consideration to permitting the mechanism to simply raise an alarm
and allow manual intervention?  For short timers, this may be preferable to
aggressively bouncing a session in some circumstances; e.g., when the remote
side is undergoing an upgrade and has intentionally stalled the session.
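To make the alarm-only idea concrete, here is a minimal sketch of a
configurable expiry action.  SendHoldAction, on_send_hold_timer_expired and
the two callbacks are hypothetical names, not anything the draft defines; the
draft today only describes the tear-down behavior.

```python
from enum import Enum
from typing import Callable


class SendHoldAction(Enum):
    TEAR_DOWN = "tear-down"    # what the draft currently mandates
    ALARM_ONLY = "alarm-only"  # hypothetical: alert the operator, keep the session


def on_send_hold_timer_expired(action: SendHoldAction,
                               raise_alarm: Callable[[str], None],
                               drop_session: Callable[[], None]) -> str:
    """Handle SendHoldTimer expiry; returns the resulting FSM state name."""
    # Alert in both modes, so the operator always learns about the stall.
    raise_alarm("SendHoldTimer expired")
    if action is SendHoldAction.TEAR_DOWN:
        drop_session()
        return "Idle"
    # Alarm-only: leave the session up for manual intervention.
    return "Established"
```

A per-peer knob like this would let operators pick alarm-only for short timers
or planned maintenance, and tear-down for genuinely long stalls.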

One detail that could use expansion is exactly when you bump the MsgSent
counter (noted below).  While you might consider it obvious that it should be
bumped only when the entire BGP PDU has been put into the socket, you'd be
surprised at what some other people are likely to decide is fine.  Spelling
this out also settles the scenario where only part of a PDU has been written.
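One way to pin this down is to generate MsgSent only once the last octet of a
PDU has been accepted by the socket.  The sketch below assumes a hypothetical
non-blocking `send` callable that returns how many octets the socket accepted
(0 when it can take nothing, e.g. under a zero window); BgpOutputQueue and its
methods are illustrative names only.

```python
from collections import deque
from typing import Callable, Deque


class BgpOutputQueue:
    """Hypothetical BGP output path: MsgSent fires only for complete PDUs."""

    def __init__(self, send: Callable[[bytes], int],
                 on_msg_sent: Callable[[], None]) -> None:
        self._send = send              # returns number of octets accepted
        self._on_msg_sent = on_msg_sent
        self._pending: Deque[bytearray] = deque()

    def enqueue(self, pdu: bytes) -> None:
        self._pending.append(bytearray(pdu))

    def flush(self) -> None:
        while self._pending:
            head = self._pending[0]
            accepted = self._send(bytes(head))
            if accepted <= 0:
                return                 # socket full; retry on next writable event
            del head[:accepted]        # drop the octets the socket took
            if head:
                return                 # partial write: no MsgSent yet
            self._pending.popleft()
            self._on_msg_sent()        # whole PDU is in the socket
```

With a socket that accepts only 10 octets on the first call, a 19-octet
KEEPALIVE produces no MsgSent until a later flush() writes the remaining 9.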

Finally, before the misc. comments: while it's not part of your proposal, it
might be useful to contrast this with the TCP user timeout.  (Ignoring the
inconsistent support in various stacks.)  That mechanism works at a much finer
level of granularity than your proposal, in the sense that any data left
unacknowledged for longer than the user-defined time can result in the session
being dropped.  Your proposal relies on the timer being restarted only on PDU
boundaries:
- This means that in the face of large PDUs (at most 4096 octets without the
  RFC 8654 extension) and sluggish receivers, it may be difficult to restart
  the sendholdtimer before it expires.
- By contrast, sluggish but still moving receivers won't trip the user timeout
  as readily.
- As RFC 8654 (which permits messages of up to 65535 octets) gets deployed,
  this contrast will become more of an issue.
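A back-of-the-envelope model shows the effect.  It assumes a receiver draining
at a constant rate, treats the PDU drain time as the gap between timer
restarts, and simplifies the user timeout to "some segment must be
acknowledged within the timeout".  The 480 s value is the 8-minute default
mentioned further down; the 100 octets/s rate and 1460-octet segment are
arbitrary illustrative numbers.

```python
def send_hold_timer_trips(pdu_octets: int, drain_octets_per_s: float,
                          send_hold_time_s: float) -> bool:
    # Timer restarts only after a whole PDU is written, so the gap between
    # restarts is roughly the time the receiver takes to drain one PDU.
    return pdu_octets / drain_octets_per_s > send_hold_time_s


def user_timeout_trips(segment_octets: int, drain_octets_per_s: float,
                       user_timeout_s: float) -> bool:
    # Simplified user timeout: it only needs *some* data acknowledged in time,
    # i.e. roughly one segment per segment_octets / drain seconds.
    return segment_octets / drain_octets_per_s > user_timeout_s


DEFAULT_SEND_HOLD_TIME_S = 480.0
for size in (4096, 65535):
    print(size,
          send_hold_timer_trips(size, 100, DEFAULT_SEND_HOLD_TIME_S),
          user_timeout_trips(1460, 100, DEFAULT_SEND_HOLD_TIME_S))
# prints "4096 False False" then "65535 True False"
```

At the same slow-but-moving rate, a 4096-octet PDU drains in about 41 s, while
a 65535-octet extended message needs about 655 s and would expire the timer;
the simplified user timeout trips in neither case.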

Misc comments:

11	Abstract

13	   This document defines the SendHoldTimer session attribute for the
14	   Border Gateway Protocol (BGP) Finite State Machine (FSM).

SendHoldTime is the session attribute.  The document should be updated to
distinguish the SendHoldTime from its timer.  Compare with the other session
attributes in RFC 4271 that are times and their respective timers (e.g.,
HoldTime and HoldTimer, ConnectRetryTime and ConnectRetryTimer).
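The attribute/timer split mirrors how HoldTime and HoldTimer relate: the
attribute is a configured duration, the timer is a running instance armed from
it.  A minimal sketch, with PeerSession and its fields as hypothetical names
and the 480 s default taken from the 8-minute value mentioned later:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerSession:
    # Session attribute: configured duration in seconds (cf. HoldTime).
    send_hold_time: float = 480.0
    # Timer: absolute deadline armed from the attribute (cf. HoldTimer);
    # None means the timer is not running.
    send_hold_timer_deadline: Optional[float] = None

    def restart_send_hold_timer(self, now: float) -> None:
        self.send_hold_timer_deadline = now + self.send_hold_time

    def send_hold_timer_expired(self, now: float) -> bool:
        return (self.send_hold_timer_deadline is not None
                and now >= self.send_hold_timer_deadline)
```

Changing send_hold_time reconfigures the attribute; only restarting the timer
changes when expiry happens.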

107	2.  Example of a problematic scenario

109	   In implementations lacking the concept of a SendHoldTimer, a
110	   malfunctioning or overwhelmed remote peer may cause data on the BGP
111	   socket in the local system to accumulate ad infinitum.  This could
112	   result in forwarding failure and traffic loss, as the overwhelmed
113	   peer continues to utilize stale routes.

The above is an example where saying less may be helpful.  It's not necessary to
speculate about what went wrong.  All that is necessary is to say that stalled
sessions can result in stale routing state downstream of the stalled session.

115	   An example fault state: as BGP runs over TCP [RFC9293] it is possible
116	   for hosts in the ESTABLISHED state to encounter a BGP peer that is
117	   advertising a TCP Receive Window (RCV.WND) of size zero, this 0
118	   window prevents the local system from sending KEEPALIVE, CEASE,
119	   WITHDRAW, UPDATE, or any other critical BGP messages across the
120	   network socket to the remote peer.  Historically, many BGP
121	   implementations were unable to handle this situation in a robust
122	   fashion.  Previous BGP RFC specifications would not give cause for
123	   the session to be torn down in such situations.

Here, I suggest you avoid dwelling too much on zero-windowing.  This behavior
is normal in TCP, and ideally short-lived.  If you must mention it at all, note
that zero-windowing is an observable symptom of potentially long-lived
congestion.

146	3.2.  SendHoldTimer_Expires Event Definition
[...]
154	   If the SendHoldTimer_Expires (Event XX1), the local system:

[...]
163	      -  drops the TCP connection,

165	      -  increments the ConnectRetryCounter,

167	      -  (optionally) performs peer oscillation damping if the
168	         DampPeerOscillations attribute is set to TRUE, and

Delete the "increments the ConnectRetryCounter" and "peer oscillation damping"
items.  Those cases are handled by the Idle state machinery.

172	   If the DelayOpenTimer_Expires event (Event 12) occurs in the Connect
173	   state, the local system:

175	      -  sends an OPEN message to its peer,

177	      -  sets the HoldTimer to a large value, and

179	      -  sets the SendHoldTimer to a large value, and

You should define what a "large value" is here, relative to the default value
of 8 minutes.  The same comment applies to the section immediately following
this one in the draft.

201	3.3.  MsgSent Event Definition

203	   Section 8.1.5 [RFC4271] is extended as following:

205	   Event XX2: MsgSent
206	   Definition: An event is generated when a KEEPALIVE or UPDATE message is transmitted.
207	   Status: Mandatory

It's worth noting that this event corresponds to the one that increments the
following object in the BGP MIB (RFC 4273):

        bgpPeerOutTotalMessages OBJECT-TYPE
            SYNTAX     Counter32
            MAX-ACCESS read-only
            STATUS     current
            DESCRIPTION
                    "The total number of messages transmitted to
                     the remote peer on this connection."
            REFERENCE
                    "RFC 4271, Section 4."
            ::= { bgpPeerEntry 13 }

There are similar counters in the BGP YANG modules (IETF and OpenConfig).

By proxy, this means operators can already monitor the events that should
reset the sendholdtimer.
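As an illustration of that monitoring, the sketch below assumes some poller
(not shown, and not tied to any particular SNMP or gNMI library) has collected
timestamped samples of bgpPeerOutTotalMessages for an Established peer.  The
function names are hypothetical; the modulo arithmetic handles a single
Counter32 wrap between polls.

```python
from typing import List, Tuple

COUNTER32_MOD = 2 ** 32


def counter32_delta(prev: int, curr: int) -> int:
    """Increase between two Counter32 samples, tolerating one wrap.

    Polls must be frequent enough that the counter cannot advance by
    2**32 or more between them, or a change would look like zero.
    """
    return (curr - prev) % COUNTER32_MOD


def out_messages_stalled(samples: List[Tuple[float, int]], now: float,
                         send_hold_time: float) -> bool:
    """samples: (timestamp_s, bgpPeerOutTotalMessages) in time order.

    True if the counter has not advanced for at least send_hold_time.
    """
    # Conservatively treat the first sample as the last observed change.
    last_change = samples[0][0]
    for (_, c0), (t1, c1) in zip(samples, samples[1:]):
        if counter32_delta(c0, c1) != 0:
            last_change = t1
    return now - last_change >= send_hold_time
```

For example, samples of (0, 100), (60, 105), (120, 105), (600, 105) with a
480 s SendHoldTime report a stall at t=600, since nothing was sent after t=60.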

209	3.4.  Restarting the SendHoldTimer

211	   On page 74 [RFC4271] before "If the local system receives an UPDATE
212	   message, and the UPDATE message error handling procedure (see
213	   Section 6.3) detects an error (Event 28), the local system:", add the
214	   following:

216	   If the local system transmits a KEEPALIVE or UPDATE message (MsgSent
217	   (Event XX2)), the local system:

219	      -  restarts the SendHoldTimer, and

221	      -  remains in the Established state.

I think this section is probably in error.  The section you're doing surgery on
is part of UPDATE error detection, where we're about to drop the connection.
No additional work should be done there.

223	4.  Send Hold Timer Expired Error Handling

225	   If a system does not send successive KEEPALIVE, UPDATE, and/or
226	   NOTIFICATION messages within the period specified in the Send Hold
227	   Time, then the BGP connection is closed and a log message is emitted.

While this is a good attempt at RFC 4271 surgery, BGP has message types beyond
the ones RFC 4271 defines (e.g., ROUTE-REFRESH).  Better text would simply
discuss sending a BGP message, the term RFC 4271 uses in §4 and later.
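The difference between the two wordings shows up as soon as a ROUTE-REFRESH is
sent.  This sketch parses the 19-octet BGP header (RFC 4271, Section 4.1) and
compares a restart rule limited to the draft's message list with one that
counts any BGP message.  Type code 5 is ROUTE-REFRESH (RFC 2918); the function
names are hypothetical.

```python
import struct

BGP_HEADER_LEN = 19  # 16-octet marker, 2-octet length, 1-octet type
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION",
                 4: "KEEPALIVE", 5: "ROUTE-REFRESH"}


def message_type(pdu: bytes) -> str:
    if len(pdu) < BGP_HEADER_LEN:
        raise ValueError("truncated BGP header")
    length, code = struct.unpack("!HB", pdu[16:19])
    if length != len(pdu):
        raise ValueError("length field does not match PDU size")
    return MESSAGE_TYPES.get(code, f"UNKNOWN({code})")


def restarts_under_draft_wording(pdu: bytes) -> bool:
    # Only the three message types listed in the draft's Section 4 text.
    return message_type(pdu) in {"KEEPALIVE", "UPDATE", "NOTIFICATION"}


def restarts_for_any_bgp_message(pdu: bytes) -> bool:
    message_type(pdu)  # still validate the header
    return True
```

A 23-octet ROUTE-REFRESH (header plus AFI, reserved, SAFI) would not restart
the timer under the draft's list, but would under "any BGP message".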

253	   *  Failure to disconnect from a 'stuck' peer hinders the local
254	      system's ability to construct a non-stale local Routing
255	      Information Base (RIB).

I don't follow this point.  The local system has valid RIBs.  What is failing is
synchronizing the Adj-RIB-Out with a particular remote BGP speaker.

-- Jeff
