Re: [Idr] New Version Notification for draft-spaghetti-idr-bgp-sendholdtimer-05.txt

Jeffrey Haas <jhaas@pfrc.org> Thu, 04 August 2022 11:53 UTC

From: Jeffrey Haas <jhaas@pfrc.org>
In-Reply-To: <MW4PR02MB7394715B61411284F888994AC69C9@MW4PR02MB7394.namprd02.prod.outlook.com>
Date: Thu, 04 Aug 2022 07:53:48 -0400
Cc: Donatas Abraitis <donatas.abraitis@gmail.com>, Job Snijders <job=40fastly.com@dmarc.ietf.org>, "idr@ietf. org" <idr@ietf.org>
Message-Id: <4859BB49-2239-4C4C-B4D8-0F6700B86893@pfrc.org>
References: <165920076221.43110.14224170878306367770@ietfa.amsl.com> <CAMFGGcC19MJ4poutfp_C-=14RjQeNQXgc24vHyXoQsdZLNq5PQ@mail.gmail.com> <1cb64c4d-b0ea-747c-7eb9-f28f5d399361@foobar.org> <20220803170621.GC16746@pfrc.org> <YuqtIlMIbMgu4woA@snel> <CAPF+HwXaf0SVz38QHB+qLjFTaZG7tNKQk62E9zKmAErL8kQxeQ@mail.gmail.com> <20220803185353.GD16746@pfrc.org> <MW4PR02MB7394715B61411284F888994AC69C9@MW4PR02MB7394.namprd02.prod.outlook.com>
To: "UTTARO, JAMES" <ju1738@att.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/LPWX2GK0Rrp924_P6g1imItsC6g>
Subject: Re: [Idr] New Version Notification for draft-spaghetti-idr-bgp-sendholdtimer-05.txt


> On Aug 3, 2022, at 7:50 PM, UTTARO, JAMES <ju1738@att.com> wrote:
> 
> The decision whether to retain forwarding state in the face of a control plane failure depends on the AFI/SAFI in question. For those that are internal to my network, i.e., Kompella VPLS/VPWS, a catastrophic loss of the RR topology should not invalidate forwarding if NH viability remains. Long-lived GR clearly calls out the reasoning for not nuking the data plane. IMO the decision needs to be made on a per-service basis, since different operators may have varying tolerance of "incorrectness" for different AFI/SAFI(s). The majority of customers remaining viable with some incorrectness is preferable to tearing everything down.

Perhaps it's little surprise, Jim, that you were one of the people who came to mind when I wrote my text below. :-)

Graceful restart and its long-lived cousin don't dwell very much on what sorts of incorrectness may be piling up while the session is gone; they rely only on the hope that it's not too bad.  LLGR at least signals that it has stopped pretending everything is completely fine, and gives downstream routers enough information to de-prefer those stale paths.

When you simply lose a session due to a TCP hiccup, it's probably not a problem.  When the control plane crashes suddenly, it's a problem, but you at least have faith that it'll come back at some point.

When the sendholdtimer mechanism kicks in, it's because 4 minutes (or whatever our magic constant ends up being) worth of stuff, possibly including UPDATE messages, hasn't gotten through, so you're already acting later than in the circumstances above.  That detail, and its consequences, need to be documented so that operators can make appropriate GR retention choices.
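For illustration only (this is not text from the draft, and every name below is hypothetical), the two decisions in this thread can be sketched in Python under a simplified model: the sending side noticing that queued data has made no progress for the send hold time, and an operator choosing per AFI/SAFI whether stale routes should be retained under graceful restart. The 240-second value just mirrors the "4 minutes" placeholder above; it is not a normative constant.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical placeholder: the thread only says "4 minutes (or whatever
# our magic constant is)", so this value is illustrative, not normative.
SEND_HOLD_TIME_S = 4 * 60


@dataclass
class OutboundState:
    """Illustrative per-session view of outbound BGP progress."""
    last_progress_s: float  # when bytes were last accepted by the TCP socket
    queued_bytes: int = 0   # data (possibly UPDATE messages) still waiting


def send_hold_expired(state: OutboundState, now_s: float,
                      hold_s: float = SEND_HOLD_TIME_S) -> bool:
    """Return True when data is queued but nothing has moved for hold_s."""
    return state.queued_bytes > 0 and now_s - state.last_progress_s >= hold_s


@dataclass
class RetentionPolicy:
    """Hypothetical operator policy: keep stale routes per (AFI, SAFI) or not."""
    retain: Dict[Tuple[int, int], bool] = field(default_factory=dict)
    default: bool = False  # AFI/SAFIs not listed fall back to this choice

    def keep_stale(self, afi: int, safi: int) -> bool:
        """Should stale routes for this AFI/SAFI survive the session drop?"""
        return self.retain.get((afi, safi), self.default)
```

For example, `RetentionPolicy(retain={(25, 65): True})` would retain L2VPN/VPLS (AFI 25, SAFI 65) state while letting everything else, such as IPv4 unicast (1, 1), be flushed, which is the per-service choice Jim describes.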

-- Jeff


> 
> Thanks,
> 	Jim Uttaro
> 
> -----Original Message-----
> From: Idr <idr-bounces@ietf.org> On Behalf Of Jeffrey Haas
> Sent: Wednesday, August 3, 2022 2:54 PM
> To: Donatas Abraitis <donatas.abraitis@gmail.com>
> Cc: Job Snijders <job=40fastly.com@dmarc.ietf.org>; idr@ietf. org <idr@ietf.org>
> Subject: Re: [Idr] Fwd: New Version Notification for draft-spaghetti-idr-bgp-sendholdtimer-05.txt
> 
> Donatas,
> 
> On Wed, Aug 03, 2022 at 09:19:02PM +0300, Donatas Abraitis wrote:
>> A couple of words regarding GR too (might be added):
>> 
>> "[RFC8538] defines an extension to BGP Graceful Restart that permits 
>> the Graceful Restart procedures to be performed when the BGP speaker 
>> receives a NOTIFICATION message or the Hold Time expires.
>> This document extends the BGP Graceful Restart procedures so that
>> they are also performed when the Send Hold Timer expires."
> 
> That's a good start for the side dropping the session due to the sendholdtimer expiring.
> 
> The authors will want to consider what the operational considerations are for the other side of that connection.  When a TCP session closes without any related BGP signaling, there's no way to tell why the session went down.  If graceful restart is configured, the peer that was (probably) having problems will start retaining routes, even though the upstream already knew that peer was out of sync.
> 
> Jim Uttaro's point in the prior thread was, effectively, how bad are things?
> Are you better off retaining a lot of likely good state along with some pending bad state?  Or is BGP being used in a situation where graceful restart is simply inappropriate?
> 
> While this probably seems obvious, this draft is what raises the question, since it drops the session because Stuff Isn't Getting Through.  This bit of obvious discussion needs to be in the draft.
> 
> -- Jeff
> 