Re: [Idr] New Version Notification for draft-spaghetti-idr-bgp-sendholdtimer-05.txt

Enke Chen <enchen@paloaltonetworks.com> Fri, 19 August 2022 17:25 UTC

From: Enke Chen <enchen@paloaltonetworks.com>
Date: Fri, 19 Aug 2022 10:24:54 -0700
Message-ID: <CANJ8pZ_RgU-fSKemrBDw1r1-9VnLyTPOryrOKPV0WUpLhkucCw@mail.gmail.com>
To: Jeffrey Haas <jhaas@pfrc.org>
Cc: tom petch <ietfc@btconnect.com>, Robert Raszuk <robert@raszuk.net>, John Heasley <heas@shrubbery.net>, "idr@ietf. org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/aS8L1JW7NOgvNUS_RlMORcwvpns>

Hi, Jeff:

Thank you for your comments. Please see my replies inline.

On Fri, Aug 19, 2022 at 9:24 AM Jeffrey Haas <jhaas@pfrc.org> wrote:
>
> Enke,
>
> On Thu, Aug 18, 2022 at 08:20:19AM -0700, Enke Chen wrote:
> > We have spelled out several recommendations for applying the TCP User
> > Timeout to BGP in the following draft:
> >
> >        https://datatracker.ietf.org/doc/draft-chen-idr-tcp-user-timeout/
> >
> > I hope your concerns with using the TCP parameter are adequately
> > addressed. Please let us know if you have any comments.
>
> Thanks for the pointer to this new submission.
>
> With regard to the FSM interaction, your suggested procedure is clear: Use
> End-of-Rib as a trigger.  It's a good start to a proposal.
>
> Your draft also picks some values for the timers with a minimum value of 10
> minutes.  Loosely quoting John Scudder, "constants are always wrong", but we
> have to start somewhere. :-)

Yes, there needs to be a default value, and it should be configurable.
That's what we recommend in the draft.

>
> The detail I'd provide for Working Group and your consideration against when
> do we want to start the timer is that initial sync can get stuck (perhaps
> more so in some circumstances!) just as well as later timelines.  We know
> that devices initially booting are having to consume a significant amount of
> state and this both contributes toward per-peer slowness and also many
> opportunities for zero-windowing due to inability to service a given socket
> in a timely fashion.
>
> My comment applies equally to your draft and the sendholdtimer draft:
> Portions of the BGP FSM set timer values depending on what point we are at
> in the FSM's execution.  It's therefore reasonable to pick a very high value
> and always start the timer upon session startup and then potentially reduce
> the timer after we've achieved end-of-rib.  If end-of-rib isn't going to be
> received, that lower timer threshold may be reasonable to set after some
> drop-dead time; alternatively just stick with the very high value.
>

For such a corner case, IMO we should keep the solution simple.
It seems that the key is to make the timeout value large (and
configurable) so we don't generate false positives.
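
To make the two options concrete, here is a minimal sketch (not taken
from either draft) of the timer selection being discussed. The class
name, field names, and the 3600-second startup value are hypothetical;
the 600-second value is the 10-minute figure mentioned above. Setting
both fields to the same value gives the simple "one large, configurable
timeout" behavior.

```python
from dataclasses import dataclass


@dataclass
class TimeoutPolicy:
    """Hypothetical policy object; the names and defaults are illustrative only."""

    startup_timeout_s: int = 3600  # assumed "very high" value used before End-of-RIB
    steady_timeout_s: int = 600    # 10 minutes, the figure quoted in this thread

    def __post_init__(self) -> None:
        # Reject values that would make the policy meaningless.
        if self.startup_timeout_s <= 0 or self.steady_timeout_s <= 0:
            raise ValueError("timeouts must be positive")
        if self.steady_timeout_s > self.startup_timeout_s:
            raise ValueError("steady timeout should not exceed startup timeout")

    def current_timeout(self, end_of_rib_received: bool) -> int:
        """Return the timeout (seconds) to arm for the current session phase."""
        if end_of_rib_received:
            return self.steady_timeout_s
        return self.startup_timeout_s


# Two-phase variant: high value at startup, lower value after End-of-RIB.
two_phase = TimeoutPolicy()

# Simple variant: a single large, configurable value for the whole session.
single_value = TimeoutPolicy(startup_timeout_s=1800, steady_timeout_s=1800)
```

With the single-value policy, `current_timeout()` returns the same
number regardless of End-of-RIB, so no FSM interaction is needed.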

> Similarly applicable to both drafts, I'm wondering if there may be benefit
> in signaling the peer what the sendholdtimer (or equivalent tcp-user timer)
> values are.  It's effectively a warning, and a hint to the remote peer's
> schedulers to make sure that some amount of data is drained in a timely
> enough fashion.  The NOTIFICATION subcode we've started discussing for
> signaling that the sender has torn things down is the other half of this
> equation.

The signaling mechanism for the TCP User Timeout is already specified in
RFC 5482 (the TCP User Timeout Option). It's optional.
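
For readers who have not looked at the option, the following is a rough
sketch of how I read the RFC 5482 wire format: Kind 28, Length 4, then a
1-bit granularity flag (1 = minutes, 0 = seconds) followed by a 15-bit
timeout value. The function names are made up for illustration, and
rejecting a zero value is a conservative choice in this sketch rather
than a statement about the RFC.

```python
import struct
from typing import Tuple

UTO_KIND = 28    # TCP option kind for the User Timeout Option
UTO_LENGTH = 4   # total option length in bytes


def encode_uto(timeout: int, minutes: bool) -> bytes:
    """Pack a User Timeout Option: kind, length, G bit + 15-bit value."""
    if not 0 < timeout < 2 ** 15:
        raise ValueError("timeout must fit in 15 bits and be non-zero")
    field = (int(minutes) << 15) | timeout
    return struct.pack("!BBH", UTO_KIND, UTO_LENGTH, field)


def decode_uto(option: bytes) -> Tuple[int, bool]:
    """Unpack a User Timeout Option into (timeout, minutes_granularity)."""
    kind, length, field = struct.unpack("!BBH", option)
    if kind != UTO_KIND or length != UTO_LENGTH:
        raise ValueError("not a User Timeout Option")
    return field & 0x7FFF, bool(field >> 15)


# A 10-minute timeout, expressed with minute granularity.
ten_minutes = encode_uto(10, minutes=True)
```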

>
> With regard to the TCP specific feature, I still have personal unease with
> what the timer implies.  The general desire articulated by many in these
> threads is "the peer makes progress and doesn't stay blocked forever".
> Certainly the TCP user timeout would accomplish that.  However, the timer is
> versus "unacknowledged data".  What this means is not "we've made no
> progress", rather "stuff was enqueued X seconds ago and unacknowledged".
>

The TCP user timeout applies to "undelivered data", which includes both
"unacked data" and "buffered but untransmitted data".

> I think the above illustrates a key detail the Working Group will need
> consensus on: Do we care if we're not making progress (perhaps at all), or
> are we willing to put a timeliness requirement on the contents?
>

BGP only knows when the local TCP write buffer is full, and that is
only one specific case. There are other cases where the TCP write
buffer is not full but routes can still go stale:

    - BGP messages are not delivered for a long time, due to either a
      zero window or lost ACKs.

It can take a long time for the TCP buffer to fill up when routing is
relatively stable.

That's why I believe a solution at the BGP level would be incomplete
and perhaps even more complex. On the other hand, TCP user-timeout is
specifically designed to handle these cases. It's more complete and
simpler.
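
For reference, here is a minimal sketch of how an application could set
the TCP user timeout on a socket. It assumes Linux, where the option is
TCP_USER_TIMEOUT and takes a value in milliseconds; the helper names
are hypothetical, and the fallback constant 18 is the Linux value and
is not portable.

```python
import socket

# Python exposes socket.TCP_USER_TIMEOUT on Linux. The fallback of 18 is the
# value from <linux/tcp.h> and is an assumption that only holds on Linux.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def user_timeout_ms(seconds: float) -> int:
    """Convert a configured timeout in seconds to the milliseconds Linux expects."""
    if seconds < 0:
        raise ValueError("timeout must not be negative")
    return int(round(seconds * 1000))


def apply_user_timeout(sock: socket.socket, seconds: float) -> int:
    """Set the TCP user timeout on a TCP socket and return the value applied (ms)."""
    ms = user_timeout_ms(seconds)
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, ms)
    return ms
```

A BGP speaker would call something like `apply_user_timeout(sock, 600)`
on the session socket for a 10-minute timeout.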

-- Enke