Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Robert Raszuk <robert@raszuk.net> Sat, 12 December 2020 01:20 UTC

References: <X9PHRuGndvsFzQrG@bench.sobornost.net> <FCB1ADB7-AD8C-447E-82FE-2EC15B8C3FB9@juniper.net> <CAOj+MMEGRLw9cRXJR4VgOYtoj+tRyeY4WhWsdkMuYktGh6THag@mail.gmail.com> <0F61A27E-935C-4B95-9761-0D454D0F66A8@tony.li>
In-Reply-To: <0F61A27E-935C-4B95-9761-0D454D0F66A8@tony.li>
From: Robert Raszuk <robert@raszuk.net>
Date: Sat, 12 Dec 2020 02:19:52 +0100
Message-ID: <CAOj+MMHt_JGt_do0gT1nwZ8z-d16woVAUqTqXOR7kZzVDv3fjA@mail.gmail.com>
To: Tony Li <tony.li@tony.li>
Cc: Job Snijders <job@sobornost.net>, "idr@ietf.org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/3vIJ_vjGdCYElGfZPkVzJ6UOE4g>
Subject: Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Hi Tony,

> * Is the "unable to send" only possible under Window = 0 ? What if there
> is a local NIC buffer full and we keep dropping it locally ? If we are
> going there perhaps we could say "unable to successfully send" meaning send
> and get an ACK for it ?
>
> There are many, many reasons why we might not be able to exchange bits.
> The specifics aren’t particularly relevant. The point is that we’re not
> able to make progress, so the session is clearly broken.
>

I think we all see this and agree.
My point was to be able to detect more cases of unidirectional stuck
sessions, resulting in more triggers to shut the session down (rather than
just watching for RCV WND 0 for t=HT).

> * The proposal is about reusing the HOLD TIME value to bring BGP down
> when you are still receiving keepalives however peer sent ACK for the last
> segment indicating zero window - is this right ?
>
> More generally, the proposal is that we apply the HOLD TIME on the
> transmit side as well as the receive side. If we are not able to transmit
> for that period of time, the receiver should give up and so should the
> transmitter. The session is broken, updates cannot flow, and we no longer
> have (eventual) consistency.
>

That means, as you also mentioned below, that the receiver should close the
session when the HOLD TIME expires. I would actually like to better
understand why this is not taking place here before we continue.
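
To make the transmit-side idea concrete, here is a minimal sketch in Python.
It is not taken from any implementation: the class and method names are
hypothetical, and "progress" is assumed to mean that the peer acknowledged
some of our data, which is broader than watching for a zero window only.

```python
import time
from typing import Callable


class SendHoldTimer:
    """Flags a session as stuck when pending data sees no progress for hold_time seconds."""

    def __init__(self, hold_time: float, clock: Callable[[], float] = time.monotonic):
        self.hold_time = hold_time
        self.clock = clock
        self.last_progress = clock()
        self.pending = False  # True while data is queued or unacknowledged

    def queued(self) -> None:
        # Data was handed to the transport; start timing if nothing was pending.
        if not self.pending:
            self.pending = True
            self.last_progress = self.clock()

    def progressed(self, still_pending: bool) -> None:
        # Hypothetical callback: the peer acknowledged some of our data.
        self.last_progress = self.clock()
        self.pending = still_pending

    def expired(self) -> bool:
        # Only an idle-with-backlog session counts as stuck.
        return self.pending and (self.clock() - self.last_progress) >= self.hold_time
```

An idle session with nothing queued never expires in this sketch; only a
backlog that makes no progress for the full hold time does.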

> Creating more TCP sessions is not likely to improve the behavior of a TCP
> receiver.
>

Hmmm, if the socket is full for one SAFI and the other socket(s) are working
just fine for some other ones, I am not sure this is not a safer bet.

Say the IGP is churning and BGP-LS is taking all the resources of the single
TCP socket. Then AFI/SAFI 1/1 or 2/1 will die with it, when they could run
just fine
...
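
A toy illustration of that head-of-line effect, in Python. It is a sketch
under a strong assumption: the receiver drains every socket independently at
the same rate. If the receiver is starved as a whole, extra sessions buy
nothing. The function name, message labels and burst size are all made up.

```python
from collections import deque


def ticks_until_delivered(queues, per_tick, target):
    """Drain each queue FIFO, per_tick messages per tick; return the tick at which target is read."""
    queues = [deque(q) for q in queues]
    tick = 0
    while any(queues):
        tick += 1
        for q in queues:
            for _ in range(min(per_tick, len(q))):
                if q.popleft() == target:
                    return tick
    return None  # target was never queued


backlog = ["bgp-ls"] * 1000               # assumed burst caused by IGP churn
shared = [backlog + ["ipv4-unicast"]]     # one socket carrying everything
split = [backlog, ["ipv4-unicast"]]       # one socket per AFI/SAFI

shared_ticks = ticks_until_delivered(shared, 10, "ipv4-unicast")
split_ticks = ticks_until_delivered(split, 10, "ipv4-unicast")
```

With these numbers the single IPv4 unicast message waits 101 ticks behind
the BGP-LS backlog on the shared socket, and 1 tick on its own socket.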

Hi Jakob,

> whereas NOTIFICATION and FIN precludes GR.

Then it is pretty clear that we must start with RST, hoping that the
subsequent TCP session will be healthy. Only then perhaps we should kill it
hard.

Best,
R.