[tcpm] Re: CWND increase in disordered state in Linux?

Neal Cardwell <ncardwell@google.com> Tue, 03 September 2024 16:11 UTC

To: Ingemar Johansson S <ingemar.s.johansson@ericsson.com>, Yuchung Cheng <ycheng@google.com>
CC: "tcpm@ietf.org" <tcpm@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/cFjRyYXAqNx05249r02UWi9Uf_E>

On Tue, Sep 3, 2024 at 5:47 AM Ingemar Johansson S <ingemar.s.johansson@ericsson.com> wrote:

> Hi
>

Hi, Ingemar!


> Hope you are fine
>

Yes! Hope you are fine as well. :-)

I'm cc-ing +Yuchung Cheng <ycheng@google.com> explicitly on this thread,
since he is the author of the tcp_may_raise_cwnd() code that Linux TCP has
been using for about 11 years now, and he may want to add more comments or
context.

> I am working on improving the TCP stack in our 5G system simulator to pass
> all relevant packetdrill tests.
>
> One thing that I try to figure out is the behavior in CA_Disordered
> state. One such example is the prr-ss-10pkt-lost-1.pkt test.
> What I see in the current implementation is that cwnd increases from 10 to
> 12 when it is in disordered state.
> When it gets into CA_Recovery however the ssthresh should be set to 7 but
> it is set to 8 because int(12*0.7) = 8. So I guess ssthresh should be set
> based on cwnd from when the state was last in CA_Open?
>

I'm surprised you are seeing cwnd increases from 10 to 12 when the
connection is in Disorder state in that prr-ss-10pkt-lost-1.pkt test. The
Linux TCP logic should not increase cwnd when receiving the SACKs in that
test case (see below for the rationale). The test implicitly asserts that
the cwnd increase upon receiving SACKs does not happen, since it explicitly
asserts that cwnd and ssthresh are both set to 7. And AFAIK that test
passes on upstream kernels and the kernels we use. So I'm surprised you see
a cwnd of 8 rather than 7.
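
As a quick sanity check on the arithmetic, here is a minimal Python sketch
(not kernel code; the function name is hypothetical) of a multiplicative
decrease using the 0.7 factor quoted above. It only illustrates why a cwnd
of 10 versus 12 at the moment of loss detection yields 7 versus 8:

```python
def ssthresh_after_loss(cwnd: int) -> int:
    """Multiplicative decrease by the 0.7 factor quoted in this thread."""
    # Integer arithmetic (cwnd * 7 // 10) avoids floating-point rounding.
    # The floor of 2 packets is an assumption of this sketch.
    return max(cwnd * 7 // 10, 2)


print(ssthresh_after_loss(10))  # cwnd left unchanged while in Disorder
print(ssthresh_after_loss(12))  # cwnd grown by two SACKs while in Disorder
```

So an ssthresh of 8 is consistent with cwnd having been raised to 12 before
entering recovery, which is the behavior the test is meant to rule out.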

Are you using this version of the test:
https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fast_recovery/prr-ss-10pkt-lost-1.pkt
?

To help understand the discrepancy, can you please share:

(a) the exact version of the packetdrill script you are using (either with
a git SHA1, a github URL, or attaching the script) and

(b) the exact Linux kernel version you are testing (with a tag name or
SHA1)?


> What confuses me is that RFC3042 says that CWND should not be modified
> when in disordered state, however the explanation text in
> tcp_may_raise_cwnd() seems to indicate that CWND can increase as long as new
> data is SACKed, right?
>

The Linux TCP code intentionally does not follow RFC3042 to the letter
when it comes to the limited transmit behavior. This is for good reason,
based on decades of real-world experience.

The rationale for the Linux TCP tcp_may_raise_cwnd() logic is largely
explained in Yuchung's commit messages from the 2013 code for this function:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f7cc9a3c2bd8
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=16edfe7ee02dd


The full current tcp_may_raise_cwnd() logic can be browsed here:
  https://elixir.bootlin.com/linux/v6.10/source/net/ipv4/tcp_input.c#L3544

I will try to summarize the rationale by paraphrasing Yuchung's commit
messages:

Currently the Linux TCP stack detects reordering and avoids spurious
retransmissions quite effectively; since roughly 2018 it has done so using
RACK-TLP: https://datatracker.ietf.org/doc/html/rfc8985

However, around 2013, before RACK-TLP, before Yuchung's commits noted
above, and when the Linux TCP code was largely following the cwnd-increase
dictates in RFC3042, throughput was suboptimal for connections
experiencing high reordering. This was because cwnd was increased only if
the data was delivered in order, i.e., only if FLAG_DATA_ACKED was set in
tcp_ack(). The more packets were reordered, the worse the throughput was,
because with high reordering only a small fraction of ACKs advanced
SND.UNA and allowed cwnd to increase.
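
To make that last point concrete, here is a small illustrative Python
sketch (a toy model, not kernel code; the function name is hypothetical).
It assumes no loss and that the receiver sends one ACK per arriving packet,
and it counts how many of those ACKs advance the cumulative ACK point:

```python
def count_una_advances(arrival_order):
    """Return (acks_advancing_una, total_acks) for one ACK per arrival."""
    received = set()
    una = 1  # lowest packet number not yet cumulatively ACKed
    advancing = 0
    for seq in arrival_order:
        received.add(seq)
        old_una = una
        # The cumulative ACK point moves past every contiguous packet.
        while una in received:
            una += 1
        if una > old_una:
            advancing += 1
    return advancing, len(arrival_order)


print(count_una_advances([1, 2, 3, 4, 5, 6, 7, 8]))  # in order
print(count_una_advances([2, 1, 4, 3, 6, 5, 8, 7]))  # adjacent swaps
print(count_una_advances([4, 3, 2, 1, 8, 7, 6, 5]))  # deeper reordering
```

In this toy model all 8 ACKs advance SND.UNA for in-order delivery, 4 of 8
for adjacent swaps, and only 2 of 8 for the deeper reordering, so a rule
that grows cwnd only on cumulative ACKs gets fewer chances to do so as
reordering gets worse.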

Therefore, Yuchung's commits noted above changed the logic so that:

(A) when the measured degree of reordering is high (above the default
dupthresh of 3 packets), cwnd increases whenever data is delivered
(FLAG_FORWARD_PROGRESS is set), regardless of its ordering (that is,
regardless of whether packets are marked as delivered via cumulative ACKs
that advance SND.UNA or via SACKed out-of-order data).

(B) in the common case where reordering is low, the code conservatively
follows the cwnd increase conditions in RFC3042 and increases cwnd only on
ordered deliveries (cumulative ACKs that advance SND.UNA).
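
The two cases (A) and (B) can be sketched roughly as follows. This is a
simplified Python model for discussion, not the kernel source: the flag bit
values are illustrative, treating FLAG_FORWARD_PROGRESS as "newly ACKed or
newly SACKed data" is a simplification, and the function and parameter
names are hypothetical.

```python
# Illustrative bit values; the kernel defines its own constants.
FLAG_DATA_ACKED = 0x1   # a cumulative ACK advanced SND.UNA over new data
FLAG_DATA_SACKED = 0x2  # new out-of-order data was SACKed
FLAG_FORWARD_PROGRESS = FLAG_DATA_ACKED | FLAG_DATA_SACKED  # simplified

DEFAULT_REORDERING = 3  # default dupthresh, in packets


def may_raise_cwnd(measured_reordering, flag,
                   reordering_threshold=DEFAULT_REORDERING):
    """Decide whether this ACK is allowed to grow cwnd."""
    if measured_reordering > reordering_threshold:
        # Case (A): high reordering, any delivery counts.
        return bool(flag & FLAG_FORWARD_PROGRESS)
    # Case (B): low reordering, only in-order delivery counts.
    return bool(flag & FLAG_DATA_ACKED)


# A SACK-only ACK on a path with no measured reordering beyond dupthresh:
print(may_raise_cwnd(3, FLAG_DATA_SACKED))
# The same ACK on a path with a high measured reordering degree:
print(may_raise_cwnd(10, FLAG_DATA_SACKED))
```

Under these assumptions, a SACK-only ACK does not permit a cwnd increase
when the measured reordering is at the default of 3, but does when the
measured reordering is higher.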

The performance difference Yuchung found in testing his changes was quite
large: using netperf on a qdisc setup with 20Mbps of bandwidth and a random
RTT from 45ms to 55ms (to induce reordering), Yuchung found his changes
increased TCP throughput by 20-25%, allowing TCP CUBIC to reach a
throughput near the bottleneck link bandwidth.

Ingemar, because the test you mention above, prr-ss-10pkt-lost-1.pkt, does
not have reordering, the logic should be using the (B) path above, which
matches RFC3042 behavior. So I'm surprised you are seeing the cwnd increase
based on SACKs arriving in Disorder for that test. So I'd be curious to
hear more details about your test and kernel to understand why your tests
are not seeing the normal Linux TCP behavior.

best regards,
neal



>
>
> /Ingemar
>
> PS: CC:ed the tcpm group in case this topic is of more general interest
>