[tcpm] Re: CWND increase in disordered state in Linux?

Neal Cardwell <ncardwell@google.com> Tue, 03 September 2024 18:52 UTC

From: Neal Cardwell <ncardwell@google.com>
Date: Tue, 03 Sep 2024 14:52:12 -0400
To: Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
CC: "tcpm@ietf.org" <tcpm@ietf.org>
Subject: [tcpm] Re: CWND increase in disordered state in Linux?
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/u4xkPxks8qZQ9eJvZ-_gUzbeKwA>

OK, thanks for the extra details, Ingemar! Sounds like a neat project.

From your description, it sounds like the issue is in the port of the Linux
TCP stack to the simulator. You should be able to root-cause it by adding
print statements to the tcp_may_raise_cwnd() code in the simulator and
printing the value of every expression used in that code (perhaps printing
the state only when tp->sacked_out is non-zero, to keep the output
manageable).
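
One way to structure such tracing is sketched below, in Python for brevity
(the simulator itself is Java and the kernel code is C). The function name
trace_may_raise_cwnd and its parameters are hypothetical; they stand for the
values tcp_may_raise_cwnd() consults (the ACK flag word, tp->reordering and
the reordering sysctl), plus tp->sacked_out and cwnd for context. The flag
bit values are illustrative assumptions and should be taken from the ported
code.

```python
from typing import Optional

# Illustrative bit values (assumptions); use the constants from the ported code.
FLAG_DATA_ACKED = 0x04
FLAG_SYN_ACKED = 0x10
FLAG_DATA_SACKED = 0x20
FLAG_FORWARD_PROGRESS = FLAG_DATA_ACKED | FLAG_SYN_ACKED | FLAG_DATA_SACKED


def trace_may_raise_cwnd(flag: int, reordering: int, sysctl_reordering: int,
                         sacked_out: int, cwnd: int) -> Optional[str]:
    """Return one trace line, or None when no segments are SACKed."""
    if sacked_out == 0:
        return None  # skip the common case to keep the output manageable
    return (
        f"sacked_out={sacked_out} cwnd={cwnd} "
        f"reordering={reordering} sysctl_reordering={sysctl_reordering} "
        f"reordering_high={reordering > sysctl_reordering} "
        f"data_acked={bool(flag & FLAG_DATA_ACKED)} "
        f"forward_progress={bool(flag & FLAG_FORWARD_PROGRESS)}"
    )
```

For example, trace_may_raise_cwnd(FLAG_DATA_SACKED, 3, 3, 2, 10) reports
reordering_high=False, data_acked=False and forward_progress=True. If cwnd
still grows on such an ACK, the port is either taking the wrong branch or
raising cwnd somewhere else.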

best regards,
neal


On Tue, Sep 3, 2024 at 2:38 PM Ingemar Johansson S <
ingemar.s.johansson@ericsson.com> wrote:

> Hi Neal
>
>
>
> And thanks for the fast response. Now I know the expected behavior and as
> you say, it is pretty hard to get CWND and ssthresh == 7 if the CWND grows
> up to 12 MSS during the disordered state.
>
> RFC3042 is pretty clear that CWND should not be modified during disordered
> state but the text in tcp_may_raise_cwnd() made me think that it is
> perhaps allowed after all. Now I understand things better.
>
>
>
> More inline, marked [IJ]
>
>
>
> /Ingemar
>
>
>
> *From:* Neal Cardwell <ncardwell@google.com>
> *Sent:* Tuesday, 3 September 2024 18:12
> *To:* Ingemar Johansson S <ingemar.s.johansson@ericsson.com>; Yuchung
> Cheng <ycheng@google.com>
> *Cc:* tcpm@ietf.org
> *Subject:* Re: CWND increase in disordered state in Linux?
>
>
>
> On Tue, Sep 3, 2024 at 5:47 AM Ingemar Johansson S <
> ingemar.s.johansson@ericsson.com> wrote:
>
> Hi
>
>
>
> Hi, Ingemar!
>
>
>
> Hope you are fine
>
>
>
> Yes! Hope you are fine as well. :-)
>
>
>
> I'm cc-ing +Yuchung Cheng <ycheng@google.com> explicitly on this thread,
> since he is the author of the tcp_may_raise_cwnd() code that Linux TCP has
> been using for about 11 years now, and he may want to add more comments or
> context.
>
>
>
> I am working on improving the TCP stack in our 5G system simulator to pass
> all relevant packetdrill tests.
>
> One thing that I am trying to figure out is the behavior in CA_Disorder
> state. One such example is the prr-ss-10pkt-lost-1.pkt test.
> What I see in the current implementation is that cwnd increases from 10 to
> 12 when it is in disordered state.
> When it gets into CA_Recovery, however, ssthresh should be set to 7 but
> it is set to 8 because int(12*0.7) = 8. So I guess ssthresh should be set
> based on the cwnd from when the state was last CA_Open?
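
The arithmetic behind the two ssthresh values can be checked with a short
sketch. It assumes the test runs with CUBIC and its default
multiplicative-decrease factor of 717/1024 (about 0.7) with a floor of 2
packets; the function name ssthresh_after_loss is hypothetical.

```python
BETA = 717         # assumed CUBIC default, scaled by BETA_SCALE
BETA_SCALE = 1024


def ssthresh_after_loss(cwnd: int) -> int:
    """Integer model of ssthresh on entering recovery: max(cwnd * beta, 2)."""
    return max(cwnd * BETA // BETA_SCALE, 2)


print(ssthresh_after_loss(10))  # prints 7: cwnd left untouched in Disorder
print(ssthresh_after_loss(12))  # prints 8: cwnd inflated to 12 in Disorder
```

Under these assumptions an ssthresh of 8 is the expected symptom of cwnd
having grown to 12 before recovery started.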
>
>
>
> I'm surprised you are seeing cwnd increases from 10 to 12 when the
> connection is in Disorder state in that prr-ss-10pkt-lost-1.pkt test. The
> Linux TCP logic should not increase cwnd when receiving the SACKs in that
> test case (see below for the rationale). The test implicitly asserts that
> the cwnd increase upon receiving SACKs does not happen, since it explicitly
> asserts that cwnd and ssthresh are both set to 7. And AFAIK that test
> passes on upstream kernels and the kernels we use. So I'm surprised you see
> a cwnd of 8 rather than 7.
>
>
>
> Are you using this version of the test:
> https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fast_recovery/prr-ss-10pkt-lost-1.pkt
> ?
>
>
>
> To help understand the discrepancy, can you please share:
>
>
>
>    1. the exact version of the packetdrill script you are using (either
>    with a git SHA1, a github URL, or attaching the script) and
>
> [IJ] The scripts are from the github repos you linked to
>
>
>
>    2. the exact Linux kernel version you are testing (with a tag name or
>    SHA1)?
>
> [IJ] 5.3. In fact it is a port of the Linux TCP stack to our Java-based 5G
> system simulator. In the process I have made some errors, and the
> packetdrill scripts have been very helpful in identifying those errors.
>
>
>
> What confuses me is that RFC3042 says that CWND should not be modified
> when in disordered state, however the explanation text in
> tcp_may_raise_cwnd() seems to indicate that CWND can increase as long as new
> data is SACKed, right?
>
>
>
> The Linux TCP code intentionally does not exactly follow RFC3042 to the
> letter, when it comes to the limited transmit behavior. This is for good
> reason, based on decades of real-world experience.
>
> [IJ] OK, good to know
>
>
>
> The rationale for the Linux TCP tcp_may_raise_cwnd() logic is largely
> explained in Yuchung's commit messages from the 2013 code for this function:
>
>
>
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f7cc9a3c2bd8
>
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=16edfe7ee02dd
>
>
>
>
> The full current tcp_may_raise_cwnd() logic can be browsed here:
>
>   https://elixir.bootlin.com/linux/v6.10/source/net/ipv4/tcp_input.c#L3544
>
>
>
> I will try to summarize the rationale by paraphrasing Yuchung's commit
> messages:
>
>
>
> Currently the Linux TCP stack detects reordering and avoids spurious
> retransmissions quite effectively; since roughly 2018 this is by using
> RACK-TLP: https://datatracker.ietf.org/doc/html/rfc8985
>
> However, around 2013, before RACK-TLP, when the Linux TCP code was
> largely following the cwnd-increase dictates in RFC3042, and before
> Yuchung's commits noted above, the throughput was suboptimal for
> connections experiencing high reordering. This was because cwnd was
> increased only if the data was delivered in order, i.e., only if
> FLAG_DATA_ACKED was set in tcp_ack(). The more packets were reordered, the
> worse the throughput was, because with high reordering only a small
> fraction of ACKs advanced SND.UNA and allowed cwnd to increase.
>
>
> Therefore, Yuchung's commits noted above changed the logic so that:
>
>
>
> (A) when the measured degree of reordering is high (above the default
> dupthresh of 3 packets), cwnd increases whenever data is delivered
> (FLAG_FORWARD_PROGRESS is set) regardless of its ordering (regardless of
> whether packets are marked as delivered via cumulative ACKs that advance
> SND.UNA or via SACKed out-of-order data)
>
>
>
> (B) in the common case where reordering is low, the code conservatively
> follows the cwnd increase conditions in RFC3042 and increases cwnd only on
> ordered deliveries (cumulative ACKs that advance SND.UNA).
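
Rules (A) and (B) above can be modeled in a few lines. This is a simplified
sketch rather than the kernel source: the flag bit values are illustrative
assumptions, and may_raise_cwnd with its reordering_threshold parameter is a
hypothetical stand-in for tcp_may_raise_cwnd() and the reordering sysctl
(default 3).

```python
# Illustrative bit values (assumptions), not copied from the kernel.
FLAG_DATA_ACKED = 0x04    # cumulative ACK advanced SND.UNA over new data
FLAG_SYN_ACKED = 0x10
FLAG_DATA_SACKED = 0x20   # new data was SACKed out of order
FLAG_FORWARD_PROGRESS = FLAG_DATA_ACKED | FLAG_SYN_ACKED | FLAG_DATA_SACKED


def may_raise_cwnd(flag: int, reordering: int,
                   reordering_threshold: int = 3) -> bool:
    """Simplified model of rules (A) and (B)."""
    if reordering > reordering_threshold:
        # (A) high reordering: any delivery, in order or SACKed, counts.
        return bool(flag & FLAG_FORWARD_PROGRESS)
    # (B) low reordering: only in-order (cumulative) delivery counts.
    return bool(flag & FLAG_DATA_ACKED)
```

In this model a SACK-only ACK with reordering at the default of 3 returns
False, while the same ACK with a measured reordering of 10 returns True.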
>
>
>
> The performance difference Yuchung found in testing his changes was quite
> large: using netperf on a qdisc setup with 20Mbps bandwidth and a random RTT
> from 45ms to 55ms (to create reordering), Yuchung found his changes
> increased TCP throughput by 20 - 25%, allowing TCP CUBIC to reach a
> throughput near the bottleneck link bandwidth.
>
> Ingemar, because the test you mention above, prr-ss-10pkt-lost-1.pkt, does
> not have reordering, the logic should be using the (B) path above, which
> matches RFC3042 behavior. So I'm surprised you are seeing the cwnd increase
> based on SACKs arriving in Disorder for that test. So I'd be curious to
> hear more details about your test and kernel to understand why your tests
> are not seeing the normal Linux TCP behavior.
>
> [IJ] Thanks for the explanation, it is very helpful. Now I just need to
> debug my code to see where it goes wrong.
>
>
>
> best regards,
>
> neal
>
>
>
>
>
>
>
> /Ingemar
>
> PS: CC:ed the tcpm group in case this topic is of more general interest
>
>