[tcpm] Re: PRR behaviour on detecting loss of a retransmission(WAS:I-DAction: draft-ietf-tcpm-prr-rfc6937bis-06.txt)

Neal Cardwell <ncardwell@google.com> Tue, 05 November 2024 12:37 UTC

From: Neal Cardwell <ncardwell@google.com>
Date: Tue, 05 Nov 2024 07:36:58 -0500
Message-ID: <CADVnQykgvdjKWvvAGjL1RuY5mecOm1yv9W2ESyzvUkY=gjDM1g@mail.gmail.com>
To: Markku Kojo <kojo@cs.helsinki.fi>
CC: tcpm@ietf.org, Matt Mathis <mattmathis@measurementlab.net>, Matt Mathis <ietf@mattmathis.net>
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/_zWfQwZwVb5tObSgnU-SyCqTqy4>

On Mon, Nov 4, 2024 at 8:23 PM Markku Kojo <kojo@cs.helsinki.fi> wrote:

> Hi Yoshi,
>
...

> The major point is that during fast recovery the values of FlightSize and
> cwnd are RANDOM (within a certain range), and depend on the loss pattern
> and the recovery stage at the time the loss of a rexmit is detected. That
> is, if the amount of outstanding data is 100 segments with CUBIC when a
> loss is detected and fast rexmit & fast recovery is entered, the value of
> FlightSize can be anything in the range 0..170 at the time the loss of a
> rexmitted segment is detected. Similarly, cwnd can be anything in the
> range 1..99.
> So, no matter whether we use FlightSize or cwnd to calculate the new value
> of ssthresh, we may end up with an extremely low value or a way too high
> value. And it is also irrelevant which multiplicative decrease factor we
> decide to use to calculate the new value of ssthresh.
>
> ...

>
> > In my view, PRR is designed just to regulate the amount of data sent
> > during the recovery based on input parameters such as ssthresh, cwnd and
> > pipe size. So, I would like to think the problems in input parameters
> > should basically be outside of PRR.
>
> [MK3] IMHO RFCs should be of high enough quality that an implementor is
> able to come up with a correctly behaving implementation.
> I cannot quite see how it is possible to come up with a correct
> implementation by reading the current text in the PRR doc together with
> the text in RFC 5681 and RFC 9438 that advises using either cwnd or
> FlightSize to calculate a new value of ssthresh. Could you possibly
> explain how a new value of ssthresh could be correctly calculated when
> a new PRR episode is initiated due to a lost rexmit during fast recovery,
> if one uses either cwnd or FlightSize to calculate ssthresh as instructed
> in RFC 5681 and/or RFC 9438?
>

Hi Markku,

Thanks for nicely outlining the problem where FlightSize is an incorrect
basis for computing a new ssthresh upon the detection of a lost retransmit.

As Matt nicely summarizes: "This problem is general, quite complicated, and
not specifically related to PRR."

I may be missing something, but as far as I can tell, the problem you
identify also applies to the RFC 5681 and RFC 6675 standards, even when not
using PRR. RFC 5681 / 6675 specify that upon detecting lost retransmits,
ssthresh should be reduced, and that ssthresh should be computed using
FlightSize/2. But as you outline in this thread, a sender that started with
cwnd=100, entered fast recovery and set ssthresh to 50, and then detects a
lost retransmit while implementing RFC 5681 and/or RFC 6675, may have a
FlightSize of 150, so the ssthresh computed upon the lost retransmit would
be FlightSize / 2 = 150 / 2 = 75. So upon the lost retransmit the ssthresh
may, sadly, *increase* from 50 to 75 when following RFC 5681 and RFC 6675.
This is bad, as you note. But this bad behavior seems implied by the
existing RFC 5681 and RFC 6675 standards, even before we add PRR into the
mix.
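
To make the arithmetic concrete, here is a minimal sketch of that
computation. It is only an illustration, not code from any stack: it counts
in whole segments (so SMSS = 1), the function name is made up, and it
assumes FlightSize equals the initial cwnd of 100 when fast recovery is
entered. The formula is the one from RFC 5681 equation (4),
ssthresh = max(FlightSize / 2, 2*SMSS).

```python
# Illustrative only: units are segments, so SMSS is taken to be 1 (assumption).
SMSS = 1


def ssthresh_rfc5681(flight_size: int, smss: int = SMSS) -> int:
    """RFC 5681 equation (4): ssthresh = max(FlightSize / 2, 2 * SMSS)."""
    return max(flight_size // 2, 2 * smss)


# Entering fast recovery with 100 segments outstanding (assumed equal to cwnd).
ssthresh_on_entry = ssthresh_rfc5681(100)

# Later, a lost retransmit is detected while FlightSize happens to be 150.
ssthresh_on_lost_rexmit = ssthresh_rfc5681(150)

# True when the second "reduction" actually raised ssthresh.
ssthresh_increased = ssthresh_on_lost_rexmit > ssthresh_on_entry
```

With these inputs ssthresh_on_entry is 50 and ssthresh_on_lost_rexmit is 75,
so the second congestion response raises ssthresh rather than lowering it.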

Since this is a pre-existing problem in the existing standards-track RFCs,
and AFAIK there is no solution to this problem that has deployment
experience, IMHO it does not make sense to block the PRR draft process
while the community waits for a solution to this issue.

best regards,
neal