[tcpm] Re: PRR behaviour on detecting loss of a retransmission (WAS: I-D Action: draft-ietf-tcpm-prr-rfc6937bis-06.txt)

Markku Kojo <kojo@cs.helsinki.fi> Tue, 05 November 2024 01:23 UTC

Date: Tue, 05 Nov 2024 03:23:33 +0200
From: Markku Kojo <kojo@cs.helsinki.fi>
To: Yoshifumi Nishida <nsd.ietf@gmail.com>
CC: tcpm@ietf.org, Matt Mathis <mattmathis@measurementlab.net>, Matt Mathis <ietf@mattmathis.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/x29LQi8tY48sRXeOZwFCUW-4OSg>

Hi Yoshi,

On Sat, 2 Nov 2024, Yoshifumi Nishida wrote:

> Hi Markku,
> 
> On Tue, Oct 29, 2024 at 7:45 AM Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org> wrote:
>
>       [MK2] To ensure that the multiplicative decrease is implemented
>       correctly, this document should give exact advice on how to compute
>       the new ssthresh value. If one follows the current RFCs (e.g.,
>       RFC 5681 or RFC 9438) and computes the new value of ssthresh (and
>       cwnd) using either FlightSize (ssthresh = 0.5 * FlightSize or
>       ssthresh = 0.7 * FlightSize) or cwnd (ssthresh = 0.5 * cwnd or
>       ssthresh = 0.7 * cwnd), the result is often incorrect (or more or
>       less random).
>
>       This is because FlightSize becomes inflated during fast recovery, as
>       the TCP sender sends new data during the recovery (before any
>       cumulative/partial ACKs arrive). Similarly, with PRR cwnd reaches the
>       target (= correctly reduced) value only at the end of recovery,
>       meaning that cwnd is too big during the recovery.
>
>       A typical, simple scenario with flightsize, for example:
>
>       The amount of outstanding data is 100 segments and a loss is detected ->
>       ssthresh = 50 or 70 (assume the CC algo is Reno or CUBIC) and recovery
>       starts. Assume the fast-retransmitted (1st lost) segment is dropped.
>
>       During the first RTT the TCP sender injects 50 or 70 new data segments ->
>       FlightSize = 150 or 170. Soon after the first RTT, the TCP sender detects
>       the loss of the rexmitted segment and computes: ssthresh = 0.5 * 150 = 75
>       (Reno) or ssthresh = 0.7 * 170 = 119 (CUBIC). This results in a ssthresh
>       value three times higher than expected with Reno (= 25) and ~2.5 times
>       higher than expected with CUBIC (= 49).
>
>       A simple scenario with cwnd, for example:
>
>       The amount of outstanding data is 100 segments and a loss is detected ->
>       ssthresh = 50 or 70 (assume the CC algo is Reno or CUBIC) and recovery
>       starts. Assume there is a significant number of losses in the current
>       window of data and the fast-retransmitted (= 1st lost) segment is
>       dropped. In addition, there may be significant ACK loss.
>       That is, this is a typical case of very heavy congestion, where it
>       would be crucial to reduce ssthresh (and cwnd) correctly.
>
>       During the first RTT only a little data gets delivered (i.e., hardly
>       any SACKed data and few additional lost segments are detected),
>       keeping cwnd ~ flightsize (= cwnd before entering recovery). When the
>       lost rexmit is detected after one RTT, the TCP sender computes the
>       new ssthresh = 0.5 * cwnd or 0.7 * cwnd, and the result is only a
>       minimal reduction from the ssthresh used during the first RTT of
>       recovery, instead of lowering it twice with the same multiplicative
>       decrease factor.
>
>       I hope this clarifies the issue and the need to define that ssthresh MUST
>       NOT be reinitialized using flightsize or cwnd.
> 
> 
> I'm not very sure this is a typical or good example. 
> When we find the loss of retransmitted packets via a certain ACK, the ACK should contain
> some SACK blocks, which mark many unacked packets as lost ones.
> 
> Or, if the stack is using timer-based loss detection logic, I think it can mark many unacked
> packets as lost ones as well by the time it considers the retransmission of the lost packets failed.
> So, in any case, when we detect loss of retransmissions, we should find
> many lost packets as well in this situation, which reduces the flight size significantly.

[MK3] In the example with flightsize, just one segment was lost and, in 
addition, only the retransmission of it. Dropping just one segment in a 
window of data is the most typical (?) TCP sawtooth behaviour, I think 
(assuming the connection is not application limited). Receiving SACK 
blocks does not alter flightsize (= snd.next - snd.una), nor does it 
introduce any new losses if there are no losses beyond the lone 
rexmitted segment.
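The arithmetic in the FlightSize scenario above can be sketched as
follows. This is an illustrative Python sketch, not text from any RFC;
the helper names are hypothetical:

```python
# Illustrative sketch of the FlightSize-inflation arithmetic from the
# scenario above (Reno decrease factor 0.5, CUBIC decrease factor 0.7).

def ssthresh_from_flightsize(flightsize, beta):
    """ssthresh computed from FlightSize, per RFC 5681 / RFC 9438."""
    return beta * flightsize

def ssthresh_from_prior(prior_ssthresh, beta):
    """ssthresh computed by re-applying the decrease factor to ssthresh."""
    return beta * prior_ssthresh

outstanding = 100  # segments in flight when the first loss is detected
for name, beta in (("Reno", 0.5), ("CUBIC", 0.7)):
    first = beta * outstanding      # ssthresh set at the start of recovery
    inflated = outstanding + first  # FlightSize after one RTT of new data
    wrong = ssthresh_from_flightsize(inflated, beta)
    expected = ssthresh_from_prior(first, beta)
    print(f"{name}: inflated FlightSize={inflated:.0f} -> "
          f"ssthresh={wrong:.0f}, expected {expected:.0f}")
```

Running this reproduces the numbers above: 75 vs. 25 for Reno and 119
vs. 49 for CUBIC.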

[MK3] In the example with cwnd, what you describe is possible. But my 
example was just one out of thousands (or more) of different scenarios. 
I just selected scenarios that result in a way too big ssthresh value, 
i.e., one that clearly is not congestion safe. The resulting ssthresh 
value may well be undesirably low as well. While a too-low ssthresh 
value is congestion safe, the very poor performance that results from 
it is hardly desirable.
The major point is that during fast recovery the value of FlightSize and 
cwnd is RANDOM (within a certain range), and the value depends on the 
loss pattern and the recovery stage at the time the loss of a rexmit is 
detected. That is, if the amount of outstanding data is 100 segments 
with CUBIC when a loss is detected and fast rexmit & fast recovery is 
entered, the value of flightsize can be anything in the range 0..170 at 
the time the loss of a rexmitted segment is detected. Similarly, cwnd 
can be anything in the range 1..99.
So, no matter whether we use flightsize or cwnd to calculate the new 
value of ssthresh, we may end up with an extremely low value or a way 
too high value. And it is also irrelevant which multiplicative decrease 
factor we decide to use to calculate the new value of ssthresh.
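To make the randomness concrete, here is a small illustrative sketch
(assuming CUBIC's decrease factor of 0.7 and the 100-segment example
above; nothing here is from any RFC). It contrasts a FlightSize-based
recomputation, which varies with the sampled FlightSize, against
re-applying the decrease factor to the previous ssthresh, which is
deterministic:

```python
# With 100 segments outstanding and CUBIC's decrease factor 0.7,
# FlightSize at the moment a lost retransmission is detected can lie
# anywhere in 0..170, so a FlightSize-based ssthresh is effectively
# random, while re-applying the factor to ssthresh is deterministic.

BETA = 0.7
prior_ssthresh = BETA * 100            # 70, set when recovery started
deterministic = BETA * prior_ssthresh  # 0.7 * 70 = 49, regardless of timing

for flightsize in (0, 50, 100, 170):   # samples from the possible range
    flightsize_based = BETA * flightsize
    print(f"FlightSize={flightsize:3d} -> ssthresh={flightsize_based:.0f} "
          f"(deterministic alternative: {deterministic:.0f})")
```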

> I think there might be cases you mentioned (e.g when we don't use SACK nor RACK), 
> but from my personal point of view,  this is a problem of loss detection logic or flightsize 
> calculation logic, but not a problem of PRR. 

[MK3] I was not thinking of any such cases.

> In my view, PRR is designed just to regulate the amount of data sent during the recovery based on input
> parameters such as ssthresh, cwnd and pipe size. So, I would like to think the problems in input 
> parameters should basically be outside of PRR. 

[MK3] IMHO, RFCs should be of high enough quality that an implementor is 
able to come up with a correctly behaving implementation. 
I cannot quite see how it is possible to come up with a correct 
implementation by reading the current text in the PRR doc together with 
the text in RFC 5681 and RFC 9438, which advises using either cwnd or 
FlightSize to calculate a new value of ssthresh. Could you possibly 
explain how a new value of ssthresh could be correctly calculated, when 
a new PRR episode is initiated due to a lost rexmit during fast 
recovery, if one uses either cwnd or FlightSize to calculate ssthresh 
as instructed in RFC 5681 and/or RFC 9438?

Thanks,

/Markku

> I also agree with using 'inflight' in the draft as Neal suggested.
> --
> Yoshi
> 
> 
>
>       Maybe on detecting loss of a rexmit, ssthresh could be reinitialized
>
>         ssthresh = multiplicative_decrease_factor * ssthresh?
>
>       In addition, when rereading the PRR algo I also found an additional
>       problem with the PRR algo that uses pipe (the RFC 6675 pipe
>       algorithm) together with RACK-TLP loss detection. The algo uses pipe
>       as the (quite accurate) estimate of outstanding data. However, the
>       definition of pipe depends on loss detection in RFC 6675, which
>       defines a lost segment differently from RACK-TLP. The pipe algo
>       depends on the RFC 6675 IsLost() function, which requires three
>       SACKed segments above the lost segment to declare the segment lost,
>       while this is often not the case with RACK-TLP.
>
>       Shouldn't the "pipe" estimate in the algo be based on the loss
>       detection algorithm in use? Otherwise, pipe may be badly off in a
>       number of scenarios.
>
>       The same holds for non-SACK fast recovery, that is, NewReno. I think
>       the document/algo should clarify that, without SACK, the SACKed
>       segments for the pipe calculation are estimated in a similar way as
>       they are estimated for DeliveredData (i.e., one SACKed segment = one
>       duplicate ACK).
>
>       Hope this is helpful.
>
>       Best regards,
>
>       /Markku
>
>       > I would propose that we make that more clear in the PRR document.
>       >
>       > In section 6, "Algorithm", I would propose we change the existing text:
>       >
>       >   At the beginning of recovery, initialize the PRR state.
>       > to:
>       >
>       >    At the beginning of a congestion control response episode initiated
>       >    by the congestion control algorithm, a TCP data sender using PRR
>       >    MUST initialize the PRR state. The timing of the start of a
>       >    congestion control response episode is entirely up to the
>       >    congestion control algorithm, and (for example) could correspond to
>       >    the start of a fast recovery episode, or a once-per-round-trip
>       >    reduction when lost retransmits or lost original transmissions are
>       >    detected after fast recovery is already in progress.
>       >
>       > How does that sound to everyone?
>       >
>       > neal
>       >
>       >
> 
> 
>