Re: [tcpm] draft-ietf-tcpm-prr-rfc6937bis-03: set cwnd to ssthresh exiting fast recovery?

On Mon, Aug 21, 2023 at 1:41 PM Yoshifumi Nishida <nsd.ietf@gmail.com>
wrote:

> Hello everyone,
>
> If there are no other particular opinions on this, I think the authors
> would update the draft based on the discussions or send proposed texts to
> the ML.
>

Sorry for the delay on this. Here is the rough draft of proposed text for
this, in the "Changes From RFC 6937" section:

"""
A final change: upon exiting recovery, a data sender SHOULD set cwnd to
ssthresh. This is important for robust performance. Without setting cwnd to
ssthresh at the end of recovery, with some loss patterns cwnd could end
fast recovery well below ssthresh, leading to bad performance. The
performance could, in some cases, be worse than <xref target="RFC6675" />
recovery, which simply sets cwnd = ssthresh at the start of recovery. This
behavior of setting cwnd to ssthresh at the end of recovery has been
implemented since the first widely deployed TCP PRR implementation in 2011,
and is similar to <xref target="RFC6675" />, which specifies setting cwnd
to ssthresh at the start of recovery.
"""

Proposed pseudocode for this, at the end of the "Algorithm" section:

"""
  On exiting fast recovery:
     cwnd = ssthresh
"""
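
For anyone who wants to check the arithmetic, the example from earlier in
this thread (quoted below: CUBIC, cwnd = 10, ssthresh = 7, P1 lost) can be
replayed with a small Python sketch. This is only an illustration under
stated assumptions, not text proposed for the draft: the names PrrState,
on_ack, on_sent, on_exit_recovery and replay_example are hypothetical, the
"pipe > ssthresh" branch follows the proportional formula from RFC 6937 and
is not exercised by this trace, and the clamp of sndcnt to zero is a
defensive guard added here.

```python
import math
from dataclasses import dataclass


@dataclass
class PrrState:
    """Per-recovery PRR variables, in packets (hypothetical container)."""
    ssthresh: int
    recover_fs: int
    prr_delivered: int = 0
    prr_out: int = 0
    cwnd: int = 0


def on_ack(s: PrrState, delivered_data: int, pipe: int, safe_ack: bool) -> int:
    """Per-ACK PRR step, following the trace in the quoted example."""
    s.prr_delivered += delivered_data
    if pipe > s.ssthresh:
        # Proportional part (formula as in RFC 6937); unused in this trace.
        sndcnt = math.ceil(s.prr_delivered * s.ssthresh / s.recover_fs) - s.prr_out
    else:
        # PRR-CRB by default.
        sndcnt = max(s.prr_delivered - s.prr_out, delivered_data)
        if safe_ack:
            # PRR-SSRB when recovery is making good progress.
            sndcnt += 1
        sndcnt = min(s.ssthresh - pipe, sndcnt)
    # Assumption: clamp to zero as a guard; not part of the quoted trace.
    s.cwnd = pipe + max(sndcnt, 0)
    return s.cwnd


def on_sent(s: PrrState, packets: int) -> None:
    """Account for packets (re)transmitted during recovery."""
    s.prr_out += packets


def on_exit_recovery(s: PrrState) -> None:
    """The proposed step: on exiting fast recovery, cwnd = ssthresh."""
    s.cwnd = s.ssthresh


def replay_example(apply_exit_step: bool) -> int:
    """Replay the P1..P10 example and return cwnd after recovery ends."""
    s = PrrState(ssthresh=7, recover_fs=10)
    # P10 SACKed: P1 marked lost, pipe = 0, snd.una did not advance.
    on_ack(s, delivered_data=1, pipe=0, safe_ack=False)
    on_sent(s, 1)  # retransmit P1
    # Cumulative ACK for P1..P10: pipe = 0, snd.una advanced.
    on_ack(s, delivered_data=1, pipe=0, safe_ack=True)
    if apply_exit_step:
        on_exit_recovery(s)
    return s.cwnd


if __name__ == "__main__":
    print("cwnd without exit step:", replay_example(False))
    print("cwnd with exit step:   ", replay_example(True))
```

Without the exit step the sketch ends recovery with cwnd = 2, matching the
trace; with the exit step cwnd ends at ssthresh = 7.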

Feedback welcome/encouraged. Thanks!

neal

> Either way, after reviewing the updated texts, the chairs will think about
> whether we can conclude the WGLC for the draft.
> If you have any opinions, please let us know.
>
> Thanks!
> --
> Yoshi
>
> On Wed, Aug 9, 2023 at 11:27 AM Yoshifumi Nishida <nsd.ietf@gmail.com>
> wrote:
>
>> Hi Neal, Yuchung,
>>
>> Thank you so much.
>> This sounds like a good direction. I also think using SHOULD is a good
>> balance.
>> If other people, especially those working on implementations, have some
>> thoughts, please share.
>> --
>> Yoshi
>>
>> On Wed, Aug 9, 2023 at 8:45 AM Neal Cardwell <ncardwell@google.com>
>> wrote:
>>
>>> On Wed, Aug 9, 2023 at 11:39 AM Yuchung Cheng <ycheng@google.com> wrote:
>>>
>>>> We can add "Upon exiting recovery, cwnd /SHOULD/ be set to ssthresh"
>>>> with some performance rationale, given that RFC6675 and existing PRR
>>>> implementation already do so.
>>>>
>>>
>>> This sounds very good to me. Yoshifumi, how does that sound?
>>>
>>> thanks,
>>> neal
>>>
>>>
>>>> Note that RFC5681 Sec 4.3 has related wording on cwnd exiting recovery:
>>>> "Finally, after all loss in the given window of segments has been
>>>> successfully retransmitted, cwnd MUST be set to no more than ssthresh and
>>>> congestion avoidance MUST be used to further increase cwnd."
>>>>
>>>> Why not MUST: it's not strictly necessary, because omitting it won't break
>>>> TCP or make the network unstable. It's important for congestion control to
>>>> determine the cwnd after recovery. It can also induce a large burst if
>>>> cwnd >> pipe, as mentioned at the end of Section 5 of RFC6675.
>>>> Why not MAY: omitting it has major performance implications in various
>>>> cases, as discussed in this thread.
>>>>
>>>> How does that sound?
>>>>
>>>> On Wed, Aug 9, 2023 at 8:10 AM Neal Cardwell <ncardwell@google.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 9, 2023 at 11:05 AM Neal Cardwell <ncardwell@google.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Yoshifumi,
>>>>>>
>>>>>> You are correct that draft-ietf-tcpm-prr-rfc6937bis-04 does not
>>>>>> incorporate the suggestion in this thread to have a "cwnd = ssthresh" step
>>>>>> at the end of fast recovery. My sense was that this was because we had not
>>>>>> come to a conclusion / resolution of this question in this thread. :-)
>>>>>>
>>>>>> I would still argue that it's important for PRR to set cwnd =
>>>>>> ssthresh at the end of recovery. Without setting cwnd = ssthresh at the end
>>>>>> of recovery, cwnd could end recovery far below ssthresh, leading to
>>>>>> unusably terrible performance; performance that would be far worse than RFC
>>>>>> 6675 recovery (which simply sets cwnd = ssthresh at the start of recovery).
>>>>>>
>>>>>> The Linux TCP PRR has had this cwnd = ssthresh step at the end of
>>>>>> recovery since the original PRR implementation in 2011:
>>>>>>
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a262f0cdf1f2916ea918dc329492abb5323d9a6c
>>>>>>
>>>>>
>>>>> And FWIW, it sounds from Randall Stewart's earlier post on this thread
>>>>> ("when we exit recovery we set cwnd to ssthresh") like FreeBSD TCP PRR
>>>>> also has the same cwnd = ssthresh step at the end of recovery that
>>>>> Linux TCP PRR has.
>>>>>
>>>>> I would suspect that Microsoft TCP PRR has a similar step; I've CC-ed
>>>>> some folks who may be able to shed light on that.
>>>>>
>>>>> neal
>>>>>
>>>>>
>>>>>
>>>>>> best regards,
>>>>>> neal
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 9, 2023 at 3:16 AM Yoshifumi Nishida <nsd.ietf@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Yuchung,
>>>>>>>
>>>>>>> Thanks for the response.
>>>>>>> I just would like to check one thing.
>>>>>>> In my understanding, Neal's suggestion here was to adjust cwnd to
>>>>>>> ssthresh at the end of recovery.
>>>>>>> But, I cannot find the statement or logic for such adjustment. Does
>>>>>>> this mean we decided there's no adjustment at the end of recovery? Or, am I
>>>>>>> missing something?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> --
>>>>>>> Yoshi
>>>>>>>
>>>>>>> On Tue, Aug 8, 2023 at 2:34 PM Yuchung Cheng <ycheng@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Yoshifumi,
>>>>>>>>
>>>>>>>> That part is how the "RecoverFS" state variable is calculated in
>>>>>>>> the draft. See the diff of 03/04 on Section 5 and 6 regarding "RecoverFS"
>>>>>>>> state variable definition and computation.
>>>>>>>> https://author-tools.ietf.org/iddiff?url2=draft-ietf-tcpm-prr-rfc6937bis-04
>>>>>>>>
>>>>>>>> Does that make sense?
>>>>>>>>
>>>>>>>> On Tue, Aug 8, 2023 at 12:01 AM Yoshifumi Nishida <
>>>>>>>> nsd.ietf@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Yuchung,
>>>>>>>>>
>>>>>>>>> I think you have already updated the draft on the following point
>>>>>>>>> from the discussions in the last WG meeting.
>>>>>>>>> Could you point out which part has been updated? I'm just
>>>>>>>>> checking..
>>>>>>>>> Thanks,
>>>>>>>>> --
>>>>>>>>> Yoshi
>>>>>>>>>
>>>>>>>>> On Fri, May 5, 2023 at 11:51 AM Yoshifumi Nishida <
>>>>>>>>> nsd.ietf@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Neal,
>>>>>>>>>>
>>>>>>>>>> Yes, I think I understand your point.
>>>>>>>>>> I prefer the current logic in some ways, as it's more conservative:
>>>>>>>>>> I think we cannot always presume that the queue has been drained at the end
>>>>>>>>>> of recovery.
>>>>>>>>>> But, I also think it may look too conservative.
>>>>>>>>>> I am expecting that the authors provide some insights on this
>>>>>>>>>> point.
>>>>>>>>>> --
>>>>>>>>>> Yoshi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, May 2, 2023 at 11:31 AM Neal Cardwell <
>>>>>>>>>> ncardwell@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Yoshi,
>>>>>>>>>>>
>>>>>>>>>>> You are right that because PRR always sets cwnd to ssthresh at
>>>>>>>>>>> the end of recovery, there will be some cases where with PRR cwnd jumps up
>>>>>>>>>>> drastically at the end of the recovery.
>>>>>>>>>>>
>>>>>>>>>>> However, AFAIK cwnd jumping up drastically, per se, is not a
>>>>>>>>>>> problem. Big bursts of packets going into the network are a problem. And
>>>>>>>>>>> given the dynamics of the alternative loss recovery algorithms (RFC6675 and
>>>>>>>>>>> PRR), both can allow bursts of packets; just in different circumstances:
>>>>>>>>>>>
>>>>>>>>>>> (1) RFC6675: Because RFC6675 sets cwnd once at the start of fast
>>>>>>>>>>> recovery, using (4.2) from RFC6675:
>>>>>>>>>>>
>>>>>>>>>>> ssthresh = cwnd = (FlightSize / 2)
>>>>>>>>>>>
>>>>>>>>>>> ...that means RFC6675 allows big bursts at the moment any loss
>>>>>>>>>>> is detected: any time L packets are lost, the sender can burst L more
>>>>>>>>>>> packets.
>>>>>>>>>>>
>>>>>>>>>>> (2) PRR: PRR is specifically designed to avoid big bursts in
>>>>>>>>>>> response to packet losses; no matter the structure or timing of the losses,
>>>>>>>>>>> PRR only allows a big burst at the end of Fast Recovery after all holes
>>>>>>>>>>> have been plugged, and the algorithm sets cwnd to ssthresh.
>>>>>>>>>>>
>>>>>>>>>>> So in your example ("For example, many packets were lost before
>>>>>>>>>>> entering recovery"), AFAICT RFC6675 can allow a big burst at the beginning
>>>>>>>>>>> of recovery, when the lost packets are detected. AFAICT in this case PRR
>>>>>>>>>>> can allow a burst of packets at the end of recovery when it sets cwnd to
>>>>>>>>>>> ssthresh, but at least at this point the bottleneck queue has potentially
>>>>>>>>>>> drained somewhat.
>>>>>>>>>>>
>>>>>>>>>>> Please let me know if that analysis misses something important.
>>>>>>>>>>> :-)
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> neal
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, May 1, 2023 at 5:22 PM Yoshifumi Nishida <
>>>>>>>>>>> nsd.ietf@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Randall,
>>>>>>>>>>>>
>>>>>>>>>>>> I might be missing something, but here's what I've thought.
>>>>>>>>>>>> If we lose many packets in an RTT, as in Figure 5 of the
>>>>>>>>>>>> 6937bis draft, I think the window growth during the recovery period will be
>>>>>>>>>>>> bounded by PRR-CRB or PRR-SSRB.
>>>>>>>>>>>> Hence, I think the cwnd at the end of recovery can be smaller
>>>>>>>>>>>> than we expect, as shown in Figure 5.
>>>>>>>>>>>> --
>>>>>>>>>>>> Yoshi
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, May 1, 2023 at 4:17 AM Randall Stewart <rrs@netflix.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Neal and Yoshi:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Neal: So the FreeBSD implementation in rack, like Linux, does
>>>>>>>>>>>>> exactly the same thing: it sets cwnd to ssthresh at
>>>>>>>>>>>>> exit from recovery.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yoshi: I don’t see how this would cause cwnd to be larger,
>>>>>>>>>>>>> since at the entry to recovery you set ssthresh = cwnd * Beta. But
>>>>>>>>>>>>> maybe I am missing something; can you give an
>>>>>>>>>>>>> example like Neal did below?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> R
>>>>>>>>>>>>>
>>>>>>>>>>>>> On May 1, 2023, at 5:32 AM, Yoshifumi Nishida <
>>>>>>>>>>>>> nsd.ietf@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Neal,
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we always set cwnd to ssthresh at the end of recovery, I am
>>>>>>>>>>>>> guessing there will be some cases where cwnd jumps up drastically at the
>>>>>>>>>>>>> end of the recovery. For example, many packets were lost before entering
>>>>>>>>>>>>> recovery.  Or, am I missing something?
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Yoshi
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 19, 2023 at 7:37 PM Neal Cardwell <ncardwell=
>>>>>>>>>>>>> 40google.com@dmarc.ietf.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Working through examples for the
>>>>>>>>>>>>>> "draft-ietf-tcpm-prr-rfc6937bis-03 and RecoverFS initialization" thread
>>>>>>>>>>>>>> this evening, I ran into another potential issue.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The Linux TCP implementation of PRR explicitly/directly sets
>>>>>>>>>>>>>> cwnd to ssthresh at the end of fast recovery (in tcp_end_cwnd_reduction()).
>>>>>>>>>>>>>> But this behavior is not in the algorithm in the PRR RFC or draft, at least
>>>>>>>>>>>>>> in the figures in section 6, Algorithms. Maybe it is in the prose somewhere
>>>>>>>>>>>>>> and I missed it; but in that case I'd argue strongly to put this in the
>>>>>>>>>>>>>> figures in section 6, Algorithms.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> AFAICT in some cases this is strictly necessary to get cwnd
>>>>>>>>>>>>>> to grow to reach ssthresh. Without such a direct step, cwnd could end up
>>>>>>>>>>>>>> far below ssthresh at the end of recovery. Here's an example to illustrate:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> CC = CUBIC
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> cwnd = 10
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The reordering degree was estimated to be large, so the
>>>>>>>>>>>>>> connection will wait for more than 3 packets to be SACKed before entering
>>>>>>>>>>>>>> fast recovery.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --- Application writes 10*MSS.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> TCP sends packets P1 .. P10.
>>>>>>>>>>>>>> pipe = 10 packets in flight (P1 .. P10)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --- P2..P9 SACKed  -> do nothing
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (Because the reordering degree was previously estimated to be
>>>>>>>>>>>>>> large.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --- P10 SACKed -> mark P1 as lost and enter fast recovery
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PRR:
>>>>>>>>>>>>>> ssthresh = CongCtrlAlg() = 7 packets // CUBIC
>>>>>>>>>>>>>> prr_delivered = 0
>>>>>>>>>>>>>> prr_out = 0
>>>>>>>>>>>>>> RecoverFS = snd.nxt - snd.una = 10 packets (P1..P10)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> DeliveredData = 1  (P10 was SACKed)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> prr_delivered += DeliveredData   ==> prr_delivered = 1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> pipe =  0  (all packets are SACKed or lost; P1 is lost, rest
>>>>>>>>>>>>>> are SACKed)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> safeACK = false (snd.una did not advance)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> if (pipe > ssthresh) => if (0 > 7) => false
>>>>>>>>>>>>>> else
>>>>>>>>>>>>>>   // PRR-CRB by default
>>>>>>>>>>>>>>   sndcnt = MAX(prr_delivered - prr_out, DeliveredData)
>>>>>>>>>>>>>>          = MAX(1 - 0, 1)
>>>>>>>>>>>>>>          = 1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   sndcnt = MIN(ssthresh - pipe, sndcnt)
>>>>>>>>>>>>>>          = MIN(7 - 0, 1)
>>>>>>>>>>>>>>          = 1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> cwnd = pipe + sndcnt
>>>>>>>>>>>>>>      = 0    + 1
>>>>>>>>>>>>>>      = 1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> retransmit P1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> prr_out += 1   ==> prr_out = 1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --- P1 retransmit plugs hole; receive cumulative ACK for
>>>>>>>>>>>>>> P1..P10
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> DeliveredData = 1  (P1 was newly ACKed)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> prr_delivered += DeliveredData   ==> prr_delivered = 2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> pipe =  0  (all packets are cumulatively ACKed)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> safeACK = (snd.una advances and no further loss indicated)
>>>>>>>>>>>>>> safeACK = true
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> if (pipe > ssthresh) => if (0 > 7) => false
>>>>>>>>>>>>>> else
>>>>>>>>>>>>>>   // PRR-CRB by default
>>>>>>>>>>>>>>   sndcnt = MAX(prr_delivered - prr_out, DeliveredData)
>>>>>>>>>>>>>>          = MAX(2 - 1, 1)
>>>>>>>>>>>>>>          = 1
>>>>>>>>>>>>>>   if (safeACK) => true
>>>>>>>>>>>>>>     // PRR-SSRB when recovery is in good progress
>>>>>>>>>>>>>>     sndcnt += 1   ==> sndcnt = 2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   sndcnt = MIN(ssthresh - pipe, sndcnt)
>>>>>>>>>>>>>>          = MIN(7 - 0, 2)
>>>>>>>>>>>>>>          = 2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> cwnd = pipe + sndcnt
>>>>>>>>>>>>>>      = 0    + 2
>>>>>>>>>>>>>>      = 2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So we exit fast recovery with cwnd=2 even though ssthresh is
>>>>>>>>>>>>>> 7.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As noted above, the Linux TCP implementation does not suffer
>>>>>>>>>>>>>> this problem because it explicitly/directly sets cwnd to ssthresh at the
>>>>>>>>>>>>>> end of fast recovery.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would recommend including this cwnd=ssthresh step at the
>>>>>>>>>>>>>> end of recovery in the draft, to ensure that cwnd reaches ssthresh at the
>>>>>>>>>>>>>> end of fast recovery, even in cases like this where there will be
>>>>>>>>>>>>>> insufficient delivered data in fast recovery to allow pipe to incrementally
>>>>>>>>>>>>>> grow to reach ssthresh using PRR-SSRB.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> neal
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> tcpm mailing list
>>>>>>>>>>>>>> tcpm@ietf.org
>>>>>>>>>>>>>> https://www.ietf.org/mailman/listinfo/tcpm
>>>>>>>>>>>>>>
>>>>>>>>>>>>> ------
>>>>>>>>>>>>> Randall Stewart
>>>>>>>>>>>>> rrs@netflix.com