Re: [tcpm] draft-ietf-tcpm-prr-rfc6937bis-03 and RecoverFS initialization

Yoshifumi Nishida <nsd.ietf@gmail.com> Mon, 22 May 2023 18:38 UTC

From: Yoshifumi Nishida <nsd.ietf@gmail.com>
Date: Mon, 22 May 2023 11:37:57 -0700
To: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org>, Matt Mathis <mattmathis@measurementlab.net>, tcpm <tcpm@ietf.org>, Nandita Dukkipati <nanditad@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/ptv75gUt9mE-kSk9uLcVz6HPzaU>
Subject: Re: [tcpm] draft-ietf-tcpm-prr-rfc6937bis-03 and RecoverFS initialization

Hi Yuchung,

Thank you so much! OK, we will wait for the updates.
--
Yoshi

On Mon, May 22, 2023 at 10:09 AM Yuchung Cheng <ycheng@google.com> wrote:

> Hi Yoshifumi,
>
> Sorry for the radio silence. Neal will help co-author and update the draft,
> as he has many insights. We'll provide an update soon; hopefully we can
> move forward before the next meeting in SF.
>
> On Sun, May 21, 2023 at 11:34 PM Yoshifumi Nishida <nsd.ietf@gmail.com>
> wrote:
>
>> Hello,
>>
>> Just in case, as this discussion has been quiet for a while:
>> I personally think what Neal mentions makes sense, although I'm not very
>> sure which approach is better.
>> I hope this part will be addressed in the updated version of the draft.
>> --
>> Yoshi
>>
>> On Wed, Apr 19, 2023 at 7:20 PM Neal Cardwell <ncardwell=
>> 40google.com@dmarc.ietf.org> wrote:
>>
>>>
>>>
>>> On Tue, Apr 18, 2023 at 7:35 PM Yuchung Cheng <ycheng@google.com> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Apr 17, 2023 at 2:00 PM Neal Cardwell <ncardwell@google.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Apr 17, 2023 at 4:13 PM Yuchung Cheng <ycheng@google.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Neal,
>>>>>>
>>>>>> That's a good point, and it was considered in the early stages of PRR.
>>>>>> We picked FlightSize (= snd.nxt - snd.una) to ensure that
>>>>>> ssthresh/RecoverFS faithfully reflects the proportion of the congestion
>>>>>> control's window reduction: RFC 5681 still uses FlightSize to compute
>>>>>> ssthresh. But some TCPs or specific congestion controls may use either
>>>>>> cwnd (e.g. Linux CUBIC/Reno) or pipe instead. How about a small paragraph:
>>>>>>
>>>>>> "If a TCP or congestion control implementation uses cwnd or pipe
>>>>>> instead of FlightSize to compute ssthresh, then RecoverFS should use the
>>>>>> same metric accordingly, i.e., cwnd right before recovery."
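>>>>>>
>>>>>> As a rough sketch of that idea (the helper and its arguments below are
>>>>>> hypothetical names, not pseudocode from the draft or RFC 5681):
>>>>>>
>>>>>>    # Pick RecoverFS to match whatever quantity the congestion control
>>>>>>    # fed into its ssthresh computation.
>>>>>>    def init_recover_fs(snd_nxt, snd_una, cwnd, pipe, ssthresh_metric):
>>>>>>        if ssthresh_metric == "flightsize":   # RFC 5681 style
>>>>>>            return snd_nxt - snd_una
>>>>>>        elif ssthresh_metric == "cwnd":       # e.g. Linux CUBIC/Reno
>>>>>>            return cwnd
>>>>>>        else:                                 # "pipe"
>>>>>>            return pipe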
>>>>>>
>>>>>
>>>>> AFAICT that analysis is conflating two different issues:
>>>>>
>>>>> (1) How does the congestion control compute ssthresh (based on cwnd,
>>>>> pipe, or FlightSize)? You rightly point out that approaches vary for
>>>>> this part.
>>>>>
>>>>> (2) How does PRR determine what fraction of outstanding packets have
>>>>> been delivered (aka prr_delivered / RecoverFS)?
>>>>>
>>>>> AFAICT, to get the right answer for question (2), RecoverFS should be
>>>>> initialized to "pipe", no matter what approach the CC takes for
>>>>> answering (1).
>>>>>
>>>>> My understanding of PRR is that if (pipe > ssthresh) is true, then the
>>>>> algorithm is doing Proportional Rate Reduction, and is essentially
>>>>> computing:
>>>>>
>>>>>       sndcnt ~= (target data sent in recovery) - (actual data sent in recovery)
>>>>>       sndcnt ~= (fraction of data delivered) * ssthresh - prr_out
>>>>>       sndcnt ~= (prr_delivered / RecoverFS) * ssthresh - prr_out
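>>>>>
>>>>> As a rough Python sketch of just that proportional branch (not the
>>>>> draft's full pseudocode, which also covers the case where pipe has
>>>>> fallen to or below ssthresh):
>>>>>
>>>>>    import math
>>>>>
>>>>>    # Proportional Rate Reduction branch only, per the derivation above.
>>>>>    def prr_sndcnt(prr_delivered, prr_out, ssthresh, recover_fs):
>>>>>        target_sent = math.ceil(prr_delivered * ssthresh / recover_fs)
>>>>>        return max(target_sent - prr_out, 0)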
>>>>>
>>>>> For the (target data sent in recovery) to equal ssthresh at the end of
>>>>> the first round of recovery, the algorithm must reach the point where
>>>>> prr_delivered == RecoverFS, so that (prr_delivered / RecoverFS) is 1.
>>>>> Since prr_delivered can only grow as high as the value of "pipe" at the
>>>>> start of recovery, to be able to reach that condition we need RecoverFS ==
>>>>> "pipe". If RecoverFS is (snd.nxt - snd.una), then RecoverFS is too big:
>>>>> prr_delivered won't be able to match RecoverFS, the (target data sent in
>>>>> recovery) won't reach ssthresh, and the algorithm will undershoot (the
>>>>> cwnd won't reach the ssthresh specified by congestion control, however
>>>>> that was calculated).
>>>>>
>>>> I still can't parse your analysis after reading it multiple times.
>>>>
>>>> "prr_delivered can only reach as high as "pipe" at the start of
>>>> recovery" --> prr_delivered is initiated to 0 at the start of the recovery?
>>>> "If RecoverFS is (snd.nxt - snd.una) then RecoverFS is too big, and
>>>> prr_delivered won't be able to match RecoverFS" --> why is RecoverFS too
>>>> big and prr_delivered won't reach it.
>>>>
>>>> I am not saying RecoverFS initialized to "pipe" is wrong. I just don't
>>>> see a substantial difference between FlightSize and pipe, unless the
>>>> FlightSize/cwnd is small and/or limited transmits were not used.
>>>>
>>>> Maybe you can walk through an example with FlightSize vs. pipe...
>>>>
>>>
>>> Discussing a concrete example is a good idea!
>>>
>>> Here's an example, sketching the behavior with
>>> draft-ietf-tcpm-prr-rfc6937bis-03, AFAICT from trying to execute the
>>> example by hand:
>>>
>>> CC = Reno
>>>
>>> cwnd = 100 packets
>>>
>>> The application writes 100*MSS.
>>>
>>> TCP sends 100 packets.
>>>
>>> In this example, to make the effects clearer, the TCP sender has
>>> detected reordering with RACK-TLP or some other technique, so it does not
>>> enter fast recovery on the third SACKed packet, but rather waits a while
>>> to accumulate more SACKs.
>>>
>>> From the flight of 100 packets, 1 packet is lost (P1), and 24 packets
>>> are SACKed (packets P2..P25).
>>>
>>> We enter fast recovery with PRR.
>>>
>>> RecoverFS = snd.nxt - snd.una = 100
>>>
>>> ssthresh = cwnd / 2 = 50  (Reno)
>>>
>>> pipe = snd.nxt - snd.una - (lost + SACKed) = 100 - (1 + 24) = 75 packets
>>>
>>> The expression (pipe > ssthresh) is true for a number of consecutive
>>> SACKs, so we use the PRR code path repeatedly for a while as SACKs stream
>>> in for P26..P100.
>>>
>>> Given the PRR code path math, in general, the target number of packets
>>> sent so far in recovery will be:
>>>
>>>    target_sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
>>>                       = CEIL(prr_delivered * 50 / 100)
>>>                       = CEIL(prr_delivered * .5)
>>>
>>> What happens: this causes the sender to send 1 packet for every 2
>>> packets delivered (SACKed). Specifically, the connection sends 1 packet
>>> for every 2 packets SACKed for the first 50 SACKed packets of the round
>>> trip. This causes pipe to fall from 75 to 75 - 50*0.5 = 75 - 25 = 50
>>> packets during that period, at which point (pipe > ssthresh) becomes false
>>> and the connection follows the PRR-CRB path, matching the sending
>>> process to the delivery process (packet conservation) to keep pipe at
>>> ssthresh. So the sender's rate is inconsistent: for 50 SACKs it sends 1
>>> packet for every 2 packets delivered (SACKed); then for the remaining 25
>>> SACKs it sends 1 packet for every 1 packet delivered (SACKed). So we don't
>>> meet the goal of making "pipe" transition smoothly and consistently from
>>> its initial value to ssthresh.
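>>>
>>> A toy per-SACK trace of that behavior (a sketch only: the conservation
>>> phase is simplified to topping pipe back up to ssthresh, and this is not
>>> the draft's actual pseudocode):
>>>
>>>    import math
>>>
>>>    ssthresh, recover_fs = 50, 100      # RecoverFS = snd.nxt - snd.una
>>>    pipe, prr_delivered, prr_out = 75, 0, 0
>>>
>>>    for _ in range(75):                 # SACKs for P26..P100 arrive
>>>        prr_delivered += 1
>>>        pipe -= 1                       # the SACKed packet leaves the pipe
>>>        if pipe > ssthresh:             # proportional rate reduction
>>>            sndcnt = math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
>>>        else:                           # simplified conservation phase
>>>            sndcnt = ssthresh - pipe
>>>        sndcnt = max(sndcnt, 0)
>>>        prr_out += sndcnt
>>>        pipe += sndcnt                  # newly sent packets enter the pipe
>>>
>>>    # Roughly 1 packet sent per 2 SACKs for the first ~50 SACKs, then
>>>    # 1 packet per SACK afterwards; pipe ends the round pinned at 50.
>>>    print(prr_out, pipe)                # 50 50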
>>>
>>> What we want instead: the in-flight data (pipe) progressing smoothly
>>> from 75 to 50 over the course of the full round trip, with the 75 packets
>>> SACKed mapping smoothly into 50 packets transmitted: a ratio of 50 packets
>>> sent per 75 packets delivered, i.e. a sent/delivered ratio of 50/75, or 0.666.
>>>
>>> So what we want is: initializing with RecoverFS = pipe, so we have:
>>>    target_sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
>>>                       = CEIL(prr_delivered * 50 / 75)
>>>                       = CEIL(prr_delivered * 0.666)
>>>
>>> That should achieve the goal of sending 50 packets for 75 packets
>>> delivered, or a sent/delivered ratio of 50/75, or 0.666, aka sending 2
>>> packets for every 3 packets SACKed. In particular, at the end of the round
>>> trip time we'll have:
>>>
>>>    target_sent_so_far = CEIL(prr_delivered * 50 / 75)
>>>                       = CEIL(75 * 50 / 75)
>>>                       = 50
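>>>
>>> As a quick numeric check of the two initializations (looking only at the
>>> CEIL() target at the end of the round, and ignoring the CRB phase
>>> discussed above):
>>>
>>>    import math
>>>
>>>    ssthresh, delivered = 50, 75        # 75 packets SACKed during the round
>>>    for recover_fs in (100, 75):        # snd.nxt - snd.una  vs.  pipe
>>>        print(recover_fs, math.ceil(delivered * ssthresh / recover_fs))
>>>    # RecoverFS = 100 -> target 38, well short of ssthresh (50)
>>>    # RecoverFS = 75  -> target 50, exactly ssthresh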
>>>
>>> Hopefully that illustrates why, for the target_sent_so_far to smoothly
>>> rise to ssthresh at the end of the first round in recovery, RecoverFS
>>> should be initialized to pipe.
>>>
>>> The difference between the current initialization (RecoverFS = snd.nxt -
>>> snd.una) and the proposed initialization (RecoverFS = pipe) would probably
>>> be small in the typical case. But in cases like this where the sender has
>>> detected reordering and is therefore allowing many SACKed packets before
>>> entering recovery, AFAICT the difference could be significant.
>>>
>>> Best regards,
>>> neal
>>>
>>>
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> What am I missing? :-)
>>>>>
>>>>> best regards,
>>>>> neal
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Mon, Apr 17, 2023 at 11:23 AM Neal Cardwell <ncardwell@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Regarding this line in draft-ietf-tcpm-prr-rfc6937bis-03:
>>>>>>>
>>>>>>>    RecoverFS = snd.nxt - snd.una // FlightSize right before recovery
>>>>>>>
>>>>>>> AFAICT this should be:
>>>>>>>
>>>>>>>   RecoverFS = pipe  // RFC 6675 pipe algorithm
>>>>>>>
>>>>>>> Rationale: when recovery starts, often snd.nxt - snd.una includes 1
>>>>>>> or more lost packets above snd.una and 3 or more SACKed packets above that;
>>>>>>> those packets are not really in the pipe, and not really in the FlightSize.
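>>>>>>>
>>>>>>> A minimal sketch of the proposed initialization (lost_out and sacked_out
>>>>>>> are placeholder names for SACK scoreboard counters; retransmitted
>>>>>>> segments are ignored for simplicity):
>>>>>>>
>>>>>>>    def recover_fs_as_pipe(snd_nxt, snd_una, lost_out, sacked_out):
>>>>>>>        flight_size = snd_nxt - snd_una
>>>>>>>        # Packets already lost or SACKed are no longer in flight.
>>>>>>>        pipe = flight_size - (lost_out + sacked_out)
>>>>>>>        return pipe        # proposed: RecoverFS = pipe, not FlightSize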
>>>>>>>
>>>>>>> With the draft as-is, packets that were SACKed on ACKs that happened
>>>>>>> before entering fast recovery are incorporated in RecoverFS (snd.nxt -
>>>>>>> snd.una) but never in prr_delivered (since that is set to 0 upon entering
>>>>>>> fast recovery), so at the end of fast recovery the expression:
>>>>>>>
>>>>>>>   CEIL(prr_delivered * ssthresh / RecoverFS)
>>>>>>>
>>>>>>> can be quite far below ssthresh, for very large numbers of packets
>>>>>>> SACKed before entering fast recovery (e.g., if the reordering degree is
>>>>>>> large).
>>>>>>>
>>>>>>> AFAICT that means that at the end of recovery the cwnd could be
>>>>>>> quite far below ssthresh, to the same degree, resulting in the cwnd being
>>>>>>> less than what congestion control specified when the connection entered
>>>>>>> fast recovery.
>>>>>>>
>>>>>>> AFAICT switching to RecoverFS = pipe fixes this, since it means that
>>>>>>> RecoverFS only includes packets in the pipe when the connection enters
>>>>>>> fast recovery, and thus prr_delivered can eventually reach RecoverFS, so
>>>>>>> that the target number of packets sent, CEIL(prr_delivered * ssthresh /
>>>>>>> RecoverFS), can fully reach ssthresh.
>>>>>>>
>>>>>>> Apologies if I'm missing something or this has already been
>>>>>>> discussed.
>>>>>>>
>>>>>>> best regards,
>>>>>>> neal
>>>>>>>