Re: [tcpm] draft-ietf-tcpm-prr-rfc6937bis-03 and RecoverFS initialization

Yuchung Cheng <ycheng@google.com> Mon, 22 May 2023 17:09 UTC

From: Yuchung Cheng <ycheng@google.com>
Date: Mon, 22 May 2023 10:08:56 -0700
To: Yoshifumi Nishida <nsd.ietf@gmail.com>
Cc: Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org>, Matt Mathis <mattmathis@measurementlab.net>, tcpm <tcpm@ietf.org>, Nandita Dukkipati <nanditad@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/j7yKdjPo6VTV4PPrKOkpjPOPNpQ>

Hi Yoshifumi,

Sorry for the radio silence. Neal will help co-author and update the draft,
as he has many insights. We'll provide an update soon; hopefully we can
move forward before the next meeting in SF.

On Sun, May 21, 2023 at 11:34 PM Yoshifumi Nishida <nsd.ietf@gmail.com>
wrote:

> Hello,
>
> Just in case, as this discussion has been quiet for a while:
> I personally think what Neal mentions makes sense, although I'm not
> sure which approach is better.
> I hope this part will be addressed in the updated version of the draft.
> --
> Yoshi
>
> On Wed, Apr 19, 2023 at 7:20 PM Neal Cardwell <ncardwell=
> 40google.com@dmarc.ietf.org> wrote:
>
>>
>>
>> On Tue, Apr 18, 2023 at 7:35 PM Yuchung Cheng <ycheng@google.com> wrote:
>>
>>>
>>>
>>> On Mon, Apr 17, 2023 at 2:00 PM Neal Cardwell <ncardwell@google.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Apr 17, 2023 at 4:13 PM Yuchung Cheng <ycheng@google.com>
>>>> wrote:
>>>>
>>>>> Hi Neal,
>>>>>
>>>>> That's a good point, and it was considered in the early stages of PRR.
>>>>> We picked FlightSize (= snd.nxt - snd.una) to ensure ssthresh/RecoverFS
>>>>> faithfully reflects the proportion of the congestion window reduction:
>>>>> RFC 5681 still uses FlightSize to compute ssthresh. But some TCPs or
>>>>> specific congestion controls may use either cwnd (e.g. Linux cubic/reno)
>>>>> or pipe. How about a small paragraph:
>>>>>
>>>>> "If a TCP or congestion control implementation uses cwnd or pipe
>>>>> instead of FlightSize to compute ssthresh, then RecoverFS should use the
>>>>> specific metric accordingly, i.e. cwnd right before recovery"
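>>>>>
>>>>> As a rough sketch of that rule (Python-style; the cc_metric flag and
>>>>> this helper are my own illustration, not text from the draft):
>>>>>
>>>>>     # Hypothetical sketch: initialize RecoverFS from whatever metric
>>>>>     # this congestion control's ssthresh formula is based on.
>>>>>     def init_recover_fs(cc_metric, flight_size, cwnd, pipe):
>>>>>         if cc_metric == "cwnd":     # e.g. Linux cubic/reno
>>>>>             return cwnd
>>>>>         elif cc_metric == "pipe":
>>>>>             return pipe
>>>>>         else:                       # RFC 5681 default
>>>>>             return flight_size      # snd.nxt - snd.una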
>>>>>
>>>>
>>>> AFAICT that analysis is conflating two different issues:
>>>>
>>>> (1) How does the congestion control compute ssthresh (based on cwnd or
>>>> pipe or FlightSize?) You rightly point out that approaches vary for this
>>>> part.
>>>>
>>>> (2) How does PRR determine what fraction of outstanding packets have
>>>> been delivered (aka prr_delivered / RecoverFS)?
>>>>
>>>> AFAICT to get the right answer for the (2) question, RecoverFS should
>>>> be initialized to "pipe", no matter what approach the CC takes for
>>>> answering (1).
>>>>
>>>> My understanding of PRR is that if (pipe > ssthresh) is true, then the
>>>> algorithm is doing Proportional Rate Reduction, and is essentially
>>>> computing:
>>>>
>>>>       sndcnt ~= (target data sent in recovery)            - (actual data sent in recovery)
>>>>       sndcnt ~= (fraction of data delivered) * ssthresh   - prr_out
>>>>       sndcnt ~= (prr_delivered / RecoverFS) * ssthresh    - prr_out
>>>>
>>>> For the (target data sent in recovery) to equal ssthresh at the end of
>>>> the first round of recovery, the algorithm must reach the point where
>>>> prr_delivered == RecoverFS, so that (prr_delivered / RecoverFS) is 1.
>>>> Since prr_delivered can only reach as high as the value of "pipe" at the
>>>> start of recovery, to be able to reach that condition we need RecoverFS ==
>>>> "pipe". If RecoverFS is (snd.nxt - snd.una), then RecoverFS is too big:
>>>> prr_delivered won't be able to match RecoverFS, the (target data sent in
>>>> recovery) won't reach ssthresh, and the algorithm will undershoot (the
>>>> cwnd won't reach the ssthresh specified by congestion control, however
>>>> the CC calculated it).
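>>>>
>>>> In code form, the proportional branch computes roughly the following
>>>> (a Python sketch of the RFC 6937 math, not the draft's exact
>>>> pseudocode):
>>>>
>>>>     import math
>>>>
>>>>     def prr_sndcnt(prr_delivered, prr_out, ssthresh, recover_fs):
>>>>         # Proportional Rate Reduction path (pipe > ssthresh):
>>>>         # target data sent so far in recovery, minus data actually sent.
>>>>         target = math.ceil(prr_delivered * ssthresh / recover_fs)
>>>>         return max(0, target - prr_out)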
>>>>
>>> I still can't parse your analysis after reading it multiple times.
>>>
>>> "prr_delivered can only reach as high as "pipe" at the start of
>>> recovery" --> prr_delivered is initiated to 0 at the start of the recovery?
>>> "If RecoverFS is (snd.nxt - snd.una) then RecoverFS is too big, and
>>> prr_delivered won't be able to match RecoverFS" --> why is RecoverFS too
>>> big and prr_delivered won't reach it.
>>>
>>> I am not saying RecoverFS initialized to "pipe" is wrong. I just don't
>>> see a substantial difference between FlightSize vs pipe, unless
>>> FlightSize/cwnd is small and/or limited transmit was not used.
>>>
>>> Maybe you can walk through an example with FlightSize vs pipe...
>>>
>>
>> Discussing a concrete example is a good idea!
>>
>> Here's an example, sketching the behavior with
>> draft-ietf-tcpm-prr-rfc6937bis-03, AFAICT from trying to execute the
>> example by hand:
>>
>> CC = Reno
>>
>> cwnd = 100 packets
>>
>> The application writes 100*MSS.
>>
>> TCP sends 100 packets.
>>
>> In this example, to make the effects clearer, the TCP sender has
>> detected reordering with RACK-TLP or some other technique, so it does
>> not enter fast recovery on the third SACKed packet but rather waits a
>> while to accumulate more SACKs.
>>
>> From the flight of 100 packets, 1 packet is lost (P1), and 24 packets are
>> SACKed (packets P2..P25).
>>
>> We enter fast recovery with PRR.
>>
>> RecoverFS = snd.nxt - snd.una = 100
>>
>> ssthresh = cwnd / 2 = 50  (Reno)
>>
>> pipe = snd.nxt - snd.una - (lost + SACKed) = 100 - (1 + 24) = 75 packets
>>
>> The expression (pipe > ssthresh) is true for a number of consecutive
>> SACKs, so we use the PRR code path repeatedly for a while as SACKs stream
>> in for P26..P100.
>>
>> Given the PRR code path math, in general, the target number of packets
>> sent so far in recovery will be:
>>
>>    target_sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
>>                       = CEIL(prr_delivered * 50 / 100)
>>                       = CEIL(prr_delivered * .5)
>>
>> What happens: This will cause the sender to send 1 packet for every 2
>> packets delivered (SACKed). Specifically, the connection will send 1 packet
>> for every 2 packets SACKed for the first 50 packets SACKed of the round
>> trip. This will cause pipe to fall from 75 to 75 - 50*0.5 = 75 - 25 = 50
>> packets during that period, at which point (pipe > ssthresh) becomes false
>> and the connection will follow the PRR-CRB path to match the sending
>> process to the delivery process (packet conservation) to keep pipe matching
>> ssthresh. So the sender's rate was inconsistent: for 50 SACKs it sends at 1
>> packet for every 2 packets delivered (SACKed); then for 25 SACKs it sends
>> at 1 packet for every 1 packet delivered (SACKed). So we don't meet the
>> goal of making "pipe" transition smoothly and consistently from its initial
>> value to ssthresh.
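>>
>> To double-check that arithmetic, here is a small Python loop that should
>> reproduce the two-phase pattern (a sketch under this example's
>> assumptions -- one packet SACKed per ACK, no further losses -- not
>> implementation code):
>>
>>     import math
>>
>>     ssthresh, recover_fs, pipe = 50, 100, 75  # draft -03: RecoverFS = snd.nxt - snd.una
>>     prr_delivered = prr_out = 0
>>     for ack in range(75):             # one SACK each for P26..P100
>>         prr_delivered += 1
>>         pipe -= 1                     # the SACKed packet left the network
>>         if pipe > ssthresh:           # PRR proportional path
>>             sndcnt = math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
>>         else:                         # PRR-CRB path (packet conservation)
>>             sndcnt = min(ssthresh - pipe, prr_delivered - prr_out)
>>         prr_out += sndcnt
>>         pipe += sndcnt
>>     print(pipe, prr_out)              # 50 50: 25 sent in each phase, at two different rates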
>>
>> What we want instead: the in-flight data (pipe) progressing smoothly from
>> 75 to 50 over the course of the full round trip, with the 75 packets
>> SACKed mapping smoothly into 50 packets transmitted: a sent/delivered
>> ratio of 50/75, or 0.666.
>>
>> So what we want is: initializing with RecoverFS = pipe, so we have:
>>    target_sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
>>                       = CEIL(prr_delivered * 50 / 75)
>>                       = CEIL(prr_delivered * 0.666)
>>
>> That should achieve the goal of sending 50 packets for 75 packets
>> delivered, or a sent/delivered ratio of 50/75, or 0.666, aka sending 2
>> packets for every 3 packets SACKed. In particular, at the end of the round
>> trip time we'll have:
>>
>>    target_sent_so_far = CEIL(prr_delivered * 50 / 75)
>>                       = CEIL(75 * 50 / 75)
>>                       = 50
>>
>> Hopefully that illustrates why, for the target_sent_so_far to smoothly
>> rise to ssthresh at the end of the first round in recovery, RecoverFS
>> should be initialized to pipe.
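>>
>> In the loop sketched above this is a one-line change (again a sketch,
>> not implementation code):
>>
>>     ssthresh, recover_fs, pipe = 50, 75, 75  # proposed: RecoverFS = pipe
>>
>> With that initialization the per-ACK sndcnt pattern becomes 1, 1, 0,
>> 1, 1, 0, ... (2 sent for every 3 SACKed) essentially throughout, pipe
>> glides from 75 down to 50 across the round trip, and prr_out still ends
>> at 50, matching ssthresh.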
>>
>> The difference between the current initialization (RecoverFS = snd.nxt -
>> snd.una) and the proposed initialization (RecoverFS = pipe) would probably
>> be small in the typical case. But in cases like this where the sender has
>> detected reordering and is therefore allowing many SACKed packets before
>> entering recovery, AFAICT the difference could be significant.
>>
>> Best regards,
>> neal
>>
>>
>>
>>
>>>
>>>
>>>>
>>>> What am I missing? :-)
>>>>
>>>> best regards,
>>>> neal
>>>>
>>>>
>>>>
>>>>
>>>>> On Mon, Apr 17, 2023 at 11:23 AM Neal Cardwell <ncardwell@google.com>
>>>>> wrote:
>>>>>
>>>>>> Regarding this line in draft-ietf-tcpm-prr-rfc6937bis-03:
>>>>>>
>>>>>>    RecoverFS = snd.nxt - snd.una // FlightSize right before recovery
>>>>>>
>>>>>> AFAICT this should be:
>>>>>>
>>>>>>   RecoverFS = pipe  // RFC 6675 pipe algorithm
>>>>>>
>>>>>> Rationale: when recovery starts, often snd.nxt - snd.una includes 1
>>>>>> or more lost packets above snd.una and 3 or more SACKed packets above that;
>>>>>> those packets are not really in the pipe, and not really in the FlightSize.
>>>>>>
>>>>>> With the draft as-is, packets that were SACKed on ACKs that happened
>>>>>> before entering fast recovery are incorporated in RecoverFS (snd.nxt -
>>>>>> snd.una) but never in prr_delivered (since that is set to 0 upon entering
>>>>>> fast recovery), so at the end of fast recovery the expression:
>>>>>>
>>>>>>   CEIL(prr_delivered * ssthresh / RecoverFS)
>>>>>>
>>>>>> can be quite far below ssthresh, for very large numbers of packets
>>>>>> SACKed before entering fast recovery (e.g., if the reordering degree is
>>>>>> large).
>>>>>>
>>>>>> AFAICT that means that at the end of recovery the cwnd could be quite
>>>>>> far below ssthresh, to the same degree, resulting in the cwnd being less
>>>>>> than what congestion control specified when the connection entered fast
>>>>>> recovery.
>>>>>>
>>>>>> AFAICT switching to RecoverFS = pipe fixes this, since it means that
>>>>>> RecoverFS only includes packets in the pipe when the connection enters
>>>>>> fast recovery, and thus prr_delivered can eventually reach RecoverFS, so
>>>>>> that the target number of packets sent, CEIL(prr_delivered * ssthresh /
>>>>>> RecoverFS), can fully reach ssthresh.
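>>>>>>
>>>>>> As a quick numeric illustration (the numbers are assumed, just to show
>>>>>> the scale -- say heavy reordering with 40 packets SACKed before
>>>>>> recovery):
>>>>>>
>>>>>>     import math
>>>>>>
>>>>>>     ssthresh   = 50                     # Reno: FlightSize / 2
>>>>>>     recover_fs = 100                    # snd.nxt - snd.una at entry
>>>>>>     max_prr_delivered = 100 - 1 - 40    # ~pipe: only 59 packets still in flight
>>>>>>     print(math.ceil(max_prr_delivered * ssthresh / recover_fs))   # 30, not 50
>>>>>>
>>>>>> So the target would top out around 30 packets, well below ssthresh = 50.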
>>>>>>
>>>>>> Apologies if I'm missing something or this has already been discussed.
>>>>>>
>>>>>> best regards,
>>>>>> neal
>>>>>>