Re: [tcpm] I-D Action: draft-ietf-tcpm-prr-rfc6937bis-06.txt

Markku Kojo <kojo@cs.helsinki.fi> Wed, 27 March 2024 13:51 UTC

Date: Wed, 27 Mar 2024 15:51:29 +0200
From: Markku Kojo <kojo@cs.helsinki.fi>
To: Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org>
cc: tcpm@ietf.org, Matt Mathis <mattmathis@measurementlab.net>, Matt Mathis <ietf@mattmathis.net>, Yuchung Cheng <ycheng@google.com>, Nandita Dukkipati <nanditad@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/jtuU3mieggAVRX5Ch8imDsu21us>

Hi Neal,

Please see inline below. Note that I'm sending another reply to address the 
issues with the bytes cumulatively ACKed by the ACK that triggers loss 
recovery, and a number of other related issues that should be addressed.

I'll also extract the issue with decreasing the sending rate again on the 
loss of a retransmitted segment into a thread of its own, because it is a 
crucial issue and distinct from the other issues related to correctly 
calculating the bytes delivered during the recovery process.

On Mon, 18 Mar 2024, Neal Cardwell wrote:

> Hi Markku,
> Thank you for your thoughtful, detailed, and useful feedback! Comments in-line below:
> 
> On Mon, Mar 18, 2024 at 1:31 AM Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org> wrote:
>       Hi, Neal, all,
>
>       I have not been able to follow the progress of this draft for a long
>       while, so apologies for chiming in this late.
>
>       I took a quick look at the latest discussions on setting RecoverFS.
>
>       Setting RecoverFS = pipe seems like a neat idea to get cwnd to
>       descend smoothly to the target in the given example. However, isn't it
>       more important to ensure that in all cases the sender sends at most
>       ssthresh pkts per RTT during the recovery, and that at the end of the
>       recovery cwnd is at most ssthresh?
>
>       If I am not mistaken, reordering (and ACK losses) may result in an
>       undesired outcome. Let's modify Neal's example a bit such that
>       reordering occurs but the reordering window (with RACK-TLP) is
>       slightly too small:
>
>       CC = Reno
>       cwnd = 100 packets
>       The application writes 100*MSS.
>       TCP sends 100 packets.
>
>       In this example the TCP sender has detected reordering with RACK-TLP or
>       some other technique, so does not enter fast recovery on the third
>       SACKed packet, but rather waits a while to accumulate more SACKs.
>
>       From the flight of 100 packets, 1 packet is lost (P1), 24 packets
>       are delayed (packets P2..P25), and 3 packets (P26..P28) are SACKed
>       (assume P2..P25 arrive after P28, for example).
>
>       We enter fast recovery with PRR.
>
>       RecoverFS = snd.nxt - snd.una = 100
>
>       ssthresh = cwnd / 2 = 50  (Reno)
>
>       pipe = snd.nxt - snd.una - (lost + SACKed) = 100 - (25 + 3) = 72 packets
>
>       The expression (pipe > ssthresh) is true for a number of consecutive
>       SACKs, so we use the PRR code path repeatedly for a while as SACKs
>       stream in for P2..P25 and P29..P100.
>
>       When the SACK for P100 has been processed, we have sent
>
>       Sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
>                    = CEIL(96 * 50 / 72)
>                    = 67
>
>       So PRR does not exit with cwnd = 50 but with a much higher cwnd than
>       expected.
>
>       If CC = CUBIC
>
>       Sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
>                    = CEIL(96 * 70 / 72)
>                    = 94
> 
>
>       The same behavior seems to occur also if there is significant ACK
>       loss among P2..P25 and at least one packet gets reordered out of the
>       reordering window. But maybe I am missing something?
> 
> 
> Thanks!
> 
> BTW, I gather the line above that reads:
>   RecoverFS = snd.nxt - snd.una = 100
> is referring to a different version of the algorithm, and for this discussion probably it makes
> more sense as something like:
>   RecoverFS = pipe = 72 packets

My bad, sorry!
(I just copied your earlier example and modified it but forgot to modify 
the line that sets RecoverFS. However, as you could see, I used RecoverFS 
exactly as you corrected above.)
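
For concreteness, here is a small Python sketch that replays the 
Sent_so_far arithmetic from the scenario above (only the CEIL formula with 
the example numbers, not the full PRR state machine):

  import math

  def sent_so_far(prr_delivered, ssthresh, recover_fs):
      # Sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
      return math.ceil(prr_delivered * ssthresh / recover_fs)

  prr_delivered = 96  # P2..P25 (24 pkts) + P29..P100 (72 pkts)
  recover_fs = 72     # RecoverFS = pipe at the start of recovery

  print(sent_so_far(prr_delivered, 50, recover_fs))  # Reno:  67 > ssthresh = 50
  print(sent_so_far(prr_delivered, 70, recover_fs))  # CUBIC: 94 > ssthresh = 70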

> But your point still stands, and you raise a great point: simply initializing RecoverFS to
> "pipe" is not safe, because packets that were marked lost and removed from pipe may actually
> have been merely reordered. So if those packets are delivered later, they will increase the
> numerator of prr_delivered / RecoverFS without increasing the denominator, thus leading to a
> result above 1.0, and thus potentially leading to a target for Sent_so_far that is above
> ssthresh, causing the algorithm to erroneously exceed ssthresh.
> 
> What do folks think about addressing this issue by changing the initialization of RecoverFS to
> be as follows:
> 
>   existing text:
>      pipe = (RFC 6675 pipe algorithm)
>      RecoverFS = pipe              // RFC 6675 pipe before recovery
> 
>   proposed new text:
>      RecoverFS = snd.nxt - snd.una + (bytes newly cumulatively acknowledged)
> That should fix the issue you raise, and keep prr_delivered from being able to grow larger than
> RecoverFS.
> 
> I'm proposing here to add in (bytes newly cumulatively acknowledged) because in a (probably
> rare) corner case where the ACK that initiates fast recovery advances snd.una, the (bytes
> newly cumulatively acknowledged) on that first ACK will be non-zero, and will be added into
> prr_delivered. So if RecoverFS does not include (bytes newly cumulatively acknowledged) in such
> cases, then we could again trigger the kind of scenario you raise where prr_delivered >
> RecoverFS.

First, I was not quite sure what kind of scenario you were thinking of 
where the ACK that initiates loss recovery would advance snd.una, but then 
I got it. It's covered in my other reply.

It seems to me that the original way to initialize RecoverFS to snd.nxt - 
snd.una, i.e., to the effective flight size, is the way that ensures 
correct operation in all cases. However, I think the computation of 
DeliveredData should be clarified and modified to take into account all 
the different cases, such that all SACKed (and cumulatively ACKed) bytes 
during the recovery process are counted correctly.
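
As a rough illustration of what I mean (my own sketch in Python, not text 
from the draft), DeliveredData on each ACK would count both the advance of 
snd.una and the (signed) change in SACKed bytes on the scoreboard:

  # Sketch only (not draft text). The SACK delta can be negative when a
  # cumulative ACK covers previously SACKed data; the sum of the two
  # deltas still yields the bytes newly delivered by this ACK.
  def delivered_data(prev_snd_una, snd_una, prev_sacked_bytes, sacked_bytes):
      newly_cum_acked = snd_una - prev_snd_una
      sacked_delta = sacked_bytes - prev_sacked_bytes
      return newly_cum_acked + sacked_delta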

> We could try to make RecoverFS more precise by subtracting out SACKed data. But given the
> proposed RecoverFS initialization has already changed three times in the last few months, IMHO
> it seems safer to keep it as simple as possible. :-)

Agreed. In my view the real problem in the examples that you have 
described (i.e., in cases where entering loss recovery gets postponed) is 
not in how RecoverFS is initialized but in how DeliveredData is specified, 
and thereby in what is counted in prr_delivered. Please see my explanation 
in another (later) reply to Yoshi and you.
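
Side by side, the initializations discussed in this thread look roughly 
like this (illustration only, with the numbers from the example above):

  # Illustration of the RecoverFS variants discussed above (not draft text).
  snd_nxt, snd_una = 100, 0   # sequence space in packets, for simplicity
  pipe = 72                   # 100 - (25 marked lost + 3 SACKed)
  newly_cum_acked = 0         # nothing cumulatively ACKed by the first ACK

  recover_fs_rfc6937 = snd_nxt - snd_una                       # 100, flight size
  recover_fs_draft06 = pipe                                    # 72, unsafe under reordering
  recover_fs_proposed = (snd_nxt - snd_una) + newly_cum_acked  # 100, Neal's proposal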

[SNIP: I am moving the discussion on reacting to loss of a rexmit to 
a new thread of its own and will reply to it separately]


>       In addition, there are a few other things that might be useful to
>       correct/clarify:
>
>       - When exactly does the PRR algorithm exit? That is, are the algo
>          steps executed also for the final cumulative ACK that covers
>          RecoveryPoint (or recover)?
> 
> 
> The PRR algorithm exits implicitly when fast recovery exits. PRR takes no particular special
> steps at the end of fast recovery, so the draft thus far hasn't bothered to spell that out. :-)
> This is sort of treated implicitly by the fact that the pseudocode specifies when the blocks of
> logic are triggered, which are the events: "At the beginning of recovery", "On every ACK
> starting or during fast recovery", "On any data transmission or retransmission:" (I would
> propose, however, to update that first phrase to be: "At the beginning of fast recovery", since
> an unqualified "recovery" could conceivably apply to RTO recovery.)

Understood, but I think it would be useful to confirm/clarify that the 
algo steps for "On every ACK starting or during fast recovery" are 
executed also for the final cumulative ACK that ends the loss recovery.
Why? Because "during fast recovery" seems a bit ambiguous at the time the 
cumulative ACK that covers RecoveryPoint arrives. RFC 6675 does nothing 
but immediately terminate loss recovery upon arrival of the final 
cumulative ACK: "An incoming cumulative ACK for a sequence number greater 
than RecoveryPoint signals the end of loss recovery, and the loss
recovery phase MUST be terminated." This could be interpreted such that 
upon receipt of the final ACK fast recovery terminates immediately, 
leaving no reason to execute the PRR steps for "On every ACK 
starting or during fast recovery" either.
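
To make the intended behavior concrete, here is a small sketch of the 
per-ACK handling I have in mind (my own illustration, not draft text; the 
prr_per_ack_steps() helper is hypothetical and stands for the draft's "On 
every ACK ..." block, and RecoveryPoint is snd.nxt saved when recovery 
starts, as in RFC 6675):

  def prr_per_ack_steps(state, ack):
      ...  # the draft's "On every ACK starting or during fast recovery" block

  def on_ack_in_fast_recovery(state, ack):
      prr_per_ack_steps(state, ack)        # run PRR for the final ACK too
      if ack.cum_ack >= state.recovery_point:
          state.in_fast_recovery = False   # terminate recovery only after
                                           # the PRR steps have been run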

> Do you think it would be useful to discuss when the PRR algorithm exits, given that nothing
> happens at that point? If so, is there particular text that you would find useful?

Would it clarify enough if the text were simply modified as follows:

Existing text:

  On every ACK starting or during fast recovery:

Proposed new text:

  On every ACK starting or during fast recovery, including the final
  cumulative ACK that signals the end of loss recovery:   (*)

(*) I'd actually suggest modifying this further (something like what 
follows):

  On every arriving ACK during fast recovery, including the final
  cumulative ACK that signals the end of loss recovery:

Reasoning: literally speaking, fast recovery does not include fast 
retransmit as specified in RFC 5681, Sec 3.2; steps 2 & 3 specify fast 
retransmit, which is followed by fast recovery starting from step 4.
Similarly, in RFC 6675, Sec 5, step 4 specifies fast retransmit and steps 
(A)-(C) fast recovery.
Please see my proposal in another reply to Yoshi and you explaining why it 
might be useful for fast retransmit to be a step of its own, as it is in 
all other RFCs. That would solve a number of issues in counting 
delivered bytes at the arrival of the ACK that triggers loss recovery.

>       - The draft reads:
>
>          "Although increasing the window
>           during recovery seems to be ill advised, it is important to remember
>           that this is actually less aggressive than permitted by RFC 5681,
>           which sends the same quantity of additional data as a single burst in
>           response to the ACK that triggered Fast Retransmit."
>
>         I think it should cite RFC 6675, since TCP Reno loss recovery as
>         specified in RFC 5681 does not send such a burst.
> 
> 
> Sounds like a great point to me. I've updated this text in our internal copy to reference RFC
> 6675. So the next draft revision will reflect that, unless there are objections or other ideas.

Great, thanks.

Thanks,

/Markku

> Thanks!
> neal
> 
>  
>
>       On Mon, 26 Feb 2024, Neal Cardwell wrote:
>
>       > As noted in the draft, revision 06 primarily has a single change
>       > relative to 05: it updates RecoverFS to be initialized as
>       > "RecoverFS = pipe" in both the prose and pseudocode.
>       >
>       > Thanks to Richard Scheffenegger and the TCPM community for
>       > reviewing the 05 revision. Comments/suggestions welcome!
>       >
>       > Thanks!
>       > neal
>       >
>       >
>       > On Mon, Feb 26, 2024 at 10:23 AM <internet-drafts@ietf.org> wrote:
>       >       Internet-Draft draft-ietf-tcpm-prr-rfc6937bis-06.txt is now
>       >       available. It is a work item of the TCP Maintenance and Minor
>       >       Extensions (TCPM) WG of the IETF.
>       >
>       >          Title:   Proportional Rate Reduction for TCP
>       >          Authors: Matt Mathis
>       >                   Nandita Dukkipati
>       >                   Yuchung Cheng
>       >                   Neal Cardwell
>       >          Name:    draft-ietf-tcpm-prr-rfc6937bis-06.txt
>       >          Pages:   17
>       >          Dates:   2024-02-26
>       >
>       >       Abstract:
>       >
>       >          This document updates the experimental Proportional Rate Reduction
>       >          (PRR) algorithm, described in RFC 6937, to standards track.  PRR
>       >          provides logic to regulate the amount of data sent by TCP or other
>       >          transport protocols during fast recovery.  PRR accurately regulates
>       >          the actual flight size through recovery such that at the end of
>       >          recovery it will be as close as possible to the slow start threshold
>       >          (ssthresh), as determined by the congestion control algorithm.
>       >
>       >       The IETF datatracker status page for this Internet-Draft is:
>       >       https://datatracker.ietf.org/doc/draft-ietf-tcpm-prr-rfc6937bis/
>       >
>       >       There is also an HTML version available at:
>       >       https://www.ietf.org/archive/id/draft-ietf-tcpm-prr-rfc6937bis-06.html
>       >
>       >       A diff from the previous version is available at:
>       >       https://author-tools.ietf.org/iddiff?url2=draft-ietf-tcpm-prr-rfc6937bis-06
>       >
>       >       Internet-Drafts are also available by rsync at:
>       >       rsync.ietf.org::internet-drafts
>       >
>       >
>       >       _______________________________________________
>       >       tcpm mailing list
>       >       tcpm@ietf.org
>       >       https://www.ietf.org/mailman/listinfo/tcpm
>       >
>       >
>       >
> 
> 
>