Re: [tcpm] I-D Action: draft-ietf-tcpm-prr-rfc6937bis-06.txt

Neal Cardwell <ncardwell@google.com> Tue, 19 March 2024 03:13 UTC

From: Neal Cardwell <ncardwell@google.com>
Date: Mon, 18 Mar 2024 23:12:20 -0400
Message-ID: <CADVnQynR99fQjWmYj-rYZ4nZxYS=-O7zbfWjJLMxd5Lqcpwgcg@mail.gmail.com>
To: Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org>
Cc: tcpm@ietf.org, Matt Mathis <mattmathis@measurementlab.net>, Matt Mathis <ietf@mattmathis.net>, Yuchung Cheng <ycheng@google.com>, Nandita Dukkipati <nanditad@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/lckbMJc92Pk5_CdUaEF2Nk72Vi0>

Hi Markku,

Thank you for your thoughtful, detailed, and useful feedback! Comments
in-line below:

On Mon, Mar 18, 2024 at 1:31 AM Markku Kojo <kojo=
40cs.helsinki.fi@dmarc.ietf.org> wrote:

> Hi, Neal, all,
>
> I have not been able to follow the progress of this draft for a long
> while, so apologies for chiming in this late.
>
> I took a quick look at the latest discussions on setting RecoverFS.
>
> The idea of setting RecoverFS = pipe seems like a neat way to get cwnd
> to descend smoothly to the target in the given example. However, isn't
> it more important to ensure that in all cases the sender sends at most
> ssthresh packets per RTT during recovery, and that at the end of
> recovery cwnd is at most ssthresh?
>
> If I am not mistaken, reordering (and ACK losses) may result in an
> undesired outcome. Let's modify Neal's example a bit such that
> reordering occurs but the reordering window (with RACK-TLP) is slightly
> too small:
>
> CC = Reno
> cwnd = 100 packets
> The application writes 100*MSS.
> TCP sends 100 packets.
>
> In this example the TCP sender has detected reordering with RACK-TLP or
> some other technique, so does not enter fast recovery on the third
> SACKed packet, but rather waits a while to accumulate more SACKs.
>
> From the flight of 100 packets, 1 packet is lost (P1), 24 packets
> are delayed (packets P2..P25), and 3 packets (P26..P28) are SACKed
> (assume P2..P25 arrive after P28, for example).
>
> We enter fast recovery with PRR.
>
> RecoverFS = snd.nxt - snd.una = 100
>
> ssthresh = cwnd / 2 = 50  (Reno)
>
> pipe = snd.nxt - snd.una - (lost + SACKed) = 100 - (25 + 3) = 72 packets
>
> The expression (pipe > ssthresh) is true for a number of consecutive
> SACKs, so we use the PRR code path repeatedly for a while as SACKs stream
> in for P2..25 and P29..P100.
>
> When the SACK for P100 has been processed, we have sent
>
> Sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
>              = CEIL(96 * 50 / 72)
>              = 67
>
> So PRR does not exit with cwnd = 50, but with a much higher cwnd than
> expected.
>
> If CC = CUBIC
>
> Sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
>              = CEIL(96 * 70 / 72)
>              = 94
>
>
> The same behavior seems to occur if there is significant ACK loss
> among P2..P25 and at least one packet gets reordered out of the
> reordering window. But maybe I am missing something?
>

Thanks!

BTW, I gather the line above that reads:
  RecoverFS = snd.nxt - snd.una = 100
is referring to a different version of the algorithm, and for this
discussion probably it makes more sense as something like:
  RecoverFS = pipe = 72 packets

But your point still stands, and it's a great one: simply initializing
RecoverFS to "pipe" is not safe, because packets that were marked lost and
removed from pipe may actually have been merely reordered. If those packets
are delivered later, they will increase the numerator of prr_delivered /
RecoverFS without increasing the denominator, leading to a ratio above 1.0,
and thus potentially to a target for Sent_so_far that is above ssthresh,
causing the algorithm to erroneously exceed ssthresh.
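To make the failure mode concrete, here is a small sketch (in Python, with illustrative variable names that are not from the draft) of the draft's Sent_so_far bound applied to your example, with RecoverFS initialized to pipe:

```python
import math

def prr_sent_so_far(prr_delivered, ssthresh, recover_fs):
    # PRR's proportional bound on cumulative data sent during fast
    # recovery (in packets here), per the draft's formula:
    #   Sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
    return math.ceil(prr_delivered * ssthresh / recover_fs)

# The scenario: cwnd = 100, 25 packets marked lost (but 24 of them
# merely reordered), 3 packets SACKed before recovery starts.
recover_fs = 100 - (25 + 3)   # pipe = 72 at the start of recovery
prr_delivered = 24 + 72       # reordered pkts SACKed later + rest of flight

print(prr_sent_so_far(prr_delivered, 50, recover_fs))  # Reno:  67 > ssthresh 50
print(prr_sent_so_far(prr_delivered, 70, recover_fs))  # CUBIC: 94 > ssthresh 70
```

Because prr_delivered (96) exceeds RecoverFS (72), the ratio passes 1.0 and the bound overshoots ssthresh for both congestion controls.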

What do folks think about addressing this issue by changing the
initialization of RecoverFS to be as follows:

  existing text:
     pipe = (RFC 6675 pipe algorithm)
     RecoverFS = pipe              // RFC 6675 pipe before recovery

  proposed new text:
     RecoverFS = snd.nxt - snd.una + (bytes newly cumulatively acknowledged)

That should fix the issue you raise, and keep prr_delivered from being able
to grow larger than RecoverFS.

I'm proposing here to add in (bytes newly cumulatively acknowledged)
because in a (probably rare) corner case where the ACK that initiates fast
recovery advances snd.una, the (bytes newly cumulatively acknowledged) on
that first ACK will be non-zero and will be added into prr_delivered. So if
RecoverFS does not include (bytes newly cumulatively acknowledged) in such
cases, then we could again trigger the kind of scenario you raise where
prr_delivered > RecoverFS.

We could try to make RecoverFS more precise by subtracting out SACKed data.
But given the proposed RecoverFS initialization has already changed three
times in the last few months, IMHO it seems safer to keep it as simple as
possible. :-)
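For comparison, here is the same sketch (again with illustrative names, not draft pseudocode) using the proposed initialization on the earlier example:

```python
import math

# Proposed: RecoverFS = snd.nxt - snd.una + (bytes newly cumulatively
# acknowledged) on the ACK that starts fast recovery. In the example
# above nothing is cumulatively acked at that point, so:
recover_fs = 100              # snd.nxt - snd.una, in packets
prr_delivered = 96            # all SACKs during recovery, incl. reordered pkts

# prr_delivered can no longer exceed RecoverFS, so the Sent_so_far
# bound stays at or below ssthresh:
sent_so_far = math.ceil(prr_delivered * 50 / recover_fs)  # Reno ssthresh = 50
print(sent_so_far)            # 48, which is <= ssthresh of 50
```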


> In addition, it seems that the algorithm in the latest version does not
> address my WGLC comment on reducing the send rate (ssthresh) again if
> RACK-TLP detects loss of a retransmission. The sender must reduce
> ssthresh again, as the loss of a rexmit occurs in another RTT. If that
> is not done, fast recovery keeps sending at the same rate until the end
> of recovery, regardless of how many times a segment has to be
> retransmitted. This sounds like very bad behaviour to me in the face of
> heavy congestion that drops a lot of packets (rexmits), while the PRR
> sender does not react at all.
>

I would argue that the question of whether a connection should reduce
ssthresh when RACK-TLP detects the loss of a retransmission, while
important, is outside the scope of PRR. PRR is taking loss detection and
congestion control decisions as externally provided inputs into PRR. When
to mark a packet as lost is a loss detection question, and whether to
reduce ssthresh upon a particular packet loss is a congestion control
decision. PRR is focused on taking the ssthresh output from congestion
control, and loss detection decisions from the loss detection algorithm,
and deciding how to evolve the cwnd to try to smoothly and safely converge
the volume of in-flight data toward the given ssthresh.


> In addition, there are a few other things that might be useful to
> correct/clarify:
>
> - When exactly does the PRR algorithm exit? That is, are the algorithm
>    steps executed also for the final cumulative ACK that covers
>    RecoveryPoint (or recover)?
>

The PRR algorithm exits implicitly when fast recovery exits. PRR takes no
particular special steps at the end of fast recovery, so the draft thus far
hasn't bothered to spell that out. :-) This is treated implicitly by the
fact that the pseudocode specifies when its blocks of logic are triggered,
namely on the events "At the beginning of recovery", "On every ACK starting
or during fast recovery", and "On any data transmission or retransmission".
(I would propose, however, to update that first phrase to be "At the
beginning of fast recovery", since an unqualified "recovery" could
conceivably apply to RTO recovery.)

Do you think it would be useful to discuss when the PRR algorithm exits,
given that nothing happens at that point? If so, is there particular text
that you would find useful?


> - The draft reads:
>
>    "Although increasing the window
>     during recovery seems to be ill advised, it is important to remember
>     that this is actually less aggressive than permitted by RFC 5681,
>     which sends the same quantity of additional data as a single burst in
>     response to the ACK that triggered Fast Retransmit."
>
>   I think it should cite RFC 6675, as the TCP Reno loss recovery
>   specified in RFC 5681 does not send such a burst.
>

Sounds like a great point to me. I've updated this text in our internal
copy to reference RFC 6675. So the next draft revision will reflect that,
unless there are objections or other ideas.

Thanks!
neal



>
> On Mon, 26 Feb 2024, Neal Cardwell wrote:
>
> > As noted in the draft, revision 06 primarily has a single change
> relative to 05: it updates
> > RecoverFS to be initialized as "RecoverFS = pipe" in both the prose and
> pseudocode.
> >
> > Thanks to Richard Scheffenegger and the TCPM community for reviewing the
> 05 revision.
> > Comments/suggestions welcome!
> >
> > Thanks!
> > neal
> >
> >
> > On Mon, Feb 26, 2024 at 10:23 AM <internet-drafts@ietf.org> wrote:
> >       Internet-Draft draft-ietf-tcpm-prr-rfc6937bis-06.txt is now
> available. It is a
> >       work item of the TCP Maintenance and Minor Extensions (TCPM) WG of
> the IETF.
> >
> >          Title:   Proportional Rate Reduction for TCP
> >          Authors: Matt Mathis
> >                   Nandita Dukkipati
> >                   Yuchung Cheng
> >                   Neal Cardwell
> >          Name:    draft-ietf-tcpm-prr-rfc6937bis-06.txt
> >          Pages:   17
> >          Dates:   2024-02-26
> >
> >       Abstract:
> >
> >          This document updates the experimental Proportional Rate
> Reduction
> >          (PRR) algorithm, described in RFC 6937, to standards track.  PRR
> >          provides logic to regulate the amount of data sent by TCP or
> other
> >          transport protocols during fast recovery.  PRR accurately
> regulates
> >          the actual flight size through recovery such that at the end of
> >          recovery it will be as close as possible to the slow start
> threshold
> >          (ssthresh), as determined by the congestion control algorithm.
> >
> >       The IETF datatracker status page for this Internet-Draft is:
> >       https://datatracker.ietf.org/doc/draft-ietf-tcpm-prr-rfc6937bis/
> >
> >       There is also an HTML version available at:
> >
> https://www.ietf.org/archive/id/draft-ietf-tcpm-prr-rfc6937bis-06.html
> >
> >       A diff from the previous version is available at:
> >
> https://author-tools.ietf.org/iddiff?url2=draft-ietf-tcpm-prr-rfc6937bis-06
> >
> >       Internet-Drafts are also available by rsync at:
> >       rsync.ietf.org::internet-drafts
> >
> >
> >       _______________________________________________
> >       tcpm mailing list
> >       tcpm@ietf.org
> >       https://www.ietf.org/mailman/listinfo/tcpm
> >
> >
> >