Re: [tcpm] Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLPloss detection algorithm for TCP) to Proposed Standard

+1 to decoupling loss detection from congestion control.  It ensures RACK
does not need to not be updated every time a new congestion controller is
standardized.

On Fri, Dec 4, 2020 at 6:58 PM Yuchung Cheng <ycheng=
40google.com@dmarc.ietf.org> wrote:

> On Fri, Dec 4, 2020 at 5:02 AM Markku Kojo <kojo@cs.helsinki.fi> wrote:
> >
> > Hi all,
> >
> > I know this is a bit late but I didn't have time earlier to take look at
> > this draft.
> >
> > Given that this RFC to be is standards track and RECOMMENDED to replace
> > current DupAck-based loss detection, it is important that the spec is
> > clear on its advice to those implementing it. Current text seems to
> > lack important advice w.r.t congestion control, and even though
> > the spec tries to decouple loss detection from congestion control
> > and does not intend to modify existing standard congestion control
> > some of the examples advice incorrect congestion control actions.
> > Therefore, I think it is worth to correct the mistakes and take
> > yet another look at a few implications of this specification.
> As you noted, the intention is to decouple the two as much as possible
>
> Unlike the 20 years ago where TCP loss detection and congestion
> control are essentially glued in one piece, the decoupling of the two
> (including modularizing congestion controls in implementations) has
> helped fueled many great inventions of new congestion controls.
> Codifying so-called-default C.C. reactions in the loss detection is a
> step backward that the authors try their best to avoid.To keep the
> document less "abstract / unclear" as many WGLC reviewers commented,
> we use examples to illustrate that includes CC actions. But the
> details of these CC actions are likely to become obsolete as CC
> continues to advance hopefully.
>
> >
> > Sec. 3.4 (and elsewhere when discussing recovering a dropped
> > retransmission):
> >
> > It is very useful that RACK-TLP allows for recovering dropped rexmits.
> > However, it seems that the spec ignores the fact that loss of a
> > retransmission is a loss in a successive window that requires reacting
> > to congestion twice as per RFC 5681. This advice must be included in
> > the specification because with RACK-TLP recovery of dropped rexmit
> > takes place during the fast recovery which is very different
> > from the other standard algorithms and therefore easy to miss
> > when implementing this spec.
>
> per RFC5681 sec 4.3 https://tools.ietf.org/html/rfc5681#section-4.3
> "Loss in two successive windows of data, or the loss of a
>   retransmission, should be taken as two indications of congestion and,
>    therefore, cwnd (and ssthresh) MUST be lowered twice in this case."
>
> RACK-TLP is a loss detection algorithm. RFC5681 is crystal clear on
> this so I am not sure what clause you suggest to add to RACK-TLP.
> >
> > Sec 9.3:
> >
> > In Section 9.3 it is stated that the only modification to the existing
> > congestion control algorithms is that one outstanding loss probe
> > can be sent even if the congestion window is fully used. This is
> > fine, but the spec lacks the advice that if a new data segment is sent
> > this extra segment MUST NOT be included when calculating the new value
> > of ssthresh as per the equation (4) of RFC 5681. Such segment is an
> > extra segment not allowed by cwnd, so it must be excluded from
> > FlightSize, if the TLP probe detects loss or if there is no ack
> > and RTO is needed to trigger loss recovery.
>
> Why exclude TLP (or any data) from FlightSize? The congestion control
> needs precise accounting of the flight size to react to congestion
> properly.
>
> >
> > In these cases the temporary over-commit is not accounted for as DupAck
> > does not decrease FlightSize and in case of an RTO the next ACK comes too
> > late. This is similar to the rule in RFC 5681 and RFC 6675 that prohibits
> > including the segments transmitted via Limitid Transmit in the
> > calculation of ssthresh.
> >
> > In Section 9.3 a few example scenarios are used to illustriate the
> > intended operation of RACK-TLP.
> >
> >   In the first example a sender has a congestion window (cwnd) of 20
> >   segments on a SACK-enabled connection.  It sends 10 data segments
> >   and all of them are lost.
> >
> > The text claims that without RACK-TLP the ending cwnd would be 4 segments
> > due to congestion window validation. This is incorrect.
> > As per RFC 7661 the sender MUST exit the non-validated phase upon an
> > RTO. Therefore the ending cwnd would be 5 segments (or 5 1/2 segments if
> > the TCP sender uses the equation (4) of RFC 5681).
> >
> > The operation with RACK-TLP would inevitably result in congestion
> > collapse if RACK-TLP behaves as described in the example because
> > it restores the previous cwnd of 10 segments after the fast recovery
> > and would not react to congestion at all! I think this is not the
> > intended behavior by this spec but a mistake in the example.
> > The ssthresh calculated in the beginning of loss recovery should
> > be 5 segments as per RFC 6675 (and RFC 5681).
> To clarify, would this text look more clear?
>
> 'an ending cwnd set to the slow start threshold of 5 segments (half of
> the original congestion window of 10 segments)'
>
> >
> > Furthermore, it seems that this example with RACK-TLP refers to using
> > PRR_SSRB which effectively implements regular slow start in this
> > case(?). From congestion control point of view this is correct because
> > the entire flight of data as well as ack clock was lost.
> >
> > However, as correctly discussed in Sec 2, congestion window must be reset
> > to 1 MSS when an entire flight of data is and Ack clock is lost. But how
> > can an implementor know what to do if she/he is not implementing the
> > experimental PRR algrorithm? This spec articulates specifying an
> > alternative for DupAck counting, indicating that TLP is used to trigger
> > Fast Retransmit & Fast Recovery only, not a loss recovery in slow start.
> > This means that without an additional advise an implementation of this
> > spec would just halve the cwnd and ssthresh and send a potentially very
> > large burst of segments in the beginning of the Fast Recovery because
> > there is no ack clock. So, this spec begs for an advise (MUST) when to
> > slow start and reset cwnd and when not, or at least a discussion of
> > this problem and some sort of advise what to do and what to avoid.
> > And, maybe a recommendation to implement it with PRR?
>
> It's wise to decouple loss detection (RACK-TLP) vs congestion/burst
> control (when to slow-start). The use of PRR is just an example to
> illustrate and not meant for a recommendation.
>
> Section 3 has a lengthy section to elaborate the key point of RACK-TLP
> is to maximize the chance of fast recovery. How C.C. governs the
> transmission dynamics after losses are detected are out of scope of
> this document in our authors' opinions.
>
>
> >
> > Another question relates to the use of TLP and adjusting timer(s) upon
> > timeout. In the same example discussed above, it is clear that PTO
> > that fires TLP is just a more aggressive retransmit timer with
> > an alternative data segment to (re)transmit.
> >
> > Therefore, as per RFC 2914 (BCP 41), Sec 9.1, when PTO expires, it is in
> > effect a retransmission timout and the timer(s) must be backed-off.
> > This is not adviced in this specification. Whether it is the TCP RTO
> > or PTO that should be backed-off is an open question.  Otherwise,
> > if the congestion is persistent and further transmission are also lost,
> > RACK-TLP would not react to congestion properly but would keep
> > retransmitting with "constant" timer value because new RTT estimate
> > cannot be obtained.
> > On a buffer bloated and heavily congested bottleneck this would easily
> > result in sending at least one unnecessary retransmission per one
> > delivered segment which is not advisable (e.g., when there are a huge
> > number of applications sharing a constrained bottleneck and these
> > applications are sending only one (or a few) segments and then
> > waiting for an reply from the peer before sending another request).
>
> Thanks for pointing to the RFC.  After TLP, RTO timers will
> exp-backoff (as usual) for stability reasons mentioned in sec 9.3
> (didn't find 9.1 relevant). In your scenario, you presuppose the
> retransmission is unnecessary so obviously TLP is not good. Consider
> what happens without TLP where all the senders fire RTO spuriously and
> blow up the network. It is equally unfortunate behavior. "bdp
> insufficient of many flows" is a congestion control problem
>
>
> >
> > Additional notes:
> >
> > Sec 2.2:
> >
> > Example 2:
> > "Lost retransmissions cause a  resort to RTO recovery, since
> >   DUPACK-counting does not detect the loss of the retransmissions.
> >   Then the slow start after RTO recovery could cause burst losses
> >   again that severely degrades performance [POLICER16]."
> >
> > RTO reovery is done in slow start. The last sentence is confusing as
> > there is no (new) slow-start after RTO recovery (or more precisely
> > slow start continues until cwnd > ssthresh). Do you mean: if/when slow
> > start still continues after RTO Recovery has repaired lost segments,
> > it may cause burst losses again?
> I mean the slow start after (the start of) RTO recovery. HTH
>
>
> >
> > Example 3:
> >   "If the reordering degree is beyond DupThresh, the DUPACK-
> >    counting can cause a spurious fast recovery and unnecessary
> >    congestion window reduction.  To mitigate the issue, [RFC4653]
> >    adjusts DupThresh to half of the inflight size to tolerate the
> >    higher degree of reordering.  However if more than half of the
> >    inflight is lost, then the sender has to resort to RTO recovery."
> >
> > This seems to be somewhat incorrect description of TCP-NCR specified in
> > RFC 4653. TCP-NCR uses Extended Limited Transmit that keeps on sending
> > new data segments on DupAcks that makes it likely to avoid an RTO in
> > the given example scenario, if not too many of the the new data
> > segments triggered by Extended Limited Transmit are lost.
> sorry I don't see how the text is wrong describing RFC4653,
> specifically the algorithm in adjusting ssthresh
>
> >
> > Sec. 3.5:
> >
> >   "For example, consider a simple case where one
> >   segment was sent with an RTO of 1 second, and then the application
> >   writes more data, causing a second and third segment to be sent right
> >   before the RTO of the first segment expires.  Suppose only the first
> >   segment is lost.  Without RACK, upon RTO expiration the sender marks
> >   all three segments as lost and retransmits the first segment.  When
> >   the sender receives the ACK that selectively acknowledges the second
> >   segment, the sender spuriously retransmits the third segment."
> >
> > This seems incorrect. When the sender receives the ACK that selectively
> > acknowledges the second segment, it is a DupAck as per RFC 6675 and does
> > not increase cwnd and cwnd remains as 1 MSS and pipe is 1 MSS. So, the
> > rexmit of the third segment is not allowad until the cumulative ACK of
> > the first segment arrives.
> I don't see where RFC6675 forbids growing cwnd. Even if it does, I
> don't think it's a good thing (in RTO-slow-start) as DUPACK clearly
> indicates a delivery has been made.
>
>
> >
> > Best regards,
> >
> > /Markku
> >
> >
> >
> > On Mon, 16 Nov 2020, The IESG wrote:
> >
> > >
> > > The IESG has received a request from the TCP Maintenance and Minor
> Extensions
> > > WG (tcpm) to consider the following document: - 'The RACK-TLP loss
> detection
> > > algorithm for TCP'
> > >  <draft-ietf-tcpm-rack-13.txt> as Proposed Standard
> > >
> > > The IESG plans to make a decision in the next few weeks, and solicits
> final
> > > comments on this action. Please send substantive comments to the
> > > last-call@ietf.org mailing lists by 2020-11-30. Exceptionally,
> comments may
> > > be sent to iesg@ietf.org instead. In either case, please retain the
> beginning
> > > of the Subject line to allow automated sorting.
> > >
> > > Abstract
> > >
> > >
> > >   This document presents the RACK-TLP loss detection algorithm for TCP.
> > >   RACK-TLP uses per-segment transmit timestamps and selective
> > >   acknowledgements (SACK) and has two parts: RACK ("Recent
> > >   ACKnowledgment") starts fast recovery quickly using time-based
> > >   inferences derived from ACK feedback.  TLP ("Tail Loss Probe")
> > >   leverages RACK and sends a probe packet to trigger ACK feedback to
> > >   avoid retransmission timeout (RTO) events.  Compared to the widely
> > >   used DUPACK threshold approach, RACK-TLP detects losses more
> > >   efficiently when there are application-limited flights of data, lost
> > >   retransmissions, or data packet reordering events.  It is intended to
> > >   be an alternative to the DUPACK threshold approach.
> > >
> > >
> > >
> > >
> > > The file can be obtained via
> > > https://datatracker.ietf.org/doc/draft-ietf-tcpm-rack/
> > >
> > >
> > >
> > > No IPR declarations have been submitted directly on this I-D.
> > >
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > tcpm mailing list
> > > tcpm@ietf.org
> > > https://www.ietf.org/mailman/listinfo/tcpm
> > >
>
> _______________________________________________
> tcpm mailing list
> tcpm@ietf.org
> https://www.ietf.org/mailman/listinfo/tcpm
>