Re: [tcpm] [EXTERNAL] Re: Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard

Markku Kojo <kojo@cs.helsinki.fi> Fri, 18 December 2020 20:41 UTC

Date: Fri, 18 Dec 2020 22:41:21 +0200
From: Markku Kojo <kojo@cs.helsinki.fi>
To: Yuchung Cheng <ycheng@google.com>
cc: Praveen Balasubramanian <pravb@microsoft.com>, "martin.h.duke@gmail.com" <martin.h.duke@gmail.com>, "tcpm@ietf.org" <tcpm@ietf.org>, "draft-ietf-tcpm-rack@ietf.org" <draft-ietf-tcpm-rack@ietf.org>, "tuexen@fh-muenster.de" <tuexen@fh-muenster.de>, "draft-ietf-tcpm-rack.all@ietf.org" <draft-ietf-tcpm-rack.all@ietf.org>, "last-call@ietf.org" <last-call@ietf.org>, "tcpm-chairs@ietf.org" <tcpm-chairs@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/MrhB5a6U0fR3DUSCb1h69PSbI8Q>

Hi,

On Thu, 17 Dec 2020, Yuchung Cheng wrote:

> How about
>  
> "9.3.  Interaction with congestion control
> 
> RACK-TLP intentionally decouples loss detection ... 
> As mentioned in Figure 1 caption, RFC5681 mandates a principle that
> Loss in two successive windows of data, or the loss of a
> retransmission, should be taken as two indications of congestion, and
> therefore reacted separately. However implementation of RFC6675 pipe
> algorithm may not directly account for this newly detected congestion
> events properly. PRR [RFCxxxx] is RECOMMENDED for the specific congestion control actions taken
> upon the losses detected by RACK-TLP"

I believe you intend to add such a mention in Figure 1. So ACK for it.

I think

  "... implementation of RFC6675 pipe algorithm may not directly
   account ..."

is too mild an expression, i.e., "may not" does not quite capture the 
problem. The problem in my view is that the RFC 6675 algorithm simply breaks, 
i.e., it possibly requires significant modification in many aspects, but 
I cannot immediately say what all falls apart other than determining the 
new value for cwnd & ssthresh with FlightSize and the pipe calculation. 
Also, NextSeg () seemingly does not work for lost retransmissions, etc.

I think the draft should say that the RFC 6675 algorithm does not handle such 
a second congestion response correctly. Therefore it MUST NOT be used; 
instead, a modified algorithm that carefully addresses such a case is needed.

Unless I am mistaken, the PRR algorithm does not work without 
modification either (for reducing the window again for a lost retransmission). 
At least FlightSize cannot be used for determining the new value for ssthresh 
when a loss of a retransmission is detected in the middle of fast 
recovery. I cannot tell for sure what else possibly breaks. If it is a 
trivial change in PRR, I just don't see it immediately. But if it is, I 
think it would be useful to explicitly point it out.
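
To make the FlightSize problem concrete, here is a minimal Python sketch 
using the numbers from the P1..P29 example quoted further down in this 
thread. The function and variable names are illustrative only and are taken 
from neither RFC 6675 nor PRR, and windows are counted in segments rather 
than bytes.

```python
def reduce_from_flightsize(flight_size: float) -> float:
    """Equation (4) of RFC 5681 in segment units: max(FlightSize / 2, 2)."""
    return max(flight_size / 2, 2)


# Numbers from the example quoted later in this thread (assumed, in segments).
cwnd_before_loss = 20

# First loss (original P1): FlightSize equals the 20 segments in flight.
cwnd_after_first_loss = reduce_from_flightsize(20)

# During fast recovery P21..P29 are sent while nothing is cumulatively
# acknowledged, so FlightSize has grown to 29 when the lost rexmit is detected.
cwnd_after_rexmit_loss = reduce_from_flightsize(29)

# What two successive halvings of the original window would give.
cwnd_two_halvings = cwnd_before_loss / 2 / 2

print(cwnd_after_first_loss, cwnd_after_rexmit_loss, cwnd_two_halvings)
```

Under these assumptions the second "reduction" yields a larger window than 
the first one, which is why FlightSize is not a usable input at that point.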

Another important issue with detecting lost retransmissions is tracking 
when this occurs. Currently RACK-TLP algorithm (pseudocode) detects a loss 
of a retransmitted segment in the same way as a loss of an original 
transmission. In order to take an appropriate congestion response when the 
loss of a retransmitted segment occurs, it would be good if the RACK-TLP 
algorithm would track when this occurs and indicate it. In particular, I 
think it would be very important when the congestion control actions are 
in another document.
Otherwise such a congestion control document would first need to modify 
the RACK-TLP algorithm to add the necessary variables and actions.

FYI: When we decoupled spurious retransmission detection and congestion 
control response, the detection RFCs (EIFEL [RFC 3522] and F-RTO [RFC 
5682]) included the necessary variable for indicating that a spurious 
retransmission was detected, and the algorithms set the variable.
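
As a rough sketch of what such tracking could look like, the Python 
fragment below records whether a segment declared lost had already been 
retransmitted and exposes that through a flag. All names here (Segment, 
LossDetector, lost_rexmit_detected) are hypothetical; they do not appear in 
the RACK-TLP draft or in RFC 3522 / RFC 5682.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    seq: int
    retransmitted: bool = False  # True once the segment has been resent
    lost: bool = False


@dataclass
class LossDetector:
    # Hypothetical indicator that a separate congestion control
    # document could read to trigger a second window reduction.
    lost_rexmit_detected: bool = False
    lost_rexmit_seqs: List[int] = field(default_factory=list)

    def mark_lost(self, seg: Segment) -> None:
        """Mark a segment lost and note whether it was a retransmission."""
        seg.lost = True
        if seg.retransmitted:
            self.lost_rexmit_detected = True
            self.lost_rexmit_seqs.append(seg.seq)


detector = LossDetector()
p1 = Segment(seq=1)

detector.mark_lost(p1)            # loss of the original transmission
first_flag = detector.lost_rexmit_detected

p1.retransmitted, p1.lost = True, False
detector.mark_lost(p1)            # loss of the retransmission
second_flag = detector.lost_rexmit_detected
```

With a flag like this, the congestion control side would not need to modify 
the detection pseudocode in order to learn that a retransmission was lost.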

> To Markku's request for "what's the justification to enter fast recovery". Consider this example
> w/o RACK-TLP
> 
> T0: Send 100 segments but application-limited. All are lost.
> T-2RTT: app writes so another 3 segments are sent. Made to the destination and triggered 3
> DUPACKs
> T-3RTT: 3 DUPACK arrives. start fast recovery and subsequent cc reactions to burst ~50 packets
> with Reno 
> 
> In this case any ACK occurred before RTO is (generally) considered clock-acked, and how I
> understand Van's initial design.  This behavior existed decades before RACK-TLP. RACK-TLP
> essentially changes the "3-segments by app" to "1-segment by tcp". 

Ack, this kind of scenario is a problem with RFC 6675 loss detection and 
recovery, but not with Reno loss recovery (RFC 5681) nor with Newreno 
loss recovery (RFC 6582).

Sure, the congestion control response for this scenario should be fixed in 
RFC 6675. PRR is one such fix.

But now we are discussing pure RFC 6675 (without PRR cc modification).
My point is that we should not make things (congestion control behavior) 
worse than in the current standard track algorithms even though there is 
this problem with the current algorithm. I try to explain why I think 
things may get worse if TLP probe is used to trigger fast recovery in 
*all* scenarios where it detects a loss. So we compare RACK-TLP loss 
detection to pure RFC 6675 DupAck detection when a TCP sender uses 
standard RFC 6675 loss recovery and congestion control in both cases.

(0) In the aforementioned scenario the same problem occurs also with
     RACK-TLP loss detection.
     No difference, both should be fixed/mitigated.

Differences:

(1) RFC 6675 has this burst problem only in the aforementioned scenario.

     RACK-TLP has also additional scenarios when this problem may
     occur:

     (a) Always when the sender is cwnd limited and the entire
         flight is lost. This case includes large transfers
         where cwnd is often large, resulting in large bursts.

     (b) Always when the sender is rwnd limited and the entire
         flight is lost.

     (c) Always when the sender remains application limited and
         the entire flight is lost. That is, when the sending
         application does not provide more data to send within
         a period of one RTO.

      This means that the burst problem is likely to occur more
      often with RACK-TLP because there are more occasions when
      a problematic scenario may emerge.
      This is an important difference that is likely to create more
      congestion with RACK-TLP detection than with DupAck-based
      detection, if TLP-detected loss always invokes fast recovery
      without necessary safeguards in the fast recovery algorithm.

(2)  A combined effect of entering fast recovery when the
      entire flight is lost and capability to detect lost
      retransmissions:

      Important to note first: if there was severe congestion that
      continues it is likely that some (many) segments are
      dropped from the burst of retransmitted segments.

      In this case:

        RFC 6675 sender (re)transmits more segments ack-clocked at
        most for one RTT at half of the previous rate and then waits
        until RTO expires --> cwnd=1

        RACK-TLP sender continues (re)transmitting segments ack-clocked
        at the half of the previous rate maybe for *several RTTs* until
        all segments repaired or RTO expires.
        If all losses are repaired and RACK-TLP does not take another
        congestion response when a lost retransmission is detected,
        it continues all the way without further cwnd reduction.

      Thereby a RACK-TLP sender would be significantly more aggressive than
      an RFC 6675 sender.

      If RACK-TLP takes another congestion response when a lost
      retransmission is detected, it is still more aggressive but
      much more reasonable as it halves cwnd again.


(3) Maybe not that important:

     Three DupAcks provide some ack clocking feedback that can be
     used to mitigate pure RFC 6675 behavior. For example, the sender
     may derive the ack clock from these three DupAcks and use it to
     clock out the retransmissions (e.g., using at least doubled
     spacing compared to what is inferred from DupAcks). Segment
     offloading (GRO) may interfere though?

     A single DupAck for RACK-TLP does not provide any ack clock
     information for this purpose.
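
The burst-size concern in (1) can be illustrated with the timer comparison 
quoted further down in this thread. The Python sketch below is only an 
approximation under stated assumptions: windows are counted in segments, 
FlightSize is 100 when the timer fires, and the function names are made up 
for this illustration.

```python
def allowed_after_first_ack_rto(flight_size: int) -> int:
    """RFC 5681 / RFC 6298 timeout path, in segments.

    flight_size is unused here: it only sets ssthresh, not the
    number of segments allowed right after the first ACK.
    """
    cwnd = 1   # RTO expires: cwnd = 1 MSS, one segment is retransmitted
    cwnd += 1  # ACK of that retransmission arrives: slow start adds 1 MSS
    return cwnd


def allowed_after_first_ack_tlp(flight_size: int) -> int:
    """TLP path with plain RFC 6675 fast recovery (no PRR), in segments."""
    # ACK of the probe triggers fast recovery: cwnd = ssthresh = FlightSize / 2
    return flight_size // 2


flight_size = 100
rto_burst = allowed_after_first_ack_rto(flight_size)
tlp_burst = allowed_after_first_ack_tlp(flight_size)
print(rto_burst, tlp_burst)
```

Both paths start from the same event (an entire flight lost, detected by a 
timer), yet the amount of data released after the first ACK differs by a 
factor of 25 in this example.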

Best regards,

/Markku


> On Thu, Dec 17, 2020 at 10:52 AM Praveen Balasubramanian <pravb@microsoft.com> wrote:
>
>       I agree that we should have a note in this RFC about congestion control action upon
>       detecting lost retransmission(s).
>
>        
>
>       From: tcpm <tcpm-bounces@ietf.org> On Behalf Of Martin Duke
>       Sent: Thursday, December 17, 2020 7:30 AM
>       To: Markku Kojo <kojo@cs.helsinki.fi>
>       Cc: tcpm@ietf.org Extensions <tcpm@ietf.org>; draft-ietf-tcpm-rack@ietf.org; Michael
>       Tuexen <tuexen@fh-muenster.de>; draft-ietf-tcpm-rack.all@ietf.org; Last Call
>       <last-call@ietf.org>; tcpm-chairs <tcpm-chairs@ietf.org>
>       Subject: [EXTERNAL] Re: [tcpm] Last Call:
>       <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed
>       Standard
> 
>  
> 
> Hi Markku,
> 
>  
> 
> Thanks, now I understand your objections.
> 
>  
> 
> Martin
> 
>  
> 
> On Thu, Dec 17, 2020 at 12:46 AM Markku Kojo <kojo@cs.helsinki.fi> wrote:
>
>       Hi,
>
>       On Wed, 16 Dec 2020, Martin Duke wrote:
>
>       > I spent a little longer looking at the specs more carefully, and I explained
>       (1)
>       > incorrectly in my last two messages. P21..29 are not Limited Transmit
>       packets. 
>
>       Correct. Just the normal rule that allows sending new data during fast
>       recovery.
>
>       > However, unless I'm missing something else, 6675 is clear that the recovery
>       period
>       > does not end until the cumulative ack advances, meaning that detecting the
>       lost
>       > retransmission of P1 does not trigger another MD directly.
>
>       As I have said earlier, RFC 6675 does not repeat all congestion control
>       principles from RFC 5681. It definitely honors the CC principle that
>       requires to treat a loss of a retransmission as a new congestion
>       indication and another MD. I believe I am obligated to know this as a
>       co-author of RFC 6675. ;)
>
>       RFC 6675 explicitly indicates that it follows RFC 5681 by stating in the
>       abstract:
>
>       " ... conforms to the spirit of the current congestion control
>         specification (RFC 5681 ..."
>
>       And in the intro:
>
>          "The algorithm specified in this document is a straightforward
>           SACK-based loss recovery strategy that follows the  guidelines
>           set in [RFC5681] ..."
>
>       I don't think there is anything unclear in this.
>
>       RFC 6675 and all other standard congestion controls (RFC 5681 and RFC
>       6582) handle a loss of a retransmission by "enforcing" RTO to detect it.
>       And RTO guarantees MD. RACK-TLP changes the loss detection in this case
>       and therefore the standard congestion control algorithms do not have
>       actions to handle it correctly. That is the point.
>
>       BR,
>
>       /Markku
>
>       > Thanks for this exercise! It's refreshed my memory of these details after
>       working
>       > on slightly different QUIC algorithms a long time.
>       >
>       > On Wed, Dec 16, 2020, 18:55 Martin Duke <martin.h.duke@gmail.com> wrote:
>       > (1) Flightsize: in RFC 6675. Section 5, Step 4.2:
>       >
>       >        (4.2) ssthresh = cwnd = (FlightSize / 2)
>       >
>       >              The congestion window (cwnd) and slow start threshold
>       >              (ssthresh) are reduced to half of FlightSize per [RFC5681].
>       >              Additionally, note that [RFC5681] requires that any
>       >              segments sent as part of the Limited Transmit mechanism not
>       >              be counted in FlightSize for the purpose of the above
>       >              equation.
>       >
>       > IIUC the segments P21..P29 in your example were sent because of Limited
>       > Transmit, and so don't count. The flightsize for the purposes of (4.2) is
>       > therefore 20 after both losses, and the cwnd does not go up on the second
>       > loss.
>       >
>       > (2)
>       > " Even a single shot burst every time there is significant loss
>       > event is not acceptable, not to mention continuous aggressiveness, and
>       > this is exactly what RFC 2914 and RFC 5033 explicitly address and warn
>       > about."
>       >
>       > "Significant loss event" is the key phrase here. The intent of TLP/PTO is to
>       > equalize the treatment of a small packet loss whether it happened in the
>       > middle of a burst or the end. Why should an isolated loss be treated
>       > differently based on its position in the burst? This is just a logical
>       > extension of fast retransmit, which also modified the RTO paradigm. The
>       > working group consensus is that this is a feature, not a bug; you're welcome
>       > to feel otherwise but I suspect you're in the rough here.
>       >
>       > Regards
>       > Martin
>       >
>       >
>       > On Wed, Dec 16, 2020 at 4:11 PM Markku Kojo <kojo@cs.helsinki.fi> wrote:
>       >       Hi Martin,
>       >
>       >       See inline.
>       >
>       >       On Wed, 16 Dec 2020, Martin Duke wrote:
>       >
>       >       > Hi Markku,
>       >       >
>       >       > There is a ton here, but I'll try to address the top points.
>       >       Hopefully
>       >       > they obviate the rest.
>       >
>       >       Sorry for being verbose. I tried to be clear but you actually
>       >       removed my
>       >       key issues/questions ;)
>       >
>       >       > 1.
>       >       > [Markku]
>       >       > "Hmm, not sure what you mean by "this is a new loss detection
>       >       after
>       >       > acknowledgment of new data"?
>       >       > But anyway, RFC 5681 gives the general principle to reduce
>       >       cwnd and
>       >       > ssthresh twice if a retransmission is lost but IMHO (and I
>       >       believe many
>       >       > who have designed new loss recovery and CC algorithms or
>       >       implemented
>       >       > them
>       >       > agree) that it is hard to get things right if only congestion
>       >       control
>       >       > principles are available and no algorithm."
>       >       >
>       >       > [Martin]
>       >       > So 6675 Sec 5 is quite explicit that there is only one cwnd
>       >       reduction
>       >       > per fast recovery episode, which ends once new data has been
>       >       > acknowledged.
>       >
>       >       To be more precise: fast recovery ends when the current window
>       >       becomes
>       >       cumulatively acknowledged, that is,
>       >
>       >       (4.1) RecoveryPoint (= HighData at the beginning) becomes
>       >       acknowledged
>       >
>       >       I believe we agree and you meant this although new data below
>       >       RecoveryPoint may become cumulatively acknowledged already
>       >       earlier
>       >       during the fast recovery. Reno loss recovery in RFC 5681 ends,
>       >       when
>       >       (any) new data has been acknowledged.
>       >
>       >       > By definition, if a retransmission is lost it is because
>       >       > newer data has been acknowledged, so it's a new recovery
>       >       episode.
>       >
>       >       Not sure where you have this definition? Newer than what are you
>       >       referring to?
>       >
>       >       But, yes, if a retransmission is lost with RFC 6675 algorithm,
>       >       it requires RTO to be detected and definitely starts a new
>       >       recovery
>       >       episode. That is, a new recovery episode is enforced by step
>       >       (1.a) of
>       >       NextSeg () which prevents retransmission of a segment that has
>       >       already
>       >       been retransmitted. If RACK-TLP is used for detecting loss with
>       >       RFC 6675
>       >       things get different in many ways, because it may detect loss of
>       >       a
>       >       retransmission. It would pretty much require an entire redesign
>       >       of the algorithm. For example, calculation of pipe does not
>       >       consider
>       >       segments that have been retransmitted more than once.
>       >
>       >       > Meanwhile, during the Fast Recovery period the incoming acks
>       >       implicitly
>       >       > remove data from the network and therefore keep flightsize
>       >       low.
>       >
>       >       Incorrect. FlightSize != pipe. Only cumulative acks remove data
>       >       from
>       >       FlightSize and new data transmitted during fast recovery inflate
>       >       FlightSize. How FlightSize evolves depends on loss pattern as I
>       >       said.
>       >       It is also possible that FlightSize is low, it may err in both
>       >       directions. A simple example can be used as a proof for the case
>       >       where
>       >       cwnd increases if a loss of retransmission is detected and
>       >       repaired:
>       >
>       >       RFC 6675 recovery with RACK-TLP loss detection:
>       >       (contains some inaccuracies because it has not been defined how
>       >       lost rexmits are calculated into pipe)
>       >
>       >       cwnd=20; packets P1,...,P20 in flight = current window of data
>       >       [P1 dropped and rexmit of P1 will also be dropped]
>       >
>       >       DupAck w/SACK for P2 arrives
>       >       [loss of P1 detected after one RTT from original xmit of P1]
>       >       [cwnd=ssthresh=10]
>       >       P1 is rexmitted (and it logically starts next window of data)
>       >
>       >       DupAcks w/ SACK for original P3..11 arrive
>       >       DupAck w/ SACK for original P12 arrives
>       >       [cwnd-pipe = 10-9 >=1]
>       >       send P21
>       >       DupAck w/SACK for P13 arrives
>       >       send P22
>       >       ...
>       >       DupAck w/SACK for P20 arrives
>       >       send P29
>       >       [FlightSize=29]
>       >
>       >       (Ack for rexmit of P1 would arrive here unless it got dropped)
>       >
>       >       DupAck w/SACK for P21 arrives
>       >       [loss of rexmit P1 detected after one RTT from rexmit of P1]
>       >
>       >       SET cwnd = ssthresh = FlightSize/2 = 29/2 = 14.5
>       >
>       >       CWND INCREASES when it should be at most 5 after halving it
>       >       twice!!!
>       >
>       >       > We can continue to go around on our interpretation of these
>       >       documents,
>       >       > but fundamentally if there is ambiguity in 5681/6675 we should
>       >       bis
>       >       > those RFCs rather than expand the scope of RACK.
>       >
>       >       As I said earlier, I am not opposing bis, though 5681bis would
>       >       not
>       >       be needed, I think.
>       >
>       >       But let me repeat: if we publish RACK-TLP now without necessary
>       >       warnings or without a correct congestion control algorithm,
>       >       someone will try to implement RACK-TLP with RFC 6675 and it
>       >       will be a total mess. The behavior will be unpredictable and
>       >       quite likely unsafe congestion control behavior.
>       >
>       >       > 2.
>       >       > [Markku]
>       >       > " In short:
>       >       > When with a non-RACK-TLP implementation timer (RTO) expires:
>       >       cwnd=1
>       >       > MSS,
>       >       > and slow start is entered.
>       >       > When with a RACK_TLP implementation timer (PTO) expires,
>       >       > normal fast recovery is entered (unless implementing
>       >       > also PRR). So no RTO recovery as explicitly stated in Sec.
>       >       7.4.1."
>       >       >
>       >       > [Martin]
>       >       > There may be a misunderstanding here. PTO is not the same as
>       >       RTO, and
>       >       > both mechanisms exist! The loss response to a PTO is to send a
>       >       probe;
>       >       > the RTO response is as with conventional TCP. In Section 7.3:
>       >
>       >       No, I don't think I misunderstood. If you call timeout with
>       >       another name, it is still timeout. And congestion control does
>       >       not
>       >       consider which segments to send (SND.UNA vs. probe w/ higher
>       >       sequence
>       >       number), only how much is sent.
>       >
>       >       You ignored my major point where I decoupled congestion control
>       >       from loss
>       >       detection and loss recovery and compared RFC 5681 behavior to
>       >       RACK-TLP
>       >       behavior in exactly the same scenario where an entire flight is
>       >       lost and
>       >       timer expires.
>       >
>       >       Please comment why congestion control behavior is allowed to be
>       >       radically
>       >       different in these two implementations?
>       >
>       >       RFC 5681 & RFC 6298 timeout:
>       >
>       >               RTO=SRTT+4*RTTVAR (RTO used for arming the timer)
>       >              1. RTO timer expires
>       >              2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one segment
>       >              3. Ack of rexmit sent in step 2 arrives
>       >              4. cwnd = cwnd+1 MSS; send two segments
>       >              ...
>       >
>       >       RACK-TLP timeout:
>       >
>       >               PTO=min(2*SRTT,RTO) (PTO used for arming the timer)
>       >              1. PTO timer expires
>       >              2. (cwnd=1 MSS); (re)xmit one segment
>       >              3. Ack of (re)xmit sent in step 2 arrives
>       >              4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments
>       >
>       >       If FlightSize is 100 segments when timer expires, congestion
>       >       control is
>       >       the same in steps 1-3, but in step 4 the standard congestion
>       >       control
>       >       allows transmitting 2 segments, while RACK-TLP would allow
>       >       blasting 50 segments.
>       >
>       >       > After attempting to send a loss probe, regardless of whether a
>       >       loss
>       >       >    probe was sent, the sender MUST re-arm the RTO timer, not
>       >       the PTO
>       >       >    timer, if FlightSize is not zero.  This ensures RTO
>       >       recovery remains
>       >       >    the last resort if TLP fails.
>       >       > "
>       >
>       >       This does not prevent the above RACK-TLP behavior from getting
>       >       realized.
>       >
>       >       > So a pure RTO response exists in the case of persistent
>       >       congestion that
>       >       > causes losses of probes or their ACKs.
>       >
>       >       Yes, RTO response exists BUT only after RACK-TLP at least once
>       >       blasts the
>       >       network. It may well be that with smaller windows RACK-TLP is
>       >       successful
>       >       during its TLP initiated overly aggressive "fast recovery" and
>       >       never
>       >       enters RTO recovery because it may detect and repair also loss
>       >       of
>       >       rexmits. That is, it continues at too high rate even if lost
>       >       rexmits
>       >       indicate that congestion persists in successive windows of data.
>       >       And
>       >       worse, it is successful because it pushes away other compatible
>       >       TCP
>       >       flows by being too aggressive and unfair.
>       >
>       >       Even a single shot burst every time there is significant loss
>       >       event is not acceptable, not to mention continuous
>       >       aggressiveness, and
>       >       this is exactly what RFC 2914 and RFC 5033 explicitly address
>       >       and warn
>       >       about.
>       >
>       >       Are we ignoring these BCPs that have IETF consensus?
>       >
>       >       And the other important question I'd like to have an answer:
>       >
>       >       What is the justification to modify standard TCP congestion
>       >       control to
>       >       use fast recovery instead of slow start for a case where timeout
>       >       is
>       >       needed to detect the packet losses because there is no feedback
>       >       and ack
>       >       clock is lost? RACK-TLP explicitly instructs to do so in Sec.
>       >       7.4.1.
>       >
>       >       As I noted: based on what is written in the draft it does not
>       >       intend to
>       >       change congestion control but effectively it does.
>       >
>       >       /Markku
>       >
>       >       > Martin
>       >       >
>       >       >
>       >       > On Wed, Dec 16, 2020 at 11:39 AM Markku Kojo
>       >       <kojo@cs.helsinki.fi>
>       >       > wrote:
>       >       >       Hi Martin,
>       >       >
>       >       >       On Tue, 15 Dec 2020, Martin Duke wrote:
>       >       >
>       >       >       > Hi Markku,
>       >       >       >
>       >       >       > Thanks for the comments. The authors will incorporate
>       >       >       many of your
>       >       >       > suggestions after the IESG review.
>       >       >       >
>       >       >       > There's one thing I don't understand in your comments:
>       >       >       >
>       >       >       > " That is,
>       >       >       > where can an implementer find advice for correct
>       >       >       congestion control
>       >       >       > actions with RACK-TLP, when:
>       >       >       >
>       >       >       > (1) a loss of rexmitted segment is detected
>       >       >       > (2) an entire flight of data gets dropped (and
>       >       detected),
>       >       >       >      that is, when there is no feedback available and
>       >       a
>       >       >       timeout
>       >       >       >      is needed to detect the loss "
>       >       >       >
>       >       >       > Section 9.3 is the discussion about CC, and is clear
>       >       that
>       >       >       the
>       >       >       > implementer should use either 5681 or 6937.
>       >       >
>       >       >       Just a cite nit: RFC 5681 provides basic CC concepts and
>       >       >       some useful CC
>       >       >       guidelines but given that RACK-TLP MUST implement SACK
>       >       the
>       >       >       algorithm in
>       >       >       RFC 5681 is not that useful and an implementer quite
>       >       likely
>       >       >       follows
>       >       >       mainly the algorithm in RFC 6675 (and not RFC 6937 at
>       >       all
>       >       >       if not
>       >       >       implementing PRR).
>       >       >       And RFC 6675 is not mentioned in Sec 9.3, though it is
>       >       >       listed in the
>       >       >       Sec. 4 (Requirements).
>       >       >
>       >       >       > You went through the 6937 case in detail.
>       >       >
>       >       >       Yes, but without correct CC actions.
>       >       >
>       >       >       > If 5681, it's pretty clear to me that in (1) this is a
>       >       >       new loss
>       >       >       > detection after acknowledgment of new data, and
>       >       therefore
>       >       >       requires a
>       >       >       > second halving of cwnd.
>       >       >
>       >       >       Hmm, not sure what you mean by "this is a new loss
>       >       >       detection after
>       >       >       acknowledgment of new data"?
>       >       >       But anyway, RFC 5681 gives the general principle to
>       >       reduce
>       >       >       cwnd and
>       >       >       ssthresh twice if a retransmission is lost but IMHO (and
>       >       I
>       >       >       believe many
>       >       >       who have designed new loss recovery and CC algorithms or
>       >       >       implemented them
>       >       >       agree) that it is hard to get things right if only
>       >       >       congestion control
>       >       >       principles are available and no algorithm.
>       >       >       That's why ALL mechanisms that we have include a quite
>       >       >       detailed algorithm
>       >       >       with all necessary variables and actions for loss
>       >       recovery
>       >       >       and/or CC
>       >       >       purposes (and often also pseudocode). Like this document
>       >       >       does for loss
>       >       >       detection.
>       >       >
>       >       >       So the problem is that we do not have a detailed enough
>       >       >       algorithm or
>       >       >       rule that tells exactly what to do when a loss of rexmit
>       >       is
>       >       >       detected.
>       >       >       Even worse, the algorithms in RFC 5681 and RFC 6675
>       >       refer
>       >       >       to
>       >       >       equation (4) of RFC 5681 to reduce ssthresh and cwnd
>       >       when a
>       >       >       loss
>       >       >       requiring a congestion control action is detected:
>       >       >
>       >       >         (cwnd =) ssthresh = FlightSize / 2
>       >       >
>       >       >       And RFC 5681 gives a warning not to halve cwnd in the
>       >       >       equation but
>       >       >       FlightSize.
>       >       >
>       >       >       That is, this equation is what an implementer
>       >       intuitively
>       >       >       would use
>       >       >       when reading the relevant RFCs but it gives a wrong
>       >       result
>       >       >       for
>       >       >       outstanding data when in fast recovery (when the sender
>       >       is
>       >       >       in
>       >       >       congestion avoidance and the equation (4) is used to
>       >       halve
>       >       >       cwnd, it
>       >       >       gives a correct result).
>       >       >       More precisely, during fast recovery FlightSize is inflated
>       >       >       when new data is sent and reduced when segments are
>       >       >       cumulatively Acked.
>       >       >       What the outcome is depends on the loss pattern. In the
>       >       >       worst case, FlightSize is significantly larger than at the
>       >       >       beginning of the fast recovery, when FlightSize was
>       >       >       (correctly) used to determine the halved value for cwnd and
>       >       >       ssthresh, i.e., equation (4) may result in *increasing* cwnd
>       >       >       upon detecting a loss of a rexmitted segment, instead of
>       >       >       further halving it.
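
To make the arithmetic above concrete, here is a minimal sketch in Python. The FlightSize values are hypothetical, chosen only for illustration, and the "further halving" variant is an assumption about what a correct second reduction could look like, not something any of the cited RFCs specifies.

```python
def eq4_ssthresh(flight_size, smss=1):
    """RFC 5681 equation (4): ssthresh = max(FlightSize / 2, 2*SMSS).

    All quantities are in segments here (SMSS = 1) to keep numbers small.
    """
    return max(flight_size // 2, 2 * smss)


# Hypothetical FlightSize when fast recovery is entered.
flight_at_entry = 100
ssthresh_first = eq4_ssthresh(flight_at_entry)

# Hypothetical FlightSize later in the same fast recovery, inflated by
# new data sent while few segments were cumulatively acked.
flight_during_recovery = 140
ssthresh_naive = eq4_ssthresh(flight_during_recovery)

# Assumed alternative: halve the value chosen at entry instead.
ssthresh_halved = max(ssthresh_first // 2, 2)

print(ssthresh_first, ssthresh_naive, ssthresh_halved)
```

With these assumed numbers, reapplying equation (4) on the loss of a rexmit yields a larger value than the first reduction did, whereas halving the earlier value yields a smaller one.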
>       >       >
>       >       >       A clever implementer might have no problem getting it right
>       >       >       with some thinking, but I am afraid that there will be
>       >       >       incorrect implementations with what is currently specified.
>       >       >       Not all implementers have spent a significant fraction of
>       >       >       their career solving TCP peculiarities.
>       >       >
>       >       >       > For (2), the RTO timer is still operative so
>       >       >       > the RTO recovery rules would still follow.
>       >       >
>       >       >       In short:
>       >       >       With a non-RACK-TLP implementation, when the timer (RTO)
>       >       >       expires: cwnd=1 MSS, and slow start is entered.
>       >       >       With a RACK-TLP implementation, when the timer (PTO) expires,
>       >       >       normal fast recovery is entered (unless also implementing
>       >       >       PRR). So there is no RTO recovery, as explicitly stated in
>       >       >       Sec. 7.4.1.
>       >       >
>       >       >       This means that this document explicitly modifies
>       >       standard
>       >       >       TCP congestion
>       >       >       control when there are no acks coming and the
>       >       >       retransmission timer
>       >       >       expires
>       >       >
>       >       >       from: RTO=SRTT+4*RTTVAR (RTO used for arming the timer)
>       >       >              1. RTO timer expires
>       >       >              2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one segment
>       >       >              3. Ack of rexmit sent in step 2 arrives
>       >       >              4. cwnd = cwnd+1 MSS; send two segments
>       >       >              ...
>       >       >
>       >       >       to:   PTO=min(2*SRTT,RTO) (PTO used for arming the timer)
>       >       >              1. PTO timer expires
>       >       >              2. (cwnd=1 MSS); (re)xmit one segment
>       >       >              3. Ack of (re)xmit sent in step 2 arrives
>       >       >              4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments
>       >       >
>       >       >       For example, if FlightSize is 100 segments when timer
>       >       >       expires,
>       >       >       congestion control is the same in steps 1-3, but in step
>       >       4
>       >       >       the
>       >       >       current standard congestion control allows transmitting
>       >       2
>       >       >       segments,
>       >       >       while RACK-TLP would allow blasting 50 segments.
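
The step-4 difference in the example above can be sketched as follows. This is a simplified model of the two step sequences as listed in the message, counted in segments; the function names are hypothetical and no real TCP stack is being described.

```python
def rto_cwnd_after_step4(flight_size):
    """Timer expiry handled as RTO recovery (slow start)."""
    ssthresh = max(flight_size // 2, 2)  # step 2: ssthresh = FlightSize/2
    cwnd = 1                             # step 2: cwnd = 1 MSS
    cwnd += 1                            # step 4: ack of rexmit, cwnd + 1 MSS
    return min(cwnd, ssthresh) if cwnd > ssthresh else cwnd


def pto_cwnd_after_step4(flight_size):
    """Timer expiry handled as PTO followed by normal fast recovery."""
    ssthresh = max(flight_size // 2, 2)  # step 4: ssthresh = FlightSize/2
    cwnd = ssthresh                      # step 4: cwnd = ssthresh
    return cwnd


flight_size = 100  # segments outstanding when the timer expires
print(rto_cwnd_after_step4(flight_size))  # segments allowed after RTO
print(pto_cwnd_after_step4(flight_size))  # segments allowed after PTO
```

For a FlightSize of 100 segments this reproduces the 2 versus 50 segments contrasted in the text.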
>       >       >
>       >       >       Question is: what is the justification to modify
>       >       standard
>       >       >       TCP
>       >       >       congestion control to use fast recovery instead of slow
>       >       >       start for a
>       >       >       case where timeout is needed to detect loss because
>       >       there
>       >       >       is no
>       >       >       feedback and ack clock is lost? The draft does not give
>       >       any
>       >       >       justification. This clearly is in conflict with items
>       >       (0)
>       >       >       and (1)
>       >       >       in BCP 133 (RFC 5033).
>       >       >
>       >       >       Furthermore, there is no implementation nor experimental
>       >       >       experience
>       >       >       evaluating this change. The implementation with
>       >       >       experimental experience
>       >       >       uses PRR (RFC 6937) which is an Experimental
>       >       specification
>       >       >       including a
>       >       >       novel "trick" that directs PRR fast recovery to
>       >       effectively
>       >       >       use slow
>       >       >       start in this case at hand.
>       >       >
>       >       >
>       >       >       > In other words, I am not seeing a case that requires
>       >       new
>       >       >       congestion
>       >       >       > control concepts except as discussed in 9.3.
>       >       >
>       >       >       See above. The change in standard congestion control for
>       >       >       (2).
>       >       >       The draft intends not to change congestion control but
>       >       >       effectively it
>       >       >       does without any operational evidence.
>       >       >
>       >       >       What is also missing and would be very useful:
>       >       >
>       >       >       - For (1), a hint for an implementer saying that because
>       >       >       RACK-TLP is
>       >       >          able to detect a loss of a rexmit unlike any other
>       >       loss
>       >       >       detection
>       >       >          algorithm, the sender MUST react twice to congestion
>       >       >       (and cite
>       >       >          RFC 5681). And cite a document where necessary
>       >       correct
>       >       >       actions
>       >       >          are described.
>       >       >
>       >       >       - For (1), advise that an implementer needs to keep
>       >       track
>       >       >       when it
>       >       >          detects a loss of a retransmitted segment. Current
>       >       >       algorithms
>       >       >          in the draft detect a loss of retransmitted segment
>       >       >       exactly in
>       >       >          the same way as loss of any other segment. There
>       >       seems
>       >       >       to be
>       >       >          nothing to track when a retransmission of a
>       >       >       retransmitted segment
>       >       >          takes place. Therefore, the algorithms should have
>       >       >       additional
>       >       >          actions to correctly track when such a loss is
>       >       detected.
>       >       >
>       >       >       - For (1), discussion on how many times a loss of a
>       >       >       retransmission
>       >       >          of the same segment may occur and be detected. Seems
>       >       >       that it
>       >       >          may be possible to drop a rexmitted segment more than
>       >       >       once and
>       >       >          detect it also several times?  What are the
>       >       >       implications?
>       >       >
>       >       >       - If previous is possible, then the algorithm possibly
>       >       also
>       >       >          may detect a loss of a new segment that was sent
>       >       during
>       >       >       fast
>       >       >          recovery? This is also loss in two successive windows
>       >       of
>       >       >       data,
>       >       >          and cwnd MUST be lowered twice. This discussion and
>       >       >       necessary
>       >       >          actions to track it are missing, if such scenario is
>       >       >       possible.
>       >       >
>       >       >       > What am I missing?
>       >       >
>       >       >       Hope the above helps.
>       >       >
>       >       >       /Markku
>       >       >
>       >       >
>       >       > <snipping the rest>
>       >       >
>       >       >
>       >
>       >
>       >
> 
> 
>