Re: [tcpm] [EXTERNAL] Re: Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard

Martin Duke <martin.h.duke@gmail.com> Thu, 17 December 2020 21:59 UTC

From: Martin Duke <martin.h.duke@gmail.com>
Date: Thu, 17 Dec 2020 13:58:59 -0800
Message-ID: <CAM4esxRNe8RvzxH2ssywYF5=tvKJmtmVEKedZf8cQA7KaC6=CQ@mail.gmail.com>
To: Yuchung Cheng <ycheng@google.com>
Cc: Praveen Balasubramanian <pravb@microsoft.com>, "kojo@cs.helsinki.fi" <kojo@cs.helsinki.fi>, "tcpm@ietf.org" <tcpm@ietf.org>, "draft-ietf-tcpm-rack@ietf.org" <draft-ietf-tcpm-rack@ietf.org>, "tuexen@fh-muenster.de" <tuexen@fh-muenster.de>, "draft-ietf-tcpm-rack.all@ietf.org" <draft-ietf-tcpm-rack.all@ietf.org>, "last-call@ietf.org" <last-call@ietf.org>, "tcpm-chairs@ietf.org" <tcpm-chairs@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/bx-vQZcJ-fgVAnmN44Q89Y2KWpE>
Subject: Re: [tcpm] [EXTERNAL] Re: Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard

This is good, thanks.

I might also add the sentence "In the absence of PRR, when TCP RACK detects
a lost retransmission it MUST trigger a second congestion response" or
something to that effect. What do you think?

On Thu, Dec 17, 2020 at 11:36 AM Yuchung Cheng <ycheng@google.com> wrote:

> How about
>
> "9.3.  Interaction with congestion control
>
> RACK-TLP intentionally decouples loss detection ...
> As mentioned in the Figure 1 caption, RFC5681 mandates a principle that
> Loss in two successive windows of data, or the loss of a
> retransmission, should be taken as two indications of congestion, and
> therefore reacted to separately. However, implementations of the RFC6675
> pipe algorithm may not directly account for these newly detected
> congestion events properly. PRR [RFCxxxx] is RECOMMENDED for the specific
> congestion control actions taken upon the losses detected by RACK-TLP"
>
>
> To Markku's request for "what's the justification to enter fast recovery":
> Consider this example w/o RACK-TLP
>
> T0: Send 100 segments but application-limited. All are lost.
> T-2RTT: app writes so another 3 segments are sent. Made to the destination
> and triggered 3 DUPACKs
> T-3RTT: 3 DUPACK arrives. start fast recovery and subsequent cc reactions
> to burst ~50 packets with Reno
>
> In this case any ACK that occurred before RTO is (generally) considered
> clock-acked, which is how I understand Van's initial design.  This behavior
> existed decades before RACK-TLP. RACK-TLP essentially changes the
> "3-segments by app" to "1-segment by tcp".
>
> On Thu, Dec 17, 2020 at 10:52 AM Praveen Balasubramanian <
> pravb@microsoft.com> wrote:
>
>> I agree that we should have a note in this RFC about congestion control
>> action upon detecting lost retransmission(s).
>>
>>
>>
>> *From:* tcpm <tcpm-bounces@ietf.org> *On Behalf Of * Martin Duke
>> *Sent:* Thursday, December 17, 2020 7:30 AM
>> *To:* Markku Kojo <kojo@cs.helsinki.fi>
>> *Cc:* tcpm@ietf.org Extensions <tcpm@ietf.org>;
>> draft-ietf-tcpm-rack@ietf.org; Michael Tuexen <tuexen@fh-muenster.de>;
>> draft-ietf-tcpm-rack.all@ietf.org; Last Call <last-call@ietf.org>;
>> tcpm-chairs <tcpm-chairs@ietf.org>
>> *Subject:* [EXTERNAL] Re: [tcpm] Last Call:
>> <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for
>> TCP) to Proposed Standard
>>
>>
>>
>> Hi Markku,
>>
>>
>>
>> Thanks, now I understand your objections.
>>
>>
>>
>> Martin
>>
>>
>>
>> On Thu, Dec 17, 2020 at 12:46 AM Markku Kojo <kojo@cs.helsinki.fi> wrote:
>>
>> Hi,
>>
>> On Wed, 16 Dec 2020, Martin Duke wrote:
>>
>> > I spent a little longer looking at the specs more carefully, and I
>> explained (1)
>> > incorrectly in my last two messages. P21..29 are not Limited Transmit
>> packets.
>>
>> Correct. Just the normal rule that allows sending new data during fast
>> recovery.
>>
>> > However, unless I'm missing something else, 6675 is clear that the
>> recovery period
>> > does not end until the cumulative ack advances, meaning that detecting
>> the lost
>> > retransmission of P1 does not trigger another MD directly.
>>
>> As I have said earlier, RFC 6675 does not repeat all congestion control
>> principles from RFC 5681. It definitely honors the CC principle that
>> requires to treat a loss of a retransmission as a new congestion
>> indication and another MD. I believe I am obligated to know this as a
>> co-author of RFC 6675. ;)
>>
>> RFC 6675 explicitly indicates that it follows RFC 5681 by stating in the
>> abstract:
>>
>> " ... conforms to the spirit of the current congestion control
>>   specification (RFC 5681 ..."
>>
>> And in the intro:
>>
>>    "The algorithm specified in this document is a straightforward
>>     SACK-based loss recovery strategy that follows the  guidelines
>>     set in [RFC5681] ..."
>>
>> I don't think there is anything unclear in this.
>>
>> RFC 6675 and all other standard congestion controls (RFC 5681 and RFC
>> 6582) handle a loss of a retransmission by "enforcing" RTO to detect it.
>> And RTO guarantees MD. RACK-TLP changes the loss detection in this case
>> and therefore the standard congestion control algorithms do not have
>> actions to handle it correctly. That is the point.
>>
>> BR,
>>
>> /Markku
>>
>> > Thanks for this exercise! It's refreshed my memory of these details after
>> > working on slightly different QUIC algorithms for a long time.
>> >
>> > On Wed, Dec 16, 2020, 18:55 Martin Duke <martin.h.duke@gmail.com>
>> wrote:
>> > (1) FlightSize: in RFC 6675, Section 5, Step 4.2:
>> >
>> >        (4.2) ssthresh = cwnd = (FlightSize / 2)
>> >
>> >              The congestion window (cwnd) and slow start threshold
>> >              (ssthresh) are reduced to half of FlightSize per [RFC5681].
>> >              Additionally, note that [RFC5681] requires that any
>> >              segments sent as part of the Limited Transmit mechanism not
>> >              be counted in FlightSize for the purpose of the above
>> >              equation.
>> >
>> > IIUC the segments P21..P29 in your example were sent because of Limited
>> > Transmit, and so don't count. The flightsize for the purposes of (4.2)
>> is
>> > therefore 20 after both losses, and the cwnd does not go up on the
>> second
>> > loss.
>> >
>> > (2)
>> > " Even a single shot burst every time there is significant loss
>> > event is not acceptable, not to mention continuous aggressiveness, and
>> > this is exactly what RFC 2914 and RFC 5033 explicitly address and warn
>> > about."
>> >
>> > "Significant loss event" is the key phrase here. The intent of TLP/PTO
>> is to
>> > equalize the treatment of a small packet loss whether it happened in the
>> > middle of a burst or the end. Why should an isolated loss be treated
>> > differently based on its position in the burst? This is just a logical
>> > extension of fast retransmit, which also modified the RTO paradigm. The
>> > working group consensus is that this is a feature, not a bug; you're
>> welcome
>> > to feel otherwise but I suspect you're in the rough here.
>> >
>> > Regards
>> > Martin
>> >
>> >
>> > On Wed, Dec 16, 2020 at 4:11 PM Markku Kojo <kojo@cs.helsinki.fi>
>> wrote:
>> >       Hi Martin,
>> >
>> >       See inline.
>> >
>> >       On Wed, 16 Dec 2020, Martin Duke wrote:
>> >
>> >       > Hi Markku,
>> >       >
>> >       > There is a ton here, but I'll try to address the top points.
>> >       Hopefully
>> >       > they obviate the rest.
>> >
>> >       Sorry for being verbose. I tried to be clear but you actually
>> >       removed my
>> >       key issues/questions ;)
>> >
>> >       > 1.
>> >       > [Markku]
>> >       > "Hmm, not sure what you mean by "this is a new loss detection
>> >       after
>> >       > acknowledgment of new data"?
>> >       > But anyway, RFC 5681 gives the general principle to reduce
>> >       cwnd and
>> >       > ssthresh twice if a retransmission is lost but IMHO (and I
>> >       believe many
>> >       > who have designed new loss recovery and CC algorithms or
>> >       implemented
>> >       > them
>> >       > agree) that it is hard to get things right if only congestion
>> >       control
>> >       > principles are available and no algorithm."
>> >       >
>> >       > [Martin]
>> >       > So 6675 Sec 5 is quite explicit that there is only one cwnd
>> >       reduction
>> >       > per fast recovery episode, which ends once new data has been
>> >       > acknowledged.
>> >
>> >       To be more precise: fast recovery ends when the current window
>> >       becomes
>> >       cumulatively acknowledged, that is,
>> >
>> >       (4.1) RecoveryPoint (= HighData at the beginning) becomes
>> >       acknowledged
>> >
>> >       I believe we agree and you meant this although new data below
>> >       RecoveryPoint may become cumulatively acknowledged already
>> >       earlier
>> >       during the fast recovery. Reno loss recovery in RFC 5681 ends,
>> >       when
>> >       (any) new data has been acknowledged.
>> >
>> >       > By definition, if a retransmission is lost it is because
>> >       > newer data has been acknowledged, so it's a new recovery
>> >       episode.
>> >
>> >       Not sure where you have this definition? Newer than what are you
>> >       referring to?
>> >
>> >       But, yes, if a retransmission is lost with the RFC 6675 algorithm,
>> >       it requires RTO to be detected and definitely starts a new
>> >       recovery episode. That is, a new recovery episode is enforced by
>> >       step (1.a) of NextSeg () which prevents retransmission of a
>> >       segment that has already been retransmitted. If RACK-TLP is used
>> >       for detecting loss with RFC 6675 things get different in many
>> >       ways, because it may detect loss of a retransmission. It would
>> >       pretty much require an entire redesign of the algorithm. For
>> >       example, calculation of pipe does not consider segments that
>> >       have been retransmitted more than once.
>> >
>> >       > Meanwhile, during the Fast Recovery period the incoming acks
>> >       implicitly
>> >       > remove data from the network and therefore keep flightsize
>> >       low.
>> >
>> >       Incorrect. FlightSize != pipe. Only cumulative acks remove data
>> >       from
>> >       FlightSize and new data transmitted during fast recovery inflate
>> >       FlightSize. How FlightSize evolves depends on loss pattern as I
>> >       said.
>> >       It is also possible that FlightSize is low, it may err in both
>> >       directions. A simple example can be used as a proof for the case
>> >       where
>> >       cwnd increases if a loss of retransmission is detected and
>> >       repaired:
>> >
>> >       RFC 6675 recovery with RACK-TLP loss detection:
>> >       (contains some inaccuracies because it has not been defined how
>> >       lost rexmits are calculated into pipe)
>> >
>> >       cwnd=20; packets P1,...,P20 in flight = current window of data
>> >       [P1 dropped and rexmit of P1 will also be dropped]
>> >
>> >       DupAck w/SACK for P2 arrives
>> >       [loss of P1 detected after one RTT from original xmit of P1]
>> >       [cwnd=ssthresh=10]
>> >       P1 is rexmitted (and it logically starts next window of data)
>> >
>> >       DupAcks w/ SACK for original P3..11 arrive
>> >       DupAck w/ SACK for original P12 arrives
>> >       [cwnd-pipe = 10-9 >=1]
>> >       send P21
>> >       DupAck w/SACK for P13 arrives
>> >       send P22
>> >       ...
>> >       DupAck w/SACK for P20 arrives
>> >       send P29
>> >       [FlightSize=29]
>> >
>> >       (Ack for rexmit of P1 would arrive here unless it got dropped)
>> >
>> >       DupAck w/SACK for P21 arrives
>> >       [loss of rexmit P1 detected after one RTT from rexmit of P1]
>> >
>> >       SET cwnd = ssthresh = FlightSize/2 = 29/2 = 14.5
>> >
>> >       CWND INCREASES when it should be at most 5 after halving it
>> >       twice!!!
>> >
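The arithmetic in the example above can be replayed in a few lines. This is only a sketch of that one scenario, assuming (as the example does) that FlightSize shrinks only on cumulative ACKs and that the simplified form FlightSize/2 of equation (4) is applied at both loss detections. The function and variable names are illustrative and do not come from any RFC.

```python
def halve_flightsize(flight_size):
    """Simplified form of RFC 5681 equation (4) as used in the thread."""
    return flight_size / 2


initial_cwnd = 20                 # P1..P20 in flight, cwnd = 20
flight_size = initial_cwnd

# First loss detection (original P1): cwnd = ssthresh = FlightSize / 2.
cwnd_after_first = halve_flightsize(flight_size)

# SACKs for P12..P20 each release one new segment (P21..P29).
# No cumulative ACK arrives because P1 is still missing, so in this
# model FlightSize only grows during recovery.
new_segments_sent = 9
flight_size += new_segments_sent

# Second loss detection (the retransmission of P1), same equation again.
cwnd_after_second = halve_flightsize(flight_size)

# What two successive halvings of the original window would give.
cwnd_if_halved_twice = initial_cwnd / 2 / 2

print("FlightSize at second detection:", flight_size)
print("cwnd after first reduction:    ", cwnd_after_first)
print("cwnd after second 'reduction': ", cwnd_after_second)
print("cwnd if halved twice:          ", cwnd_if_halved_twice)
```

In this model the second application of the equation yields 14.5, which is above the 10 set by the first reduction, whereas two successive halvings of the original window would give 5.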
>> >       > We can continue to go around on our interpretation of these
>> >       documents,
>> >       > but fundamentally if there is ambiguity in 5681/6675 we should
>> >       bis
>> >       > those RFCs rather than expand the scope of RACK.
>> >
>> >       As I said earlier, I am not opposing bis, though 5681bis would
>> >       not be needed, I think.
>> >
>> >       But let me repeat: if we publish RACK-TLP now without necessary
>> >       warnings or without a correct congestion control algorithm,
>> >       someone will try to implement RACK-TLP with RFC 6675 and it will
>> >       be a total mess. The behavior will be unpredictable and quite
>> >       likely unsafe congestion control behavior.
>> >
>> >       > 2.
>> >       > [Markku]
>> >       > " In short:
>> >       > When with a non-RACK-TLP implementation timer (RTO) expires:
>> >       cwnd=1
>> >       > MSS,
>> >       > and slow start is entered.
>> >       > When with a RACK-TLP implementation timer (PTO) expires,
>> >       > normal fast recovery is entered (unless implementing
>> >       > also PRR). So no RTO recovery as explicitly stated in Sec.
>> >       7.4.1."
>> >       >
>> >       > [Martin]
>> >       > There may be a misunderstanding here. PTO is not the same as
>> >       RTO, and
>> >       > both mechanisms exist! The loss response to a PTO is to send a
>> >       probe;
>> >       > the RTO response is as with conventional TCP. In Section 7.3:
>> >
>> >       No, I don't think I misunderstood. If you call timeout with
>> >       another name, it is still timeout. And congestion control does
>> >       not
>> >       consider which segments to send (SND.UNA vs. probe w/ higher
>> >       sequence
>> >       number), only how much is sent.
>> >
>> >       You ignored my major point where I decoupled congestion control
>> >       from loss
>> >       detection and loss recovery and compared RFC 5681 behavior to
>> >       RACK-TLP
>> >       behavior in exactly the same scenario where an entire flight is
>> >       lost and
>> >       timer expires.
>> >
>> >       Please comment why congestion control behavior is allowed to be
>> >       radically
>> >       different in these two implementations?
>> >
>> >       RFC 5681 & RFC 6298 timeout:
>> >
>> >               RTO=SRTT+4*RTTVAR (RTO used for arming the timer)
>> >              1. RTO timer expires
>> >              2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one segment
>> >              3. Ack of rexmit sent in step 2 arrives
>> >              4. cwnd = cwnd+1 MSS; send two segments
>> >              ...
>> >
>> >       RACK-TLP timeout:
>> >
>> >               PTO=min(2*SRTT,RTO) (PTO used for arming the timer)
>> >              1. PTO timer expires
>> >              2. (cwnd=1 MSS); (re)xmit one segment
>> >              3. Ack of (re)xmit sent in step 2 arrives
>> >              4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments
>> >
>> >       If FlightSize is 100 segments when timer expires, congestion
>> >       control is
>> >       the same in steps 1-3, but in step 4 the standard congestion
>> >       control
>> >       allows transmitting 2 segments, while RACK-TLP would allow
>> >       blasting 50 segments.
>> >
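The two timelines above can be compared side by side in a short sketch. It models only the simplified formulas and the four steps as written in this message (the full RTO and PTO computations in RFC 6298 and the RACK-TLP draft have additional terms that are left out here). All function names are hypothetical, chosen for illustration.

```python
def rto_seconds(srtt, rttvar):
    """Simplified RTO from the message: SRTT + 4*RTTVAR."""
    return srtt + 4 * rttvar


def pto_seconds(srtt, rttvar):
    """Simplified PTO from the message: min(2*SRTT, RTO)."""
    return min(2 * srtt, rto_seconds(srtt, rttvar))


def segments_allowed_in_step_4(flight_size, timer):
    """Segments the sender may transmit when the ACK of the single
    (re)transmission arrives, per the two timelines in the message."""
    if timer == "RTO":
        cwnd = 1          # step 2: cwnd = 1 MSS
        cwnd += 1         # step 4: slow start adds 1 MSS per ACK
        return cwnd
    if timer == "PTO":
        # step 4: fast recovery sets cwnd = ssthresh = FlightSize / 2
        return flight_size // 2
    raise ValueError("timer must be 'RTO' or 'PTO'")


flight_size = 100
print("RTO path, step 4:", segments_allowed_in_step_4(flight_size, "RTO"))
print("PTO path, step 4:", segments_allowed_in_step_4(flight_size, "PTO"))
print("PTO for SRTT=125 ms, RTTVAR=62.5 ms:", pto_seconds(0.125, 0.0625))
```

With FlightSize at 100 segments the RTO path allows 2 segments in step 4 and the PTO path allows 50, which is the gap the message is pointing at.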
>> >       > After attempting to send a loss probe, regardless of whether a
>> >       loss
>> >       >    probe was sent, the sender MUST re-arm the RTO timer, not
>> >       the PTO
>> >       >    timer, if FlightSize is not zero.  This ensures RTO
>> >       recovery remains
>> >       >    the last resort if TLP fails.
>> >       > "
>> >
>> >       This does not prevent the above RACK-TLP behavior from getting
>> >       realized.
>> >
>> >       > So a pure RTO response exists in the case of persistent
>> >       congestion that
>> >       > causes losses of probes or their ACKs.
>> >
>> >       Yes, RTO response exists BUT only after RACK-TLP at least once
>> >       blasts the
>> >       network. It may well be that with smaller windows RACK-TLP is
>> >       successful
>> >       during its TLP initiated overly aggressive "fast recovery" and
>> >       never
>> >       enters RTO recovery because it may detect and repair also loss
>> >       of
>> >       rexmits. That is, it continues at too high rate even if lost
>> >       rexmits
>> >       indicate that congestion persists in successive windows of data.
>> >       And
>> >       worse, it is successful because it pushes away other compatible
>> >       TCP
>> >       flows by being too aggressive and unfair.
>> >
>> >       Even a single shot burst every time there is significant loss
>> >       event is not acceptable, not to mention continuous
>> >       aggressiveness, and
>> >       this is exactly what RFC 2914 and RFC 5033 explicitly address
>> >       and warn
>> >       about.
>> >
>> >       Are we ignoring these BCPs that have IETF consensus?
>> >
>> >       And the other important question I'd like to have an answer:
>> >
>> >       What is the justification to modify standard TCP congestion
>> >       control to
>> >       use fast recovery instead of slow start for a case where timeout
>> >       is
>> >       needed to detect the packet losses because there is no feedback
>> >       and ack
>> >       clock is lost? RACK-TLP explicitly instructs to do so in Sec.
>> >       7.4.1.
>> >
>> >       As I noted: based on what is written in the draft it does not
>> >       intend to
>> >       change congestion control but effectively it does.
>> >
>> >       /Markku
>> >
>> >       > Martin
>> >       >
>> >       >
>> >       > On Wed, Dec 16, 2020 at 11:39 AM Markku Kojo
>> >       <kojo@cs.helsinki.fi>
>> >       > wrote:
>> >       >       Hi Martin,
>> >       >
>> >       >       On Tue, 15 Dec 2020, Martin Duke wrote:
>> >       >
>> >       >       > Hi Markku,
>> >       >       >
>> >       >       > Thanks for the comments. The authors will incorporate
>> >       >       many of your
>> >       >       > suggestions after the IESG review.
>> >       >       >
>> >       >       > There's one thing I don't understand in your comments:
>> >       >       >
>> >       >       > " That is,
>> >       >       > where can an implementer find advice for correct
>> >       >       congestion control
>> >       >       > actions with RACK-TLP, when:
>> >       >       >
>> >       >       > (1) a loss of rexmitted segment is detected
>> >       >       > (2) an entire flight of data gets dropped (and
>> >       detected),
>> >       >       >      that is, when there is no feedback available and
>> >       a
>> >       >       timeout
>> >       >       >      is needed to detect the loss "
>> >       >       >
>> >       >       > Section 9.3 is the discussion about CC, and is clear
>> >       that
>> >       >       the
>> >       >       > implementer should use either 5681 or 6937.
>> >       >
>> >       >       Just a cite nit: RFC 5681 provides basic CC concepts and
>> >       >       some useful CC
>> >       >       guidelines but given that RACK-TLP MUST implement SACK
>> >       the
>> >       >       algorithm in
>> >       >       RFC 5681 is not that useful and an implementer quite
>> >       likely
>> >       >       follows
>> >       >       mainly the algorithm in RFC 6675 (and not RFC 6937 at
>> >       all
>> >       >       if not
>> >       >       implementing PRR).
>> >       >       And RFC 6675 is not mentioned in Sec 9.3, though it is
>> >       >       listed in the
>> >       >       Sec. 4 (Requirements).
>> >       >
>> >       >       > You went through the 6937 case in detail.
>> >       >
>> >       >       Yes, but without correct CC actions.
>> >       >
>> >       >       > If 5681, it's pretty clear to me that in (1) this is a
>> >       >       new loss
>> >       >       > detection after acknowledgment of new data, and
>> >       therefore
>> >       >       requires a
>> >       >       > second halving of cwnd.
>> >       >
>> >       >       Hmm, not sure what you mean by "this is a new loss
>> >       >       detection after
>> >       >       acknowledgment of new data"?
>> >       >       But anyway, RFC 5681 gives the general principle to
>> >       reduce
>> >       >       cwnd and
>> >       >       ssthresh twice if a retransmission is lost but IMHO (and
>> >       I
>> >       >       believe many
>> >       >       who have designed new loss recovery and CC algorithms or
>> >       >       implemented them
>> >       >       agree) that it is hard to get things right if only
>> >       >       congestion control
>> >       >       principles are available and no algorithm.
>> >       >       That's why ALL mechanisms that we have include a quite
>> >       >       detailed algorithm
>> >       >       with all necessary variables and actions for loss
>> >       recovery
>> >       >       and/or CC
>> >       >       purposes (and often also pseudocode). Like this document
>> >       >       does for loss
>> >       >       detection.
>> >       >
>> >       >       So the problem is that we do not have a detailed enough
>> >       >       algorithm or
>> >       >       rule that tells exactly what to do when a loss of rexmit
>> >       is
>> >       >       detected.
>> >       >       Even worse, the algorithms in RFC 5681 and RFC 6675
>> >       refer
>> >       >       to
>> >       >       equation (4) of RFC 5681 to reduce ssthresh and cwnd
>> >       when a
>> >       >       loss
>> >       >       requiring a congestion control action is detected:
>> >       >
>> >       >         (cwnd =) ssthresh = FlightSize / 2
>> >       >
>> >       >       And RFC 5681 gives a warning not to halve cwnd in the
>> >       >       equation but
>> >       >       FlightSize.
>> >       >
>> >       >       That is, this equation is what an implementer
>> >       intuitively
>> >       >       would use
>> >       >       when reading the relevant RFCs but it gives a wrong
>> >       result
>> >       >       for
>> >       >       outstanding data when in fast recovery (when the sender
>> >       is
>> >       >       in
>> >       >       congestion avoidance and the equation (4) is used to
>> >       halve
>> >       >       cwnd, it
>> >       >       gives a correct result).
>> >       >       More precisely, during fast recovery FlightSize is
>> >       inflated
>> >       >       when new
>> >       >       data is sent and reduced when segments are cumulatively
>> >       >       Acked.
>> >       >       What the outcome is depends on the loss pattern. In the
>> >       >       worst case, FlightSize is significantly larger than in
>> >       >       the beginning of the fast recovery when FlightSize was
>> >       >       (correctly) used to determine the halved value for cwnd
>> >       >       and ssthresh, i.e., equation (4) may result in
>> >       >       *increasing* cwnd upon detecting a loss of a rexmitted
>> >       >       segment, instead of further halving it.
>> >       >
>> >       >       A clever implementer might have no problem getting it
>> >       >       right with some thinking but I am afraid that there will
>> >       >       be incorrect implementations with what is currently
>> >       >       specified. Not all implementers have spent a significant
>> >       >       fraction of their career in solving TCP peculiarities.
>> >       >
>> >       >       > For (2), the RTO timer is still operative so
>> >       >       > the RTO recovery rules would still follow.
>> >       >
>> >       >       In short:
>> >       >       When with a non-RACK-TLP implementation timer (RTO)
>> >       >       expires: cwnd=1 MSS,
>> >       >       and slow start is entered.
>> >       >       When with a RACK-TLP implementation timer (PTO) expires,
>> >       >       normal fast recovery is entered (unless implementing
>> >       >       also PRR). So no RTO recovery as explicitly stated in
>> >       Sec.
>> >       >       7.4.1.
>> >       >
>> >       >       This means that this document explicitly modifies
>> >       standard
>> >       >       TCP congestion
>> >       >       control when there are no acks coming and the
>> >       >       retransmission timer
>> >       >       expires
>> >       >
>> >       >       from: RTO=SRTT+4*RTTVAR (RTO used for arming the timer)
>> >       >              1. RTO timer expires
>> >       >              2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one
>> >       >       segment
>> >       >              3. Ack of rexmit sent in step 2 arrives
>> >       >              4. cwnd = cwnd+1 MSS; send two segments
>> >       >              ...
>> >       >
>> >       >       to:   PTO=min(2*SRTT,RTO) (PTO used for arming the timer)
>> >       >              1. PTO timer expires
>> >       >              2. (cwnd=1 MSS); (re)xmit one segment
>> >       >              3. Ack of (re)xmit sent in step 2 arrives
>> >       >              4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments
>> >       >
>> >       >       For example, if FlightSize is 100 segments when timer
>> >       >       expires,
>> >       >       congestion control is the same in steps 1-3, but in step
>> >       4
>> >       >       the
>> >       >       current standard congestion control allows transmitting
>> >       2
>> >       >       segments,
>> >       >       while RACK-TLP would allow blasting 50 segments.
>> >       >
>> >       >       Question is: what is the justification to modify
>> >       standard
>> >       >       TCP
>> >       >       congestion control to use fast recovery instead of slow
>> >       >       start for a
>> >       >       case where timeout is needed to detect loss because
>> >       there
>> >       >       is no
>> >       >       feedback and ack clock is lost? The draft does not give
>> >       any
>> >       >       justification. This clearly is in conflict with items
>> >       (0)
>> >       >       and (1)
>> >       >       in BCP 133 (RFC 5033).
>> >       >
>> >       >       Furthermore, there is no implementation nor experimental
>> >       >       experience
>> >       >       evaluating this change. The implementation with
>> >       >       experimental experience
>> >       >       uses PRR (RFC 6937) which is an Experimental
>> >       specification
>> >       >       including a
>> >       >       novel "trick" that directs PRR fast recovery to
>> >       effectively
>> >       >       use slow
>> >       >       start in this case at hand.
>> >       >
>> >       >
>> >       >       > In other words, I am not seeing a case that requires
>> >       new
>> >       >       congestion
>> >       >       > control concepts except as discussed in 9.3.
>> >       >
>> >       >       See above. The change in standard congestion control for
>> >       >       (2).
>> >       >       The draft intends not to change congestion control but
>> >       >       effectively it
>> >       >       does without any operational evidence.
>> >       >
>> >       >       What is also missing and would be very useful:
>> >       >
>> >       >       - For (1), a hint for an implementer saying that because
>> >       >       RACK-TLP is
>> >       >          able to detect a loss of a rexmit unlike any other
>> >       loss
>> >       >       detection
>> >       >          algorithm, the sender MUST react twice to congestion
>> >       >       (and cite
>> >       >          RFC 5681). And cite a document where necessary
>> >       correct
>> >       >       actions
>> >       >          are described.
>> >       >
>> >       >       - For (1), advise that an implementer needs to keep
>> >       track
>> >       >       when it
>> >       >          detects a loss of a retransmitted segment. Current
>> >       >       algorithms
>> >       >          in the draft detect a loss of retransmitted segment
>> >       >       exactly in
>> >       >          the same way as loss of any other segment. There
>> >       seems
>> >       >       to be
>> >       >          nothing to track when a retransmission of a
>> >       >       retransmitted segment
>> >       >          takes place. Therefore, the algorithms should have
>> >       >       additional
>> >       >          actions to correctly track when such a loss is
>> >       detected.
>> >       >
>> >       >       - For (1), discussion on how many times a loss of a
>> >       >       retransmission
>> >       >          of the same segment may occur and be detected. Seems
>> >       >       that it
>> >       >          may be possible to drop a rexmitted segment more than
>> >       >       once and
>> >       >          detect it also several times?  What are the
>> >       >       implications?
>> >       >
>> >       >       - If previous is possible, then the algorithm possibly
>> >       also
>> >       >          may detect a loss of a new segment that was sent
>> >       during
>> >       >       fast
>> >       >          recovery? This is also loss in two successive windows
>> >       of
>> >       >       data,
>> >       >          and cwnd MUST be lowered twice. This discussion and
>> >       >       necessary
>> >       >          actions to track it are missing, if such scenario is
>> >       >       possible.
>> >       >
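One way to picture the bookkeeping asked for in the list above is a per-segment transmission counter that lets the sender tell a lost retransmission apart from a first-time loss. The sketch below is purely hypothetical: the class names, fields, and the choice to halve the current cwnd (rather than FlightSize) on the second reaction are assumptions made for illustration, and none of it is taken from the RACK-TLP draft, RFC 5681, or RFC 6675.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int
    xmit_count: int = 1   # 1 means only the original transmission so far


class CongestionState:
    """Hypothetical sender state; not an algorithm from any RFC."""

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.ssthresh = cwnd
        self.in_recovery = False
        self.reductions = 0

    def _multiplicative_decrease(self):
        # Assumption: halve the current cwnd, with a floor of 2 segments,
        # so the result does not depend on FlightSize during recovery.
        self.ssthresh = max(self.cwnd / 2, 2)
        self.cwnd = self.ssthresh
        self.reductions += 1

    def on_loss_detected(self, seg):
        lost_rexmit = seg.xmit_count > 1
        # React once when recovery starts, and again for every detected
        # loss of a segment that had already been retransmitted.
        if not self.in_recovery or lost_rexmit:
            self._multiplicative_decrease()
            self.in_recovery = True
        seg.xmit_count += 1   # the segment will now be sent again


cc = CongestionState(cwnd=20)
p1, p2 = Segment(seq=1), Segment(seq=2)

cc.on_loss_detected(p1)   # first loss: recovery starts, cwnd halved
cc.on_loss_detected(p2)   # another original lost in same window: no change
cc.on_loss_detected(p1)   # retransmission of P1 lost: second reduction
print(cc.cwnd, cc.reductions)
```

Counting transmissions per segment also gives a natural place to address the question of a retransmission being lost more than once, since each further detection would see `xmit_count > 1` and trigger another reduction in this sketch.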
>> >       >       > What am I missing?
>> >       >
>> >       >       Hope the above helps.
>> >       >
>> >       >       /Markku
>> >       >
>> >       >
>> >       > <snipping the rest>
>> >       >
>> >       >
>> >
>> >
>> >
>>
>>