Re: [tcpm] [EXTERNAL] Re: Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard
Martin Duke <martin.h.duke@gmail.com> Thu, 17 December 2020 21:59 UTC
From: Martin Duke <martin.h.duke@gmail.com>
Date: Thu, 17 Dec 2020 13:58:59 -0800
Message-ID: <CAM4esxRNe8RvzxH2ssywYF5=tvKJmtmVEKedZf8cQA7KaC6=CQ@mail.gmail.com>
To: Yuchung Cheng <ycheng@google.com>
Cc: Praveen Balasubramanian <pravb@microsoft.com>, "kojo@cs.helsinki.fi" <kojo@cs.helsinki.fi>, "tcpm@ietf.org" <tcpm@ietf.org>, "draft-ietf-tcpm-rack@ietf.org" <draft-ietf-tcpm-rack@ietf.org>, "tuexen@fh-muenster.de" <tuexen@fh-muenster.de>, "draft-ietf-tcpm-rack.all@ietf.org" <draft-ietf-tcpm-rack.all@ietf.org>, "last-call@ietf.org" <last-call@ietf.org>, "tcpm-chairs@ietf.org" <tcpm-chairs@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/bx-vQZcJ-fgVAnmN44Q89Y2KWpE>
Subject: Re: [tcpm] [EXTERNAL] Re: Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
This is good, thanks. I might also add the sentence "In the absence of PRR,
when TCP RACK detects a lost retransmission it MUST trigger a second
congestion response," or something to that effect. What do you think?

On Thu, Dec 17, 2020 at 11:36 AM Yuchung Cheng <ycheng@google.com> wrote:

> How about
>
> "9.3. Interaction with congestion control
>
> RACK-TLP intentionally decouples loss detection ...
> As mentioned in the Figure 1 caption, RFC 5681 mandates the principle that
> loss in two successive windows of data, or the loss of a retransmission,
> should be taken as two indications of congestion, and therefore reacted to
> separately. However, an implementation of the RFC 6675 pipe algorithm may
> not directly account for these newly detected congestion events properly.
> PRR [RFCxxxx] is RECOMMENDED for the specific congestion control actions
> taken upon the losses detected by RACK-TLP."
>
> To Markku's request for "what's the justification to enter fast recovery",
> consider this example without RACK-TLP:
>
> T0: Send 100 segments, application-limited. All are lost.
> T-2RTT: The app writes, so another 3 segments are sent. They reach the
> destination and trigger 3 DUPACKs.
> T-3RTT: The 3 DUPACKs arrive. Fast recovery starts, and subsequent
> congestion control reactions burst ~50 packets with Reno.
>
> In this case any ACK that occurs before the RTO is (generally) considered
> part of the ACK clock, which is how I understand Van's initial design.
> This behavior existed decades before RACK-TLP. RACK-TLP essentially
> changes the "3 segments by the app" to "1 segment by TCP".
>
> On Thu, Dec 17, 2020 at 10:52 AM Praveen Balasubramanian <
> pravb@microsoft.com> wrote:
>
>> I agree that we should have a note in this RFC about congestion control
>> action upon detecting lost retransmission(s).
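For what it's worth, the proposed MUST could be sketched roughly as follows. This is a hypothetical illustration in Python, not text from the draft: the function name and state layout are mine, and cwnd/ssthresh are counted in whole segments.

```python
# Hypothetical sketch of the proposed rule: without PRR, a loss detected by
# RACK on a segment that was itself a retransmission is a *second* congestion
# indication and triggers its own reduction, even inside an ongoing recovery.
# cwnd/ssthresh are in whole segments; names are illustrative, not from the draft.

def on_rack_loss_detected(cwnd, ssthresh, lost_segment_was_rexmit, in_recovery):
    """Return updated (cwnd, ssthresh) after RACK declares a segment lost."""
    if lost_segment_was_rexmit or not in_recovery:
        # New congestion indication: reduce, with RFC 5681's floor of 2 segments.
        ssthresh = max(cwnd // 2, 2)
        cwnd = ssthresh
    # A further loss of original (never-retransmitted) data inside the same
    # recovery episode gets no additional reduction.
    return cwnd, ssthresh
```

Starting from cwnd=20, a first loss gives (10, 10), and a subsequently detected lost retransmission gives (5, 5): the two separate reductions being discussed.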
>>
>> *From:* tcpm <tcpm-bounces@ietf.org> *On Behalf Of* Martin Duke
>> *Sent:* Thursday, December 17, 2020 7:30 AM
>> *To:* Markku Kojo <kojo@cs.helsinki.fi>
>> *Cc:* tcpm@ietf.org Extensions <tcpm@ietf.org>;
>> draft-ietf-tcpm-rack@ietf.org; Michael Tuexen <tuexen@fh-muenster.de>;
>> draft-ietf-tcpm-rack.all@ietf.org; Last Call <last-call@ietf.org>;
>> tcpm-chairs <tcpm-chairs@ietf.org>
>> *Subject:* [EXTERNAL] Re: [tcpm] Last Call:
>> <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for
>> TCP) to Proposed Standard
>>
>> Hi Markku,
>>
>> Thanks, now I understand your objections.
>>
>> Martin
>>
>> On Thu, Dec 17, 2020 at 12:46 AM Markku Kojo <kojo@cs.helsinki.fi> wrote:
>>
>> Hi,
>>
>> On Wed, 16 Dec 2020, Martin Duke wrote:
>>
>> > I spent a little longer looking at the specs more carefully, and I
>> > explained (1) incorrectly in my last two messages. P21..29 are not
>> > Limited Transmit packets.
>>
>> Correct. Just the normal rule that allows sending new data during fast
>> recovery.
>>
>> > However, unless I'm missing something else, 6675 is clear that the
>> > recovery period does not end until the cumulative ack advances, meaning
>> > that detecting the lost retransmission of P1 does not trigger another
>> > MD directly.
>>
>> As I have said earlier, RFC 6675 does not repeat all congestion control
>> principles from RFC 5681. It definitely honors the CC principle that
>> requires treating a loss of a retransmission as a new congestion
>> indication and another MD. I believe I am obligated to know this as a
>> co-author of RFC 6675. ;)
>>
>> RFC 6675 explicitly indicates that it follows RFC 5681 by stating in the
>> abstract:
>>
>> "... conforms to the spirit of the current congestion control
>> specification (RFC 5681 ..."
>>
>> And in the intro:
>>
>> "The algorithm specified in this document is a straightforward
>> SACK-based loss recovery strategy that follows the guidelines
>> set in [RFC5681] ..."
>>
>> I don't think there is anything unclear in this.
>>
>> RFC 6675 and all other standard congestion controls (RFC 5681 and RFC
>> 6582) handle a loss of a retransmission by "enforcing" RTO to detect it.
>> And RTO guarantees MD. RACK-TLP changes the loss detection in this case
>> and therefore the standard congestion control algorithms do not have
>> actions to handle it correctly. That is the point.
>>
>> BR,
>>
>> /Markku
>>
>> > Thanks for this exercise! It's refreshed my memory of these details
>> > after working on slightly different QUIC algorithms for a long time.
>> >
>> > On Wed, Dec 16, 2020, 18:55 Martin Duke <martin.h.duke@gmail.com> wrote:
>> > (1) FlightSize: in RFC 6675, Section 5, Step 4.2:
>> >
>> > (4.2) ssthresh = cwnd = (FlightSize / 2)
>> >
>> > The congestion window (cwnd) and slow start threshold
>> > (ssthresh) are reduced to half of FlightSize per [RFC5681].
>> > Additionally, note that [RFC5681] requires that any
>> > segments sent as part of the Limited Transmit mechanism not
>> > be counted in FlightSize for the purpose of the above
>> > equation.
>> >
>> > IIUC the segments P21..P29 in your example were sent because of Limited
>> > Transmit, and so don't count. The FlightSize for the purposes of (4.2)
>> > is therefore 20 after both losses, and the cwnd does not go up on the
>> > second loss.
>> >
>> > (2)
>> > "Even a single-shot burst every time there is a significant loss
>> > event is not acceptable, not to mention continuous aggressiveness, and
>> > this is exactly what RFC 2914 and RFC 5033 explicitly address and warn
>> > about."
>> >
>> > "Significant loss event" is the key phrase here. The intent of TLP/PTO
>> > is to equalize the treatment of a small packet loss whether it happened
>> > in the middle of a burst or at the end. Why should an isolated loss be
>> > treated differently based on its position in the burst? This is just a
>> > logical extension of fast retransmit, which also modified the RTO
>> > paradigm.
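The exclusion rule in step (4.2) is easy to show mechanically. This is my own illustrative arithmetic in segments (the function name is not from any RFC), and note that whether P21..P29 actually qualify as Limited Transmit segments is corrected elsewhere in the thread ("P21..29 are not Limited Transmit packets"):

```python
# Step (4.2) of RFC 6675 Section 5, combined with RFC 5681's rule that
# segments sent under Limited Transmit are excluded from FlightSize for
# this equation. All quantities are in whole segments.

def ssthresh_on_loss(flight_size, limited_transmit_segments=0):
    """cwnd = ssthresh = (eligible FlightSize / 2), floor of 2 segments."""
    eligible = flight_size - limited_transmit_segments
    return max(eligible // 2, 2)

# Under Martin's (later retracted) reading: 29 segments outstanding, but
# P21..P29 (9 segments) excluded, so the reduction is from 20 both times.
print(ssthresh_on_loss(29, 9))   # 10
# Without the exclusion, the same FlightSize halves to 14, which is the
# crux of the disagreement about the second reduction.
print(ssthresh_on_loss(29))      # 14
```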
>> > The working group consensus is that this is a feature, not a bug; you're
>> > welcome to feel otherwise but I suspect you're in the rough here.
>> >
>> > Regards
>> > Martin
>> >
>> > On Wed, Dec 16, 2020 at 4:11 PM Markku Kojo <kojo@cs.helsinki.fi> wrote:
>> > Hi Martin,
>> >
>> > See inline.
>> >
>> > On Wed, 16 Dec 2020, Martin Duke wrote:
>> >
>> > > Hi Markku,
>> > >
>> > > There is a ton here, but I'll try to address the top points. Hopefully
>> > > they obviate the rest.
>> >
>> > Sorry for being verbose. I tried to be clear but you actually removed my
>> > key issues/questions ;)
>> >
>> > > 1.
>> > > [Markku]
>> > > "Hmm, not sure what you mean by "this is a new loss detection after
>> > > acknowledgment of new data"?
>> > > But anyway, RFC 5681 gives the general principle to reduce cwnd and
>> > > ssthresh twice if a retransmission is lost, but IMHO (and I believe
>> > > many who have designed new loss recovery and CC algorithms or
>> > > implemented them agree) it is hard to get things right if only
>> > > congestion control principles are available and no algorithm."
>> > >
>> > > [Martin]
>> > > So 6675 Sec 5 is quite explicit that there is only one cwnd reduction
>> > > per fast recovery episode, which ends once new data has been
>> > > acknowledged.
>> >
>> > To be more precise: fast recovery ends when the current window becomes
>> > cumulatively acknowledged, that is, when
>> >
>> > (4.1) RecoveryPoint (= HighData at the beginning) becomes acknowledged
>> >
>> > I believe we agree and you meant this, although new data below
>> > RecoveryPoint may become cumulatively acknowledged earlier during the
>> > fast recovery. Reno loss recovery in RFC 5681 ends when (any) new data
>> > has been acknowledged.
>> >
>> > > By definition, if a retransmission is lost it is because newer data
>> > > has been acknowledged, so it's a new recovery episode.
>> >
>> > Not sure where you have this definition? Newer than what are you
>> > referring to?
>> >
>> > But, yes, if a retransmission is lost with the RFC 6675 algorithm,
>> > it requires an RTO to be detected and definitely starts a new recovery
>> > episode. That is, a new recovery episode is enforced by step (1.a) of
>> > NextSeg(), which prevents retransmission of a segment that has already
>> > been retransmitted. If RACK-TLP is used for detecting loss with RFC
>> > 6675, things get different in many ways, because it may detect the loss
>> > of a retransmission. It would pretty much require an entire redesign of
>> > the algorithm. For example, the calculation of pipe does not consider
>> > segments that have been retransmitted more than once.
>> >
>> > > Meanwhile, during the Fast Recovery period the incoming acks
>> > > implicitly remove data from the network and therefore keep FlightSize
>> > > low.
>> >
>> > Incorrect. FlightSize != pipe. Only cumulative acks remove data from
>> > FlightSize, and new data transmitted during fast recovery inflates
>> > FlightSize. How FlightSize evolves depends on the loss pattern, as I
>> > said. It is also possible that FlightSize is low; it may err in both
>> > directions.
>> > A simple example can be used as a proof for the case where cwnd
>> > increases if a loss of a retransmission is detected and repaired:
>> >
>> > RFC 6675 recovery with RACK-TLP loss detection:
>> > (contains some inaccuracies because it has not been defined how
>> > lost rexmits are calculated into pipe)
>> >
>> > cwnd=20; packets P1,...,P20 in flight = current window of data
>> > [P1 dropped and rexmit of P1 will also be dropped]
>> >
>> > DupAck w/SACK for P2 arrives
>> > [loss of P1 detected after one RTT from original xmit of P1]
>> > [cwnd=ssthresh=10]
>> > P1 is rexmitted (and it logically starts the next window of data)
>> >
>> > DupAcks w/SACK for original P3..P11 arrive
>> > DupAck w/SACK for original P12 arrives
>> > [cwnd-pipe = 10-9 >= 1]
>> > send P21
>> > DupAck w/SACK for P13 arrives
>> > send P22
>> > ...
>> > DupAck w/SACK for P20 arrives
>> > send P29
>> > [FlightSize=29]
>> >
>> > (Ack for rexmit of P1 would arrive here unless it got dropped)
>> >
>> > DupAck w/SACK for P21 arrives
>> > [loss of rexmit P1 detected after one RTT from rexmit of P1]
>> >
>> > SET cwnd = ssthresh = FlightSize/2 = 29/2 = 14.5
>> >
>> > CWND INCREASES when it should be at most 5 after halving it twice!!!
>> >
>> > > We can continue to go around on our interpretation of these
>> > > documents, but fundamentally if there is ambiguity in 5681/6675 we
>> > > should bis those RFCs rather than expand the scope of RACK.
>> >
>> > As I said earlier, I am not opposing a bis, though a 5681bis would not
>> > be needed, I think.
>> >
>> > But let me repeat: if we publish RACK-TLP now without the necessary
>> > warnings or a correct congestion control algorithm, someone will try to
>> > implement RACK-TLP with RFC 6675 and it will be a total mess. The
>> > behavior will be unpredictable and quite likely unsafe congestion
>> > control behavior.
>> >
>> > > 2.
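Markku's arithmetic in the example above can be reproduced directly. This is an illustrative check in segments (true division, as in the example), not code from any RFC:

```python
# Reproducing the numbers in the example: equation (4) applied with the
# FlightSize inflated by P21..P29 yields a *larger* cwnd than before the loss.

initial_cwnd = 20.0
after_first_loss = initial_cwnd / 2           # cwnd = ssthresh = 10 on loss of P1
correct_after_second = after_first_loss / 2   # 5: two separate MD reactions

flight_size_at_second_loss = 29.0             # inflated by sending P21..P29
naive_after_second = flight_size_at_second_loss / 2   # 14.5

# The naive equation-(4) result exceeds even the pre-loss cwnd of 10.
print(naive_after_second > after_first_loss)  # True
```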
>> > > [Markku]
>> > > "In short:
>> > > When with a non-RACK-TLP implementation the timer (RTO) expires:
>> > > cwnd=1 MSS, and slow start is entered.
>> > > When with a RACK-TLP implementation the timer (PTO) expires,
>> > > normal fast recovery is entered (unless also implementing PRR).
>> > > So no RTO recovery, as explicitly stated in Sec. 7.4.1."
>> > >
>> > > [Martin]
>> > > There may be a misunderstanding here. PTO is not the same as RTO, and
>> > > both mechanisms exist! The loss response to a PTO is to send a probe;
>> > > the RTO response is as with conventional TCP. In Section 7.3:
>> >
>> > No, I don't think I misunderstood. If you call a timeout by another
>> > name, it is still a timeout. And congestion control does not consider
>> > which segments to send (SND.UNA vs. a probe with a higher sequence
>> > number), only how much is sent.
>> >
>> > You ignored my major point where I decoupled congestion control from
>> > loss detection and loss recovery and compared RFC 5681 behavior to
>> > RACK-TLP behavior in exactly the same scenario where an entire flight
>> > is lost and the timer expires.
>> >
>> > Please comment on why congestion control behavior is allowed to be
>> > radically different in these two implementations:
>> >
>> > RFC 5681 & RFC 6298 timeout:
>> >
>> > RTO=SRTT+4*RTTVAR (RTO used for arming the timer)
>> > 1. RTO timer expires
>> > 2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one segment
>> > 3. Ack of rexmit sent in step 2 arrives
>> > 4. cwnd = cwnd+1 MSS; send two segments
>> > ...
>> >
>> > RACK-TLP timeout:
>> >
>> > PTO=min(2*SRTT,RTO) (PTO used for arming the timer)
>> > 1. PTO timer expires
>> > 2. (cwnd=1 MSS); (re)xmit one segment
>> > 3. Ack of (re)xmit sent in step 2 arrives
>> > 4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments
>> >
>> > If FlightSize is 100 segments when the timer expires, congestion
>> > control is the same in steps 1-3, but in step 4 the standard congestion
>> > control allows transmitting 2 segments, while RACK-TLP would allow
>> > blasting 50 segments.
>> >
>> > > After attempting to send a loss probe, regardless of whether a loss
>> > > probe was sent, the sender MUST re-arm the RTO timer, not the PTO
>> > > timer, if FlightSize is not zero. This ensures RTO recovery remains
>> > > the last resort if TLP fails.
>> > > "
>> >
>> > This does not prevent the above RACK-TLP behavior from getting realized.
>> >
>> > > So a pure RTO response exists in the case of persistent congestion
>> > > that causes losses of probes or their ACKs.
>> >
>> > Yes, an RTO response exists BUT only after RACK-TLP at least once
>> > blasts the network. It may well be that with smaller windows RACK-TLP
>> > is successful during its TLP-initiated, overly aggressive "fast
>> > recovery" and never enters RTO recovery, because it may detect and
>> > repair also the loss of rexmits. That is, it continues at too high a
>> > rate even if lost rexmits indicate that congestion persists in
>> > successive windows of data. And worse, it is successful because it
>> > pushes away other compatible TCP flows by being too aggressive and
>> > unfair.
>> >
>> > Even a single-shot burst every time there is a significant loss event
>> > is not acceptable, not to mention continuous aggressiveness, and this
>> > is exactly what RFC 2914 and RFC 5033 explicitly address and warn
>> > about.
>> >
>> > Are we ignoring these BCPs that have IETF consensus?
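The 2-versus-50 contrast in step 4 of the two sequences is simple to make concrete. This sketch models only the step-4 outcomes from the comparison above, with my own function names, and ignores everything else in the algorithms:

```python
# Step 4 of the two timeout sequences, for FlightSize = 100 segments.

def sendable_after_rto_ack(cwnd=1):
    # RFC 5681/6298: slow start from cwnd=1 MSS; the ACK of the
    # retransmission grows cwnd by 1 MSS.
    return cwnd + 1                    # 2 segments may be sent

def sendable_after_pto_ack(flight_size):
    # RACK-TLP per the comparison: fast recovery with
    # cwnd = ssthresh = FlightSize/2.
    return flight_size // 2            # 50 segments may be sent

print(sendable_after_rto_ack(), sendable_after_pto_ack(100))   # 2 50
```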
>> >
>> > And the other important question I'd like to have an answer to:
>> >
>> > What is the justification to modify standard TCP congestion control to
>> > use fast recovery instead of slow start for a case where a timeout is
>> > needed to detect the packet losses because there is no feedback and the
>> > ack clock is lost? RACK-TLP explicitly instructs to do so in Sec. 7.4.1.
>> >
>> > As I noted: based on what is written in the draft it does not intend to
>> > change congestion control, but effectively it does.
>> >
>> > /Markku
>> >
>> > > Martin
>> > >
>> > > On Wed, Dec 16, 2020 at 11:39 AM Markku Kojo <kojo@cs.helsinki.fi>
>> > > wrote:
>> > > Hi Martin,
>> > >
>> > > On Tue, 15 Dec 2020, Martin Duke wrote:
>> > >
>> > > > Hi Markku,
>> > > >
>> > > > Thanks for the comments. The authors will incorporate many of your
>> > > > suggestions after the IESG review.
>> > > >
>> > > > There's one thing I don't understand in your comments:
>> > > >
>> > > > "That is, where can an implementer find advice for correct
>> > > > congestion control actions with RACK-TLP, when:
>> > > >
>> > > > (1) a loss of a rexmitted segment is detected
>> > > > (2) an entire flight of data gets dropped (and detected),
>> > > > that is, when there is no feedback available and a timeout
>> > > > is needed to detect the loss"
>> > > >
>> > > > Section 9.3 is the discussion about CC, and is clear that the
>> > > > implementer should use either 5681 or 6937.
>> > >
>> > > Just a citation nit: RFC 5681 provides basic CC concepts and some
>> > > useful CC guidelines, but given that RACK-TLP MUST implement SACK,
>> > > the algorithm in RFC 5681 is not that useful and an implementer quite
>> > > likely follows mainly the algorithm in RFC 6675 (and not RFC 6937 at
>> > > all if not implementing PRR).
>> > > And RFC 6675 is not mentioned in Sec 9.3, though it is listed in
>> > > Sec. 4 (Requirements).
>> > >
>> > > > You went through the 6937 case in detail.
>> > >
>> > > Yes, but without correct CC actions.
>> > >
>> > > > If 5681, it's pretty clear to me that in (1) this is a new loss
>> > > > detection after acknowledgment of new data, and therefore requires
>> > > > a second halving of cwnd.
>> > >
>> > > Hmm, not sure what you mean by "this is a new loss detection after
>> > > acknowledgment of new data"?
>> > > But anyway, RFC 5681 gives the general principle to reduce cwnd and
>> > > ssthresh twice if a retransmission is lost, but IMHO (and I believe
>> > > many who have designed new loss recovery and CC algorithms or
>> > > implemented them agree) it is hard to get things right if only
>> > > congestion control principles are available and no algorithm.
>> > > That's why ALL mechanisms that we have include a quite detailed
>> > > algorithm with all necessary variables and actions for loss recovery
>> > > and/or CC purposes (and often also pseudocode). Like this document
>> > > does for loss detection.
>> > >
>> > > So the problem is that we do not have a detailed enough algorithm or
>> > > rule that tells exactly what to do when a loss of a rexmit is
>> > > detected. Even worse, the algorithms in RFC 5681 and RFC 6675 refer
>> > > to equation (4) of RFC 5681 to reduce ssthresh and cwnd when a loss
>> > > requiring a congestion control action is detected:
>> > >
>> > > (cwnd =) ssthresh = FlightSize / 2
>> > >
>> > > And RFC 5681 gives a warning not to halve cwnd in the equation but
>> > > FlightSize.
>> > >
>> > > That is, this equation is what an implementer intuitively would use
>> > > when reading the relevant RFCs, but it gives a wrong result for
>> > > outstanding data when in fast recovery (when the sender is in
>> > > congestion avoidance and equation (4) is used to halve cwnd, it gives
>> > > a correct result).
>> > > More precisely, during fast recovery FlightSize is inflated when new
>> > > data is sent and reduced when segments are cumulatively Acked.
>> > > What the outcome is depends on the loss pattern. In the worst case,
>> > > FlightSize is significantly larger than at the beginning of the fast
>> > > recovery, when FlightSize was (correctly) used to determine the
>> > > halved value for cwnd and ssthresh, i.e., equation (4) may result in
>> > > *increasing* cwnd upon detecting a loss of a rexmitted segment,
>> > > instead of further halving it.
>> > >
>> > > A clever implementer might have no problem getting it right with some
>> > > thinking, but I am afraid that there will be incorrect
>> > > implementations with what is currently specified. Not all
>> > > implementers have spent a significant fraction of their career
>> > > solving TCP peculiarities.
>> > >
>> > > > For (2), the RTO timer is still operative so the RTO recovery rules
>> > > > would still follow.
>> > >
>> > > In short:
>> > > When with a non-RACK-TLP implementation the timer (RTO) expires:
>> > > cwnd=1 MSS, and slow start is entered.
>> > > When with a RACK-TLP implementation the timer (PTO) expires, normal
>> > > fast recovery is entered (unless also implementing PRR). So no RTO
>> > > recovery, as explicitly stated in Sec. 7.4.1.
>> > >
>> > > This means that this document explicitly modifies standard TCP
>> > > congestion control when there are no acks coming and the
>> > > retransmission timer expires
>> > >
>> > > from: RTO=SRTT+4*RTTVAR (RTO used for arming the timer)
>> > > 1. RTO timer expires
>> > > 2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one segment
>> > > 3. Ack of rexmit sent in step 2 arrives
>> > > 4. cwnd = cwnd+1 MSS; send two segments
>> > > ...
>> > >
>> > > to: PTO=min(2*SRTT,RTO) (PTO used for arming the timer)
>> > > 1. PTO timer expires
>> > > 2. (cwnd=1 MSS); (re)xmit one segment
>> > > 3. Ack of (re)xmit sent in step 2 arrives
>> > > 4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments
>> > >
>> > > For example, if FlightSize is 100 segments when the timer expires,
>> > > congestion control is the same in steps 1-3, but in step 4 the
>> > > current standard congestion control allows transmitting 2 segments,
>> > > while RACK-TLP would allow blasting 50 segments.
>> > >
>> > > The question is: what is the justification to modify standard TCP
>> > > congestion control to use fast recovery instead of slow start for a
>> > > case where a timeout is needed to detect loss because there is no
>> > > feedback and the ack clock is lost? The draft does not give any
>> > > justification. This clearly is in conflict with items (0) and (1) in
>> > > BCP 133 (RFC 5033).
>> > >
>> > > Furthermore, there is no implementation nor experimental experience
>> > > evaluating this change. The implementation with experimental
>> > > experience uses PRR (RFC 6937), which is an Experimental
>> > > specification including a novel "trick" that directs PRR fast
>> > > recovery to effectively use slow start in this case at hand.
>> > >
>> > > > In other words, I am not seeing a case that requires new congestion
>> > > > control concepts except as discussed in 9.3.
>> > >
>> > > See above: the change in standard congestion control for (2).
>> > > The draft intends not to change congestion control, but effectively
>> > > it does, without any operational evidence.
>> > >
>> > > What's also missing and would be very useful:
>> > >
>> > > - For (1), a hint for an implementer saying that because RACK-TLP is
>> > > able to detect a loss of a rexmit, unlike any other loss detection
>> > > algorithm, the sender MUST react twice to congestion (and cite
>> > > RFC 5681). And cite a document where the necessary correct actions
>> > > are described.
>> > >
>> > > - For (1), advice that an implementer needs to keep track of when it
>> > > detects a loss of a retransmitted segment. The current algorithms in
>> > > the draft detect a loss of a retransmitted segment exactly in the
>> > > same way as a loss of any other segment. There seems to be nothing to
>> > > track when a retransmission of a retransmitted segment takes place.
>> > > Therefore, the algorithms should have additional actions to correctly
>> > > track when such a loss is detected.
>> > >
>> > > - For (1), discussion on how many times a loss of a retransmission of
>> > > the same segment may occur and be detected. It seems that it may be
>> > > possible to drop a rexmitted segment more than once and detect it
>> > > also several times? What are the implications?
>> > >
>> > > - If the previous is possible, then the algorithm possibly also may
>> > > detect a loss of a new segment that was sent during fast recovery?
>> > > This is also loss in two successive windows of data, and cwnd MUST be
>> > > lowered twice.
>> > > This discussion and the necessary actions to track it are missing,
>> > > if such a scenario is possible.
>> > >
>> > > > What am I missing?
>> > >
>> > > Hope the above helps.
>> > >
>> > > /Markku
>> > >
>> > > <snipping the rest>