Re: [tcpm] Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard
Ian Swett <ianswett@google.com> Sat, 05 December 2020 01:14 UTC
From: Ian Swett <ianswett@google.com>
Date: Fri, 04 Dec 2020 20:13:51 -0500
Message-ID: <CAKcm_gP5q+sMajeZp7EQEr25tQEFij=DP0D7LZ7noaEtXGM6Rg@mail.gmail.com>
To: Yuchung Cheng <ycheng=40google.com@dmarc.ietf.org>
Cc: Markku Kojo <kojo@cs.helsinki.fi>, "tcpm@ietf.org Extensions" <tcpm@ietf.org>, draft-ietf-tcpm-rack@ietf.org, Michael Tuexen <tuexen@fh-muenster.de>, draft-ietf-tcpm-rack.all@ietf.org, Last Call <last-call@ietf.org>, IETF-Announce <ietf-announce@ietf.org>, tcpm-chairs <tcpm-chairs@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/J5I6akS2Gltv36idfOWVl6uS3D4>
Subject: Re: [tcpm] Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard
+1 to decoupling loss detection from congestion control. It ensures RACK does
not need to be updated every time a new congestion controller is standardized.

On Fri, Dec 4, 2020 at 6:58 PM Yuchung Cheng <ycheng=40google.com@dmarc.ietf.org> wrote:

> On Fri, Dec 4, 2020 at 5:02 AM Markku Kojo <kojo@cs.helsinki.fi> wrote:
> >
> > Hi all,
> >
> > I know this is a bit late, but I didn't have time earlier to take a look
> > at this draft.
> >
> > Given that this RFC-to-be is standards track and RECOMMENDED to replace
> > the current DupAck-based loss detection, it is important that the spec
> > is clear in its advice to those implementing it. The current text seems
> > to lack important advice w.r.t. congestion control, and even though the
> > spec tries to decouple loss detection from congestion control and does
> > not intend to modify existing standard congestion control, some of the
> > examples advise incorrect congestion control actions. Therefore, I
> > think it is worth correcting the mistakes and taking yet another look
> > at a few implications of this specification.
>
> As you noted, the intention is to decouple the two as much as possible.
> Unlike 20 years ago, when TCP loss detection and congestion control were
> essentially glued together in one piece, the decoupling of the two
> (including modularizing congestion controls in implementations) has
> helped fuel many great inventions of new congestion controls. Codifying
> so-called default C.C. reactions into the loss detection is a step
> backward that the authors try their best to avoid. To keep the document
> less "abstract / unclear", as many WGLC reviewers commented, we use
> examples to illustrate, and these include CC actions. But the details of
> these CC actions are likely to become obsolete as CC continues to
> advance, hopefully.
>
> > Sec. 3.4 (and elsewhere when discussing recovering a dropped
> > retransmission):
> >
> > It is very useful that RACK-TLP allows for recovering dropped rexmits.
> > However, it seems that the spec ignores the fact that loss of a
> > retransmission is a loss in a successive window, which requires
> > reacting to congestion twice as per RFC 5681. This advice must be
> > included in the specification, because with RACK-TLP the recovery of a
> > dropped rexmit takes place during fast recovery, which is very
> > different from the other standard algorithms and therefore easy to
> > miss when implementing this spec.
>
> Per RFC 5681 sec 4.3 (https://tools.ietf.org/html/rfc5681#section-4.3):
> "Loss in two successive windows of data, or the loss of a
> retransmission, should be taken as two indications of congestion and,
> therefore, cwnd (and ssthresh) MUST be lowered twice in this case."
>
> RACK-TLP is a loss detection algorithm. RFC 5681 is crystal clear on
> this, so I am not sure what clause you suggest adding to RACK-TLP.
>
> > Sec 9.3:
> >
> > In Section 9.3 it is stated that the only modification to the existing
> > congestion control algorithms is that one outstanding loss probe can
> > be sent even if the congestion window is fully used. This is fine, but
> > the spec lacks the advice that if a new data segment is sent, this
> > extra segment MUST NOT be included when calculating the new value of
> > ssthresh as per equation (4) of RFC 5681. Such a segment is an extra
> > segment not allowed by cwnd, so it must be excluded from FlightSize if
> > the TLP probe detects loss or if there is no ack and an RTO is needed
> > to trigger loss recovery.
>
> Why exclude TLP (or any data) from FlightSize? The congestion control
> needs precise accounting of the flight size to react to congestion
> properly.
>
> > In these cases the temporary over-commit is not accounted for, as a
> > DupAck does not decrease FlightSize, and in case of an RTO the next
> > ACK comes too late. This is similar to the rule in RFC 5681 and RFC
> > 6675 that prohibits including the segments transmitted via Limited
> > Transmit in the calculation of ssthresh.
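[The ssthresh calculation being debated above can be sketched as follows. This is an illustrative Python sketch of equation (4) of RFC 5681; the `probe_bytes` exclusion parameter models Markku's proposal (mirroring the Limited Transmit rule) and is not part of the draft's text.]

```python
SMSS = 1460  # assumed sender maximum segment size, in bytes

def ssthresh_rfc5681(flight_size, probe_bytes=0):
    """Equation (4) of RFC 5681: ssthresh = max(FlightSize / 2, 2 * SMSS).

    probe_bytes is Markku's suggested exclusion of the extra TLP probe
    segment sent beyond cwnd (illustrative only, not in the draft).
    """
    return max((flight_size - probe_bytes) // 2, 2 * SMSS)

# With 10 full-sized segments in flight, ssthresh halves to 5 segments,
# the number debated in the Section 9.3 example below.
print(ssthresh_rfc5681(10 * SMSS) // SMSS)  # -> 5
```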
> >
> > In Section 9.3 a few example scenarios are used to illustrate the
> > intended operation of RACK-TLP.
> >
> > In the first example a sender has a congestion window (cwnd) of 20
> > segments on a SACK-enabled connection. It sends 10 data segments and
> > all of them are lost.
> >
> > The text claims that without RACK-TLP the ending cwnd would be 4
> > segments due to congestion window validation. This is incorrect. As
> > per RFC 7661 the sender MUST exit the non-validated phase upon an RTO.
> > Therefore the ending cwnd would be 5 segments (or 5 1/2 segments if
> > the TCP sender uses equation (4) of RFC 5681).
> >
> > The operation with RACK-TLP would inevitably result in congestion
> > collapse if RACK-TLP behaves as described in the example, because it
> > restores the previous cwnd of 10 segments after the fast recovery and
> > would not react to congestion at all! I think this is not the intended
> > behavior of this spec but a mistake in the example. The ssthresh
> > calculated at the beginning of loss recovery should be 5 segments as
> > per RFC 6675 (and RFC 5681).
>
> To clarify, would this text look clearer?
>
> 'an ending cwnd set to the slow start threshold of 5 segments (half of
> the original congestion window of 10 segments)'
>
> > Furthermore, it seems that this example with RACK-TLP refers to using
> > PRR_SSRB, which effectively implements regular slow start in this
> > case(?). From a congestion control point of view this is correct,
> > because the entire flight of data as well as the ack clock was lost.
> >
> > However, as correctly discussed in Sec 2, the congestion window must
> > be reset to 1 MSS when an entire flight of data and the Ack clock are
> > lost. But how can an implementor know what to do if she/he is not
> > implementing the experimental PRR algorithm?
> > This spec specifies an alternative to DupAck counting, indicating that
> > TLP is used to trigger Fast Retransmit & Fast Recovery only, not a
> > loss recovery in slow start. This means that without additional advice
> > an implementation of this spec would just halve the cwnd and ssthresh
> > and send a potentially very large burst of segments at the beginning
> > of the Fast Recovery, because there is no ack clock. So, this spec
> > begs for advice (MUST) on when to slow start and reset cwnd and when
> > not to, or at least a discussion of this problem and some sort of
> > advice on what to do and what to avoid. And, maybe a recommendation to
> > implement it with PRR?
>
> It's wise to decouple loss detection (RACK-TLP) from congestion/burst
> control (when to slow-start). The use of PRR is just an example to
> illustrate and is not meant as a recommendation.
>
> Section 3 elaborates at length that the key point of RACK-TLP is to
> maximize the chance of fast recovery. How C.C. governs the transmission
> dynamics after losses are detected is out of scope of this document, in
> the authors' opinion.
>
> > Another question relates to the use of TLP and adjusting timer(s) upon
> > timeout. In the same example discussed above, it is clear that the PTO
> > that fires TLP is just a more aggressive retransmit timer with an
> > alternative data segment to (re)transmit.
> >
> > Therefore, as per RFC 2914 (BCP 41), Sec 9.1, when the PTO expires it
> > is in effect a retransmission timeout, and the timer(s) must be backed
> > off. This is not advised in this specification. Whether it is the TCP
> > RTO or the PTO that should be backed off is an open question.
> > Otherwise, if the congestion is persistent and further transmissions
> > are also lost, RACK-TLP would not react to congestion properly but
> > would keep retransmitting with a "constant" timer value, because a new
> > RTT estimate cannot be obtained.
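[The two timers under discussion can be sketched as follows. This is an illustrative Python sketch, not the draft's pseudocode; the worst-case delayed-ACK allowance and maximum RTO are assumed constants.]

```python
def pto_interval(srtt, rto, flight_size, wc_delack=0.2):
    """Probe timeout in the spirit of the draft's TLP timer: roughly
    2 * SRTT, plus a worst-case delayed-ACK allowance when only one
    segment is in flight, and never later than the RTO."""
    pto = 2 * srtt + (wc_delack if flight_size == 1 else 0.0)
    return min(pto, rto)

def backed_off_rto(base_rto, consecutive_timeouts, max_rto=60.0):
    """RFC 6298-style exponential backoff: the RTO doubles on each
    consecutive timeout, up to a cap."""
    return min(base_rto * (2 ** consecutive_timeouts), max_rto)
```

Per Yuchung's reply below, only the single TLP fires on the PTO; subsequent timeouts fall back to the normal RTO, which backs off exponentially as usual.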
> > On a buffer-bloated and heavily congested bottleneck this would easily
> > result in sending at least one unnecessary retransmission per
> > delivered segment, which is not advisable (e.g., when there are a huge
> > number of applications sharing a constrained bottleneck and these
> > applications are sending only one (or a few) segments and then waiting
> > for a reply from the peer before sending another request).
>
> Thanks for pointing to the RFC. After the TLP, RTO timers will
> exp-backoff (as usual) for the stability reasons mentioned in sec 9.3
> (I didn't find 9.1 relevant). In your scenario you presuppose the
> retransmission is unnecessary, so obviously TLP is not good. Consider
> what happens without TLP, where all the senders fire RTO spuriously and
> blow up the network. It is equally unfortunate behavior. "BDP
> insufficient for many flows" is a congestion control problem.
>
> > Additional notes:
> >
> > Sec 2.2:
> >
> > Example 2:
> > "Lost retransmissions cause a resort to RTO recovery, since
> > DUPACK-counting does not detect the loss of the retransmissions.
> > Then the slow start after RTO recovery could cause burst losses
> > again that severely degrades performance [POLICER16]."
> >
> > RTO recovery is done in slow start. The last sentence is confusing, as
> > there is no (new) slow start after RTO recovery (or, more precisely,
> > slow start continues until cwnd > ssthresh). Do you mean: if/when slow
> > start still continues after RTO recovery has repaired the lost
> > segments, it may cause burst losses again?
>
> I mean the slow start after (the start of) RTO recovery. HTH
>
> > Example 3:
> > "If the reordering degree is beyond DupThresh, the DUPACK-
> > counting can cause a spurious fast recovery and unnecessary
> > congestion window reduction. To mitigate the issue, [RFC4653]
> > adjusts DupThresh to half of the inflight size to tolerate the
> > higher degree of reordering.
> > However, if more than half of the inflight is lost, then the sender
> > has to resort to RTO recovery."
> >
> > This seems to be a somewhat incorrect description of the TCP-NCR
> > algorithm specified in RFC 4653. TCP-NCR uses Extended Limited
> > Transmit, which keeps sending new data segments on DupAcks, making it
> > likely to avoid an RTO in the given example scenario if not too many
> > of the new data segments triggered by Extended Limited Transmit are
> > lost.
>
> Sorry, I don't see how the text is wrong in describing RFC 4653,
> specifically the algorithm for adjusting ssthresh.
>
> > Sec. 3.5:
> >
> > "For example, consider a simple case where one segment was sent with
> > an RTO of 1 second, and then the application writes more data, causing
> > a second and third segment to be sent right before the RTO of the
> > first segment expires. Suppose only the first segment is lost. Without
> > RACK, upon RTO expiration the sender marks all three segments as lost
> > and retransmits the first segment. When the sender receives the ACK
> > that selectively acknowledges the second segment, the sender
> > spuriously retransmits the third segment."
> >
> > This seems incorrect. When the sender receives the ACK that
> > selectively acknowledges the second segment, it is a DupAck as per RFC
> > 6675 and does not increase cwnd; cwnd remains 1 MSS and pipe is 1 MSS.
> > So the rexmit of the third segment is not allowed until the cumulative
> > ACK of the first segment arrives.
>
> I don't see where RFC 6675 forbids growing cwnd. Even if it does, I
> don't think it's a good thing (in RTO slow-start), as a DupAck clearly
> indicates a delivery has been made.
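[The time-based rule the thread contrasts with DupAck counting can be sketched as follows. This is an illustrative Python sketch of the core RACK idea, not the draft's full pseudocode; the class and function names are made up for the example.]

```python
from dataclasses import dataclass

@dataclass
class Seg:
    xmit_ts: float        # transmit timestamp of the segment
    acked: bool = False
    lost: bool = False

def rack_mark_losses(segments, rack_xmit_ts, reo_wnd):
    """Mark as lost every unacked segment that was sent at least one
    reordering window (reo_wnd) before the transmit time of the most
    recently delivered segment (rack_xmit_ts). No DupAck counting."""
    for seg in segments:
        if not seg.acked and seg.xmit_ts + reo_wnd < rack_xmit_ts:
            seg.lost = True
    return sum(1 for s in segments if s.lost)
```

A segment sent shortly before the delivered one stays inside the reordering window and is not marked, which is how RACK tolerates reordering without a spurious fast recovery.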
> >
> > Best regards,
> >
> > /Markku
> >
> > On Mon, 16 Nov 2020, The IESG wrote:
> >
> > > The IESG has received a request from the TCP Maintenance and Minor
> > > Extensions WG (tcpm) to consider the following document: - 'The
> > > RACK-TLP loss detection algorithm for TCP'
> > > <draft-ietf-tcpm-rack-13.txt> as Proposed Standard
> > >
> > > The IESG plans to make a decision in the next few weeks, and
> > > solicits final comments on this action. Please send substantive
> > > comments to the last-call@ietf.org mailing lists by 2020-11-30.
> > > Exceptionally, comments may be sent to iesg@ietf.org instead. In
> > > either case, please retain the beginning of the Subject line to
> > > allow automated sorting.
> > >
> > > Abstract
> > >
> > > This document presents the RACK-TLP loss detection algorithm for
> > > TCP. RACK-TLP uses per-segment transmit timestamps and selective
> > > acknowledgements (SACK) and has two parts: RACK ("Recent
> > > ACKnowledgment") starts fast recovery quickly using time-based
> > > inferences derived from ACK feedback. TLP ("Tail Loss Probe")
> > > leverages RACK and sends a probe packet to trigger ACK feedback to
> > > avoid retransmission timeout (RTO) events. Compared to the widely
> > > used DUPACK threshold approach, RACK-TLP detects losses more
> > > efficiently when there are application-limited flights of data,
> > > lost retransmissions, or data packet reordering events. It is
> > > intended to be an alternative to the DUPACK threshold approach.
> > >
> > > The file can be obtained via
> > > https://datatracker.ietf.org/doc/draft-ietf-tcpm-rack/
> > >
> > > No IPR declarations have been submitted directly on this I-D.
> > > _______________________________________________
> > > tcpm mailing list
> > > tcpm@ietf.org
> > > https://www.ietf.org/mailman/listinfo/tcpm