Re: [tcpm] Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard

Ian Swett <ianswett@google.com> Sat, 05 December 2020 01:14 UTC

From: Ian Swett <ianswett@google.com>
Date: Fri, 04 Dec 2020 20:13:51 -0500
Message-ID: <CAKcm_gP5q+sMajeZp7EQEr25tQEFij=DP0D7LZ7noaEtXGM6Rg@mail.gmail.com>
To: Yuchung Cheng <ycheng=40google.com@dmarc.ietf.org>
Cc: Markku Kojo <kojo@cs.helsinki.fi>, "tcpm@ietf.org Extensions" <tcpm@ietf.org>, draft-ietf-tcpm-rack@ietf.org, Michael Tuexen <tuexen@fh-muenster.de>, draft-ietf-tcpm-rack.all@ietf.org, Last Call <last-call@ietf.org>, IETF-Announce <ietf-announce@ietf.org>, tcpm-chairs <tcpm-chairs@ietf.org>

+1 to decoupling loss detection from congestion control.  It ensures RACK
does not need to be updated every time a new congestion controller is
standardized.

On Fri, Dec 4, 2020 at 6:58 PM Yuchung Cheng <ycheng=40google.com@dmarc.ietf.org> wrote:

> On Fri, Dec 4, 2020 at 5:02 AM Markku Kojo <kojo@cs.helsinki.fi> wrote:
> >
> > Hi all,
> >
> > I know this is a bit late, but I didn't have time earlier to take a look
> > at this draft.
> >
> > Given that this RFC-to-be is standards track and RECOMMENDED to replace
> > the current DupAck-based loss detection, it is important that the spec is
> > clear in its advice to those implementing it. The current text seems to
> > lack important advice w.r.t. congestion control, and even though
> > the spec tries to decouple loss detection from congestion control
> > and does not intend to modify existing standard congestion control,
> > some of the examples advise incorrect congestion control actions.
> > Therefore, I think it is worth correcting the mistakes and taking
> > yet another look at a few implications of this specification.
> As you noted, the intention is to decouple the two as much as possible.
>
> Unlike 20 years ago, when TCP loss detection and congestion
> control were essentially glued together in one piece, decoupling the two
> (including modularizing congestion control in implementations) has
> helped fuel many great inventions of new congestion controls.
> Codifying so-called default C.C. reactions in the loss detection is a
> step backward that the authors try their best to avoid. To keep the
> document less "abstract / unclear", as many WGLC reviewers commented,
> we use examples that include CC actions to illustrate. But the
> details of these CC actions are likely to become obsolete as CC
> hopefully continues to advance.
>
> >
> > Sec. 3.4 (and elsewhere when discussing recovering a dropped
> > retransmission):
> >
> > It is very useful that RACK-TLP allows for recovering dropped rexmits.
> > However, it seems that the spec ignores the fact that the loss of a
> > retransmission is a loss in a successive window, which requires reacting
> > to congestion twice as per RFC 5681. This advice must be included in
> > the specification, because with RACK-TLP the recovery of a dropped
> > rexmit takes place during fast recovery, which is very different
> > from the other standard algorithms and therefore easy to miss
> > when implementing this spec.
>
> per RFC 5681, sec 4.3 (https://tools.ietf.org/html/rfc5681#section-4.3):
> "Loss in two successive windows of data, or the loss of a
> retransmission, should be taken as two indications of congestion and,
> therefore, cwnd (and ssthresh) MUST be lowered twice in this case."
>
> RACK-TLP is a loss detection algorithm. RFC 5681 is crystal clear on
> this, so I am not sure what clause you suggest adding to RACK-TLP.
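>
> As a rough illustration only (names are made up; this is not draft
> text), the RFC 5681 rule amounts to applying the multiplicative
> decrease once per congestion indication, so a lost retransmission
> counts twice:
>
>     # Hedged sketch of the RFC 5681 "react twice" rule. Only cwnd,
>     # ssthresh, and SMSS come from RFC 5681; the rest is illustrative
>     # pseudocode, simplified to halve cwnd rather than FlightSize.
>     def react_once(s):
>         # one congestion response: halve, floored at 2*SMSS
>         s.ssthresh = max(s.cwnd // 2, 2 * s.SMSS)
>         s.cwnd = s.ssthresh
>
>     def on_loss_detected(s, segment_was_rexmit):
>         react_once(s)
>         if segment_was_rexmit:
>             # the lost retransmission is a second indication of
>             # congestion (RFC 5681, sec 4.3), so lower both again
>             react_once(s)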
> >
> > Sec 9.3:
> >
> > In Section 9.3 it is stated that the only modification to the existing
> > congestion control algorithms is that one outstanding loss probe
> > can be sent even if the congestion window is fully used. This is
> > fine, but the spec lacks the advice that if a new data segment is sent
> > as the probe, this extra segment MUST NOT be included when calculating
> > the new value of ssthresh as per equation (4) of RFC 5681. Such a
> > segment is an extra segment not allowed by cwnd, so it must be excluded
> > from FlightSize if the TLP probe detects loss, or if there is no ack
> > and an RTO is needed to trigger loss recovery.
>
> Why exclude TLP (or any data) from FlightSize? The congestion control
> needs precise accounting of the flight size to react to congestion
> properly.
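>
> For clarity, the computation being proposed, as I understand it, would
> be (illustrative pseudocode with made-up names, not draft text):
>
>     # Hypothetical sketch of the suggestion: when computing ssthresh
>     # per equation (4) of RFC 5681, exclude the one TLP probe segment
>     # that was sent beyond cwnd.
>     def ssthresh_on_recovery(flight_size, tlp_probe_outstanding, SMSS):
>         if tlp_probe_outstanding:
>             flight_size -= SMSS  # drop the over-committed probe
>         return max(flight_size // 2, 2 * SMSS)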
>
> >
> > In these cases the temporary over-commit is not accounted for, as a
> > DupAck does not decrease FlightSize, and in case of an RTO the next ACK
> > comes too late. This is similar to the rule in RFC 5681 and RFC 6675
> > that prohibits including the segments transmitted via Limited Transmit
> > in the calculation of ssthresh.
> >
> > In Section 9.3 a few example scenarios are used to illustrate the
> > intended operation of RACK-TLP.
> >
> >   In the first example a sender has a congestion window (cwnd) of 20
> >   segments on a SACK-enabled connection.  It sends 10 data segments
> >   and all of them are lost.
> >
> > The text claims that without RACK-TLP the ending cwnd would be 4
> > segments due to congestion window validation. This is incorrect.
> > As per RFC 7661 the sender MUST exit the non-validated phase upon an
> > RTO. Therefore the ending cwnd would be 5 segments (or 5 1/2 segments
> > if the TCP sender uses equation (4) of RFC 5681).
> >
> > The operation with RACK-TLP would inevitably result in congestion
> > collapse if RACK-TLP behaved as described in the example, because
> > it restores the previous cwnd of 10 segments after fast recovery
> > and would not react to congestion at all! I think this is not the
> > behavior intended by this spec but a mistake in the example.
> > The ssthresh calculated at the beginning of loss recovery should
> > be 5 segments as per RFC 6675 (and RFC 5681).
> To clarify, would this text read more clearly?
>
> 'an ending cwnd set to the slow start threshold of 5 segments (half of
> the original congestion window of 10 segments)'
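>
> For concreteness, equation (4) of RFC 5681 applied to the 10
> outstanding segments gives exactly that (a worked example, counting in
> whole segments):
>
>     # Equation (4) of RFC 5681: ssthresh = max(FlightSize/2, 2*SMSS)
>     SMSS = 1               # one unit = one segment
>     flight_size = 10       # the 10 segments outstanding in the example
>     ssthresh = max(flight_size // 2, 2 * SMSS)   # -> 5 segments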
>
> >
> > Furthermore, it seems that this example with RACK-TLP refers to using
> > PRR-SSRB, which effectively implements regular slow start in this
> > case(?). From a congestion control point of view this is correct,
> > because the entire flight of data, as well as the ack clock, was lost.
> >
> > However, as correctly discussed in Sec 2, the congestion window must be
> > reset to 1 MSS when an entire flight of data, and with it the ack clock,
> > is lost. But how can an implementor know what to do if she/he is not
> > implementing the experimental PRR algorithm? This spec presents itself
> > as specifying an alternative to DupAck counting, indicating that TLP is
> > used to trigger Fast Retransmit & Fast Recovery only, not a loss
> > recovery in slow start. This means that without additional advice an
> > implementation of this spec would just halve cwnd and ssthresh and send
> > a potentially very large burst of segments at the beginning of Fast
> > Recovery, because there is no ack clock. So, this spec begs for advice
> > (a MUST) on when to slow start and reset cwnd and when not to, or at
> > least a discussion of this problem and some sort of advice on what to
> > do and what to avoid. And, maybe a recommendation to implement it with
> > PRR?
>
> It's wise to decouple loss detection (RACK-TLP) from congestion/burst
> control (when to slow start). The use of PRR is just an example to
> illustrate, not a recommendation.
>
> Section 3 elaborates at length on the key point of RACK-TLP: to
> maximize the chance of fast recovery. How C.C. governs the
> transmission dynamics after losses are detected is out of scope of
> this document, in the authors' opinion.
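>
> For readers who want the example made concrete anyway, here is a rough
> sketch of the per-ACK PRR computation from RFC 6937 (variable names
> follow that RFC; this is illustrative, not a complete implementation):
>
>     from math import ceil
>
>     # Called for each ACK during fast recovery (RFC 6937). recover_fs
>     # is the FlightSize at the start of recovery; prr_delivered and
>     # prr_out track data delivered to the receiver / sent by the
>     # sender since recovery began.
>     def prr_sndcnt(s, delivered_data):
>         s.prr_delivered += delivered_data
>         if s.pipe > s.ssthresh:
>             # proportional rate reduction
>             sndcnt = ceil(s.prr_delivered * s.ssthresh / s.recover_fs) - s.prr_out
>         else:
>             # PRR-SSRB: rebuild toward ssthresh, at most one extra MSS per ACK
>             limit = max(s.prr_delivered - s.prr_out, delivered_data) + s.MSS
>             sndcnt = min(s.ssthresh - s.pipe, limit)
>         return max(sndcnt, 0)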
>
>
> >
> > Another question relates to the use of TLP and adjusting the timer(s)
> > upon timeout. In the same example discussed above, it is clear that the
> > PTO that fires the TLP is just a more aggressive retransmit timer with
> > an alternative data segment to (re)transmit.
> >
> > Therefore, as per RFC 2914 (BCP 41), Sec 9.1, when the PTO expires, it
> > is in effect a retransmission timeout and the timer(s) must be backed
> > off. This is not advised in this specification. Whether it is the TCP
> > RTO or the PTO that should be backed off is an open question. Otherwise,
> > if the congestion is persistent and further transmissions are also lost,
> > RACK-TLP would not react to congestion properly but would keep
> > retransmitting with a "constant" timer value, because a new RTT estimate
> > cannot be obtained.
> > On a bufferbloated and heavily congested bottleneck this would easily
> > result in sending at least one unnecessary retransmission per
> > delivered segment, which is not advisable (e.g., when there is a huge
> > number of applications sharing a constrained bottleneck and these
> > applications send only one (or a few) segments and then
> > wait for a reply from the peer before sending another request).
>
> Thanks for pointing to the RFC. After TLP, RTO timers will
> exp-backoff (as usual) for the stability reasons mentioned in sec 9.3
> (I didn't find 9.1 relevant). In your scenario you presuppose the
> retransmission is unnecessary, so obviously TLP is not good there.
> Consider what happens without TLP, where all the senders fire RTO
> spuriously and blow up the network: that is equally unfortunate
> behavior. "BDP insufficient for many flows" is a congestion control
> problem.
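>
> Concretely, the behavior after an unanswered TLP is just the usual
> backoff (a minimal sketch; RTO_MAX is an assumed cap and the helper
> name is hypothetical):
>
>     # Classic exponential backoff when the RTO fires (RFC 6298,
>     # sec 5.5: "RTO <- RTO * 2"); TLP does not change this.
>     def on_rto_expiry(s):
>         s.rto = min(2 * s.rto, s.RTO_MAX)
>         retransmit_first_unacked(s)   # hypothetical helper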
>
>
> >
> > Additional notes:
> >
> > Sec 2.2:
> >
> > Example 2:
> > "Lost retransmissions cause a  resort to RTO recovery, since
> >   DUPACK-counting does not detect the loss of the retransmissions.
> >   Then the slow start after RTO recovery could cause burst losses
> >   again that severely degrades performance [POLICER16]."
> >
> > RTO recovery is done in slow start. The last sentence is confusing, as
> > there is no (new) slow start after RTO recovery (or, more precisely,
> > slow start continues until cwnd > ssthresh). Do you mean: if/when slow
> > start still continues after RTO recovery has repaired the lost
> > segments, it may cause burst losses again?
> I mean the slow start after (the start of) RTO recovery. HTH
>
>
> >
> > Example 3:
> >   "If the reordering degree is beyond DupThresh, the DUPACK-
> >    counting can cause a spurious fast recovery and unnecessary
> >    congestion window reduction.  To mitigate the issue, [RFC4653]
> >    adjusts DupThresh to half of the inflight size to tolerate the
> >    higher degree of reordering.  However if more than half of the
> >    inflight is lost, then the sender has to resort to RTO recovery."
> >
> > This seems to be a somewhat incorrect description of TCP-NCR as
> > specified in RFC 4653. TCP-NCR uses Extended Limited Transmit, which
> > keeps sending new data segments on DupAcks, making it likely to avoid
> > an RTO in the given example scenario, provided not too many of the new
> > data segments triggered by Extended Limited Transmit are lost.
> Sorry, I don't see how the text is wrong in describing RFC 4653,
> specifically its algorithm for adjusting DupThresh.
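>
> i.e. the adjustment the draft paraphrases is roughly the following
> (a hedged sketch of the draft's one-line summary; RFC 4653's actual
> Extended Limited Transmit machinery is more involved):
>
>     # Raise the duplicate-ACK threshold to about half the outstanding
>     # flight (in segments), never below the classic value of 3.
>     def adjusted_dupthresh(flight_size_segments):
>         return max(3, flight_size_segments // 2)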
>
> >
> > Sec. 3.5:
> >
> >   "For example, consider a simple case where one
> >   segment was sent with an RTO of 1 second, and then the application
> >   writes more data, causing a second and third segment to be sent right
> >   before the RTO of the first segment expires.  Suppose only the first
> >   segment is lost.  Without RACK, upon RTO expiration the sender marks
> >   all three segments as lost and retransmits the first segment.  When
> >   the sender receives the ACK that selectively acknowledges the second
> >   segment, the sender spuriously retransmits the third segment."
> >
> > This seems incorrect. When the sender receives the ACK that selectively
> > acknowledges the second segment, it is a DupAck as per RFC 6675 and
> > does not increase cwnd; cwnd remains 1 MSS and pipe is 1 MSS. So the
> > rexmit of the third segment is not allowed until the cumulative ACK of
> > the first segment arrives.
> I don't see where RFC 6675 forbids growing cwnd. Even if it does, I
> don't think it's a good thing (in RTO slow start), as a DUPACK clearly
> indicates a delivery has been made.
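>
> The gate in question, for reference, is just (an illustrative sketch
> of the RFC 6675 rule, not draft text):
>
>     # RFC 6675: transmissions and retransmissions are allowed only
>     # while cwnd - pipe >= 1 SMSS, where pipe estimates the amount of
>     # data outstanding in the network.
>     def may_transmit(cwnd, pipe, SMSS):
>         return cwnd - pipe >= SMSS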
>
>
> >
> > Best regards,
> >
> > /Markku
> >
> >
> >
> > On Mon, 16 Nov 2020, The IESG wrote:
> >
> > >
> > > The IESG has received a request from the TCP Maintenance and Minor
> > > Extensions WG (tcpm) to consider the following document:
> > > 'The RACK-TLP loss detection algorithm for TCP'
> > > <draft-ietf-tcpm-rack-13.txt> as Proposed Standard
> > >
> > > The IESG plans to make a decision in the next few weeks, and
> > > solicits final comments on this action. Please send substantive
> > > comments to the last-call@ietf.org mailing lists by 2020-11-30.
> > > Exceptionally, comments may be sent to iesg@ietf.org instead. In
> > > either case, please retain the beginning of the Subject line to
> > > allow automated sorting.
> > >
> > > Abstract
> > >
> > >
> > >   This document presents the RACK-TLP loss detection algorithm for TCP.
> > >   RACK-TLP uses per-segment transmit timestamps and selective
> > >   acknowledgements (SACK) and has two parts: RACK ("Recent
> > >   ACKnowledgment") starts fast recovery quickly using time-based
> > >   inferences derived from ACK feedback.  TLP ("Tail Loss Probe")
> > >   leverages RACK and sends a probe packet to trigger ACK feedback to
> > >   avoid retransmission timeout (RTO) events.  Compared to the widely
> > >   used DUPACK threshold approach, RACK-TLP detects losses more
> > >   efficiently when there are application-limited flights of data, lost
> > >   retransmissions, or data packet reordering events.  It is intended to
> > >   be an alternative to the DUPACK threshold approach.
> > >
> > >
> > >
> > >
> > > The file can be obtained via
> > > https://datatracker.ietf.org/doc/draft-ietf-tcpm-rack/
> > >
> > >
> > >
> > > No IPR declarations have been submitted directly on this I-D.
> > >
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > tcpm mailing list
> > > tcpm@ietf.org
> > > https://www.ietf.org/mailman/listinfo/tcpm
> > >
>
> _______________________________________________
> tcpm mailing list
> tcpm@ietf.org
> https://www.ietf.org/mailman/listinfo/tcpm
>