Re: [tcpm] Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard

Markku Kojo <kojo@cs.helsinki.fi> Sat, 19 December 2020 00:41 UTC

Date: Sat, 19 Dec 2020 02:41:31 +0200
From: Markku Kojo <kojo@cs.helsinki.fi>
To: Neal Cardwell <ncardwell@google.com>
cc: Martin Duke <martin.h.duke@gmail.com>, Yuchung Cheng <ycheng@google.com>, Last Call <last-call@ietf.org>, "tcpm@ietf.org Extensions" <tcpm@ietf.org>, draft-ietf-tcpm-rack@ietf.org, Michael Tuexen <tuexen@fh-muenster.de>, draft-ietf-tcpm-rack.all@ietf.org, tcpm-chairs <tcpm-chairs@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/sVrv4sqly31ZN9vFRWSH7oPoDCo>

Hi Neal,

On Fri, 18 Dec 2020, Neal Cardwell wrote:

> 
> 
> On Wed, Dec 16, 2020 at 2:39 PM Markku Kojo <kojo@cs.helsinki.fi> wrote:
>       > For (2), the RTO timer is still operative so
>       > the RTO recovery rules would still follow.
>
>       In short:
>       When the timer (RTO) expires in a non-RACK-TLP implementation,
>       cwnd=1 MSS and slow start is entered.
>       When the timer (PTO) expires in a RACK-TLP implementation,
>       normal fast recovery is entered (unless PRR is also
>       implemented). So no RTO recovery, as explicitly stated in Sec. 7.4.1.
>
>       This means that this document explicitly modifies standard TCP congestion
>       control when there are no acks coming and the retransmission timer
>       expires
>
>       from: RTO=SRTT+4*RTTVAR (RTO used for arming the timer)
> 
> It's also worth mentioning this aspect of [RFC6298]:

Sure.

>    (2.4) Whenever RTO is computed, if it is less than 1 second, then the
>          RTO SHOULD be rounded up to 1 second.
>  
>              1. RTO timer expires
>              2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one segment
>              3. Ack of rexmit sent in step 2 arrives
>              4. cwnd = cwnd+1 MSS; send two segments
>              ...
>
>       to:   PTO=min(2*SRTT,RTO) (PTO used for arming the timer)
>              1. PTO timer expires
>              2. (cwnd=1 MSS); (re)xmit one segment
> 
> 
> It may be worthwhile to point out here that the RACK-TLP draft does not specify setting cwnd
> to 1 at this point, and the Linux TCP implementation from our team does not do this. The

Yes, that's why I put it in parentheses. In my view the RACK-TLP 
draft implicitly limits cwnd to one segment by allowing just one TLP 
probe segment.

> rationale is that at this point there is no solid evidence that anything has been lost, and
> setting cwnd to 1 at this point would make the algorithm more timid than the preceding
> approaches, for no good reason.

Sure, no need to set cwnd at this point.

A good reason could be: no feedback, ACK clock lost? But, of course, 
it is too early, even though after the ACK arrives the sender may well 
modify cwnd again, as it does now if it decides that the loss was 
something other than the probe segment.
   
>              3. Ack of (re)xmit sent in step 2 arrives
>              4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments
> 
> 
> That step (4) assumes a particular congestion control implementation that is different than
> what we would recommend.

Ok. I just used the Standards Track formula, as does the RACK-TLP draft in 
its examples, and because the RACK-TLP draft states that it does not modify 
current congestion control.
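
To make the disputed step 4 concrete, here is a minimal sketch in Python of
the two window calculations exactly as they are written in the quoted steps.
The function names are hypothetical, windows are counted in whole segments,
and the ssthresh floor of 2 segments is taken from RFC 5681. It is an
illustration of the arithmetic only, not code from the draft or from any
implementation, and it takes no side on whether step 4 is the right reading.

```python
# Illustrative arithmetic only; names are made up for this sketch.
# All windows are counted in segments (1 segment = 1 MSS).

def ssthresh_after_loss(flight_size: int) -> int:
    # RFC 5681: ssthresh = max(FlightSize / 2, 2 * SMSS)
    return max(flight_size // 2, 2)

def window_after_rto_and_first_ack(flight_size: int) -> tuple:
    """Quoted RTO steps 1-4: cwnd drops to 1, then slow start adds 1 per ACK."""
    ssthresh = ssthresh_after_loss(flight_size)
    cwnd = 1       # step 2: cwnd = 1 MSS, retransmit one segment
    cwnd += 1      # step 4: ACK of the retransmission arrives, cwnd + 1 MSS
    return cwnd, ssthresh

def window_after_pto_and_first_ack(flight_size: int) -> tuple:
    """Quoted PTO steps 1-4: after the probe is ACKed, cwnd = ssthresh."""
    ssthresh = ssthresh_after_loss(flight_size)
    cwnd = ssthresh  # step 4 as written in the quoted text
    return cwnd, ssthresh

if __name__ == "__main__":
    print(window_after_rto_and_first_ack(100))
    print(window_after_pto_and_first_ack(100))
```

With FlightSize = 100 segments, the first function yields a window of 2
segments and the second a window of 50, which is the gap under discussion.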

>       For example, if FlightSize is 100 segments when timer expires,
>       congestion control is the same in steps 1-3, but in step 4 the
>       current standard congestion control allows transmitting 2 segments,
>       while RACK-TLP would allow blasting 50 segments.
>
>       Question is: what is the justification to modify standard TCP
>       congestion control to use fast recovery instead of slow start for a
>       case where timeout is needed to detect loss because there is no
>       feedback and ack clock is lost? The draft does not give any
>       justification. This clearly is in conflict with items (0) and (1)
>       in BCP 133 (RFC 5033).
> 
> 
> The draft pointedly does not modify standard TCP congestion control.
> 
> RACK-TLP does not specify using fast recovery instead of slow start for a case where timeout
> is needed to detect loss because there is no feedback and the ACK clock is lost. Rather,
> RACK-TLP only triggers fast recovery if there *is* ACK feedback providing an ACK clock and
> strong evidence of a packet loss.

So here our views diverge. In the above steps I decoupled congestion 
control from what segments are sent (rexmit and xmit are mentioned there 
just as comments to check what is going on, they can be freely removed).
Congestion control governs how many segments can be sent.

In my view, when there is no feedback RACK-TLP uses a timeout (PTO) to help 
make progress. Without the timeout it cannot make progress. Just like 
an RFC 5681 sender, it cannot make progress until the timeout expires. 
So this should be taken as the criterion to (effectively) enter slow start 
once loss is detected.

Or, at least, I don't see why a different timeout value would 
change the congestion control.
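
For concreteness, the two timeout values compared in this thread can be
sketched as follows. This is a simplified Python illustration with
hypothetical function names: it uses RTO=SRTT+4*RTTVAR with the 1 second
floor from RFC 6298 (2.4), ignores clock granularity, and uses the
simplified PTO=min(2*SRTT,RTO) form quoted above rather than the full rule
in the draft.

```python
# Simplified timer arithmetic; names are illustrative, values in seconds.

def rto_seconds(srtt: float, rttvar: float, min_rto: float = 1.0) -> float:
    """RTO = SRTT + 4*RTTVAR, rounded up to min_rto (RFC 6298, rule 2.4)."""
    return max(srtt + 4.0 * rttvar, min_rto)

def pto_seconds(srtt: float, rto: float) -> float:
    """PTO = min(2*SRTT, RTO), the simplified form used in this thread."""
    return min(2.0 * srtt, rto)

if __name__ == "__main__":
    srtt, rttvar = 0.1, 0.025          # example values, assumed
    rto = rto_seconds(srtt, rttvar)
    print("RTO:", rto, "PTO:", pto_seconds(srtt, rto))
```

With these example values the RTO is held at the 1 second floor while the
PTO fires after about 0.2 seconds, so the probe timer expires well before
the retransmission timer would.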

When the timeout expires, RACK-TLP sends one segment (just like an RFC 5681 
sender when the RTO expires). The only difference is that an RFC 5681 sender 
selects a different segment (the first unacknowledged segment) to retransmit 
"blindly" in order to get feedback and start the ACK clock. RACK-TLP sends 
"blindly" the last segment from the retransmission queue (or a new 
segment). Selecting a different segment for transmission upon timeout 
does not change anything, in my view. In both cases it is a "blind" 
selection; the sender does not know what was lost. And in both cases the 
ACK for this one segment provides feedback about what potentially has 
been lost. The only difference there is that the segment that RACK-TLP 
selects to transmit is a better choice when the SACK option is in use, 
because it provides more information.
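
The difference in segment selection described above can be sketched as
follows. This is a toy Python model with hypothetical names: segments are
plain integers, `unacked` stands for the retransmission queue in sequence
order, and `unsent` stands for new data waiting to be sent. Window and
pacing checks are deliberately left out.

```python
# Toy model of which single segment is sent when the timer fires.
# Names and data layout are assumptions made for this illustration.

def rto_retransmit_choice(unacked: list) -> int:
    """RTO expiry: retransmit the first unacknowledged segment."""
    return unacked[0]

def tlp_probe_choice(unacked: list, unsent: list) -> int:
    """PTO expiry: send a new segment if there is one, otherwise
    retransmit the last segment in the retransmission queue."""
    if unsent:
        return unsent[0]
    return unacked[-1]

if __name__ == "__main__":
    unacked = [1, 2, 3]
    print(rto_retransmit_choice(unacked))   # first unacknowledged segment
    print(tlp_probe_choice(unacked, []))    # last segment in the queue
    print(tlp_probe_choice(unacked, [4]))   # new data, if available
```

In both functions the choice is made without knowing what was lost; only
the position of the segment in the queue differs.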

If there is some difference in that the ACK for RACK-TLP provides 
stronger evidence for packet loss (and what was lost), then it should be 
also ok to modify the current standard TCP congestion control such that 
upon RTO timeout the sender does not select the first unacknowledged 
segment for blind retransmission but the last segment in the 
retransmission queue (or maybe a new segment). With SACK this would 
provide exactly the same information as TLP probe does. And, upon arrival 
of the first ACK, RTO recovery would use rules similar to those in RACK-TLP 
to better decide whether it was a spurious RTO or a loss, and move from slow 
start to fast recovery and set cwnd=ssthresh.

I really don't see how this change in the "blindly" retransmitted first 
segment in slow start would allow modifying congestion control for RTO 
recovery.

> The main aspect of triggering loss recovery that is new is the approach of allowing a sender
> to transmit one additional "probe" segment in flight after 2*SRTT. Once this is accepted, the
> rest of the recovery process essentially follows from principles already generally accepted
> in the IETF TCP community.

Could you please see above and explain (or provide a pointer to an RFC) 
what those "principles already generally accepted in the IETF TCP 
community" are? That would help me understand your point.

> Put another way, it seems to me that if one is to object to TLP-triggered fast recovery, then
> the objection must be mounted specifically against the permission granted to the sender to
> transmit one additional "probe" segment in flight after 2*SRTT. Once that permission is
> granted, there is nothing really new about TLP-triggered fast recovery.

I am sorry, but I still fail to see what the preceding evidence is that 
makes this not new. A pointer could help.

In my view the probe is nothing to object to as long as it is not 
considered a cwnd increase in the later cwnd and ssthresh calculation 
(a minor detail, but someone might later suggest first two, then four, and 
so on, probe segments with the justification that it is just one more than 
earlier).

>       Furthermore, there is no implementation nor experimental experience
>       evaluating this change. The implementation with experimental experience
>       uses PRR (RFC 6937) which is an Experimental specification including a
>       novel "trick" that directs PRR fast recovery to effectively use slow
>       start in this case at hand.
> 
> 
> What do you think of Yuchung's latest suggestion for new text in "9.3.  Interaction with
> congestion control" suggested by Yuchung Thursday afternoon (Dec 17), which explicitly
> recommends PRR? As mentioned earlier in this thread, there is considerable implementation and
> experimental experience with RACK-TLP plus PRR since the Linux TCP stack has been using
> RACK-TLP with PRR as the default loss recovery algorithm since Linux v4.18 in August 2018.

As I have already indicated, in my view PRR does not have the problem we 
are discussing here because PRR-SSRB makes fast recovery behave like 
slow start. And PRR-CRB is even more conservative. So it would be a safe 
choice for this problem, unlike the current RFC 6675 algorithm.

In other words, I only object to allowing the use of RACK-TLP with the 
unmodified RFC 6675 congestion control algorithm, because it does not have 
a safeguard like PRR. This does not mean that the RACK-TLP document would 
need to include the necessary modifications to the RFC 6675 algorithm.

I don't know process-wise, but PRR possibly cannot be used as a normative 
requirement because it is currently Experimental? I am not quite sure, though.

Best regards,

/Markku

> The exact commit is:
> 
>   b38a51fec1c1 tcp: disable RFC6675 loss detection
> 
> best,
> neal