Re: [tcpm] Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard

Neal Cardwell <ncardwell@google.com> Fri, 18 December 2020 14:45 UTC

From: Neal Cardwell <ncardwell@google.com>
Date: Fri, 18 Dec 2020 09:45:28 -0500
To: Markku Kojo <kojo@cs.helsinki.fi>
Cc: Martin Duke <martin.h.duke@gmail.com>, Yuchung Cheng <ycheng@google.com>, Last Call <last-call@ietf.org>, "tcpm@ietf.org Extensions" <tcpm@ietf.org>, draft-ietf-tcpm-rack@ietf.org, Michael Tuexen <tuexen@fh-muenster.de>, draft-ietf-tcpm-rack.all@ietf.org, tcpm-chairs <tcpm-chairs@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/nrP4uBIV6YgyRHV1QqH6jh6ggvE>

On Wed, Dec 16, 2020 at 2:39 PM Markku Kojo <kojo@cs.helsinki.fi> wrote:

> > For (2), the RTO timer is still operative so
> > the RTO recovery rules would still follow.
>
> In short:
> With a non-RACK-TLP implementation, when the timer (RTO) expires,
> cwnd is set to 1 MSS and slow start is entered.
> With a RACK-TLP implementation, when the timer (PTO) expires,
> normal fast recovery is entered (unless PRR is also
> implemented). So there is no RTO recovery, as explicitly stated in Sec. 7.4.1.
>
> This means that this document explicitly modifies standard TCP congestion
> control when there are no acks coming and the retransmission timer
> expires
>
> from: RTO=SRTT+4*RTTVAR (RTO used for arming the timer)
>

It's also worth mentioning this aspect of [RFC6298]:

   (2.4) Whenever RTO is computed, if it is less than 1 second, then the
         RTO SHOULD be rounded up to 1 second.
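
As a rough illustration of how far apart the two timers can be, here is a
minimal Python sketch. It uses the RFC 6298 form RTO = SRTT + max(G, 4*RTTVAR)
with the 1-second floor from rule (2.4), and the simplified PTO =
min(2*SRTT, RTO) as quoted in this thread; the draft's full PTO computation
has additional cases not shown here. The function names, the default clock
granularity, and the sample RTT values are assumptions made for this example.

```python
def rto_seconds(srtt, rttvar, granularity=0.001, min_rto=1.0):
    """RFC 6298 RTO: SRTT + max(G, 4*RTTVAR), rounded up to 1 second (rule 2.4).

    granularity (G) defaults to 1 ms here, which is an assumption.
    """
    return max(min_rto, srtt + max(granularity, 4 * rttvar))


def pto_seconds(srtt, rto):
    """Simplified probe timeout as quoted in this thread: min(2*SRTT, RTO)."""
    return min(2 * srtt, rto)


# Hypothetical path: 100 ms smoothed RTT, 10 ms RTT variance.
srtt, rttvar = 0.100, 0.010
rto = rto_seconds(srtt, rttvar)   # 0.14 s before the floor, so 1.0 s after it
pto = pto_seconds(srtt, rto)      # 2 * 0.1 s = 0.2 s
print(f"RTO = {rto:.3f} s, PTO = {pto:.3f} s")
```

On this hypothetical path the probe timer fires after 0.2 s, while the RTO
timer, because of the 1-second floor, fires after 1 s.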


>        1. RTO timer expires
>        2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one segment
>        3. Ack of rexmit sent in step 2 arrives
>        4. cwnd = cwnd+1 MSS; send two segments
>        ...
>
> to:   PTO=min(2*SRTT,RTO) (PTO used for arming the timer)
>        1. PTO timer expires
>        2. (cwnd=1 MSS); (re)xmit one segment
>

It may be worthwhile to point out here that the RACK-TLP draft does not
specify setting cwnd to 1 at this point, and the Linux TCP implementation
from our team does not do this. The rationale is that at this point there
is no solid evidence that anything has been lost, and setting cwnd to 1 at
this point would make the algorithm more timid than the preceding
approaches, for no good reason.


>        3. Ack of (re)xmit sent in step 2 arrives
>        4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments
>

That step (4) assumes a particular congestion control implementation that
is different than what we would recommend.


> For example, if FlightSize is 100 segments when timer expires,
> congestion control is the same in steps 1-3, but in step 4 the
> current standard congestion control allows transmitting 2 segments,
> while RACK-TLP would allow blasting 50 segments.
>
> Question is: what is the justification to modify standard TCP
> congestion control to use fast recovery instead of slow start for a
> case where timeout is needed to detect loss because there is no
> feedback and ack clock is lost? The draft does not give any
> justification. This clearly is in conflict with items (0) and (1)
> in BCP 133 (RFC 5033).
>

The draft pointedly does not modify standard TCP congestion control.

RACK-TLP does not specify using fast recovery instead of slow start for a
case where timeout is needed to detect loss because there is no feedback
and the ACK clock is lost. Rather, RACK-TLP only triggers fast recovery if
there *is* ACK feedback providing an ACK clock and strong evidence of a
packet loss.

The main aspect of triggering loss recovery that is new is the approach of
allowing a sender to transmit one additional "probe" segment in flight
after 2*SRTT. Once this is accepted, the rest of the recovery process
essentially follows from principles already generally accepted in the IETF
TCP community.

Put another way, it seems to me that if one is to object to TLP-triggered
fast recovery, then the objection must be mounted specifically against the
permission granted to the sender to transmit one additional "probe" segment
in flight after 2*SRTT. Once that permission is granted, there is nothing
really new about TLP-triggered fast recovery.

> Furthermore, there is no implementation nor experimental experience
> evaluating this change. The implementation with experimental experience
> uses PRR (RFC 6937) which is an Experimental specification including a
> novel "trick" that directs PRR fast recovery to effectively use slow
> start in this case at hand.
>

What do you think of the new text for "9.3. Interaction with congestion
control" that Yuchung suggested Thursday afternoon (Dec 17), which
explicitly recommends PRR? As mentioned earlier
in this thread, there is considerable implementation and experimental
experience with RACK-TLP plus PRR since the Linux TCP stack has been using
RACK-TLP with PRR as the default loss recovery algorithm since Linux v4.18
in August 2018. The exact commit is:

  b38a51fec1c1 tcp: disable RFC6675 loss detection
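
To make the quoted 100-segment example concrete, here is a minimal Python
sketch of what the sender may transmit on the first ACK after the timer-driven
(re)transmission, under three policies: slow start after an RTO, a fast
recovery that sends up to ssthresh = FlightSize/2 at once (the behavior the
quoted step 4 assumes), and the per-ACK send count from RFC 6937 with the
slow-start reduction bound (PRR-SSRB). All quantities are in whole segments.
The function names are invented for this example, and the scenario assumes the
ACK SACKs only the probe while the other 99 segments are already marked lost,
so pipe is 0.

```python
import math


def slow_start_after_rto():
    """RTO recovery: cwnd is reset to 1, then grows by 1 on the first ACK."""
    cwnd = 1
    cwnd += 1
    return cwnd


def halving_without_prr(flight_size):
    """Fast recovery as assumed in the quoted step 4: send up to FlightSize/2."""
    return flight_size // 2


def prr_ssrb_sndcnt(prr_delivered, prr_out, delivered_data,
                    pipe, ssthresh, recover_fs, mss=1):
    """Per-ACK send count from RFC 6937, using the PRR-SSRB reduction bound."""
    if pipe > ssthresh:
        # Proportional part: pace the reduction across the recovery.
        return math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
    # Reduction bound: at most one extra segment beyond what was delivered.
    limit = max(prr_delivered - prr_out, delivered_data) + mss
    return min(ssthresh - pipe, limit)


flight_size = 100
ssthresh = flight_size // 2

print(slow_start_after_rto())            # 2
print(halving_without_prr(flight_size))  # 50
print(prr_ssrb_sndcnt(prr_delivered=1, prr_out=0, delivered_data=1,
                      pipe=0, ssthresh=ssthresh,
                      recover_fs=flight_size))  # 2
```

Under these assumptions PRR-SSRB permits 2 segments on that ACK, the same as
slow start after an RTO, rather than 50.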

best,
neal