Re: [tcpm] TLP questions

Neal Cardwell <ncardwell@google.com> Tue, 15 May 2018 03:00 UTC

References: <CY4PR21MB063011EB9ABCD23BABC2EDC0B6990@CY4PR21MB0630.namprd21.prod.outlook.com> <CY4PR21MB0630AF5B03B8C260AD72E366B6990@CY4PR21MB0630.namprd21.prod.outlook.com> <CADVnQyk04js7VaFdUKFYg6h8yE2ZzoDMG_EPeS_hKYb_tnesww@mail.gmail.com> <CY4PR21MB06301845A5898A725A9E5FD5B6980@CY4PR21MB0630.namprd21.prod.outlook.com>
In-Reply-To: <CY4PR21MB06301845A5898A725A9E5FD5B6980@CY4PR21MB0630.namprd21.prod.outlook.com>
From: Neal Cardwell <ncardwell@google.com>
Date: Mon, 14 May 2018 22:59:51 -0400
Message-ID: <CADVnQy=qmEtzU8nekB1ibcyvcLJPScT4_cv9HaY8Z9Dh+vvXJw@mail.gmail.com>
To: Praveen Balasubramanian <pravb@microsoft.com>
Cc: Yuchung Cheng <ycheng@google.com>, "tcpm@ietf.org" <tcpm@ietf.org>, Nandita Dukkipati <nanditad@google.com>, Priyaranjan Jha <priyarjha@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/xWwAQo_OeHSsnQkdeOVDv_EgTsA>
Subject: Re: [tcpm] TLP questions

On Thu, May 10, 2018 at 7:29 PM Praveen Balasubramanian <pravb@microsoft.com> wrote:

> Thanks Neal for the detailed response along with the historical context.
> Looking forward to draft 04 updates.
>
> A few more questions and comments.

Thanks again for another round of excellent questions and comments. :-)

> > then typically RACK will install a timer based on the reordering
> > window, and when that timer fires it will mark some packets lost
> > and enter fast recovery

> The "reordering settling" timer is defined as optional in the draft.
> The Windows implementation currently does not use this timer.
> Until we add such a timer, we plan to not prevent TLP even if SACK
> scoreboard is not empty. However if recovery is triggered, we'll
> cancel the PTO and arm an RTO. Do you see any issues with this
> approach? Since the draft makes the "reordering settling"
> timer optional, I think it should suggest this alternative approach.

That sounds to me like it would work OK. But IMHO it misses the
opportunity to start RACK-based fast recovery quite a bit sooner
by using a reordering timer.

That scenario sounds like:

+ send the original flight of data
+ 1*srtt passes
+ receive 1 or 2 SACKs for packets in that flight, schedule TLP
+ wait 2*srtt
+ TLP timer fires, send TLP
+ 1*srtt passes
+ receive SACK of TLP, initiate RACK-based fast recovery

If there's a reordering timer, this could be:

+ send the original flight of data
+ 1*srtt passes
+ receive 1 or 2 SACKs for packets in that flight, schedule TLP
+ wait 0.25*min_rtt
+ reordering timer fires, initiate RACK-based fast recovery

With the reordering timer, AFAICT recovery begins about 2.75*srtt sooner
(roughly 1.25*srtt instead of 4*srtt, taking min_rtt to be about srtt),
if I correctly noted all the details.
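
The arithmetic behind that estimate can be checked with a short Python
sketch. The function names and the 100 ms example RTT are illustrative
assumptions, not taken from the draft or from any TCP stack; the code
only adds up the steps in the two timelines above, using 2*srtt for the
TLP wait and 0.25*min_rtt for the reordering wait.

```python
def recovery_delay_without_reo_timer(srtt):
    """Time from sending the flight until RACK recovery starts (TLP path)."""
    first_sack = 1 * srtt   # SACKs for the original flight arrive
    tlp_wait = 2 * srtt     # TLP timer fires, probe is sent
    tlp_sack = 1 * srtt     # SACK for the probe arrives
    return first_sack + tlp_wait + tlp_sack


def recovery_delay_with_reo_timer(srtt, min_rtt):
    """Same measurement when a reordering timer starts the recovery."""
    first_sack = 1 * srtt       # SACKs for the original flight arrive
    reo_wait = 0.25 * min_rtt   # reordering timer fires
    return first_sack + reo_wait


# Hypothetical path with srtt == min_rtt == 100 ms.
srtt = min_rtt = 100.0
saved = (recovery_delay_without_reo_timer(srtt)
         - recovery_delay_with_reo_timer(srtt, min_rtt))
print(f"recovery starts {saved / srtt:.2f} * srtt sooner")
```

If min_rtt is noticeably smaller than srtt, the saving in this model
moves closer to 3*srtt.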

> > So the 2ms is to allow for real-world jitter in the network and end
> > hosts

> Currently the Windows implementation doesn't use TLP and RACK for
> connections with < 10 msec RTT. So until we change this logic,
> we will skip adding the 2 ms jitter protection.

Skipping the 2ms jitter protection sounds OK in that case.

Are TLP and RACK skipped in the <10ms case due to the granularity of
the send and ACK timestamps, or the granularity of the timers, or both?
I could imagine that there could be some nice latency improvements from
using TLP and RACK at those low RTTs, as long as any timers that
were scheduled were rounded up appropriately (e.g. by 10ms-20ms).
But I haven't fully thought through all the implications of this kind
of scenario.
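
One possible reading of "rounded up appropriately" is to round every
timeout up to a whole number of timer ticks, so that a coarse timer can
fire late but never early. The following is a minimal sketch under that
assumption; the function name and the 10 ms default tick are
hypothetical and do not describe the Windows or Linux timer code.

```python
import math


def round_up_to_tick(timeout_ms, tick_ms=10):
    """Round a timeout up (never down) to a multiple of the timer tick."""
    return math.ceil(timeout_ms / tick_ms) * tick_ms


# A 0.25 * min_rtt reordering wait on a path with a 4 ms min_rtt is 1 ms,
# which a 10 ms timer would have to stretch to one full tick.
print(round_up_to_tick(0.25 * 4))
print(round_up_to_tick(21))
```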

> > By "a previously unsent segment" we mean basically
> > "the next segment (of MSS or fewer bytes) that the sender
> > would normally send if it had available cwnd at this time

> This is better but still not crisp enough to determine what
> exactly the Linux implementation does. Since you said that
> Linux does not roll back SND.NXT upon RTO, does this imply
> that "a previously unsent segment", is the one starting at SND.NXT?

Yes, for the Linux TLP sender, "a previously unsent segment" is the one
starting at SND.NXT.

BTW, the F-RTO RFC (https://tools.ietf.org/html/rfc5682), in
section 2.1, step 2) b), describes its similar "forward" probe packets
with the phrase "transmit up to two new (previously unsent) segments".
So for consistency with the F-RTO text I have tentatively
changed that line in our internal draft of -04 from:
  "Transmit that new segment"
to:
  "Transmit one new (previously unsent) segment"

It is probably worth noting here that in the Linux TCP stack the sender
would not send a TLP when in the middle of an RTO-triggered recovery.
This corresponds to one of the conditions in the draft, in section
"5.4.1. Phase 1: Scheduling a loss probe": the condition that says:
    " 2.  The connection is not in loss recovery"

So perhaps because of this the different SND.NXT "rewind" behavior of
different TCP stacks upon RTO may not matter ultimately? Or perhaps
there are more implications to unpack. :-)

> > By "the last segment" we mean "the highest-sequence segment
> > (of MSS or fewer bytes) that has already been transmitted
> > and not ACKed or SACKed

> Again given that SND.NXT is not rolled back this would presumably be
> the first UN(S)ACKed MSS or fewer bytes walking back from SND.NXT
> (I am not suggesting an actual walk back, just for illustration).

Yes, exactly.

> I am a bit confused that you used the "or SACKed" since you
> previously required that PTO not be armed if any SACK blocks
> were present. If that is followed, then "the last segment" would
> pretty much always be MSS or fewer bytes just before
> (non-rolled back) SND.NXT.

Yes, you are exactly right. I should have left the "or SACKed" part out of
that line in my e-mail. :-)
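
To make the two definitions concrete, here is a small Python sketch of
the probe choice as discussed in this thread: send one new segment
starting at SND.NXT if there is unsent data, otherwise retransmit the
highest-sequence segment that has been sent and not yet ACKed. The
Segment type, the function name, and the argument layout are
hypothetical. The sketch ignores sequence-number wraparound and the
receive window, and for brevity it folds the "not in loss recovery"
scheduling condition into the same function.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int             # first sequence number of the segment
    length: int          # payload bytes, at most one MSS
    acked: bool = False  # cumulatively ACKed by the receiver


def choose_tlp_probe(sent, unsent, in_loss_recovery):
    """Return ("new", seg), ("rexmit", seg), or None if no probe is sent.

    sent:   segments already transmitted
    unsent: queued segments not yet transmitted; the first one is
            assumed to start at SND.NXT
    """
    if in_loss_recovery:
        return None                      # no TLP during loss recovery
    if unsent:
        return ("new", unsent[0])        # previously unsent segment
    outstanding = [s for s in sent if not s.acked]
    if not outstanding:
        return None                      # nothing in flight to probe
    # "The last segment": highest sequence among un-ACKed segments.
    return ("rexmit", max(outstanding, key=lambda s: s.seq))


sent = [Segment(0, 1000, acked=True), Segment(1000, 1000), Segment(2000, 500)]
print(choose_tlp_probe(sent, [], in_loss_recovery=False))
```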

By the way, regarding retransmitting the "last segment", this discussion
made me realize that AFAICT the draft did not yet discuss why the TLP
retransmission is the last segment. I have added some proposed text to our
internal draft of the -04 rev to try to describe the rationale:

"When the loss probe is a retransmission, the sender uses the
highest-sequence segment sent so far. This is in order to deal with the
retransmission ambiguity problem in TCP. Suppose a sender sends N segments,
and then retransmits the last segment (segment N) as a loss probe, and then
the sender receives a SACK for segment N. As long as the sender waits for
any required RACK reordering settling timer to then expire, it doesn't
matter if that SACK was for the original transmission of segment N or the
TLP retransmission; in either case the arrival of the SACK for segment N
provides evidence that the segments preceding segment N were likely lost."

Of course, comments/suggestions on this proposed paragraph are welcome as
well.
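
The ambiguity argument in the proposed paragraph can be illustrated
with a toy model. It is a simplification made up for this example, not
the RACK algorithm itself: segments are small integers, times are in
milliseconds, and any reordering settling timer is assumed to have
expired already. A segment counts as likely lost if it is not SACKed
and was sent before the transmission that the SACK is attributed to.

```python
def likely_lost(send_time, sacked, sacked_send_time):
    """Un-SACKed segments sent before the SACKed transmission was sent."""
    return sorted(seg for seg, t in send_time.items()
                  if seg not in sacked and t < sacked_send_time)


# Original transmissions of segments 1..4 (N = 4), then a loss probe.
send_time = {1: 0, 2: 1, 3: 2, 4: 3}
probe_time = 230

# Probe retransmits the last segment: both readings of the SACK agree.
from_original = likely_lost(send_time, {4}, send_time[4])
from_probe = likely_lost(send_time, {4}, probe_time)
print(from_original, from_probe)

# Contrast: if the probe had retransmitted segment 2 instead, the two
# readings of a SACK for segment 2 would disagree in this model.
print(likely_lost(send_time, {2}, send_time[2]),
      likely_lost(send_time, {2}, probe_time))
```

In this model every earlier segment was sent before both copies of
segment N, so the inference is the same whichever copy was SACKed.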

Thanks!

neal