Re: [tcpm] TLP questions

Yuchung Cheng <ycheng@google.com> Sat, 19 May 2018 16:28 UTC

From: Yuchung Cheng <ycheng@google.com>
Date: Sat, 19 May 2018 09:27:37 -0700
Message-ID: <CAK6E8=c8zEAB6C-b3AyXYDoKP4gd9=Wh5HRPJeixZbqqXHr4sA@mail.gmail.com>
To: Praveen Balasubramanian <pravb@microsoft.com>
Cc: Neal Cardwell <ncardwell@google.com>, "tcpm@ietf.org" <tcpm@ietf.org>, Nandita Dukkipati <nanditad@google.com>, Priyaranjan Jha <priyarjha@google.com>, Matt Olson <maolson@microsoft.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/HGBkMfozffpmlMXsfyPqEdw44BA>
Subject: Re: [tcpm] TLP questions

On Fri, May 18, 2018 at 4:32 PM, Praveen Balasubramanian
<pravb@microsoft.com> wrote:
>> Are TLP and RACK skipped in the <10ms case due to the granularity of the send and ACK timestamps, or the granularity of the timers, or both?
>> I could imagine that there could be some nice latency improvements from using TLP and RACK at those low RTTs, as long as any timers that were scheduled were rounded up appropriately (e.g. by 10ms-20ms).
>> But I haven't fully thought through all the implications of this kind of scenario.
> Both but primarily due to timer granularity. I'd be interested to know if RACK/TLP in Linux is working well on mobile devices where I assume the timer isn’t as fine grained.
If the loss recovery is based on "time", and the time measurement is
bad, then it simply won't work well.

RACK is designed for modern stacks that are equipped with good clocks
and per-packet timestamps. We can probably make that clearer in the draft.

>
>> It is probably worth noting here that in the Linux TCP stack the sender would not send a TLP when in the middle of an RTO-triggered recovery.
> The Windows stack exits recovery upon an RTO but keeps track of the highest sent sequence number (the SND.NXT value before the rewind). So it can do what Linux is doing post RTO. However, I am not convinced that there is a safety issue with allowing TLP post an RTO. I can think of four cases that triggered the RTO: forward path loss, reverse path (ACK) loss, and a large RTT increase beyond RTO. It seems like if there is further loss for new data in slow start post RTO, then TLP may help in all cases except the RTT increase case. I get the part about implementation convenience to disallow TLP during all recovery, but do you see a safety issue?

I don't see a safety issue. It's certainly worth considering including
the post-timeout phase, especially if the TLP is new data, which avoids
the packet ambiguity issue. Hmm, maybe we can implement that in Linux...

>
> -----Original Message-----
> From: Neal Cardwell [mailto:ncardwell@google.com]
> Sent: Monday, May 14, 2018 8:00 PM
> To: Praveen Balasubramanian <pravb@microsoft.com>
> Cc: Yuchung Cheng <ycheng@google.com>; tcpm@ietf.org; Nandita Dukkipati <nanditad@google.com>; Priyaranjan Jha <priyarjha@google.com>
> Subject: Re: TLP questions
>
> On Thu, May 10, 2018 at 7:29 PM Praveen Balasubramanian <pravb@microsoft.com>
> wrote:
>
>> Thanks Neal for the detailed response along with the historical context. Looking forward to draft 04 updates.
>
>
>
>> A few more questions and comments.
>
> Thanks again for another round of excellent questions and comments. :-)
>
>> > then typically RACK will install a timer based on the reordering
>> > window, and when that timer fires it will mark some packets lost and
>> > enter fast recovery
>
>> The "reordering settling" timer is defined as optional in the draft.
>> The Windows implementation currently does not use this timer.
>> Until we add such a timer, we plan to not prevent TLP even if SACK
>> scoreboard is not empty. However if recovery is triggered, we'll
>> cancel the PTO and arm an RTO. Do you see any issues with this
>> approach? Since the draft makes the "reordering settling"
>> timer optional, I think it should suggest this alternative approach.
>
> That sounds to me like it would work OK. But IMHO it sounds like it's missing out on the opportunity to use a reordering timer to speed up fast recovery quite a bit by initiating a RACK-based recovery using the reordering timer.
>
> That scenario sounds like:
>
> + send the original flight of data
> + 1*srtt passes
> + receive 1 or 2 SACKs for packets in that flight, schedule TLP wait
> + 2*srtt TLP timer fires, send TLP
> + 1*srtt passes
> + receive SACK of TLP, initiate RACK-based fast recovery
>
> If there's a reordering timer, this could be:
>
> + send the original flight of data
> + 1*srtt passes
> + receive 1 or 2 SACKs for packets in that flight, schedule TLP wait
> + 0.25*min_rtt reordering timer fires, initiate RACK-based fast recovery
>
> With the reordering timer, AFAICT it takes about 2.75*srtt less time before the recovery begins, if I correctly noted all the details.
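
The two timelines above can be checked with a small calculation. This is only a sketch: the function names are hypothetical, time is counted from the arrival of the first SACKs, and min_rtt is assumed to be equal to srtt, which is what the "about 2.75*srtt" figure implicitly relies on.

```python
def delay_with_tlp_only(srtt_ms):
    """Time from the first SACKs until recovery starts when only the TLP timer is used."""
    tlp_wait = 2 * srtt_ms       # TLP timer fires 2*srtt after being scheduled
    probe_round_trip = srtt_ms   # one more srtt until the SACK of the TLP arrives
    return tlp_wait + probe_round_trip


def delay_with_reordering_timer(min_rtt_ms):
    """Time from the first SACKs until recovery starts when a reordering timer is used."""
    return 0.25 * min_rtt_ms     # reordering window of 0.25*min_rtt, as in the list above


srtt_ms = 100                    # example value, assumed for illustration
saving_ms = delay_with_tlp_only(srtt_ms) - delay_with_reordering_timer(srtt_ms)
print(saving_ms / srtt_ms)       # saving expressed in units of srtt
```

If min_rtt is noticeably smaller than srtt, the saving moves closer to 3*srtt.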
>
>> > So the 2ms is to allow for real-world jitter in the network and end hosts
>
>> Currently the Windows implementation doesn't use TLP and RACK for
>> connections with < 10 msec RTT. So until we change this logic, we will
>> skip adding the 2 ms jitter protection.
>
> Skipping the 2ms jitter protection sounds OK in that case.
>
> Are TLP and RACK skipped in the <10ms case due to the granularity of the send and ACK timestamps, or the granularity of the timers, or both?
> I could imagine that there could be some nice latency improvements from using TLP and RACK at those low RTTs, as long as any timers that were scheduled were rounded up appropriately (e.g. by 10ms-20ms).
> But I haven't fully thought through all the implications of this kind of scenario.
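
One possible reading of "rounded up appropriately" is that a requested timeout is never allowed to fire earlier than the timer tick can resolve. The sketch below illustrates that reading only; `round_up_to_tick` and `granularity_ms` are hypothetical names, not taken from the draft or from any stack.

```python
import math


def round_up_to_tick(timeout_ms, granularity_ms):
    """Round a requested timeout up to a whole number of timer ticks (at least one)."""
    if granularity_ms <= 0:
        raise ValueError("granularity_ms must be positive")
    # Rounding up (never down) means the timer cannot fire before the
    # requested timeout, at the cost of up to one tick of extra delay.
    ticks = max(1, math.ceil(timeout_ms / granularity_ms))
    return ticks * granularity_ms


# A 2.5 ms reordering timeout on a stack with an assumed 10 ms timer tick.
print(round_up_to_tick(2.5, 10))
```

With coarse ticks the rounded value can be several times the requested timeout, which is the cost being weighed for connections with RTTs below 10 ms.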
>
>> > By "a previously unsent segment" we mean basically "the next segment
>> > (of MSS or fewer bytes) that the sender would normally send if it
>> > had available cwnd at this time
>
>> This is better but still not crisp enough to determine what exactly
>> the Linux implementation does. Since you said that Linux does not roll
>> back SND.NXT upon RTO, does this imply that "a previously unsent
>> segment", is the one starting at SND.NXT?
>
> Yes, for the Linux TLP sender, "a previously unsent segment" is the one starting at SND.NXT.
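
As a rough illustration of the probe choice discussed here: send one new segment starting at SND.NXT if unsent data exists, otherwise retransmit the highest-sequence segment already sent. This is a simplified sketch, not the Linux code; `pick_probe` and all of its parameters are hypothetical, and it ignores the receive window and sequence-number wraparound.

```python
def pick_probe(snd_nxt, write_end, mss, last_sent_start, last_sent_len):
    """Return (kind, start_seq, length) for a single loss probe.

    snd_nxt          next sequence number to send (not rewound on RTO)
    write_end        sequence number just past the data queued by the application
    last_sent_start  start of the highest-sequence segment already transmitted
    last_sent_len    length of that segment (MSS or fewer bytes)
    """
    unsent = write_end - snd_nxt
    if unsent > 0:
        # "One new (previously unsent) segment": starts at SND.NXT.
        return ("new", snd_nxt, min(mss, unsent))
    # No new data available: retransmit the last segment instead.
    return ("retransmit", last_sent_start, last_sent_len)


print(pick_probe(snd_nxt=5000, write_end=9000, mss=1460,
                 last_sent_start=3540, last_sent_len=1460))
```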
>
> BTW, the F-RTO RFC (https://tools.ietf.org/html/rfc5682), in section 2.1 2) b), in describing its similar "forward" probe packets, uses the phrase "transmit up to two new (previously unsent) segments".
> So for consistency with the F-RTO text I have tentatively changed that line in our internal draft of -04 from:
>   "Transmit that new segment"
> to:
>   "Transmit one new (previously unsent) segment"
>
> It is probably worth noting here that in the Linux TCP stack the sender would not send a TLP when in the middle of an RTO-triggered recovery.
> This corresponds to one of the conditions in the draft, in section "5.4.1. Phase 1: Scheduling a loss probe": the condition that says:
>     " 2.  The connection is not in loss recovery"
>
> So perhaps because of this the different SND.NXT "rewind" behavior of different TCP stacks upon RTO may not matter ultimately? Or perhaps there are more implications to unpack. :-)
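
The scheduling conditions mentioned in this thread can be sketched as a simple gate. Only two of them come from the discussion itself (the connection is not in loss recovery, and no SACK blocks are present); the in-flight check, the `ConnState` type and its field names are assumptions made for illustration, and the draft lists further conditions that are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ConnState:
    in_loss_recovery: bool   # fast recovery or RTO-triggered recovery in progress
    sacked_segments: int     # segments currently marked SACKed in the scoreboard
    packets_in_flight: int   # sent but not yet acknowledged


def may_schedule_tlp(conn):
    """Return True if a loss probe timer may be scheduled for this connection."""
    return (
        conn.packets_in_flight > 0        # assumed: something must be outstanding
        and not conn.in_loss_recovery     # condition 2 quoted from the draft
        and conn.sacked_segments == 0     # no SACK blocks present
    )


print(may_schedule_tlp(ConnState(in_loss_recovery=True,
                                 sacked_segments=0,
                                 packets_in_flight=4)))
```

Allowing TLP after an RTO, as discussed above, would amount to relaxing the `in_loss_recovery` check for the post-timeout phase.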
>
>> > By "the last segment" we mean "the highest-sequence segment (of MSS
>> > or fewer bytes) that has already been transmitted and not ACKed or
>> > SACKed
>
>> Again given that SND.NXT is not rolled back this would presumably be the
>> first UN(S)ACKed MSS or fewer bytes walking back from SND.NXT (I am
>> not suggesting an actual walk back, just for illustration).
>
> Yes, exactly.
>
>> I am a bit confused that you used the "or SACKed" since you previously
>> required that PTO not be armed if any SACK blocks were present. If
>> that is followed, then "the last segment" would pretty much always be
>> MSS or fewer bytes just before (non-rolled back) SND.NXT.
>
> Yes, you are exactly right. I should have left the "or SACKed" part out of that line in my e-mail. :-)
>
> By the way, regarding retransmitting the "last segment", this discussion made me realize that AFAICT the draft did not yet discuss why the TLP retransmission is the last segment. I have added some proposed text to our internal draft of the -04 rev to try to describe the rationale:
>
> "When the loss probe is a retransmission, the sender uses the highest-sequence segment sent so far. This is in order to deal with the retransmission ambiguity problem in TCP. Suppose a sender sends N segments, and then retransmits the last segment (segment N) as a loss probe, and then the sender receives a SACK for segment N. As long as the sender waits for any required RACK reordering settling timer to then expire, it doesn't matter if that SACK was for the original transmission of segment N or the TLP retransmission; in either case the arrival of the SACK for segment N provides evidence that the segments preceding segment N were likely lost."
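
The inference in the proposed paragraph can be illustrated with a toy model in which segments are numbered 1..N. This is a sketch only: `likely_lost` is a hypothetical helper, and it deliberately leaves out the RACK reordering settling timer that the text says must expire before segments are actually marked lost.

```python
def likely_lost(sent, sacked):
    """Segments sent before the highest SACKed segment that were not SACKed themselves."""
    if not sacked:
        return []
    highest = max(sacked)
    return [seg for seg in sent if seg < highest and seg not in sacked]


sent = [1, 2, 3, 4, 5]   # N = 5 segments; segment 5 is also retransmitted as the TLP
sacked = {5}             # a SACK for segment 5 arrives

# Whether the SACK was triggered by the original transmission of segment 5 or
# by the TLP retransmission, the conclusion about segments 1..4 is the same.
print(likely_lost(sent, sacked))
```

Probing with the last segment is what makes the result independent of which copy was SACKed; a SACK for a retransmitted earlier segment would say nothing about the segments sent after it.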
>
> Of course, I welcome comments/suggestions about this proposed paragraph as well.
>
> Thanks!
>
> neal
