Re: [tsvwg] question regarding L4S claims

Pete Heist <pete@heistp.net> Fri, 05 June 2020 11:33 UTC

From: Pete Heist <pete@heistp.net>
In-Reply-To: <E2A2330D-DAC3-4189-A5EE-D33F7CE49B90@gmail.com>
Date: Fri, 05 Jun 2020 13:33:14 +0200
Cc: Sebastian Moeller <moeller0@gmx.de>, TSVWG <tsvwg@ietf.org>
Message-Id: <995BC572-4A36-4D1B-9CF8-510DB0108C86@heistp.net>
References: <E3F3E880-0A67-4C6A-8F23-DD7E129E96CD@gmx.de> <E2A2330D-DAC3-4189-A5EE-D33F7CE49B90@gmail.com>
To: Jonathan Morton <chromatix99@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/grZswGrHjdCLd1sngYkViJAUn6I>

> On Jun 4, 2020, at 1:32 PM, Jonathan Morton <chromatix99@gmail.com> wrote:
> 
>> On 4 Jun, 2020, at 12:27 pm, Sebastian Moeller <moeller0@gmx.de> wrote:
>> 
>> 3) 99.9th %ile queuing delay: the assumption here seems to be that only the single AQM hop actually controls all variable RTT jitter. And that 99.9 is a relevant number here, which obviously is debatable.
> 
> I've been wanting to bring up something related to this point for a while, but other matters have previously taken precedence.  My question is: when and under what circumstances does the observed 99.9th-percentile delay occur?

I’m interested in knowing the answer to this as well.

> In the case of fq_codel, tested with a conventional TCP stream, it will obviously be at the termination of slow-start, when the exponential growth (doubling cwnd per RTT) cannot be stopped instantaneously, but continues for one more RTT after the first congestion signal is given (which for CUBIC may be a delay signal interpreted by HyStart).  So the peak queue depth at slow-start termination will tend to be at least the baseline path RTT, and will then quickly settle down to congestion-avoidance mode in which the peak delay is only slightly greater than the Codel target (ie. 5ms).  And I remind readers that fq_codel protects other flows from experiencing the peak delay of one flow's startup.
> 
> The situation with L4S is less obvious, noting that the data presented is very old, and comes from DCTCP rather than any recent version of TCP Prague.  I consider it likely that DCTCP's slow-start is terminated prematurely by L4S' very aggressive AQM interacting with the natural burstiness of an unpaced slow-start, so the early phase is characterised primarily by linear rather than exponential growth.  We might therefore reasonably guess (in the absence of a corresponding time-series plot which could confirm or deny this hypothesis) that the 99.9th-percentile delay quoted for DCTCP corresponds to the congestion-avoidance phase rather than slow-start.
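The slow-start overshoot described in the quoted text for fq_codel can be put into rough numbers with a deliberately crude model. The Python sketch below is an illustration only, not a description of any real stack: the function name, the 1500-byte segment size and the initial window of 10 segments are assumptions, the congestion signal is assumed to fire in the first round where cwnd exceeds the bandwidth-delay product, and the model ignores pacing, HyStart, and the way queueing delay itself stretches the RTT.

```python
def slow_start_peak_queue_delay(rate_bps, base_rtt_s, mss_bytes=1500, init_cwnd=10):
    """Estimate peak queueing delay (seconds) at slow-start exit.

    Simplified model: cwnd doubles once per RTT, the first congestion
    signal is generated in the round where cwnd first exceeds the BDP,
    and the sender doubles once more before it can react.
    """
    bdp_segments = rate_bps * base_rtt_s / (8 * mss_bytes)

    cwnd = init_cwnd
    # Exponential growth until the window no longer fits in the pipe.
    while cwnd <= bdp_segments:
        cwnd *= 2

    # The signal takes one RTT to take effect, so growth continues
    # for one more doubling.
    cwnd *= 2

    # Everything beyond the BDP sits in the bottleneck queue.
    queue_segments = cwnd - bdp_segments
    return queue_segments * 8 * mss_bytes / rate_bps


if __name__ == "__main__":
    # Link rate and path RTT borrowed from the presentation discussed
    # in this message, purely as example inputs.
    delay = slow_start_peak_queue_delay(120e6, 0.010)
    print(f"peak queue delay ~ {delay * 1000:.0f} ms")
```

Under these assumptions a 120 Mbit/s, 10 ms path gives a peak of about 22 ms, which is consistent with the "at least the baseline path RTT" estimate in the quoted text. In this model the result is never below the base RTT, because the window at the time of the signal is already above the BDP and then doubles once more.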

According to the presentation, there are 300 "web-like flows" per second with ECN and 300 without ECN, using an exponential arrival process, with file sizes following a Pareto distribution from 9.1K to 1M, on a 120Mbit link with a 10ms path RTT. I also expect that, at those high arrival rates, flows drop out of slow start early, but this would need to be reproduced to verify it. Flow completion time (FCT) would be interesting to see here for the web-like flows, as well as how performance changes with a few different realistic levels of burstiness and jitter.
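For anyone wanting to reproduce a workload of this shape, here is a minimal Python sketch of the arrival process and size distribution described above. It is not the generator used in the presentation. The function names are hypothetical, the Pareto shape parameter is not given in the presentation so `alpha` is a placeholder the caller must choose, and "9.1K" and "1M" are assumed here to mean 9100 and 1000000 bytes.

```python
import random


def bounded_pareto(rng, alpha, lo, hi):
    """Draw one sample from a Pareto distribution truncated to [lo, hi],
    using inverse-CDF sampling."""
    u = rng.random()
    ratio = (lo / hi) ** alpha
    return lo / (1.0 - u * (1.0 - ratio)) ** (1.0 / alpha)


def generate_flows(flows_per_s, duration_s, alpha, lo, hi, seed=None):
    """Return a list of (start_time_s, size_bytes) tuples.

    Inter-arrival times are exponential (a Poisson arrival process);
    sizes follow the truncated Pareto above.
    """
    rng = random.Random(seed)
    flows = []
    t = rng.expovariate(flows_per_s)
    while t < duration_s:
        flows.append((t, bounded_pareto(rng, alpha, lo, hi)))
        t += rng.expovariate(flows_per_s)
    return flows


def offered_load_bps(flows, duration_s):
    """Average offered load in bits per second over the run."""
    return sum(size for _, size in flows) * 8 / duration_s


if __name__ == "__main__":
    # alpha=1.2 is an arbitrary placeholder, NOT a value from the presentation.
    flows = generate_flows(300, 10.0, alpha=1.2, lo=9100, hi=1_000_000, seed=1)
    load = offered_load_bps(flows, 10.0)
    print(f"{len(flows)} flows, offered load {load / 1e6:.1f} Mbit/s")
```

Comparing the offered load against the 120Mbit link rate for a few values of `alpha` is a quick sanity check on how heavily loaded the bottleneck would be before running a real test. Two independent instances (one per ECN setting) would be needed to match the 300 + 300 flows per second described.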

> If true, the quoted 4.5ms figure would represent only a couple of milliseconds' improvement over the congestion-avoidance latency performance of fq_codel with CUBIC, and an actual deficit of perhaps a millisecond compared to CUBIC-SCE performance with an SCE-enabled AQM, both at default settings.  And of course it is entirely possible to delete the slow-start phase entirely and rely only on the linear or polynomial growth factors, as long as you're willing to accept the greater delay before the path capacity is actually found.  If minimising peak latency is paramount, that might just be a sacrifice you have to accept.

The concern I have is that minimizing the peak latency of capacity-seeking flows in congestion avoidance (CA) isn’t paramount, yet that is what has been used as a KPI in some discussions. A difference of 2-3ms in CA will often be lost among delays from sources other than queueing. More important for capacity-seeking flows is that they achieve high utilization with reasonable delays. Since we’ve seen that burstiness can dramatically affect the utilization of HFCC flows depending on the marking scheme used, and since high capacity is also identified as important in draft-han-iccrg-arvr-transport-problem-01, burst tolerance and utilization need to be balanced for this application.

Delays for both isochronous and VBR UDP flows are also important to consider, because those will generally be latency sensitive, e.g. for VoIP, videoconferencing and interactive applications. As far as I know, those haven’t been tested, so we need to do that, and also look at what impact they’ll have on HFCC flows in a shared low-latency queue, and again, how the marking strategy affects utilization in that case.

Hopefully some of Sebastian’s concerns can be addressed with more realistic traffic patterns and path characteristics, which admittedly is work our own testing needs as well. Reproducing a test similar to the one in that presentation, but with the key variations mentioned above, should be worthwhile.

> But it's pretty clear that remote-rendering VR headset displays is not practical over a network.  It's barely practical even within a single, up-to-date gaming-spec PC, without involving a network link at all.  I can entirely believe a 20ms latency budget for that application, as it's consistent with a 90fps minimum frame rate for VR headsets that I've seen quoted elsewhere.  That equates to 11ms between frames, and the information needed to start rendering that frame has to be prepared in advance.  Just getting data over a USB cable reliably within 9ms is challenging enough!
> 
> - Jonathan Morton
>
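The frame-time arithmetic in the quoted paragraph can be checked in a few lines. This Python sketch only restates the numbers given there (a 20ms budget and a 90fps frame rate); the function names are made up for illustration, and treating the budget as "one frame interval plus whatever is left" is a simplifying assumption taken from the quoted reasoning.

```python
def frame_interval_ms(fps):
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


def remaining_budget_ms(total_budget_ms, fps):
    """Budget left for everything else (input transfer, network, etc.)
    after one frame interval has been spent."""
    return total_budget_ms - frame_interval_ms(fps)


if __name__ == "__main__":
    print(round(frame_interval_ms(90), 1))        # 11.1
    print(round(remaining_budget_ms(20, 90), 1))  # 8.9
```

This matches the roughly 11ms between frames and the roughly 9ms left over that the quoted text mentions.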
