Re: [tsvwg] What does the low queueing delay that L4S offers actually mean for the latency experienced by an application?

Sebastian Moeller <moeller0@gmx.de> Thu, 28 March 2019 12:01 UTC

From: Sebastian Moeller <moeller0@gmx.de>
To: "Tilmans, Olivier (Nokia - BE/Antwerp)" <olivier.tilmans@nokia-bell-labs.com>
Cc: tsvwg IETF list <tsvwg@ietf.org>
Date: Thu, 28 Mar 2019 13:01:40 +0100
Subject: Re: [tsvwg] What does the low queueing delay that L4S offers actually mean for the latency experienced by an application?
Message-Id: <E96421AF-79E3-45F3-B05B-08F2C0C73607@gmx.de>
In-Reply-To: <AM0PR07MB48194E5195C822DCD842C84DE0590@AM0PR07MB4819.eurprd07.prod.outlook.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/ICC_rVr0QSJ3E8q31UNLT6Plqho>

Hi Olivier,

> On Mar 28, 2019, at 11:35, Tilmans, Olivier (Nokia - BE/Antwerp) <olivier.tilmans@nokia-bell-labs.com> wrote:
> 
> Hi Sebastian,
> 
>> 	Fair enough; it shows, though, that ultra-low queueing delay is not
>> going to magically improve all applications.
> 
> Today's applications "work". Reducing queuing delays may help some of them,
> but latency wasn't their limiting factor in the first place (or we wouldn't say they
> work).

	It is no accident that I used the word "improve" rather than "work".

> 
> Providing an LL service over the public Internet, however, enables a new class
> of applications that otherwise require dedicated/ad-hoc infrastructure.

	Build it and they will come; but what if they never come? I ask this in the context of whether to bet the ECT(1) codepoint on L4S succeeding; beyond that, I see L4S as at least a worthwhile and interesting experiment.


> Dynamic applications such as AR/VR, thin clients, or interactive video are
> example use-cases where the content displayed to the user depends on their
> actions.

	Sure, but all of these are RTT-limited, so all L4S is going to give you is that the end-points can be a bit further away; this will not magically work from Europe to the US west coast (fibre propagation alone over such a distance already costs on the order of 90 ms of RTT).


> If we don't want the user to complain about input lag, motion sickness,
> ... latency has to be brought down significantly (and reliably) without sacrificing
> the quality of the displayed content (e.g., going beyond wireframe/low-res).
> 
> This new service is part of the rationale why L4S' designers think it can get
> deployment traction with ISPs (as opposed to today's status quo with
> classic ECN AQMs).
	
	An ISP will only do this if it allows either lower CapEx/OpEx or higher revenue from its customers. How exactly does L4S help with either, beyond what the latency reduction already delivered by TCP-friendly ECN-marking AQMs promises? Not being an ISP I might not see the forest for the trees, but I am not seeing a qualitatively different deployment scenario than before.

> Other reasons/use-cases are documented in
> draft-ietf-tsvwg-l4s-arch (e.g., using shallow buffers over high-BDP
> links--satellite--is an economic incentive, complementary to the PEPs discussed
> in RFC 3135/RFC 3449).
> 
>> 	Sorry, that does not actually sound true. The encoder might have a
>> better prediction about the path characteristics, but unless it uses an
>> oracle, that prediction might not reflect the characteristics the packets
>> will encounter en route.
> 
> Which is why we care about the tail latencies in all tests/demos/presentations,
> and not the average. Having some bounds on the feedback loop enables one to
> design systems around it.

	All true, and not counter to my argument. Also, "bounds on the feedback loop" is optimistic; I grant you a lower average RTT and even less variation, but that is different from a hard bound, no?

> 
>> But sure, this is an application where reasonably low RTT
>> can be useful, albeit once your RTT exceeds the inter-frame period this
>> example will not be generally useful, no?
> 
> Yes, hence why having bounds on your service quality is key to optimizing
> its deployment (e.g., how much frame time do you allocate to the transmission
> delays, the server-side rendering, ...). Currently, one can only hope that the
> locations where one placed one's service instances are "close enough" to the
> user (as in: we hope the ISP has reasonable queues wrt. where we deployed
> our service).

	That really does not change qualitatively; "close enough" will increase a bit, sure, but L4S still offers no hard bounds (let me mention the overload protection by tail-dropping in the dual-queue coupled AQM draft).


> 
>> 	Well, I would like to see how you achieve the required microsecond
>> synchronization across multiple flows active at the AQM hop, on a > 20 ms
>> path; in my layman's view the ACK-"clock" is not really precise enough for
>> this. I am sure this works for DCTCP in the datacenter, but there the RTT
>> probably is < 1 ms to begin with.
>> 
>>> In other words, past the initial window, you never send more data unless
>>> you receive an ACK that lets you slide your sending window.
>> 
>> 	Sure, but if say 10 more flows start up and hit the AQM bottleneck,
>> the old send timing will be wrong; my question is how long does it take to
>> re-synchronize all L4S flows again to achieve the empty queue?
>> 
>>> All data packets that traverse a given link are "spread" by the bottleneck
>>> ingress router (i.e., they are serialized over that bottleneck link, one
>>> at a time). These packets, even if initially sent at the same time, will
>>> now arrive at different times at their destination (i.e., wake their
>>> corresponding TCP stacks at different moments). This will result in ACKs
>>> also sent at different times (and also serialized at different times by
>>> the bottleneck link). This will cause the next data packets to no longer
>>> be sent at the same time, as the ACKs allowing them to be sent will not be
>>> received simultaneously. Note that this is an over-simplification (see
>>> e.g., [1,2] for more thorough analyses of classic TCP and [3] for DCTCP).
>> 
>> 	Sure, but that just desynchronizes them; for your scheme to work,
>> IMHO you require more than de-phasing: you need each sender to transmit at
>> exactly the point in time at which its packet will basically encounter an
>> empty queue at the AQM.
> 
> I am not certain I understand where this synchronization requirement came
> from, as we're not speaking here of TSN/detnet.

	The synchronisation requirement comes from the fact that once packets bunch up in your queue, the later packets need to wait for the earlier packets to clear; queueing delay is a function of how many bytes need to be pushed out at a given rate before the current packet gets pushed out (I am simplifying here, of course). To guarantee ultra-low queueing delay, _you_ need to make sure all L4S senders at the bottleneck spread their packets such that there is minimal bunching-up in the AQM queue. And that, IMHO, requires a somewhat stricter synchronisation of the senders than the simple de-phasing that will help spread out the packet arrival times of different senders that started fully synchronized.
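
	To put a number on that simplification, a back-of-the-envelope sketch (my own illustration in Python; the numbers are picked to match the figures you give below, not anything taken from the drafts):

    # Queueing delay is, to first order, backlog divided by drain rate.
    def queueing_delay_ms(backlog_bytes, link_rate_bps):
        # Time the most recently queued byte waits before being serialized.
        return backlog_bytes * 8.0 / link_rate_bps * 1000.0

    # Ten 1500-byte packets queued at a 120 Mbit/s bottleneck:
    print(queueing_delay_ms(10 * 1500, 120e6))  # -> 1.0 (ms)

So ten full-size packets at 120 Mbps eat exactly a 1 ms delay budget, and every additional bunched-up packet adds another 0.1 ms.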


> 
> We're not requiring senders to synchronize themselves (although they eventually
> will if running in a stable environment, as shown by prior research); we're
> directly giving them feedback through CE marks if their packets caused the L4S
> queue to build up (typically using a 1 ms step threshold). The L4S queue does
> not have to be empty; it has to contain, e.g., fewer than 10 1500-byte packets
> on a 120 Mbps link.
> Any more than that, and the receiver will echo the CE marks when ACKing
> the received burst of packets, causing the sender to back off linearly wrt. the
> number of marks when sending its next burst.
> Senders are also required to back off if they take more than their fair share
> of BW, i.e., if they receive the random marks due to the buildup of the classic
> queue.

	Agreed on all of this; it does not address my point, though.
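
	Just so we are sure we picture the same mechanism, here is a rough Python sketch of the step marking and the sender response as I read the drafts; the names and the 1 ms constant are illustrative, not the actual dualQ or DCTCP code:

    STEP_THRESHOLD_S = 0.001  # ~1 ms sojourn-time step threshold (illustrative)

    def maybe_mark(packet, sojourn_time_s):
        # Step marking: CE-mark any packet that waited longer than the
        # threshold in the L4S queue.
        if sojourn_time_s > STEP_THRESHOLD_S:
            packet.ce = True

    def next_cwnd(cwnd, acked, marked):
        # DCTCP-style response: reduce in proportion to the fraction of
        # marked packets, rather than halving on any single mark.
        frac = marked / max(acked, 1)
        return cwnd * (1.0 - frac / 2.0)

My question remains how quickly many such control loops settle on a 20 ms+ path when the set of active flows changes.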

> 
>> again I am not sure how this is going to work over a partially
>> congested 20 ms+ link. If you could point me to data showing how L4S deals
>> with this (preferably gracefully, I hope) I would be thankful.
> 
> The throughput you get over a network path with a "partially congested link"
> is the available BW over the most congested link/the one with the smallest rate.
> This is where the AQM has to be (and it is usually the access link).
> It is true that converging at 20 ms is slower than at 1 ms (i.e., DCTCP
> currently ends up under-utilizing the link for flows lasting only a few packets,
> see draft-ietf-tsvwg-ecn-l4s-id §A.2.3).

	So what is the plan to tackle this? Especially in light of competent AQMs in the path, which might not necessarily show a strong correlation between queue build-up and measurable queueing delay?

> 
>> BTW, the cited DCTCP paper says "While DCTCP converges slower than TCP,
>> we found that its convergence rate is no more than a factor 1.4 slower than
>> TCP." while I believe that L4S would need something that converges faster
>> than TCP if ultra-low queueing delay is to be achieved reliably and robustly; I
>> guess I am still missing something...
> 
> This is a known limitation with DCTCP indeed.
> See draft-ietf-tsvwg-ecn-l4s-id §A.2.3. The main challenge there is properly
> estimating the bottleneck BW without overshooting (i.e., building up a
> queue).
> 
> The key here is that the slower convergence causes under-utilization, not
> queue buildup (cf. the DCTCP stability analysis).

	Good, thanks for pointing that out.
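
	As a quick sanity check on the under-utilization point, my own toy numbers (assuming plain additive increase of one MSS per RTT, nothing from the draft):

    MSS = 1500    # bytes (illustrative)
    RTT = 0.020   # a 20 ms path
    RATE = 120e6  # a 120 Mbit/s bottleneck

    bdp_packets = RATE * RTT / (8 * MSS)  # ~200 packets in flight at capacity
    ramp_time_s = bdp_packets * RTT       # +1 packet of cwnd per RTT
    print(bdp_packets, ramp_time_s)       # -> 200.0 packets, ~4.0 s to fill

Four seconds of ramp-up on such a path is under-utilization indeed, while the queue stays shallow the whole time.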

Best Regards
	Sebastian


> 
> 
> Best,
> Olivier