Re: [tsvwg] What does the low queueing delay that L4S offers actually mean for the latency experienced by an application?

"Tilmans, Olivier (Nokia - BE/Antwerp)" <olivier.tilmans@nokia-bell-labs.com> Thu, 28 March 2019 10:36 UTC

From: "Tilmans, Olivier (Nokia - BE/Antwerp)" <olivier.tilmans@nokia-bell-labs.com>
To: Sebastian Moeller <moeller0@gmx.de>
CC: tsvwg IETF list <tsvwg@ietf.org>
Date: Thu, 28 Mar 2019 10:35:54 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/pqqyaTjBpvZImRIAKrvVAAd3pVA>
Subject: Re: [tsvwg] What does the low queueing delay that L4S offers actually mean for the latency experienced by an application?

Hi Sebastian,

> 	Fair-enough, it shows though that ultra-low queueing delay is not
> going to magically improve all applications.

Today's applications "work". Reducing queueing delays may help some of them,
but for those applications latency was not the limiting factor in the first
place (or we would not say they work).

Providing a low-latency service over the public Internet, however, enables a
new class of applications that would otherwise require dedicated/ad-hoc
infrastructure. Dynamic applications such as AR/VR, thin clients, or
interactive video are example use-cases where the content displayed to the
user depends on their actions. If we do not want users to complain about input
lag, motion sickness, ... latency has to be brought down significantly (and
reliably) without sacrificing the quality of the displayed content (e.g.,
going beyond wireframe/low-res rendering).

This new service is part of the rationale why the L4S designers think it can
gain deployment traction with ISPs (as opposed to today's status quo with
classic ECN AQMs). Other reasons/use-cases are documented in
draft-ietf-tsvwg-l4s-arch (e.g., using shallow buffers over high-BDP links
such as satellite is an economic incentive, complementary to the PEPs
discussed in RFC 3135/RFC 3449).

> 	Sorry that does not sound actually true. The encoder might have a
> better prediction about the path characteristics, but unless it uses an oracle
> that prediction might not reflect the characteristics the packets will
> encounter en-route. 

Which is why we care about the tail latencies in all tests/demos/presentations,
and not the average. Having some bound on the feedback loop makes it possible
to design systems around it.

> But sure this is an application where reasonable low RTT
> can be useful, albeit once your RTT exceeds the inter-frame period this
> example will not be generally useful, no?

Yes, hence having bounds on your service quality is key to optimizing its
deployment (e.g., how much of the frame time do you allocate to transmission
delays, to server-side rendering, ...). Currently, one can only hope that the
locations where the service instances were placed are "close enough" to the
users (as in: we hope the ISP has reasonable queues wrt. where we deployed
our service).
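
As a back-of-the-envelope illustration of that budgeting exercise (the delay
figures below are made-up assumptions, not measurements):

# Rough frame-time budget for a 60 fps cloud-rendered/interactive service.
# All delay figures are illustrative assumptions.
FRAME_PERIOD_MS = 1000.0 / 60      # ~16.7 ms per frame
base_rtt_ms = 8.0                  # propagation to the nearest service instance
queueing_ms = 1.0                  # bound we hope the bottleneck AQM gives us
render_ms = 4.0                    # server-side rendering + encoding

margin_ms = FRAME_PERIOD_MS - (base_rtt_ms + queueing_ms + render_ms)
print(f"Margin left for decode/display: {margin_ms:.1f} ms")

With an unbounded (bufferbloated) queue, the queueing term alone can exceed
the whole frame period, which is why a reliable bound on it is the useful part.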

> 	Well, I would like to see how you achieve the required microsecond
> synchronization across multiple flow active at the AQM hop, on a > 20 ms
> path, in my layman's view the ACK-"clock" is not really precise enough for
> this. I am sure this works for DCTCP in the datacenter, but there RTT probably
> is < 1 ms to begin with.
>
> > In other words, past the initial window, you never send more data unless
> > you receive an ACK that lets you slide your sending window.
> 
> 	Sure, but if say 10 more flows start up and hit the AQM bottleneck,
> the old send timing will be wrong; my question is how long does it take to re-
> synchronize all L4S flows again to achieve the empty queue?
> 
> >
> > All data packets that traverse a given link are "spread" by the bottleneck
> > ingress router (i.e., they are serialized over that bottleneck link, one at
> > a time). These packets, even if initially sent at the same time, will now
> > arrive at different times at their destination (i.e., wake their
> > corresponding TCP stacks at different moments). This will result in ACKs
> > also sent at different times (and also serialized at different times by the
> > bottleneck link). This will cause the next data packets to no longer be
> > sent at the same time, as the ACKs allowing them to be sent will not be
> > received simultaneously. Note that this is an over-simplification (see
> > e.g., [1,2] for more thorough analyses of classic TCP and [3] for DCTCP).
> 
> 	Sure, but that just desynchronizes them, for you scheme to work
> IMHO you require more than de-phasing, you need each sender to send at
> that point in time that this packet will basically encounter an empty queue at
> the AQM

I am not certain I understand where this microsecond-synchronization
requirement comes from, as we are not speaking here of TSN/DetNet.

We are not requiring senders to synchronize themselves (although they
eventually will if running in a stable environment, as shown by prior
research); we directly give them feedback through CE marks if their packets
caused the L4S queue to build up (typically using a 1 ms step threshold). The
L4S queue does not have to be empty; it has to contain, e.g., fewer than ten
1500-byte packets on a 120 Mbps link. Any more than that, and the receiver
will echo the CE marks when ACKing the received burst of packets, causing the
sender to back off linearly wrt. the number of marks when sending its next
burst.
Senders are also required to back off if they take more than their fair share
of the bandwidth, i.e., if they receive the random marks due to the build-up
of the Classic queue.
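
For the numbers above, the step threshold translates into a queue depth as
follows (a sketch of the arithmetic only, not of any particular AQM
implementation):

# 1 ms sojourn-time threshold expressed as a packet count on a 120 Mbps link.
link_rate_bps = 120e6          # 120 Mbit/s access link
pkt_size_bits = 1500 * 8       # 1500-byte packets
threshold_s = 1e-3             # 1 ms step-marking threshold

threshold_pkts = threshold_s * link_rate_bps / pkt_size_bits
print(f"CE marking starts above ~{threshold_pkts:.0f} packets")  # ~10 packets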

> again I am not sure how this is going to work over a partially
> congested 20ms+ link. If you could point me to data showing how L4S deals
> with this (preferable gracefully, I hope) I would be thankful.

The throughput you get over a network path with a "partially congested link"
is the available bandwidth over the most congested link, i.e., the one with
the smallest rate. This is where the AQM has to be (and it is usually the
access link). It is true that converging over a 20 ms RTT is slower than over
1 ms (i.e., DCTCP currently ends up under-utilizing the link for flows lasting
only a few packets, see draft-ietf-tsvwg-ecn-l4s-id §A.2.3).

> BTW, the cited DCTCP paper says "While DCTCP converges slower than TCP,
> we found that its convergence rate is no more than a factor 1.4 slower than
> TCP." while I believe that L4S would need something that converges faster
> than TCP if ultra-low queueing delay is to be achieved reliably and robustly; I
> guess I am still missing something...

This is indeed a known limitation of DCTCP; see draft-ietf-tsvwg-ecn-l4s-id
§A.2.3. The main challenge there is properly estimating the bottleneck
bandwidth without overshooting (i.e., building up a queue).

The key point is that the slower convergence causes under-utilization, not
queue build-up (cf. the DCTCP stability analysis).
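
As a rough illustration of why that is (a simplified per-RTT sketch of a
DCTCP-like reaction, not the actual algorithm; see the DCTCP paper for the
real analysis):

# Simplified DCTCP-style sender reaction, once per RTT.
# g = 1/16 is the DCTCP default gain; everything else is schematic.
def dctcp_update(cwnd, alpha, marked_fraction, g=1.0 / 16):
    # alpha: moving average of the fraction of CE-marked packets per RTT
    alpha = (1 - g) * alpha + g * marked_fraction
    if marked_fraction > 0:
        cwnd *= 1 - alpha / 2      # gentle, proportional decrease
    else:
        cwnd += 1                  # additive increase, as in Reno
    return cwnd, alpha

Because the decrease is proportional to alpha (usually well below 1), a flow
that overshoots is trimmed back within an RTT or two, while a newly started
flow only gains about one packet of window per RTT; the slack therefore shows
up as spare capacity rather than as a standing queue.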


Best,
Olivier