Re: [tsvwg] Real time low latency video (SCReAM) and L4S

Sebastian Moeller <> Thu, 10 December 2020 09:56 UTC


Hi Ingemar,

one clarification: pure CoDel is not the state-of-the-art AQM that L4S competes against; at the very least it is fq_codel (or cake), which makes a big difference for multiple concurrent flows. In the single-flow tests you mention below that does not matter, but on real links it makes a ton of difference, and running these tests without cross-traffic is not too realistic either, is it? I thought that was obvious, but I should be more explicit in the future. See my anecdotal delay data from cake in the other thread, with 3 concurrent multi-flow loads.

> On Dec 10, 2020, at 10:18, Ingemar Johansson S <> wrote:
> Hi
> I write up one post to address a few of the arguments for/against L4S seen from a SCReAM perspective and from the general point of streaming of low latency video.
> I don't expect that the opponents of L4S will become convinced but I hope that it will help others to get some perspective around the matters brought up below. There is a certain risk that this can end up in yet another lengthy thread. I will personally eject from this quite quickly to spare the rest of the audience. 
> 1) "SCReAM, last I checked, has no code to implement the detection heuristic." (for RFC3168 bottlenecks) : 
> SCReAM as well as other rate control algorithms need to have a fall back to delay based congestion control, to at least get a decent behavior for the cases where L4S is not implemented. In addition  a video source has a variable frame size output, not just  because of occasional I frames, P frames also vary in size. This means that realtime low latency applications that strive for a low e2e delay will leave some air for other flows. It is not a given that they will behave like SCReAM but regardless of implementation they all have a rate limited source with a rate that varies a lot over short time scales.
> With SCReAM I currently see the problem that it is a bit too reactive to congestion and that it needs some improvement and an RFC3168 detection algorithm is perhaps needed then. But currently, SCReAM is a pretty weak competitor against flows with infinite source bitrates. 
> But.. instead of just repeating this argument over and over again. Get the SCReAM BW test application from and run it with the following additional arguments 
> -ect 1 -rand 20 
> The code is public, it is just to fire away in test beds.
> 2) L4S gives very limited gain over CoDel (5ms is only a little more than 1ms). 
> I will address this argument as well as the argument that the congestion control algos chartered in the RMCAT WG can already keep delay low by themselves. The simulated test case is SCReAM running with a test trace from NVENC; the video frames are scaled to match the target rate. In the first experiment I assume that the video encoder is very responsive to changes in target bitrate; in the second experiment a somewhat sluggish behavior is mimicked that makes the video coder react 200ms late, which is a common case in many video encoders (e.g. NVIDIA Jetson Nano and Xavier NX and RPI 3). The bottleneck has a 20ms RTT and a bandwidth that varies between 40 and 15Mbps in steps. The max video bitrate is set to 30Mbps.
> *SCReAM_no_ECN_v_0.0.png : Shows the performance with no ECN support at all. The video frame delay illustrates the delay the end user will experience; the larger the variations, the larger the delay needed in a dejitter buffer to avoid choppy playout.
> *SCReAM_ECT0_CoDel_5ms_100ms_v_0.0.png : Shows the performance with  CoDel ECN marking.  Slightly better than no ECN but still up to 150ms frame delay spikes
> *SCReAM_ECT1_L4S_v_0.0.png : Now with L4S. The delay is greatly reduced. What is left is the frame delay due to the serialization delay of the larger video frames. The nominal bitrate is slightly lower than the other two cases above in the congested area (when the bottleneck BW is <= 30Mbps), this is a natural tradeoff between latency and throughput.
> Now we complicate things a bit for the experiment when the video coder is a bit more sluggish in its behavior. Now it takes 200ms for the video coder to respond to a changed target bitrate.

	Well, now try again with your period of rate changes similar to the delay instead of multiple seconds...

> * SCReAM_no_ECN_v_0.2.png : As expected one can see that the video frame delays increase yet some more.
> * SCReAM_ECT0_CoDel_5ms_100ms_v_0.2.png : Roughly the same performance with some tendency towards a higher frame delay.

	[SM] Well, over a 20ms RTT link CoDel's default values are not perfect; here 2ms_40ms might be a better match, but you still have the non-high-frequency ECN response to deal with. Before you interpret that as an indication of L4S' inherent superiority, please show the respective behavior over a link with 100ms base RTT. This cuts both ways....
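
	The 5ms_100ms and 2ms_40ms pairs above both keep the target at 5% of the interval; only the interval changes. A minimal sketch of that scaling follows, assuming the interval is taken as twice the base RTT. The factor of 2 is only read off the 2ms_40ms suggestion for a 20ms link, and `codel_params` and its arguments are hypothetical names, not part of any CoDel implementation:

```python
def codel_params(base_rtt_ms, interval_factor=2.0, target_divisor=20):
    """Return (target_ms, interval_ms) scaled to a path's base RTT.

    Assumptions (not from any CoDel implementation):
      - interval = interval_factor * base RTT (factor 2 is a guess
        derived from the 2ms_40ms suggestion for a 20 ms link)
      - target = interval / 20, i.e. 5% of the interval, as in both
        the 5ms_100ms and the 2ms_40ms settings
    """
    interval_ms = interval_factor * base_rtt_ms
    target_ms = interval_ms / target_divisor
    return target_ms, interval_ms


if __name__ == "__main__":
    for rtt in (20, 50, 100):
        target, interval = codel_params(rtt)
        print(f"base RTT {rtt} ms -> target {target} ms, interval {interval} ms")
```

	Under these assumptions a 50ms base RTT maps back to the 5ms_100ms defaults, while a 100ms base RTT would call for a longer interval than the default.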

> * SCReAM_ECT1_L4S_v_0.2.png : Video frame delay is still very low, despite this added extra video coding artifact.

	[SM] If you look at the data you realize that the latency spikes are missing for one reason only: the headroom that SCReAM over L4S leaves is (accidentally?) large enough that the rate decrease steps still leave enough room for the SCReAM flows to continue at their initial rate. Looking at that graph, trouble will start if rate steps come closer together in time than 200ms, or if the rate steps are large enough to actually eat up all the rate undershoot that SCReAM over an L4S bottleneck generates... "Blaming" that on L4S is a bit of a stretch, no?
	To illustrate my point: at around second 50 the undershoot is not sufficient (the plots of video rate in blue and link capacity in red touch) and lo and behold, we see a prominent spike in the delay plot. Are you sure that your hypothesis for why L4S leads to superior performance in this test is actually HFCC, and not simply the fact that it underestimates the capacity and hence rarely fills the bottleneck buffers?
	If we look at the temporal overshoot, all plots show a similar temporal response profile; it is just due to the L4S case's lower utilization that overshoot does not translate into latency spikes.
	In short, I do not think the presented data is sufficiently broad to allow extrapolation to the generic case. It is a bit like the fast-track-to-close-CDN examples that have been presented numerous times in the past. I do not doubt that under the tested conditions L4S gives the best low latency performance, but I do not think your argument for why L4S performs better is actually supported by the data.
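
	The headroom argument can be made concrete with a back-of-the-envelope fluid model. This is only a sketch under stated assumptions: the sender holds its pre-step rate for the full reaction time, the bottleneck buffer is unbounded, and all numbers in the usage example (25 to 20 Mbps step, 95% versus 75% utilization) are hypothetical, not taken from the plots. Only the 200ms reaction time comes from the sluggish-encoder experiment. `delay_spike_ms` is an illustrative name:

```python
def delay_spike_ms(utilization, cap_before_mbps, cap_after_mbps, reaction_ms):
    """Peak queueing delay after a downward capacity step (fluid model).

    The sender transmits at utilization * cap_before_mbps and keeps that
    rate for reaction_ms after the capacity drops to cap_after_mbps.
    Any excess over the new capacity accumulates in the bottleneck queue.
    """
    send_rate_mbps = utilization * cap_before_mbps
    # Excess rate that the bottleneck cannot forward after the step.
    excess_mbps = max(0.0, send_rate_mbps - cap_after_mbps)
    # Backlog built up until the sender finally reduces its rate.
    backlog_mbit = excess_mbps * reaction_ms / 1000.0
    # Time needed to drain that backlog at the new capacity.
    return backlog_mbit / cap_after_mbps * 1000.0


if __name__ == "__main__":
    # Hypothetical step from 25 to 20 Mbps with a 200 ms reaction time.
    for util in (0.95, 0.75):
        spike = delay_spike_ms(util, 25.0, 20.0, 200)
        print(f"utilization {util:.0%}: peak queueing delay ~{spike:.1f} ms")
```

	In this model the sender at 95% utilization builds tens of milliseconds of queue, while the one at 75% sees no spike at all because the step fits inside its headroom, independent of how the congestion signal is delivered.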

> To conclude. 
> + L4S improves performance a lot.
> + Congestion controls that only rely on delay and packet loss measurements do not give a very low delay.

	[SM] These are decent arguments for HFCC in general and not for L4S specifically, no? Assuming that your tests do not simply demonstrate that paced video streaming over variable rate links should always leave enough headroom for the rate steps that are to be expected....

> I hope that this should cast some light on the question whether L4S is useful or not.

	[SM] It does not answer the question though whether L4S is the best or even a reasonable implementation of HFCC. 

Again, I am not sure that your conclusion is the only or even the simplest explanation for why L4S-enabled SCReAM seems to perform better in these tests. There is also the added question of what end users would prefer: higher quality bitrates with the occasional glitch, or smoother but lower quality video.

Best Regards

> /Ingemar
> ================================
> Ingemar Johansson  M.Sc. 
> Master Researcher
> Ericsson Research
> GFTL ER NAP NCM Netw Proto & E2E Perf
> Labratoriegränd 11
> 977 53, Luleå, Sweden
> Phone +46-1071 43042
> SMS/MMS +46-73 078 3289
> Talk about a dream, try to make it real
>                  Bruce Springsteen
> =================================