Re: [tsvwg] Real time low latency video (SCReAM) and L4S

Sebastian Moeller <moeller0@gmx.de> Thu, 10 December 2020 09:56 UTC

From: Sebastian Moeller <moeller0@gmx.de>
Message-Id: <A08F26AD-4060-4536-94D9-6AF8225812F2@gmx.de>
Content-Type: multipart/alternative; boundary="Apple-Mail=_85A87807-CC2F-4E25-9B68-B88C35DA0DCE"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.17\))
Date: Thu, 10 Dec 2020 10:56:05 +0100
In-Reply-To: <HE1PR0701MB229912614342D1316592A1E6C2CB0@HE1PR0701MB2299.eurprd07.prod.outlook.com>
Cc: "tsvwg@ietf.org" <tsvwg@ietf.org>
To: Ingemar Johansson S <ingemar.s.johansson=40ericsson.com@dmarc.ietf.org>
References: <HE1PR0701MB229912614342D1316592A1E6C2CB0@HE1PR0701MB2299.eurprd07.prod.outlook.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/jD9flyhpn4CShiJq7e08oqr-c94>
Subject: Re: [tsvwg] Real time low latency video (SCReAM) and L4S
Precedence: list

Hi Ingemar,

one clarification: pure CoDel is not the state-of-the-art AQM that L4S competes against; at the very least it is fq_codel (or cake), which makes a big difference for multiple concurrent flows. (In your single-flow tests mentioned below that does not matter, but on real links it makes a ton of difference. Also, doing these tests without cross-traffic is not too realistic, is it?) I thought that was obvious, but I should be more explicit in the future. See my anecdotal delay data from cake in the other thread, with 3 concurrent multi-flow loads.


> On Dec 10, 2020, at 10:18, Ingemar Johansson S <ingemar.s.johansson=40ericsson.com@dmarc.ietf.org> wrote:
> 
> Hi
> 
> I write up one post to address a few of the arguments for/against L4S seen from a SCReAM perspective and from the general point of streaming of low latency video.
> I don't expect that the opponents of L4S will become convinced but I hope that it will help others to get some perspective around the matters brought up below. There is a certain risk that this can end up in yet another lengthy thread. I will personally eject from this quite quickly to spare the rest of the audience. 
> 
> 1) "SCReAM, last I checked, has no code to implement the detection heuristic." (for RFC3168 bottlenecks) : 
> SCReAM as well as other rate control algorithms need to have a fall back to delay based congestion control, to at least get a decent behavior for the cases where L4S is not implemented. In addition  a video source has a variable frame size output, not just  because of occasional I frames, P frames also vary in size. This means that realtime low latency applications that strive for a low e2e delay will leave some air for other flows. It is not a given that they will behave like SCReAM but regardless of implementation they all have a rate limited source with a rate that varies a lot over short time scales.
> With SCReAM I currently see the problem that it is a bit too reactive to congestion and that it needs some improvement and an RFC3168 detection algorithm is perhaps needed then. But currently, SCReAM is a pretty weak competitor against flows with infinite source bitrates. 
> But.. instead of just repeating this argument over and over again. Get the SCReAM BW test application from https://github.com/EricssonResearch/scream and run it with the following additional arguments 
> -ect 1 -rand 20 
> The code is public, it is just to fire away in test beds.
> 
> 2) L4S gives very limited gain over CoDel (5ms is only a little more than 1ms). 
> I will address this argument as well as the argument that the congestion control algos chartered in the RMCAT WG can already keep delay low by themselves. The simulated test case is SCReAM running with a test trace from NVENC; the video frames are scaled to match the target rate. In the first experiment I assume that the video encoder is very responsive to changes in target bitrate. In the second experiment a somewhat sluggish behavior is mimicked that makes the video coder react 200ms late; this is a common case in many video encoders (e.g. NVIDIA Jetson Nano and Xavier NX and RPI 3). The bottleneck has a 20ms RTT and a bandwidth that varies between 40 and 15Mbps in steps. The max video bitrate is set to 30Mbps.
> 
> *SCReAM_no_ECN_v_0.0.png : Shows the performance with no ECN support at all. The video frame delay illustrates the delay the end user will experience; the larger the variations, the larger the delay needed in a dejitter buffer to avoid choppy playout.
> *SCReAM_ECT0_CoDel_5ms_100ms_v_0.0.png : Shows the performance with CoDel ECN marking. Slightly better than no ECN but still up to 150ms frame delay spikes.
> *SCReAM_ECT1_L4S_v_0.0.png : Now with L4S. The delay is greatly reduced. What is left is the frame delay due to the serialization delay of the larger video frames. The nominal bitrate is slightly lower than the other two cases above in the congested area (when the bottleneck BW is <= 30Mbps), this is a natural tradeoff between latency and throughput.
> 
> Now we complicate things a bit for the experiment when the video coder is a bit more sluggish in its behavior. Now it takes 200ms for the video coder to respond to a changed target bitrate.

	Well, now try again with rate changes spaced about as far apart as that delay (200ms) instead of multiple seconds...


> * SCReAM_no_ECN_v_0.2.png : As expected one can see that the video frame delays increase yet some more.
> * SCReAM_ECT0_CoDel_5ms_100ms_v_0.2.png : Roughly the same performance with some tendency towards a higher frame delay.

	[SM] Well, over a 20ms RTT link CoDel's default values are not ideal; here 2ms_40ms (target_interval) might be a better match, but you still have the non-high-frequency ECN response to deal with. Before you interpret that as an indication of L4S' inherent superiority, please show the respective behavior over a link with 100ms base RTT. This cuts both ways...
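
	As a rough illustration of the parameter matching above: both pairs mentioned in this thread (5ms/100ms and 2ms/40ms) keep the CoDel target at 5% of the interval. The sketch below scales both values from the base RTT. The helper name and the factor of 2 between RTT and interval are assumptions, chosen only so that a 20ms RTT reproduces the 2ms_40ms suggestion; this is not a tuning recommendation.

```python
def codel_params_for_rtt(base_rtt_ms, interval_factor=2.0, target_percent=5.0):
    """Return (target_ms, interval_ms) for a hypothetical RTT-scaled CoDel setup.

    interval_factor is an assumption (interval = 2 x base RTT), picked so that
    a 20 ms RTT yields the 40 ms interval suggested in this mail.
    target_percent = 5 matches the ratio of both 5ms/100ms and 2ms/40ms.
    """
    interval_ms = base_rtt_ms * interval_factor
    target_ms = interval_ms * target_percent / 100.0
    return target_ms, interval_ms


if __name__ == "__main__":
    # 20 ms is the RTT of the test above, 100 ms the RTT requested for a re-run.
    for rtt_ms in (20, 100):
        target_ms, interval_ms = codel_params_for_rtt(rtt_ms)
        print(f"RTT {rtt_ms} ms -> target {target_ms} ms, interval {interval_ms} ms")
```

	Under these assumptions the stock 5ms/100ms setting corresponds to a 50ms base RTT, which is one way to see why it is a loose fit for a 20ms path.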


> * SCReAM_ECT1_L4S_v_0.2.png : Video frame delay is still very low, despite this added extra video coding artifact.

	[SM] If you look at the data you realize that the latency spikes are missing for one reason only: the headroom that SCReAM over L4S leaves is (accidentally?) large enough that, after the rate decrease steps, there is still enough room for the SCReAM flow to continue at its initial rate. Looking at that graph, trouble will start if rate steps are closer in time than 200ms, or if the rate steps are large enough to actually eat up all the rate undershoot that SCReAM over an L4S bottleneck generates... "Blaming" that on L4S is a bit of a stretch, no?
	To illustrate my point: at around second 50 the undershoot is not sufficient (the plots of video rate in blue and link capacity in red touch) and, lo and behold, we see a prominent spike in the delay plot. Are you sure that the reason L4S leads to superior performance in this test is actually HFCC, and not simply the fact that it underestimates the capacity and hence rarely fills the bottleneck buffers?
	If we look at the temporal overshoot, all plots show a similar temporal response profile; it is just due to the L4S case's lower utilization that overshoot does not translate into latency spikes.
	In short, I do not think the presented data is sufficiently broad to allow extrapolation to the generic case. It is a bit like the fast-track-to-close-CDN examples that have been presented numerous times in the past. I do not doubt that under the tested conditions L4S gives the best low-latency performance, but I do not think your argument for why L4S performs better is actually supported by the data.
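
	The headroom argument can be sketched with a toy fluid model: the link capacity drops at t=0, the sender keeps its old rate for the encoder lag (200ms, as in the second experiment) and only then switches to a lower rate. The function name and all rates in the usage part are hypothetical, picked to resemble the 40/15Mbps steps and 30Mbps cap mentioned above. The model ignores pacing, frame-size variation and the congestion controller's own reaction, so it only shows the direction of the effect.

```python
def peak_queue_delay_ms(rate_before_mbps, rate_after_mbps, capacity_mbps,
                        encoder_lag_ms=200, duration_ms=2000):
    """Peak queuing delay (ms) after a capacity drop at t=0, in 1 ms steps.

    Toy fluid model (an assumption, not SCReAM's algorithm): the sender keeps
    rate_before_mbps for encoder_lag_ms, then sends at rate_after_mbps.
    """
    backlog_kbit = 0.0  # 1 Mbps sustained for 1 ms moves 1 kbit
    peak_ms = 0.0
    for t_ms in range(duration_ms):
        rate = rate_before_mbps if t_ms < encoder_lag_ms else rate_after_mbps
        # The queue grows by the excess over capacity and cannot go negative.
        backlog_kbit = max(0.0, backlog_kbit + (rate - capacity_mbps))
        # kbit divided by Mbps gives milliseconds of queuing delay.
        peak_ms = max(peak_ms, backlog_kbit / capacity_mbps)
    return peak_ms


if __name__ == "__main__":
    # Sender was filling the link up to the 30 Mbps cap before a drop to 15 Mbps.
    print(peak_queue_delay_ms(30, 13, 15))
    # Sender had left enough headroom to already sit below the new capacity.
    print(peak_queue_delay_ms(14, 13, 15))
```

	In this model the second case builds no queue at all, purely because the pre-step rate was already below the post-step capacity, independent of how the bottleneck signals congestion.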

> 
> To conclude. 
> + L4S improves performance a lot.
> + Congestion controls that rely only on delay and packet loss measurements do not give a very low delay.

	[SM] These are decent arguments for HFCC in general and not for L4S specifically, no? That is assuming your tests do not simply demonstrate that paced video streaming over variable-rate links should always leave enough headroom for the rate steps to be expected...


> 
> I hope that this should cast some light on the question whether L4S is useful or not.

	[SM] It does not answer the question, though, of whether L4S is the best or even a reasonable implementation of HFCC.

Again, I am not sure that your conclusion is the only or even the simplest explanation of why L4S-enabled SCReAM seems to perform better in these tests. There is also the added question of what end users would prefer: higher-bitrate, higher-quality video with the occasional glitch, or smoother but lower-quality video.

Best Regards
	Sebastian



> 
> /Ingemar
> ================================
> Ingemar Johansson  M.Sc. 
> Master Researcher
> 
> Ericsson Research
> RESEARCHER
> GFTL ER NAP NCM Netw Proto & E2E Perf
> Labratoriegränd 11
> 977 53, Luleå, Sweden
> Phone +46-1071 43042
> SMS/MMS +46-73 078 3289
> ingemar.s.johansson@ericsson.com
> www.ericsson.com
> 
> Talk about a dream, try to make it real
>                  Bruce Springsteen
> =================================
>

Attachment: SCReAM_no_ECN_v_0.0.png
Attachment: SCReAM_ECT0_CoDel_5ms_100ms_v_0.0.png
Attachment: SCReAM_ECT1_L4S_v_0.0.png
Attachment: SCReAM_no_ECN_v_0.2.png
Attachment: SCReAM_ECT0_CoDel_5ms_100ms_v_0.2.png
Attachment: SCReAM_ECT1_L4S_v_0.2.png
