Re: [tsvwg] Question regarding slide 3 of https://datatracker.ietf.org/meeting/106/materials/slides-106-tsvwg-sessb-31-tcp-prague-status-of-implementation-and-evaluation-00#page=3

Sebastian Moeller <moeller0@gmx.de> Fri, 11 June 2021 10:39 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <AM9PR07MB73131643CC45F6C63EA0B824B9369@AM9PR07MB7313.eurprd07.prod.outlook.com>
Date: Fri, 11 Jun 2021 12:38:52 +0200
Cc: TSVWG <tsvwg@ietf.org>
Message-Id: <E08EFD61-2A81-4098-AFB2-D72E4E7BFFC6@gmx.de>
References: <4D72E5C0-6EA6-4C90-9CD2-A94201806B22@gmx.de> <AM9PR07MB73131643CC45F6C63EA0B824B9369@AM9PR07MB7313.eurprd07.prod.outlook.com>
To: "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/undNo81qGisaS7x-6iuLfJvXmvU>

Hi Koen,

thanks for the information; it is indeed helpful for understanding the details.



> On Jun 9, 2021, at 16:34, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> 
> Hi Sebastian,
> 
> We typically load the system measuring these plots with 1 flow L4S, one flow Classic (up to now labeled 1-1 in the paper), then we add 2 load levels of dynamic traffic (labeled 1h-1h for high load and 1l-1l for low load in the paper). The high load means 100 requests per second per 40Mbps throughput per traffic type (Classic & L4S) and low load would be 10 times less (10 requests per second per 40Mbps per type).

	[SM] Mmmh, I assume that the reported queueing delay CDFs are per category and not only for the single long running flow, is that correct?
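To make the load levels above concrete, here is a minimal sketch of the scaling Koen describes (the `request_rate` helper is my own illustration, not part of the test framework):

```python
# Web-traffic load levels as described above: the "high" (1h) level is
# 100 requests/s per 40 Mbps of link rate, per traffic type (Classic and L4S);
# the "low" (1l) level is 10 times less.
def request_rate(link_mbps, level="high"):
    per_40mbps = 100 if level == "high" else 10  # requests/s per 40 Mbps per type
    return per_40mbps * link_mbps / 40

# For the 120 Mbps link used in these experiments:
print(request_rate(120, "high"))  # 300.0 requests/s per traffic type
print(request_rate(120, "low"))   # 30.0 requests/s per traffic type
```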

> 
> On the iccrg presentation, you see we mention these details:
> ● fixed Ethernet
> So smoothly shaped physical layer

	[SM] How so? As far as I understand Ethernet, a packet is always sent at the negotiated link rate, and shapers basically control how many packets/bytes per unit of time they admit, so the link itself will always be either 100% loaded or idle. I guess what I should ask is: what do you mean by "smoothly shaped", or what link layer would you want to differentiate against here?


> ● long-running TCPs: 1 ECN 1 non-ECN 
> 1 flow of each traffic type
> ● web-like flows @ 300/s ECN, 300/s non-ECN 

	[SM] Mmmh, ECN here means ECT(1) or ECT(0)? I assume ECT(1) at least for the dualQ runs, correct?
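For reference, the codepoints in question are the two low-order ECN bits of the IP traffic-class byte (values per RFC 3168, with ECT(1) repurposed as the L4S identifier in RFC 9331); the small helper below is illustrative:

```python
# ECN field values (low two bits of the IPv4 TOS / IPv6 traffic-class byte).
NOT_ECT = 0b00  # not ECN-capable
ECT_1   = 0b01  # ECT(1): the L4S identifier (RFC 9331)
ECT_0   = 0b10  # ECT(0): classic ECN-capable transport (RFC 3168)
CE      = 0b11  # Congestion Experienced

def ecn_bits(tos_byte):
    """Extract the ECN codepoint from a TOS/traffic-class byte."""
    return tos_byte & 0b11
```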


> So this is for the 1h-1h case, where there are 3 times the 100 requests per second, which means for a link of 3 times 40Mbps = 120Mbps (see further) and not for the 1l-1l which would be for 30 times the 40Mbps
> ● exponential arrival process

	[SM] I am feeling dumb here, but I understand arrival times here as the times the test system actually transmits a packet into the hosts networking stack? And the arrival time depends on the actual network conditions like queueing delay?


> For reproducibility and fair comparison, we use a prepared scenario file that is generated as an exponential distribution with an average arrival rate of 100 requests per second in this (1h) case, played out 3 times as fast
> ● file sizes Pareto distr. α=0.9 1KB min 1MB max
> Again for reproducibility and fair comparison, we use a prepared request size file with the parameters specified

	[SM] Thanks, that seems to have the distribution parameters that Jonathan stated in a later email (from looking at the actual CDF of rs1.txt).
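For anyone trying to reproduce this, the arrival/size model described above could be sketched as follows. This is my own sketch, not the authors' generator: `gen_scenario` and its defaults are illustrative, and the 1 MB cap is implemented by simple clamping rather than a true bounded-Pareto inverse transform:

```python
import random

def gen_scenario(n, rate_per_s=100.0, alpha=0.9, min_b=1_000, max_b=1_000_000, seed=1):
    """Sketch of a request-scenario generator: exponential inter-arrival
    times with the given average rate, and Pareto request sizes with
    shape alpha, clamped to [min_b, max_b] bytes."""
    rng = random.Random(seed)
    t = 0.0
    out = []
    for _ in range(n):
        t += rng.expovariate(rate_per_s)                       # exponential arrival process
        size = min_b * (1.0 - rng.random()) ** (-1.0 / alpha)  # Pareto inverse-CDF sample
        out.append((t, int(min(size, max_b))))                 # clamp at the 1 MB cap
    return out
```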


> ● 120Mb/s 10ms base RTT

	[SM] So at 120 Mbps we have a full-MTU packet rate of:

(120 * 1000 * 1000) / ((1500 + 38) * 8) = 9752.93 packets/second

or

1000 / ((120 * 1000 * 1000) / ((1500 + 38) * 8)) = 0.1025 milliseconds per packet

so at 1-2 ms of queueing delay we are talking about >= 10-20 packets queued up (depending on their size).
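The arithmetic above can be double-checked in a couple of lines (38 bytes being the assumed per-packet Ethernet overhead: preamble, SFD, header, FCS, and interframe gap):

```python
LINK_BPS = 120 * 1000 * 1000  # 120 Mbps
WIRE_BYTES = 1500 + 38        # full MTU plus Ethernet framing overhead

pkts_per_s = LINK_BPS / (WIRE_BYTES * 8)  # ~9752.93 packets/s
ms_per_pkt = 1000.0 / pkts_per_s          # ~0.1025 ms serialization per packet

# 1-2 ms of queueing delay therefore corresponds to roughly:
print(1.0 / ms_per_pkt, 2.0 / ms_per_pkt)  # ~9.75 to ~19.5 full-MTU packets
```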




> For this network throughput and base RTT
> 
> In this slide we grouped 3 different experiments:
> - DualPI2 using for the L4S/ECN flows DCTCP (a good version like the original DCTCP in kernel 3.19 or Prague now for the recent kernels (we didn't maintain DCTCP to be aligned with Prague)) and classic flows Cubic without ECN
> - FQ_Codel using for the ECN flows Cubic with ECN enabled and classic flows Cubic without ECN
> - PIE using for the ECN flows Cubic with ECN enabled and classic flows Cubic without ECN
> So here PIE and FQ_Codel don't use L4S.

	[SM] Okay, but that means that all flows are also subject to fq_codel's default interval of 100 ms and delay target of 5 ms, which we know is not ideal for achieving the lowest queueing delay on a 10 ms RTT path. (Since 100 ms/5 ms are the default values, that is just an observation, not intended as criticism of your measurements.)


> The paper plots are using the same network and traffic conditions (for the first row, while the second row is the 1l-1l case), but with slightly different AQM mechanisms and traffic type combinations.
> The differences in the paper are for the first DualPI2 DCTCP/CUBIC experiment: in the presentation we had the time-shifted FIFO (you can see that the latency of Classic is never later than that of L4S + about 30ms and v.v.), while if the WRR priority scheduler is used, the latencies are completely decoupled (so L4S even lower latencies and Classic not limited, only by their own congestion control).

	[SM] For the latter point, can you give a reference where I can find comparable plots with WRR instead of TS-FIFO?

> For the PIE there shouldn't be much difference besides the typical variance we see below 99.9 percentiles for the classic traffic type (L4S is more consistent up to 99.999).

	[SM] That is expected: traditional PIE has additional burst tolerance, which will increase queueing delay slightly but will also keep utilization and RTT-fairness a bit higher; this is supported by Figure 5 in the paper.

> For FQ_Codel the experiment is completely different, as the ECN-Cubic is replaced by the DCTCP traffic and for ECT(1) we used the immediate 1ms threshold (you are correct we forgot to mention in Table 1, thanks for noticing) instead of the Codel-ECN in the presentation for managing ECN-Cubic.
> 
> Hope this clarifies up to here.
> 
> Probably you wonder why the latency of the L4S is much bigger in case of FQ, compared to DualQ, which is simple to explain:
> - In FQ every flow has its own queue AND AQM with a threshold of 1ms.

	[SM] Is that actually true? As far as I understand https://elixir.bootlin.com/linux/latest/source/include/net/codel_impl.h#L142, a packet's sojourn time is tested against ce_threshold, so isn't the 1 ms threshold universally applied to all ECN-capable packets?
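To illustrate my reading, here is a simplified sketch of the logic around the linked `codel_impl.h` check (not the actual kernel code): the ce_threshold comparison uses one qdisc-wide value, whichever flow queue a packet came from:

```python
# Packets from different flow queues, each with its measured sojourn time (us).
packets = [
    {"flow": 1, "sojourn_us": 400,  "ecn_capable": True},
    {"flow": 2, "sojourn_us": 1500, "ecn_capable": True},
    {"flow": 3, "sojourn_us": 1500, "ecn_capable": False},
]

CE_THRESHOLD_US = 1000  # one value for the whole qdisc, shared by every flow

def marks(pkts, threshold_us=CE_THRESHOLD_US):
    """Sketch of the ce_threshold check: CE-mark any ECN-capable packet
    whose sojourn time exceeds the single qdisc-wide threshold,
    regardless of which flow queue it sat in."""
    return [p["ecn_capable"] and p["sojourn_us"] > threshold_us for p in pkts]

print(marks(packets))  # [False, True, False]
```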



> So all flows are controlled around 1ms, while in L4S they just find usually an empty queue without the 1ms threshold being hit at all, because there is Classic traffic that keeps the L4S flows below the link capacity via the AQM coupling.

	[SM] That would indicate that the 1 ms ce_threshold is not a good fit for the L4S default queueing delay selection, which is not too unexpected, no?



> - When dynamic flows kick in, there will be many Q's that need to be scheduled, plus new flows or "empty-Q" flows will get priority over the 1ms L4S flows. This means that also L4S flows have to wait much more in FQ than in DualQ.

	[SM] Mmmh, the waiting time in FQ depends on the number of detected flows (plus the sparseness boost for new flows; but note that as long as a long-running flow empties its queue in between packet arrivals, it will also receive the sparseness boost).


> First in their own queue of around 1ms and second for their queue getting scheduled in the RR scheduler.

	[SM] Again, I am not sure each flow has its own independent threshold... each flow will have its own independent AQM state, but they all share the same target/interval or ce_threshold.


> Additionally their bandwidth is aggressively varying at a micro scale when flow queues build up and disappear and when flows start and their packets get priority. This causes additional jitter for the FQ-L4S flows.

	[SM] Well, that needs a better measure than overall queueing delay... one of the properties of FQ is that it can rein in misbehaving/slowly-reacting flows while keeping the harm mostly confined to those flows. So the measure to look at would be the RTT variance for the long-running/well-behaved flows, especially under conditions with misbehaving flows in the mix, no? My point is: no individual flow cares about the overall queueing delay CDF of an AQM as long as the intra-flow RTT variation stays small enough.
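The per-flow measure I have in mind could be computed roughly like this (a sketch over hypothetical (flow, RTT) samples, not data from these experiments):

```python
from statistics import pstdev

def intra_flow_jitter(samples):
    """Given (flow_id, rtt_ms) samples, return the per-flow RTT standard
    deviation -- the intra-flow variation each flow actually experiences,
    as opposed to the aggregate queueing-delay CDF across all flows."""
    by_flow = {}
    for flow, rtt in samples:
        by_flow.setdefault(flow, []).append(rtt)
    return {f: pstdev(r) for f, r in by_flow.items() if len(r) > 1}
```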


> So from this comes my insight: use FQ if you think fair rate is most important, use DualQ if you think smooth throughput (low rate jitter) and low latency (jitter) is most important.

	[SM] I am not convinced that the data we are discussing supports such a statement, at least not on an individual flow basis.


> 
> Hope this clarifies the difference between FQ and DualQ (and their results) too, and allows you to reproduce these results.

	[SM] Thanks, I was just trying to better understand what is reported on the slide and in the paper, and your email helped a lot, thanks again.


> Note that these delays are measured per packet and are only the sojourn time between enqueue and dequeue (so an exact queue delay CDF per packet), not any averaged/smoothed delay that is typically used in some test frameworks, which makes it impossible to spot these 99.999-percentile delays.

	[SM] Mmmh, now I am confused: are these the delays only for the long-running flows, or also for the "chaff" flows in the same category?



> 
> See also inline for some of your specific questions.
> 
> Koen.
> 
> -----Original Message-----
> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Sebastian Moeller
> Sent: Wednesday, June 2, 2021 12:13 PM
> To: TSVWG <tsvwg@ietf.org>
> Subject: [tsvwg] Question regarding slide 3 of https://datatracker.ietf.org/meeting/106/materials/slides-106-tsvwg-sessb-31-tcp-prague-status-of-implementation-and-evaluation-00#page=3
> 
> Hi Bob, Koen,
> 
> I recently had a look at https://datatracker.ietf.org/meeting/106/materials/slides-106-tsvwg-sessb-31-tcp-prague-status-of-implementation-and-evaluation-00#page=3 again, and I wonder whether you could share more information about the data/experiments underlaying that figure.
> 
> It looks similar to Figure 7 of https://www.bobbriscoe.net/projects/latency/dctth_journal_draft20190726.pdf, except the individual CDFs look slightly different. Figure7 does not have much of a legend and is also not referenced at all in the manuscript. Could you share the exact test conditions, as well as the modifications to fq_codel you mention in the manuscript as well as the ce_threshold value you used (assuming you used that at all), please?
> 
> [K] You are right, seems we forgot to mention we used a 1ms threshold for FQ_Codel in the manuscript in Table 1. But besides that:
> "(we) used a modified version
> of FQ-CoDel, where L4S support was added by using a shallow
> ECN marking threshold for any ECT(1) packet, with an
> additional check to ensure that the threshold is only applied
> if there is more than 1 packet in the queue. The queue length
> check was added to prevent 100% marking at lower link rates,
> considering that packet serialisation in such cases takes longer"

	[SM] The general approach seems to be to set the minimum delay target/ce_threshold such that it allows at least 1, better 2, full-MTU packets to queue without triggering; otherwise throughput takes an unacceptable hit. Your approach, however, also works and IIRC is close to the original CoDel ideas.
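My paraphrase of the quoted rule as a sketch (not the authors' actual patch; names and units are illustrative):

```python
def l4s_mark(sojourn_ms, queue_len_pkts, threshold_ms=1.0):
    """Sketch of the modified FQ-CoDel rule described above: CE-mark an
    ECT(1) packet only if its sojourn time exceeds the shallow threshold
    AND more than one packet is queued, so that mere serialization delay
    at low link rates does not cause 100% marking."""
    return sojourn_ms > threshold_ms and queue_len_pkts > 1
```

At 120 Mbps a full-MTU packet serializes in about 0.1 ms, but at 10 Mbps it takes about 1.2 ms, so without the queue-length check a lone packet could be CE-marked on serialization delay alone.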



> Also, in that manuscript you seem to have used fq_codel with target 5ms and interval 100ms and compared it against DualPI2 with a 1 ms "ref_delay" target for the L-queue. How does your modified fq_codel stack up in comparison, and how are target and interval interpreted for ECT(1) traffic?
> 
> [K] We really used a 1ms threshold for ECT(1), so there is a direct head-to-head comparison possible in the paper (otherwise the 80th percentile would rather be around 5ms instead of 1ms).

	[SM] Excellent, thanks for clearing this up.


> 
> Best Regards
> 	Sebastian