Re: [tcpm] sce vs l4s comparison plots?

"Rodney W. Grimes" <4bone@gndrsh.dnsmgr.net> writes:

> Dave,
> 	Thank you for the links, some short responses inline.

I did a bit more data review of the l4s data today.

https://l4s.cablelabs.com/results/6-dctcp-20191018-214743/6-dctcp_25_50Mbps_0.95_0ms.pdf

has an interesting growth line in the middle series of charts.

Two anecdotes: 1) arguably the internet is so messed up because everyone
has optimized for speedtest's 20s test run rather than a longer
period. 2) I lost much hair to this maddening one: http://blog.cerowrt.org/post/disabling_channel_scans/

It would be good to also have a test that runs for more than a minute,
say 5 or 10, so that a global trend can be seen.
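
For what it's worth, a longer run should mostly be a matter of flent's
-l (length, in seconds) option. A minimal sketch that only builds the
command line and does not run it; the host name and titles are
placeholders I made up, not anything from either testbed:

```python
import shlex

def flent_command(test, host, length_s, title):
    """Build (but do not run) a flent invocation for one test run."""
    return [
        "flent", test,
        "-H", host,            # netperf server to test against (placeholder)
        "-l", str(length_s),   # test length in seconds
        "-t", title,           # free-form title stored with the results
    ]

# Five- and ten-minute rrul runs instead of the default one minute.
for minutes in (5, 10):
    cmd = flent_command("rrul", "netperf.example.org", minutes * 60,
                        f"long-run-{minutes}min")
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch harmless; paste the
resulting line into a shell on a box that actually has flent and a
netperf server to talk to.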

>
> Regards,
> Rod
>
>> 
>> It is really wonderful to have these two groups using the
>> flent tool and producing comparable data on the same test suites in
>> a few cases now, and the testing so far found a couple bugs.
>> 
>> https://github.com/heistp/sce-l4s-bakeoff
>> https://l4s.cablelabs.com/
>> 
>> I do feel rather strongly that any major change to how the internet does
>> congestion control requires a maximum of publicly repeatable
>> benchmarks using open, shared source code. [1]
>> 
>> I do keep hoping for a rrul test on a 15 to 20x1 asymmetric network
>> simulation like how most cablemodems and dsl boxes are configured, and
>> more of the rtt_fair_* tests, and what the heck - the tcp_square_wave
>> test, and any of a dozen others. tcp_nup with 32 flows... heck, hundreds
>> of flows... go to town! Burn some cycles!
>
> I believe Pete has some work scheduled for the (near?) future
> in this area.

I will sleep so much better about both proposals once a rrul test
is run, and then from there accecn can be better evaluated.

thx everybody for your efforts on this!

>> 
>> A few packet caps would be nice to have also, to look
>> over relative mark rates in tcptrace and scetrace.
> I do believe all you seek for Pete's data is there,
> https://github.com/heistp/sce-l4s-bakeoff/#test-output
> specifically items:
> *.flent.gz - standard Flent output file, may be used to view original data or re-generate plots
> *.tcpdump_*.pcap.bz2 - pcap file compressed with bzip2 -9

Groovy. Don't stop doing that. ;)

>> 
>> Anyway... l4steam....
>> 
>> In looking over the l4s.cablelabs site today though, I see the data is
>> only available as .png files, and that the associated flent.gz data
>> is not apparently being published.
>> 
>> ..svgs help. Flent can also be run at a higher sample rate... but
>> 
>> There are a few things about having *.flent.gz files available that
>> would help us look at some things in more detail. Me, I tend to look at
>> comparison cdf plots more than anything else, and secondly bar plots,
>> and lacking that it's hard to make a distinction between a whole bunch of
>> tiny lines at the bottom about overall queue depths and latency
>> distributions, among other things, without squinting, and doing tricks
>> in photoshop to make the two or more datasets "line up".
>> 
>> (there's dozens of plot types in the flent tool, and comparison plots
>> are a snap)
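
To make the point about the raw files concrete: as far as I know a
*.flent.gz file is just gzip-compressed JSON, so latency distributions
can be compared numerically instead of by squinting at pngs. A rough
sketch, assuming a top-level "results" dict that maps series names to
sample lists (None where a sample is missing). The series name you pass
in (something like "Ping (ms) ICMP") is an assumption on my part and
depends on the test that produced the file:

```python
import gzip
import json
import math

def load_series(path, series):
    """Return the non-missing samples of one series from a .flent.gz file.

    Assumes gzip-compressed JSON with a top-level "results" dict mapping
    series names to lists of samples (None where a sample is missing).
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        data = json.load(fh)
    return [v for v in data["results"][series] if v is not None]

def percentile(samples, pct):
    """Nearest-rank percentile; pct is in the range 0..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def compare(paths, series, pcts=(50, 95, 99)):
    """Map each file to its percentile values for the chosen series."""
    return {p: [percentile(load_series(p, series), q) for q in pcts]
            for p in paths}
```

Calling compare() on one SCE file and one L4S file from the same test
gives the median and tail latencies side by side, which is the part of
the cdf plot I usually care about. Flent's own comparison plots remain
the better tool; this only shows that the data is easy to get at once
the files are published.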
>> 
>> By default (when run without -x) flent captures very little metadata
>> about the system it is run on (IP addresses, a couple sysctls, and
>> qdiscs) but it's helpful to have. One example that would be in that
>> metadata, is that I'm unsure if the ns3 data is using an IW4 or IW10?
>> 
>> It sounds like you are rate limiting with htb? (to what quantum?)

?

>> 
>> Another example, in more "native" environments running at a simulated
>> line rate, BQL is quite important to have in the simulation

?

>> also. there's been a couple papers published on BQL's benefits vs a raw
>> txring, thus far, there's a good plot of what it does in fig 6 of:

>> 
>> http://sci-hub.tw/10.1109/LANMAN.2019.8847054
>> 
>> Lastly...
>> 
>> So far as I know ns(X) does not correctly simulate GSO/TSO even when
>> run in DCE mode, but I could be out of date on that. TBF (and cake)

?

>> do break apart superpackets, htb (+ anything, like fq_codel or dualq)
>> do not.
>> 
>> I also have completely forgotten at this point, to what extent the
>> ns3 model of fq_codel matches the deployed reality. It was such a long
>> time ago.... as best as I recall ns3 didn't even have ecn support then!

I looked over the ns3 fq_codel implementation, and (by mark one eyeball)
it appears largely correct - nowadays we increase the codel count in the
bulk dropper, but unless there's a specific flooding test (?) in the
suite, we're not going to hit that. Might be able to fix that once I get
used to writing OOP again.

More important to verify is the actual endpoint setup in the experiment
to make sure a real 5 tuple is in use....
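
Why the 5 tuple matters: fq_codel picks a queue by hashing the flow's
5 tuple, so if a simulation hands every flow the same tuple they all
land in one queue and the "fq" part is never exercised. A toy sketch
of the idea; SHA-256 here is a stand-in I picked for illustration (the
Linux qdisc uses its own keyed flow hash), and the addresses and ports
are made up:

```python
import hashlib

def flow_bucket(src_ip, dst_ip, proto, src_port, dst_port, buckets=1024):
    """Map a 5-tuple to a queue index.

    Stand-in hash for illustration only; not the hash Linux uses.
    1024 is the Linux fq_codel default number of flow queues.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % buckets

def buckets_used(flows, buckets=1024):
    """Count how many distinct queues a set of flows lands in."""
    return len({flow_bucket(*f, buckets=buckets) for f in flows})

# Eight flows with distinct source ports, versus eight "flows" that a
# (hypothetical) misconfigured simulation gave identical tuples.
distinct = [("10.0.0.1", "10.0.0.2", 6, 40000 + i, 5001) for i in range(8)]
identical = [("10.0.0.1", "10.0.0.2", 6, 40000, 5001)] * 8
print(buckets_used(distinct), buckets_used(identical))
```

The identical case always collapses to a single queue; the distinct
case spreads out, barring the occasional hash collision.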

The ns3 codel code matches the linux implementation, also.

I'd still have to spin up the test suite to actually trust it, and
back in 2012 I was pretty sure the cubic implementation differed
significantly from Linux's.

> And apparently is still missing some ECN (RFC3168) support as your
> following link contains some patches:
> 	6.  cd back one level to the top-level ns-3 directory,
> 	and patch the ns-3.30.1 release with a patch provided
> 	in the l4s-evaluation/patches/ directory. This patch
> 	adds support for ECE handling and DCTCP in the base
> 	TCP code, and adds ECN handling to CoDel and FqCoDel models.

Well, I am *happy* to see it was the current release of ns3 + a little
bit. That's a relief.

That said - oy - um, er - this being the first ever use of ECN in
ns3 requires quite a bit of validation. From a quick look at this patch,
simply off the top of my head - it doesn't follow rfc3168's
recommendation in section

6.1.5.  Retransmitted TCP packets

"This document specifies ECN-capable TCP implementations MUST NOT set
   either ECT codepoint (ECT(0) or ECT(1)) in the IP header for
   retransmitted data packets, and that the TCP data receiver SHOULD
   ignore the ECN field on arriving data packets that are outside of the
   receiver's current window.  "...

I think (need to look harder)

... as much as some folk here would like to obsolete that, linux does
actually follow this portion of the spec (I don't know about other OSes),
and it has long been one of my concerns regarding l4s's (and cake's)
interactions with loss and flipping between queues in general.
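
To spell out what that section asks for, here is a small sketch of the
two rules as I read them. It is not taken from the ns3 patch or from
the linux stack; the function names are mine and the window check is
simplified (no sequence number wraparound):

```python
# ECN field values (low two bits of the IP TOS / traffic class byte),
# as defined in RFC 3168.
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11

def sender_ecn_field(ecn_negotiated, is_retransmission, ect=ECT_0):
    """ECN field a sender puts on an outgoing data segment.

    Retransmitted data goes out Not-ECT (the MUST NOT in 6.1.5).
    """
    if not ecn_negotiated or is_retransmission:
        return NOT_ECT
    return ect

def receiver_counts_ce(ecn_field, seq, seg_len, rcv_nxt, rcv_wnd):
    """True if a CE mark on this data segment should be acted on.

    Segments wholly outside the current receive window are ignored
    (the SHOULD in 6.1.5). Simplified: no sequence wraparound.
    """
    in_window = seq < rcv_nxt + rcv_wnd and seq + seg_len > rcv_nxt
    return ecn_field == CE and in_window
```

The consequence for an AQM that classifies on the ECN field is that a
retransmitted segment of an otherwise ECT flow shows up as Not-ECT,
which is exactly the queue-flipping case I worry about.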

>
>> 
>> I'm taking apart
>> https://gitlab.com/tomhend/modules/l4s-evaluation/blob/master/README.md
>> as I write.
>> 
>> thx for the data!
>> 
>> [1] If anyone hasn't read my related rant yet, it's here:
>> https://conferences.sigcomm.org/sigcomm/2014/doc/slides/137.pdf
>> 
>> _______________________________________________
>> tcpm mailing list
>> tcpm@ietf.org
>> https://www.ietf.org/mailman/listinfo/tcpm
>>