[tcpPrague] TSO burst sizing causing TCP Prague unfairness on high capacity links ?

Ashutosh Srivastava <as12738@nyu.edu> Thu, 28 May 2020 19:20 UTC

From: Ashutosh Srivastava <as12738@nyu.edu>
Date: Thu, 28 May 2020 15:19:48 -0400
To: tcpprague@ietf.org
Subject: [tcpPrague] TSO burst sizing causing TCP Prague unfairness on high capacity links ?
Hi everyone,

I am a PhD student at the NYU Tandon School of Engineering. Recently, I
have been evaluating low-latency congestion control protocols (such as BBR
and TCP Prague) over high-capacity mmWave wireless links
<http://witestlab.poly.edu/~ffund/pubs/tcp-mmwave.pdf>. We observed
unfairness between TCP Prague flows when running over high-capacity links
(not just wireless, but in general), and I would like to share some of our
findings here.

The plot below shows the throughput share between two competing TCP Prague
flows, with the second flow starting 5 seconds after the first. The
experiment settings were as follows:

   - This experiment was done on the CloudLab testbed
   <https://www.cloudlab.us/> with a 3-node topology (source, router,
   receiver).
   - The bottleneck between the router and receiver was a 1 Gbps wired link
   (10 Gbps interfaces, with capacity restricted to 1 Gbps using the Linux
   traffic shaping tool, tc).
   - The flows were sent using iperf3.
   - The AQM at the router was an FQ qdisc with a single bucket, marking
   packets with ECN at a marking threshold of 5 ms. To replicate this
   setting, use the following parameters with the tc-fq qdisc:
   fq limit 5000p flow_limit 5000p orphan_mask 0 ce_threshold 5ms
   - The RTT scaling and ECN fallback features of TCP Prague were disabled
   for this set of experiments as we ran into some other issues with them.
   - The propagation / base delay of the setup was very low (around 0.4 ms).
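
For scale, the settings above can be turned into a few rough numbers. The
following is a minimal sketch, not part of the original experiment scripts: it
computes the bandwidth-delay product at the base RTT and the backlog that
corresponds to the 5 ms marking threshold when draining at 1 Gbps, and it
assembles a tc command from the fq parameters quoted above. The 1500-byte MTU
and the interface name "eth1" are assumptions, and the 1 Gbps shaping itself
is not covered.

```python
# Figures taken from the setup list above; MTU is an assumption.
LINK_RATE_BPS = 1_000_000_000  # 1 Gbps bottleneck
BASE_RTT_US = 400              # ~0.4 ms base delay
CE_THRESHOLD_US = 5_000        # 5 ms ECN marking threshold
MTU_BYTES = 1500               # assumed standard Ethernet MTU


def bytes_sent(rate_bps: int, duration_us: int) -> int:
    """Bytes a link running at rate_bps carries in duration_us microseconds."""
    return rate_bps * duration_us // 8_000_000


def fq_command(dev: str) -> str:
    """Build a tc command using the fq parameters quoted in the list above."""
    return (
        f"tc qdisc replace dev {dev} root fq limit 5000p flow_limit 5000p "
        "orphan_mask 0 ce_threshold 5ms"
    )


bdp_bytes = bytes_sent(LINK_RATE_BPS, BASE_RTT_US)
mark_bytes = bytes_sent(LINK_RATE_BPS, CE_THRESHOLD_US)

print(f"BDP at base RTT: {bdp_bytes} bytes (~{bdp_bytes // MTU_BYTES} packets)")
print(f"Backlog at marking threshold: {mark_bytes} bytes "
      f"(~{mark_bytes // MTU_BYTES} packets)")
print(fq_command("eth1"))  # "eth1" is a placeholder interface name
```

Under these assumptions the pipe itself holds only a few tens of packets,
while the marking threshold corresponds to a backlog of several hundred.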

[image: Screen Shot 2020-05-28 at 2.07.02 PM.png]

As you can observe, the second flow grabs almost all of the available
bandwidth and the first one is starved. This experiment was done using
commit e741f5a of the TCP Prague Linux kernel implementation (Apr 8, 2020).
After some investigation, we found that something might be broken in the TSO
burst sizing updates done by TCP Prague. I disabled the TSO burst size
updates, reran the experiment with exactly the same settings, and found that
the fairness / convergence was much better this time (see next plot).
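
To give a sense of how burst size relates to this setup, here is a small
sketch of the time the 1 Gbps bottleneck needs to drain a single burst,
compared with the ~0.4 ms base delay. It is only arithmetic, not a diagnosis
of the issue: the burst sizes are illustrative, and the 64 KiB upper value is
an assumption about the largest burst a sender might emit, not a number taken
from the TCP Prague code.

```python
LINK_RATE_BPS = 1_000_000_000  # 1 Gbps bottleneck (from the setup)
BASE_RTT_NS = 400_000          # ~0.4 ms base delay (from the setup)

# Illustrative burst sizes in bytes; 64 KiB is an assumed upper bound.
BURST_SIZES = (4 * 1024, 16 * 1024, 64 * 1024)


def drain_time_ns(burst_bytes: int, rate_bps: int) -> int:
    """Nanoseconds needed to serialize burst_bytes onto a link at rate_bps."""
    return burst_bytes * 8 * 1_000_000_000 // rate_bps


for burst in BURST_SIZES:
    t_ns = drain_time_ns(burst, LINK_RATE_BPS)
    share = t_ns * 100 // BASE_RTT_NS  # drain time as a percentage of base RTT
    print(f"{burst:6d} B burst: {t_ns / 1000:.1f} us ({share}% of base RTT)")
```

With such a low base delay, a single large burst can take longer to drain
than one base RTT, which is why burst sizing matters at this scale.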

[image: Screen Shot 2020-05-28 at 2.10.28 PM.png]

We have not yet investigated or fixed this issue any further. This email is
a follow-up to a meeting we had earlier today with Bob, Koen, and other
members of the TCP Prague team. I would be happy to answer your questions
and comments on these results and to continue the discussion on these
issues.

Also, if you are interested, you can look at the ss data plots (srtt and
cwnd) for these two experiments at this link:

Thank you,

Ashutosh Srivastava
First year PhD student
Department of Electrical and Computer Engineering
NYU Tandon School of Engineering