[tcpPrague] TSO burst sizing causing TCP Prague unfairness on high capacity links ?

Ashutosh Srivastava <as12738@nyu.edu> Thu, 28 May 2020 19:20 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpprague/khEqpww9ne99DdsFsIQJN8JCJpU>

Hi everyone,

I am a PhD student at the NYU Tandon School of Engineering. Recently, I
have been working on evaluating
<http://witestlab.poly.edu/~ffund/pubs/tcp-mmwave.pdf> low-latency
congestion control protocols (such as BBR and TCP Prague) over high-capacity
mmWave wireless links. We observed unfairness between TCP Prague flows when
running over high-capacity links (in general, not only wireless ones), and I
would like to share some of our findings here.

The plot below shows the throughput share between two competing TCP Prague
flows, with the second starting 5 seconds after the first. The
experiment settings were as follows:

   - This experiment was done on the CloudLab
<https://www.cloudlab.us/> testbed
   with a 3-node topology (source, router, receiver).
   - The bottleneck between the router and receiver was a 1 Gbps wired link
   (10 Gbps interfaces, with capacity restricted to 1 Gbps using the Linux
   traffic control tool, tc).
   - The flows were sent using iperf3.
   - The AQM at the router was an FQ qdisc with a single bucket, marking
   packets with ECN at a threshold of 5 ms. You can use the
   following parameters with the tc-fq qdisc to replicate this setting: fq
   limit 5000p flow_limit 5000p orphan_mask 0 ce_threshold 5ms
   - The RTT scaling and ECN fallback features of TCP Prague were disabled
   for this set of experiments, as we ran into some other issues with them.
   - The propagation / base delay of the setup was very low (around 0.4 ms).
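
To put these settings in perspective, here is a rough back-of-the-envelope
sketch (not part of the experiment scripts). It converts the 1 Gbps rate, the
roughly 0.4 ms base delay, and the 5 ms ce_threshold into bytes and packets.
It assumes the 0.4 ms figure can be treated as the base RTT and that packets
are 1500 bytes; both are assumptions made for illustration only.

```python
# Rough sizing of the setup described in the list above.
LINK_RATE_BPS = 1_000_000_000   # bottleneck rate from the email (1 Gbps)
BASE_RTT_US = 400               # assumption: the ~0.4 ms base delay is the RTT
CE_THRESHOLD_US = 5_000         # fq ce_threshold from the email (5 ms)
PACKET_BYTES = 1500             # assumption: standard Ethernet-sized packets


def bytes_sent_in(rate_bps: int, interval_us: int) -> int:
    """Bytes a link running at rate_bps carries in interval_us microseconds."""
    return rate_bps * interval_us // (8 * 1_000_000)


# Bandwidth-delay product with an empty queue.
bdp_bytes = bytes_sent_in(LINK_RATE_BPS, BASE_RTT_US)

# Backlog that corresponds to a 5 ms sojourn time at the bottleneck rate,
# i.e. roughly where CE marking starts.
marking_backlog_bytes = bytes_sent_in(LINK_RATE_BPS, CE_THRESHOLD_US)

print(f"base BDP:        {bdp_bytes} bytes "
      f"(~{bdp_bytes / PACKET_BYTES:.0f} packets)")
print(f"marking backlog: {marking_backlog_bytes} bytes "
      f"(~{marking_backlog_bytes / PACKET_BYTES:.0f} packets)")
```

Under these assumptions the base BDP is only a few tens of packets, while the
queue has to grow to several hundred packets before marking begins.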

[image: Screen Shot 2020-05-28 at 2.07.02 PM.png]

As you can observe, the second flow grabs almost all of the
available bandwidth and the first one is starved. This experiment was done
using commit e741f5a of
the TCP Prague Linux kernel implementation (Apr 8, 2020). After some
investigation, we found that there might be something broken in the TSO
burst sizing updates done by TCP Prague. I disabled the TSO burst size
updates, reran the experiment with exactly the same settings, and found that
fairness / convergence was much better this time (see next plot).

[image: Screen Shot 2020-05-28 at 2.10.28 PM.png]
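
For a sense of scale, the sketch below models how large a TSO burst can get
relative to the base BDP in this setup. It is a simplified model of the
autosizing rule in the upstream Linux TCP stack (about 1 ms of data at the
pacing rate, capped by the GSO size limit), not a reproduction of the burst
sizing code in the TCP Prague commit tested above. The function name, the
1448-byte MSS, the 64 KB cap, and the choice of pacing at the full link rate
are all assumptions made for illustration.

```python
# Simplified model of TSO burst autosizing (assumed, not the Prague code).
def tso_autosize_segs(pacing_rate_bytes_per_s: int,
                      mss: int,
                      pacing_shift: int = 10,      # ~1 ms of data (rate / 1024)
                      gso_max_bytes: int = 65536,  # assumed GSO size cap
                      min_segs: int = 2) -> int:
    """Number of MSS-sized segments sent back to back in one TSO burst."""
    burst_bytes = min(pacing_rate_bytes_per_s >> pacing_shift, gso_max_bytes)
    return max(burst_bytes // mss, min_segs)


LINK_RATE_BYTES_PER_S = 1_000_000_000 // 8   # 1 Gbps bottleneck from the email
MSS = 1448                                   # assumption: 1500 MTU, timestamps on
BASE_BDP_BYTES = 50_000                      # 1 Gbps x 0.4 ms (assumed base RTT)

segs = tso_autosize_segs(LINK_RATE_BYTES_PER_S, MSS)
burst_bytes = segs * MSS

print(f"burst: {segs} segments, {burst_bytes} bytes")
print(f"burst / base BDP: {burst_bytes / BASE_BDP_BYTES:.2f}")
```

In this model a single burst at the full link rate is larger than the base
BDP, which at least suggests why burst sizing could matter at such a low base
delay. It does not by itself explain the unfairness shown in the first plot.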

We have not investigated or attempted to fix this issue any further for now;
this email is a follow-up to a meeting we had earlier today with Bob, Koen
and other members of the TCP Prague team. I would be happy to answer your
questions / comments on these results and to continue the discussion on
these issues.

Also, if you are interested, you can look at the ss data plots (srtt and cwnd)
for these two experiments at this link:

Thank you,

Ashutosh Srivastava
First year PhD student
Department of Electrical and Computer Engineering
NYU Tandon School of Engineering