[tcpPrague] TSO burst sizing causing TCP Prague unfairness on high-capacity links?

Ashutosh Srivastava <as12738@nyu.edu> Thu, 28 May 2020 19:20 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpprague/khEqpww9ne99DdsFsIQJN8JCJpU>

Hi everyone,

I am a PhD student at the NYU Tandon School of Engineering. Recently, I
have been working on evaluating
<http://witestlab.poly.edu/~ffund/pubs/tcp-mmwave.pdf> low-latency
congestion control protocols (such as BBR and TCP Prague) over high-capacity
mmWave wireless links. We observed unfairness between TCP Prague flows when
running over high-capacity links (in general, not only wireless ones), and I
would like to share some of our findings here.

The plot below shows the throughput share between two competing TCP Prague
flows, with the second flow starting 5 seconds after the first. The
experiment settings were as follows:

   - The experiment was run on the CloudLab <https://www.cloudlab.us/> testbed
   with a 3-node topology (source, router, receiver).
   - The bottleneck between the router and the receiver was a 1 Gbps wired
   link (10 Gbps interfaces, with capacity restricted to 1 Gbps using the
   Linux traffic shaping tool tc).
   - The flows were generated with iperf3.
   - The AQM at the router was an FQ qdisc with a single bucket, marking
   packets with ECN at a threshold of 5 ms. To replicate this setting, use
   the following parameters with the tc-fq qdisc: fq limit 5000p
   flow_limit 5000p orphan_mask 0 ce_threshold 5ms
   - The RTT scaling and ECN fallback features of TCP Prague were disabled
   for this set of experiments, as we ran into some other issues with them.
   - The propagation (base) delay of the setup was very low, around 0.4 ms.
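
To put the numbers in this setup side by side, here is a rough
back-of-the-envelope sketch in Python. The link rate, base delay and marking
threshold come from the list above; the 1500-byte MTU and the 64 KiB size of
one TSO burst are assumptions made only for illustration and were not
measured in the experiment. It does not establish a cause for the unfairness;
it only shows the scale of the quantities involved.

```python
# Back-of-the-envelope numbers for the setup described above.
LINK_RATE_BPS = 1e9            # 1 Gbps bottleneck (from the setup)
BASE_RTT_S = 0.4e-3            # ~0.4 ms base delay (from the setup)
CE_THRESHOLD_S = 5e-3          # 5 ms ECN marking threshold (from the setup)
MTU_BYTES = 1500               # ASSUMPTION: standard Ethernet MTU
TSO_BURST_BYTES = 64 * 1024    # ASSUMPTION: size of one large TSO burst


def bytes_in_time(rate_bps: float, seconds: float) -> float:
    """Bytes a link of the given rate carries in the given time."""
    return rate_bps * seconds / 8


# Bandwidth-delay product at the base RTT.
bdp_bytes = bytes_in_time(LINK_RATE_BPS, BASE_RTT_S)
# Queue backlog that corresponds to the marking threshold.
threshold_bytes = bytes_in_time(LINK_RATE_BPS, CE_THRESHOLD_S)
# Time needed to serialize one assumed TSO burst at the bottleneck rate.
tso_serialization_s = TSO_BURST_BYTES * 8 / LINK_RATE_BPS

print(f"base BDP:          {bdp_bytes:.0f} B "
      f"(~{bdp_bytes / MTU_BYTES:.0f} packets)")
print(f"marking threshold: {threshold_bytes:.0f} B "
      f"(~{threshold_bytes / MTU_BYTES:.0f} packets)")
print(f"one TSO burst:     {tso_serialization_s * 1e3:.3f} ms on the wire, "
      f"base RTT is {BASE_RTT_S * 1e3:.1f} ms")
```

Under these assumptions the base BDP is only about 50 kB, so a single
64 KiB burst is larger than the BDP and takes longer to serialize than the
base RTT.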


[image: Screen Shot 2020-05-28 at 2.07.02 PM.png]

As you can observe, the second flow grabs almost all of the
available bandwidth and the first one is starved. This experiment used
commit e741f5a
<https://github.com/L4STeam/linux/commit/e741f5ac756503e27be9c183dd107eadbea40c5c#diff-38ce93325583f02d790276f5cafd1c42>
of the TCP Prague Linux kernel implementation (Apr 8, 2020). After some
investigation, we suspect that something is broken in the TSO burst sizing
updates done by TCP Prague. I disabled the TSO burst size updates, reran
the experiment with exactly the same settings, and found that fairness and
convergence were much better this time (see the next plot).

[image: Screen Shot 2020-05-28 at 2.10.28 PM.png]
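
One way to put a single number on the throughput share in the two plots is
Jain's fairness index. The Python sketch below is only an illustration: the
function name is hypothetical and the input values are made-up examples, not
measurements from these experiments.

```python
def jain_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Equals 1.0 when all flows get the same throughput and 1/n when a
    single flow takes everything.
    """
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if squares == 0:
        raise ValueError("at least one flow must have non-zero throughput")
    return total * total / (len(throughputs) * squares)


# ILLUSTRATIVE values in Mbps, not taken from the plots above.
print(jain_index([500, 500]))   # two flows sharing equally
print(jain_index([1000, 0]))    # one flow completely starved
```

For two flows the index ranges from 0.5 (one flow starved) to 1.0 (equal
share), so it could be computed per time interval from the iperf3 reports
to compare the two runs.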

We have not investigated or tried to fix this issue any further for now;
this email is a follow-up to a meeting we had earlier today with Bob, Koen,
and other members of the TCP Prague team. I would be happy to answer your
questions and comments on these results and to continue the discussion of
these issues.

Also, if you are interested, you can look at the ss data plots (srtt and
cwnd) for these two experiments at this link:
https://drive.google.com/drive/folders/1pLC0dcMF0-M1cgtw9IhoFiYOMvFJ-cc7?usp=sharing
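
For anyone who wants to collect the same kind of data, here is a minimal
Python sketch of pulling srtt and cwnd out of one line of `ss -ti` output.
It relies on the `rtt:<rtt>/<rttvar>` and `cwnd:<n>` fields that ss prints;
the sample line and the function name are hypothetical, and this is not
necessarily how the plots linked above were produced.

```python
import re

# ss prints the smoothed RTT and its variance as "rtt:<rtt>/<rttvar>" (ms).
# The \b keeps "minrtt:" and "rcv_rtt:" from matching.
RTT_RE = re.compile(r"\brtt:([\d.]+)/([\d.]+)")
CWND_RE = re.compile(r"\bcwnd:(\d+)")


def parse_ss_info(line):
    """Return (srtt_ms, cwnd_packets) from an ss info line, or None."""
    rtt = RTT_RE.search(line)
    cwnd = CWND_RE.search(line)
    if rtt is None or cwnd is None:
        return None
    return float(rtt.group(1)), int(cwnd.group(1))


# HYPOTHETICAL sample line, made up for illustration only.
SAMPLE = "prague wscale:7,7 rto:204 rtt:0.412/0.05 mss:1448 cwnd:42 minrtt:0.39"
print(parse_ss_info(SAMPLE))
```

Running ss periodically during a flow and feeding each info line through
such a parser gives the time series needed for srtt and cwnd plots.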

Thank you,

Ashutosh Srivastava
First year PhD student
Department of Electrical and Computer Engineering
NYU Tandon School of Engineering