Re: [tcpPrague] TCP Prague under-utilising capacity when RTT scaling turned on

"Tilmans, Olivier (Nokia - BE/Antwerp)" <olivier.tilmans@nokia-bell-labs.com> Wed, 10 June 2020 10:27 UTC

From: "Tilmans, Olivier (Nokia - BE/Antwerp)" <olivier.tilmans@nokia-bell-labs.com>
To: Ashutosh Srivastava <as12738@nyu.edu>, "tcpprague@ietf.org" <tcpprague@ietf.org>
Date: Wed, 10 Jun 2020 10:27:42 +0000
Message-ID: <AM0PR07MB46414BF56843D476FCB4FA12E0830@AM0PR07MB4641.eurprd07.prod.outlook.com>
References: <CAJyCXaYXbMwrejcNTLv6bxSmf3L0hGekqF4Ddui8=yxba2b-iA@mail.gmail.com>
In-Reply-To: <CAJyCXaYXbMwrejcNTLv6bxSmf3L0hGekqF4Ddui8=yxba2b-iA@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpprague/v1vPL3wuc1MJ1pCxGtPCxHmH2JE>
Subject: Re: [tcpPrague] TCP Prague under-utilising capacity when RTT scaling turned on

Hi Ashutosh,

 > This is about an issue I found with the recently added RTT scaling feature
 > of the TCP Prague Linux kernel implementation. This was first observed with
 > commit 7f267bd
 > <https://github.com/L4STeam/linux/commit/7f267bd591c249697f63f81b21a1755202e86cc2#diff-38ce93325583f02d790276f5cafd1c42>
 > (Mar 16, 2020), where RTT scaling was turned on by default, and also with
 > future commits when we explicitly enable RTT scaling.
 > 
 > The plot below shows the throughput of a single TCP Prague flow. Although
 > the available capacity was 1Gbps, TCP Prague is not able to reach that
 > point anytime during the experiment.

Thanks again for this report. The most inexplicable behavior here is the
apparent cwnd reductions, which happen while the connection is
under-utilizing the link (and so should not be receiving CE marks).

I am unfortunately unable to reproduce this behavior on a physical or emulated
testbed, so I would need additional information to work through this issue.

Could you confirm that this also happens with the tip of the current
branch? I.e., is 3b63cc0 among the "future commits" you are referring to above?

In addition to the ss plots, could you:
- Confirm that this happens with both the internal TCP pacing and the fq
qdisc as pacer on the data sender?
- Confirm that this happens both with and without gro/gso/tso/lro on the
endhosts (data sender and receiver, and also the AQM node, as fq does not do
gro splitting prior to enqueue)?
- Confirm that the problem persists if you increase the base RTT? A simple
netem qdisc on the reverse path adding 2-5 ms should be sufficient. Do not
forget to disable gro/gso, as they interact poorly with netem.
- Log the CE marks reported by the AQM, as well as the
delivered_ce/received_ce counters throughout the experiment on the data
sender/receiver?
- Log any drops (dropped counter on the AQM, and retransmission
counters on the data sender)?
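
The checks above amount to a small test matrix (pacer x offloads x added
delay). As a rough sketch only, the Python below prints the commands for each
combination. The interface names, the pfifo_fast root qdisc used to force
internal pacing, and the choice of 0/2/5 ms delays are assumptions for
illustration, not details of the reported testbed. Combinations with netem
and offloads both enabled are skipped, since those interact poorly.

```python
from itertools import product

# Hypothetical interface names; replace with the ones on your testbed.
SENDER_DEV = "eth0"    # data sender, towards the router
RECEIVER_DEV = "eth0"  # data receiver, towards the router (reverse path)
ROUTER_DEV = "eth1"    # AQM node, bottleneck interface


def offload_cmd(dev, enabled):
    """ethtool command toggling GRO/GSO/TSO/LRO together on one interface."""
    state = "on" if enabled else "off"
    return f"ethtool -K {dev} gro {state} gso {state} tso {state} lro {state}"


def pacing_cmd(dev, use_fq):
    """fq as root qdisc paces in the qdisc; without fq, TCP is expected to
    fall back to its internal pacing. pfifo_fast is an assumed stand-in
    for 'any non-fq root qdisc'."""
    kind = "fq" if use_fq else "pfifo_fast"
    return f"tc qdisc replace dev {dev} root {kind}"


def build_matrix(delays_ms=(0, 2, 5)):
    """All pacer/offload/delay combinations, minus netem with offloads on."""
    configs = []
    for use_fq, offloads, delay in product((True, False), (True, False), delays_ms):
        if offloads and delay > 0:
            continue  # gro/gso should be disabled when netem is in use
        configs.append({"fq_pacing": use_fq, "offloads": offloads,
                        "extra_delay_ms": delay})
    return configs


def commands_for(cfg):
    """Setup and logging commands for one configuration, grouped by host."""
    cmds = [
        offload_cmd(SENDER_DEV, cfg["offloads"]),     # run on sender
        offload_cmd(RECEIVER_DEV, cfg["offloads"]),   # run on receiver
        offload_cmd(ROUTER_DEV, cfg["offloads"]),     # run on AQM node
        pacing_cmd(SENDER_DEV, cfg["fq_pacing"]),     # run on sender
    ]
    if cfg["extra_delay_ms"] > 0:
        # Extra base RTT on the reverse path (receiver egress).
        cmds.append(f"tc qdisc replace dev {RECEIVER_DEV} root netem "
                    f"delay {cfg['extra_delay_ms']}ms")
    # Logging: qdisc statistics on the AQM node, socket and retransmission
    # counters on the sender.
    cmds.append(f"tc -s qdisc show dev {ROUTER_DEV}")
    cmds.append("ss -tin")
    cmds.append("nstat -az TcpRetransSegs")
    return cmds


if __name__ == "__main__":
    for cfg in build_matrix():
        print(f"# {cfg}")
        for cmd in commands_for(cfg):
            print(cmd)
```

The logging commands would need to be repeated periodically during each run
to obtain the counters over time, and a netem qdisc left over from a previous
run on the receiver has to be removed before a zero-delay run.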


Thanks!


Best,
Olivier


 > 
 > The experiment settings were as follows
 > 
 > *	This experiment was done on the Cloudlab <https://www.cloudlab.us/>
 > testbed with a 3-node topology ( source, router, receiver).
 > 
 > *	The bottleneck between the router and receiver was a 1 Gbps wired
 > link (10 Gbps interfaces, capacity restricted to 1 Gbps using the Linux
 > traffic shaping tools (tc)).
 > 
 > *	The flow was sent using iperf3.
 > 
 > *	The AQM at the router was a FQ qdisc with a single bucket and was
 > marking packets with ECN at a marking threshold of 5 ms. You can use the
 > following parameters with the tc-fq qdisc to replicate this setting :  fq
 > limit 5000p flow_limit 5000p orphan_mask 0 ce_threshold 5ms
 > *	The "ECN fallback on detection of classic ECN AQM" feature of TCP
 > Prague was disabled for this set of experiments.
 > *	The propagation / base delay of the setup was very low ( around 0.4
 > ms ) and no delay was added on top.
 > 
 > If interested, you can look at the ss data plots for these experiments at :
 > https://drive.google.com/drive/folders/1O6uEngxrDX5ipY71sjqr36lXCQBoZV--?usp=sharing
 > 
 > Looking forward to your comments and questions regarding these results.
 > 
 > Thank you,
 > 
 > Ashutosh Srivastava
 > First year PhD student,
 > Department of Electrical and Computer Engineering
 > NYU Tandon School of Engineering