Re: [tsvwg] new tests of L4S RTT fairness and intra-flow latency: defaults ready for testing

"De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com> Wed, 18 November 2020 11:56 UTC

From: "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com>
To: Jonathan Morton <chromatix99@gmail.com>
CC: Lars Eggert <lars@eggert.org>, Ingemar Johansson S <ingemar.s.johansson@ericsson.com>, tsvwg IETF list <tsvwg@ietf.org>, Neal Cardwell <ncardwell@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/UV3r5OzHpCQU8mGcHzs7V41xx-M>

Hi Jonathan, Pete, Sebastian,

Thanks for highlighting this case again. The reason Prague and DCTCP behave worse at longer RTTs is that the smallest-RTT flow induces a 1-RTT-on/X-RTTs-off burst pattern at the step threshold of the L4S queue, which is known to trigger the r=2/(p^2*RTT) DCTCP response (as defined in the original DCTCP paper). L4S started when we found that DCTCP behaves as r=2/(p*RTT), so the square on p disappears, when it gets a continuous, per-RTT-stable marking rate from a Classic AQM (when it gets coupled). Any burstiness in between will result in a response somewhere between those two equations.

So when the 1-RTT-on/X-RTTs-off bursts of the smallest-RTT flow appear as a steady per-RTT marking rate to the biggest-RTT flow, the latter will in the worst case have an extra 1/p disadvantage. The second paper described this behavior as a 1/RTT^2 dependency, but I think it is more related to this effect. Neal Cardwell recently suggested looking into a more fluid per-packet implementation to eliminate the RTT-dependent behavior. I had thought this was not relevant, as it "only" gets the 1/RTT^2 dependency down to 1/RTT and so is not a solution for the 1/RTT^0 objective of RTT independence. But it would actually solve this "too big rate disadvantage for long >80ms RTTs" problem. So this is currently to be tried and further investigated. Further contributions are appreciated there as well (from anyone).
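To put numbers on the two response functions quoted above, here is a minimal Python sketch. The function names and the sample values of p and RTT are illustrative assumptions only, and nothing here is taken from the Prague or DCTCP code. Rates come out in packets per second.

```python
def rate_step_marking(p: float, rtt_s: float) -> float:
    """r = 2 / (p^2 * RTT): response to on/off bursts of marks at a step threshold."""
    return 2.0 / (p ** 2 * rtt_s)


def rate_smooth_marking(p: float, rtt_s: float) -> float:
    """r = 2 / (p * RTT): response to a steady, per-RTT-stable marking rate."""
    return 2.0 / (p * rtt_s)


def burst_advantage(p: float) -> float:
    """Ratio of the two rates at equal p and RTT; algebraically this is 1/p."""
    return rate_step_marking(p, 1.0) / rate_smooth_marking(p, 1.0)


if __name__ == "__main__":
    p = 0.1  # assumed marking probability, chosen only for illustration
    for rtt_ms in (10, 80, 160):  # assumed sample RTTs
        rtt_s = rtt_ms / 1000.0
        print(rtt_ms, rate_step_marking(p, rtt_s), rate_smooth_marking(p, rtt_s))
    print("worst-case factor 1/p:", burst_advantage(p))
```

A flow that sees smooth marking while its competitor sees bursts is therefore behind by up to a factor 1/p on top of the ordinary 1/RTT scaling, which is the "extra 1/p disadvantage" mentioned above.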

As I understand it, there is an expectation to have a CC that can simply be enabled and autotunes to all conditions. As a quick workaround to make Prague a good choice across the full range, we are considering switching to non-ECT or ECT(0) and Reno when the RTT is >80ms. Even better, but with a bigger code impact, would be to switch to Cubic in that case. Olivier can push the fallback to Reno if RTT>80ms, which for now would solve this issue until one of the better solutions is available.
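The decision rule behind this workaround could be sketched roughly as follows. This is only an illustration: the names `CcChoice` and `choose_cc`, the string labels, and the use of a single RTT value are assumptions, and this is not the actual TCP Prague code. The only values taken from the text are the 80ms threshold and the Reno with non-ECT or ECT(0) fallback; ECT(1) is the codepoint L4S senders use.

```python
from dataclasses import dataclass

RTT_FALLBACK_THRESHOLD_MS = 80.0  # threshold mentioned in the message


@dataclass(frozen=True)
class CcChoice:
    algorithm: str      # "prague" or "reno" (hypothetical labels)
    ecn_codepoint: str  # "ECT(1)", "ECT(0)" or "Not-ECT"


def choose_cc(rtt_ms: float, classic_ecn: bool = True) -> CcChoice:
    """Pick the congestion control for a flow from its measured RTT.

    Above the threshold, fall back to Reno and stop using the L4S
    codepoint; classic_ecn selects between ECT(0) and Not-ECT.
    """
    if rtt_ms > RTT_FALLBACK_THRESHOLD_MS:
        return CcChoice("reno", "ECT(0)" if classic_ecn else "Not-ECT")
    return CcChoice("prague", "ECT(1)")


if __name__ == "__main__":
    for rtt in (10.0, 80.0, 160.0):
        print(rtt, choose_cc(rtt))
```

A real implementation would presumably need a smoothed RTT and some hysteresis so that a flow hovering around 80ms does not flip back and forth; the sketch leaves that out.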

Regards,
Koen.


From: Jonathan Morton <chromatix99@gmail.com>
Sent: Wednesday, November 18, 2020 1:28 AM
To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>
Cc: Lars Eggert <lars@eggert.org>; Ingemar Johansson S <ingemar.s.johansson@ericsson.com>; tsvwg IETF list <tsvwg@ietf.org>
Subject: Re: [tsvwg] new tests of L4S RTT fairness and intra-flow latency: defaults ready for testing

On 17 Nov, 2020, at 3:32 pm, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:

The RTT-independence was implemented, available and demonstrated several meetings ago already and, as presented, works very well according to our tests. The following parameters are now set as default, so they can be tested out of the box:

All Prague flows with an RTT below 25ms will now converge to the same rate, independent of their real base RTT. This means that flows with a bigger RTT than 25 ms will never have to compete against smaller than 25ms RTT flows.
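As a toy model of the behavior stated here (not of the mechanism Prague actually uses to achieve it), one can treat every flow as if its RTT were at least 25ms and plug that into the r=2/(p*RTT) response from earlier in this thread. The names `TARGET_RTT_MS`, `effective_rtt_ms` and `modelled_rate` are hypothetical, and the marking probability is an assumed sample value.

```python
TARGET_RTT_MS = 25.0  # the 25ms default mentioned in the message


def effective_rtt_ms(base_rtt_ms: float) -> float:
    """Flows below the target behave as if their RTT were the target."""
    return max(base_rtt_ms, TARGET_RTT_MS)


def modelled_rate(p: float, base_rtt_ms: float) -> float:
    """Packets per second under r = 2 / (p * RTT), using the effective RTT."""
    return 2.0 / (p * effective_rtt_ms(base_rtt_ms) / 1000.0)


if __name__ == "__main__":
    p = 0.1  # assumed marking probability
    for rtt in (5.0, 20.0, 25.0, 50.0):
        print(rtt, modelled_rate(p, rtt))
```

In this model, flows at 5ms, 20ms and 25ms all get the same rate for the same p, while a 50ms flow still gets half of it.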

Now the defaults are set, I'm looking forward to independent evaluations.

Since our tests are quite well automated, we were able to run a subset of them (all at 50Mbps) against the new defaults this evening.

I'll give you credit: there is some improvement in some of the tests.  However, we could still draw most of the same conclusions from the new data as we did from last week's data; the big-picture problems are still present and in some cases have actually deteriorated.

I'll focus on two major concerns in particular:

1: Prague outcompetes CUBIC in DualPI2, at a common baseline RTT.  This only stops being true when the BDP is large enough for Prague to have difficulty growing to steady state in a reasonable amount of time.

With the new code, the Jain's index improves from .823 to .987 at 10ms (the advantage in both cases being to Prague), but actually worsens from .880 to .838 at 20ms, and from .936 to .890 at 80ms.  All of these are sampled after allowing two minutes for the flows to converge to steady-state.
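For readers who want to reproduce the fairness figures from their own throughput samples, Jain's index is (sum of x)^2 / (n * sum of x^2). A minimal Python sketch follows; the function name and the sample throughputs are assumptions chosen for illustration and are not the measured values from these tests.

```python
from typing import Sequence


def jain_index(throughputs: Sequence[float]) -> float:
    """Jain's fairness index: 1.0 is perfectly fair, 1/n is maximally unfair."""
    if not throughputs:
        raise ValueError("need at least one throughput sample")
    total = sum(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if sum_sq == 0:
        raise ValueError("all throughputs are zero")
    return (total * total) / (len(throughputs) * sum_sq)


if __name__ == "__main__":
    # Hypothetical splits of a 50 Mbps link between two flows.
    print(jain_index([25.0, 25.0]))
    print(jain_index([35.0, 15.0]))
    print(jain_index([50.0, 0.0]))
```

With two flows the index ranges from 0.5 (one flow takes everything) to 1.0 (equal shares).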

2: Prague vs Prague competition on differing RTTs.

Here is Figure 3 from the test report we recently posted, followed by an equivalent chart generated from the new data this evening.  Let's play spot the difference:

[Inline image image001.png: Figure 3 from the recently posted test report]
[Inline image image002.png: equivalent chart generated from the new data]

I can say that the throughput ratio for Prague vs Prague via DualPI2 is, in fact, slightly improved in the new data, but it is still significantly worse even than the 16:1 ratio expected from the baseline RTTs at identical average cwnd.  In a similar test with 80ms versus 20ms RTTs, the two Prague flows also have more than the expected 4:1 throughput ratio.  I don't have an immediate explanation for that.
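The "expected" ratios here follow from rate = cwnd / RTT: at identical average cwnd, the throughput ratio is simply the inverse of the RTT ratio. A small Python sketch, where the function name and the cwnd value are assumptions, and the 160ms versus 10ms pair is only one example of RTTs that give 16:1:

```python
def expected_throughput_ratio(rtt_short_ms: float, rtt_long_ms: float,
                              cwnd_packets: float = 100.0) -> float:
    """Short-RTT rate divided by long-RTT rate when both flows hold the same cwnd."""
    rate_short = cwnd_packets / rtt_short_ms
    rate_long = cwnd_packets / rtt_long_ms
    return rate_short / rate_long


if __name__ == "__main__":
    print(expected_throughput_ratio(20.0, 80.0))   # the 80ms versus 20ms case
    print(expected_throughput_ratio(10.0, 160.0))  # one RTT pair giving 16:1
```

The cwnd value cancels out, so any measured ratio above this baseline means the long-RTT flow is also holding a smaller average window.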

Notice that with both the old and new code, CodelAF gets very close to parity in throughput with the same traffic load, and that even through DualPI2, a pair of CUBIC flows is closer to parity than a pair of Prague flows.  That is not, overall, an improvement in RTT independence from switching to TCP Prague and/or DualPI2.

However, we did find an improvement in fairness, compared to the older code, when comparing 20ms vs 10ms Prague flows.  That's what you were going for, wasn't it?  A shame that, in achieving that singular success, so many other things are left unresolved.

I'm sure we will have the opportunity to run more tests on your future efforts.  For the moment, with limited time on our hands, this will have to do.

 - Jonathan Morton