Re: [tsvwg] new tests of L4S RTT fairness and intra-flow latency: defaults ready for testing

Sebastian Moeller <moeller0@gmx.de> Wed, 18 November 2020 06:21 UTC

Date: Wed, 18 Nov 2020 07:21:23 +0100
In-Reply-To: <811A76DD-3D48-43D3-A962-3F15AE9E858B@gmail.com>
References: <AM8PR07MB7476081896E0A1C4897FFBA3B9E20@AM8PR07MB7476.eurprd07.prod.outlook.com> <811A76DD-3D48-43D3-A962-3F15AE9E858B@gmail.com>
To: Jonathan Morton <chromatix99@gmail.com>, "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com>
CC: tsvwg IETF list <tsvwg@ietf.org>
From: Sebastian Moeller <moeller0@gmx.de>
Message-ID: <B0880150-AE61-46AF-8C3E-542DFE28BD51@gmx.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/F9KPW7IBqIkdc-hKUmu8CzRFqO8>
Subject: Re: [tsvwg] new tests of L4S RTT fairness and intra-flow latency: defaults ready for testing
List-Id: Transport Area Working Group <tsvwg.ietf.org>

Hi Jonathan,

The changes that Oliver submitted to TCP Prague not only increase the fudge RTT to 25ms, but they also, as Koen mentioned, increase the time it takes for Prague to switch to the new fairness mode to 500 RTTs, or 5 seconds at 10ms RTT.
I really wonder how that affects fairness if the shallow queue is used by a bunch of short flows with 10ms RTT. As far as I can tell, the consequences of this delayed engagement have not been properly described.
@Koen, @Oliver, in the kernel code you hint at a paper about your RTT-independence method. It would be great if you could post links to your analysis of how this transition period affects sharing between flows inside the shallow queue and across queues. Ideally, that analysis would include data for the old 100-RTT transition delay as well as for the new value of 500 RTTs.
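To put numbers on this, assume (a simplification; the exact trigger in the kernel code is not described here) that the switch to the new fairness mode happens after a fixed number of round trips at the flow's own base RTT. The delay is then just that count multiplied by the RTT, and a short flow may finish before it ever switches. The helper names below are hypothetical:

```python
def transition_delay_s(n_rtts: int, base_rtt_ms: float) -> float:
    """Seconds before a flow switches mode, assuming the switch comes after
    n_rtts round trips at the flow's base RTT (simplified model)."""
    return n_rtts * base_rtt_ms / 1000.0


def switches_before_finishing(flow_duration_s: float, n_rtts: int, base_rtt_ms: float) -> bool:
    """True if a flow lasting flow_duration_s lives long enough to switch mode."""
    return flow_duration_s >= transition_delay_s(n_rtts, base_rtt_ms)


if __name__ == "__main__":
    # Old (100) and new (500) transition settings, at RTTs used in the recent tests
    for n_rtts in (100, 500):
        for rtt_ms in (10, 20, 80):
            print(f"{n_rtts} RTTs at {rtt_ms} ms -> {transition_delay_s(n_rtts, rtt_ms):.1f} s")
    # A hypothetical 2-second flow at 10 ms RTT
    print(switches_before_finishing(2.0, 100, 10))  # True: switches after 1 s
    print(switches_before_finishing(2.0, 500, 10))  # False: would need 5 s
```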

My fear is that this will now give transient TCP Prague flows an undeserved advantage for longer than before (the transition period was already part of the RTT-independence code, but at 100 RTTs). Just because that condition has not yet been tested by external parties does not mean it is not a concern. In fact, the lack of external testing is itself cause for concern, as so far almost all external testing has found problem spots in L4S almost immediately.
Equally concerning is the nonchalance with which these RTT-independence parameters were changed: apparently via an email to this list, and without any note that the functionality of the new parameters had actually been verified empirically.

Best Regards
        Sebastian



On 18 November 2020 01:28:05 CET, Jonathan Morton <chromatix99@gmail.com> wrote:
>> On 17 Nov, 2020, at 3:32 pm, De Schepper, Koen (Nokia - BE/Antwerp)
><koen.de_schepper@nokia-bell-labs.com> wrote:
>> 
>> RTT independence was implemented, made available and demonstrated
>several meetings ago already and, as presented, works very well
>according to our tests. The following parameters are now set as
>default, so they can be tested out of the box:
>> 
>> All Prague flows with an RTT below 25ms will now converge to the same
>rate, independent of their real base RTT. This means that flows with an
>RTT larger than 25ms will never have to compete against flows with an
>RTT smaller than 25ms.
>> 
>> Now the defaults are set, I'm looking forward to independent
>evaluations.
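One idealized reading of the claim above: each Prague flow behaves as if its RTT were at least 25ms, and its share then scales inversely with that effective RTT. The sketch below encodes only that reading; the function names and the inverse-proportional rate model are assumptions for illustration, not taken from the TCP Prague code:

```python
RTT_FLOOR_MS = 25.0  # the new default "fudge" RTT discussed in this thread


def effective_rtt_ms(base_rtt_ms: float, floor_ms: float = RTT_FLOOR_MS) -> float:
    """RTT a flow is assumed to behave as if it had (idealized model)."""
    return max(base_rtt_ms, floor_ms)


def ideal_rate_ratio(rtt_a_ms: float, rtt_b_ms: float, floor_ms: float = RTT_FLOOR_MS) -> float:
    """Rate of flow A divided by rate of flow B, assuming rate ~ 1 / effective RTT."""
    return effective_rtt_ms(rtt_b_ms, floor_ms) / effective_rtt_ms(rtt_a_ms, floor_ms)


if __name__ == "__main__":
    print(ideal_rate_ratio(10, 20))  # 1.0: both flows sit below the floor
    print(ideal_rate_ratio(20, 80))  # 3.2 under this model (80 / 25), vs 4.0 from base RTTs alone
```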
>
>Since our tests are quite well automated, we were able to run a subset
>of them (all at 50Mbps) against the new defaults this evening.
>
>I'll give you credit: there is some improvement in some of the tests. 
>However, we could still draw most of the same conclusions from the new
>data as we did from last week's data; the big-picture problems are
>still present and in some cases have actually deteriorated.
>
>I'll focus on two major concerns in particular:
>
>1: Prague outcompetes CUBIC in DualPI2, at a common baseline RTT.  This
>only stops being true when the BDP is large enough for Prague to have
>difficulty growing to steady state in a reasonable amount of time.
>
>With the new code, the Jain's index improves from .823 to .987 at 10ms
>(the advantage in both cases being to Prague), but actually worsens
>from .880 to .838 at 20ms, and from .936 to .890 at 80ms.  All of these
>are sampled after allowing two minutes for the flows to converge to
>steady-state.
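For reference, Jain's fairness index for n flows with throughputs x_i is (sum of x_i)^2 / (n * sum of x_i^2), ranging from 1/n to 1. Assuming each of these comparisons involves exactly one Prague flow and one CUBIC flow (not stated explicitly above), the two-flow index can be inverted into the throughput ratio it implies, which makes the quoted numbers easier to read. A sketch with hypothetical helper names:

```python
import math
from typing import Sequence


def jain_index(throughputs: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2); 1.0 means equal shares."""
    n = len(throughputs)
    total = sum(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    return total * total / (n * sum_sq)


def two_flow_ratio(j: float) -> float:
    """Throughput ratio (larger/smaller) of two flows that yields index j.

    From j = (1 + r)^2 / (2 * (1 + r^2)), rearranged to
    (2j - 1) r^2 - 2r + (2j - 1) = 0, taking the root with r >= 1.
    """
    if not 0.5 < j <= 1.0:
        raise ValueError("two-flow Jain index must lie in (0.5, 1]")
    a = 2.0 * j - 1.0
    return (1.0 + math.sqrt(1.0 - a * a)) / a


if __name__ == "__main__":
    # Old -> new index values quoted above, per baseline RTT
    for label, j in [("10 ms old", 0.823), ("10 ms new", 0.987),
                     ("20 ms old", 0.880), ("20 ms new", 0.838),
                     ("80 ms old", 0.936), ("80 ms new", 0.890)]:
        print(f"{label}: J = {j:.3f} -> about {two_flow_ratio(j):.2f}:1")
```

Under the two-flow assumption, .823 corresponds to roughly a 2.7:1 split and .987 to roughly 1.26:1.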
>
>2: Prague vs Prague competition on differing RTTs.
>
>Here is Figure 3 from the test report we recently posted, followed by
>an equivalent chart generated from the new data this evening.  Let's
>play spot the difference:
>
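>[Figures not preserved in this copy: Figure 3 from the earlier test report, and the equivalent chart generated from the new data]
>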
>I can say that the throughput ratio for Prague vs Prague via DualPI2
>is, in fact, slightly improved in the new data, but it is still
>significantly worse even than the 16:1 ratio expected from the baseline
>RTTs at identical average cwnd.  In a similar test with 80ms versus
>20ms RTTs, the two Prague flows also have more than the expected 4:1
>throughput ratio.  I don't have an immediate explanation for that.
>
>Notice that with both the old and new code, CodelAF gets very close to
>parity in throughput with the same traffic load, and that even through
>DualPI2, a pair of CUBIC flows is closer to parity than a pair of
>Prague flows.  That is not, overall, an improvement in RTT independence
>from switching to TCP Prague and/or DualPI2.
>
>However, we did find an improvement in fairness, compared to the older
>code, when comparing 20ms vs 10ms Prague flows.  That's what you were
>going for, wasn't it?  A shame that, in achieving that singular
>success, so many other things are left unresolved.
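The "expected" ratios mentioned above follow from the window-limited approximation throughput ≈ cwnd × MSS / RTT: at identical average cwnd, throughput is inversely proportional to RTT, so 80ms versus 20ms gives 4:1. A minimal sketch of that arithmetic; the 1448-byte MSS and the default cwnd are hypothetical values that cancel out of the ratio anyway:

```python
MSS_BYTES = 1448  # hypothetical segment size, for illustration only


def throughput_mbps(cwnd_segments: float, rtt_ms: float, mss_bytes: int = MSS_BYTES) -> float:
    """Approximate window-limited throughput: one cwnd of data per round trip."""
    bits_per_rtt = cwnd_segments * mss_bytes * 8
    return bits_per_rtt / (rtt_ms / 1000.0) / 1e6


def identical_cwnd_ratio(rtt_short_ms: float, rtt_long_ms: float, cwnd_segments: float = 20.0) -> float:
    """Short-RTT flow's throughput divided by the long-RTT flow's, at the same cwnd."""
    return throughput_mbps(cwnd_segments, rtt_short_ms) / throughput_mbps(cwnd_segments, rtt_long_ms)


if __name__ == "__main__":
    # The ratio depends only on the two RTTs; cwnd and MSS cancel
    print(identical_cwnd_ratio(20, 80))
```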
>
>I'm sure we will have the opportunity to run more tests on your future
>efforts.  For the moment, with limited time on our hands, this will
>have to do.
>
> - Jonathan Morton
