Re: [tsvwg] L4S status: #17 Interaction w/ FQ AQMs

Sebastian Moeller <moeller0@gmx.de> Mon, 11 November 2019 22:33 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <D79022F7-7509-45C9-9ECF-D420AE255E60@cablelabs.com>
Date: Mon, 11 Nov 2019 23:32:54 +0100
Cc: Pete Heist <pete@heistp.net>, "tsvwg@ietf.org" <tsvwg@ietf.org>
Message-Id: <5EA3722C-E512-45D4-AFD3-B8887213132D@gmx.de>
To: Greg White <g.white@CableLabs.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/U8escYm3dlyVP_IXQkPUvNYfavA>
Subject: Re: [tsvwg] L4S status: #17 Interaction w/ FQ AQMs

Hi Greg,


> On Nov 11, 2019, at 22:51, Greg White <g.white@CableLabs.com> wrote:
> 
> ....actually, I think I misread your comment (and plot*).   The latency spike is _not_ for the TCP cubic flow, but rather for the Prague flow.
> 
> So, this is a case of fq immediately cutting the BDP in half due to the arrival of a second flow, but not providing sufficient congestion signaling to Prague to cause it to reduce its cwnd quickly. The sluggish ramp-up of the marking probability in the CoDel AQM, which is a mismatch to the Prague congestion response, is the cause here.

	Interestingly, it seems that the recent change of exiting slow start by ~halving the sending rate helps Prague in position 2, by effectively making Prague respond to the first CE mark in an RFC 3168-compatible fashion. But as the Prague-in-1st-position test shows, this does not help if Prague is already running at too high a rate in congestion avoidance mode.
	IMHO, that is one of the reasons why there is so much concern about re-defining the meaning of CE markings. In this specific case the negative side-effects seem to be restricted to Prague, and that is something I could live with.
	What happens in scenario 1 at low RTTs and in scenario 3 is different: there it is the non-L4S traffic that takes an unexpected/undeserved hit.
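
	To make the mismatch concrete: an RFC 3168 sender halves its window on the first CE-marked round trip, whereas a DCTCP/Prague-style sender scales its reduction by "alpha", an EWMA of the CE-marked fraction, which is close to zero for a flow cruising in congestion avoidance. A minimal sketch, purely illustrative (the gain, window size and function names are assumptions, not the actual TCP Prague code):

# Illustrative only; not the actual TCP Prague / Linux implementation.
# Response of a single flow to one CE-marked round trip.

G = 1.0 / 16  # assumed EWMA gain, as in the DCTCP paper

def rfc3168_response(cwnd):
    """Classic ECN (RFC 3168): treat CE like loss and halve the window."""
    return cwnd / 2.0

def prague_like_response(cwnd, alpha, ce_fraction):
    """DCTCP/Prague-style: scale the reduction by alpha, the EWMA of the
    CE-marked fraction; with alpha near zero the window barely moves."""
    alpha = (1.0 - G) * alpha + G * ce_fraction
    return cwnd * (1.0 - alpha / 2.0), alpha

cwnd = 330  # packets; roughly 50 Mbit/s at 80 ms RTT with 1500-byte packets
print(rfc3168_response(cwnd))                 # -> 165.0
new_cwnd, _ = prague_like_response(cwnd, alpha=0.0, ce_fraction=1.0 / cwnd)
print(round(new_cwnd, 1))                     # -> 330.0, essentially no reduction

	With alpha that low, a CoDel-paced trickle of marks needs many round trips before Prague's rate actually comes down, which is the mismatch Greg describes above.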


> 
> -Greg
> 
> *in my defense the flent plots aren't the friendliest for color-deficient viewers.

	I believe no defense is necessary: there are lots of plots with often only subtle differences, and the colors are nicely organized but hard to keep apart. Loading the files into flent's GUI lets you hover over the legend so the corresponding line gets highlighted; maybe that is an option if higher saliency is required?

Sebastian

> 
> 
> 
> On 11/11/19, 2:32 PM, "tsvwg on behalf of Greg White" <tsvwg-bounces@ietf.org on behalf of g.white@CableLabs.com> wrote:
> 
>    Hi Pete,
> 
>    Thanks for posting the updated results.
> 
>    In the Issue 17 related scenario (Scenario 6), you observed that there was a 5-second-long latency spike for the TCP cubic flow (but for none of the other flows) when the cubic flow starts up.  
> 
>    https://www.heistp.net/downloads/sce-l4s-bakeoff/bakeoff-2019-11-11T090559-r2/l4s-s6-2/batch-l4s-s6-2-prague-vs-cubic-50Mbit-80ms_fixed.png
> 
>    Since this latency spike is only present for the cubic flow, it is clearly not in the shared FIFO, but in the fq_codel queue. I'm having trouble understanding how TCP Prague could be causing queuing delay that *only* affects a cubic flow in a different queue.  Could you explain how this is possible?  It seems like it may be an error in your result.
> 
>    -Greg
> 
> 
> 
> 
> 
>    On 11/11/19, 12:57 PM, "Pete Heist" <pete@heistp.net> wrote:
> 
>        Hi, thanks for working on the issues we posted in our “bakeoff” tests on Sep. 12. We have updated our repo with a re-run of the tests using the L4S code as of the tag 'testing/5-11-2019', and those results are at the same location:
> 
>        https://github.com/heistp/sce-l4s-bakeoff/
> 
>        We’ve put together a list of changes we saw in the results:
> 
>        https://github.com/heistp/sce-l4s-bakeoff#changes
> 
>        The changes are:
> 
>        - The previously reported intermittent TCP Prague utilization issues appear to be fixed.
> 
>        - TCP Prague vs Cubic now does converge to fairness, but appears to have fairly long convergence times at 80ms RTT (example: https://www.heistp.net/downloads/sce-l4s-bakeoff/bakeoff-2019-11-11T090559-r2/l4s-s3-2/batch-l4s-s3-2-prague-vs-prague-50Mbit-80ms_fixed.png). Convergence times in other scenarios are similarly long.
> 
>        - In scenarios 2, 5 and 6, the previously reported L4S interaction with CoDel seems to be partially, but not completely fixed. By reversing the flow start order (prague vs cubic instead of cubic vs prague, second flow delayed by 10 seconds), we can see that while the TCP RTT spikes no longer occur at flow start, they are still present when a second flow is introduced after slow-start exit (example: https://www.heistp.net/downloads/sce-l4s-bakeoff/bakeoff-2019-11-11T090559-r2/l4s-s2-2/batch-l4s-s2-2-prague-vs-cubic-50Mbit-80ms_fixed.png).
> 
>        - In scenario 3 (single queue AQM at 10ms RTT), the previously reported unfairness between TCP Prague and Cubic grew larger, from ~4.1x (40.55/9.81Mbit) to ~7.7x (44.25/5.78Mbit). This trend appears to be consistent at other RTT delays in the same scenario (0ms and 80ms).
> 
>        I will update the issues in trac with these results, and if there are any questions let us know…
> 
>        Pete
> 
>> On Nov 6, 2019, at 5:23 PM, Greg White <g.white@CableLabs.com> wrote:
>> 
>> Good timing :)
>> 
>> We've just wrapped up our findings on Issue 17, and have posted them here (along with some comments on Issue 16 as well):
>> 
>> https://l4s.cablelabs.com 
>> 
>> (Note, the ns3 repo is not public yet, but will be shortly.   We'll update that page with links within a day or two.)
>> 
>> In summary, we reached the following conclusions:
>> 
>> - the main result of concern was due to a bug in initializing the value of TCP Prague alpha, which has been fixed and demonstrated to resolve the latency impact that was spanning multiple seconds
>> 
>> - the remaining short duration latency spike in the FIFO queue is seen in *all* congestion control variants tested, including BBRv1, NewReno, and Cubic, and is not specific to Prague
>> 
>> - if the CoDel queue is upgraded to perform Immediate AQM on L4S flows, the latency spike can be largely avoided.
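
A note on what "Immediate AQM on L4S flows" presumably means here: CE-marking ECT(1) packets as soon as their sojourn time exceeds a shallow threshold, instead of waiting for CoDel's interval/sqrt(count) control law to ramp up (similar in spirit to the ce_threshold option in Linux fq_codel). A minimal sketch under that assumption; the constants and names are illustrative, not the tested ns-3 code:

import math

# Illustrative sketch only; not the CableLabs ns-3 change or Linux fq_codel.
CODEL_INTERVAL_MS = 100.0   # classic CoDel interval
L4S_THRESHOLD_MS = 1.0      # assumed shallow marking threshold for ECT(1)

def codel_mark_spacing_ms(count):
    """Classic CoDel paces marks/drops at interval/sqrt(count), so its
    signalling rate only grows slowly, over many round trips."""
    return CODEL_INTERVAL_MS / math.sqrt(count)

def immediate_l4s_mark(sojourn_ms, ect1):
    """Immediate AQM: CE-mark an ECT(1) packet as soon as its sojourn time
    exceeds the shallow threshold; classic traffic keeps the normal CoDel
    behaviour (not shown)."""
    return ect1 and sojourn_ms > L4S_THRESHOLD_MS

# After four marks CoDel is still only signalling every ~50 ms ...
print(round(codel_mark_spacing_ms(4)))       # -> 50
# ... while an L4S packet that has queued for 2 ms is marked immediately.
print(immediate_l4s_mark(2.0, ect1=True))    # -> True

Whether a 1 ms threshold is what was actually used is an assumption; the point is only the qualitative difference in signalling speed.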
>> 
>> We invite a thorough review of the work, but believe that this closes issue #17.
>> 
>> 
>> Best Regards,
>> 
>> Greg White, Tom Henderson, Olivier Tilmans
>> 
>> 
>> 
>> 
>> 
>> 
>> On 11/6/19, 12:59 AM, "Sebastian Moeller" <moeller0@gmx.de> wrote:
>> 
>>   Hi Greg,
>> 
>> 
>>> On Sep 11, 2019, at 19:16, Greg White <g.white@cablelabs.com> wrote:
>>> 
>>> I'm planning on doing testing as well, but it will be more than a day or two to get it done.  Rough timeframe would be 2-3 weeks from now.
>> 
>>   	Since I cannot hide my impatience any longer, did anything come out of this yet?
>> 
>>   Best Regards
>>   	Sebastian
>> 
>>> 
>>> -Greg
>>> 
>>> On 9/11/19, 1:52 AM, "Pete Heist" <pete@heistp.net> wrote:
>>> 
>>> 
>>>> On Sep 9, 2019, at 9:01 PM, Wesley Eddy <wes@mti-systems.com> wrote:
>>>> 
>>>> Since this thread seems to have dwindled, I just wanted to summarize that it looks to me like we have agreement that testing as described is needed.
>>>> 
>>>> I updated the issue tracker with a comment saying as much and pointing back to this thread in the archive for reference.
>>>> 
>>>> Is anyone planning to perform this testing in a rough timeframe they might want to share?
>>> 
>>>  Hi Wesley, I’ll share results from relevant testing in the next day or two...
>>> 
>>>  Regards,
>>>  Pete
>>> 
>>> 
>>> 
>> 
>> 
>> 
> 
> 
> 
> 
>