Re: [tsvwg] Related to "Non-L4S traffic abusing the L-queue" discussion during the interim

Sebastian Moeller <moeller0@gmx.de> Fri, 25 February 2022 22:01 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <AM9PR07MB731311A9E4532FD501B5D94CB93E9@AM9PR07MB7313.eurprd07.prod.outlook.com>
Date: Fri, 25 Feb 2022 23:00:36 +0100
Cc: "Black, David" <David.Black@dell.com>, Neal Cardwell <ncardwell@google.com>, tsvwg IETF list <tsvwg@ietf.org>, Bob Briscoe <in@bobbriscoe.net>
To: "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/QztswlAtlhydJ36MH9PHuTXFNWU>

Hi Koen,

Interesting test. It would now be excellent to see how the 50 and 70 steps look with queue protection active.


> On Feb 25, 2022, at 19:30, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> 
> Hi David,
>  
> To be sure, we re-did the overload tests recently, confirming the previous overload results. These results are available at: "Overload results caused by non-responsive UDP traffic for PIE, DualPI2 and CoDel AQMs" (l4steam.github.io)
>  
> Specifically look at figure 8 at the end which shows that L4S traffic gets marks, up to 100% and appropriate drop if it reaches and exceeds the link capacity.
>  
> Jonathan's test case is approximated by the DualPI2 (Prague+Cubic) test case with 70Mbps of non-responsive ECT(1) UDP traffic on a 100Mbps link. In Jonathan’s case it was 40Mbps on a 50Mbps link. We also evaluated the extreme case of sending 100Mbps of non-responsive ECT(1) UDP traffic on a 100Mbps link, and even exceeded the link rate at 140Mbps and 200Mbps. You will see the results are as if it were on a single-queue PIE AQM. Note also that CoDel which never drops ECT packets,

	Is that actually true? fq_codel has a batch drop mode that it employs to dig itself out of severe overload, see:
https://elixir.bootlin.com/linux/latest/source/net/sched/sch_fq_codel.c#L137
I am not sure this code path is avoided for ECT(0/1) flows. On massive overload fq_codel will drop min(64, num_packets_in_fattest_flow * 0.5) packets from the "fattest" flow, and in single queue mode that should still trigger. So yes, in normal operation fq_codel will not drop ECT(0/1) packets, but if push comes to shove it will do so in batches. If I read the code correctly, this mode triggers if, during enqueue of a new packet, either:
a) all queued packets exceed the memory_limit, or
b) there are more packets queued than limit.
In both conditions fq_codel will drop packets from the fattest hash-bucket independent of the ECN bits.
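
A minimal Python sketch of the behaviour described above may make the two trigger conditions easier to see. This is a simplified model of the description in this mail, not the kernel implementation: the function name, the packet representation (a dict with "size" and "ecn") and the use of packet size as a stand-in for memory accounting are all assumptions made for illustration.

```python
from collections import deque


def overload_batch_drop(buckets, limit, memory_limit, batch_size=64):
    """Toy model of the overload batch drop described above.

    buckets: dict mapping a hash-bucket id to a deque of packets; each
    packet is a dict with "size" (bytes) and "ecn" (codepoint string).
    Returns the list of dropped packets (empty if no limit is exceeded).
    """
    queued_packets = sum(len(q) for q in buckets.values())
    queued_bytes = sum(p["size"] for q in buckets.values() for p in q)

    # Trigger: (a) queued bytes exceed memory_limit, or
    #          (b) more packets are queued than limit.
    if queued_bytes <= memory_limit and queued_packets <= limit:
        return []

    # Pick the "fattest" bucket, measured here by queued bytes.
    fattest = max(buckets.values(), key=lambda q: sum(p["size"] for p in q))
    if not fattest:
        return []

    # Drop min(batch_size, half the bucket) packets from the head.
    # The ECN field is never inspected, so ECT packets are dropped too.
    count = min(batch_size, max(1, len(fattest) // 2))
    return [fattest.popleft() for _ in range(count)]
```

In this model a bucket holding ten ECT(1) packets loses its five oldest packets as soon as the total packet count exceeds limit, while a thinner bucket sharing the qdisc is left alone.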


> causes actually close to starvation and high tail-drop delay results as shown in figure 1, even with ECT(0).

	Yes, if you overload the queues you end up triggering the stress mode dropper. By adjusting limit/memory_limit you can tune this behavior closer to your liking. But even in stress drop mode it seems to do head drops. What evidence do you have for tail-drop? 

> So I guess all the concerns about FQ_CoDel and tunnels/Hash-collisions are equally severe and not related to L4S alone

	Yes, we know that the advantages of flow queueing disappear if flows cannot be identified anymore... In this specific case though, fq_codel with multiple queues will do fine as long as the unresponsive flow does not manage to crowd out the other flows before hitting the AQM bottleneck. In full fq-mode most of the fall-out of the unresponsive flow will be restricted to its hash-bucket (including innocent bystanders in the same bucket).


> (can just be exploited by ECT(0) traffic today already!!).


	Subtle difference: today, in a FIFO or codel, our offending flow will see actual drops commensurate to its share of capacity, but in the L-queue all it will see are inconsequential CE marks. In other words, the L-queue delivers better service for hostile traffic. I do predict that this will be exploited if we actually deploy such systems; the question is when, not if...
	I really do not think this is a good argument to make. If you go and introduce a scheduler into the network, it would be a wasted opportunity not to actually squash a whole class of abuse while at it. If all you claim is to not be worse than a FIFO, then you aim too low.
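
To illustrate the drop-versus-mark difference, here is a toy calculation in Python. It assumes a sender that ignores all congestion signals and a queue that is below any overload threshold, so that the signal is either a pure drop or a pure CE mark. The function name is hypothetical, and the 40Mbps rate and 20% signal probability are only borrowed from the numbers elsewhere in this thread as an example.

```python
def delivered_rate_mbps(send_rate_mbps, signal_probability, signal_is_drop):
    """Rate that reaches the receiver for a sender ignoring all signals.

    signal_is_drop=True models an AQM that signals by dropping packets,
    signal_is_drop=False models one that signals with CE marks only.
    """
    if not 0.0 <= signal_probability <= 1.0:
        raise ValueError("signal_probability must be within [0, 1]")
    if signal_is_drop:
        # Every signalled packet is lost before it reaches the receiver.
        return send_rate_mbps * (1.0 - signal_probability)
    # A CE-marked packet is still delivered; the mark costs nothing
    # to a sender that does not react to it.
    return send_rate_mbps


with_drops = delivered_rate_mbps(40.0, 0.20, signal_is_drop=True)
with_marks = delivered_rate_mbps(40.0, 0.20, signal_is_drop=False)
```

Under these assumptions the marked flow keeps its full sending rate, while the dropped flow loses a share of its packets equal to the signal probability.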

Sebastian

>  
> Koen.
>  
> From: Black, David <David.Black@dell.com> 
> Sent: Friday, February 25, 2022 7:04 PM
> To: Neal Cardwell <ncardwell@google.com>
> Cc: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>; tsvwg IETF list <tsvwg@ietf.org>; Jonathan Morton <chromatix99@gmail.com>; Bob Briscoe <in@bobbriscoe.net>; Black, David <David.Black@dell.com>
> Subject: RE: [tsvwg] Related to "Non-L4S traffic abusing the L-queue" discussion during the interim
>  
> Hi Neal,
>  
> So, I saw that explanation – could someone check the "running code" to make sure that the coupling and marking occur even when the L queue is always empty?
>  
> Thanks, --David
>  
> From: Neal Cardwell <ncardwell@google.com> 
> Sent: Friday, February 25, 2022 12:58 PM
> To: Black, David
> Cc: De Schepper, Koen (Nokia - BE/Antwerp); tsvwg IETF list; Jonathan Morton; Bob Briscoe
> Subject: Re: [tsvwg] Related to "Non-L4S traffic abusing the L-queue" discussion during the interim
>  
>  
>  
> On Fri, Feb 25, 2022 at 11:56 AM Black, David <David.Black@dell.com> wrote:
> Koen,
>  
> I'll observe that "traffic that is not responding at all to CE marks" is not necessary to achieve the reported results if the experimental setup "prevents the L queue from seeing any need to apply congestion signals, because it is always empty" as there would be no CE marks for the traffic in the L queue to respond to.
>  
> I think the key part here is "if". :-) The assertion "prevents the L queue from seeing any need to apply congestion signals, because it is always empty" is from:
>   https://sce.dnsmgr.net/downloads/L4S-WGLC2-objection-details.pdf
> That assertion is inconsistent with the functioning of the Dual-Q algorithm, as described in:
>   https://www.ietf.org/id/draft-ietf-tsvwg-aqm-dualq-coupled-21.html
>  
> As Bob noted: "in the scenario shown, although the L queue is indeed always empty, it will see a high level of congestion signals (~10% in this case) via the coupling."
> Here's Bob's e-mail for more context/details:
>   https://mailarchive.ietf.org/arch/msg/tsvwg/joFr3sfOrxxkYhWdYrO2rLlCNUw/
>  
> thanks,
> neal
>  
>  
>  
> Please give that further consideration.
>  
> Thanks, --David (as an individual)
>  
> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of De Schepper, Koen (Nokia - BE/Antwerp)
> Sent: Friday, February 25, 2022 4:29 AM
> To: tsvwg IETF list; Jonathan Morton
> Subject: Re: [tsvwg] Related to "Non-L4S traffic abusing the L-queue" discussion during the interim
>  
> Hi Jonathan,
>  
> Can you confirm that this test is done with “Cubic” traffic that is not responding at all to CE marks? So it is just like any other non-responding traffic (like UDP CBR). We don’t see any other way to explain your results. 
>  
> If so, we can/should remove this “issue” from the shepherd’s write-up, as such unresponsive flows will get the same throughput on any single-Q bottleneck with or without AQM (taildrop/PI2/PIE/CoDel/STEP/RED/…) with a latency that matches the AQM strategy.
>  
> Koen.
>  
>  
> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of De Schepper, Koen (Nokia - BE/Antwerp)
> Sent: Thursday, February 17, 2022 7:01 PM
> To: tsvwg IETF list <tsvwg@ietf.org>; Jonathan Morton <chromatix99@gmail.com>
> Subject: [tsvwg] Related to "Non-L4S traffic abusing the L-queue" discussion during the interim
>  
> Hi Jonathan,
>  
> It seems that the following open issue identified by the chairs:
>  
> Non-L4S traffic abusing the L-queue
> • ‘DualQ gives a large throughput bonus to L queue traffic, ie. a “fast lane”’
> • Is this a matter specific for DualQ that can be left for experimentation?
>  
> is based on the following experiment you performed:
>  
> > simple two-flow competition test on a standard dumbbell topology,
> > with the bottleneck running a DualQ qdisc into a 50Mbps shaper.
> > Both flows were configured to use CUBIC congestion control with
> > ECN negotiated, but one was additionally tweaked to set ECT(1)
> > instead of ECT(0) on all data segments, and to pace its output at
> > 40Mbps. This latter measure prevents the L queue from seeing any
> > need to apply congestion signals, because it is always empty.  These
> > tweaks allowed that flow to use 80% of the link capacity, gaining a
> > fourfold advantage over its competitor,
> 
>  
> If there is capacity seeking traffic in the Classic queue, then it is even desired that the L4S queue does not add extra marks. The L4S marks should come only from the Classic coupling.
> Before diving into details, can you first explain why in your experiment the coupling from the Classic Q has no effect on your paced and ECT(1) labeled Cubic flow?
>  
> I would expect that this ECT(1) labeled Cubic flow would get even less throughput than the Classic Cubic flow, as the first gets the doubled coupled CE marking probability (eg 2*10% = 20%) for L4S flows instead of the squared CE marking probability (10%^2 = 1%) which ECT(0) traffic would get.
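
The arithmetic in the quoted paragraph above can be written out as a short Python sketch. The function and parameter names are hypothetical. The coupling factor of 2 is taken from the "doubled" in the quoted text, and clamping the L4S probability at 100% is an assumption based on the "marks, up to 100%" remark earlier in the thread.

```python
def coupled_probabilities(p_base, k=2.0):
    """Return (classic, l4s) signalling probabilities for a base probability.

    p_base: the internal base probability (10% in the quoted example).
    k:      coupling factor applied to L4S traffic (assumed to be 2).
    """
    if not 0.0 <= p_base <= 1.0:
        raise ValueError("p_base must be within [0, 1]")
    p_classic = p_base ** 2          # squared for ECT(0)/Classic traffic
    p_l4s = min(k * p_base, 1.0)     # coupled, linear for ECT(1) traffic
    return p_classic, p_l4s


p_classic, p_l4s = coupled_probabilities(0.10)
```

With a base probability of 10% this reproduces the 1% Classic and 20% L4S figures from the quoted example, which is the basis for the expectation that a responsive ECT(1) Cubic flow would back off more, not less, than its ECT(0) competitor.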
>  
> Thanks,
> Koen.