Re: [tsvwg] Reasons for WGLC/RFC asap

Sebastian Moeller <moeller0@gmx.de> Wed, 18 November 2020 19:27 UTC

From: Sebastian Moeller <moeller0@gmx.de>
Date: Wed, 18 Nov 2020 20:27:35 +0100
To: "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com>
Cc: tsvwg IETF list <tsvwg@ietf.org>
Message-Id: <06296366-C48D-40E7-9ABA-49930FA1FFCC@gmx.de>
In-Reply-To: <AM8PR07MB74760BC4858FCFBE085C3A8BB9E10@AM8PR07MB7476.eurprd07.prod.outlook.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/KSfYwlnF-4HaijWuC1pel727cFc>

Hi Koen,


> On Nov 18, 2020, at 18:29, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> 
> Hi Sebastian,
> 
> Can you revisit these arguments again from the non-Flow identifying point of view?

	[SM] I think I already did: "sure your [...] version of PIE is better than no AQM", but that is pretty much where it ends. I am involved in supporting OpenWrt users in debloating their internet access links with SQM. BTW, those should be pretty much your core audience, but apparently they are not.
	A number of years ago we came up with a pretty workable solution: using (costly) traffic shapers to keep traffic away from the typically over-sized and under-managed buffers on both sides of typical internet access links. That only moved the problem somewhere home users could actually tackle it, but it turned out that for most users a competent modern AQM squashes most latency issues pretty well. Initially, sfq was the qdisc of choice, but it was quickly superseded by fq_codel, which is basically the conceptual successor of sfq and noticeably better in a number of dimensions.
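For concreteness, the SQM recipe described above boils down to a handful of tc commands on the egress side. This is only a minimal sketch: the interface name (eth0) and the rate (9500kbit, roughly 95% of a nominal 10 Mbps uplink) are illustrative placeholders, and real SQM scripts additionally handle the ingress direction and link-layer overhead compensation.

```shell
# Shape egress to just below the uplink rate, so the queue builds here
# in the router instead of in the modem's oversized, unmanaged buffer...
tc qdisc replace dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 9500kbit

# ...and let fq_codel manage the queue that now forms in this class.
tc qdisc add dev eth0 parent 1:10 fq_codel
```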
	That worked amazingly well for most users. And those users unhappy with that solution, we realized, often really required an explicitly unfair set-up, like trying to reliably send an 8 Mbps real-time video stream over a 10 Mbps link. In that case the solution was to create dedicated shaper and filter hierarchies that allow for the desired targeted unfairness.
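A targeted-unfairness hierarchy of the kind just described might be sketched as follows. This is a hedged illustration only: the interface, the rates, and the assumption that the video stream is identifiable by UDP destination port 5004 are all placeholders, not anyone's actual deployment.

```shell
# Guarantee ~8 Mbps to the video stream on a nominal 10 Mbps uplink,
# leaving the remainder to everything else; "ceil" lets either class
# borrow unused capacity from the other.
tc qdisc replace dev eth0 root handle 1: htb default 20
tc class add dev eth0 parent 1: classid 1:1 htb rate 9500kbit
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 8000kbit ceil 9500kbit
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 1500kbit ceil 9500kbit

# Steer the stream (assumed here to target UDP port 5004) into the fat class.
tc filter add dev eth0 parent 1: protocol ip u32 match ip dport 5004 0xffff flowid 1:10

# Flow-queue within each class so neither class self-congests.
tc qdisc add dev eth0 parent 1:10 fq_codel
tc qdisc add dev eth0 parent 1:20 fq_codel
```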
	Cake then upped the ante by creating a two-tier fairness system, which for example allows capacity to be shared equitably between all active hosts in a network (while still offering flow queueing inside each host's share). That pretty much solved the home-access latency-under-load issue, and the "please stop downloading when I am in a video conference" issue as well.
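As a sketch of the cake approach (interface and rate are again placeholders): a single qdisc replaces the whole shaper-plus-filter-plus-AQM stack, and the two-tier fairness is a keyword away.

```shell
# cake as an all-in-one: shaper, AQM and two-tier fairness in one qdisc.
# On egress, "dual-srchost" first splits capacity equitably across the
# internal hosts (by source IP, looked up post-NAT thanks to "nat"),
# then flow-queues within each host's share -- so one machine running
# many downloads cannot starve another machine's single conference flow.
tc qdisc replace dev eth0 root cake bandwidth 9500kbit dual-srchost nat
```

On the download side the same idea applies with dual-dsthost, since there the internal host is the destination.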
	L4S as currently designed and implemented offers very little of that. It does neither host-equitable sharing, nor flow fairness, nor targeted unfairness; it offers some sort of "anything goes" with an admittedly lower average queueing delay (albeit only a few milliseconds lower, not multiple orders of magnitude). In short, for the problem space where I sit, L4S solves nothing, but threatens to introduce additional sources of unfairness (because of its bias against non-L4S flows, its increased RTT dependence, its unsuitable coupling core idea, ...).


> If compared with an FQ, clearly a better flow rate can be enforced. L4S can also support an FQ solution, so the comparison should be fair here.

	[SM] I tend to compare existing apples with existing apples, and there is no FQ L4S solution that I can compare here. Sure, my existing FQ-AQM solution is probably inferior to your hypothetical FQ_L4S, but a) without supporting data I do not believe that (I am not that rabid an FQ fanboy; an FQ_L4S solution would still have to pass the same set of safety and sanity tests, and while I assume it will be easier to pass them, I want to see the data nevertheless) and b) that knowledge is pretty much useless in helping others debloat their access links. I need existing, field-tested and "battle-hardened" solutions now. If at the beginning of my involvement I had waited for L4S, I would still be waiting, some 7 years later...


> So either compare FQ_L4S with FQ_Codel or compare DualPI2 with Single Queue CoDel.

	[SM] That is not how I think this comparison should be performed. The challenge is how to solve the latency-under-load problem on typical internet access links today, and the contenders are the existing, testable, deployable solutions, e.g. SQM (a fancy name for a traffic shaper plus a competent modern AQM, mostly fq_codel, or as an all-in-one solution, cake) or DualPI2.
	Just because you opted for a single-queue solution does not mean I and my users follow that insane stance. It turns out that the traffic shaper is the really expensive part; the fq component is comparatively cheap, so with L4S the CPU cycles gained from having a worse but computationally cheaper AQM will disappear in the noise, as the shaper is the truly costly part. So a lot of the rationale for a "single"-queue solution (side-note: last time I checked, two was a number larger than one) over an FQ solution goes away, and I do not subscribe to Bob's dogmatic rejection of FQ (side-note: I am more and more convinced that Bob does not actually understand FQ and what it brings to the table, but I digress).
	Yes, for a core router or a router at an exchange point between two ASes that equation will be different (there will be no shaper necessary, and anything not handled by the ASICs will probably be glacial in comparison).
	BUT I do not believe for one minute that L4S will end up anywhere close to core or transit routers; from its (intentional or accidental) design it is a short-RTT dragster race strip, and there typically are few core devices in the path, so the most relevant AQMs will be the ISP's access link AQMs (and maybe stuff on the data center side of the race track).
	And on the end-user side I can tell you from direct experience: doing the FQ part is not a big issue (doing the shaping part is, at least if the shaper is supposed not to introduce undue delay/jitter).
	Any ISP is welcome today to copy what SQM does and move both the egress and ingress shaper to the CPE, where the computational cost can be easily distributed (each CPE only deals with its user's/owner's network traffic; it is much easier to get X devices that can each shape @ 1 Gbps than one device doing that @ X Gbps, with X >> 1), without (much) loss of functionality. If the ISP is extra nice, they will put an extra AQM on the ingress side (and by all means, as a secondary back-stop AQM, use PIE if you must), but there is no need to restrict the solution by making accommodations for classes of machines that will never run L4S anyway, or to bow to one designer's decade-long elevation of a decent technical solution into a cult. And once you accept/realize that the natural position for such a latency-reducing AQM is at the access link, the whole "is it safe with tunnels" argument for hijacking CE goes out of the door: most end users will rarely use tunnels, or can use the proper network plumbing to instantiate the shaper such that it sees the decapsulated traffic. This will not be perfect and cover 100% of tunnel cases, sure, but it will in all likelihood be good enough.
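The ingress half of what SQM does, mentioned above, deserves a sketch of its own, since a qdisc normally only acts on outgoing traffic. The standard trick is to redirect incoming packets through an IFB pseudo-device; interface names and the 95mbit rate are placeholders here, not a recommendation.

```shell
# Ingress (download) direction: redirect incoming traffic through an
# IFB pseudo-device so a shaper+AQM can treat it like egress traffic.
modprobe ifb
ip link add name ifb0 type ifb
ip link set dev ifb0 up
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol all matchall \
    action mirred egress redirect dev ifb0

# Shape to just below the downlink rate; cake's "ingress" keyword makes
# it account for the fact that dropped packets have already crossed the
# bottleneck link.
tc qdisc replace dev ifb0 root cake bandwidth 95mbit ingress
```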


Okay, that was longer than intended, but I believe it was necessary to explain why I object to most of the L4S state and process. Too little (improvement over what interested users already do today), too late (SQM has been in the field with real users for more than 5 years), at too high a cost (breaking backward compatibility unnecessarily).

Regards
	Sebastian




> 
> Regards,
> Koen.
> 
> -----Original Message-----
> From: Sebastian Moeller <moeller0@gmx.de> 
> Sent: Wednesday, November 18, 2020 2:46 PM
> To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>
> Cc: tsvwg IETF list <tsvwg@ietf.org>
> Subject: Re: [tsvwg] Reasons for WGLC/RFC asap
> 
> HI All,
> 
> as you might predict, I object. More verbosely below, in-line, prefixed with [SM]
> 
> 
>> On Nov 18, 2020, at 11:31, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
>> 
>> Hi all,
>> 
>> To continue on the discussions in the meeting, a recap and some extra thoughts. Did I miss some arguments?
>> 
>> Benefits to go for WGLC/RFC asap:
>> 	• There is NOW a big need for solutions that can support Low Latency for new Interactive applications
> 
> 	[SM] Citation needed. That is simply an assertion, not a fact. Compared to the best-of-class AQMs' ~5ms average queueing delay (with existing TCP, mind you), L4S' 1ms really is only a minor improvement, so whoever is waiting now is simply unaware of the state of the art.
> 
> 
>> 	• The big L4S benefits were a good reason to justify the extra network effort to finally implement ECN in general and AQMs in network equipment
> 
> 	[SM] Which big benefits, again? Certainly not the introduction of a robust and reliable bias for L4S over non-L4S flows, nor the introduction of even more RTT-dependence than we have right now, nor the rfc3168 compatibility issues, or DualPI2's inherent misdesign...
> 
>> 	• Timing is optimal now: implementations in NW equipment are coming and deployment can start now
> 
> 	[SM] That exact same argument will be exactly as true at any point in time in the future. This is a rather thin argument for L4S, really.
> 
> 
>> 	• Deployment of L4S support will include deployment of Classic ECN too! So even for the skeptics among us, that consider that the experiment can fail due to CCs not performing to expectations, we will fall back to having Classic ECN support
> 
> 	[SM] Remind me again, how the outcome of the experiment is supposed to be measured, and how the experiment on failure is to be un-done?
> 
> 
>> 	• Current drafts are about the network part, and are ready and stable for a very long time now.
> 
> 	[SM] But that assumes that all the hopes the network parts put in the protocols bear out in the end. Since these hopes include operational safety, I object to the claim that the current network parts have been ready and stable for a very long time. There is a standing list of issues with the network parts, like the bias against non-L4S flows in general and unexplained magic variables all over the place, like the completely missing rationale for the 15ms delay-target setting on the deep-queue side of DualPI2. None of this is new, but a documented shortcoming will not automatically become acceptable only because it has not been tackled for X years.
> 
> 
>> 	• Only dependency to CCs in the drafts are the mandatory Prague requirements (only required input/review from future CC developers: are they feasible for you)
> 
> 	[SM] It is bad engineering to outsource your safety measures to "stuff still to be designed", as it is bad engineering to assume that an optimal interplay between AQM and end-points can be designed by thinking only about the AQM component.
> 
> 
>> 	• We have a good baseline for a CC (upstreaming to Linux is blocked by the non-RFC status)
> 
> 	[SM] Please cite the mail from the net-dev list showing that objection. Or do you voluntarily refrain from asking for inclusion?
> 
> 
>> 	• Larger scale (outside the lab) experiments are blocked by non-RFCs status
> 
> 	[SM] Which would be a stronger argument if all relevant lab experiments had already concluded with L4S coming out as ready for prime time. Recently posted data puts that very much in question.
> 
>> 	• It will create the required traction within the CC community to come up with improvements (if needed at all for the applications that would benefit from it; applications that don’t benefit from it yet, can/will not use it)
>> 	• NW operators have benefits now (classic ECN and good AQMs) and in the future can offer their customers better Low Latency experience for the popular interactive applications
> 
> 	[SM] Well, sure, your bastardized version of PIE is better than no AQM (though I wonder about the consequences of your ripping out the burst-tolerance code; if you have data showing how this affects its operation, please post a link).
> 
>> 	• When more L4S CCs are developed, the real independent evaluation of those can start
> 
> 	[SM] Which is not gated on L4S getting RFC status at all, only on convincing CC developers that L4S has enough merit to spend some time on. What we heard here on the list is that even staunch L4S proponents like Ingemar have refrained from creating an L4S-requirements-compliant CC for their protocol; hardly a show of confidence.
> 
>> 
>> Disadvantages to wait for WGLC/RFC:
>> 	• We’ll get stuck in an analysis paralysis (aren’t we already?)
> 
> 	[SM] Nope, we are not: the issues are openly documented, but the developers try to out-wait the critics instead of making the required real changes.
> 
>> 	• Trust in L4S will vanish
> 
> 	[SM] After 7+ years, one or two more years are not going to catastrophically change the trust level; that is not a logical argument.
> 
>> 	• No signs that we can expect more traction in CC development; trust and expectations of continuous delays will not attract people working on it, as there will be plenty of time before deployments are materializing
> 
> 	[SM] Which is exactly the rationale for why the network bits need to be redesigned and changed to require much less cooperation from the CC side for L4S to reach its goals and promises (and maybe for scaling down the promises).
> 
> 
>> 	• Product development of L4S will stall and die due to uncertainty on if L4S will finally materialize
> 
> 	[SM] Which might be the best outcome for customers, until it has been demonstrated that L4S can robustly, reliably, and safely deliver its promised performance over the existing internet. Trust in L4S is going to vanish much more quickly if it turns out that it falls short of its promises and does not deliver.
> 
>> 	• Product development of Classic ECN will stall and die due to uncertainty on how L4S will finally materialize
> 
> 	[SM] That is a non-sequitur; if anything, the market opportunity for rfc3168 solutions will be larger if there is no direct competitor.
> 
>> 
>> What are the advantages to wait? Do they overcome these disadvantages?
> 
> 	[SM] Waiting, as we did basically for the last two years, is not giving any advantages. Using, say, the next two years to actually test and modify the L4S network and CC bits to robustly, reliably and fairly deliver on their promises, however, would be well-spent time. And if that does not happen in that time, it might be time to call the experiment off.
> 
> 
> Best Regards
> 	Sebastian
> 
> 
> 
> 
>> 
>> Regards,
>> Koen.
>