Re: [tsvwg] These L4S issues reported are not show stoppers

Sebastian Moeller <moeller0@gmx.de> Tue, 19 November 2019 08:25 UTC


Dear Koen,


More below, in-line.

> On Nov 19, 2019, at 02:14, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> 
> Hi Sebastian,
> 
> Inline [K]:
> 
> -----Original Message-----
> From: Sebastian Moeller <moeller0@gmx.de> 
> Sent: Monday, November 18, 2019 3:13 PM
> To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>
> Cc: Holland, Jake <jholland@akamai.com>; Ingemar Johansson S <ingemar.s.johansson=40ericsson.com@dmarc.ietf.org>; tsvwg@ietf.org; gorry@erg.abdn.ac.uk; Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
> Subject: Re: [tsvwg] These L4S issues reported are not show stoppers
> 
> Dear Koen,
> 
>> On Nov 18, 2019, at 12:29, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
>> 
>> Hi SCE'ers,
>> 
>>>> unproven L4S technology
>> The Network part of the L4S technology hasn't changed since the L4S BoF. It is based on the theoretical interactions between Scalable CCs (say DCTCP) and Classic CCs (say Reno), and was already extensively (not claiming exhaustively, as everybody makes mistakes) verified with experiments, both to detect under which conditions it works and under which it doesn't or could show unwanted issues. Following the good design rule of keeping the network implementation as simple as possible and agnostic to higher-protocol-layer headers, it was decided to solve these issues in the endpoints, and this is why both safety and performance-improvement requirements were defined in the drafts from the beginning (BoF), also known as the TCP-Prague requirements. 
> 
> 	[SM] Requirements that currently TCP Prague does not seem to meet, no?

> [K] indeed. Currently not. But as long as there is no real world deployment there isn't much pressure for service providers to work on this, let alone share results (although at least one does). I think having consensus that they can be sufficiently achieved should be good enough.

	[SM] The only proof that the requirements can be sufficiently achieved is to present a reference implementation that actually does so. Assuming that closed-door work by unspecified third parties will fulfill these requirements, when not even the openly developed reference implementation does, seems rather optimistic. Is that really your argument?

> 
>> 
>>>> I think we've seen strong evidence that L4S may still contain show-stopping problems.
>> Correct me if I'm wrong, but most of the problems that were recently labeled as "show-stoppers" are not new, and only motivating the existence of the related TCP-Prague requirement or known limitations valid for any low latency architecture. I think it is good that L4S gets evaluated and challenged, but I think it is incorrect to label issues immediately as "show stoppers".
>> 
>> A summary of the so called "show-stopping problems" I picked up:
>> 
>> - 4-second-long burst in a cascade of a bufferbloated FIFO and a slightly lower-rate FQ-CoDel bottleneck: originated in a bug in an alpha version of our TCP-Prague implementation,
> 
> 	[SM] Which is a strong indicator the TCP Prague has not seen sufficient testing to merit wider roll-out into the internet, no? So why the rush to get L4S into experimental RFC status since it obviously is not fully baked yet?

> [K] The only real world evaluation will be the experiment itself.

	[SM] Koen, as it stands the L4S core components, TCP Prague and the dual-queue AQM, fail even in very simple lab experiments: two flows over the most favorable configuration for L4S (basically how an L4S reference deployment would look), with the simplest traffic pattern exercising both queues, demonstrate failure to meet the L4S requirements/claims. I would really appreciate it if we would not rush this, but first get things working in the lab reliably and robustly. 


> I don't expect many large real world service/application providers to use the basic reference Prague implementation anyway.

	[SM] And I do not expect them to use L4S at all, because currently it does not work. But that is beside the point: the reference implementation is the one tool you have to demonstrate that your L4S design actually works and behaves as expected. IMHO the onus is on you to present a working implementation; if that is TCP Prague, fine, and if it is a provider's in-house version, also fine, assuming it passes all reliability and robustness tests.


> They will have their own version ready, eager to deploy low latency applications and to test and check whether they are sufficiently complying with the Prague requirements.

	[SM] I predict that any shortcut you take in TCP Prague will be taken as permission to cut the same corners, but I am a pessimist.

> A working open-source DualPI2 implementation has been available to experiment with for a long time (I believe since the L4S BoF).

	[SM] I maintain my claim that the dual-queue AQM (which I assume you refer to here) DOES NOT WORK as required, and hence I wonder about the validity of your claim.


> An FQ-CoDel implementation with a shallow ECT(1) threshold would also help to facilitate real world deployments and would allow similar experimentation with low latency and high throughput applications based on TCP-Prague compliant CCs. If you have one, we are willing to include it in the L4STeam Linux Kernel git. Probably we have one too that we can put in there.

	[SM] Sounds like a decent experiment, but it does not remove the problem that your system needs to coexist with the existing deployed internet nodes and the behavior and configuration that exist.


> 
> 
>> and that was amplified by a wrong assumption of an FQ_CoDel implementation, that overload protection (reverting to drop) is not needed,
> 
> 	[SM] ??? As far as I can tell fq_codel does resort to drop on overload; could you please specify exactly what you are referring to?

> [K] that ECN is ignored and packets are dropped instead of marked. PIE uses 10% as the threshold to switch to drop, (Dual)PI2 25% by default. Both can reach this state quickly. When will FQ-CoDel start dropping packets? Your tests didn't show any drops during the first 4 seconds.

	[SM] Well, CoDel's hard-drop mode will trigger if the configurable maximum queue size is reached or exceeded, as a look into the kernel source makes clear. The same holds for fq_codel, except that fq_codel has a per-queue limit as well as a global packet limit and a global memory limit, all documented in the code and even in the tc man pages.
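For illustration, that overload backstop can be sketched roughly like this (a minimal sketch with illustrative names, using the Linux fq_codel defaults I am aware of, not the actual kernel code):

```python
# Rough sketch of fq_codel's hard-drop backstop (illustrative only).
# Independently of the CoDel AQM logic, enqueue refuses new work once the
# global limits are hit: a packet is dropped (not ECN-marked) from the
# currently fattest flow.
LIMIT_PKTS = 10240         # Linux fq_codel default 'limit' (packets)
MEMORY_LIMIT = 32 << 20    # Linux fq_codel default 'memory_limit' (32 MB)

def enqueue(total_pkts, total_bytes):
    """Return the action taken for an arriving packet."""
    if total_pkts >= LIMIT_PKTS or total_bytes > MEMORY_LIMIT:
        return "drop from fattest flow"   # hard drop, regardless of ECN
    return "enqueue"
```

So even with ECN-capable traffic, a non-responsive flow eventually runs into a drop-based limit rather than being marked forever.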


> 
>> as in an FQ there is isolation between flows and a non-responsive flow will only hurt itself. Usually it does, but in this particular setup where the FQ_CoDel implementation tried to protect other flows from the missing AQM in the preceding FIFO, it was clearly a missing FQ_CoDel feature.
> 
> 	[SM] That is a quite extreme interpretation of TCP Prague's failure to meet the Prague requirement to properly respect RFC 3168 AQMs. My subjective take on this is that the L4S components need to coexist with the existing internet, unless fixing something is realistically possible.

> [K] In the real world, this won't be the last bug in a congestion control implementation, nor in an AQM. Hopefully you can soon fix these bugs in your system, so you can fully enjoy the FQ isolation. 

	[SM] Koen, I see what you are doing here, but I will refrain from taking this as flame bait. This is not an argument relevant to the question of whether the documented shortcomings in TCP Prague and the dual-queue AQM are show stoppers for further deployment or not.

> 
>> Dropping packets would immediately trigger the Classical congestion response and avoid the reported "show-stopping" effect.
> 
> 	[SM] As would disallowing ECN for ECT(1) flows, 
> [K] Or a shallow marking threshold on ECT(1) 😉

	[SM] Honestly, I prefer my solution, as it has a better chance of removing most of the accepted-by-design side effects of L4S on my home network.


> 
>> 
>> - high unfairness between flows when the base RTT is 0ms: Due to the large difference between the experienced RTTs of both flows, the RTT dependence gives the L4S flow a 10 times higher throughput. This RTT dependence, which we love and hate, has been the normal mode of operation on the Internet for the last 40 years. I am personally a big promoter of the "Less RTT dependent" TCP-Prague requirement, while others argue it is not even necessary, as it is "normal" and accepted Internet behavior and part of the advantages of using L4S. I think extreme cases such as the 0ms one show the importance of this requirement, to cover at least these extreme cases.
> 
> 	[SM] You seem to misunderstand the issue I raised, so let me try again: L4S introduces a new, supposedly equitable sharing system between L4S and "normal TCP" flows that fails to do exactly that: share fairly between the two categories it sorts all packets into (even on one and the same path). IMHO this is a failure independent of the root cause of the behavior.
> 	In addition, the AQM L4S selected as reference artificially increases the RTT of the normal-queue flows (by selecting a high RTT target of 15* ms without properly considering the consequences that choice has on queue-sharing behavior in your coupled design), and then it is argued that, due to the inflated RTT, unfair bandwidth distribution is acceptable because of TCP's known RTT dependence. I would cautiously argue that it might be better to employ an AQM that comes with less obvious failure modes... instead of employing such forced logic.

> [K] I hope the other mail with the reference to the related TCP-Prague requirement clarified this now. It was covered and identified before the L4S BoF, accepted as a Prague requirement, and even described in the draft, but (I realize now) with a different example: different base RTTs and the same traffic type. It is equally valid (to a lesser extent, though) for your example of flows with the same base RTT and different traffic types: 0ms base RTT + 1ms queue delay for L4S = 1ms RTT for L4S flows; 0ms base RTT + 15ms queue delay for Classic = 15ms total RTT for Classic; so a ratio of 1/15 in RTT means theoretically a ratio of 15/1 in rate.

	[SM] And hence my observation that the 15ms are not really justified, and that setting the target to the theoretically required 5ms should help recover some of the bandwidth lost to normal traffic; a solution so obvious that I really wonder why it has not been explored by the L4S team (and the fact that you do not offer hard arguments and data for why it does not work seems to indicate that such tests were not performed). Also, as in the other e-mail, there is an emergency anti-starvation method implemented/recommended for your AQM that theoretically should be usable to better balance the bandwidth between the two queues under short-RTT conditions...
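The arithmetic behind that observation is simple enough to sketch (assuming the textbook approximation that a Reno-style flow's rate scales as 1/RTT):

```python
def rate_ratio(rtt_l4s_ms, rtt_classic_ms):
    """Two long-running flows sharing a bottleneck split capacity roughly
    in inverse proportion to their RTTs (Reno-style rate ~ 1/RTT)."""
    return rtt_classic_ms / rtt_l4s_ms

# 0ms base RTT: ~1ms queuing delay for L4S vs. the 15ms Classic target.
print(rate_ratio(1, 15))   # the L4S flow gets roughly 15x the rate
# With a 5ms Classic target instead, the imbalance shrinks to ~5x.
print(rate_ratio(1, 5))
```

This is only the first-order model both mails argue from; real stacks deviate from it, but it shows how directly the Classic queue's delay target drives the imbalance at short base RTTs.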



> The draft had an even more extreme example where the rate ratio was about 50/1. The idea is to have TCP-Prague respond like a flow with an RTT of about 10ms (total RTT independence) or 5 to 20ms (less RTT dependent) between 1 and 100ms to match RTTs on the Internet, and to limit the maximum correction to a 10x or 5x factor respectively.

	[SM] Well, I am waiting for that feature to appear in TCP Prague, and for test data demonstrating its functionality. From a robustness aspect, I still believe that the L4S AQM needs to make stricter equality guarantees, as you very much propose to admit not only TCP Prague traffic into the "low latency queue", and hence relying on TCP Prague to avoid catastrophic sharing failures will not solve the underlying problem. 
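For what it is worth, the clamping idea described above could look roughly like this (purely illustrative; the constants come from the mail, the function name is hypothetical):

```python
def virtual_rtt_ms(measured_rtt_ms, floor_ms=5.0, ceil_ms=20.0):
    """'Less RTT dependent' response: inside the corrected 1..100 ms range,
    the CC paces as if its RTT were clamped to [floor_ms, ceil_ms], which
    caps the rate advantage of very-short-RTT flows at ~ceil_ms/floor_ms."""
    if not (1.0 <= measured_rtt_ms <= 100.0):
        return measured_rtt_ms            # no correction outside the range
    return min(max(measured_rtt_ms, floor_ms), ceil_ms)
```

With these defaults a 1ms-RTT flow would pace as if its RTT were 5ms, limiting the correction to the 5x factor mentioned; total RTT independence would instead return a fixed ~10ms.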


> This could be a compromise between the broken behavior if nothing is done and people arguing L4S should still have some RTT advantage (make sure there is a benefit to using the newest CCs while still retaining some throughput benefit from nearby datacenter/CDN access). Other mappings can be defined/discussed before or even during the experiment.

	[SM] I do not think that this is a robust and reliable path forward, sorry. I admit I have doubts whether the complete set of L4S goals is achievable in reality at all, but I believe that compromising on coexistence with existing traffic should be non-negotiable. This requires a solution that is both robust and reliable (and preferably simple enough to understand and test).

Best Regards
	Sebastian


> 
> Regards,
> Koen.
> 
> 
> *) On the tested path that demonstrated this dualq shortcoming, with an RTT < 1ms, PIE would actually only need a sub-millisecond target, so no matter how you slice and dice it, it seems unconvincing to first burden the normal queue with a massively over-sized latency target and then take this as an allowance for giving the L4S queue an unfair bandwidth advantage. Now, I am not an engineer, so this behavior might be acceptable here, but that should be made explicit in both the arch and the dualq drafts... and I would like to see people here actually ACKing or NACKing that behavior as acceptable for the wider internet.
> 
> 
> Best Regards
> 	Sebastian
> 
> 
>> 
>> - lower throughput when traffic is passing bursty links: From a congestion control point of view, it is possible to lower the queuing latency below 1ms. Maybe we did not state clearly enough that this is not the "real world" end-to-end latency that can always be achieved. There are many other sources of latency (serialization time, speed of light...) that add to the end-to-end latency, but which do not prevent achieving the additional 1ms "queuing" delay. But there are many real-world sources that do limit what can be achieved. If the serialization time of a packet is longer than 1ms (when the rate drops below 12Mbps), it is a mistake to mark packets at 1ms delay (the Linux DualPI2 does not mark below 2 packets in queue). If packets are waiting to be aggregated and sent in a burst, it is a mistake to consider this waiting time as "queuing" delay and mark them based on it. If network technology on your path or at the sender aggregates packets over longer times than 1ms and bursts them out at a higher rate, creating more than 1ms of queuing delay at the smallest path throughput bottleneck, it is a mistake to put the marking threshold in those low-throughput paths below the expected burst size. In any of these mistake cases, low-latency CC traffic will lose throughput. If you see lower throughput for L4S, it is due to additional L4S marking on top of the coupled marking, typically caused by a bursty source. By the way, these issues are not L4S or L4S-codepoint specific; they are also valid for SCE, for DCTCP in datacenters, and for delay-based congestion controls that want to avoid 1ms of extra delay. Another solution is to improve the aggregation and MAC mechanisms in the related link technologies, or to pace packets out at a lower burst rate (e.g. in your WiFi access point). Low-latency CC is raising the bar for link-layer technologies. It will take time for the lower layers to adapt to the new TCP behavior.
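The 12Mbps figure above follows directly from serialization time; a quick check, assuming a 1500-byte packet:

```python
def serialization_ms(packet_bytes, rate_mbps):
    """Time to clock one packet onto the wire, in milliseconds.
    rate_mbps * 1000 is the link rate in bits per millisecond."""
    return packet_bytes * 8 / (rate_mbps * 1000.0)

# A 1500-byte packet needs exactly 1 ms at 12 Mbps; below that rate a
# single in-flight packet already exceeds a 1 ms queuing-delay threshold,
# so marking at 1 ms of sojourn time would punish every packet.
print(serialization_ms(1500, 12))   # 1.0 ms
print(serialization_ms(1500, 6))    # 2.0 ms
```

This is why the Linux DualPI2 code is described as falling back to a 2-packet floor instead of a pure time threshold at low rates.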
>> 
>> Next to the TCP-Prague requirements, I think a way forward is to also explicitly document in the L4S drafts, the limitations or "pitfalls" (if not already). Even if they are not L4S specific, I agree, it is important to set the correct expectations and to clearly inform people that want to deploy or reproduce experiments, that these are known and unavoidable limitations. This way we can move on and focus on finding "real" show-stoppers and specifically in this context on finding "real" differentiation between L4S and SCE.
>> 
>> Before I forget 😉, one more issue:
>> - unfairness to classic TCP when sharing a Classic ECN AQM: I think this is the real differentiator between L4S and SCE. L4S has "covered" this as a TCP-Prague requirement. Agreed, it is a bit like putting the hot potato into the congestion control developers' basket, but that is where we need to solve it. I think the debate should be around this issue only at this stage. The question is how much we need to compromise if this TCP-Prague requirement is not sufficiently resolved (and which level of sufficiency is expected), and what we need to compromise if we select SCE on the other hand... I think it is important to have a future-facing vision here.
>> 
>> Regards,
>> Koen.
>> 
>> 
>> 
>> -----Original Message-----
>> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Holland, Jake
>> Sent: Monday, November 18, 2019 1:46 AM
>> To: Ingemar Johansson S <ingemar.s.johansson=40ericsson.com@dmarc.ietf.org>; tsvwg@ietf.org
>> Cc: gorry@erg.abdn.ac.uk; Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
>> Subject: Re: [tsvwg] Requesting TSVWG adoption of SCE draft-morton-tsvwg-sce
>> 
>> Hi Ingemar,
>> 
>> If fragmenting the space will prevent other SDOs from prematurely adopting the unproven L4S technology, that seems like exactly the right thing to do at this stage.
>> 
>> I think we've seen strong evidence that L4S may still contain show-stopping problems.  Also that we have not yet seen strong evidence that the problems stemming from the ambiguity in the L4S signaling design can be fixed.
>> 
>> This carries a demonstrated potential for breaking existing ECN deployments by under-responding to the already widely-deployed congestion feedback systems.
>> 
>> Certainly L4S's implementation was demonstrated to contain an issue that would have wrecked the latency of existing ECN deployments, and it had not previously been detected, despite the years of lab evaluation and repeated requests from reviewers to test such scenarios earlier.
>> 
>> Although a fix was found for the specific initially-demonstrated case, no fix has yet been demonstrated for what looks to be a very similar issue occurring with staggered flow startup, which can't be attributed to a wrong alpha starting value:
>> https://trac.ietf.org/trac/tsvwg/ticket/17#comment:8
>> 
>> The proposed pseudocode fix (with up to 5 tuning parameters, IIRC) may or may not be able to address this for specific cases, and it may or may not be possible to discover a set of tuning values that can address a wide range of conditions, but it seems appropriate to have some skepticism, at least until demonstration of successful operation under a wide range of conditions, given the history of such proposals.  This suggests that we do the opposite of encouraging other SDOs to move broadly forward with L4S at this time.
>> 
>> I share your concern that we might lose the codepoint (and the low latency functionality), and I acknowledge that a persistently fragmented space introduces a risk that it never happens, or takes an extra decade.
>> 
>> But the risk that concerns me even more is if L4S gets rolled out and then these kinds of issues are discovered in production, after other SDOs have prematurely standardized on this experiment, and it therefore gets shut off with prejudice against future solutions.
>> That outcome also would lose the use of the codepoint, probably even more permanently.
>> 
>> (Or even worse: if it does not get shut off in spite of the problems it causes, which loses even the low-ish latency solutions we already have, and adds to the congestion control aggression arms race.)
>> 
>> IMO, those would be even worse outcomes than a somewhat delayed adoption of a fully vetted system (or at least one that can't break existing deployed networks).
>> 
>> Best regards,
>> Jake
>> 
>> PS: I still don't understand why the gains available through the use of regular AQM (especially with ECN) have not been more widely adopted by the other SDOs that would want to make use of L4S.
>> 
>> It seems possible already to reduce the application-visible delay spikes from ~200ms to ~20ms (provided that no overly aggressive competing traffic improperly ignores the feedback, or that flow-queuing or other queue protection mechanisms are more widely deployed to prevent excessive damage from aggressive flows to less aggressive competing flows).
>> 
>> I wonder if whatever would drive SDOs to start using L4S maybe could instead be leveraged to drive adoption of the much more well-proven existing ECN solutions, which at least already have a lot of endpoint support deployed.
>> 
>> The endpoint support is a critical component to making this useful, and I see no reason to believe it'll be any quicker than the existing regular ECN was.  I'd even expect less so, since the behavior is much more complicated and hard to test.
>> 
>> PPS: I agree it would be interesting to see paced chirping solutions to help do better than slow start, and to quickly grow when new capacity opens on-path.  But I'll point out that's not specific to L4S, but rather should have application for any CC that can avoid pushing the network until queue overflow, which to me likely includes regular ECN-enabled Reno or Cubic, as well as BBR.
>> 
>> However, as yet another unproven TBD, I'll suggest it's not very useful as a strong influence on this debate, in spite of the early demos using L4S.  Regardless of the ultimate low latency solution, that part will need further development and might not work.