Re: [tsvwg] L4S Issue #16 Discussion Paper: Fall-back on Classic ECN AQM

Sebastian Moeller <moeller0@gmx.de> Sun, 10 November 2019 14:26 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <b9f4d7e8-a5a4-84ae-dd75-35e3a73b4fa5@bobbriscoe.net>
Date: Sun, 10 Nov 2019 15:25:54 +0100
Message-Id: <D4CB17E9-3845-41E7-9A74-47247EE67C6A@gmx.de>
References: <b9f4d7e8-a5a4-84ae-dd75-35e3a73b4fa5@bobbriscoe.net>
To: Bob Briscoe <ietf@bobbriscoe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/qlZ-4c-A9ldErMMzUpJqDVv3UxM>

Hi Bob,

Firstly, thanks for finally starting to tackle the issue of teaching L4S-style senders to coexist peacefully with AQMs that employ RFC3168 ECN marking.

Secondly, I reserve judgment until you post real data gathered under adversarial to worst-case conditions. The core of issue #16 is not a lack of ideas, but the lack of a real-world implementation that has survived a number of tests and done the right thing in them.

Thirdly, I really wonder whether the incentives are properly aligned on this. I would very much prefer that you, if only for the sake of developing your detector, stop assuming L4S until RFC3168 is proven. Instead, assume RFC3168 and switch to the L4S-style response only once enough evidence has accumulated to justify that step (from then on you reverse the test and collect evidence for RFC3168 for a potential switch back). As it stands, you seem to aim at limiting false-positive RFC3168 judgements, while on the principle of "first, do no harm" you should minimize false negatives as well.
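
To illustrate the reversed default, here is a minimal sketch of such a mode selector. Everything in it is an assumption made for illustration: the class name, the thresholds and the boolean per-round verdict `looks_like_l4s` are hypothetical and are not taken from the discussion paper or from TCP Prague. How a single round is judged is deliberately left out; the sketch only shows the asymmetry (slow to leave RFC3168 mode, quick to return to it).

```python
from dataclasses import dataclass

CLASSIC = "rfc3168"   # respond to CE as RFC3168 expects
SCALABLE = "l4s"      # respond to CE in the L4S/DCTCP style


@dataclass
class ConservativeDetector:
    """Toy mode selector that starts in classic mode (illustrative only)."""
    to_l4s_threshold: int = 20      # assumed: consecutive L4S-looking rounds needed to leave classic mode
    to_classic_threshold: int = 3   # assumed: consecutive classic-looking rounds needed to fall back
    mode: str = CLASSIC
    streak: int = 0                 # consecutive rounds contradicting the current mode

    def observe(self, looks_like_l4s: bool) -> str:
        """Feed one per-round verdict and return the mode to use next."""
        if self.mode == CLASSIC:
            contradicts = looks_like_l4s
            threshold = self.to_l4s_threshold
        else:
            contradicts = not looks_like_l4s
            threshold = self.to_classic_threshold

        # Any round that agrees with the current mode resets the evidence.
        self.streak = self.streak + 1 if contradicts else 0

        if self.streak >= threshold:
            self.mode = SCALABLE if self.mode == CLASSIC else CLASSIC
            self.streak = 0
        return self.mode
```

With these assumed thresholds a flow needs 20 consecutive L4S-looking rounds before it adopts the L4S response, but only 3 classic-looking rounds to drop back, so the cost of doubt is carried by the L4S flow rather than by the RFC3168 AQM.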

A few more detailed comments:

"Any transition should be suppressed for a number of RTTs after the onset of CE marking, both to allow the connection to stabilize and because aggressive competition for bandwidth is not a great concern with short flows."

There is an obvious conflict here. To get a reliable estimate of whether a path is limited by an RFC3168-compliant AQM or by an L4S-style AQM, you need a fair number of data samples. At the same time, you want to correct any AQM-type misjudgment quickly, so that the mismatch between the ECN response the AQM expects and the response the sender exhibits affects the AQM's queue for as short a time as possible.

"Therefore, relative to overall convergence time, it will be insignificant if a flow takes a couple of dozen rounds to work out whether it should be converging to an L4S or to a classic target."

I could live with that IF all flows started RFC3168-compliant; then they could take as long as they want to switch to the L4S style. In the reverse direction we are discussing here, however, that seems sub-optimal. As it stands, you seem to give yourself leeway to misclassify an RFC3168-limited path for multiple seconds under congestion. This seems quite generous to L4S, too generous for my taste. Given that reasonably low-latency networking is already possible with the existing internet and normal traffic, I see very little slack to cut for L4S misclassifications (especially since all of this is rooted in L4S redefining the meaning of CE; IMHO you can do this, but then you need to "eat" all the cost incurred by that decision).

"In that case, a large classic response to a CE-mark could under-utilize the link until cwnd returned. So a small scalable response would be more appropriate."

That seems misguided. Unless TCP Prague operates on an L4S-limited path it should behave like a normal TCP flow, and that pretty much implies a multiplicative decrease on a CE mark or drop.
I get that you want any advantage for L4S-style flows you can get, but honestly this looks like trying too hard to me.
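
To make the size of the difference concrete, here is a minimal sketch comparing the two responses to a CE mark. It assumes a Reno-style halving for the classic case and the DCTCP update from RFC 8257 (alpha as an EWMA of the marked fraction with gain g = 1/16, then cwnd scaled by 1 - alpha/2) for the scalable case. The function names and the floor of 2 segments are illustrative choices, not anything taken from TCP Prague.

```python
MIN_CWND = 2.0  # assumed floor in segments, for illustration only


def classic_response(cwnd: float, beta: float = 0.5) -> float:
    """Classic reaction to a CE mark: one multiplicative decrease per RTT."""
    return max(cwnd * beta, MIN_CWND)


def update_alpha(alpha: float, marked_fraction: float, g: float = 1 / 16) -> float:
    """DCTCP-style EWMA of the fraction of CE-marked bytes in the last window."""
    return (1 - g) * alpha + g * marked_fraction


def scalable_response(cwnd: float, alpha: float) -> float:
    """DCTCP-style reaction: reduce cwnd in proportion to the marking estimate."""
    return max(cwnd * (1 - alpha / 2), MIN_CWND)


# One window with 10% of bytes marked, starting from alpha = 0 and cwnd = 100:
cwnd = 100.0
alpha = update_alpha(0.0, 0.10)
after_classic = classic_response(cwnd)
after_scalable = scalable_response(cwnd, alpha)
```

Under these assumptions the classic sender drops to 50 segments while the scalable sender barely moves off 100, which is exactly the mismatch an RFC3168 AQM would have to absorb in its queue.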

There is quite some slack in this heuristic, and I think this comes from the notion that the goal of the heuristic is to post-hoc "reluctantly" change from the L4S-ECN response to the RFC3168-ECN response. Personally I would be happier to see the test logic reversed: start in RFC3168-compliant mode and only enable the L4S-ECN response after an RFC3168-ECN AQM has been ruled out with sufficient certainty. Remember, you argue that a DCTCP-style ECN response is too aggressive to be inflicted on the internet without proper isolation, and now you seem to argue that strict isolation can be replaced by a much looser heuristic.

Best Regards
	Sebastian

> On Nov 5, 2019, at 10:48, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> tsvwg, iccrg and tcpm folks,
> 
> I've published a discussion paper giving rationale and pseudocode for an algorithm we are implementing and evaluating in TCP Prague.
>     TCP Prague Fall-back on Detection of a Classic ECN AQM
> 
> I'd appreciate any immediate concerns with whether it looks feasible and/or any scenarios where you think it's likely not to work well.
> 
> 
> There's three main components: passive, active and base RTT adjustment. 
> Only the first may be necessary, or possibly only the second.
> 
> We'll publish evaluation results as soon as we have them.
> 
> Apologies for not posting as an Internet Draft, but 
> a) I find LaTeX quicker and more readable esp. for plots; 
> b) many patent offices consider arXiv (but not Internet Drafts) as strong as journal publication
> 
> For the avoidance of doubt, I have filed no IPR on this.
> 
> Cheers
> 
> 
> 
> Bob
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               
> http://bobbriscoe.net/
