Re: [tsvwg] Neal Cardwell's rationale for supporting ECT(1) as an input/L4S signal

Sebastian Moeller <moeller0@gmx.de> Sat, 09 May 2020 10:53 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <CADVnQy=7f79Mj_GQBU-UsodTRORjB2U6rCPPQ+1Zck_gxr-rww@mail.gmail.com>
Date: Sat, 09 May 2020 12:52:58 +0200
Cc: tsvwg IETF list <tsvwg@ietf.org>
Message-Id: <06627DFC-6F54-4FCB-A071-F4F9D671B1CC@gmx.de>
References: <CADVnQy=7f79Mj_GQBU-UsodTRORjB2U6rCPPQ+1Zck_gxr-rww@mail.gmail.com>
To: Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/x2wAn8wtHlahrVCH52tlAVl5Ufs>

Hi Neal,


> On May 8, 2020, at 17:19, Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org> wrote:
> [...]
> 
> - SCE seems to involve an ecosystem with a more complex and more experimental CC (with two different kinds of ECN signal) and little real-world/production experience yet. L4S seems to involve an ecosystem that provides a queue that is basically a single-threshold, shallow-threshold, DCTCP-style, ECN ecosystem, which is simpler and for which the world has a lot of accumulated academic research and real-world/production experience over the last decade.

	[SM] Interestingly, I take it as a considerable downside that in a decade of work L4S has not managed to come up with robust and reliable solutions to its challenges. "Too little, too late", comes to mind as much as "robust solution after years of diligent engineering", but which one it is is still an open question.
One more downside of the long and winding development is that the change of reference protocol from DCTCP to TCP Prague basically devalues the old DCTCP measurements as proof of safety.
	My point is, it seems odd to use indirect measures like accumulated development time and the volume of tests conducted as proxies for the quality of L4S instead of actually looking closely at the RFCs and comparing their claims with the existing data. I am not saying that my assessment, that L4S' implementation is not close to its promises, is the only conclusion one can come to, but I would hope that everybody chiming in on this consensus question actually takes the time to look at that closely for themselves. It is easy to promise the sky, delivery & execution however...



> - L4S flows potentially causing unfairness in RFC3168 ECN bottlenecks has been mentioned as a potential concern. However, a robust RFC3168 ECN bottleneck should already have a mechanism to avoid unfairness caused by flows that are marked as ECT(0|1) and yet not performing RFC3168 responses.

	[SM] That essentially declares all non-FQ AQMs to be fair game, no? Because if they wanted better isolation they could get it (at a cost). That seems at odds with the extra mile L4S goes to avoid using FQ solutions even for a problem that is exceptionally well suited for FQ. Because that can easily be turned around, why not demand the same level of robustness from L4S instead, it being the newcomer and all? Say, require L4S to monitor flow behavior and make its classification based on observed behavior instead of a simple assertion by the sender (ECT(1) is nothing more than that, it is at best a classification on intent, while the thing that should be classified is behavior.) In the context of another thread it seems clear that pure intent signaling is actually expected to be abused:

To cite Tom Herbert paraphrasing Joe Touch (from "[tsvwg] Comment on draft-ietf-tsvwg-transport-encrypt-13"):
"It was not previously mentioned in the context of extension headers.
This is a general consideration for any unauthenticated plaintext data
in a packet that an intermediate node chooses to consume. As Joe said,
in the absence of any requirement or contract, it's the prerogative of
the host to manipulate packet contents as it sees fit to gain an
advantage (where sometimes the "advantage" is just that packets get
delivered and not dropped)."

While I do not fully agree that every sender rightfully should try to abuse the network at all costs, I accept that the potential is there and that solutions need to take this into account in their threat modeling. IMHO L4S has not done so sufficiently: simply claiming without supporting evidence that ECT(1) cannot be abused is either naively optimistic or intentionally misguided.


> In particular, many of the large sources of known deployments of RFC3168 -- Linux fq_codel and cake -- are already deployed with fair queueing. In such bottlenecks L4S traffic should not cause harm to other non-L4S flows.

	[SM] Mmmh, that requires active defenses by existing networks to accommodate a newcomer. Sure, it might ameliorate the fall-out from L4S, but that would be akin to haphazardly handling flu virus samples in a public kitchen instead of an S3 biosafety laboratory, on the grounds that the population should be vaccinated and hence immune already, so what is the harm? That does not seem like a great idea IMHO, in either case.


> Furthermore, if there really are ISPs with deployments of RFC3168 bottlenecks that have neither FQ nor any other protection from non-RFC3168-ECT(1) flows, then they can bleach incoming ECT(1) code points to Not-ECT and treat L4S as Not-ECT (ISPs typically already transform the DSCP byte at their ingress anyway). So I do not see harm to RFC3168 ECN bottlenecks as a prohibitive concern.
> 
> - More generally, if there is any problem discovered with the L4S experiment, either the algorithm or particular implementations, bottlenecks can easily identify L4S traffic and bleach it into Not-ECT, and treat it like Reno/CUBIC traffic.

	[SM] At that point L4S regresses into a relatively boring PIE-derivative (albeit with decreased burst-tolerance*) single queue AQM at those nodes where the dual queue coupled AQM was deployed. Sure, getting rid of the imprecise/unsafe coupling is going to be a win, but having to spend the last ECN codepoint seems a rather steep cost to get to such a pedestrian result, no?
	More importantly, why not first do the due diligence research to assess the probability of this outcome for the L4S experiment, before roll-out/elevation to experimental RFC status?
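
For readers less familiar with the ECN field, the "bleaching" fallback discussed above can be sketched roughly as follows. This is only an illustration, not anything taken from an L4S or ISP implementation: the function name bleach_ect1 and the idea of operating on a bare TOS/Traffic Class byte are assumptions made for the example. The codepoint values themselves are the ones defined in RFC 3168 (the ECN field is the two low-order bits of the byte).

```python
# ECN codepoints per RFC 3168: the two low-order bits of the
# IPv4 TOS byte / IPv6 Traffic Class byte.
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11
ECN_MASK = 0b11


def bleach_ect1(tos: int) -> int:
    """Hypothetical helper: rewrite ECT(1) to Not-ECT, leave everything else alone.

    The six DSCP bits are preserved; ECT(0), CE and Not-ECT packets
    pass through unchanged.
    """
    if (tos & ECN_MASK) == ECT_1:
        # Clear both ECN bits, which yields Not-ECT (0b00).
        return tos & ~ECN_MASK & 0xFF
    return tos


if __name__ == "__main__":
    for tos in (0b00000001, 0b10111001, 0b00000010, 0b00000011):
        print(f"{tos:08b} -> {bleach_ect1(tos):08b}")
```

Note that this sketch deliberately leaves CE untouched; what a node should do with packets that already carry CE is a policy question the sketch does not try to answer.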

Best Regards
	Sebastian


*) Not the best design when operating in drop only mode for Reno/CUBIC style CCs, no?

> 
> Best regards,
> Neal
>
