Re: [tsvwg] Neal Cardwell's rationale for supporting ECT(1) as an input/L4S signal

Sebastian Moeller <moeller0@gmx.de> Tue, 12 May 2020 22:04 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <CADVnQykBXW5Y-+on1CQpN1vg_umV3DKqE+grKS9kvVP1y9NC3g@mail.gmail.com>
Date: Wed, 13 May 2020 00:04:01 +0200
Cc: tsvwg IETF list <tsvwg@ietf.org>
Message-Id: <CF3DF911-0B7F-47CD-90CB-8EA56344DE67@gmx.de>
References: <CADVnQy=7f79Mj_GQBU-UsodTRORjB2U6rCPPQ+1Zck_gxr-rww@mail.gmail.com> <06627DFC-6F54-4FCB-A071-F4F9D671B1CC@gmx.de> <CADVnQykBXW5Y-+on1CQpN1vg_umV3DKqE+grKS9kvVP1y9NC3g@mail.gmail.com>
To: Neal Cardwell <ncardwell@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/yGeaXQUJSizszJnXnXiyoOcQsvg>

Hi Neal,

thanks for your input, more below in-line.

> On May 12, 2020, at 23:30, Neal Cardwell <ncardwell@google.com> wrote:
> 
> Hi Sebastian,
> 
> Some thoughts in-line below...
> 
> On Sat, May 9, 2020 at 6:53 AM Sebastian Moeller <moeller0@gmx.de> wrote:
>> 
>> Hi Neal,
>> 
>> 
>>> On May 8, 2020, at 17:19, Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org> wrote:
>>> [...]
>>> 
>>> - SCE seems to involve an ecosystem with a more complex and more
>>>  experimental CC (with two different kinds of ECN signal) and little
>>>  real-world/production experience yet. L4S seems to involve an ecosystem
>>>  that provides a queue that is basically a single-threshold,
>>>  shallow-threshold, DCTCP-style, ECN ecosystem, which is simpler and for
>>>  which the world has a lot of accumulated academic research and
>>>  real-world/production experience over the last decade.
>> 
>> [SM] Interestingly, I take it as a considerable downside that in a decade of
>> work L4S has not managed to come up with robust and reliable solutions to
>> its challenges. "Too little, too late", comes to mind as much as "robust
>> solution after years of diligent engineering", but which one it is is still
>> an open question.
>> 
>> One more downside of the long-winded development is that the change of
>> reference protocol from DCTCP to TCP Prague basically devalues the old DCTCP
>> measurements as proof of safety.
>>
>> My point is, it seems odd to use indirect measures like accumulated
>> development time and magnitude of conducted tests as proxies for the quality
>> of L4S instead of actually looking closely into the RFCs and comparing their
>> claims with the existing data. I am not saying that my assessment of L4S'
>> implementation not being close to its promises is the only conclusion one
>> can come to, but I would hope that everybody chiming into this consensus
>> question actually takes the time to look at that closely for themselves. It
>> is easy to promise the sky, delivery & execution however...
> 
> Both L4S and SCE have algorithms and implementations that are works in
> progress, and not set in stone at this point. Since they are works in
> progress, I think it's worthwhile to focus on the core question we are
> facing here, which is about the interpretation of the ECT(1) code
> point. I think it's useful to distinguish between what is inherent in
> the interpretation of the code point from what is incidental in the
> current algorithms/implementations on either side.

	[SM] That would have been an interesting discussion, but as far as I can tell a discussion we did not have.


> 
>>> - L4S flows potentially causing unfairness in RFC3168 ECN bottlenecks has
>>>  been mentioned as a potential concern. However, a robust RFC3168 ECN
>>>  bottleneck should already have a mechanism to avoid unfairness caused by
>>>  flows that are marked as ECT(0|1) and yet not performing RFC3168
>>>  responses.
>> 
>> [SM] That essentially declares all non-FQ AQMs to be fair game, no?
> 
> No, there are ways to deal with abusive flows that do not require fair queuing.

	[SM] Could you please elaborate on this? For example, L4S' advisory queue protection scheme is basically just a bad implementation of fair queuing, combining most of the cost with few of the upsides. It only triggers after the fact, and packets of the offending flow that are already in the LL-queue stay there; so if I change the 5-tuple of an offending flow intended to disturb LL-services often enough, queue protection will constantly run behind me.
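
The 5-tuple rotation concern above can be illustrated with a toy model. This is only a sketch under loud assumptions: it is not the algorithm from draft-briscoe-docsis-q-protection, and the per-flow score, the threshold value and the `simulate` helper are hypothetical. Each packet adds 1 to the score of its flow, scores never age, and a packet is admitted to the low-latency queue only while its flow's score is still below the threshold.

```python
from collections import defaultdict

THRESHOLD = 5  # hypothetical per-flow score limit, not taken from any draft


def simulate(packets, threshold=THRESHOLD):
    """Count packets admitted to the low-latency queue in a toy model.

    packets: iterable of flow identifiers (stand-ins for 5-tuples).
    A packet is admitted while its flow's score is below the threshold;
    every packet, admitted or not, raises its flow's score by 1.
    """
    score = defaultdict(int)
    admitted = 0
    for flow in packets:
        if score[flow] < threshold:
            admitted += 1
        score[flow] += 1
    return admitted


total = 100
# One misbehaving flow that keeps its 5-tuple for all 100 packets.
single_flow = ["flow-0"] * total
# The same 100 packets, but the sender switches 5-tuple every 4 packets.
rotating = [f"flow-{i // 4}" for i in range(total)]

print(simulate(single_flow))  # 5
print(simulate(rotating))     # 100
```

In this toy model the single flow is caught after 5 packets, while the rotating sender never accumulates enough score on any one 5-tuple to be caught. A real scheme with score aging or aggregate limits would behave differently; the sketch only shows why purely per-flow, after-the-fact state invites this kind of evasion.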

> 
>> Because
>> if they wanted better isolation they could get it (at a cost). That seems at
>> odds with the extra mile L4S goes to avoid using FQ solutions even for a
>> problem that is exceptionally well suited for FQ. Because that can easily be
>> turned around, why not demand the same level of robustness from L4S instead,
>> it being the newcomer and all? Say, require L4S to monitor flow behavior and
>> make its classification based on observed behavior instead of a simple
>> assertion by the sender (ECT(1) is nothing more than that, it is at best a
>> classification on intent, while the thing that should be classified is
>> behavior.) In the context of another thread it seems clear that pure intent
>> signaling is actually expected to be abused:
> ...
>> While I do not fully agree that every sender rightfully should try to abuse
>> the network at all costs, I accept that the potential is there and solutions
>> need to take this into account in their threat modeling (and IMHO L4S has not
>> done so sufficiently, simply claiming without supporting evidence that ECT(1)
>> can not be abused is either naively optimistic or intentionally misguided).
> 
> L4S does not claim that ECT(1) cannot be abused.

	[SM] It is missing a realistic discussion about how it wants to deal with t


> Rather, it has a
> rather well-developed story for detecting and dealing with abuse of
> the ECT(1) code point with queue protection algorithms. Please see:
> 
>  https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-04#section-8.2

	[SM] "Such a queue protection function is not considered a necessary part
   of the L4S architecture, which works without it (in a similar way to
   how the Internet works without per-flow rate policing)."

Yeah, that might be a solution, but being purely advisory is not going to cut it. Instead, the main rationale seems to be wishful thinking:
" It is
   hoped that self-interest and standardisation of dynamic behaviour
   (cf.  TCP slow-start) will be sufficient to prevent transports from
   sending excessive bursts of L4S traffic, given the application's own
   latency will suffer most from such behaviour.

Whether burst policing becomes necessary remains to be seen.  Without
   it, there will be potential for attacks on the low latency of the L4S
   service.  However it may only be necessary to apply such policing
   reactively, e.g. punitively targeted at any deployments of new bursty
   malware."

So it is not "claimed" but merely "hoped" that ECT(1) will not be abused. That is not a sign of safe engineering, as anything that can be abused will be abused.


>  https://tools.ietf.org/html/draft-briscoe-docsis-q-protection-00

	[SM] I have a) read this, and argued that it basically drags in L4-header inspection and keeping (limited) per-flow state. If we are willing to pay this price we are actually better off going FQ all the way IMHO, because that would solve a number of warts in the L4S design.
And b) I have noted that this is not really a hindrance for abuse; it will only mildly push back against totally brazen attempts to take over the LL-queue too hastily. But if the whole goal is disruption of the LL-queue, it makes my limited-burst DoS auto-home on my victim's probably most latency-sensitive flows. And as long as my attack is doing its deed not by continuous high rates, but simply by being sufficiently bursty, I bet I can do lots of damage on the nominal LL side of L4S without triggering any circuit breakers or queue protection.
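
A back-of-the-envelope sketch of the "bursty but low average rate" point above. All numbers (burst size, packet size, link rate, burst period) are assumptions chosen for illustration and do not come from any L4S draft or measurement; the two helper functions are hypothetical.

```python
def burst_delay_ms(packets, packet_bytes, link_mbps):
    """Queueing delay (ms) added by one back-to-back burst at the bottleneck."""
    bits = packets * packet_bytes * 8
    # link_mbps * 1000 is the number of bits the link drains per millisecond.
    return bits / (link_mbps * 1000.0)


def average_rate_mbps(packets, packet_bytes, period_s):
    """Long-term average rate (Mbit/s) of a sender emitting one burst per period."""
    bits = packets * packet_bytes * 8
    return bits / period_s / 1e6


# Assumed scenario: 100 packets of 1500 bytes, once per second, 100 Mbit/s link.
delay = burst_delay_ms(packets=100, packet_bytes=1500, link_mbps=100)
rate = average_rate_mbps(packets=100, packet_bytes=1500, period_s=1.0)

print(f"delay added per burst: {delay:.1f} ms")
print(f"average sending rate:  {rate:.1f} Mbit/s")
```

Under these assumed numbers each burst adds on the order of 12 ms of queueing delay for anything behind it in the same queue, while the sender's average rate is only about 1.2% of the link. A policer that looks mainly at sustained rate would have little reason to react.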

As I mentioned before, it seems that the L4S design has not been tested with many adversarial traffic patterns yet, which after a decade of development is an odd thing to observe, no?

> 
>>> In particular, many of the large sources of known deployments of RFC3168 --
>>> Linux fq_codel and cake -- are already deployed with fair queueing. In such
>>> bottlenecks L4S traffic should not cause harm to other non-L4S flows.
>> 
>> [SM] Mmmh, that requires active defenses by existing networks to
>> accommodate a newcomer...
> 
> It's not perfect, but we can't let the perfect be the enemy of the
> good, and need to evaluate all the trade-offs of the alternatives
> holistically.

	[SM] Erm, my complaints mostly come from comparing the promises in the L4S drafts with the reality of the L4S implementation after a ~decade of working on it. This is not about perfection, but about demonstration that the promises can actually be robustly and reliably delivered. The "perfect versus good" argument does IMHO not apply here, as I am rarely comparing L4S against a theoretically superior hypothetical alternative. 

Best Regards
	Sebastian

> 
> Best regards,
> neal
