Re: [tsvwg] Neal Cardwell's rationale for supporting ECT(1) as an input/L4S signal

Sebastian Moeller <moeller0@gmx.de> Tue, 12 May 2020 22:04 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <CADVnQykBXW5Y-+on1CQpN1vg_umV3DKqE+grKS9kvVP1y9NC3g@mail.gmail.com>
Date: Wed, 13 May 2020 00:04:01 +0200
Cc: tsvwg IETF list <tsvwg@ietf.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <CF3DF911-0B7F-47CD-90CB-8EA56344DE67@gmx.de>
References: <CADVnQy=7f79Mj_GQBU-UsodTRORjB2U6rCPPQ+1Zck_gxr-rww@mail.gmail.com> <06627DFC-6F54-4FCB-A071-F4F9D671B1CC@gmx.de> <CADVnQykBXW5Y-+on1CQpN1vg_umV3DKqE+grKS9kvVP1y9NC3g@mail.gmail.com>
To: Neal Cardwell <ncardwell@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/yGeaXQUJSizszJnXnXiyoOcQsvg>
Subject: Re: [tsvwg] Neal Cardwell's rationale for supporting ECT(1) as an input/L4S signal

Hi Neal,

thanks for your input, more below in-line.

> On May 12, 2020, at 23:30, Neal Cardwell <ncardwell@google.com> wrote:
> 
> Hi Sebastian,
> 
> Some thoughts in-line below...
> 
> On Sat, May 9, 2020 at 6:53 AM Sebastian Moeller <moeller0@gmx.de> wrote:
>> 
>> Hi Neal,
>> 
>> 
>>> On May 8, 2020, at 17:19, Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org> wrote:
>>> [...]
>>> 
>>> - SCE seems to involve an ecosystem with a more complex and more
>>>  experimental CC (with two different kinds of ECN signal) and little
>>>  real-world/production experience yet. L4S seems to involve an ecosystem
>>>  that provides a queue that is basically a single-threshold,
>>>  shallow-threshold, DCTCP-style, ECN ecosystem, which is simpler and for
>>>  which the world has a lot of accumulated academic research and
>>>  real-world/production experience over the last decade.
>> 
>> [SM] Interestingly, I take it as a considerable downside that in a decade of
>> work L4S has not managed to come up with robust and reliable solutions to
>> its challenges. "Too little, too late", comes to mind as much as "robust
>> solution after years of diligent engineering", but which one it is is still
>> an open question.
>> 
>> One more downside of the long-winding development is that the change of
>> reference protocol from DCTCP to TCP Prague basically devalues the old DCTCP
>> measurements as proof of safety.
> ...
>> My point is, it seems odd, using indirect measures like accumulated
>> development time and magnitude of conducted tests as proxies for the quality
>> of L4S instead of actually looking closely into the RFCs and comparing their
>> claims with the existing data. I am not saying that my assessment of L4S'
>> implementation not being close to its promises is the only conclusion one
>> can come to, but I would hope that everybody chiming into this consensus
>> question actually takes the time to look at that closely for themselves. It
>> is easy to promise the sky, delivery & execution however...
> 
> Both L4S and SCE have algorithms and implementations that are works in
> progress, and not set in stone at this point. Since they are works in
> progress, I think it's worthwhile to focus on the core question we are
> facing here, which is about the interpretation of the ECT(1) code
> point. I think it's useful to distinguish between what is inherent in
> the interpretation of the code point from what is incidental in the
> current algorithms/implementations on either side.

	[SM] That would have been an interesting discussion, but as far as I can tell it is one we did not have.


> 
>>> - L4S flows potentially causing unfairness in RFC3168 ECN bottlenecks has
>>>   been mentioned as a potential concern. However, a robust RFC3168 ECN
>>>   bottleneck should already have a mechanism to avoid unfairness caused by
>>>   flows that are marked as ECT(0|1) and yet not performing RFC3168
>>>   responses.
>> 
>> [SM] That essentially declares all non-FQ AQMs to be fair game, no?
> 
> No, there are ways to deal with abusive flows that do not require fair queuing.

	[SM] Could you please elaborate on this? For example, L4S' advisory queue protection scheme is basically just a bad implementation of fair queuing, combining most of the cost with few of the upsides. It only triggers after the fact, and packets of the offending flow that are already in the LL-queue stay there; so if I change the 5-tuple of an offending flow intended to disturb LL-services often enough, queue protection will constantly lag behind me.
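
The 5-tuple argument can be illustrated with a toy model. This is only a sketch under loud assumptions, not the algorithm from the DOCSIS queue protection draft: the per-flow score here simply counts packets, never ages out, and the threshold, addresses and ports are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical threshold: packets a flow may send before the toy protector
# starts redirecting it out of the low-latency queue.
THRESHOLD = 5


def packets_admitted(flow_ids, threshold=THRESHOLD):
    """Count packets a toy per-flow protector admits to the LL-queue.

    flow_ids is a sequence of 5-tuples, one per packet. Every packet adds 1
    to its flow's score; a packet is admitted only while the score of its
    flow is still below the threshold (so sanctions come after the fact).
    """
    score = defaultdict(int)
    admitted = 0
    for fid in flow_ids:
        if score[fid] < threshold:
            admitted += 1
        score[fid] += 1
    return admitted


N = 100
# One misbehaving flow that keeps its 5-tuple for all N packets.
steady = [("10.0.0.1", 5000, "10.0.0.2", 443, "udp")] * N
# The same sender, but it switches source port every 4 packets, so no single
# 5-tuple ever accumulates enough score to be sanctioned.
rotating = [("10.0.0.1", 5000 + i // 4, "10.0.0.2", 443, "udp") for i in range(N)]

print(packets_admitted(steady))    # steady flow is cut off after THRESHOLD packets
print(packets_admitted(rotating))  # rotating sender is never cut off
```

In this toy model the steady flow gets 5 of its 100 packets into the LL-queue, while the rotating sender gets all 100 in. Whether a real queue protection implementation behaves this way depends on its scoring and aging details, which the sketch does not model.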

> 
>> Because
>> if they wanted better isolation they could get it (at a cost). That seems at
>> odds with the extra mile L4S goes to avoid using FQ solutions even for a
>> problem that is exceptionally well suited for FQ. Because that can easily be
>> turned around, why not demand the same level of robustness from L4S instead,
>> it being the newcomer and all? Say, require L4S to monitor flow behavior and
>> make its classification based on observed behavior instead of a simple
>> assertion by the sender (ECT(1) is nothing more than that, it is at best a
>> classification on intent, while the thing that should be classified is
>> behavior.) In the context of another thread it seems clear that pure intent
>> signaling is actually expected to be abused:
> ...
>> While I do not fully agree that every sender rightfully should try to abuse
>> the network at all costs, I accept that the potential is there and solutions
>> need to take this into account in their threat modeling (and IMHO L4S has not
>> done so sufficiently, simply claiming without supporting evidence that ECT(1)
>> can not be abused is either naively optimistic or intentionally misguided).
> 
> L4S does not claim that ECT(1) cannot be abused.

	[SM] It is missing a realistic discussion about how it wants to deal with t


> Rather, it has a
> rather well-developed story for detecting and dealing with abuse of
> the ECT(1) code point with queue protection algorithms. Please see:
> 
>  https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-04#section-8.2

	[SM] "Such a queue protection function is not considered a necessary part
   of the L4S architecture, which works without it (in a similar way to
   how the Internet works without per-flow rate policing)."

Yeah, that might be a solution, but being purely advisory is not going to cut it. The main rationale seems to be wishful thinking instead:
" It is
   hoped that self-interest and standardisation of dynamic behaviour
   (cf.  TCP slow-start) will be sufficient to prevent transports from
   sending excessive bursts of L4S traffic, given the application's own
   latency will suffer most from such behaviour.

Whether burst policing becomes necessary remains to be seen.  Without
   it, there will be potential for attacks on the low latency of the L4S
   service.  However it may only be necessary to apply such policing
   reactively, e.g. punitively targeted at any deployments of new bursty
   malware."

So it is not "claimed" but merely "hoped" that ECT(1) will not be abused. That is not a sign of safe engineering, as anything that can be abused will be abused.


>  https://tools.ietf.org/html/draft-briscoe-docsis-q-protection-00

	[SM] a) I have read this, and argued that it basically drags in L4-header inspection and keeping (limited) per-flow state. If we are willing to pay this price, we are IMHO better off going FQ all the way, because that would solve a number of warts in the L4S design.
b) I have noted that it is not really a hindrance to abuse; it will only mildly push back against totally barren attempts to take over the LL-queue too hastily. But if the whole goal is disruption of the LL-queue, it makes my limited-burst DoS auto-home on what are probably my victim's most latency-sensitive flows. As long as my attack does its deed not by continuous high rates but simply by being sufficiently bursty, I bet I can do lots of damage on the nominal LL side of L4S without triggering any circuit breakers or queue protection.
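
The arithmetic behind the "bursty, not high-rate" point can be sketched as follows. All numbers are hypothetical (a 10 Mbit/s bottleneck, 1500-byte packets, one 50-packet burst per second), and the sketch assumes the whole burst arrives at once ahead of a victim packet. It says nothing about whether a given queue protection implementation would catch such a burst.

```python
# Hypothetical bottleneck and packet size, chosen only for illustration.
LINK_BPS = 10_000_000   # 10 Mbit/s bottleneck
PKT_BYTES = 1500        # full-size packets


def average_rate_bps(burst_pkts, period_s):
    """Long-term average rate of a sender emitting one burst per period."""
    return burst_pkts * PKT_BYTES * 8 / period_s


def worst_case_delay_ms(burst_pkts):
    """Time to drain one burst at the bottleneck, i.e. the extra delay seen
    by a packet that arrives just behind the whole burst."""
    return burst_pkts * PKT_BYTES * 8 / LINK_BPS * 1000


rate = average_rate_bps(50, 1.0)
delay = worst_case_delay_ms(50)
print(f"average rate: {rate / LINK_BPS:.0%} of the link")
print(f"transient delay behind the burst: {delay:.0f} ms")
```

With these made-up numbers the sender averages 6% of the link, yet each burst adds roughly 60 ms of queueing for whatever arrives behind it, which is far above what a low-latency service is aiming for.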

As I mentioned before, it seems that the L4S design has not yet been tested with many adversarial traffic patterns, which after a decade of development is an odd thing to observe, no?

> 
>>> In particular, many of the large sources of known deployments of RFC3168 --
>>> Linux fq_codel and cake -- are already deployed with fair queueing. In such
>>> bottlenecks L4S traffic should not cause harm to other non-L4S flows.
>> 
>> [SM] Mmmh, that requires active defenses by existing networks to
>> accommodate a newcomer...
> 
> It's not perfect, but we can't let the perfect be the enemy of the
> good, and need to evaluate all the trade-offs of the alternatives
> holistically.

	[SM] Erm, my complaints mostly come from comparing the promises in the L4S drafts with the reality of the L4S implementation after roughly a decade of work on it. This is not about perfection, but about demonstrating that the promises can actually be delivered robustly and reliably. The "perfect versus good" argument IMHO does not apply here, as I am rarely comparing L4S against a theoretically superior hypothetical alternative.

Best Regards
	Sebastian

> 
> Best regards,
> neal