Re: [tsvwg] Neal Cardwell's rationale for supporting ECT(1) as an input/L4S signal

Sebastian Moeller <moeller0@gmx.de> Sat, 09 May 2020 10:53 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/x2wAn8wtHlahrVCH52tlAVl5Ufs>

Hi Neal,


> On May 8, 2020, at 17:19, Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org> wrote:
> [...]
> 
> - SCE seems to involve an ecosystem with a more complex and more experimental CC (with two different kinds of ECN signal) and little real-world/production experience yet. L4S seems to involve an ecosystem that provides a queue that is basically a single-threshold, shallow-threshold, DCTCP-style, ECN ecosystem, which is simpler and for which the world has a lot of accumulated academic research and real-world/production experience over the last decade.

	[SM] Interestingly, I take it as a considerable downside that in a decade of work L4S has not managed to come up with robust and reliable solutions to its challenges. "Too little, too late" comes to mind as much as "robust solution after years of diligent engineering"; which of the two it is remains an open question.
One more downside of the protracted development is that the change of reference protocol from DCTCP to TCP Prague basically devalues the older DCTCP measurements as proof of safety.
	My point is that it seems odd to use indirect measures, like accumulated development time and the number of tests conducted, as proxies for the quality of L4S instead of actually looking closely at the RFCs and comparing their claims with the existing data. I am not saying that my assessment, namely that L4S' implementation does not come close to its promises, is the only conclusion one can reach, but I would hope that everybody chiming in on this consensus question actually takes the time to look at that closely for themselves. It is easy to promise the sky; delivery and execution, however...



> - L4S flows potentially causing unfairness in RFC3168 ECN bottlenecks has been mentioned as a potential concern. However, a robust RFC3168 ECN bottleneck should already have a mechanism to avoid unfairness caused by flows that are marked as ECT(0|1) and yet not performing RFC3168 responses.

	[SM] That essentially declares all non-FQ AQMs to be fair game, no? Because if they wanted better isolation they could get it (at a cost). That seems at odds with the extra mile L4S goes to avoid using FQ solutions, even for a problem that is exceptionally well suited to FQ. The argument can also easily be turned around: why not demand the same level of robustness from L4S instead, it being the newcomer and all? Say, require L4S to monitor flow behavior and base its classification on observed behavior instead of a simple assertion by the sender. ECT(1) is nothing more than that; it is at best a classification of intent, while the thing that should be classified is behavior. In the context of another thread it seems clear that pure intent signaling is actually expected to be abused:

To cite Tom Herbert paraphrasing Joe Touch (from "[tsvwg] Comment on draft-ietf-tsvwg-transport-encrypt-13"):
"It was not previously mentioned in the context of extension headers.
This is a general consideration for any unauthenticated plaintext data
in a packet that an intermediate node chooses to consume. As Joe said,
in the absence of any requirement or contract, it's the prerogative of
the host to manipulate packet contents as it sees fit to gain an
advantage (where sometimes the "advantage" is just that packets get
delivered and not dropped)."

While I do not fully agree that every sender rightfully should try to abuse the network at all costs, I accept that the potential is there and that solutions need to take this into account in their threat modeling. IMHO L4S has not done so sufficiently: simply claiming, without supporting evidence, that ECT(1) cannot be abused is either naively optimistic or intentionally misguided.
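
To make the intent-versus-behavior point concrete, here is a minimal sketch. It assumes a classifier that looks only at the two ECN bits of the IPv4 TOS / IPv6 Traffic Class byte; the codepoint values are those of RFC 3168, while the function names and the queue labels "low-latency" and "classic" are hypothetical and not taken from any L4S document. It shows that such a classifier cannot tell a well-behaved sender from one that merely sets the bits:

```python
# ECN codepoints: the two least significant bits of the TOS / Traffic Class byte (RFC 3168).
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11
ECN_MASK = 0b11


def ecn_codepoint(tos: int) -> int:
    """Extract the ECN field from a TOS / Traffic Class byte."""
    return tos & ECN_MASK


def classify_by_intent(tos: int) -> str:
    """Hypothetical classifier: trusts the sender's ECT(1) assertion and nothing else."""
    return "low-latency" if ecn_codepoint(tos) == ECT_1 else "classic"


def mark_as_ect1(tos: int) -> int:
    """What any sender can do, whatever its congestion response: set ECT(1)."""
    return (tos & ~ECN_MASK & 0xFF) | ECT_1


# A sender with an arbitrary congestion controller only has to rewrite two bits.
print(classify_by_intent(mark_as_ect1(0x00)))
```

Nothing in this decision depends on how the flow reacts to marks, which is exactly the gap a behavior-based classification would have to close.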


> In particular, many of the large sources of known deployments of RFC3168 -- Linux fq_codel and cake -- are already deployed with fair queueing. In such bottlenecks L4S traffic should not cause harm to other non-L4S flows.

	[SM] Mmmh, that requires active defenses by existing networks to accommodate a newcomer. Sure, it might ameliorate the fall-out from L4S, but that would be akin to haphazardly handling flu virus samples in a public kitchen instead of an S3 biosafety laboratory, on the grounds that the population should be vaccinated and hence immune already, so what is the harm? That does not seem like a great idea IMHO, in either case.


> Furthermore, if there really are ISPs with deployments of RFC3168 bottlenecks that have neither FQ nor any other protection from non-RFC3168-ECT(1) flows, then they can bleach incoming ECT(1) code points to Not-ECT and treat L4S as Not-ECT (ISPs typically already transform the DSCP byte at their ingress anyway). So I do not see harm to RFC3168 ECN bottlenecks as a prohibitive concern.
> 
> - More generally, if there is any problem discovered with the L4S experiment, either the algorithm or particular implementations, bottlenecks can easily identify L4S traffic and bleach it into Not-ECT, and treat it like Reno/CUBIC traffic.

	[SM] At that point L4S regresses into a relatively boring PIE-derivative (albeit with decreased burst tolerance*) single-queue AQM at those nodes where the dual queue coupled AQM was deployed. Sure, getting rid of the imprecise/unsafe coupling is going to be a win, but having to spend the last ECN codepoint seems a rather steep cost for such a pedestrian result, no?
	More importantly, why not do the due-diligence research to assess the probability of this outcome for the L4S experiment first, before roll-out/elevation to experimental RFC status?
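
For reference, the bleaching step discussed above amounts to a two-bit rewrite. A minimal sketch, assuming the operation works on the raw IPv4 TOS / IPv6 Traffic Class byte; the codepoint values follow RFC 3168, and the function name bleach_ect1 is hypothetical:

```python
# ECN field: the two least significant bits of the TOS / Traffic Class byte (RFC 3168).
ECN_MASK = 0b11
NOT_ECT, ECT_1 = 0b00, 0b01


def bleach_ect1(tos: int) -> int:
    """Rewrite ECT(1) to Not-ECT; leave the DSCP bits and all other ECN codepoints alone."""
    if (tos & ECN_MASK) == ECT_1:
        return tos & ~ECN_MASK & 0xFF  # clearing the ECN bits yields Not-ECT (00)
    return tos


# ECT(1) is bleached, while ECT(0) and CE pass through unchanged.
for tos in (0xB9, 0xBA, 0xBB):
    print(f"{tos:#04x} -> {bleach_ect1(tos):#04x}")
```

After this rewrite the packet is indistinguishable from Not-ECT traffic, so the node can only signal congestion to it by dropping, which is the situation the footnote below refers to.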

Best Regards
	Sebastian


*) Not the best design when operating in drop-only mode for Reno/CUBIC-style CCs, no?

> 
> Best regards,
> Neal
>