Re: [tsvwg] Neal Cardwell's rationale for supporting ECT(1) as an input/L4S signal

Jonathan Morton <chromatix99@gmail.com> Wed, 13 May 2020 02:07 UTC

From: Jonathan Morton <chromatix99@gmail.com>
Date: Wed, 13 May 2020 05:07:50 +0300
Cc: Jeremy Harris <jgh@wizmail.org>, tsvwg IETF list <tsvwg@ietf.org>
To: Neal Cardwell <ncardwell@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/WtP3Ut0GqTJCQPLsUVB54uf73TE>

> On 13 May, 2020, at 1:29 am, Neal Cardwell <ncardwell@google.com> wrote:
> 
>> They would if they tried to participate in a plain 3168 conversation,
>> right?  I don't see you can dislike SCE for that.
> 
> Yes, they would hit these issues if they tried to participate in a
> plain RFC3168 conversation, but I think that's somewhat academic,
> since they don't use RFC3168. I think the useful comparison is L4S vs
> SCE.

If you want to compare L4S to SCE, do it properly.

Look at the big picture.  How many existing, RFC-compliant networks does each proposal break?  How many of the ones that do break are RFC-ignorant to begin with?  How severe is the breakage; does it violate the principle of effective congestion control, or merely degrade gracefully to a more basic level of functionality?

Look at the data produced by both sides, not just one, and question why one set appears to contradict the other set.  Is there a difference in the focus of each set of tests?  Are they looking for the optimistic scenario, as desired for marketing, or for the pessimistic scenario, as required for real engineering?

Look at the arguments being presented.  Do they invoke appeals to authority, or fall into the sunk cost fallacy?  Or are they trying desperately to function as a meritocracy in an environment that seems strangely hostile to that concept?

>> How does it behave for L4S, without changing the datacentre hardware?
> 
> I think for the L4S case, the point is that the sender would mark the
> packets as ECT(1), even shallow queues in the datacenter switch would
> be marked as CE, the receiver would reflect the CE information, and
> the sender would see the CE but would know that the queue might well
> be shallow, so would not necessarily cut its cwnd by 50% (Reno) or 30%
> (CUBIC), until/unless the CE marks were sustained/prevalent.

So basically, from your perspective, it's fine to discard the existing ECN deployment in favour of making everything run DCTCP.  You will ignore all requests to negotiate RFC-3168, and only respond affirmatively to AccECN.  Welcome to the brave new world of L4S.  Safe as helicopters.  I'm *so* glad I'm a Beta.
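
For anyone following along, the difference between the two sender responses can be sketched roughly as below. This is an illustration only, not code from any real stack: the function and class names, the gain of 1/16, and the starting alpha of zero are assumptions chosen for the example. The 0.5 and 0.7 factors correspond to the 50% (Reno) and 30% (CUBIC) cuts quoted above.

```python
def classic_ecn_response(cwnd: float, ce_seen: bool, beta: float = 0.5) -> float:
    """RFC 3168 style: any CE mark in a round trip is treated like a loss.

    beta is the multiplicative decrease factor: 0.5 for Reno's 50% cut,
    0.7 for CUBIC's 30% cut.
    """
    if not ce_seen:
        return cwnd
    return max(cwnd * beta, 1.0)


class DctcpStyleSender:
    """DCTCP-style: scale the cut by a running estimate of the marked fraction."""

    def __init__(self, cwnd: float, gain: float = 1 / 16, alpha: float = 0.0):
        self.cwnd = cwnd
        self.gain = gain    # EWMA weight (assumed value for this sketch)
        self.alpha = alpha  # estimated fraction of CE-marked packets (assumed start)

    def on_round_trip(self, acked: int, ce_marked: int) -> float:
        """Update alpha from one round trip of ACKs, then reduce cwnd if marked."""
        fraction = ce_marked / acked if acked else 0.0
        self.alpha = (1 - self.gain) * self.alpha + self.gain * fraction
        if ce_marked:
            self.cwnd = max(self.cwnd * (1 - self.alpha / 2), 1.0)
        return self.cwnd
```

With a single mark in a round trip of 100 packets, the classic response halves the window, while the DCTCP-style sender in this sketch barely moves; only sustained marking drives alpha towards 1 and the cut towards 50%.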

What do you do about all those flows which, denied the chance to run ECN, have fallen back to packet loss as a congestion signal?  You still have the same problem of coping with their AIMD behaviour, and their application latency is increased by approximately one RTT every time they have to retransmit one of those lost packets.  So much for the L4S promise of "ultra low latency for everyone".

And what of everyone else who is already running, or thinking about deploying, AQMs and RFC-3168 ECN?  L4S throws a huge spanner into those works, which have taken twenty years to get moving to any appreciable degree.  Personally, I think the tipping point has been the development of AQM algorithms that are correctly configurable by mere mortals - a state of affairs to which I have contributed several years of my time.

There will be plenty of people who deliberately disable L4S support because their own network interacts badly with it - including everyone connecting over Wi-Fi.  I wonder what percentage of ordinary consumers even know what an Ethernet cable is?  SCE still offers reasonable performance to those users, and I'd be perfectly happy if they carried on using plain old ECN.  Any AQM at all is an improvement over a dumb, two-second-deep FIFO.

L4S still has too many problems - fundamental ones for which no robust solution has even been outlined - to justify committing the last ECN codepoint to it.  I will freely admit that SCE also still needs work, but in some areas we are clearly ahead, and we make an effort to avoid over-promising.  We've had one year to get this far.  Given another year, what can we achieve?
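
For reference, the ECN field is only two bits wide, so there are just four codepoints to go around. The sketch below lays them out per RFC 3168 and summarises, in rough terms, how each proposal uses ECT(1). The names `Ecn`, `ecn_field` and `ECT1_ROLE` are made up for illustration, and the one-line role descriptions are a simplification of the respective drafts.

```python
from enum import IntEnum


class Ecn(IntEnum):
    """The four codepoints of the two-bit ECN field (RFC 3168)."""
    NOT_ECT = 0b00  # transport is not ECN-capable
    ECT1 = 0b01     # ECT(1): the codepoint both proposals want to redefine
    ECT0 = 0b10     # ECT(0): conventional ECN-capable transport
    CE = 0b11       # Congestion Experienced


def ecn_field(tos_byte: int) -> Ecn:
    """Extract the ECN codepoint from the low two bits of the TOS / Traffic Class byte."""
    return Ecn(tos_byte & 0b11)


# Simplified summary of how each proposal would use ECT(1) (illustrative only).
ECT1_ROLE = {
    "L4S": "input: set by the sender to request L4S treatment",
    "SCE": "output: set by the network as a mild congestion signal",
}
```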

 - Jonathan Morton