Re: [tsvwg] [Bloat] [Ecn-sane] [iccrg] Fwd: [tcpPrague] Implementation and experimentation of TCP Prague/L4S hackaton at IETF104

Jonathan Morton <chromatix99@gmail.com> Thu, 21 March 2019 07:46 UTC

From: Jonathan Morton <chromatix99@gmail.com>
In-Reply-To: <f331e710-ed2c-8628-4c82-f162d9cc8763@bobbriscoe.net>
Date: Thu, 21 Mar 2019 09:46:39 +0200
Cc: "Holland, Jake" <jholland@akamai.com>, tsvwg IETF list <tsvwg@ietf.org>, bloat <bloat@lists.bufferbloat.net>
To: Bob Briscoe <ietf@bobbriscoe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/TkA6KM9dnSx2xwVu3zgSjds-iRI>

> On 21 Mar, 2019, at 8:04 am, Bob Briscoe <ietf@bobbriscoe.net> wrote:

> Congestion controls are tricky to get stable in all situations. So it is important to separate ideas and research from engineering of more mature approaches that are ready for more widespread experimentation on the public Internet. Our goal with L4S was to use proven algorithms, and put in place a mechanism to allow those algorithms to evolve.

I hope that from my example, you can see how to adapt a more flexible and "mature" version of DCTCP to use SCE.  You should be able to use the same algorithms that you've worked so hard on; only the signalling method changes, and the trigger for falling back to Classic ECN behaviour is explicit (a plain old CE mark).
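
To make that concrete, here is a rough sketch of what such an adaptation might look like. It is not the SCE reference code or anybody's actual DCTCP implementation; the class name, the once-per-RTT update, and the gain of 1/16 are assumptions taken from the general shape of the published DCTCP description. The only point is that the DCTCP-style estimator is fed by SCE marks, while any CE mark triggers ordinary Classic ECN behaviour:

```python
class SceDctcpSketch:
    """Hypothetical DCTCP-style sender driven by SCE marks (illustration only)."""

    def __init__(self, cwnd=100.0, gain=1.0 / 16.0):
        self.cwnd = cwnd    # congestion window, in packets
        self.alpha = 0.0    # smoothed fraction of SCE-marked packets
        self.gain = gain    # EWMA gain (assumed value, as in DCTCP write-ups)

    def on_round_trip(self, acked, sce_marked, ce_marked):
        """Update cwnd once per RTT from the counts echoed by the receiver."""
        if acked <= 0:
            return self.cwnd
        if ce_marked > 0:
            # Explicit fallback trigger: a plain CE mark means Classic ECN
            # behaviour, i.e. a multiplicative decrease by half.
            self.cwnd = max(2.0, self.cwnd / 2.0)
            return self.cwnd
        frac = sce_marked / acked
        self.alpha = (1.0 - self.gain) * self.alpha + self.gain * frac
        if sce_marked > 0:
            # DCTCP-style reduction, proportional to the smoothed mark rate.
            self.cwnd = max(2.0, self.cwnd * (1.0 - self.alpha / 2.0))
        else:
            # No congestion signal at all: additive increase.
            self.cwnd += 1.0
        return self.cwnd
```

A sender that has never heard of SCE simply never runs the proportional branch, which is the sense in which the signal is safe to ignore.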

As for "proven algorithms", it was conclusively proven that DCTCP was *not* compatible with Classic ECN middleboxes, and had only been proven to work in tightly controlled environments.  I am told that TCP Prague has a failsafe, but I do not yet understand how that failsafe works, and what I have been told sounds fragile.  I am honestly perplexed that no explanation of this is forthcoming.

SCE works transparently with every deployed and proven congestion control algorithm out there, which simply ignores the information SCE provides.  Adaptations of some of those algorithms to incorporate SCE information seem to be straightforward to implement, especially since ns-3 now supports AccECN, so initial full-system experiments should be forthcoming quite soon.  We should even be able to rehabilitate DCTCP without resorting to failsafe workarounds - which *should* have you guys jumping for joy, in theory.

> As regards the desire to use SCE instead of the L4S approach of using a classifier, please answer all the reasons I gave for why that won't work, which I sent in response to your draft some days ago.

I'm afraid that must have got lost in the noise.  There *was* a lot of noise; it gave me a headache.

Regardless, I haven't seen any real claims that SCE won't work, except for some quibbles about RTT-fair convergence with single queues, which I subsequently found an elegant way to address.  We do have a bit of a publication bottleneck over here at the moment; limited manpower.

I have mainly seen claims that SCE isn't a one-for-one replacement for L4S using exactly the same mechanisms and infrastructure as L4S does.  Which is true, but unhelpful, because that would make SCE literally identical to L4S with no advantages of its own.  I'm willing to point out ways to implement L4S' goals using SCE; see below.

> The main one is incremental deployment: the source does not identify its packets as distinct from others, so the source needs the network to use some other identifier if it wants the network to put it in a queue with latency that is isolated from packets not using the scheme. The only way I can see to do this would be to use per-flow-queuing. I think that is an unstated assumption of SCE.

Strictly minimising latency for the individual flow, in the face of competing non-SCE traffic sharing a single queue, is not a goal of SCE per se; I consider it an orthogonal problem which is better addressed by existing solutions.  Coexisting with existing endpoints, existing traffic and existing middleboxes is paramount, and forms our main argument for incremental deployability.

Solutions already available include FQ and Diffserv.  I'll grant you that FQ is easier to implement at lowish speeds, where a cheap CPU can be loaded with flexible software to do the job.  You appear to be more focused on relatively high link capacities, as that is your main argument against FQ.  I'll just note in passing that good FQ can extract a lot of responsiveness from relatively low-capacity links.

Diffserv is widely deployed (in terms of hardware capabilities) and should be a natural fit for distinguishing classes of traffic from each other.  It is rarely used by applications because the networks tend to corrupt it in transit, and rarely make good use of the information into the bargain.  It strikes me that the cable industry may have more influence over that than I do.

> The SCE way round does not allow the ECN field to be used as a classifier…

The ECN field was never intended to be used as a classifier, except to distinguish Not-ECT flows from ECT flows (which a middlebox does need to know, to choose between mark and drop behaviours).  It was intended to be used to convey congestion information from the network to the receiver.  SCE adheres to that ideal.
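
For reference, the one classification a middlebox legitimately needs from the ECN field can be sketched in a few lines. The codepoint values are the RFC 3168 ones; the function names and the tuple return value are made up for this illustration and do not come from any real AQM:

```python
# RFC 3168 codepoints, carried in the low two bits of the TOS / Traffic Class byte.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11

def ecn_of(tos_byte):
    """Extract the 2-bit ECN field from the TOS / Traffic Class byte."""
    return tos_byte & 0b11

def signal_congestion(tos_byte):
    """Return (action, new_tos_byte) for a packet the AQM wants to signal on.

    Not-ECT packets can only be dropped; ECT packets are marked CE instead.
    """
    if ecn_of(tos_byte) == NOT_ECT:
        return ("drop", tos_byte)
    return ("mark", tos_byte | CE)
```

Everything else in the field is congestion information on its way to the receiver; the upper six bits of the byte are left untouched.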

There is a perfectly good and under-utilised 6-bit field for carrying classifier information, right there in the same byte as the ECN field.  You might want to ask the LE PHB guys for advice on making good use of it.
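
A small sketch of how the two fields share that byte, assuming nothing beyond the RFC 2474 / RFC 3168 layout (DSCP in the upper six bits, ECN in the lower two). The helper names are invented for the example:

```python
def split_ds_byte(b):
    """Split the TOS / Traffic Class byte into (dscp, ecn)."""
    return (b >> 2) & 0x3F, b & 0x03

def set_dscp(b, dscp):
    """Return the byte with a new DSCP, leaving the ECN bits as they were."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP is a 6-bit value")
    return (dscp << 2) | (b & 0x03)
```

For example, `set_dscp(0x02, 46)` takes an ECT(0) packet with the default DSCP and gives it the EF codepoint (46) without disturbing the ECN field, so classification and congestion signalling stay independent.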

> You also don't get the benefit of being able to relax resequencing in the network, because the network has no classifier to look at.

My position is that the network is already free to relax resequencing semantics, regardless of the traffic carried.  IP does not guarantee anything about packet ordering, and protocols built on top of it have always had to cope with that, one way or another.

Wifi's head-of-line blocking while performing link-level retries can induce inter-flow coupled delays of many seconds in extreme cases, destroying reliability completely.  Recent work already reduces the effort the Linux wifi stack puts into link-level retries, given that most transports and protocols can survive some level of random loss.  This is done without relying on any classifier, because it benefits all traffic.

On high-capacity bonded links, consider two packets sent near-simultaneously on different component links, and consequently reordered.  The likelihood that both belong to the same flow and trigger a spurious retransmission seems low enough not to care about, even with existing 3-dupack-sensitive TCPs.  Therefore, relaxing resequencing requirements on these links should already be safe.  Perhaps you have hard data showing otherwise?

> …the SCE codepoint would need to be combined with a DSCP, and I assume you don't want to do that.

The SCE codepoint does not need to be combined with a DSCP.  Whether or not a DSCP assignment fits a given application is completely orthogonal to SCE.

You could quite reasonably implement something that looks very like DualQ using a trivial DSCP classifier instead of an ECN-based classifier, and that would be absolutely fine, and it would work with SCE.  It's just not necessary to make SCE work in the first place.
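
As a sketch of how trivial such a classifier could be: the DSCP value below is a placeholder picked for the example, not an assigned codepoint, and the strict-priority dequeue is a simplification (the real DualQ couples the AQMs of its two queues and uses a weighted scheduler, none of which is modelled here):

```python
from collections import deque

LOW_LATENCY_DSCP = 0x2A  # hypothetical codepoint, chosen only for illustration

class TwoQueueSketch:
    """Two queues selected by DSCP alone; the ECN bits are never inspected."""

    def __init__(self):
        self.low_latency = deque()
        self.classic = deque()

    def enqueue(self, tos_byte, packet):
        dscp = (tos_byte >> 2) & 0x3F
        queue = self.low_latency if dscp == LOW_LATENCY_DSCP else self.classic
        queue.append((tos_byte, packet))

    def dequeue(self):
        # Simplification: serve the low-latency queue first whenever it has data.
        for queue in (self.low_latency, self.classic):
            if queue:
                return queue.popleft()
        return None
```

Because the ECN field plays no part in choosing the queue, either queue remains free to apply SCE or CE marking to whatever it carries.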

Meanwhile, I still have not seen a detailed answer as to how, precisely, TCP Prague reliably distinguishes a Classic ECN middlebox from an L4S one, in order to activate its failsafe mechanism.  Without that, I'm afraid I must assume that TCP Prague is not incrementally deployable.

Indeed, I was under the impression that DualQ and the use of ECT(1) as a classifier stemmed from this incompatibility, rather than being considered features in their own right.

 - Jonathan Morton