Re: [tsvwg] Switch testing at 25G with ECN --SCE Draft

Jonathan Morton <chromatix99@gmail.com> Sun, 11 August 2019 13:31 UTC

Content-Type: text/plain; charset="us-ascii"
Mime-Version: 1.0 (Mac OS X Mail 11.5 \(3445.9.1\))
From: Jonathan Morton <chromatix99@gmail.com>
In-Reply-To: <1cb8e129-c22c-2b26-1149-e68305fee991@mti-systems.com>
Date: Sun, 11 Aug 2019 10:27:36 +0300
Cc: tsvwg@ietf.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <46D1EB74-2663-436B-A5FB-FC59A8BB2B8D@gmail.com>
References: <201908100002.x7A02e5h099876@gndrsh.dnsmgr.net> <1cb8e129-c22c-2b26-1149-e68305fee991@mti-systems.com>
To: Wesley Eddy <wes@mti-systems.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/2SR4jouHv2RzoRw1uo12-38eDrw>
Subject: Re: [tsvwg] Switch testing at 25G with ECN --SCE Draft
Precedence: list

> The advice in 3168 is to do marking based on average queue length, but *not* just before it needs to start dropping.  Is it more correct instead to understand what you're saying as the experiment being based on using the instantaneous queue length rather than average?

I think that is right.  In software we like to use the actual sojourn time within the queue of the candidate packet, but in hardware it is often convenient to use the queue length instead.
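
To make the distinction concrete, here is a minimal sketch of the two measures. The function names and the constant drain rate are assumptions for illustration only; they do not describe any particular switch or qdisc.

```python
def sojourn_time_s(enqueue_time_s: float, now_s: float) -> float:
    """Software-style measure: how long this packet actually waited."""
    return now_s - enqueue_time_s


def delay_from_queue_length_s(queue_bytes: int, drain_rate_bps: float) -> float:
    """Hardware-style proxy: backlog divided by an assumed drain rate."""
    return queue_bytes * 8 / drain_rate_bps
```

The two agree only while the queue really drains at the assumed rate; when it does not, for example when the link is shared with other queues, the length-based figure is only an estimate of the delay.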

More generally, I notice it's a surprisingly common misconception on this list that RFC 3168 states that marking should happen when, or just before, packets would otherwise be dropped.  A little critical thought will show that this is absurd; AQM is supposed to leverage congestion-control behaviour to keep the queue depth well away from the hard tail-drop zone on average.  What the spec *actually* says is that, at the point where the AQM would signal congestion, it should drop the packet instead of marking it when the packet is not ECT and therefore cannot be marked; this is practically the inverse of the misconception noted.
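
A minimal sketch of that rule, assuming the AQM has already decided (by whatever marking function it uses) that this packet should carry a congestion signal. The codepoint values are the RFC 3168 ECN field encodings; the function name and return shape are hypothetical.

```python
from __future__ import annotations

# ECN field values from RFC 3168 (two low-order bits of the TOS / Traffic Class byte).
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def signal_congestion(ecn_bits: int) -> tuple[str, int]:
    """Apply the AQM's congestion signal to one packet.

    Returns (action, resulting ECN bits). Called only once the AQM has
    decided to signal; the decision itself is outside this sketch.
    """
    if ecn_bits == NOT_ECT:
        # The packet cannot be marked, so the drop stands in for the mark.
        return ("drop", ecn_bits)
    # ECT(0), ECT(1) or already CE: forward the packet carrying CE.
    return ("forward", CE)
```

Note that the drop-versus-mark choice depends only on whether the packet is ECT, not on how close the queue is to overflowing.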

In particular, I believe marking should definitely occur by the time the queue is half-full, or more precisely so that there is a full BDP (including queuing delay already incurred at the point the mark is applied) of space remaining in the queue.  Why?  To accommodate the RTT of control feedback delay before the signal to exit slow-start takes effect at the queue, during which the cwnd of a typical TCP will double.  Most AQMs do in fact satisfy that principle.
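
One way to read the headroom rule as arithmetic, offered as a sketch rather than a definitive formula: if marking starts at a backlog of T bytes, the space left is capacity - T, and the BDP at that moment is rate * base_rtt + T, since the queuing delay already incurred contributes T bytes. Requiring capacity - T >= rate * base_rtt + T gives T <= (capacity - rate * base_rtt) / 2. The function and parameter names below are assumptions for illustration.

```python
def max_mark_threshold_bytes(capacity_bytes: float,
                             link_rate_bps: float,
                             base_rtt_s: float) -> float:
    """Largest backlog at which marking can start while still leaving a
    full BDP (base path plus queuing delay already incurred) of headroom."""
    base_bdp_bytes = link_rate_bps * base_rtt_s / 8
    # Clamp at zero: a buffer smaller than the base BDP has no safe threshold.
    return max(0.0, (capacity_bytes - base_bdp_bytes) / 2)
```

With a negligible base RTT this reduces to half the buffer, which is where the "half-full" rule of thumb comes from; a longer base RTT pushes the threshold lower.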

>> RFC3168 ECN marking is already in the switch, the switch was modified to behave in a different manner, using the RFC3168 CE bits.  This different manner was turned OFF to do the dctcp tests and turned ON to do the dctcp-sce tests.
> 
> Since the marking probability function and marking decision isn't part of 3168/ECN, the description here sounds like the switch is always conforming to 3168 behavior, but just using some custom decision logic for marking.
> 
> Just trying to be clear on what we're really talking about ...

I think it's best to consider this an experimental prototype, in which the goal is to investigate whether the obtained behaviour is desirable and useful before continuing with more involved development.  From what I hear, and without going into needless detail, it was easier in the short term to set it up with CE marking and transform that into SCE codepoints at the receiver.  This was always going to be a temporary expedient.

Also from what I hear, experiments had previously been conducted using CE marking and the standard version of DCTCP in Linux, but without anywhere near the level of success seen in the results just posted.  It seems likely that faults in the DCTCP implementation were responsible, but this in turn raises the question of how well maintained that piece of code is, and how such faults were allowed to persist in the mainline codebase.

Linux is normally associated with better development practices than that.  The most reasonable explanation I can come up with is that DCTCP is not actually in widespread use, so when faults appeared they were not noticed by any party interested in having them fixed.  Or, perhaps more disturbingly, DCTCP *is* in widespread use in closed datacentre environments, but when the faults arose in production their effects were not recognised.

So in practice, this controlled experiment has validated certain principles associated with the DCTCP response function (well, a simplified version of it), and also shown that other response functions are similarly effective, when applied to an ultra-low-latency network.  It has also shown that the proposed feedback mechanism for SCE signals on TCP connections is viable, so the relatively complex AccECN mechanism isn't obviously needed.  We have also presented other results showing similarly good behaviour at Internet-scale latencies with real SCE marking, using a broadly similar marking function.  Overall these are encouraging results for us, and work will continue accordingly.
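
For readers unfamiliar with it, the standard DCTCP response function as described in RFC 8257 can be sketched as below. This is the textbook form, not the simplified version used in the experiment and not the SCE response; the function name is hypothetical, and the gain g = 1/16 is the value RFC 8257 recommends.

```python
from __future__ import annotations


def dctcp_update(alpha: float, cwnd: float, marked_fraction: float,
                 g: float = 1 / 16) -> tuple[float, float]:
    """One per-window DCTCP update.

    alpha           -- running estimate of the fraction of marked bytes
    cwnd            -- congestion window (any consistent unit)
    marked_fraction -- fraction of bytes acked in the last window that carried CE
    """
    # Exponentially weighted moving average of the marking fraction.
    alpha = (1 - g) * alpha + g * marked_fraction
    if marked_fraction > 0:
        # Reduce in proportion to the extent of congestion, not by a fixed half.
        cwnd = cwnd * (1 - alpha / 2)
    return alpha, cwnd
```

When every byte is marked and alpha has converged to 1, the window halves as in conventional TCP; with sparse marking the reduction is correspondingly gentle.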

 - Jonathan Morton
