Re: [tcpPrague] Experimental dual-queue ECN

Michael Welzl <michawe@ifi.uio.no> Fri, 24 June 2016 20:24 UTC

From: Michael Welzl <michawe@ifi.uio.no>
In-Reply-To: <576D70CB.8060108@erg.abdn.ac.uk>
Date: Fri, 24 Jun 2016 22:24:23 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <8D9E4035-23E9-4BAD-B689-BF82C54BC98F@ifi.uio.no>
References: <574EBEA2.8080705@bobbriscoe.net> <20160601152908.GB1754@verdi> <574F2A2D.9070407@bobbriscoe.net> <574F4F29.9040409@bobbriscoe.net> <20160601215312.GA25116@verdi> <0898e249-03dd-aff9-7179-03cc8642efea@erg.abdn.ac.uk> <5762567D.8010609@bobbriscoe.net> <3f8fa637-17b5-853b-b835-db486a2a69f6@erg.abdn.ac.uk> <CAKKJt-cjncm7zsfj3=7pqB-uSNTxMPfjPY=qpSNnDncVmy+enA@mail.gmail.com> <20160624170118.GA52708@verdi> <576D70CB.8060108@erg.abdn.ac.uk>
To: "<gorry@erg.abdn.ac.uk> Fairhurst" <gorry@erg.abdn.ac.uk>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpprague/seJMfep0SS3VqfjJXZQFoYDXIbs>
Cc: TCP Prague List <tcpPrague@ietf.org>, Bob Briscoe <ietf@bobbriscoe.net>, John Leslie <john@jlc.net>, Spencer Dawkins <spencerdawkins.ietf@gmail.com>
Subject: Re: [tcpPrague] Experimental dual-queue ECN

Dear all,

Two comments in line below - about ABE, what else… sorry for derailing a bit, since this is the TCP Prague list...


> On 24. jun. 2016, at 19.41, Gorry Fairhurst <gorry@erg.abdn.ac.uk> wrote:
> 
> Since this is TCP-Prague, I'm speaking here only as an author of the ABE spec...
> 
> On 24/06/2016 18:01, John Leslie wrote:
>> Spencer Dawkins at IETF<spencerdawkins.ietf@gmail.com>  wrote:
>>> ...
>>> I would support publication of these algorithms as Experimental, if that's
>>> what is proposed.
>>> 
>>> Are you thinking the experiment is, that devices on the public Internet are
>>> upgraded to provide this new functionality even if they aren't
>>> participating in the experiment, and then this new functionality is then
>>> experimented with, and the result of the experiment might be "now we know
>>> why that's a bad idea, so we're not going to standardize it", so then the
>>> devices revert to previous functionality?
>>    I can't answer that part.
>> 
>>> The way I thought this worked was that when you experiment with a
>>> standards-track protocol, the Experimental RFC is supported only by
>>> devices that are participating in the experiment, and the Experimental
>>> RFC doesn't update the standards-track RFC(s) until the experiment
>>> succeeds, and then the Experimental RFC is republished as a standards-
>>> track RFC that updates the previous standards-track RFCs.
> That seems correct to me.
>>> If you decide
>>> the experiment is a bad idea, only devices that opt in to the experiment
>>> need to revert to previous functionality.
>>> Are you thinking that wouldn't work here?
>>    IMHO, such a path to  update RFC 3168 is impractical.
>> 
>>    At 62 pages, RFC 3168 simply covers too much territory. It covers both
>> the behavior of ECN-capable forwarders and the behavior of TCP senders
>> and receivers. (Fortunately, it doesn't try to cover the behavior of
>> non-TCP transports.)
>> 
>>    At the time it was written, it really wasn't practical to cover less
>> territory. But today, we understand that even TCP transport is entirely
>> too broad a topic.
>> 
>>    IMHO, we need to break off a limited part of it for Experimental
>> protocols:
>> 1. that reaction to ECN-CE should be the "same as" drop; and
> I disagree of course - the CE-marking proposal has already been discussed at the IETF - and I suggest no ECN is likely to be found using RED - and ECN-marked RED is now deprecated anyway. Many modern AQM methods CE-mark on a shallow queue - and I think we need to update the PS to reflect this.

+1 to Gorry  (unsurprisingly).

The ECN “Experiment” has hardly happened, so on what basis do we say that a reaction that is the “same as” drop is safe?
If we just take operational experience, Cubic first used a backoff (multiplication) factor of 0.8, then 0.7, deployed in Linux and widely used in the Internet.
This isn’t even limited to an ECN signal, which is likely to be produced by an AQM mechanism, and hence much more likely to indicate a shallow queue than loss.

So: I think we can now assert that the Internet won’t melt down if we back off using a multiplication factor other than 0.5.
Using such a larger factor *only* in response to ECN is even more conservative, so even safer.
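
To make the comparison concrete, here is a minimal sketch of a multiplicative backoff that uses one factor for loss and a larger one for ECN. The function name, the 2-segment floor, and the ECN factor of 0.8 are assumptions chosen for illustration; only the 0.5, 0.7 and 0.8 values themselves come from the discussion above.

```python
# Illustrative sketch only, not any stack's actual implementation.
# cwnd is counted in segments; MIN_CWND is an assumed floor.
MIN_CWND = 2.0


def backoff(cwnd, signal, loss_beta=0.5, ecn_beta=0.8):
    """Return the congestion window after one congestion signal.

    loss_beta: multiplication factor applied on packet loss.
    ecn_beta:  multiplication factor applied on an ECN (CE) mark;
               the default of 0.8 is a hypothetical example value.
    """
    if signal == "loss":
        beta = loss_beta
    elif signal == "ecn":
        beta = ecn_beta
    else:
        raise ValueError("signal must be 'loss' or 'ecn'")
    return max(MIN_CWND, cwnd * beta)


if __name__ == "__main__":
    cwnd = 100.0
    print("after loss, factor 0.5:", backoff(cwnd, "loss"))
    print("after loss, factor 0.7:", backoff(cwnd, "loss", loss_beta=0.7))
    print("after ECN mark, factor 0.8:", backoff(cwnd, "ecn"))
```

With loss_beta left at 0.5, only ECN-capable paths ever see the gentler reduction, which is the more conservative of the two deviations from 0.5 discussed here.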

Adding to this, what exactly is the logic that makes “react to marking the same way as you would react to drop” particularly safe?
I can only assume the reasoning presupposes that the network behaves the same for ECN-marking and dropping, so that keeping everything else as similar as possible would be the safe way to go.
Reality is different. Please see, for example, Figure 13 in this pdf:
https://www.duo.uio.no/bitstream/handle/10852/37381/khademi-AQM_Kids_TR434.pdf
Compare the left and right diagrams - this is standard AQM marking behavior (mark where you would otherwise drop) and standard TCP behavior too, yet the result is pretty different.
This is because a packet that is marked, not dropped, is admitted into the queue and thereby changes the AQM dynamics.

So: you have a different result from *just* turning on ECN, similar-to-loss backoff or not.
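
The admission effect can be seen even in a toy model. The sketch below is an assumption-laden illustration, not a reproduction of the figure: it is open-loop (senders never react), uses a fixed threshold instead of a real AQM, serves one packet per tick, and all names are hypothetical. It only shows that a marked packet still occupies the queue while a dropped one does not.

```python
def run_queue(arrivals, threshold, mode):
    """Simulate a single queue that serves one packet per tick.

    arrivals:  list with the number of packets arriving in each tick.
    threshold: queue length at which the toy AQM starts signalling.
    mode:      "drop" discards the signalled packet,
               "mark" CE-marks it and still enqueues it.
    Returns (peak queue length, number of congestion signals).
    """
    if mode not in ("drop", "mark"):
        raise ValueError("mode must be 'drop' or 'mark'")
    queue = 0
    peak = 0
    signals = 0
    for count in arrivals:
        for _ in range(count):
            if queue >= threshold:
                signals += 1
                if mode == "drop":
                    continue  # dropped: never enters the queue
            queue += 1  # admitted (possibly carrying a CE mark)
        peak = max(peak, queue)
        if queue > 0:
            queue -= 1  # one departure per tick
    return peak, signals


if __name__ == "__main__":
    load = [3] * 5  # same overload offered to both variants
    print("drop:", run_queue(load, threshold=4, mode="drop"))
    print("mark:", run_queue(load, threshold=4, mode="mark"))
```

Under the same offered load, the marking variant reports a longer peak queue and more signals, because every marked packet adds to the backlog that the next arrival is judged against.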

In conclusion, I struggle to see the big reason why an “exactly like loss” backoff is standard behavior and experimenting with other values should be prohibited.  A new particular value may constitute an experiment, but the “equal to loss” limitation simply isn’t a good thing and should be removed - this removal isn’t an experiment, it's a bugfix. Other values are already used (in Cubic, as I said, and there not only for ECN), but for ECN-only, which is a more conservative route than doing this for loss, the IETF doesn’t even allow the Experiment.


> I also don't see a specific experiment that is needed here - what would be needed to test for safety in deployment? I *think* this particular update can be taken directly to PS.

+1


>> 2. that ECN(1) MAY request marking for a lesser degree of congestion than
>>    ECN(0).
> That to me requires two changes - a standards action to address the current ECN PS requirement that the two codepoints (i.e., ECT(0) and ECT(1)) are treated the same in the default DS class.
> 
> *AND* followed by an experimental proposal to use something different and obsolete the ECN-NONCE.
> 
> To me though, these are both TSVWG concerns - discussion should be taken on that list.
>>    This needs justification; and I believe we can give ample justification
>> without settling on _just_ dual-queue. Surely you've noticed there's
>> already a separate proposal for differing treatment of ECN-CE.
>> 
>>    I'm hoping we can agree that both experiments can co-exist...
> If you refer to TCP's reaction proposed in ABE (now a draft in TCPM), as a co-author I would suggest that these two experiments (and their usage if they continue) can co-exist. We have spoken to Bob and Koen on this topic.

Yes - there are two queues for L4S, and, from a prior email from Bob in this list:
“L4S needs to redefine the semantics of both ECT(1) /and/ CE, with the definition of CE shared between L4S and Classic flows.”
 
These different semantics are necessary to differentiate between the “normal” and “L4S” ways of using ECN.
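
As a rough sketch of what that differentiation could look like at a two-queue node: the codepoint values below are the RFC 3168 ones, but the classification rule (ECT(1) and CE to the L4S queue, everything else to the Classic queue) and the function names are assumptions for illustration, not something specified in this thread.

```python
# ECN field values from RFC 3168 (low two bits of the TOS / Traffic Class byte).
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def ecn_field(tos_byte):
    """Extract the 2-bit ECN field from a TOS / Traffic Class byte."""
    return tos_byte & 0b11


def classify(tos_byte):
    """Pick a queue for a packet (assumed rule, for illustration only).

    ECT(1) selects the L4S queue. CE is also sent there in this sketch,
    since a CE packet no longer reveals which ECT codepoint it carried.
    """
    if ecn_field(tos_byte) in (ECT_1, CE):
        return "l4s"
    return "classic"


if __name__ == "__main__":
    for name, value in (("Not-ECT", NOT_ECT), ("ECT(1)", ECT_1),
                        ("ECT(0)", ECT_0), ("CE", CE)):
        print(name, "->", classify(value))
```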


>>    And even if _both_ experiments are declared a failure; the principle
>> of allowing differing requests and differing reactions is still 100%
>> valid.
> Indeed, until a PS is published.
>>    Your concern about backing out of an experiment is 100% correct.
>> 
>>    My preference is some kind of registration scheme to state which
>> senders and which forwarders are implementing an experiment. Registration
>> is conceptually simple; but associating senders with forwarders remains
>> challenging. IMHO, we need not solve that: it's sufficient to leave that
>> question to those who propose a particular experiment.
>>    Unfortunately, we have to deal with human beings to design experiments.
>> It is up to those who review the proposals to keep pressure on those
>> volunteers to distinguish their eventual aims from how to conduct and
>> evaluate experiments.
>> 
>>    IMHO, if we ask each experimental-track writer to solve that in addition
>> to designing the intended end-state _and_ how to modify RFC 3168, we're
>> inviting failure.
>> 
>>    Obviously, YMMV; but I think I owe it to you to provide whatever advice
>> I can.
>> 
>> --
>> John Leslie<john@jlc.net>
> I can see we need to think about how to manage experiments and that there could indeed be lots of possibilities about how to manage the ECT(1) codepoint usage. I think your proposal should probably be analysed (at least discussed) if (hopefully when) TSVWG decides to allow experimental use of that codepoint.
> 
> Gorry


Cheers,
Michael