Re: [aqm] Minutes of the AQM WG session

"De Schepper, Koen (Koen)" <koen.de_schepper@alcatel-lucent.com> Tue, 04 August 2015 11:46 UTC

From: "De Schepper, Koen (Koen)" <koen.de_schepper@alcatel-lucent.com>
To: Dave Taht <dave.taht@gmail.com>
Date: Tue, 04 Aug 2015 11:46:39 +0000
Message-ID: <BF6B00CC65FD2D45A326E74492B2C19FB75DB51C@FR711WXCHMBA05.zeu.alcatel-lucent.com>
References: <ba3b6f6b4d3d453d887c451fbca412fa@hioexcmbx05-prd.hq.netapp.com> <CAA93jw5WrT0Azcew_gic5H-tJtBo62m-f4fBB0=qQp01uf3VuQ@mail.gmail.com> <BF6B00CC65FD2D45A326E74492B2C19FB75D9FF9@FR711WXCHMBA05.zeu.alcatel-lucent.com> <CAA93jw5TfTci=Qo0SDWnrN=o7eXUNDZMmcaDpPdtFS2nqGa1FQ@mail.gmail.com>
In-Reply-To: <CAA93jw5TfTci=Qo0SDWnrN=o7eXUNDZMmcaDpPdtFS2nqGa1FQ@mail.gmail.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/aqm/DJlFypXNg2hX-KJZFcme9b5MMJM>
Cc: "Scheffenegger, Richard" <rs@netapp.com>, "aqm@ietf.org" <aqm@ietf.org>
Subject: Re: [aqm] Minutes of the AQM WG session

Dave,

> When switched to fq_codel it showed 20ms delay for both queues. This
> struck me as overlarge. 

This is due to the low base RTT (7 ms) and the number of flows. The FQ_Codel queue size for DCTCP flows is around double that of Cubic flows. With a base RTT of 7 ms, Codel could no longer hold its 5 ms target (for Cubic) once more than 2 flows were active. With 20 flows the Cubic queues averaged around 10 ms (tail up to 30 ms) and the DCTCP queues around 20 ms (tail up to 60 ms).

> A common problem is that perhaps you had
> ethernet offloads still on for GRO/GSO/TSO on the aqm hop?

Good that you mention this, because we hadn't realized that the NICs on the router, with GRO enabled, were gluing incoming packets back together. All our experiments were run with GRO/GSO/TSO on.

In some quick tests, turning GRO/GSO/TSO on or off had no noticeable effect on the FQ_Codel queue sizes, so the results above were due only to the higher level of congestion (lower RTTs). So, to a certain extent, Codel is also trading loss for delay, as we proposed for Curvy RED, only at a much higher loss level.

GRO at the router does have a noticeable effect on the very low queues when DCTCP runs alone. It glues the initial-window packets together, so they are sent at line speed (1 Gbps) by the NIC. I think it is this burstiness that gives us a slightly bigger average queuing delay. Also, when DCTCP runs alone we use a simple packet-level threshold (mark when queue size > 5 packets). In that case glued packets of around 15 KB (the biggest we saw) get through without being marked. We plan to work on alternatives for the fallback AQM (at least byte-based, or delay-based). For the Linux implementation we might also have to partially mark (or split?) these glued packets.
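
For illustration, the byte-based fallback could be as simple as the following standalone sketch (just a sketch, not what we have implemented; the 7500-byte constant simply assumes 5 full-size 1500-byte packets):

#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold: the byte equivalent of the 5-packet step. */
#define MARK_THRESH_BYTES (5u * 1500u)

/* Mark CE based on queued bytes instead of a packet count, so a single
 * 15 KB GRO super-packet no longer slips under the threshold. */
static bool should_mark_ce(uint32_t queued_bytes)
{
        return queued_bytes > MARK_THRESH_BYTES;
}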

Koen.





> -----Original Message-----
> From: Dave Taht [mailto:dave.taht@gmail.com]
> Sent: Tuesday, 28 July 2015 11:37
> To: De Schepper, Koen (Koen)
> Cc: Scheffenegger, Richard; aqm@ietf.org
> Subject: Re: [aqm] Minutes of the AQM WG session
> 
> On Tue, Jul 28, 2015 at 10:57 AM, De Schepper, Koen (Koen)
> <koen.de_schepper@alcatel-lucent.com> wrote:
> > Hi Dave,
> >
> > 2) Yes, we used only ECN to distinguish between classic and L4S TCP.
> > As Richard already mentioned, we hope indeed that the final deployment
> > could be that simple too. The benefits of using ECN combined with 1/p
> > congestion controllers are so big that we hope nobody will want
> > anything less ;-). Also (see Curvy RED and PI^2 in my backup slides),
> > AQMs that think twice to drop "at-the-output" are much simpler to
> > implement and understand (due to the linear relation between p and the
> > number of flows), and are immediately compatible with 1/p congestion
> > controllers (by skipping the square-at-the-output). It might be useful
> > to think about a curvy or squared CoDel/Cake (note the intentionally
> > missing "fq_") as well (to improve dctcp support, as mentioned in your
> > topic 3). As far as I understand, the new option in *FQ_*codel disables
> > codel completely and also applies a sojourn-time-based step function
> > for ECN (which makes sense for the fq_ version, where every flow has
> > its own queue & AQM, and RR scheduling is used for fairness).
> 
> I have not yet had time to push the ce_threshold idea into toke's
> testbed. You can infer from the commit message where it is being tried
> in production, and the commit message has the only public data so far
> on how it works:
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=80ba92fa1a92dea128283f69f55b02242e213650
> 
> and I believe it entered the linux mainline as of linux 4.1
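
For reference, the marking step itself is tiny; roughly something like this standalone sketch (illustrative only, not the kernel code, and the 1 ms value is just an example):

#include <stdbool.h>
#include <stdint.h>

/* Sojourn-time step marking in the spirit of the ce_threshold option:
 * mark CE as soon as a packet's time in the queue exceeds a fixed
 * threshold, independent of codel's normal drop schedule. */
#define CE_THRESHOLD_US 1000u   /* example 1 ms threshold */

static bool mark_ce_on_dequeue(uint64_t enqueue_time_us, uint64_t now_us)
{
        return (now_us - enqueue_time_us) >= CE_THRESHOLD_US;
}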
> 
> As for thinking about much of this, curvyred, etc, more deeply I
> prefer to have patches and data to play with first.
> 
> > 4) After I presented in the ICCRG, Daniel Borkmann suggested trying
> > the dctcp_clamp_alpha_on_loss option, but it didn't work. The problem
> > is probably that alpha, once set, quickly moves to a lower value again
> > before it is applied (at the end of the RTT). So it's a matter of
> > keeping the state in another variable, setting alpha=1 just before it
> > is applied to reduce the window, and then resetting that variable.
> > Daniel Borkmann, Florian Westphal and Glenn Judd are aware of the
> > dctcp_clamp_alpha_on_loss issue.
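
To be concrete, here is a rough standalone sketch of that fix idea (not a submitted patch; the names and types are only illustrative): remember the loss in a separate flag and force alpha to 1 only at the point where the window reduction is computed.

#include <stdbool.h>

/* Illustrative state: alpha is the usual EWMA of the marked fraction,
 * saw_loss remembers that a loss happened since the last reduction. */
struct dctcp_like_state {
        double alpha;     /* 0..1 */
        bool   saw_loss;  /* set on loss, cleared once applied */
};

static unsigned int cwnd_after_reduction(struct dctcp_like_state *ca,
                                         unsigned int cwnd)
{
        /* Clamp only at the moment alpha is applied, so a later
         * re-estimation of alpha cannot undo the clamp. */
        double a = ca->saw_loss ? 1.0 : ca->alpha;

        ca->saw_loss = false;
        /* DCTCP reduction: cwnd * (1 - alpha/2); alpha = 1 halves cwnd. */
        return (unsigned int)(cwnd * (1.0 - a / 2.0));
}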
> 
> Good, then I shall cease to worry about it. Still bothersome.
> 
> > 5) Actually Curvy RED is used to control the classic queue and to
> > couple it (non-curved) to the DCTCP traffic (for throughput fairness).
> > The size of the L4S queue is normally 0; only initial windows and
> > other flow variations above the bottleneck rate are briefly queued in
> > the L4S queue. If no classic traffic is present, our experiments used
> > a threshold of 5 packets (= 1.6ms using the DCTCP step-function
> > marking threshold).
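
(For scale, assuming 1500-byte packets: 5 packets are 5 x 1500 x 8 = 60 kbit, and 60 kbit draining in 1.6 ms corresponds to a bottleneck of roughly 37.5 Mbps, so the two quoted figures are consistent for a link in that range.)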
> 
> The demo showed the dctcp queue having < 5ms delay, and the "classic"
> queue having 42 ms delay.
> 
> When switched to fq_codel it showed 20ms delay for both queues. This
> struck me as overlarge. A common problem is that perhaps you had
> ethernet offloads still on for GRO/GSO/TSO on the aqm hop? Codel's
> (and pie's and qfq's!) original deployment was designed and tested with
> offloads off, mostly, and we strenuously encouraged experimenters to
> turn them off - so much so that we did not do enough testing with them
> on. Recently we were blindsided by the armada 385 chip (used by a whole
> bunch of new high speed home routers) doing very aggressive GRO - on
> everything - I had never seen it crack 24k before; it routinely
> cracked 64k, and that led to some headaches.
> 
> Codel had the maxpacket concept in it, which was designed to operate
> correctly with a single MTU, not 64K!
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a5d28090405038ca1f40c13f38d6d4285456efee
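
Paraphrasing the old check as a standalone sketch (simplified types, not the kernel source): codel declines to drop while the backlog fits within one "maxpacket", and GRO can inflate maxpacket toward 64 KB, so a sizeable standing queue is treated as just one packet's worth.

#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the pre-fix check: never drop (or stay in the
 * dropping state) while the backlog is no larger than the biggest packet
 * seen so far.  maxpacket was meant to track one MTU, but with GRO/GSO
 * aggregates it can grow toward 64 KB. */
static bool ok_to_drop(uint32_t sojourn_us, uint32_t target_us,
                       uint32_t backlog_bytes, uint32_t maxpacket_bytes)
{
        if (sojourn_us < target_us || backlog_bytes <= maxpacket_bytes)
                return false;   /* below target, or "only one packet" queued */
        return true;
}

With maxpacket inflated to 64 KB, any standing queue up to that size is treated as good queue and never drained.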
> 
> I do not know how to fix it in red or pie at the moment. pie's
> estimator is byte based, but there may be other issues. So it would be
> my hope that your demo would run better against fq_codel with offloads
> off and/or with this patch applied?
> 
> Cake has "peeling", which turns the up to 64k superpackets back into
> appropriately-sized packets for the afforded workload. So has TBF, in
> a more limited fashion. HTB does not, presently. Probably we will add
> peeling features to htb at some point.
> 
> 
> >
> > 6) Would be good, as hashed FQ gives a surprisingly (for us at least)
> > high number of collisions.
> 
> Set associative hashing in cake *is* turning in pretty great and
> interesting results so far, and it is computationally cheap. Set
> associativity, of course, is a very common thing in conventional cpu
> cache design; it is surprising I have not seen it in queue design as
> well before now (anyone got any prior cites?)
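
A minimal sketch of the idea (illustrative geometry and layout, not cake's actual code): the hash selects a small set of slots rather than a single bucket, and a new flow only collides when every slot in its set is already held by another flow.

#include <stdbool.h>
#include <stdint.h>

#define SETS 128   /* illustrative table geometry */
#define WAYS 8

struct flow_slot {
        uint32_t tag;     /* full flow hash stored in the slot */
        bool     in_use;
};

static struct flow_slot table[SETS][WAYS];

/* Return the way index for this flow, claiming a free slot if needed,
 * or -1 when all WAYS slots in the set are held by other flows. */
static int lookup_or_insert(uint32_t flow_hash)
{
        struct flow_slot *set = table[flow_hash % SETS];
        int free_way = -1;

        for (int way = 0; way < WAYS; way++) {
                if (set[way].in_use && set[way].tag == flow_hash)
                        return way;               /* existing flow */
                if (!set[way].in_use && free_way < 0)
                        free_way = way;           /* remember a free slot */
        }
        if (free_way >= 0) {
                set[free_way].tag = flow_hash;
                set[free_way].in_use = true;
        }
        return free_way;
}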
> 
> I don't know what a "surprisingly" high number of collisions is; the
> math is straightforward. Admittedly most of the workloads were tons of
> short flows against a very small number of big flows, and only recently
> (with cake) did we start trying to tackle torrent-like flows again.
> 
> >
> > 7) Nothing decided yet (see topic 2).
> 
> l.
> 
> > Regards,
> > Koen.
> >
> >
> >> -----Original Message-----
> >> From: aqm [mailto:aqm-bounces@ietf.org] On Behalf Of Dave Taht
> >> Sent: Thursday, 23 July 2015 13:24
> >> To: Scheffenegger, Richard
> >> Cc: aqm@ietf.org
> >> Subject: Re: [aqm] Minutes of the AQM WG session
> >>
> >> 1) in hard delay targets, I am credited with what matt mathis said
> >> (not that I disagree).
> >>
> >> 2) In the dual queue thing I had noted that distinguishing between
> >> the two queues based solely on the presence of ecn capability, and
> >> then using dctcp, was nonsensical, as plenty of other things like
> >> cubic would also end up with ecn enabled.
> >>
> >> 3) And for the record, fq_codel as it exists today works reasonably
> >> well when hammered with dctcp + ecn, cubic + ecn, and any other
> >> stuff without markings, with the defaults. It may not give the
> >> desired reduction in individual queue length - and cubic vs dctcp
> >> will do badly against each other on a hash collision - but it is, at
> >> least, "mostly safe" were some sysadmin to finger-foo things and push
> >> dctcp (or some other non-traditional cc) out on the world internet
> >> against fq_codel'd ecn-enabled systems.
> >>
> >> 4) If you can't tell, it *really bothers me* when someone reports a
> >> bug in dctcp - at ietf - rather than through proper channels -
> >> particularly when it leads to evil behavior on loss, and yet I have
> >> no patches submitted nor means to reproduce it. PLEASE GET A PATCH
> >> OUT to the netdev maintainers; it will be immediately put into the
> >> next release of linux AND backported to the linux stable releases.
> >>
> >> What I see is dctcp_clamp_alpha_on_loss not defaulting to on, which,
> >> based on the comments, should default to on. Is there somewhere else
> >> this is busted?
> >>
> >> static void dctcp_state(struct sock *sk, u8 new_state)
> >> {
> >>         if (dctcp_clamp_alpha_on_loss && new_state == TCP_CA_Loss) {
> >>                 struct dctcp *ca = inet_csk_ca(sk);
> >>
> >>                 /* If this extension is enabled, we clamp dctcp_alpha to
> >>                  * max on packet loss; the motivation is that dctcp_alpha
> >>                  * is an indicator to the extend of congestion and packet
> >>                  * loss is an indicator of extreme congestion; setting
> >>                  * this in practice turned out to be beneficial, and
> >>                  * effectively assumes total congestion which reduces the
> >>                  * window by half.
> >>                  */
> >>                 ca->dctcp_alpha = DCTCP_MAX_ALPHA;
> >>         }
> >> }
> >>
> >>
> >> 5) Missing from the preso was the actual dual queue length, just a
> >> comparison (cool demo tho!) - what queue length do you get using
> >> curvy red?
> >>
> >> 6) cake, of course, gets to 100s of flows without hash collisions.
> >>
> >> 7) As I missed the tcp-prague discussion, is the proposal to reserve
> >> ect(1) - the former nonce bit - for dctcp? or some other diffserv or
> >> flowheader marking?
> >>
> >> On Thu, Jul 23, 2015 at 8:56 AM, Scheffenegger, Richard <rs@netapp.com> wrote:
> >> > Hi,
> >> >
> >> > Thanks to David Schinazi for serving as the notes taker;
> >> >
> >> > I've uploaded the minutes:
> >> > https://www.ietf.org/proceedings/93/minutes/minutes-93-aqm
> >> >
> >> > If you made a statement during the session, please review that the
> >> > notes capture the essence of what you wanted to convey!
> >> >
> >> > Also, one name is missing in the minutes (and I'm unable to replay
> >> > the meetecho recording currently); if you know who made that
> >> > comment, please inform us chairs.
> >> >
> >> > Thanks,
> >> >   Richard (aqm chair)
> >> >
> >>
> >>
> >>
> >> --
> >> Dave Täht
> >> worldwide bufferbloat report:
> >> http://www.dslreports.com/speedtest/results/bufferbloat
> >> And:
> >> What will it take to vastly improve wifi for everyone?
> >> https://plus.google.com/u/0/explore/makewififast
> >>
> 
> 
> 
> --
> Dave Täht
> worldwide bufferbloat report:
> http://www.dslreports.com/speedtest/results/bufferbloat
> And:
> What will it take to vastly improve wifi for everyone?
> https://plus.google.com/u/0/explore/makewififast