Re: [aqm] Minutes of the AQM WG session

On Tue, Jul 28, 2015 at 10:57 AM, De Schepper, Koen (Koen)
<koen.de_schepper@alcatel-lucent.com> wrote:
> Hi Dave,
>
> 2) Yes, we used only ECN to distinguish between classic and L4S TCP. As Richard already mentioned, we indeed hope that the final deployment can be that simple too. The benefits of using ECN combined with 1/p congestion controllers are so big that we hope nobody will want anything less ;-). Also (see CurvyRED and PI^2 in my backup slides), AQMs that think twice to drop "at-the-output" are much simpler to implement and understand (due to the linear relation between p and the number of flows), and are immediately compatible with 1/p congestion controllers (by skipping the square-at-the-output). It might be useful to think about a curvy or squared CoDel/Cake (note the intentionally missing "fq_") as well... (to improve dctcp support as mentioned in your topic 3). As far as I understand, the new option in *FQ_*codel disables codel completely and also applies a sojourn-time-based step function for ECN (which makes sense for the fq_ version, where every flow has its own queue & AQM, and RR-scheduling is used for fairness).

I have not yet had time to push the ce_threshold idea into Toke's
testbed. You can infer from the commit message where it is being tried
in production; that commit message also has the only public data so far
on how it works:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=80ba92fa1a92dea128283f69f55b02242e213650

I believe it entered the Linux mainline as of Linux 4.1.
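
For anyone who has not read the commit: a minimal sketch of a
sojourn-time step function for ECN marking, in the spirit of
ce_threshold. The function and parameter names are hypothetical and this
is not the kernel code; it only assumes that a packet is CE-marked when
it is ECN-capable and its time in the queue exceeds a fixed threshold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not the kernel function: decide at dequeue time
 * whether to set CE, based on sojourn time alone (a step function). */
bool ce_step_mark(uint64_t enqueue_ns, uint64_t now_ns,
                  uint64_t ce_threshold_ns, bool ect_capable)
{
    uint64_t sojourn_ns = now_ns - enqueue_ns;

    /* Only ECN-capable packets can be marked; this sketch leaves
     * everything else untouched rather than dropping it. */
    return ect_capable && sojourn_ns > ce_threshold_ns;
}
```

With a 1 ms threshold, a packet that sat for 2 ms is marked and one that
sat for 0.5 ms is not, no matter how long the queue has been standing.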

As for thinking more deeply about much of this (curvyred, etc), I
prefer to have patches and data to play with first.
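
In the absence of patches, a minimal sketch of the "think twice" /
square-at-the-output idea as Koen describes it, assuming the AQM keeps
one linear probability p and the caller supplies independent uniform
random draws. All names are hypothetical; this is not the CurvyRED or
PI^2 code.

```c
#include <stdbool.h>

/* p is the AQM's internal (linear) probability in [0,1]; u1 and u2 are
 * independent uniform draws in [0,1) supplied by the caller. */

/* 1/p congestion controllers (DCTCP-like): mark with probability p,
 * i.e. skip the square at the output. */
bool l4s_should_mark(double p, double u1)
{
    return u1 < p;
}

/* Classic traffic: "think twice", dropping only when two independent
 * draws both fall below p, which happens with probability p*p. */
bool classic_should_drop(double p, double u1, double u2)
{
    return u1 < p && u2 < p;
}

/* The resulting classic drop probability, for comparison. */
double classic_drop_probability(double p)
{
    return p * p;
}
```

With p = 0.5 the L4S side marks half the packets while the classic side
drops a quarter of them.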

> 4) After I presented in the ICCRG, Daniel Borkmann suggested trying the dctcp_clamp_alpha_on_loss option, but it didn't work. The problem is probably that alpha, once set, moves quickly to a lower value before it is applied (at the end of the RTT). So it's a matter of keeping the state in another variable and resetting that one once alpha=1 has been set, just before it is applied to reduce the window. Daniel Borkmann, Florian Westphal and Glenn Judd are aware of the dctcp_clamp_alpha_on_loss issue.

Good, then I shall cease to worry about it. Still bothersome.
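
To make the mechanism concrete, a minimal sketch of a DCTCP-style alpha
estimator and window reduction in fixed point. The constants (1024 as
1.0, a gain of 1/16) and the function names are assumptions for
illustration, not the kernel source, and the sketch does not claim to
reproduce the bug; it only shows how a clamped alpha can decay if an
update runs before the reduction is applied, which is Koen's hypothesis.

```c
#include <stdint.h>

#define MAX_ALPHA 1024U /* fixed-point 1.0 (assumed scale) */
#define SHIFT_G   4U    /* EWMA gain g = 1/16 (assumed) */

/* Once per window: alpha = (1 - g) * alpha + g * F, where F is the
 * fraction of marked packets in that window. */
uint32_t dctcp_update_alpha(uint32_t alpha, uint32_t marked, uint32_t total)
{
    uint64_t frac = total ? ((uint64_t)marked * MAX_ALPHA) / total : 0;
    uint64_t next = alpha - (alpha >> SHIFT_G) + (frac >> SHIFT_G);

    return next > MAX_ALPHA ? MAX_ALPHA : (uint32_t)next;
}

/* Congestion response: cwnd * (1 - alpha / 2), with a floor of 2. */
uint32_t dctcp_reduce(uint32_t cwnd, uint32_t alpha)
{
    uint32_t reduced = cwnd - (uint32_t)(((uint64_t)cwnd * alpha) >> 11);

    return reduced > 2U ? reduced : 2U;
}
```

With these assumed constants, dctcp_reduce(100, MAX_ALPHA) halves the
window to 50, while a single intervening update with no marks lowers
alpha to 960 and the reduced window becomes 54.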

> 5) Actually Curvy RED is used to control the classic queue and couple it (non-curved) to the DCTCP traffic (for throughput fairness). The size of the L4S queue is normally 0; only initial windows and other flow variations above the bottleneck links are briefly queued in the L4S queue. If no classic traffic is available, our experiments used 5 packets (= 1.6ms using the DCTCP step function marking threshold).

The demo showed the dctcp queue having < 5ms delay, and the "classic"
queue having 42 ms delay.

When switched to fq_codel it showed 20ms delay for both queues. This
struck me as overlarge. A common problem is leaving the ethernet
offloads (GRO/GSO/TSO) on at the aqm hop - perhaps you had them on?
Codel (and pie and qfq!) was originally designed and tested mostly with
offloads off, and we strenuously encouraged experimenters to turn them
off - so much so that we did not do enough testing with them on.
Recently we were blindsided by the armada 385 chip (used by a whole
bunch of new high speed home routers) doing very aggressive GRO - on
everything. I had never seen GRO crack 24k before; there it routinely
cracked 64k, and that led to some headaches.

Codel had a maxpacket concept in it that was designed to operate
correctly at a single MTU, not 64K!

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a5d28090405038ca1f40c13f38d6d4285456efee
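
A minimal sketch of why a 64K maxpacket hurts, assuming a simplified
version of codel's "is the queue good?" test in which the queue is left
alone whenever the backlog is no larger than some byte floor. The
function and parameter names are hypothetical; see the commit for the
real change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when this simplified codel would NOT consider dropping:
 * either the sojourn time is under target, or the backlog is no larger
 * than floor_bytes (intended to be about one MTU). */
bool codel_queue_ok(uint64_t sojourn_us, uint64_t target_us,
                    uint32_t backlog_bytes, uint32_t floor_bytes)
{
    return sojourn_us < target_us || backlog_bytes <= floor_bytes;
}
```

With a 30000 byte backlog and 20ms of sojourn time against a 5ms target,
a 1514 byte floor lets codel act, but a 65536 byte floor learned from a
GRO superpacket makes the queue look "ok" and codel never engages.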

I do not know how to fix it in red or pie at the moment. pie's
estimator is byte based, but there may be other issues. So it would be
my hope that your demo would run better against fq_codel with offloads
off and/or this patch applied?

Cake has "peeling", which turns the up to 64k superpackets back into
appropriately-sized packets for the afforded workload. So has TBF, in
a more limited fashion. HTB does not, presently. Probably we will add
peeling features to htb at some point.
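
The arithmetic behind peeling can be sketched as follows, assuming a
superpacket is simply cut into segments of at most mss payload bytes.
These helper names are hypothetical and this is not how cake or TBF is
written; it only shows how many packets a superpacket turns back into.

```c
#include <stdint.h>

/* Number of segments a superpacket with payload_len bytes of payload
 * is split into when each segment carries at most mss bytes. */
uint32_t peel_segment_count(uint32_t payload_len, uint32_t mss)
{
    if (mss == 0)
        return 0;
    return (payload_len + mss - 1) / mss;
}

/* Payload length of segment i (0-based); 0 if i is out of range. */
uint32_t peel_segment_len(uint32_t payload_len, uint32_t mss, uint32_t i)
{
    uint32_t n = peel_segment_count(payload_len, mss);

    if (i >= n)
        return 0;
    /* Every segment is full-sized except possibly the last one. */
    return (i + 1 < n) ? mss : payload_len - i * mss;
}
```

A 65160 byte payload at an mss of 1448 peels into exactly 45 segments,
so the AQM sees 45 packets to schedule and account for rather than one.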

>
> 6) Would be good, as hashed FQ gives a surprisingly (for us at least) high number of collisions.

Set associative hashing in cake *is* turning in pretty great and
interesting results so far, and it is computationally cheap. Set
associativity, of course, is very common in conventional cpu cache
design; it is surprising that I have not seen it in queue design
before now (anyone got any prior cites?)
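
For those who have not met the idea outside of cpu caches, a minimal
sketch of a set-associative flow lookup. The table size, the number of
ways, and all names are assumptions for illustration; this is not
cake's implementation.

```c
#include <stdint.h>

#define SET_WAYS   8U    /* queues per set (assumed) */
#define NUM_QUEUES 1024U /* must be a multiple of SET_WAYS */

struct flow_table {
    uint32_t tag[NUM_QUEUES];  /* full flow hash owning each queue */
    uint8_t  used[NUM_QUEUES]; /* nonzero once a queue is claimed */
};

/* Map a flow hash to a queue index. A plain hashed FQ would return
 * flow_hash % NUM_QUEUES directly; here the flow may take any free
 * queue within its set, and only shares a queue when the set is full. */
uint32_t flow_lookup(struct flow_table *t, uint32_t flow_hash, int *collided)
{
    uint32_t base = (flow_hash % NUM_QUEUES) & ~(SET_WAYS - 1U);
    uint32_t i;

    *collided = 0;

    /* Pass 1: the flow already owns a queue in this set. */
    for (i = 0; i < SET_WAYS; i++)
        if (t->used[base + i] && t->tag[base + i] == flow_hash)
            return base + i;

    /* Pass 2: claim the first free queue in the set. */
    for (i = 0; i < SET_WAYS; i++) {
        if (!t->used[base + i]) {
            t->used[base + i] = 1;
            t->tag[base + i] = flow_hash;
            return base + i;
        }
    }

    /* Set is full: fall back to the direct-mapped queue and share it. */
    *collided = 1;
    return flow_hash % NUM_QUEUES;
}
```

Two flows whose hashes are 0 and 1024 would share queue 0 in a plain
1024-queue hash; here the second one lands in queue 1 instead.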

I don't know what a "surprisingly" high number of collisions is; the
math is straightforward. Admittedly most of the workloads were tons of
short flows against a very small number of big flows, and only recently
(with cake) did we start trying to tackle torrent-like flows again.
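
The math in question is the birthday problem. A minimal sketch, assuming
flows hash uniformly and independently into m queues (the function name
is hypothetical):

```c
/* Probability that n flows, hashed uniformly at random into m queues,
 * all land in distinct queues (no collision anywhere). */
double prob_no_collision(unsigned n, unsigned m)
{
    double p = 1.0;
    unsigned i;

    if (n > m)
        return 0.0;
    /* The (i+1)-th flow must avoid the i queues already taken. */
    for (i = 1; i < n; i++)
        p *= 1.0 - (double)i / (double)m;
    return p;
}
```

By this calculation, 100 flows in 1024 queues (the fq_codel default)
avoid every collision less than 1% of the time, so seeing at least one
collision at that load is the expected outcome rather than a surprise.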

>
> 7) Nothing decided yet (see topic 2).


> Regards,
> Koen.
>
>
>> -----Original Message-----
>> From: aqm [mailto:aqm-bounces@ietf.org] On Behalf Of Dave Taht
>> Sent: donderdag 23 juli 2015 13:24
>> To: Scheffenegger, Richard
>> Cc: aqm@ietf.org
>> Subject: Re: [aqm] Minutes of the AQM WG session
>>
>> 1) in hard delay targets, I am credited with what matt mathis said
>> (not that I disagree).
>>
>> 2) In the dual queue thing I had noted that distinguishing between the
>> two queues based solely on the presence of ecn capability and then
>> using dctcp was non-sensical as plenty of other things like cubic
>> would also end up with ecn enabled.
>>
>> 3) and for the record, fq_codel as it exists today works reasonably
>> well when hammered with dctcp + ecn, cubic + ecn, and any other
>> stuff without markings with the defaults. It may not give the desired
>> reduction in individual queue length - and cubic vs dctcp will do
>> badly against each other on a hash collision - but it is, at least,
>> "mostly safe" were some sysadmin finger foo things and push dctcp (or
>> some other non traditional cc) out on the world internet against
>> fq_codel'd ecn-enabled systems.
>>
>> 4) If you can't tell, it *really bothers me* when someone reports a
>> bug in dctcp - at ietf - rather than through proper channels -
>> particularly when it leads to evil behavior on loss, and yet I have no
>> patches submitted nor means to reproduce it.  PLEASE GET A PATCH OUT
>> to the netdev maintainers, it will be immediately put into the next
>> release of linux AND backported to the linux stable releases.
>>
>> What I see is dctcp_clamp_alpha_on_loss not defaulting to on, which
>> based on the comments, should default to on. Is there somewhere else
>> this is busted?
>>
>> static void dctcp_state(struct sock *sk, u8 new_state)
>> {
>>         if (dctcp_clamp_alpha_on_loss && new_state == TCP_CA_Loss) {
>>                 struct dctcp *ca = inet_csk_ca(sk);
>>
>>                 /* If this extension is enabled, we clamp dctcp_alpha to
>>                  * max on packet loss; the motivation is that dctcp_alpha
>>                  * is an indicator to the extend of congestion and packet
>>                  * loss is an indicator of extreme congestion; setting
>>                  * this in practice turned out to be beneficial, and
>>                  * effectively assumes total congestion which reduces the
>>                  * window by half.
>>                  */
>>                 ca->dctcp_alpha = DCTCP_MAX_ALPHA;
>>         }
>> }
>>
>> 5) Missing from the preso was the actual dual queue length, just a
>> comparison (cool demo tho!) - what queue length do you get using
>> curvy red?
>>
>> 6) cake, of course, gets to 100s of flows without hash collisions.
>>
>> 7) As I missed the tcp-prague discussion, is the proposal to reserve
>> ect(1) - the former nonce bit - for dctcp? or some other diffserv or
>> flowheader marking?
>>
>> On Thu, Jul 23, 2015 at 8:56 AM, Scheffenegger, Richard <rs@netapp.com>
>> wrote:
>> > Hi,
>> >
>> > Thanks to David Schinazi for serving as the notes taker;
>> >
>> > I've uploaded the minutes
>> https://www.ietf.org/proceedings/93/minutes/minutes-93-aqm
>> >
>> > If you made a statement during the session, please review that the
>> notes capture the essence of what you wanted to convey!
>> >
>> > Also, one name is missing (and I'm unable to replay the meetecho
>> recording currently) in the minutes, if you know who made that comment
>> please inform us chairs.
>> >
>> > Thanks,
>> >   Richard (aqm chair)
>> >
>> > _______________________________________________
>> > aqm mailing list
>> > aqm@ietf.org
>> > https://www.ietf.org/mailman/listinfo/aqm
>>
>>
>>
>> --
>> Dave Täht
>> worldwide bufferbloat report:
>> http://www.dslreports.com/speedtest/results/bufferbloat
>> And:
>> What will it take to vastly improve wifi for everyone?
>> https://plus.google.com/u/0/explore/makewififast

-- 
Dave Täht
worldwide bufferbloat report:
http://www.dslreports.com/speedtest/results/bufferbloat
And:
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast