Re: [aqm] AQM schemes: Queue length vs. delay based

Dave Taht <dave.taht@gmail.com> Fri, 15 November 2013 20:00 UTC

In-Reply-To: <CEABA46C.55B4B%ropan@cisco.com>
References: <CEAB91A5.55AD3%ropan@cisco.com> <CEABA46C.55B4B%ropan@cisco.com>
Date: Fri, 15 Nov 2013 11:59:37 -0800
Message-ID: <CAA93jw6hDEwqsx8jKTAJOjbY8dV_nu6ntvmBX1jpTwyEN87G8w@mail.gmail.com>
From: Dave Taht <dave.taht@gmail.com>
To: "Rong Pan (ropan)" <ropan@cisco.com>
Cc: Preethi Natarajan <preethi.cis@gmail.com>, Naeem Khademi <naeem.khademi@gmail.com>, Michael Welzl <michawe@ifi.uio.no>, "aqm@ietf.org" <aqm@ietf.org>
Subject: Re: [aqm] AQM schemes: Queue length vs. delay based

I have been doing my best to ignore this thread.

Gripe #1:

What I saw of the ARED presentation seemed to show that if you
sacrificed quite a bit of throughput you'd get vastly better latency,
on a very simple set of benchmarks. I can go through each slide to point
out where there were results that were non-obvious.

The goal in this exercise is to aim for 100% utilization at 0 latency.

#2)

I also find box plots, while often informative, to be misleading when
thinking about network dynamics, where you care more about the
patterns of the outliers. At some point I suspect I'll go back to
using them on appropriate benchmarks, but I generally prefer raw graphs
of the data and CDF plots of the summary data.
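To illustrate the point about outliers, here is a minimal sketch (the helper names and the sample values are made up for illustration, not measured data) that builds the step points of an empirical CDF and reads off percentiles. Two latency sets with the same median can have very different tails, which a single summary figure hides but a CDF plot makes obvious.

```python
import math

def empirical_cdf(samples):
    """Step points of the empirical CDF: i-th smallest sample paired with i/n."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

def percentile(samples, p):
    """Nearest-rank percentile for p in (0, 100]."""
    xs = sorted(samples)
    rank = max(1, math.ceil(p * len(xs) / 100))
    return xs[rank - 1]

# Made-up latency samples in ms: same bulk, very different tails.
calm = [10] * 95 + [12] * 5
spiky = [10] * 95 + [500] * 5

for name, data in (("calm", calm), ("spiky", spiky)):
    print(name, "p50:", percentile(data, 50), "p99:", percentile(data, 99))
```

Both sets report the same median; only the upper tail, which is what a CDF puts front and center, tells them apart.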

#3) The bufferbloat effort has been in full swing for 3+ years now.
Unlike previous efforts we didn't stop for publication, patents, etc.,
but published results as fast as we could on the bloat list, and
cycled code into ns2 and the linux kernel as fast as possible. In
nearly all cases there were negative results, which are hard to publish
anyway. I apologize for not having left an academic paper trail that
could be followed, but as there was no funding for the effort
whatsoever and very little benefit to me from academic publishing, we
skipped it. Secondly, as fast as I found what seemed to be a problem,
or an interesting result worth writing about, Eric Dumazet and other
members of the linux netdev group viewed it as a bug and fixed it,
to the point where the linux stack looks nothing like it did when we
started. I would like it very much if others merely reported their
interesting results and treated them as bugs to be fixed rather than
papers to be written.

I hope to document some of this stuff in an upcoming fq_codel RFC. And
I'd certainly love to make CCR, ACM Queue (again) and LWN, but we're
at the point where code is being seriously deployed and I'm mostly
engaged in fixing wifi right now.

Some history for newcomers:

I started with wondershaper, wondering why it didn't scale up
and how to address its flaws.

A big focus was how to fix variable bandwidth systems like cable,
wireless and wifi, where fixed rates were inadequate. Secondary was
finding a fixed rate system that would work against things like DSL.
We figured if we fixed the first, the second would be easy.

I'd tried RED back in the early 00s and wondershaper blew it away...
We moved from there to SFB, which was the first of all our ideas to be
fixed and pushed into the linux kernel. It didn't work worth a damn.
Kathie and Van, of course, had RED at their disposal and were thinking
about various replacements like RED light and what became codel. In
trying to match up a multiplicity of RED-related results, I noticed that
RED as implemented in linux behaved nothing like RED in the papers, and
Eric Dumazet found and fixed the two bugs in it that had existed for
several years (I think the broken period was from 2.6.36 through 3.4).
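For readers new to the queue-length-based side of this thread, the drop curve RED describes in the papers can be sketched roughly as below. This is a simplified illustration assuming the textbook form (an exponentially weighted moving average of queue length, and a drop probability that ramps linearly between two thresholds); it omits RED's count-based drop spacing and idle-period correction, and the class name and parameter values are illustrative only, not the linux defaults.

```python
class RedSketch:
    """Textbook RED drop probability from an averaged queue length."""

    def __init__(self, min_th=5, max_th=15, max_p=0.1, w_q=0.002):
        self.min_th = min_th    # packets; below this average, never drop
        self.max_th = max_th    # packets; at or above this average, always drop
        self.max_p = max_p      # drop probability reached just below max_th
        self.w_q = w_q          # EWMA weight for the average queue length
        self.avg = 0.0

    def drop_probability(self, qlen):
        """Fold the instantaneous length into the average and return p."""
        self.avg = (1 - self.w_q) * self.avg + self.w_q * qlen
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
```

Everything here keys off queue length in packets, so the thresholds have to be tuned to the link rate, which is the tuning difficulty that the delay-based schemes discussed below try to avoid.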

I went through, I think, nearly every paper published in the 90s and 00s
that was available online. There was a big survey of RED variants
in there that, as I recall, had FRED in it and a few others that I
discarded as not useful.

Stephen Hemminger did a couple of qdiscs, like choke, which also didn't
work as well as advertised. I remember that Kathie looked into AVQ,
among others.

I spent a lot of time combining qdiscs, notably DRR or QFQ plus a drop
system of some kind, like RED, choke, etc. This is partially
documented in the debloat script. Around August of the first year I'd
concluded the biggest win would be that combination, but failed to
persuade many.
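A minimal sketch of the "DRR plus a drop system" combination, assuming per-flow FIFO queues keyed by a flow id and using plain per-flow tail drop as a hypothetical placeholder where RED, choke or codel would go. The class and parameter names are invented for illustration; this is not the linux DRR, QFQ or fq_codel code.

```python
from collections import deque

class DRR:
    """Deficit round robin over per-flow FIFOs (illustrative sketch)."""

    def __init__(self, quantum=1514, per_flow_limit=100):
        self.quantum = quantum                # bytes of credit granted per round
        self.per_flow_limit = per_flow_limit  # hypothetical stand-in for a real AQM
        self.queues = {}                      # flow_id -> deque of (flow_id, size)
        self.deficit = {}                     # flow_id -> unused byte credit
        self.active = deque()                 # backlogged flows in service order
        self.drops = 0

    def enqueue(self, flow_id, size):
        q = self.queues.setdefault(flow_id, deque())
        if len(q) >= self.per_flow_limit:
            # The "drop system": plain per-flow tail drop in this sketch.
            self.drops += 1
            return False
        if not q:
            # Flow becomes backlogged: join the end of the round.
            self.active.append(flow_id)
            self.deficit[flow_id] = 0
        q.append((flow_id, size))
        return True

    def dequeue(self):
        while self.active:
            flow = self.active[0]
            q = self.queues[flow]
            size = q[0][1]
            if self.deficit[flow] < size:
                # Not enough credit: grant a quantum and move to the back.
                self.deficit[flow] += self.quantum
                self.active.rotate(-1)
                continue
            self.deficit[flow] -= size
            pkt = q.popleft()
            if not q:
                # Flow drained: leave the round and forfeit leftover credit.
                self.active.popleft()
                self.deficit[flow] = 0
            return pkt
        return None
```

Because each backlogged flow earns the same quantum of bytes per round, a flow sending small packets gets as many bytes through as one sending large packets. Swapping the tail-drop branch for a per-queue AQM gives roughly the shape of the combinations described above; fq_codel additionally keeps separate lists for new and old flows.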

Eric spent an enormous amount of time improving SFQ based on ideas
that Paul McKenney had discarded as "too hard" back in the days of the
68020. The big version of that paper was pretty crucial to
understanding what we could achieve on modern hardware. Paul has been
a great help...

Along the way he got hot on ARED for about two weeks and implemented
it, though I got generally lousy results from it in isolation. SFQRED
worked pretty well at bandwidths between 4Mbit and 65Mbit
but proved hard to tune. Naeem rightly points out that the SFQARED
variant was rather inadequately tested (as about that time codel was
beginning to emerge), but as I pointed out months ago, the code is
readily available in the linux kernel, so GO FOR IT.

We had an idea ("hoq", or head of queue) in SFQRED that was very
SQF-like (as it turned out) but didn't work correctly, so we ended up
dropping it from linux 3.4.

Andrew McGregor and I spent many a late night discussing, among other
things, the mixed-RTT problem and testing out all these algorithms
live, over video conferencing and other systems. JG as well. There
were hundreds of people on the bloat list and dozens of cerowrt users
helping as well; I'm sorry for not listing names here, and I'm sure
I've forgotten someone.

Kathie briefed us on codel one day in May. I got very excited and wrote
the first version for linux; Eric Dumazet then fixed it and sped it up
by 100x; an excited team assembled (see the early days of the codel
list), found a bug in the paper before it was even published, and fixed
that. One week later it was in the kernel, and then one Saturday
afternoon, after we'd caught up on sleep, Eric combined DRR and codel
to create fq_codel, and also got that slammed into the linux kernel in
under a week.

... I tried it on a wide range of benchmarks and said, at about 4AM
that day, that "wow. We just fixed the internet."
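The core of codel's dequeue-time decision can be sketched as follows. This is a simplified reading of the published algorithm, assuming the commonly cited defaults of a 5 ms target and a 100 ms interval: once packet sojourn time has stayed above target for a full interval, codel starts dropping, and spaces successive drops by interval/sqrt(count). The MTU/backlog check and the heuristic for reusing count when re-entering the dropping state are omitted, and the class name is hypothetical.

```python
import math

class CoDelSketch:
    """Simplified codel drop decision, evaluated at dequeue time."""

    def __init__(self, target=0.005, interval=0.100):
        self.target = target            # acceptable standing delay (s)
        self.interval = interval        # window the delay must persist over (s)
        self.first_above_time = None    # deadline: now + interval at first excursion
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0

    def control_law(self, t):
        # Drops get closer together as count grows: interval / sqrt(count).
        return t + self.interval / math.sqrt(self.count)

    def should_drop(self, now, sojourn):
        """now and sojourn (time the packet spent queued) are in seconds."""
        if sojourn < self.target:
            # Queue delay is fine: forget the excursion and stop dropping.
            self.first_above_time = None
            self.dropping = False
            return False
        if self.first_above_time is None:
            # Start timing how long delay stays above target.
            self.first_above_time = now + self.interval
            return False
        if not self.dropping:
            if now >= self.first_above_time:
                # Above target for a whole interval: enter dropping state.
                self.dropping = True
                self.count = 1
                self.drop_next = self.control_law(now)
                return True
            return False
        if now >= self.drop_next:
            self.count += 1
            self.drop_next = self.control_law(self.drop_next)
            return True
        return False
```

Note that the decision keys off how long packets actually waited, not how many are queued, which is what lets it work unchanged across variable-rate links.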

PIE took the fundamental insights of codel and created something
simpler from a straight aqm standpoint, and it is easier to implement
in *some* hardware... but doesn't interact with a packet scheduler
anywhere near as well as fq_codel.
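By contrast, PIE recomputes a drop probability periodically from the estimated queue delay, then applies it to arriving packets. A rough sketch, assuming the proportional-integral update form and the alpha = 0.125, beta = 1.25 and 15 ms reference delay values given in the PIE specification (RFC 8033); burst allowance, auto-scaling of alpha and beta, and ECN marking are left out, and the names are hypothetical.

```python
import random

class PieSketch:
    """Simplified PIE drop-probability controller."""

    def __init__(self, target=0.015, alpha=0.125, beta=1.25):
        self.target = target      # reference queue delay (s)
        self.alpha = alpha        # weight on distance from target
        self.beta = beta          # weight on delay trend since last update
        self.prob = 0.0
        self.qdelay_old = 0.0

    def update(self, qdelay):
        """Run once per update period with the current queue delay (s)."""
        p = (self.prob
             + self.alpha * (qdelay - self.target)
             + self.beta * (qdelay - self.qdelay_old))
        self.prob = min(max(p, 0.0), 1.0)   # keep it a valid probability
        self.qdelay_old = qdelay
        return self.prob

    def should_drop(self, rng=random.random):
        """Consulted per arriving packet; rng is injectable for testing."""
        return rng() < self.prob
```

The update is meant to run on a timer (the specification uses 15 ms), while should_drop is consulted on each enqueue, which is part of why PIE maps more easily onto some hardware than a per-packet dequeue decision.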

I have been working ever since to help improve PIE so that those forms
of hardware can have some form of AQM, and so that the linux
implementation is accurate and can be directly compared to all the
other packet schedulers and AQM systems also in there.

And there we stand today. If you want more papers published, I have
hundreds of thousands of RRUL and related plots and other, harder to
express experiments, and will gladly co-author something or
participate in more experiments that seem sane in order to achieve
publication.

The ONLY paper I managed to get published was this one:

http://www.enst.fr/ drossi/paper/rossi13tma-b.pdf

and all we were doing was pointing out that, if we won as big as we
expected with some form of AQM or packet scheduling, we'd at a single
stroke obsolete the field of low-priority congestion control, with
some ideas towards making it work again...

Dario Rossi has since continued to analyze RED vs. LEDBAT...

anyway...

I would not mind at all if others started with wondershaper and worked
up. I am VERY big on repeatable results and repeatable experiments.
I'd love to see the presentation given at the IETF against SFQ, for
example. I'd like sources for the new DCTCP ECN results, too....