Re: [aqm] floating a draft charter

I understand packet scheduling is important. However, most network devices today treat these two issues separately: class-based scheduling/queueing and drop/mark algorithms. For class-based queueing, vendors differ in how many classes they support: 2, 4 or 8. For drop/mark algorithms, vendors tend to implement both tail-drop and WRED.

I think we can continue to treat these two issues separately, as has been done in practice. We can prove the value of delay-based AQM and we can also prove the value of FQ. The goal of AQM is to give early congestion notification before the buffer becomes full or the delay grows large. Scheduling algorithms determine bandwidth allocation and weighted fairness among different classes/flows. We don't have to mandate a particular scheduling algorithm with a particular AQM scheme. Vendors should be free to choose their best performance-cost tradeoff point. I think it would be challenging to force vendors to adopt a fixed set.
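
To make the separation concrete, here is a minimal sketch in Python, not any vendor's implementation. The names (RoundRobinScheduler, delay_policy, length_policy) and the 5 ms and 100-packet thresholds are assumptions chosen only for illustration. The scheduler decides which class is served next; the drop/mark policy is a plain function that can be swapped without touching the scheduler.

```python
from collections import deque

def delay_policy(sojourn_s, backlog, target_s=0.005):
    """Delay-based signal: True when the packet waited longer than
    target_s (5 ms is an assumed, illustrative value)."""
    return sojourn_s > target_s

def length_policy(sojourn_s, backlog, limit=100):
    """Length-based signal: True when the packets still queued in the
    class exceed limit (100 is an assumed, illustrative value)."""
    return backlog > limit

class RoundRobinScheduler:
    """Serves per-class queues in turn; knows nothing about how the
    drop/mark decision is made."""

    def __init__(self, num_classes, policy):
        self.queues = [deque() for _ in range(num_classes)]
        self.policy = policy      # any callable(sojourn_s, backlog) -> bool
        self._next = 0            # class to try first on the next dequeue

    def enqueue(self, cls, packet, now):
        # Remember the arrival time so a delay-based policy can use it.
        self.queues[cls].append((now, packet))

    def dequeue(self, now):
        """Return (packet, congestion_signal), or None if all queues are empty."""
        n = len(self.queues)
        for i in range(n):
            idx = (self._next + i) % n
            q = self.queues[idx]
            if q:
                enq_time, packet = q.popleft()
                self._next = (idx + 1) % n
                # Backlog is what remains in this class after the pop.
                return packet, self.policy(now - enq_time, len(q))
        return None
```

Swapping delay_policy for length_policy changes when congestion is signalled, but not how bandwidth is shared among classes, which is the sense in which the two choices can be made independently.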

Best Regards,

Rong

From: Jim Gettys <jg@freedesktop.org>
Date: Tuesday, June 4, 2013 8:47 AM
To: Wesley Eddy <wes@mti-systems.com>
Cc: Mirja Kuehlewind <mirja.kuehlewind@ikr.uni-stuttgart.de>, "aqm@ietf.org" <aqm@ietf.org>
Subject: Re: [aqm] floating a draft charter

There are a number of things I want to see:

1) packet scheduling is as important as drop/mark algorithms: it is the two together that is such a win with fq_codel.  So I, for one, would be *very* unhappy to see a charter that meant it was out of scope to discuss and document scheduling and flow queuing strategies.  And packet scheduling can interact with drop/mark algorithms.

Optimal scheduling depends on where you are in the network (e.g. a host, versus home router, vs. ISP head end, etc.); documenting what makes the most sense where would likely help both implementors of software and hardware and operators of those networks.  I've given some thought to the area; until we have running code it's hand waving, but it probably won't be hand waving in a year or two.

2) many mark/drop algorithms can co-exist (and generating informational RFCs to help customers write RFPs is worthwhile): but so far, there is no universal "one size fits all" mark/drop algorithm that is optimally effective everywhere.  Kathy and Van say a primary goal of CoDel was "first, do no harm", rather than "optimal under all circumstances".  The goal was an algorithm that never hurt you and would always be safe to have enabled. CoDel's constants were chosen so that CoDel will work at the edge of the network with planetary length paths in the face of highly variable bandwidth.

But CoDel inside a datacenter (unless you mess with its constants) is not effective in a timely fashion.

People have feared tuneable algorithms such as (W)RED can hurt them, so what we have had for AQM is sometimes not present at all, or not enabled (even in nodes that it would help greatly).  Note that one of Dave Taht's discoveries was that RED had been broken in the Linux source tree for 5 years and *no one had noticed*.

Maybe someday we'll reach nirvana with a drop/mark algorithm that "just works" well everywhere, all the time, and can never hurt, but that's a lofty goal left for research; in the meanwhile, helping people understand what to use where in the network and the tradeoffs is important.

But if CoDel (as we believe) or some other algorithm really is "first, do no harm" without tuning, it has great value to be on by default just to prevent the disaster of an operator not enabling any AQM at all; the world will never be worse than that, and will usually be radically better. Anything that *requires* tuning of knobs to work is not a candidate....

3) Some algorithms may be hard to retrofit into existing platforms.  For example, if you can't timestamp a packet, it's really hard to use CoDel, which is an algorithm that is easy to implement from scratch but could be really hard to retrofit into an existing hardware design. I don't want to discourage people from doing things that are a lot better than nothing as a stop-gap, but people need to understand that it may be a second-best solution.

Again, advice as to what/where algorithms may be applicable will help designers and operators both.  We face a 5-10 year transition here, even if we succeed this time.
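
To illustrate the timestamping point in 3): a minimal sketch in Python of the per-packet bookkeeping a delay-based algorithm depends on. This is not CoDel itself (it has no interval tracking or drop scheduling), and the names TimestampedQueue, sojourn_above_target and the 5 ms TARGET_S value are assumptions made for the example. The point is only that each packet must carry its enqueue time so that the time it spent in the queue can be measured at dequeue; a platform that cannot record that timestamp has nothing to compare against.

```python
from collections import deque

TARGET_S = 0.005  # assumed sojourn-time target (5 ms), for illustration only

class TimestampedQueue:
    """FIFO that stamps each packet on arrival and reports how long it
    waited (its sojourn time) when it leaves."""

    def __init__(self, clock):
        self._clock = clock   # callable returning the current time in seconds
        self._q = deque()

    def enqueue(self, packet):
        # The timestamp is the piece that is hard to retrofit in hardware.
        self._q.append((self._clock(), packet))

    def dequeue(self):
        """Return (packet, sojourn_seconds), or None if the queue is empty."""
        if not self._q:
            return None
        enq_time, packet = self._q.popleft()
        return packet, self._clock() - enq_time

def sojourn_above_target(sojourn_s, target_s=TARGET_S):
    """True when a packet waited longer than the target delay."""
    return sojourn_s > target_s
```

In a real system the clock would be something like time.monotonic; passing it in as a parameter simply makes the sketch easy to exercise with a fake clock.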

4) There are "features", or missing features, in today's hardware that can and do make life very hard for software. Changes could make life much easier.

For example, today's hardware makes it hard to do true head drop (which hardware assist could make trivial).  Some have noticed that at low bandwidth we may adjust CoDel parameters to work around the lack of head drop and the unaccounted-for time in the driver's ring buffers (which CoDel needs), since Linux's qdiscs can't see the time in the rings. So there is a set of recommendations to hardware and OS designers that would help the state of the network: this simple head drop feature, packet pacing, making GSO/TSO less evil, and so on, plus war stories from all the places we've been working on debloating Linux to guide other OS developers, etc.

This is really the other side of queue management: try to encourage traffic and systems that are less bursty and gentler to the network, so the AQM/scheduling algorithms don't have to work as hard.

Such things should go someplace in the IETF, and if not here, where?  Doing nothing anywhere is what I want to avoid.

Best regards,
    Jim

On Thu, May 30, 2013 at 12:13 PM, Wesley Eddy <wes@mti-systems.com> wrote:
On 5/30/2013 8:20 AM, Mirja Kuehlewind wrote:
>
> But my point is, we should not only standardize more and more AQM
> mechanisms, because that's a large research area which maybe even should be
> homed in the IRTF, but we need to actually find deployment strategies. Having
> a working group on AQM will already help to make people aware of the topic
> and maybe think about using AQM. But only standardizing more and more AQM
> queues might end up in investing a lot of work that no one will ever use.

I suggest appending something like this to the charter:

  Many AQM algorithms have been proposed in academic literature, but
  very few are widely implemented and deployed.  A goal of the working
  group is to produce recommendations that will actually be used, and
  algorithms that will actually be implemented, deployed in equipment,
  and enabled.  Towards these ends, the group actively encourages
  participation from operators and implementers, and will coordinate
  with the IETF OPS area and other relevant parts of the IETF and
  Internet community.  Wider research and evaluation of AQM mechanisms
  shall be coordinated with the IRTF/ICCRG, and significant
  participation in this WG from the academic and research community is
  highly desirable, when it is directly relevant to implementation and
  deployment.

We will definitely need to get engagement from some of the operators
that participate in the IETF, and should solicit participation more
widely outside as well (e.g. advertise to NANOG list, bufferbloat
list, etc.) in order to attract operators and implementers that aren't
already aware of this activity.

Would this help to address your concern?

--
Wes Eddy
MTI Systems
_______________________________________________
aqm mailing list
aqm@ietf.org
https://www.ietf.org/mailman/listinfo/aqm