Re: [aqm] CoDel's control law that determines drop frequency

Jeff Weeks <> Tue, 03 November 2015 16:22 UTC

Return-Path: <>
Received: from localhost ( []) by (Postfix) with ESMTP id 566691A8732 for <>; Tue, 3 Nov 2015 08:22:15 -0800 (PST)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -1.253
X-Spam-Status: No, score=-1.253 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, FB_ROLLER_IS_T=1.357, RCVD_IN_DNSWL_LOW=-0.7, T_RP_MATCHES_RCVD=-0.01] autolearn=no
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id rVT1sBZLomJL for <>; Tue, 3 Nov 2015 08:22:12 -0800 (PST)
Received: from ( []) by (Postfix) with ESMTP id 1B8121A6F6F for <>; Tue, 3 Nov 2015 08:22:12 -0800 (PST)
Received: from ([fe80::ac6b:cc1e:f2ff:93aa]) by ([::1]) with mapi id 14.03.0195.001; Tue, 3 Nov 2015 11:22:13 -0500
From: Jeff Weeks <>
To: Andrew Mcgregor <>, Dave Dolson <>
Thread-Topic: [aqm] CoDel's control law that determines drop frequency
Thread-Index: AQHQoVgaU5atdTCgY0uHb6M/BmhvL55NDzAAgAgi1ACAAIiFgIAAAp0AgABFbgCAIwhnAIAE4uWAgA1qD4s=
Date: Tue, 3 Nov 2015 16:22:11 +0000
Message-ID: <>
References: <> <> <> <> <> <> <> <>, <>
In-Reply-To: <>
Accept-Language: en-US
Content-Language: en-US
x-originating-ip: []
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Archived-At: <>
Cc: Kathleen Nichols <>, Bob Briscoe <>, Dave Taht <>, Van Jacobson <>, AQM IETF list <>
Subject: Re: [aqm] CoDel's control law that determines drop frequency
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "Discussion list for active queue management and flow isolation." <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Tue, 03 Nov 2015 16:22:15 -0000

The drop rate is affected by sojourn time, yes, but a 2x sojourn time goes through the same incremental reduction of interval size, as does a sojourn time of x.

In investigating codel, I've setup various worst case scenarios, and I feel like the algorithm could be made better by having its response time more dependent upon how far away from the target latency it is.

For example, consider a disabled interface, with a large (or even conceptually infinite queue), that's attached to a fairly small shaper.

The interface is then enabled, and immediately starts seeing 100Mbps, and tries to shape it to 1Mbps.

The queue will obviously build up quickly, and codel will notice this, and enter the drop state.  But it will start at count = 1.

If the interface is receiving 64b udp packets, then it'll be receiving 100,000,000/512 == 195,312 packets per second, and only transmitting 1953 packets per second.

The default target is 5ms, which is about 10 packets.  So of those 195,312 packets/second, we should ideally be dropping 195,312 - (1953 + 10) == 192,349 packets/second.

But in order to drop that many packets, 'count' needs to ramp up to the point where the drop interval is consistently 5,198 ns.

I believe that means 'count' has to reach some nearly impossibly high value of (100ms/5198ns)^2 == 370,107,128

I say nearly impossible, because it will take minutes (hours?) to get that high (if my math is correct, it'll take over 17 seconds just to reach 7500).

In the meantime, the queue *isn't* being effectively managed, as packets with extremely high latencies will be transmitted for far too long.

Of course, as I stated earlier, simply increasing count more quickly, based on how far away we are from the target latency effectively invalidates the optimization which most (all?) codel implementations use (namely the newton step integer-only sqrt approximation) as, at some point, the approximation starts *diverging* from the appropriate value.

One alternative which I've been investigating is the possibility of skewing the precalculated 1/sqrt(count) value.

If this is kept as a 32-bit all decimal fixed point number, then performing the multiplication by intentionally miss-shifting will result in doubling sqrt(count):

eg, take the following accurate calculation of next interval:

    codel->next_interval_start_ticks = base_time + ((interval * codel->one_over_sqrt_count) >> 32)

And intentionally miss-shift by 1 bit:

    codel->next_interval_start_ticks = base_time + ((interval * codel->one_over_sqrt_count) >> 33)

Will effectively have the interval reduce twice as fast.

Alternatively, (and similarly to how CAKE halves the count while re-entering the drop interval), count can periodically be doubled, if the current value is seen to not be adequately affecting traffic, and the pre-calculated 1/sqrt(count) can then be divided by sqrt(2) (i.e., do not rely on the newton step approximation for this modification of count).


Sandvine Incorporated
From: aqm [] on behalf of Andrew Mcgregor []
Sent: Sunday, October 25, 2015 6:44 PM
To: Dave Dolson
Cc: Kathleen Nichols; Bob Briscoe; Dave Taht; Van Jacobson; AQM IETF list
Subject: Re: [aqm] CoDel's control law that determines drop frequency

CoDel does have the form of a controller; drop rate (not probability) is a function of sojourn time (not queue size) and history, encoded in the state variables.

Now, I don't take it as proven that the particular form of the controller is the best we could do, but making it a rate and based on sojourn time are clear wins. Yes, you can use size as a proxy for sojourn time if your link really has a constant bit rate, but not even ethernet is exactly CBR in practice (and in some hardware situations, knowing the size is much more expensive than measuring sojourn; the opposite can also apply).  Yes, you can use probability as a proxy for rate if you are doing a hardware implementation or otherwise doing the rate the way CoDel does is hard.

PIE is closely related; the controller is the biggest difference, and it uses both those proxies because it was aimed at a situation where there was hardware support for one set of choices.

On 23 October 2015 at 07:07, Dave Dolson <<>> wrote:
Catching up...

Bob said:
> I meant to say in my mail to Andrew that the rate at which the control
> law increases the dropping frequency ought to depend on how far the
> queue is from target, or even how quickly it is diverging from target.
> I've always said that this control law should not be just a hard-coded
> thing that has no dependency on how the queue is moving.

In other words, make it a PID controller, not just a PI controller?
(Or is it currently just "I" ?)
With consideration of the derivative term (how quickly the queue
size is moving away from the target), a faster deviation from target
queue size would cause a faster response.

I find it unfortunate that neither CODEL nor PIE algorithms are formulated as a
controller. It's difficult to reason about.
(They *are* controllers, just not explained that way.)
IMHO, an explanation should begin with a linear controller,
with drop probability as a function of queue size,
and then explanations of why non-linear modifications were necessary.

-----Original Message-----
From: aqm [<>] On Behalf Of Bob Briscoe
Sent: Wednesday, September 30, 2015 9:09 AM
To: Dave Taht; Andrew Mcgregor
Cc: Kathleen Nichols; AQM IETF list; Van Jacobson
Subject: Re: [aqm] CoDel's control law that determines drop frequency


It says the point at which cake enters drop state depends on how quickly
the queue is growing.
But I don't see anything on the page you referenced about a better
curve. I looked under AQM and under other possibly relevant headings.

I meant to say in my mail to Andrew that the rate at which the control
law increases the dropping frequency ought to depend on how far the
queue is from target, or even how quickly it is diverging from target.
I've always said that this control law should not be just a hard-coded
thing that has no dependency on how the queue is moving.

Also below I've restated a couple of points I made back in 2013 when I
first posted this.

On 12/11/13 22:30, Bob Briscoe wrote:
> 5/ My analysis also shows that the rate of increase in drop probablity
> is inversely proportional to the link rate. I don't believe this is
> desirable, as it will require tuning for the expected number of flows
> on the link. However it is probably OK for variable rate links.
> 6/ The rate of increase of drop probability is also inversely
> proportional to the square of 'interval'. I suggest a new constant is
> defined for the control law, rather than relating the control law to
> interval in this way. Otherwise, when anyone tunes interval, they will
> get undesirable changes to the control law (e.g. reducing interval 10x
> [to 10ms] would make the control law 100x more sluggish).

This last point is particularly important. If the control law is still
to be hard-coded, it should relate to another variable RTT_ave, not
interval. TO decide when to enter dropping state, interval should be set
to the likely worst-case RTT (in the low hundreds of ms). But, If
hard-coded, the dynamics of the control law will be correct most often
if it depends on the /most likely/ RTT, which these days with CDNs is
perhaps 10-25ms (depending on population density in a country).

There's another reason for having two separate variables, not just
'interval' for both. I've said the following before somewhere as well:
For ECN marking, 'interval' can be set much lower (or even zero) because
being on a hair-trigger with ECN marks doesn't matter, whereas it does
for drops.{Note 1} Then CoDel will signal ECN without the sluggish delay
it has to have before it starts dropping. However, once it has entered
dropping state, the ECN control law still needs to be based on the same
RTT_ave as for drop, because the control law is trying to mimic the
dynamics of the congestion control, which RFC3168 says should be the
same for ECN as for drop.{Note 2}


{Note 1}: The faster dynamics won't affect the long-term stable rate
between an ECN and a non-ECN flow, because by definition long-term
stable means the dynamics have settled.
{Note 2}: As suggested in draft-briscoe-aqm-dualq-coupled-00, if the
congestion control were radically different, the packets would probably
need another identifier to distinguish them from 'classic' ECN.

On 30/09/15 10:00, Dave Taht wrote:
> On Wed, Sep 30, 2015 at 1:50 AM, Bob Briscoe <<>> wrote:
>> Andrew,
>> I am also not so interested in an AQM dealing directly with unresponsive
>> traffic - I prefer to keep policing and AQM as separately deployable
>> functions, because AQM should be policy-neutral, whereas policing inherently
>> involves policy.
>> My concern was merely that CoDel's linear increase in drop probability can
>> take a long time to reach where it intends to get to. I would have thought
>> some form of exponential increase, or at least super-linear, would have been
>> more responsive to changing traffic conditions. I.e., rather than have to
>> answer the question "how quickly should drop probability increase?", make it
>> increase increasingly quickly.
>> Early on, Rong Pan showed that it takes CoDel ages to bring high load under
>> control. I think this linear increase is the reason.
> cake uses a better curve for codel, but we still need to do more
> testing in the lab.
>> Bob
>> On 30/09/15 01:42, Andrew Mcgregor wrote:
>> Hmm, that's really interesting.
>> Most interesting is that my understanding is that the control law was
>> intended to deal with aggregates of mostly TCP-like traffic, and that an
>> overload of unresponsive traffic wasn't much of a goal; this seems like
>> vaguely reasonable behaviour, I suppose, given that pathological situation.
>> But I don't have a way to derive the control law from first principles at
>> this time (I haven't been working on that for a long time now).
>> On 25 September 2015 at 06:27, Bob Briscoe <<>> wrote:
>>> Toke,
>>> Having originally whinged that no-one ever responded to my original 2013
>>> posting, now it's my turn to be embarrassed for having missed your
>>> interesting response for over 3 months.
>>> Cool that the analysis proves correct in practice - always nice.
>>> The question is still open whether this was the intention, and if so why
>>> this particular control law was intended.
>>> I would rather we started from a statement of what the control law ought
>>> to do, then derive it.
>>> Andrew McGregor said he would have a go at this question some time ago...
>>> Andrew?
>>> Bob
>>> On 07/06/15 20:27, Toke Høiland-Jørgensen wrote:
>>> Hi Bob
>>> Apologies for reviving this ancient thread; been meaning to get around
>>> to it sooner, but well... better late than never I suppose.
>>> (Web link to your original mail, in case Message-ID referencing breaks:
>>> ).
>>> Having recently had a need to understand CoDel's behaviour in more
>>> detail, your analysis popped out of wherever it's been hiding in the
>>> back of my mind and presented itself as maybe a good place to start. :)
>>> So anyhow, I'm going to skip the initial assertions in your email and
>>> focus on the analysis:
>>> Here's my working (pls check it - I may have made mistakes)
>>> _________________
>>> For brevity, I'll define some briefer variable names:
>>>          interval =      I [s]
>>>          next_drop =     D [s]
>>>          packet-rate =   R [pkt/s]
>>>          count =         n [pkt]
>>> >From the CoDel control law code:
>>>          D(n) = I / sqrt(n)
>>> And the instantaneous drop probability is:
>>>          p(n) = 1/( R * D(n) )
>>> Then the slope of the rise in drop probability with time is:
>>>          Delta p / Delta t       = [p(n+1) - p(n)] / D(n)
>>>                                  = [1/D(n+1) - 1/D(n)] / [ R * D(n) ]
>>>                                  = sqrt(n) * [sqrt(n+1) - sqrt(n)] /
>>> [R*I*I]
>>>                                  = [ sqrt(n(n+1)) - n ] / R*I^2
>>> I couldn't find anything wrong with the derivation. I'm not entirely
>>> sure that I think it makes sense to speak about an "instantaneous drop
>>> probability" for an algorithm that is not probabilistic in nature.
>>> However, interpreting p(n) as "the fraction of packets dropped over the
>>> interval from D(n) to D(n+1)" makes sense, I guess, and for this
>>> analysis that works.
>>> At count = 1, the numerator starts at sqrt(2)-1 = 0.414.
>>> Amd as n increases, it rapidly tends to 1/2.
>>> So CoDel's rate of increase of drop probability with time is nearly
>>> constant (it
>>> is always between 0.414 and 0.5) and it rapidly approaches 0.5 after a few
>>> drops, tending towards:
>>>          dp/dt = 1/(2*R*I^2)
>>> This constant increase clearly has very little to do with the square-root
>>> law of
>>> TCP Reno.
>>> In the above formula, drop probability increases inversely proportional to
>>> the
>>> packet rate. For instance, with I = 100ms and 1500B packets
>>> at 10Mb/s =>    R = 833 pkt/s =>        dp/dt = 6.0% /s
>>> at 100Mb/s =>   R = 8333 pkt/s =>       dp/dt = 0.6% /s
>>> I also tried to test this. I configured CoDel (on a Linux 4.0 box) on
>>> 1Mbps, 2Mbps and 10Mbps links with interval settings of 1 second and
>>> 500ms, and a total packet limit of 100k packets. This was to make it
>>> deliberately slower to react (so the change in drop probability is more
>>> visible), and to make sure no packets are dropped from queue overflow.
>>> I then sent an unresponsive UDP stream over the link at 110% of the link
>>> capacity (as passed to Iperf, so approximately), and collected the
>>> output of `tc -s qdisc` every 0.2 seconds.
>>> The attached plot is of 'pkts dropped / (pkts sent + pkts dropped)' in a
>>> 2-second sliding window over the duration of the test (the plot is also
>>> available here:
>>> ).
>>> I've included linear trend lines from the initial time to the point of
>>> maximum drop probability, and as is apparent from the plot, got quite a
>>> good fit (r>0.99 for all six data sets). The legend includes the slopes
>>> of the linear fits for each of the data sets, which are not too far from
>>> what your analysis predicts (and I'm guessing the difference can be
>>> attributed to the difference in exact packet rates, but I haven't
>>> checked).
>>> The Flent data files with the qdisc stats over time (readable by the
>>> newest git version of Flent), as well as the Python script I used to
>>> create the graph are available here:
>>> So, in short: It seems that CoDel's "drop rate" does increase linearly
>>> in the presence of a persistent queue, and that the rate of increase
>>> depends on both the interval and the link rate.
>>> Now, I'll refrain from commenting on whether or not this is "bad", or
>>> indeed if it is contrary to design. It was surprising to me at least, so
>>> I thought I'd share my findings, in the hope that someone would either
>>> find them useful or tell me how they're wrong (or both!). :)
>>> -Toke
>>> _______________________________________________
>>> aqm mailing list
>>> --
>>> ________________________________________________________________
>>> Bob Briscoe                     
>>> _______________________________________________
>>> aqm mailing list
>> --
>> Andrew McGregor | SRE |<> | +61 4 1071 2221<tel:%2B61%204%201071%202221>
>> --
>> ________________________________________________________________
>> Bob Briscoe                     
>> _______________________________________________
>> aqm mailing list

Bob Briscoe                     

aqm mailing list<>

Andrew McGregor | SRE |<> | +61 4 1071 2221