Re: [aqm] CoDel's control law that determines drop frequency

Bob Briscoe <> Wed, 30 September 2015 13:08 UTC

Return-Path: <>
Received: from localhost ( []) by (Postfix) with ESMTP id 16EA91A7013 for <>; Wed, 30 Sep 2015 06:08:38 -0700 (PDT)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -2.601
X-Spam-Status: No, score=-2.601 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001] autolearn=ham
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id 8r5jkkDvNCK5 for <>; Wed, 30 Sep 2015 06:08:34 -0700 (PDT)
Received: from ( []) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by (Postfix) with ESMTPS id 56FBD1A7011 for <>; Wed, 30 Sep 2015 06:08:34 -0700 (PDT)
Received: from ([]:49255 helo=[]) by with esmtpsa (TLSv1.2:DHE-RSA-AES128-SHA:128) (Exim 4.85) (envelope-from <>) id 1ZhH7Y-0006el-MS; Wed, 30 Sep 2015 14:08:32 +0100
To: Dave Taht <>, Andrew Mcgregor <>
References: <> <> <> <> <> <>
From: Bob Briscoe <>
Message-ID: <>
Date: Wed, 30 Sep 2015 14:08:32 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0
MIME-Version: 1.0
In-Reply-To: <>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname -
X-AntiAbuse: Original Domain -
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain -
X-Get-Message-Sender-Via: authenticated_id:
Archived-At: <>
Cc: Kathleen Nichols <>, AQM IETF list <>, Van Jacobson <>
Subject: Re: [aqm] CoDel's control law that determines drop frequency
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "Discussion list for active queue management and flow isolation." <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Wed, 30 Sep 2015 13:08:38 -0000


It says the point at which cake enters drop state depends on how quickly 
the queue is growing.
But I don't see anything on the page you referenced about a better 
curve. I looked under AQM and under other possibly relevant headings.

I meant to say in my mail to Andrew that the rate at which the control 
law increases the dropping frequency ought to depend on how far the 
queue is from target, or even how quickly it is diverging from target. 
I've always said that this control law should not be just a hard-coded 
thing that has no dependency on how the queue is moving.

Also below I've restated a couple of points I made back in 2013 when I 
first posted this.

On 12/11/13 22:30, Bob Briscoe wrote:
> 5/ My analysis also shows that the rate of increase in drop probablity 
> is inversely proportional to the link rate. I don't believe this is 
> desirable, as it will require tuning for the expected number of flows 
> on the link. However it is probably OK for variable rate links.
> 6/ The rate of increase of drop probability is also inversely 
> proportional to the square of 'interval'. I suggest a new constant is 
> defined for the control law, rather than relating the control law to 
> interval in this way. Otherwise, when anyone tunes interval, they will 
> get undesirable changes to the control law (e.g. reducing interval 10x 
> [to 10ms] would make the control law 100x more sluggish).

This last point is particularly important. If the control law is still 
to be hard-coded, it should relate to another variable RTT_ave, not 
interval. TO decide when to enter dropping state, interval should be set 
to the likely worst-case RTT (in the low hundreds of ms). But, If 
hard-coded, the dynamics of the control law will be correct most often 
if it depends on the /most likely/ RTT, which these days with CDNs is 
perhaps 10-25ms (depending on population density in a country).

There's another reason for having two separate variables, not just 
'interval' for both. I've said the following before somewhere as well: 
For ECN marking, 'interval' can be set much lower (or even zero) because 
being on a hair-trigger with ECN marks doesn't matter, whereas it does 
for drops.{Note 1} Then CoDel will signal ECN without the sluggish delay 
it has to have before it starts dropping. However, once it has entered 
dropping state, the ECN control law still needs to be based on the same 
RTT_ave as for drop, because the control law is trying to mimic the 
dynamics of the congestion control, which RFC3168 says should be the 
same for ECN as for drop.{Note 2}


{Note 1}: The faster dynamics won't affect the long-term stable rate 
between an ECN and a non-ECN flow, because by definition long-term 
stable means the dynamics have settled.
{Note 2}: As suggested in draft-briscoe-aqm-dualq-coupled-00, if the 
congestion control were radically different, the packets would probably 
need another identifier to distinguish them from 'classic' ECN.

On 30/09/15 10:00, Dave Taht wrote:
> On Wed, Sep 30, 2015 at 1:50 AM, Bob Briscoe <> wrote:
>> Andrew,
>> I am also not so interested in an AQM dealing directly with unresponsive
>> traffic - I prefer to keep policing and AQM as separately deployable
>> functions, because AQM should be policy-neutral, whereas policing inherently
>> involves policy.
>> My concern was merely that CoDel's linear increase in drop probability can
>> take a long time to reach where it intends to get to. I would have thought
>> some form of exponential increase, or at least super-linear, would have been
>> more responsive to changing traffic conditions. I.e., rather than have to
>> answer the question "how quickly should drop probability increase?", make it
>> increase increasingly quickly.
>> Early on, Rong Pan showed that it takes CoDel ages to bring high load under
>> control. I think this linear increase is the reason.
> cake uses a better curve for codel, but we still need to do more
> testing in the lab.
>> Bob
>> On 30/09/15 01:42, Andrew Mcgregor wrote:
>> Hmm, that's really interesting.
>> Most interesting is that my understanding is that the control law was
>> intended to deal with aggregates of mostly TCP-like traffic, and that an
>> overload of unresponsive traffic wasn't much of a goal; this seems like
>> vaguely reasonable behaviour, I suppose, given that pathological situation.
>> But I don't have a way to derive the control law from first principles at
>> this time (I haven't been working on that for a long time now).
>> On 25 September 2015 at 06:27, Bob Briscoe <> wrote:
>>> Toke,
>>> Having originally whinged that no-one ever responded to my original 2013
>>> posting, now it's my turn to be embarrassed for having missed your
>>> interesting response for over 3 months.
>>> Cool that the analysis proves correct in practice - always nice.
>>> The question is still open whether this was the intention, and if so why
>>> this particular control law was intended.
>>> I would rather we started from a statement of what the control law ought
>>> to do, then derive it.
>>> Andrew McGregor said he would have a go at this question some time ago...
>>> Andrew?
>>> Bob
>>> On 07/06/15 20:27, Toke Høiland-Jørgensen wrote:
>>> Hi Bob
>>> Apologies for reviving this ancient thread; been meaning to get around
>>> to it sooner, but well... better late than never I suppose.
>>> (Web link to your original mail, in case Message-ID referencing breaks:
>>> ).
>>> Having recently had a need to understand CoDel's behaviour in more
>>> detail, your analysis popped out of wherever it's been hiding in the
>>> back of my mind and presented itself as maybe a good place to start. :)
>>> So anyhow, I'm going to skip the initial assertions in your email and
>>> focus on the analysis:
>>> Here's my working (pls check it - I may have made mistakes)
>>> _________________
>>> For brevity, I'll define some briefer variable names:
>>>          interval =      I [s]
>>>          next_drop =     D [s]
>>>          packet-rate =   R [pkt/s]
>>>          count =         n [pkt]
>>> >From the CoDel control law code:
>>>          D(n) = I / sqrt(n)
>>> And the instantaneous drop probability is:
>>>          p(n) = 1/( R * D(n) )
>>> Then the slope of the rise in drop probability with time is:
>>>          Delta p / Delta t       = [p(n+1) - p(n)] / D(n)
>>>                                  = [1/D(n+1) - 1/D(n)] / [ R * D(n) ]
>>>                                  = sqrt(n) * [sqrt(n+1) - sqrt(n)] /
>>> [R*I*I]
>>>                                  = [ sqrt(n(n+1)) - n ] / R*I^2
>>> I couldn't find anything wrong with the derivation. I'm not entirely
>>> sure that I think it makes sense to speak about an "instantaneous drop
>>> probability" for an algorithm that is not probabilistic in nature.
>>> However, interpreting p(n) as "the fraction of packets dropped over the
>>> interval from D(n) to D(n+1)" makes sense, I guess, and for this
>>> analysis that works.
>>> At count = 1, the numerator starts at sqrt(2)-1 = 0.414.
>>> Amd as n increases, it rapidly tends to 1/2.
>>> So CoDel's rate of increase of drop probability with time is nearly
>>> constant (it
>>> is always between 0.414 and 0.5) and it rapidly approaches 0.5 after a few
>>> drops, tending towards:
>>>          dp/dt = 1/(2*R*I^2)
>>> This constant increase clearly has very little to do with the square-root
>>> law of
>>> TCP Reno.
>>> In the above formula, drop probability increases inversely proportional to
>>> the
>>> packet rate. For instance, with I = 100ms and 1500B packets
>>> at 10Mb/s =>    R = 833 pkt/s =>        dp/dt = 6.0% /s
>>> at 100Mb/s =>   R = 8333 pkt/s =>       dp/dt = 0.6% /s
>>> I also tried to test this. I configured CoDel (on a Linux 4.0 box) on
>>> 1Mbps, 2Mbps and 10Mbps links with interval settings of 1 second and
>>> 500ms, and a total packet limit of 100k packets. This was to make it
>>> deliberately slower to react (so the change in drop probability is more
>>> visible), and to make sure no packets are dropped from queue overflow.
>>> I then sent an unresponsive UDP stream over the link at 110% of the link
>>> capacity (as passed to Iperf, so approximately), and collected the
>>> output of `tc -s qdisc` every 0.2 seconds.
>>> The attached plot is of 'pkts dropped / (pkts sent + pkts dropped)' in a
>>> 2-second sliding window over the duration of the test (the plot is also
>>> available here:
>>> ).
>>> I've included linear trend lines from the initial time to the point of
>>> maximum drop probability, and as is apparent from the plot, got quite a
>>> good fit (r>0.99 for all six data sets). The legend includes the slopes
>>> of the linear fits for each of the data sets, which are not too far from
>>> what your analysis predicts (and I'm guessing the difference can be
>>> attributed to the difference in exact packet rates, but I haven't
>>> checked).
>>> The Flent data files with the qdisc stats over time (readable by the
>>> newest git version of Flent), as well as the Python script I used to
>>> create the graph are available here:
>>> So, in short: It seems that CoDel's "drop rate" does increase linearly
>>> in the presence of a persistent queue, and that the rate of increase
>>> depends on both the interval and the link rate.
>>> Now, I'll refrain from commenting on whether or not this is "bad", or
>>> indeed if it is contrary to design. It was surprising to me at least, so
>>> I thought I'd share my findings, in the hope that someone would either
>>> find them useful or tell me how they're wrong (or both!). :)
>>> -Toke
>>> _______________________________________________
>>> aqm mailing list
>>> --
>>> ________________________________________________________________
>>> Bob Briscoe                     
>>> _______________________________________________
>>> aqm mailing list
>> --
>> Andrew McGregor | SRE | | +61 4 1071 2221
>> --
>> ________________________________________________________________
>> Bob Briscoe                     
>> _______________________________________________
>> aqm mailing list

Bob Briscoe