Re: [aqm] CoDel on high-speed links

"Agarwal, Anil" <Anil.Agarwal@viasat.com> Wed, 10 June 2015 20:58 UTC

From: "Agarwal, Anil" <Anil.Agarwal@viasat.com>
To: Steven Blake <slblake@petri-meat.com>, Dave Taht <dave.taht@gmail.com>
Thread-Topic: [aqm] CoDel on high-speed links
Date: Wed, 10 Jun 2015 20:58:15 +0000
Message-ID: <7A2801D5E40DD64A85E38DF22117852C70AB71E5@wdc1exchmbxp05.hq.corp.viasat.com>
References: <1433866312.4324.29.camel@petri-meat.com> <CAA93jw6eypSDqcBwaaNfXNuMjx0bBE2JbsOu_oU5H2Bpdw1wqA@mail.gmail.com> <1433882422.4324.87.camel@petri-meat.com>
In-Reply-To: <1433882422.4324.87.camel@petri-meat.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/aqm/yBCRhAFCWInBbaxjTz18LMTpK_Q>
Cc: "aqm@ietf.org" <aqm@ietf.org>
Subject: Re: [aqm] CoDel on high-speed links

To follow up on Steve's posting - here is a simple (simplistic) analysis to estimate the time taken
by CoDel to converge to the right value of the interval I. Perhaps a similar analysis has been done
elsewhere.
 
Analysis of single-queue CoDel for a large number of TCP connections.

Given:
  Link Capacity = C bps
  Number of active TCP connections = n
  Round Trip Time = rtt seconds
  Packet size = PS bytes
  CoDel initial interval = I0 seconds

Assume:
  All TCP connections are long-lived.

  At steady state, for every TCP connection, cwnd increases linearly from W/2 to W
  by one packet per RTT, then drops back to W/2 when a packet is dropped or
  ECN-marked.

Derive:
  bps_per_conn = C / n
  Conn max window size = W = bps_per_conn * rtt / (8 * PS) (packets)
    (the factor of 8 converts PS from bytes to bits)
  Ideal drop interval per conn = DI1 = (W / 2) * rtt (seconds)
  Ideal drop rate per conn = DR1 = 1 / DI1 (packets/sec)
  Aggregate ideal drop rate = DR = DR1 * n (packets/sec)
  Ideal CoDel interval = II = 1 / DR (seconds)

  CoDel packet drop rate increase per sec = DRI =~ 0.5 / I0^2
    This follows from the equation I(n) = I0 / sqrt(n), where I(n) is the
    interval size after n consecutive congested intervals: summing the
    inter-drop intervals gives t(n) =~ 2 * I0 * sqrt(n), so the drop rate
    sqrt(n) / I0 =~ t / (2 * I0^2) grows linearly with time.
    Under sustained congestion, the packet drop rate therefore increases
    by DRI packets/sec, every second.
    DRI does not depend on C, n, or rtt!

  CoDel time for the interval to converge to II = T = DR / DRI (seconds)

  or, expanding DR and DRI (again with PS in bytes):
  CoDel time for the interval to converge to II = T = 32 * PS / C * (n * I0 / rtt) ^ 2 (seconds)
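
  For concreteness, here is a small Python sketch of the arithmetic above
  (my own illustration of the derivation, not CoDel code; C in bps, rtt and
  I0 in seconds, PS in bytes, hence the factor of 8 to convert to bits):

    def codel_convergence_time(C, n, rtt, PS, I0):
        bps_per_conn = C / n
        W = bps_per_conn * rtt / (8 * PS)  # max cwnd per conn, packets
        DI1 = (W / 2) * rtt                # ideal drop interval per conn, s
        DR1 = 1 / DI1                      # ideal drop rate per conn, drops/s
        DR = DR1 * n                       # aggregate ideal drop rate, drops/s
        II = 1 / DR                        # ideal CoDel interval, s
        DRI = 0.5 / I0 ** 2                # CoDel drop-rate ramp, drops/s per s
        T = DR / DRI                       # time to converge to II, s
        return W, DR, II, DRI, T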

Comments
  The time T taken by CoDel to reach the correct I value is:
    - inversely proportional to the link capacity C
    - proportional to the square of the number of active TCP connections n
    - proportional to the square of the initial interval value I0
    - inversely proportional to the square of rtt.

  If link capacity increases and the number of TCP connections increases
  correspondingly, then T will increase (T scales as n^2 / C).

  During time 0 to T, the queue will likely remain full and packets will
  experience tail-drops. Of course, one could argue that the number of
  connections n is likely to increase slowly, perhaps more slowly than the
  convergence time of I (?).

  In steady state, large sudden changes in n will likely result in tail-drops.

  This analysis is probably not applicable when rtt > I0 or if W is too small.

Numerical Examples
  1. C = 1 Gbps, n = 1000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
     DRI = 50 packets/sec/sec
     W = 8.33 packets
     Drop Rate DR = 2400 packets/sec
     T = 48 seconds

  2. C = 10 Gbps, n = 10000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
     DRI = 50 packets/sec/sec
     W = 8.33 packets
     Drop Rate DR = 24000 packets/sec
     T = 480 seconds

  3. C = 10 Gbps, n = 11000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
     DRI = 50 packets/sec/sec
     W = 7.58 packets
     Drop Rate DR = 29040 packets/sec
     T = 580 seconds
     (A 10% increase in the number of TCP connections increased T by 100 seconds.)
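
  Plugging the three examples into the sketch after the derivation above
  reproduces these numbers (codel_convergence_time is the illustrative
  helper defined there; T for example 3 rounds to 581):

    for C, n in [(1e9, 1000), (10e9, 10000), (10e9, 11000)]:
        W, DR, II, DRI, T = codel_convergence_time(C, n, rtt=0.1, PS=1500, I0=0.1)
        print(f"n={n}: W={W:.2f} pkts, DR={DR:.0f} drops/s, T={T:.0f} s")
    # n=1000:  W=8.33 pkts, DR=2400 drops/s,  T=48 s
    # n=10000: W=8.33 pkts, DR=24000 drops/s, T=480 s
    # n=11000: W=7.58 pkts, DR=29040 drops/s, T=581 s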

This does seem to indicate that for core and edge routers that handle a large number of connections,
if the number of connections changes by a large amount, then convergence of the CoDel interval
value will likely take a correspondingly large amount of time.

This is probably true for other AQM schemes as well - is there any published analysis on the subject? 

The example I have used may be a bit extreme, but I hope it illustrates the issue Steve is pointing to.

Regards,
Anil Agarwal
ViaSat Inc.

-----Original Message-----
From: aqm [mailto:aqm-bounces@ietf.org] On Behalf Of Steven Blake
Sent: Tuesday, June 09, 2015 4:40 PM
To: Dave Taht
Cc: aqm@ietf.org
Subject: Re: [aqm] CoDel on high-speed links

On Tue, 2015-06-09 at 12:44 -0700, Dave Taht wrote:

> The below makes several mis-characterisations of codel in the first 
> place, and then attempts to reason from there.

Hmmm...

> On Tue, Jun 9, 2015 at 9:11 AM, Steven Blake <slblake@petri-meat.com> wrote:
> > I have a question about how CoDel (as defined in
> > draft-ietf-aqm-codel-01) behaves on high-speed (e.g., >= 1 Gbps) links.
> > If this has been discussed before, please just point me in the right 
> > direction.
> >
> > In the text below, I'm using "drop" to mean either packet 
> > discard/ECN mark.  I'm using (instantaneous) "drop frequency" to 
> > mean the inverse of the interval between consecutive "drops" during 
> > a congestion epoch, measured in drops/sec.
> >
> > The control law for CoDel computes the next time to drop a packet, 
> > and is given as:
> >
> >   t + interval/sqrt(count)
> >
> > where t is the current time, interval is a value roughly 
> > proportional to maximum RTT (recommended 100 msec), and count is 
> > cumulative number of drops during a congestion epoch.
> 
> No. Count is just a variable to control the curve of the drop rate. It 
> is not constantly incremented, either, it goes up and down based on 
> how successful it is at controlling the flow(s), only incrementing 
> while latency exceeds the target, decrementing slightly after it stays 
> below the target. The time spent below the target is not accounted 
> for, so you might have a high "bang-bang" drop rate retained, when 
> something goes above from below.
> 
> This subtlety is something people consistently miss and something I 
> tried to elucidate in the first stanford talk.

I specifically mentioned "during a congestion epoch", but let me be more
precise: count is continuously incremented during an extended period where latency exceeds the target (perhaps because CoDel isn't yet dropping hard enough). Correct?

The fact that the drop frequency doesn't ramp down quickly when congestion is momentarily relieved is good, but doesn't help if it takes forever for the algorithm to ramp up to an effective drop frequency (i.e., something greater than 1 drop/flow/minute).

> 
> > It is not hard to see that drop
> > frequency increases with sqrt(count).  At the first drop, the
> > frequency is 10 drops/sec; after 100 drops it is 100 drops/sec; after
> > 1000 drops it is 316 drops/sec.
> >
> > On a 4 Mbps link serving say 1000 packets/sec (on average), CoDel 
> > immediately starts dropping 1% of packets and ramps up to ~10% after 
> > 100 drops (1.86 secs).
> 
> No it will wait 100ms after stuff first exceeds the target, then 
> progressively shoot harder based on the progress of the 
> interval/sqrt(count).

Ok.  At the first drop it is dropping at a rate of 1 packet/100 msec ==
10 drops/sec and ramps up from there.  At the 100th drop it is dropping at a rate of 100 msec/sqrt(100) == 1 packet/10 msec == 100 drops/sec.
This just so happens to occur after 1.8 secs.

Aside: as described, CoDel's drop frequency during a congestion epoch increases approximately linearly with time (at a rate of about 50
drops/sec^2 when interval = 100 msec).
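
One quick way to sanity-check this: simulate the drop schedule next_drop = t
+ interval/sqrt(count) under sustained congestion and bucket the drops by
second. This is a deliberately simplified sketch of the ramp (it ignores
CoDel's entry/exit logic and count decay):

  from math import sqrt

  interval = 0.1                   # seconds
  t, count = 0.0, 1
  drops_by_second = {}
  while t < 5.0:                   # 5 seconds of sustained congestion
      sec = int(t)
      drops_by_second[sec] = drops_by_second.get(sec, 0) + 1
      count += 1
      t += interval / sqrt(count)  # control law: time of the next drop
  # Successive per-second drop counts grow by ~50, i.e. the drop
  # frequency ramps at roughly 50 drops/sec^2, as stated above.
  print(drops_by_second)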

> secondly people have this tendency to measure full size packets, or a 
> 1k average packet. The reality is a dynamic range of 64 bytes to 64k 
> (gso/tso/gro offloads). So bytes is a far better proxy than packets in 
> order to think about this properly.
> 
> offloads of various sorts bulking up packet sizes has been a headache.
> I favor reducing mss on highly congested underbuffered links (and bob 
> favors sub-packet windows) to keep the "signal strength" up.
> 
> The original definition of packet (circa 1962) was 1000 bits, with up 
> to 8 fragments. I do wish the materials that were the foundation of 
> packet behavior were online somewhere...

I don't see how this has anything to do with the text of the draft or my questions.

> > This seems like a reasonable range.  On a 10 GE link serving 2.5
> > Mpps on average, CoDel would only drop 0.013% of packets after 1000
> > drops (which would occur after 6.18 secs).
> 
> I am allergic to averages as a statistic in the network measurement case.
> 
> > This doesn't seem
> > to be very effective.  It's possible to reduce interval to ramp up 
> > drop frequency more quickly, but that is counter-intuitive because 
> > interval should be roughly proportional to maximum RTT, which is 
> > link-speed independent.
> 
> Except that tcp's drop their rates by (typically) half on a drop, and 
> a matter of debate as to when on CE.

Ex/ 10 GE link, ~10K flows (average).  During a congestion epoch, CoDel with interval = 100 msec starts dropping 257 packets/sec after 5 secs.
How many flows is that effectively managing?
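
(Back of the envelope, assuming those drops are spread evenly across flows:

  drops_per_sec = 257    # aggregate CoDel drop rate after 5 secs, from above
  flows = 10_000         # approximate number of active flows
  print(flows / drops_per_sec)   # ~39 secs between congestion signals per flow

That is one congestion signal per flow roughly every 39 seconds.)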

> > Unless I am mistaken, it appears that the control law should be 
> > normalized in some way to average packet rate.  On a high-speed 
> > link, it might be common to drop multiple packets per-msec, so it 
> > also isn't clear to me whether the drop frequency needs to be 
> > recalculated on every drop, or whether it could be recalculated over 
> > a shorter interval (e.g.,
> > 5 msec).
> 
> Pie took the approach of sampling, setting a rate for shooting, over a 
> 16ms interval. That's pretty huge, but also low cost in some hardware.
> 
> Codel's timestamp per-packet control law is continuous (but you do 
> need to have a cheap packet timestamping ability).
> 
> Certainly in all cases more work is needed to address the problems 
> 100gps rates have in general, and it is not just all queue theory! A 
> small packet is .62 *ns* in that regime. A benefit of fq in this case 
> is that you can parallelize fib table lookups across multiple 
> processors/caches, and of fq_codel is that all codels operate 
> independently.

High-speed metro/core links need AQM, too.  I believe that
draft-ietf-aqm-codel-01 doesn't work for these links in the general case (e.g., large #flows, recommended value for interval, no FQ).  IMHO the draft should say something about this, or should adjust the algorithm accordingly.


Regards,

// Steve
