Re: [aqm] Question re draft-baker-aqm-recommendations recomendation #2

Bob Briscoe <bob.briscoe@bt.com> Fri, 12 July 2013 00:16 UTC

Date: Fri, 12 Jul 2013 01:15:47 +0100
To: Dave Taht <dave.taht@gmail.com>
From: Bob Briscoe <bob.briscoe@bt.com>
Cc: Wesley Eddy <wes@mti-systems.com>, "Fred Baker (fred)" <fred@cisco.com>, "aqm@ietf.org" <aqm@ietf.org>

Dave,

At 17:40 30/04/2013, Dave Taht wrote:
>On Tue, Apr 30, 2013 at 9:29 AM, Wesley Eddy <wes@mti-systems.com> wrote:
> > On 4/29/2013 10:00 AM, Fred Baker (fred) wrote:
> >> Do we generally agree with the recommendation of
> >> http://tools.ietf.org/html/draft-baker-aqm-recommendation-01#section-4.2?
> >> This is the question of signaling to an endpoint using both dropping and ECN.
> > The second part (imposing an upper bound) might be worth expanding a
> > bit.  I don't know what a reasonable upper bound is, for failing into
> > a tail-drop mode,
>
>No reason to specify "tail" drop here.

I agree with you here that it doesn't have to be tail-drop. But I 
would add that we need to record the lessons learned in the past, not 
just say "drop" and hope people understand.


1/ The RED algo was defined such that, at a configured length of the 
smoothed queue, it rose to 100% drop probability. When that was 
initially implemented in alt_q, it allowed an unresponsive flow to 
turn RED into a DoS amplifier, because 100% drop = 0% throughput. Of 
course, once the algo drops 100% of incoming traffic, the smoothed 
queue will shrink and the drop probability will fall, but this led to 
unstable behaviour.

In the alt_q RED code you can still see this 100% drop stuff that 
Kenjiro Cho commented out, and replaced with fall-back to tail drop.
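
To make the boundary condition concrete, here is a minimal sketch of a RED-style drop decision. It is not the alt_q code: the function and parameter names are hypothetical, and "fall-back to tail drop" is assumed to mean that above the upper threshold the AQM stops dropping and packets are lost only when the buffer is physically full.

```python
def red_drop_probability(avg_q, min_th, max_th, max_p):
    """Drop probability as a function of the smoothed queue length.

    Classic shape: zero below min_th, a linear ramp up to max_p at
    max_th, then a jump to 100% at or above max_th.
    """
    if avg_q < min_th:
        return 0.0
    if avg_q < max_th:
        return max_p * (avg_q - min_th) / (max_th - min_th)
    return 1.0  # the problematic region: 100% drop = 0% throughput


def should_drop(avg_q, queue_len, buffer_limit, min_th, max_th, max_p,
                u, fallback_to_tail_drop):
    """Decide whether to drop an arriving packet.

    u is a uniform random number in [0, 1) supplied by the caller.
    fallback_to_tail_drop is an illustrative switch (an assumption,
    not an alt_q option).
    """
    if queue_len >= buffer_limit:
        return True  # no room left: plain tail drop
    if avg_q >= max_th and fallback_to_tail_drop:
        return False  # above max_th, only a full buffer causes loss
    return u < red_drop_probability(avg_q, min_th, max_th, max_p)
```

With the switch off, every arrival is dropped while the smoothed queue stays at or above max_th, whatever the actual queue length; with it on, traffic keeps flowing until the buffer itself overflows.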

When I last looked at the CoDel code (which wasn't recently), if 
unresponsive traffic was keeping the queue above threshold, it used 
the drop_next_ variable to continually decrease the time between 
drops until it would eventually drop 100%. I figured it was likely 
the same instability would ensue. You may have fixed this since, 
given CoDel performed badly with unresponsive traffic in the early tests.

My point is that we need to record this advice, because everyone 
seems to start out making the same mistake - the boundary conditions 
of the algos get overlooked, and they are important for Internet stability.

So we should give examples of good and bad approaches for the 
boundary conditions. An example of a good simple strategy would be 
something very much like tail drop, but where the 'tail' is a fixed 
tight delay cap, not the tail of the buffer.
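
A minimal sketch of such a delay cap, assuming the queuing delay can be estimated as backlog divided by link rate. All names and numbers here are illustrative, not taken from any implementation.

```python
def admit_with_delay_cap(backlog_bytes, link_rate_bps, delay_cap_s):
    """Return True if an arriving packet should be enqueued.

    The 'tail' is a fixed bound on queuing delay, not the end of the
    buffer: the packet is refused once the time needed to drain the
    current backlog exceeds delay_cap_s.
    """
    queuing_delay_s = backlog_bytes * 8 / link_rate_bps
    return queuing_delay_s <= delay_cap_s


# Example: a 20 ms cap corresponds to 25,000 bytes at 10 Mb/s,
# but to 250,000 bytes at 100 Mb/s.
print(admit_with_delay_cap(12_500, 10e6, 0.020))   # 10 ms of backlog
print(admit_with_delay_cap(50_000, 10e6, 0.020))   # 40 ms of backlog
print(admit_with_delay_cap(50_000, 100e6, 0.020))  # 4 ms of backlog
```

Because the cap is expressed in time, the same setting holds across link rates, whereas a byte-based tail would have to be re-tuned for each rate.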



BTW #1, once in dropping mode, the CoDel code that I looked at 
initially took the square-root of the count of drops to decrease the 
time between drops. Ironically this leads to a /linear/ increase in 
drop frequency over /time/. So I'm sure there should be a much 
simpler way to code it without the square root, by driving drops from 
elapsed time not drop count.
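
The linear growth can be checked with a short sketch. It assumes the control law sets the gap before the next drop to interval / sqrt(count), where count is the number of drops so far. The drop rate is then dn/dt = sqrt(n) / interval, which integrates to sqrt(n) = t / (2 * interval), so the rate is approximately t / (2 * interval^2), i.e. linear in elapsed time t. The function names are hypothetical.

```python
import math


def codel_drop_times(interval_s, n_drops):
    """Times of successive drops, counted from the first drop at t = 0.

    After the k-th drop the next one is scheduled interval_s / sqrt(k)
    later (the count-driven square-root law).
    """
    times = [0.0]
    for count in range(1, n_drops):
        times.append(times[-1] + interval_s / math.sqrt(count))
    return times


def drop_rate_from_elapsed_time(elapsed_s, interval_s):
    """Approximate drop rate (drops/s) driven by elapsed time alone.

    Continuous approximation of the same law, with no square root.
    """
    return elapsed_s / (2.0 * interval_s ** 2)


interval = 0.1
n = 10_000
t_last = codel_drop_times(interval, n)[-1]
count_driven_rate = math.sqrt(n) / interval  # rate after n drops
time_driven_rate = drop_rate_from_elapsed_time(t_last, interval)
print(count_driven_rate, time_driven_rate)
```

The two rates agree to within about one percent for large counts; the small gap comes from approximating the sum of 1/sqrt(k) by an integral.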

BTW #2, in that same CoDel code, the dropping pattern once in the 
dropping mode did not depend on how large the queuing delay was 
getting - it was all pre-ordained how it would behave. Again, this 
may have since been fixed, but to me it seemed very wrong not to drop 
more frequently if the queuing delay was getting longer and longer.

When I looked at the original CoDel code, I just thought the word 
auto-tuning was being used as empty marketing BS. Its use of service 
time certainly auto-tuned the queue threshold to the link rate. 
However, in both its modes it didn't change how quickly it responded 
depending on how bad the problem had become.

As an output of this proposed AQM WG, I would like to see advice that 
says what auto-tuning means, not just the empty word "auto-tuning", 
i.e. not just using time as the unit of queuing, but also that an AQM 
should not take a hard-coded time to respond irrespective of how much 
the queuing delay has grown.


[Sorry for responding nearly 2 months late (I'm just catching up on 
stuff that arrived while I was tuned out from the AQM list, due to other work)]


Bob



________________________________________________________________
Bob Briscoe,                                                  BT