Re: [aqm] Draining queues

Jim Gettys <jg@freedesktop.org> Wed, 20 March 2013 16:25 UTC

In-Reply-To: <1363787198.319422762@apps.rackspace.com>
References: <mailman.5225.1363736608.3432.aqm@ietf.org> <1363748540.64155384@apps.rackspace.com> <20130320035645.GI9569@verdi> <1363787198.319422762@apps.rackspace.com>
Date: Wed, 20 Mar 2013 12:25:06 -0400
X-Google-Sender-Auth: FymHnPXxTiEu9KeC3W49xuxdigk
Message-ID: <CAGhGL2CDY2EYow=J3E7xDr_W_piMNwGJYuRf8pf17P0fs1TcZw@mail.gmail.com>
From: Jim Gettys <jg@freedesktop.org>
To: David P Reed <dpreed@reed.com>
Content-Type: multipart/alternative; boundary="e89a8ff1c7ea38147104d85daa73"
Cc: John Leslie <john@jlc.net>, "aqm@ietf.org" <aqm@ietf.org>
Subject: Re: [aqm] Draining queues

On Wed, Mar 20, 2013 at 9:46 AM, <dpreed@reed.com> wrote:

> John - I understand your response, and I understand the idea of doing ECN
> marking at the 1 msec. level.  That certainly would increase the amount of
> congestion information delivered to the endpoints.
>
>
>
> However, if you think about how long it will take for the receiver to
> reduce its "window" in typical cases, along with the "reluctance" of router
> designers to ever drop packets, I think this does not solve the problem.
>
>
>
> Yes, ECN might help a bit in the case where there are *large* numbers of
> stationary-process TCP flows traversing a node, leading to nearly full
> capacity.
>
>
>
> But one should focus on the actually difficult problems: e.g. what went
> wrong in DOCSIS 2.0 and early DOCSIS 3 deployments, still in the field,
> that made queueing delays close to 1 second typical from a single home.
>
>
>
> Because there are relatively small numbers of flows, *way* too much
> buffering, leading to sustained queues that could not drain fast enough
> under ECN or anything else, user experience *sucked*.
>
>
>
> Which is why Comcast tried "killing BitTorrent" with Sandvine DPI-based
> RST-injection, and started claiming it was to "stop music piracy, which was
> destroying the Internet".
>
>
>
> When Comcast was forced to actually look at why the queues were sustained
> and unfixable, they realized that the problem was that DOCSIS was
> *designed* to hold packets rather than drop them.  ECN marking would
> probably not have helped get the queues down under 50 msec, without
> dropping packets, neither would RED.  The only way to get the queue down is
> to prevent it from being created in the first place....  And since the RTT
> from US east coast to US west coast is under 45 msec, it does little good
> to send early congestion signals.
>
>
>
> I'm sure you know this.
>
>
>
> An analogous problem occurs today because of the 3-4 *second* buffering in
> HSPA and LTE networks under load.  Having measured it in many American and
> European deployments of such equipment, I would suggest that the
> fundamental problem is *dropping packets fairly and quickly*.  You can't
> get there (to sub-50 msec. queueing) with ECN or RED.
>

First off, everyone has made the same mistake: operating system writers,
DSL, and GPON, just to name a few.  Don't single out cable.  We've all
screwed up for a long time; in this sense, we're all "Bell Heads".

With flow queuing such as fq_codel or sfqcodel, the ECN equation changes:
other flows don't get held up the way they do when running a simple AQM
such as RED in a single queue.

And the web's behavior means that any straight AQM such as CoDel, RED or
PIE *cannot* get latency to where it needs to be: too many packets end up
in flight at once and get stuck in whatever single queue they hit.  We
*have* to do something to solve this problem, and flow queuing makes a huge
difference.  I've measured hundreds of milliseconds of transient latency
when hitting some web sites, as these packet flights hit the broadband
edge.  This becomes a non-issue with fq_codel.
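
A toy sketch of the effect described above, assuming equal-sized packets, a
link that sends one packet per time slot, and plain round-robin between
per-flow queues. This is not the fq_codel algorithm itself (no CoDel, no
DRR quantum, no hashing); the flow names and the 30-packet burst are made up
for illustration.

```python
from collections import OrderedDict, deque

def fifo_order(arrivals):
    """Single shared queue: packets leave in arrival order."""
    return list(arrivals)

def flow_queue_order(arrivals):
    """One queue per flow, served round-robin, one packet per turn."""
    queues = OrderedDict()
    for flow, seq in arrivals:
        queues.setdefault(flow, deque()).append((flow, seq))
    order = []
    while queues:
        # Iterate over a snapshot so emptied queues can be removed safely.
        for flow in list(queues):
            order.append(queues[flow].popleft())
            if not queues[flow]:
                del queues[flow]
    return order

def departure_slot(order, packet):
    """1-based transmission slot at which `packet` leaves the link."""
    return order.index(packet) + 1

# Hypothetical workload: a 30-packet web burst with one VoIP packet behind it.
arrivals = [("web", i) for i in range(30)] + [("voip", 0)]
fifo_slot = departure_slot(fifo_order(arrivals), ("voip", 0))
fq_slot = departure_slot(flow_queue_order(arrivals), ("voip", 0))
print("FIFO slot:", fifo_slot, "flow-queued slot:", fq_slot)
```

In the single queue the lone packet waits behind the whole burst; with
per-flow queues it waits behind only one packet of the burst.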

Even so, Dave Taht's experiments (IIRC) showed that at low bandwidth, drop
was preferable to ECN, as even a single full-sized packet takes
considerable time to transmit.
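
To put rough numbers on "considerable time", here is a minimal sketch of the
serialization delay of one packet. The 1500-byte packet size and the link
rates are assumptions chosen for illustration, not figures from the
experiments mentioned above.

```python
def serialization_delay_ms(packet_bytes, link_bits_per_second):
    """Time to clock one packet onto the wire, in milliseconds."""
    return packet_bytes * 8 * 1000 / link_bits_per_second

PACKET_BYTES = 1500  # assumed full-sized Ethernet payload

# Assumed example link rates, in bits per second.
for rate in (256_000, 1_000_000, 100_000_000):
    delay = serialization_delay_ms(PACKET_BYTES, rate)
    print(f"{rate / 1e6:g} Mbit/s: {delay:g} ms per packet")
```

At 1 Mbit/s each such packet occupies the link for 12 ms, so an ECN-marked
packet that is forwarded rather than dropped still costs every packet behind
it that much delay.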

Exactly how this is going to play out is really only something we can
figure out by experiment.

I also need to clarify what I mean by "Bell Heads": these people are *not*
restricted to the telecom industry; they are everywhere.  In fact, Dave Taht
first encountered "every packet is precious" while working on 802.11, where
the driver writers seemed to think that ever dropping even one packet was
evil, and would retransmit effectively infinitely.  So those of you working
on cellular wireless are in good company; don't take this personally.
                                                   - Jim



>
> -----Original Message-----
> From: "John Leslie" <john@jlc.net>
> Sent: Tuesday, March 19, 2013 11:56pm
> To: dpreed@reed.com
> Cc: aqm@ietf.org
> Subject: Draining queues
>
>  dpreed@reed.com <dpreed@reed.com> wrote:
> >
> > A problem with ECN is that it does not drain an already flooded queue,
>
> I presume you mean that ECN doesn't drop the packet it marks.
>
> > and depends (like RED) on the idea that one has a large queue in normal
> > operation.
>
> I don't follow this at all. ECN is pointless without an AQM which
> drops or marks well before tail-drop. Were we to run ECN without AQM,
> we'd have to drop each packet anyway after marking it.
>
> Several flavors of AQM mark/drop well enough in advance of tail-drop
> that the queue need never fill. Did you mean to say that ECN requires
> being able to queue the ECN-marked packet which would otherwise be
> dropped by the AQM?
>
> > It is not obvious that sustaining a deep queue is a desirable state,
>
> Agreed!
>
> > especially when there are highly variable capacities along any path,
> > as is the case where one link is more than 2x slower than the rest of
> > the links.
>
> Isn't that _always_ the case?
>
> > If the dynamics of user loads are fractal, this will cause draining
> > (and latency reduction) not to work.
>
> I'm guessing you mean that under "fractal" loads there's a tendency
> for ECN-marked packets to clump and introduce latency at the bottleneck.
> Recall that this is offset by the quicker response to an ECN mark at
> RTT speed instead of having to infer a packet loss.
>
> > If user loads are stationary gaussian and so forth, ECN can "tune"
> > to relatively stable situations, but the queueing delay will be quite
> > long end-to-end.
>
> I don't follow... Some AQMs may indeed allow queueing delay to grow
> large; but this can only happen if the instantaneous percentage of
> marked packets is quite a bit higher than TCP can tolerate.
>
> > Since there is lots of evidence that user loads are fractal, dropping
> > packets during sudden bursts will prevent sustained queueing, whereas
> > ECN will not.
>
> I believe there is substantial agreement that ECN-marked packets
> will need to be dropped anyway under particularly severe congestion.
> At the very least, tail-drop can become necessary...
>
> > So a proper approach (in my opinion) on the "real" Internet (rather
> > than a bunch of competing long-duration FTPs) requires keeping the
> > queues so short that there is no real opportunity to benefit from ECN.
>
> Here I quite disagree.
>
> An AQM could be designed to ECN-mark at a 1 millisecond queue depth;
> and that can give the full benefit of ECN. Again, recall that ECN will
> deliver the congestion signal in one RTT, instead of waiting to infer
> a loss. TCP stops delivering useful bandwidth at one or two percent
> packet loss: so it's hard to believe that ECN-marked packets will clog
> the path significantly.
>
> > Hence - we need to really see what happens with *real* traffic loads,
> > not simplified simulations.
>
> Agreed! (Now if we could just get enough ECN deployment...)
>
> > Real traffic loads are fractal, and fractal loads do not generally
> > obey the Gaussian style "law of large numbers" - multiple fractal
> > flows are *rougher* not smoother than a single one.
>
> I would say rather that real traffic loads _contain_ fractal flows.
> Aggregation simply wouldn't work if we never experienced gaussian
> smoothing.
>
> > So check your mathematical "intuitions" at the door. Stop writing
> > equations based on tractable analytic models, because actual traffic
> > does not follow those models.
>
> I strongly suspect our models _are_ faulty -- but we need research
> results for _actual_ flows, with and without ECN, in order to diagnose
> their faults.
>
> --
> John Leslie <john@jlc.net>