Re: [aqm] ECT(1)

Andrew Mcgregor <> Fri, 02 October 2015 03:00 UTC

In-Reply-To: <>
References: <> <> <> <20150728145036.GK96964@verdi> <> <> <> <>
Date: Fri, 2 Oct 2015 13:00:03 +1000
Message-ID: <>
From: Andrew Mcgregor <>
To: Bob Briscoe <>
Archived-At: <>
Cc: "" <>, Mikael Abrahamsson <>
Subject: Re: [aqm] ECT(1)

On 1 October 2015 at 20:46, Bob Briscoe <> wrote:

> Andrew,
> DualQ might be a red herring for the special case you are thinking of
> within the Googleplex. But only a company with a brain the size of Google
> could countenance building bandwidth enforcer (BwE), in the belief that
> knowledge can be collected of the relative priorities, QoS requirements,
> bandwidth requirements, timing and location of all its applications. Then
> it might make sense to collect demand from all hosts up a control hierarchy
> and drive rate-limiting policies back down the hierarchy to every host
> which then sets Diffserv markings for all its different apps which
> determine routing and scheduling against a hierarchically organised
> master-plan.

Well, it does make sense: the paper describes deployed reality, not theory
(it takes a fairly large team to maintain the system, even though most of
the data collection is automated).

But that wasn't really my point. My point is that one can envisage
alternative capacity-sharing mechanisms that don't rely solely on deduction
from router signals.  BwE is a configured system for sharing capacity.  One
can imagine a distributed, signalled system that would do something similar,
but with policy communicated from the users; obviously there are enormous
trust and security issues there, but also quite a potential benefit.  That,
however, would be an entirely different project from the one either AQM or
TCP-Prague is heading for, and complementary to both.
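To make "configured" capacity sharing concrete, here is a minimal weighted max-min (water-filling) allocator in Python, in which an operator-supplied weight per flow takes the place of deduction from router signals. It is a hypothetical simplification for illustration only: BwE as Bob describes it also collects demand up a hierarchy and drives rate limits back down, none of which is modelled here, and the function name is invented.

```python
from typing import Dict

def weighted_max_min(capacity: float,
                     demands: Dict[str, float],
                     weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted max-min ("water-filling") allocation.

    No flow receives more than its demand; capacity left over by satisfied
    flows is shared among the rest in proportion to their (positive) weights.
    """
    alloc = {f: 0.0 for f in demands}
    active = {f for f, d in demands.items() if d > 0}
    remaining = capacity
    while active and remaining > 0:
        share = remaining / sum(weights[f] for f in active)  # per unit weight
        # Flows whose unmet demand fits within their weighted share this round.
        satisfied = {f for f in active
                     if demands[f] - alloc[f] <= share * weights[f]}
        if not satisfied:
            # Every remaining flow is bottlenecked: split what is left by weight.
            for f in active:
                alloc[f] += share * weights[f]
            break
        for f in satisfied:
            remaining -= demands[f] - alloc[f]
            alloc[f] = demands[f]
        active -= satisfied
    return alloc

if __name__ == "__main__":
    print(weighted_max_min(10.0, {"a": 2.0, "b": 8.0, "c": 8.0},
                           {"a": 1.0, "b": 1.0, "c": 1.0}))
```

With 10 units of capacity and equal weights, the flow demanding 2 gets all of it and the two flows demanding 8 each get 4; changing the weights changes the split without any signal from the network.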

> But for the rest of us surely it's BwE that is the red herring - I mean us
> outside Google's internal networks, on the Internet, which consists of
> independent actors with no master Gosplan coordinating us all.

No single plan, no, but anyone with a substantial CDN has to coordinate
with the providers in whose networks the CDN edge is embedded, so there are
negotiated plans for coordination.  It isn't inconceivable to automate that
negotiation to achieve benefits such as time-of-day optimisation.

> For most of the billions on the Internet who are not on large campus
> networks, geography and economics determines that the access network is the
> bottleneck - the part that is most spread-out physically and therefore
> least able to advantage from economies of scale (aggregation).

Clearly true.

> For instance, to demonstrate Coupled DualQ we chose the downstream queue in
> a broadband network gateway (BNG) - each of these queues is shaped to the
> rate of each residential broadband line. It would also be applicable for
> the upstream queue in each residential gateway. And it would also be
> applicable for the equivalent queue entering the access network from each
> end in other access network technologies: the HFC-node and the cable modem
> in cable, or the RNC and the user-equipment in cellular.
> In these logical links, traffic is only for one user or one small site
> (e.g. a home) so it is very sparse. Diffserv is for aggregated traffic -
> Diffserv is useless if all traffic to and from a home at any one time is
> latency sensitive (e.g. Web, VoIP, and interactive video).
> The simple argument for why DualQ:
> a) Today's 'classic' TCPs are the problem in these sparse access links:
> their drastic sawteeth make the queue vary {Note 1}
> b) Even with a good replacement TCP with small sawteeth (like DCTCP), it
> won't be able to keep queuing delay low if there might be 'classic' TCP
> traffic in the same queue, which implies 2 queues minimum.
> c) I don't care how we do that, but the coupling was a neat way to do that
> without the network operator making /any/ capacity management judgements on
> behalf of the users.
> An Internet service that gives /all/ traffic ultra-low queueing delay and
> near-zero congestion loss would be extremely valuable. Not having to ask
> permission for QoS is easy for both customer and network operator. Capacity
> sharing is then orthogonal to that.

Indeed, that's an outstanding goal.  Please don't think I'm against it!

> Just because we show that TCP Cubic and DCTCP flows will get the same rates
> within the user's own line, that doesn't stop the user's applications using
> more or less aggressive congestion controls, or none at all, or multiple
> flows - all the usual ways of varying capacity shares still apply when the
> network operator doesn't enforce anything.
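The coupling in point c) and the claim that Cubic and DCTCP flows get the same rates can be illustrated numerically. This sketch assumes the square-law coupling p_C = (p_L/k)^2 (the form later standardized for the DualQ Coupled AQM in RFC 9332) and the textbook steady-state approximations W = sqrt(3/(2p)) ~= 1.22/sqrt(p) for a Reno-like classic flow and W = 2/p for a DCTCP-like scalable flow, with equal RTTs so that equal windows mean equal rates. Cubic's response differs from Reno's, so the numbers are indicative only, and all function names are hypothetical.

```python
import math

def classic_prob(p_l: float, k: float) -> float:
    """Classic-queue drop/mark probability coupled to the L4S marking probability.
    Assumed square-law coupling: p_C = (p_L / k) ** 2, capped at 1."""
    return min(1.0, (p_l / k) ** 2)

def reno_window(p: float) -> float:
    """Reno-like steady-state window in segments: sqrt(3 / (2p)) ~= 1.22 / sqrt(p)."""
    return math.sqrt(1.5 / p)

def scalable_window(p: float) -> float:
    """DCTCP-like steady-state window in segments: 2 / p."""
    return 2.0 / p

def equal_rate_k() -> float:
    """k at which reno_window(classic_prob(p, k)) == scalable_window(p) for any p.
    reno_window((p / k) ** 2) = sqrt(1.5) * k / p, so k = 2 / sqrt(1.5)."""
    return 2.0 / math.sqrt(1.5)

if __name__ == "__main__":
    k = equal_rate_k()
    for p_l in (0.01, 0.05, 0.2):
        w_c = reno_window(classic_prob(p_l, k))
        w_l = scalable_window(p_l)
        print(f"p_L={p_l:<5} p_C={classic_prob(p_l, k):.5f} W_C={w_c:.1f} W_L={w_l:.1f}")
```

Because squaring the classic probability makes both windows scale as 1/p_L, one constant k balances them at every load; under these approximations k = 2/sqrt(1.5) ~= 1.63 matches them exactly, with no per-user capacity judgement by the operator.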

I should say, I really like this work, and it's a great demonstration of
one way of doing something really good.  I'm simply saying that we should
look more widely than this particular solution for a while before trying to
converge the evolution of ECN and DCTCP, especially since there are going
to be places where, for one reason or another, deploying DualQ as specified
is pretty difficult, but something sufficiently similar might be doable if
the transport has just enough flexibility.

> Actually, even if you do capacity sharing with BwE, if some hosts are
> using a 'classic' TCP, they will still cause varying queuing delay for
> other traffic. However, in your WAN scenario you might have sufficient
> aggregation for the resulting queuing delay not to be a problem.

Actually, the buffers are small enough that we don't care about queue delay
fluctuations.  But yes, at large scale the classic-ish TCP doesn't matter
very much (although it still isn't a capacity sharing mechanism).

> Bob
> {Note 1} I expected the slow-start overshoot would also be a problem, but
> (so far) just reducing the congestion avoidance sawteeth seems to remove
> virtually all the queuing delay - probably because we haven't done tests
> with sparse but bursty workloads yet. Anyway, we've got ideas on how to
> reduce the slow-start overshoot once ECN marking is more frequent.
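The intuition behind Note 1 can be put in numbers. Assuming the textbook steady-state approximations W = sqrt(3/(2p)) for a Reno-like flow and W = 2/p for a DCTCP-like flow (W in segments, p the mark or drop probability), the expected number of congestion signals per round trip is W * p. A Reno-like flow therefore gets ever rarer signals as its window grows, and each one halves a larger window, so its sawteeth swing further; a scalable flow sees about two signals per RTT at any rate, each triggering only a small reduction. The helper names are hypothetical and this is arithmetic, not a simulation.

```python
def reno_signals_per_rtt(window: float) -> float:
    """Expected congestion signals per RTT for a Reno-like flow.
    Invert W = sqrt(3 / (2p)) to get p = 1.5 / W**2; signals per RTT = W * p."""
    p = 1.5 / window ** 2
    return window * p

def scalable_signals_per_rtt(window: float) -> float:
    """Expected congestion signals per RTT for a DCTCP-like flow.
    Invert W = 2 / p to get p = 2 / W; signals per RTT = W * p."""
    p = 2.0 / window
    return window * p

if __name__ == "__main__":
    for w in (10, 100, 1000):
        r = reno_signals_per_rtt(w)
        s = scalable_signals_per_rtt(w)
        print(f"W={w:>5}: Reno {r:.4f}/RTT (one per {1 / r:.0f} RTTs), "
              f"scalable {s:.1f}/RTT")
```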
> On 07/08/15 01:51, Andrew Mcgregor wrote:
> I believe getting a DSCP deployed for this is a non-starter; that space is
> a complete mess, and if we tie this proposal to cleaning up that mess we'll
> get nowhere.  The evil bit doesn't fly either, for a lot of reasons.
> That leaves us with ECT(1).
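For concreteness, the 2-bit ECN field is the low-order pair of the IPv4 TOS / IPv6 Traffic Class byte, with codepoints 00 Not-ECT, 01 ECT(1), 10 ECT(0) and 11 CE (RFC 3168). A minimal classifier that uses ECT(1) as the identifier might look like the sketch below. Steering CE into the low-latency queue as well is an assumption here (it matches what RFC 9331 later specified for L4S), and the function names are hypothetical.

```python
from enum import IntEnum

class ECN(IntEnum):
    """ECN codepoints in the low two bits of the TOS / Traffic Class byte (RFC 3168)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11

def ecn_field(tos: int) -> ECN:
    """Extract the 2-bit ECN field from a TOS / Traffic Class byte."""
    return ECN(tos & 0b11)

def classify(tos: int) -> str:
    """Return "L" (low-latency queue) for ECT(1) or CE, else "C" (classic queue).

    Assumption: CE goes to the L queue too; the DSCP bits are ignored entirely.
    """
    return "L" if ecn_field(tos) in (ECN.ECT_1, ECN.CE) else "C"

if __name__ == "__main__":
    for tos in (0x00, 0x01, 0x02, 0x03, 0xB9):
        print(f"TOS 0x{tos:02X}: {ecn_field(tos).name:<7} -> {classify(tos)}")
```

The classifier never looks outside the ECN field, so it keeps working whatever happens to the DSCP along the path.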
> So, how bad is that?  Well, not very.  In places which are ECN-enabled but
> not dual-queue, DCTCP, or any other transport whose window scales as 1/p in
> the ECN marking probability p, will still respond to loss; sure, we'll drive
> the network into a lossy regime, but those parts of the network are
> guaranteed to be there anyway due to the presence of TCP.  Provided the
> loss response is still sane, that will equilibrate out and we will end up
> with a reasonable outcome.  No, it will not be fair, but in the real world
> TCP never is, since the equilibrium takes hours to establish while
> essentially no flows last that long; in practice, the few flows that do
> last that long either need special protection because they are so fragile
> (BGP, I'm looking at you), or nobody really cares how long they take to
> complete.  TCP does not share capacity in any reasonable manner; it is a
> circuit breaker for avoiding congestion collapse, and nobody is proposing
> running completely oblivious traffic on the Internet.  DCTCP still has
> enough congestion avoidance.
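To illustrate what "still respond to loss" means for DCTCP, here is a simplified per-window model of its sender: ECN marks shrink the window in proportion to an EWMA (alpha) of the marked fraction, while a loss still halves it as classic TCP would. The gain g = 1/16 follows the value suggested in RFC 8257; updating once per window rather than per ACK, and the class and method names, are simplifying assumptions.

```python
from dataclasses import dataclass

@dataclass
class DctcpSender:
    """Simplified per-window model of a DCTCP sender (not per-ACK accurate)."""
    cwnd: float          # congestion window, in segments
    alpha: float = 1.0   # EWMA estimate of the fraction of marked bytes
    g: float = 1 / 16    # EWMA gain, the value suggested in RFC 8257

    def end_of_window(self, acked_bytes: int, marked_bytes: int) -> None:
        """Update once per window of data, i.e. roughly once per RTT."""
        frac = marked_bytes / acked_bytes if acked_bytes else 0.0
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        if marked_bytes:
            # ECN marks: cut in proportion to how much was marked, not by half.
            self.cwnd = max(1.0, self.cwnd * (1 - self.alpha / 2))
        else:
            # No marks: congestion-avoidance increase of one segment per RTT.
            self.cwnd += 1.0

    def on_loss(self) -> None:
        """A loss is still answered like classic TCP: halve the window."""
        self.cwnd = max(1.0, self.cwnd / 2)

if __name__ == "__main__":
    s = DctcpSender(cwnd=100.0, alpha=0.0)
    s.end_of_window(acked_bytes=10000, marked_bytes=1000)
    print(round(s.cwnd, 4))
    s.on_loss()
    print(round(s.cwnd, 4))
```

With alpha starting at 0, 10% of bytes marked trims a 100-segment window only to about 99.7 segments, whereas a single loss still halves it; that retained loss response is what lets such a transport share an ECN-enabled single queue, if not fairly.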
> Nor is the dual-queue solution the only way to deploy; it's a nice
> solution if one actually wants to achieve capacity sharing by router
> feedback, but there are other ways to do the capacity sharing (for example
> <>).  In an
> admission controlled environment, separating the queues is not necessary
> for sane coexistence.  Further, I'm sure there are other single-queue
> solutions involving AQM control systems, although I don't have a proposal
> right now (nor do I think one is necessary).
> I do think that assuming dual queue to be necessary for deployment is a
> red herring.
> On 7 August 2015 at 06:38, Mikael Abrahamsson <> wrote:
>> On Tue, 4 Aug 2015, Bob Briscoe wrote:
>>> Combining ECT(0) and CE with a globally assigned DSCP solely during
>>> initial deployment of L4S seems the least worst choice.
>> Having the same bits in the header mean different things in combination
>> with DSCP seems like something that would be really hard to get deployed
>> Internet-wide.
>> ECN is just now gaining traction and seems like it might actually see
>> real deployment. Repurposing those bits just now would most likely just
>> cause confusion.
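Mikael's objection is easiest to see by splitting the TOS / Traffic Class byte: the DSCP is the upper six bits and ECN the lower two. Below is a hypothetical classifier for the DSCP-gated variant in Bob's quoted proposal (ECT(0) or CE combined with a globally assigned DSCP). The thread names no specific DSCP, so the value used is a placeholder, and the names are invented. If any network on the path bleaches the DSCP, the same ECN bits silently fall back to classic treatment.

```python
from typing import Tuple

# Placeholder only: the thread names no specific DSCP for L4S.
HYPOTHETICAL_L4S_DSCP = 0b000101

ECT_0, CE = 0b10, 0b11  # ECN codepoints (RFC 3168)

def split_tos(tos: int) -> Tuple[int, int]:
    """Split a TOS / Traffic Class byte into (DSCP, ECN)."""
    return tos >> 2, tos & 0b11

def classify_dscp_gated(tos: int, l4s_dscp: int = HYPOTHETICAL_L4S_DSCP) -> str:
    """Return "L" only if the DSCP matches and ECN is ECT(0) or CE; otherwise "C"."""
    dscp, ecn = split_tos(tos)
    return "L" if dscp == l4s_dscp and ecn in (ECT_0, CE) else "C"

if __name__ == "__main__":
    marked = (HYPOTHETICAL_L4S_DSCP << 2) | ECT_0
    bleached = marked & 0b11  # a network on the path zeroed the DSCP
    print(classify_dscp_gated(marked), classify_dscp_gated(bleached))  # L C
```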
>> I started using ECN when it first appeared in the Linux kernel around
>> 2001 or whenever it was. I had to immediately turn it off because some
>> firewalls dropped those packets. Now, almost 15 years later, after this has
>> been sitting in operating systems for at least 10 years, we're finally
>> getting to a point where we're ready to start turning it on widely because
>> things do not break when it's turned on.
>> So whatever you come up with now that requires host stack changes, expect
>> at least 5-10 years before it can be deployed. This means you have to be
>> really sure this is what you actually want before you start to push for
>> deployment. Also, deployment impacts should weigh heavily when deciding
>> what to do.
>> So how sure are you that L4S as it currently stands is the way to go? If
>> you think you're going to invent something new in 2-3 years, then please
>> wait until then. Experimentation is all fine and dandy, but until we can
>> actually get DSCP codepoints working on Internet-wide scale, this approach
>> isn't feasible for that use-case (which for me is close to "the only"
>> use-case).
>> My earlier proposal has been that we should try to get 7 DSCP codepoints
>> deployed by using 000xxx, and nudge providers to incrementally stop
>> bleaching them and instead treat them as BE in their core networks, so we
>> can use them on the edge to influence AQM there.
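Mikael's 000xxx proposal covers DSCP values 1 to 7, since 000000 is the default (best-effort) codepoint. The short sketch enumerates them together with the TOS byte each produces when the ECN field is Not-ECT; the helper names are hypothetical. Note that 000001 was later assigned to the Lower-Effort PHB (RFC 8622), so the set is not entirely unclaimed today.

```python
from typing import List

def proposed_codepoints() -> List[int]:
    """DSCPs of the form 000xxx, excluding 000000 (default / best effort)."""
    return [dscp for dscp in range(64) if dscp >> 3 == 0 and dscp != 0]

def tos_byte(dscp: int, ecn: int = 0b00) -> int:
    """Build the TOS / Traffic Class byte: DSCP in the top 6 bits, ECN in the low 2."""
    if not 0 <= dscp < 64 or not 0 <= ecn < 4:
        raise ValueError("DSCP must fit in 6 bits and ECN in 2 bits")
    return (dscp << 2) | ecn

if __name__ == "__main__":
    for dscp in proposed_codepoints():
        print(f"DSCP {dscp:06b} -> TOS byte 0x{tos_byte(dscp):02X}")
```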
>> So, if we're going to invent new meanings for the ECN bits in combination
>> with DSCP, then that needs to be coupled with work on getting some DSCP
>> working Internet-wide in a way that someone actually believes will succeed,
>> i.e. actually achieving significant Internet-wide deployment.
>> --
>> Mikael Abrahamsson    email:
>> _______________________________________________
>> aqm mailing list
> --
> Andrew McGregor | SRE |  <> | +61 4 1071 2221
> --
> ________________________________________________________________
> Bob Briscoe                     

Andrew McGregor | SRE | | +61 4 1071 2221