Re: [aqm] Updated draft-ietf-aqm-ecn-benefits - comments still welcome

Dave Taht <dave.taht@gmail.com> Fri, 20 March 2015 01:25 UTC

In-Reply-To: <1ae61e484a61838497910f994bea75d8.squirrel@erg.abdn.ac.uk>
References: <a4dc09801ccd09db5350c2eb8a31f216.squirrel@erg.abdn.ac.uk> <CAA93jw74Vr3bhzJcm7WHD2DSFPiMCqQoP5Eimr2due4GJUNPdQ@mail.gmail.com> <20150319013909.GR39886@verdi> <1ae61e484a61838497910f994bea75d8.squirrel@erg.abdn.ac.uk>
Date: Thu, 19 Mar 2015 18:25:38 -0700
Message-ID: <CAA93jw7BzqVoM26apG1KpbGmgVUAj47ido09EbSEm9M3Snsssw@mail.gmail.com>
From: Dave Taht <dave.taht@gmail.com>
To: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <http://mailarchive.ietf.org/arch/msg/aqm/BZlnZ565vOto0A7u1Cc6UlpAAK4>
Cc: John Leslie <john@jlc.net>, "aqm@ietf.org" <aqm@ietf.org>
Subject: Re: [aqm] Updated draft-ietf-aqm-ecn-benefits - comments still welcome

On Thu, Mar 19, 2015 at 12:54 AM,  <gorry@erg.abdn.ac.uk> wrote:
> Thanks Dave for reading this ID and providing your comments. It's really

As I am the person that fought to get a pitfalls portion into this
document, and then spaced on adding any text, I apologize for the
delay in feedback. I am extremely busy with make-wifi-fast and have
otherwise dropped out of the ietf besides this group.

> good to explore what may be missing.

For starters, to what extent do others here have operational
experience with deploying ECN? I saw that gorry, in particular, was
doing some interesting work in testing satellite systems, on which I
privately provided a profusion of comments as to how I would use squid
with ecn and fq_codel to better handle web traffic.

In my case, tcp + fq_codel (Well, cake, these days) with ecn is
enabled in both my labs to the fullest extent possible, and used day
in and day out, when not testing something else. It is also on the 10
machines I have spread around the world on linode, and isc... and as
best as I recall a few in my google compute cluster. It is used to
protect babel routing packets from being dropped by the queue
management system. I also have a multiplicity of benchmarks comparing
life with and without ecn in netperf-wrapper, and so on.

tcp with ecn enabled and fq_codel is also now used throughout
archive.org's systems, but operational difficulties (e.g. configuring
RED right) have precluded using it on the switches presently in use.
It was my hope to establish a full blown 10+GigE router on at least
some of their traffic this past year, but ENOFUNDING.

I would love to know, in particular, if anyone has been trying the
latest DCTCP, now readily available in linux, in a real deployment
anywhere, and would be willing to talk about it. I see, for example,
that per route setting of ECN is also now in the kernel, and I surmise
there must be a good reason for that.

I have several hacky test tools that use ECN in various ways, which
could use some more users and love.

>> Dave Taht <dave.taht@gmail.com> wrote:
>>>
>>> section 6 addition. (could use more verbiage)
>>>
>>> 6.3 "An AQM that is ECN aware MUST have overload protection.
>>
>>    I fear I cannot discern what you mean this to say. :^(

Overload protection has been discussed here before. Basically you need
an operational point at which you drop, rather than mark, packets. The
consensus here is that the marking point should come before the point
at which you would normally drop, but pie, codel, fq_codel, cake and
red *do not do* that presently, and there are severe constraints and
hw/sw costs to having two different setpoints.

The present version of codel in linux has no overload protection. It
will merrily keep marking packets until the packet limit is exceeded,
and only then drop, rather than dropping at some earlier threshold.
Thus ecn is disabled by
default in that version. There have long been several patches being
tested in cerowrt (and available for all to try) that attempt various
methods to do this more sanely, which I have also reported here. The
two we have settled on will hopefully be comprehensively evaluated
this summer.
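
To make "overload protection" concrete, here is a minimal sketch in
Python. It is not the code in linux codel, nor either of the cerowrt
patches; the Packet class, the function name and the overload_backlog
threshold are all hypothetical, chosen only to show where the mark/drop
decision changes.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ect: bool          # sender declared the flow ECN-capable (ECT(0) or ECT(1))
    ce: bool = False   # set when the queue marks instead of dropping


def signal_congestion(pkt: Packet, backlog: int, overload_backlog: int) -> str:
    """Called only once the AQM has decided this packet must carry a
    congestion signal. Returns "mark" or "drop".

    overload_backlog is a hypothetical setpoint: at or above it the
    queue stops trusting ECN and drops, so a sender that ignores marks
    cannot keep the queue full.
    """
    if pkt.ect and backlog < overload_backlog:
        pkt.ce = True
        return "mark"
    return "drop"
```

Without the backlog test, an ECN-capable packet is marked every time,
and the only drops left are the ones at the hard packet limit.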

There was (last I looked) no way to do ecn in ns2, and support for ns3
has not quite landed yet as best I recall.

We viewed fq_codel w/ecn as safe to deploy, due to the flow
isolation, and that is still mostly true. For the hardware
implementation, however, we dropped the "search all queues" portion of
the algorithm (see last paragraph of section 5.1 of the fq_codel
draft) and are still in search of saner ways to find the largest
queue(s) to search in parallel.

We added a mildly smarter version of overflow protection to the linux
version of pie, but it misbehaves when random numbers are excessively
random, dropping when it should probably still be marking.

None of this is directly applicable to the language of the document,
except by better explaining multiple things to naive users.

1) enabling ECN by itself accomplishes nothing, unless there is an AQM
on the bottleneck link(s) also

I note that stuart cheshire did not fully grasp this duality until I
worked closely with him on:
http://www.bufferbloat.net/projects/cerowrt/wiki/Enable_ECN

He's a smart cookie. Others aren't. More context around ECN is needed.

2) That application developers blithely enabling ecn is potentially
dangerous to the health of the network.

It would seem intuitive to a gamer, perhaps, to mark all their packets
with ECN, so that by god, all their packets got through. (It's not
only intuitive: other forms of sparse traffic can also benefit from
being ecn marked. I favored the ECN enablement of the main frame in
the webrtc nada proposal, for example. I have also marked dns and
icmpv6 traffic with ECN and watched that do fascinating things to the
network.)

Everyone here is seemingly stuck on ecn + tcp, where I have long felt
that safer places to innovate were in quic and webrtc.

ooh! another 6.x section addition:

6.x an example where ecn marking can be bad is where the inner header
is copied to the outer, verbatim, and not copied back.

this error in code exists in the field today, it is presently in the
tinc 1.1 vpn system.
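
A rough sketch of the failure mode, in Python. This is not tinc's
code; the function names are made up for illustration, and the "fixed"
version is a simplified take on the decapsulation behaviour described
in RFC 6040, not a complete implementation of it (it ignores the
ECT(0)/ECT(1) distinction on the outer header).

```python
# ECN codepoints: the two low-order bits of the IP TOS / traffic class byte.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encapsulate(inner_ecn: int) -> int:
    """Copy the inner ECN field verbatim to the outer header."""
    return inner_ecn & 0b11


def buggy_decapsulate(inner_ecn: int, outer_ecn: int) -> int:
    """The bug: the outer ECN field is thrown away, so a CE mark set
    inside the tunnel never reaches the receiver."""
    return inner_ecn


def decapsulate(inner_ecn: int, outer_ecn: int):
    """Simplified fix: propagate a CE mark from outer to inner.
    Returns the ECN field to forward, or None to drop the packet."""
    if outer_ecn == CE:
        if inner_ecn == NOT_ECT:
            return None  # the flow cannot be marked, so the signal must be a drop
        return CE
    return inner_ecn
```

With the buggy version a queue along the tunnel path marks instead of
dropping, the mark is lost at the far end, and the sender never hears
about the congestion at all.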

>>> It is trivial for a malbehaved application/worm/bot to mark all
>>> its packets with ECN and thus gain priority over other traffic
>>> not ecn marked.
>>
>>    This somewhat-paranoid claim rests on several assumptions that I
>> hope we will recommend against.

Not paranoid at all. Trivially feasible, and a real potential attack
vector. If you would like to be scared about how a flood of ecn marked
packets could do worse damage, you might want to look at the scope of
attacks that cloudflare has to deal with regularly.

>>
>> - the most obvious is an assumption that a tail-drop node will mark
>>   _instead_ of dropping ECN-capable packets. This is not actually
>>   possible, and I hope we will strongly deprecate it. Tail-drop should
>>   drop packets regardless of ECN bits.

I agree that a tail drop queue will not do ECN. However in an aqm
system without overload protection, you basically end up with a tail
drop queue, one that also ends up dropping all the non-ecn marked
packets.

>>
>> - there is also an assumption that an ECN-capable transport can mark
>>   its packets as ECN-capable and then never reduce its sending rate.
>>   I suppose it could; but not-ECN-capable transports can also never
>>   reduce the sending rate. :^( And the not-ECN-capable transports
>>   could accomplish the same reduction in "lost" packets by FEC.

This is false equivalence. If ecn can be gamed, it will be gamed.

A lot of my support of ecn is basically that packet loss is so trivial
above 100mbit that it really doesn't matter much whether it is used or
not. It helps a little in the general case, but with well behaved apps
getting marked .01% of the time, ecn on or off, the whole debate is a
tempest in a teacup.

It does seem very useful on longer RTTs.

>>
>>    I believe we are going to "suggest" a lower marking threshhold for

Despite 3 years of trying, I have been unable to come up with an
algorithm for that which works well with different setpoints and mixed
traffic.

>> ECN-capable packets than the dropping threshhold for not-ECN-capable
>> packets at AQM-capable nodes. This should reduce the paranoia level,
>> I hope, since the ECN-capable flows will get congestion signals when
>> not-ECN-capable packets are _not_ being dropped.

Look forward to seeing a working version from someone.

>>    We should concentrate our efforts on providing useful signals:
>> that some transports might make poor use of these signals is beyond
>> our scope.

I thought we were providing useful *guidance* to developers of network
applications.

>>
> I understand that router overload needs to be considered in the design of
> an AQM algorithm, but I am inclined to think there is not much to say to
> application designers, and that this need may have been said in the
> AQM Recommendations document. Agreeing with John, I don't see this as the
> place to start putting detail on how routers implement AQM.

That's why it was a short sentence to begin with. However, some
discussion of the benefits and pitfalls of using ECN in new
applications I do feel is needed.

>>> 6.4 Enabling ECN at the application layer requires access to the IP
>>>     header fields, which are usually abstracted out completely at the
>>>     tcp layer, and hard to access from udp with multiple non-portable
>>>     methods to do so.
>>
>>    Yes, there are TCP stacks which are ECN-unfriendly; but there are
>> enough _today_ which are friendly to ECN.

Again, tcp thinking.

1) It is trivial to write a udp app that emits ecn. Same setsockopt
as IP_TOS. Mosh and multiple other apps do it already.
2) It is less trivial to write a udp app that handles ecn correctly.
Mosh does that also, but so far as I know they got the BSD
implementation wrong.
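
For what it's worth, a minimal sketch of both halves in Python, IPv4
only and assuming Linux behaviour. This is not Mosh's code; the helper
names are hypothetical. The send side is the IP_TOS setsockopt; the
receive side needs IP_RECVTOS plus recvmsg() to get at the TOS byte as
ancillary data. Some systems report it under a different cmsg type, so
the sketch accepts either.

```python
import socket

ECN_MASK = 0x03
ECT0 = 0x02
CE = 0x03


def open_ecn_udp_socket() -> socket.socket:
    """UDP socket that sends ECT(0) and asks for the received TOS byte."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tos = s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
    # Keep whatever DSCP is already set, replace only the two ECN bits.
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, (tos & ~ECN_MASK) | ECT0)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_RECVTOS, 1)
    return s


def ecn_from_ancdata(ancdata):
    """Pull the ECN codepoint out of recvmsg() ancillary data, or None."""
    wanted = (socket.IP_TOS, getattr(socket, "IP_RECVTOS", socket.IP_TOS))
    for level, ctype, cdata in ancdata:
        if level == socket.IPPROTO_IP and ctype in wanted and cdata:
            return cdata[0] & ECN_MASK
    return None


def recv_with_ecn(s: socket.socket, bufsize: int = 2048):
    """Receive one datagram along with the ECN codepoint it arrived with."""
    data, ancdata, _flags, addr = s.recvmsg(bufsize, socket.CMSG_SPACE(4))
    return data, addr, ecn_from_ancdata(ancdata)
```

Seeing CE on receive is the easy part. The app still has to feed that
back to the sender and have the sender slow down, which is where the
real design work is.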

the sendmsg and recvmsg apis are in dire need of an update since their
specification.

If you wish to refine the scope of this document to be only TCP with
ECN, and exclude use cases such as vpn encapsulation and udp
applications where it might be useful (like webrtc), ok... but....

>>
> I also agree with what you say - although, again I'm not sure we need to
> add this here, I think the design of transports is really the topic of
> RFC5405.bis,
>
>>>     ECN over UDP in new applications such as webrtc and Quic has
>>>     great potential for many other applications, however the same
>>>     care of design that went into ECN on TCP needs to go into
>>>     future UDP based protocols.
>>
>>    I wouldn't disagree; but those issues are essentially-solved
>> problems today.

You are kidding me, right?

>>> Some other section that may end up here?
>>>
>>> ECN marking other sorts of flows (example routing packets) that have a
>>> higher priority than other flows on link-local packets may be of benefit
>>> with wider availability of aqm technologies that are ecn aware...
>>
>  I'm not sure I understand what you are suggesting with respect to ECN.
>
>>    I suppose there might be _some_ use for ECN on routing packets; but
>> I doubt this is desirable today. ECN is not-at-all about getting a
>> higher priority -- it's about getting congestion signals without
>> packet loss.

On that we agree, and I should probably have used a different example
from routing, citing the original webrtc nada draft as my example.

>>
> I think the IETF would normally recommend diffserv priority marking for
> network control traffic.

I am all in favor of CS6. Not so much CS7. And as you know, few
diffserv priorities survive e2e transit, and ECN markings survive much
more often end to end than diffserv.
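
The two live in the same byte, which is worth spelling out. A small
Python sketch of the layout (the helper names are mine, and
bleach_dscp is only an illustration of a hop that rewrites the
diffserv field, not a claim about any particular network):

```python
CS6, CS7 = 48, 56          # class selector codepoints (DSCP values)
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def tos_byte(dscp: int, ecn: int) -> int:
    """Upper six bits are the DSCP, lower two bits are the ECN field."""
    return ((dscp & 0x3F) << 2) | (ecn & 0x03)


def split_tos(tos: int):
    """Return (dscp, ecn) from a TOS / traffic class byte."""
    return tos >> 2, tos & 0x03


def bleach_dscp(tos: int) -> int:
    """A hop that zeroes the DSCP but leaves the ECN bits alone."""
    return tos & 0x03
```

So a CS6 + ECT(0) packet goes out as 0xC2, and after a hop that
bleaches the DSCP it arrives as 0x02: the priority is gone, the ECN
capability is not.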

>
>> --
>> John Leslie <john@jlc.net>
>>
>
> Gorry
>
>



-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb