Re: [aqm] Updated draft-ietf-aqm-ecn-benefits

Michael Welzl <michawe@ifi.uio.no> Fri, 20 March 2015 10:37 UTC

From: Michael Welzl <michawe@ifi.uio.no>
To: Dave Taht <dave.taht@gmail.com>
Thread-Topic: [aqm] Updated draft-ietf-aqm-ecn-benefits - comments still welcome
Thread-Index: AQHQYqzNUDwDAkLEmkSipiMEfTdUPp0lHZAA
Date: Fri, 20 Mar 2015 10:37:18 +0000
Message-ID: <10FB834E-A408-4626-B610-37B994F8BEF8@ifi.uio.no>
References: <a4dc09801ccd09db5350c2eb8a31f216.squirrel@erg.abdn.ac.uk> <CAA93jw74Vr3bhzJcm7WHD2DSFPiMCqQoP5Eimr2due4GJUNPdQ@mail.gmail.com> <20150319013909.GR39886@verdi> <1ae61e484a61838497910f994bea75d8.squirrel@erg.abdn.ac.uk> <CAA93jw7BzqVoM26apG1KpbGmgVUAj47ido09EbSEm9M3Snsssw@mail.gmail.com>
In-Reply-To: <CAA93jw7BzqVoM26apG1KpbGmgVUAj47ido09EbSEm9M3Snsssw@mail.gmail.com>
Accept-Language: en-GB, en-US
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-ID: <953EBA557C1925459CEE1369A4923A81@mail.uio.no>
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Archived-At: <http://mailarchive.ietf.org/arch/msg/aqm/wVL8FReULsV5TK_x-6H1HnZPy4I>
Cc: Gorry Fairhurst <gorry@erg.abdn.ac.uk>, John Leslie <john@jlc.net>, "aqm@ietf.org" <aqm@ietf.org>
Subject: Re: [aqm] Updated draft-ietf-aqm-ecn-benefits - comments still welcome
Precedence: list

Hi,

Thanks again for your comments!
While you raise some interesting points and questions (e.g. whether anyone has used the Linux DCTCP code...), I'll keep my inline responses below focused on actual text suggestions for the draft.


> On 20 Mar 2015, at 02:25, Dave Taht <dave.taht@gmail.com> wrote:
> 
> On Thu, Mar 19, 2015 at 12:54 AM,  <gorry@erg.abdn.ac.uk> wrote:
>> Thanks Dave for reading this ID and providing your comments. It's really
> 
> As I am the person that fought to get a pitfalls portion into this
> document, and then spaced on adding any text, I apologize for the
> delay in feedback. I am extremely busy with make-wifi-fast and have
> otherwise dropped out of the ietf besides this group.
> 
>> good to explore what may be missing.
> 
> For starters, to what extent do others here have operational
> experience with deploying ECN? I saw that gorry, in particular, was
> doing some interesting work in testing satellite systems, to which I
> provided a profusion of comments privately as to how I would use squid
> with ecn and fq_codel to better handle web traffic. ?
> 
> In my case, tcp + fq_codel (Well, cake, these days) with ecn is
> enabled in both my labs to the fullest extent possible, and used day
> in and day out, when not testing something else. It is also on the 10
> machines I have spread around the world on linode, and isc... and as
> best as I recall a few in my google compute cluster. It is used to
> protect babel routing packets from being dropped by the queue
> management system, I have a multiplicity of benchmarks comparing life
> with and without ecn in netperf-wrapper, and so on.
> 
> tcp with ecn enabled and fq_codel is also now used throughout
> archive.org's systems, but operational difficulties (e.g. configuring
> RED right) have precluded using it on the switches presently in use.
> It was my hope, this year, to establish a full blown 10+GigE router on
> at least some of their traffic this past year, but ENOFUNDING.
> 
> I would love to know, in particular, if anyone has been trying the
> latest and now readily available in linux DCTCP in a real deployment
> anywhere, and was willing to talk about it? I see, for example, that
> per route setting of ECN is also now in the kernel, and I surmise
> there must be a good reason for that.
> 
> I have several hacky test tools that use ECN in various ways, which
> could use some more users and love.
> 
>>> Dave Taht <dave.taht@gmail.com> wrote:
>>>> 
>>>> section 6 addition. (could use more verbiage)
>>>> 
>>>> 6.3 "An AQM that is ECN aware MUST have overload protection.
>>> 
>>>   I fear I cannot discern what you mean this to say. :^(
> 
> Overload protection has been discussed here before. Basically you need
> an operational point at which you drop, rather than mark packets. The
> consensus here is that operational point should be mark before you
> would normally drop, but pie,codel,fq_codel, cake and red *do not do*
> that presently, and there are severe constraints/hw/sw costs to having
> two different setpoints.

This statement seems to conflate two separate issues:
The phrase "mark before you would normally drop" talks about where the marking point should be (assuming that "normally" means: if the packet was not ECN-enabled).
What you say about overload protection is something else: it's a point at which an AQM mechanism would make a decision to drop *ECN-enabled* packets.

I have not seen any sign of consensus for the latter being good practice, and I, for one, am strongly against it, for the following two reasons:
1) it is potentially harmful: later in your email you point out the importance of ECN for non-TCP traffic. Indeed, the reaction to an ECN mark need not always be exactly the same as the reaction to a packet drop (in particular with "mark before you would normally drop"). However, any such behaviour becomes moot when, in the same round-trip time, drops are enforced on some of the ECN-enabled packets: the sender then has no choice but to react to the drop the "normal" way, which eliminates any potential benefit from a different reaction to ECN.
2) I can't see how it would help against attacks: any queue has an upper limit, and I can try to kill all other traffic by sending at a crazy high rate with or without ECN. Most AQM mechanisms operate probabilistically (well, not CoDel), based on an average (delay or queue length), and I can't see how sometimes dropping instead of ECN-marking packets would help against such sources.


> The present version of codel in linux has no overload protection. It
> will merrily keep marking packets until the packet limit is exceeded,
> then drop, rather than drop at any threshold. Thus ecn is disabled by
> default in that version.

Why? It will drop anyway when the total queue length is exceeded.
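
To make the point concrete, here is a minimal sketch of a queue that marks ECN-capable packets and still drops once its hard packet limit is reached. It is purely illustrative and not the Linux CoDel code: the names Packet, MarkingQueue and the aqm_signals_congestion flag are hypothetical, and the decision is taken at enqueue for brevity, whereas CoDel decides at dequeue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Packet:
    ect: bool          # sender declared the packet ECN-capable
    ce: bool = False   # Congestion Experienced, set by the queue


@dataclass
class MarkingQueue:
    limit: int                                   # hard packet limit
    queue: deque = field(default_factory=deque)

    def enqueue(self, pkt, aqm_signals_congestion):
        """Return 'drop', 'mark' or 'pass' for the arriving packet."""
        if len(self.queue) >= self.limit:
            # Queue is full: the packet is dropped whatever its ECN bits say.
            return "drop"
        if aqm_signals_congestion:
            if not pkt.ect:
                return "drop"        # not ECN-capable: signal by dropping
            pkt.ce = True            # ECN-capable: signal by marking
            self.queue.append(pkt)
            return "mark"
        self.queue.append(pkt)
        return "pass"
```

In this sketch a sender that marks everything ECN-capable and never slows down still runs into the packet limit, exactly as a non-ECN sender would.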


> There have long been several patches being
> tested in cerowrt (and available for all to try) that attempt various
> methods to do this more sanely, which I have also reported here. The
> two we have settled on will hopefully be comprehensively evaluated
> this summer.
> 
> There was (last I looked) no way to do ecn in ns2, and support for ns3
> has not quite landed yet as best I recall.
> 
> We viewed fq_codel with/ecn as safe to deploy, due to the flow
> isolation, and that is still mostly true. For the hardware
> implementation however, we dropped the search all queues portion of
> the algorithm (see last paragraph of section 5.1 of the fq_codel
> draft) and are still in search of saner ways to find the largest
> queue(s) to search in parallel.
> 
> We added a mildly smarter version of overflow protection to the linux
> version of pie, but it misbehaves when random numbers are excessively
> random, dropping when it should probably still be marking.
> 
> None of this is directly applicable to the language of the document,
> except by better explaining multiple things to naive users.
> 
> 1) enabling ECN by itself accomplishes nothing, unless there is an AQM
> on the bottleneck link(s) also

Isn't this blindingly obvious, even by the definition of ECN in RFC 3168?


> I note that stuart cheshire did not fully grasp this duality until I
> worked closely with him on:
> http://www.bufferbloat.net/projects/cerowrt/wiki/Enable_ECN
> 
> He's a smart cookie. Others aren't. More context around ECN is needed.
> 
> 2) That application developers blithely enabling ecn is potentially
> dangerous to the health of the network.

I have seen neither evidence nor consensus that this is correct.


> It would seem intuitive to a gamer, perhaps, to mark all their packets
> with ECN, so that by god, all their packets got through. (it's not
> only intuitive, but other forms of sparse traffic can also benefit
> from being ecn marked. I also did favor the ECN enablement of the main
> frame in the webrtc nada proposal for example. I have marked dns and
> icmpv6 traffic with ECN and watched that do fascinating things to the
> network, also.
> 
> Everyone here is seemingly stuck on ecn + tcp, where I have long felt
> that safer places to innovate were in quic and webrtc.
> 
> ooh! another 6.x section addition:
> 
> 6.x an example where ecn marking can be bad is where the inner header
> is copied to the outer, verbatim, and not copied back.
> 
> this error in code exists in the field today, it is presently in the
> tinc 1.1 vpn system.

Agreed, I think we should address that.
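
As an illustration of the failure mode, here is a simplified sketch (my own, with hypothetical function names, not taken from any VPN code). It contrasts a decapsulator that discards the outer ECN field with one that carries a CE mark back to the inner header. RFC 6040 specifies the complete behaviour, including the ECT(1) cases; this sketch covers only the CE case.

```python
# The four ECN codepoints (the two low-order bits of the TOS / Traffic Class byte).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encapsulate(inner_ecn):
    """Tunnel ingress: copy the inner ECN field to the outer header."""
    return inner_ecn


def decapsulate_broken(inner_ecn, outer_ecn):
    """Tunnel egress that ignores the outer header.

    A CE mark applied inside the tunnel is silently lost, so the
    sender never hears about the congestion.
    """
    return inner_ecn


def decapsulate_propagating(inner_ecn, outer_ecn):
    """Tunnel egress that carries CE back to the inner header.

    Returns the ECN field to forward, or None if the packet must be
    dropped because the congestion signal cannot be expressed.
    """
    if outer_ecn == CE:
        if inner_ecn == NOT_ECT:
            return None      # inner flow is not ECN-capable: drop instead
        return CE
    return inner_ecn
```

With an ECT(0) inner packet that is CE-marked on the tunnel path, decapsulate_broken hands ECT(0) to the receiver, while decapsulate_propagating hands over CE.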


>>>> It is trivial for a malbehaved application/worm/bot to mark all
>>>> its packets with ECN and thus gain priority over other traffic
>>>> not ecn marked.
>>> 
>>>   This somewhat-paranoid claim rests on several assumptions that I
>>> hope we will recommend against.
> 
> Not paranoid at all. Trivially feasible, and a real potential attack
> vector. If you would like to be scared about how a flood of ecn marked
> packets could do worse damage, you might want to look at the scope of
> attacks that cloudflare has to deal with regularly.

Please share details. I have trouble understanding the danger of ECN.


>>> - the most obvious is an assumption that a tail-drop node will mark
>>>  _instead_ of dropping ECN-capable packets. This is not actually
>>>  possible, and I hope we will strongly deprecate it. Tail-drop should
>>>  drop packets regardless of ECN bits.
> 
> I agree that a tail drop queue will not do ECN. However in an aqm
> system without overload protection, you basically end up with a tail
> drop queue, one that also ends up dropping all the non-ecn marked
> packets.
> 
>>> 
>>> - there is also an assumption that an ECN-capable transport can mark
>>>  its packets as ECN-capable and then never reduce its sending rate.
>>>  I suppose it could; but not-ECN-capable transports can also never
>>>  reduce the sending rate. :^( And the not-ECN-capable transports
>>>  could accomplish the same reduction in "lost" packets by FEC.
> 
> This is false equivalence. If ecn can be gamed, it will be gamed.

As above, yes it can be gamed - what I have not yet seen is evidence of this being a serious problem (any more serious than transports sending many non-ECN-capable packets without adapting their rate).


> A lot of my support of ecn is basically that packet loss is so trivial
> above 100mbit that it really doesn't matter much if it used or not, so
> it helps a little in the general case, but with well behaved apps
> getting marked .01% of the time, on or off and the whole debate is a
> tempest in a tea-cup.
> 
> It does seem very useful on longer RTTs.
> 
>>> 
>>>   I believe we are going to "suggest" a lower marking threshhold for
> 
> despite 3 years of trying have been unable to come up with an
> algorithm for that that works well with different setpoints with mixed
> traffic.
> 
>>> ECN-capable packets than the dropping threshhold for not-ECN-capable
>>> packets at AQM-capable nodes. This should reduce the paranoia level,
>>> I hope, since the ECN-capable flows will get congestion signals when
>>> not-ECN-capable packets are _not_ being dropped.
> 
> Look forward to seeing a working version from someone.
> 
>>>   We should concentrate our efforts on providing useful signals:
>>> that some transports might make poor use of these signals is beyond
>>> our scope.
> 
> I thought we were providing useful *guidance* to developers of network
> applications.
> 
>>> 
>> I understand that router overload needs to be considered in the design of
>> an AQM algorithm, but I am inclined to think there is not much to say to
>> application designers, and that this may already have been said in the
>> AQM Recommendations document. Agreeing with John, I don't see this as the
>> place to start putting detail on how routers implement AQM.
> 
> That's why it was a short sentence to begin with. However, some
> discussion of the benefits and pitfalls of using ECN in new
> applications I do feel is needed.
> 
>>>> 6.4 Enabling ECN at the application layer requires access to the IP
>>>>    header fields, which are usually abstracted out completely at the
>>>>    tcp layer, and hard to access from udp with multiple non-portable
>>>>    methods to do so.
>>> 
>>>   Yes, there are TCP stacks which are ECN-unfriendly; but there are
>>> enough _today_ which are friendly to ECN.
> 
> Again, tcp thinking.
> 
> 1) It is trivial to write a udp app that emits ecn. Same setsockopt
> as IP_TOS. Mosh and multiple other apps do it already.
> 2) It is less trivial to write a udp app that handles ecn correctly.
> Mosh does that also, but so far as I know they got the BSD
> implementation wrong.
> 
> the sendmsg and recvmsg apis are in dire need of an update since their
> specification.
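
For readers unfamiliar with the sending side mentioned in point 1, a minimal sketch follows, assuming a Linux-like IPv4 stack that lets a UDP socket set the ECN bits through IP_TOS. The helper names are hypothetical. It covers only emitting ECT(0); reading the ECN field of received datagrams (point 2) is platform-dependent, typically needing an option such as IP_RECVTOS plus ancillary data from recvmsg, and is not shown.

```python
import socket

ECN_MASK = 0b11   # ECN occupies the two low-order bits of the TOS byte
ECT0 = 0b10       # ECN-Capable Transport, codepoint ECT(0)


def with_ecn(tos, codepoint):
    """Return the TOS byte with its ECN bits replaced by codepoint."""
    return (tos & ~ECN_MASK & 0xFF) | (codepoint & ECN_MASK)


def ecn_of(tos):
    """Extract the ECN codepoint from a TOS byte."""
    return tos & ECN_MASK


def open_ect0_udp_socket(dscp_tos=0):
    """Create an IPv4 UDP socket whose outgoing packets carry ECT(0).

    dscp_tos is the TOS byte carrying the desired DSCP; its ECN bits
    are overwritten.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, with_ecn(dscp_tos, ECT0))
    return sock
```

Setting the bits is the easy half. An application that sends ECT-marked packets also has to feed CE marks seen by the receiver back to the sender and reduce its rate in response.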
> 
> IF you wish to refine the scope of this document to be only TCP with
> ECN, and exclude use case such as vpn encapsulation and udp
> applications where it might be useful (like webrtc), ok... but....
> 
>>> 
>> I also agree with what you say - although, again I'm not sure we need to
>> add this here, I think the design of transports is really the topic of
>> RFC5405.bis,
>> 
>>>>    ECN over UDP in new applications such as webrtc and Quic has
>>>>    great potential for many other applications, however the same
>>>>    care of design that went into ECN on TCP needs to go into
>>>>    future UDP based protocols.
>>> 
>>>   I wouldn't disagree; but those issues are essentially-solved
>>> problems today.
> 
> You are kidding me, right?
> 
>>>> Some other section that may end up here?
>>>> 
>>>> ECN marking other sorts of flows (example routing packets) that have a
>>>> higher priority than other flows on link-local packets may be of benefit
>>>> with wider availability of aqm technologies that are ecn aware...
>>> 
>> I'm not sure I understand what you are suggesting with respect to ECN.
>> 
>>>   I suppose there might be _some_ use for ECN on routing packets; but
>>> I doubt this is desirable today. ECN is not-at-all about getting a
>>> higher priority -- it's about getting congestion signals without
>>> packet loss.
> 
> On that we agree, and I should probably have used a different example
> from routing, citing the original webrtc nada draft as my example.
> 
>>> 
>> I think the IETF would normally recommend diffserv priority marking for
>> network control traffic.
> 
> I am all in favor of CS6. Not so much CS7. And as you know, few
> diffserv priorities survive e2e transit, and ECN markings survive much
> more often end to end than diffserv.
> 
>> 
>>> --
>>> John Leslie <john@jlc.net>
>>> 
>> 
>> Gorry
>> 
>> 
> 

Cheers,
Michael
