[tsvwg] quick review and rant of "Identifying and Handling Non Queue Building Flows in a Bottleneck Link"
Dave Taht <dave@taht.net> Mon, 29 October 2018 04:03 UTC
From: Dave Taht <dave@taht.net>
To: tsvwg@ietf.org, g.white@cablelabs.com, bloat@lists.bufferbloat.net
Date: Sun, 28 Oct 2018 21:02:45 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/GyJhx6IdR1xJ9NMqQGiDOF_HFGY>
Dear Greg: I don't feel like commenting much on ietf matters these days but, jeeze, I'm really tired of rehashed arguments based on bad papers that aren't even cited in the document, and that aren't talking about the real problem. I get that some magical solution to a non-problem is wanted here:

https://tools.ietf.org/id/draft-white-tsvwg-nqb-00.txt

but then I just read this part... and had to cringe and respond:

> Flow queueing approaches (such as fq_codel RFC 8290 [RFC8290]), on the
> other hand, achieve latency improvements by associating packets into
> "flow" queues and then prioritizing "sparse flows", i.e. packets that
> arrive to an empty flow queue. Flow queueing does not attempt to
> differentiate between flows on the basis of value (importance or
> latency-sensitivity), it simply gives preference to sparse flows, and
> tries to guarantee that the non-sparse flows all get an equal share

Nope, that's not what it does. Please strike the sentence "it simply gives preference..." and replace it with something like "all flows get as fully mixed as possible." It's just slightly better min/max fair than most other fair queuing mechanisms.

Better:

> Flow queueing approaches (such as fq_codel RFC 8290 [RFC8290]), on the
> other hand, achieve latency improvements by breaking flows back into
> individual packets, and then prioritizing "sparse flows", i.e. packets
> that arrive to an empty flow queue that empties every round. Flow
> queueing does not attempt to differentiate between flows on the basis
> of value (importance or latency-sensitivity), it simply interleaves
> flows as best as possible, giving a slight preference to sparse flows.
> As a result, fq mechanisms are appropriate for unmanaged environments
> and general internet traffic.

Editorial: Are. Have. A full deployment is basically done now. fq_codel is universally "on" on most linuxes, with sch_fq and pacing filling in the rest. An FQ technique is a near universal default in the cloud now.
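To pin down what "a slight preference to sparse flows" actually means, here's a toy python sketch of the DRR++ scheme RFC 8290 describes. This is my own simplification for illustration, not the kernel code; the names (QUANTUM, Flow, new_flows, old_flows) follow the RFC's terminology but the details are mine:

```python
from collections import deque

# Toy sketch (mine, not the kernel code) of RFC 8290's DRR++ scheme:
# a packet arriving to an empty, inactive flow queue puts that flow on
# the new_flows list, which is served before old_flows. That is the
# entire "slight preference to sparse flows".
QUANTUM = 1514  # one MTU-sized packet of byte credit per round

class Flow:
    def __init__(self, fid):
        self.fid = fid
        self.queue = deque()  # packet sizes, in bytes
        self.deficit = 0

def enqueue(flows, new_flows, old_flows, fid, pkt_len):
    f = flows.setdefault(fid, Flow(fid))
    if not f.queue and f not in new_flows and f not in old_flows:
        f.deficit = QUANTUM
        new_flows.append(f)   # sparse arrival: goes on the fast list
    f.queue.append(pkt_len)

def dequeue(new_flows, old_flows):
    while new_flows or old_flows:
        lst = new_flows if new_flows else old_flows
        f = lst[0]
        if f.deficit <= 0:            # spent its byte credit:
            f.deficit += QUANTUM      # top up and rotate to old_flows
            lst.popleft()
            old_flows.append(f)
        elif not f.queue:             # emptied: drops out of the round
            lst.popleft()
        else:
            f.deficit -= f.queue[0]
            return f.fid, f.queue.popleft()
    return None
```

Run a fat flow's backlog against one sparse packet and the sparse packet jumps ahead of the fat flow's second packet, while the fat flow still gets all the remaining capacity: everything mixed, nothing starved.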
It's deployed in the 10s of millions on commercial linux APs. fq_codel is the default in OSX wifi. It's now in freebsd, and there was a great thread recently on how it all works in pfsense here:

https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4/709

Got any users for this other stuff yet?

> Downsides to this approach can include loss of low latency performance
> due to hash collisions (where a sparse flow shares a queue with a bulk
> data flow)

sch_fq has *no* buckets. It's *perfectly* fair to millions of tcp flows. fq_codel's hash collision probability follows the birthday problem - collisions only start becoming likely at around sqrt(buckets) concurrent flows, and 1024 buckets (so roughly 32 flows) has thus far been pretty good. When you combine that with the probability of there being a fat flow to collide with (call it 3% of all flows), the real world impact of these collisions is nearly immeasurable against normal traffic.

Versus a single queue!!! Which imposes the queue length from one fat flow on *all* flows. The probability of that happening is 100%. Jeeze. Describing a "loss of low latency performance" this way is a 99% lie vs what you are comparing against, and I'm ranting today because I've encountered this lie a dozen times in a dozen documents, as if repetition made it true!

NOW... I NOTE HOWEVER: We happen to agree that one collision in a few hundred, in the case of something you really really care about, is too much. So we added not only 8-way set-associative hashing to sch_cake (collision probability of roughly zero), but full support for diffserv classification, and it's shipping in linux 4.19 (been available for two years in openwrt, ubnt, evenroute and elsewhere). It's got great support for docsis framing also. I sure hope more people outside the ietf try it.

Oh yeah, we also defaulted sch_cake to per-host FQ (running simultaneously with the per-flow FQ), which works, even through nat, and that does a great job of giving low latency to all those IOT devices you might have... keeps torrents even more under control...
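If you want to check the collision arithmetic yourself, here's a back-of-the-envelope python calculation (my arithmetic, not text from the draft or the RFC) of the birthday-problem intuition: with 1024 hash buckets, the odds that all concurrent flows land in distinct buckets only fall toward 50% once you have on the order of sqrt(1024) = 32 flows, and a sparse flow sharing a bucket with one of a handful of fat flows is rarer still:

```python
# Birthday-problem arithmetic (mine) for fq_codel's default 1024 buckets.
def p_all_distinct(n_flows, buckets=1024):
    """Probability that n_flows hash into n_flows distinct buckets."""
    p = 1.0
    for i in range(n_flows):
        p *= (buckets - i) / buckets
    return p

# What actually hurts is a sparse flow landing in a *fat* flow's bucket;
# for k concurrent fat flows that's only about k/1024 per sparse flow.
def p_shares_a_fat_bucket(k_fat, buckets=1024):
    return 1.0 - (1.0 - 1.0 / buckets) ** k_fat
```

At 32 flows the all-distinct probability is still around 60%, and the chance of a given sparse flow colliding with one of, say, 3 fat flows is about 0.3% - which is the "nearly immeasurable" bit above.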
https://kernelnewbies.org/Linux_4.19#Better_networking_experience_with_the_CAKE_queue_management_algorithm

Anyway, observed use of diffserv markings is nearly nil, and comcast rewrites all traffic it doesn't recognise as CS1, so we wash that clean on inbound. And most of the time, at real bandwidths, fq_codel more than suffices, except when you need to toss off a one-liner for a shaper, like:

tc qdisc add dev eth0 root cake docsis bandwidth 20mbit ack-filter

> "complexity in managing a large number of queues"

fq_codel, um, is a couple hundred lines of code. Trivial, compared to just about anything real. tcp is a few thousand, for example. It's smaller than most device drivers. The core of the algorithm is 20 lines. (cake is more complicated.) Please stop calling it complicated. The codebase for nearly any queuing system is going to be within 10-20% of any other...

THE REAL ELEPHANT IN THE ROOM is SHAPING... the 680ms of totally gratuitous buffering comcast CMTSes have at 100mbit, and 280ms at 10mbit up. Having to outbound shape is not a problem with any hardware we have deployed, even with cpus designed in the late 80s. Inbound shaping of the "Badwidth" being provided by ISPs today is CPU intensive, easily 95% of the overall cost and complexity of doing queuing "right". Can we have a discussion on fixing SHAPING somewhere in the ietf? Can we get ISPs to, at the very least, buffer no more than 85ms of data?

> "and the scheduling (typically DRR) that enforces that each non-sparse
> flow gets an equal fraction of link bandwidth causes problems with VPNs
> and other tunnels"

Strike "non-sparse" and say "saturating". The problem is MOST VPN traffic is not saturating either, and when it is, it's usually managed by a lousy FIFO queue in the first place. So fq_codel gained the ability to also hash stuff in a terminating tunnel, with really nice results for voip in particular, a few years back.
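For anyone who wants the buffering numbers as arithmetic rather than rhetoric, here's the trivial conversion (my arithmetic, not from the draft) between a standing buffer and the queuing delay it imposes at a given link rate:

```python
# delay = buffered bytes / link rate (my arithmetic, for scale).
def buffer_delay_ms(buffer_bytes, rate_mbit):
    return buffer_bytes * 8 / (rate_mbit * 1e6) * 1000

def buffer_bytes_for(delay_ms, rate_mbit):
    return delay_ms / 1000 * rate_mbit * 1e6 / 8

# 680 ms at 100 Mbit is ~8.5 MB of standing queue; the suggested 85 ms
# cap at the same rate would be ~1.06 MB.
```

8.5 megabytes of gratuitous queue at 100mbit, versus about a megabyte for an 85ms cap - that's the scale of the problem.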
I can't find the test right now, but we observed peak latency and jitter in the 100ms range before doing that, and 2ms after. Sure, in contrast to other saturating flows outside the tunnel, you might get less bandwidth, but all the vpn bandwidth you do get is goodwidth. For all traffic. And it's per vpn tunnel. These days I see one vpn per user...

Did I rant already that the vast majority of flows are non-saturating? I really wish, just once, one day, that the l4s effort would actually try real traffic in a real scenario, with web, voip, dns, gaming. I've reached the point where I just start calling crazy conclusions based on crazy assumptions "big buck bunny scenarios". Start with an hour's packet capture of your home gateway's normal traffic. Count the number of distinct flows.... or your office gateway's...

> , exhibits poor behavior with less-aggressive CA algos, e.g. LEDBAT,
> and exhibits poor behavior with RMCAT CA algos.

I would not call it poor. I would call it undesirable based on the intent of the algorithm's designers. HOWEVER: LEDBAT advocates think that imposing 100ms of queuing delay on all flows is *ok*. To me, that's *poor*. In fact - *nuts*.

https://perso.telecom-paristech.fr/drossi/paper/rossi14comnet-b.pdf

I really wish I wasn't outnumbered on writing that paper, and could release a sequel to it based on real world results... 'cause fq_codel users run bittorrent all day long and don't notice it's there. Every freaking day. I'm running 10 torrents right now. Never notice. I knew when we wrote that paper that we'd obsolete LPCC's brain-damaged 100ms-induced-latency concept as a "good" method... *entirely*, and we did, and *WE DON'T CARE*. You shouldn't either. Try running torrent all day on cable without an inbound fq_codel shaper. Just try it.

> "Exhibits poor behavior with rmcat CA algos."

The relevant paper for this was terrible, based on an artificial benchmark not modeling real traffic, and at 2Mbit to boot, by a very biased observer.
Videoconferencing traffic works GREAT with fq_codel at higher bandwidths against all forms of real traffic - 20mbit down/4mbit up and above. 2Mbit is just going to always suck for videoconferencing; above that... To quote VJ: "low rate videoconferencing never gets dropped". Which really is the measured case, in real traffic, on real networks, doing videoconferencing. Try it.

> "In effect the network element is making a decision as to what
> constitutes a flow, and then forcing all such flows to take equal
> bandwidth at every instant."

Still wouldn't say it that way. It's maximizing entropy that matters. Microbursts get ripped out. Jitter vanishes. Only fat flows see "equal bandwidth". My biggest bugaboo with the l4s thing is that somehow y'all think that traffic will pre-arrive, automagically dispersed in statistically random ways, and that does not happen. Ever. Traffic is naturally bursty and needs to be broken up.

Anyway, this rant is somewhat mis-directed. I do hope some magical way of identifying sparser flows appears, other than FQ, which already does so beautifully, but until then I really wish more people would repeat less bullshit and go actually use the stuff they are dissing and understand it more deeply. It certainly is possible to do better entropy e2e than we do. There have certainly been wonderful things with fq + pacing of late, and I look forward to things like the new etx scheduler and intel hardware support for timerwheels being widely deployed.