Re: [tsvwg] sce vs l4s comparison plots?
Dave Taht <dave@taht.net> Mon, 11 November 2019 00:18 UTC
From: Dave Taht <dave@taht.net>
To: Tom Henderson <tomh@tomh.org>
Cc: tsvwg IETF list <tsvwg@ietf.org>, tcpm@ietf.org
In-Reply-To: <4b67d594-e4fc-92d8-fcdc-8384fcb7286b@tomh.org> (Tom Henderson's message of "Sun, 10 Nov 2019 22:27:58 +0000 (UTC)")
References: <742142FB-6233-4048-931B-EE2DD9024454@gmx.de> <87mud4ejl9.fsf@taht.net> <4b67d594-e4fc-92d8-fcdc-8384fcb7286b@tomh.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
Date: Sun, 10 Nov 2019 16:17:59 -0800
Message-ID: <87a7931d1k.fsf@taht.net>
MIME-Version: 1.0
Content-Type: text/plain
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/-UEeaXDbmQYi3BqxwuSBI3EIcO8>
Subject: Re: [tsvwg] sce vs l4s comparison plots?
Tom Henderson <tomh@tomh.org> writes:

> Dave,
> I missed replying earlier to a few of your points below (inline).
>
> On 11/9/19 3:05 PM, Dave Taht wrote:
>
>>
>> By default (when run without -x) flent captures very little metadata
>> about the system it is run on (IP addresses, a couple sysctls, and
>> qdiscs) but it's helpful to have. One example that would be in that
>> metadata, is that I'm unsure if the ns3 data is using an IW4 or IW10?
>>
>> It sounds like you are rate limiting with htb? (to what quantum?)
>>
>> Another example, in more "native" environments running at a simulated
>> line rate, BQL is quite important to have in the simulation
>> also. there's been a couple papers published on BQL's benefits vs a raw
>> txring, thus far, there's a good plot of what it does in fig 6 of:
>>
>> http://sci-hub.tw/10.1109/LANMAN.2019.8847054
>
> in the ns-3 simulations posted, BQL is enabled on all links.

Cool. If only the DSL and cable worlds had adopted this! It allows for
much smarter handling of packet delivery higher in the stack, at the
cost of one interrupt's worth of standing queue. Without BQL we
wouldn't be scaling Linux past 10GigE today. I keep hoping *switches*
will start doing BQL, also.

Note: The AQL work (long in google wifi and a few other places) -
"airtime queue limits" - is finally entering Linux mainline. This makes
for a similar savings to BQL at interrupt time on media with variable
"line rate" encodings, which is extremely useful for wifi and LTE, and
possibly for cable and ethernet-over-powerline devices.

Some details on how well it works are in the google document here:

http://flent-newark.bufferbloat.net/~d/Airtime%20based%20queue%20limit%20for%20FQ_CoDel%20in%20wireless%20interface.pdf

Toke's current patchset: https://patchwork.kernel.org/cover/11206223/

>
>>
>> Lastly...
>>
>> So far as I know ns(X) does not correctly simulate GSO/TSO even when
>> run in DCE mode, but I could be out of date on that. TBF (and cake)
>> do break apart superpackets, htb (+ anything, like fq_codel or dualq)
>> do not.
>
> Correct, we do not have an ns-3 model for GSO/TSO. Is it needed (in
> the simulation) if BQL is enabled with small device queues?

I don't know. Are you seeing GRO/GSO/TSO superpackets in the path on
this simulation? It isn't on for a variety of pseudo devices,
particularly in older releases of Linux.

The 4.4 kernel (released 2016-01-10) you are leveraging predates some
major new network subsystem features, like better pacing, sch_etf, and
the switch to EDF scheduling (in Linux 5.1). I'm sure there are dozens
more things that might matter, which is why we test. :) I imagine Neal
is more up on all the changes since 4.4 that might matter.

In the field:

We started running into problems with GRO in 2014, where GigE
line-rate bursts of IW10, 20, 30, 42 or more packets hit certain brands
of home router hardware and were then released as a single
superpacket. The resulting load spike (stepping down to, say, 5mbit)
was quite noticeable when running concurrent voip applications, and
codel also tended to be late in recognizing stuff past the "burp". In
certain scenarios it didn't matter how much FQ we had when dealing
with superpackets.

Back in 2015, we had to put in commit
a5d28090405038ca1f40c13f38d6d4285456efee to get GSO even remotely
right.

Anyway...

After much gnashing of teeth and pulling of hair while trying to
selectively disable or split GSO on various pieces of consumer hw, in
the end we made gso-splitting the default in cake, rather than try to
turn off GRO everywhere it was needed, or revise the htb shaper so
fq_codel could be used more effectively with GSO in place. (As I said,
"tbf" does splitting; htb - which is what most sqm systems use - does
not.)

fq_codel_fast (which has SCE support) also splits GSO.

A while back I filed a bug on the L4S github site requesting that they
also implement GSO splitting and use memory limits, not packet limits.
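To put rough numbers on the superpacket "burp" described above, here is
a minimal Python sketch. It assumes 1514-byte frames on the wire and a
1448-byte MSS (neither figure comes from this thread), and both function
names are made up for illustration. The splitting helper only mimics the
idea of GSO splitting; it is not how cake or tbf actually implement it.

```python
def serialization_delay_ms(packets, bytes_per_packet, rate_bps):
    """Time (ms) to drain `packets` back-to-back packets at rate_bps.

    While one unsplit superpacket of this size is being sent, nothing
    else (e.g. a voip packet) can be interleaved with it.
    """
    return packets * bytes_per_packet * 8 / rate_bps * 1000.0


def split_superpacket(payload_len, mss):
    """Return the segment lengths a splitting qdisc would enqueue
    in place of one aggregated payload of `payload_len` bytes."""
    full, rest = divmod(payload_len, mss)
    return [mss] * full + ([rest] if rest else [])


if __name__ == "__main__":
    FRAME = 1514        # assumed bytes per packet on the wire
    MSS = 1448          # assumed TCP payload per segment
    RATE = 5_000_000    # the "say, 5mbit" shaped rate from the text

    for iw in (10, 20, 30, 42):
        delay = serialization_delay_ms(iw, FRAME, RATE)
        segments = split_superpacket(iw * MSS, MSS)
        print(f"IW{iw}: {delay:.1f} ms unsplit, "
              f"{len(segments)} segments after splitting")
```

Once the burst is split, the scheduler can interleave other flows
between the individual segments instead of waiting out the whole burst,
and codel sees per-packet sojourn times again.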
Given how fine grained either SCE or L4S need to be on their
signalling, my assumption (needing testing!) is that GSO/GRO/TSO need
to be disabled to get an expected result. With GSO/GRO/TSO on,
especially at lower rates, the results will be "interesting". Thus in
the field, with the final delivered codebase(s), it needs to be handled
properly by the qdisc, either way it goes.

Last year GSO went always on in Linux kernels, which generally means
that locally sourced tcp packets are always 2 packets in size or
larger. The popular sch_fq scheduler has a 2 packet quantum also. This
is great, if you are targeting 40GigE....

commit 0a6b2a1dc2a2105f178255fe495eb914b09cb37a
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Feb 19 11:56:47 2018 -0800

    tcp: switch to GSO being always on

    Oleksandr Natalenko reported performance issues with BBR without FQ
    packet scheduler that were root caused to lack of SG and GSO/TSO on
    his configuration.

    In this mode, TCP internal pacing has to setup a high resolution timer
    for each MSS sent.

    We could implement in TCP a strategy similar to the one adopted
    in commit fefa569a9d4b ("net_sched: sch_fq: account for schedule/timers
    drifts") or decide to finally switch TCP stack to a GSO only mode.

    This has many benefits :

    1) Most TCP developments are done with TSO in mind.
    2) Less high-resolution timers needs to be armed for TCP-pacing
    3) GSO can benefit of xmit_more hint
    4) Receiver GRO is more effective (as if TSO was used for real on sender)
       -> Lower ACK traffic
    5) Write queues have less overhead (one skb holds about 64KB of payload)
    6) SACK coalescing just works.
    7) rtx rb-tree contains less packets, SACK is cheaper.

    This patch implements the minimum patch, but we can remove some legacy
    code as follow ups.

    Tested:

    On 40Gbit link, one netperf -t TCP_STREAM

    BBR+fq:
    sg on:  26 Gbits/sec
    sg off: 15.7 Gbits/sec   (was 2.3 Gbit before patch)

    BBR+pfifo_fast:
    sg on:  24.2 Gbits/sec
    sg off: 14.9 Gbits/sec  (was 0.66 Gbit before patch !!! )

    BBR+fq_codel:
    sg on:  24.4 Gbits/sec
    sg off: 15 Gbits/sec  (was 0.66 Gbit before patch !!! )

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
    Signed-off-by: David S. Miller <davem@davemloft.net>

>
> - Tom
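
The per-MSS timer cost that the commit message above mentions can be
put in rough numbers. This is a back-of-the-envelope Python sketch, not
a measurement: it assumes a 1448-byte MSS (not stated in the commit),
takes the "about 64KB" skb payload from benefit 5, uses the 40Gbit link
speed as an upper bound, and the function name is made up for
illustration.

```python
def pacing_events_per_second(rate_bps, bytes_per_event):
    """How many paced transmit events per second are needed to sustain
    rate_bps when each event releases bytes_per_event bytes."""
    return rate_bps / (bytes_per_event * 8)


if __name__ == "__main__":
    RATE = 40e9              # 40Gbit link from the commit's test setup
    MSS = 1448               # assumed TCP payload per segment
    SKB_PAYLOAD = 64 * 1024  # "one skb holds about 64KB of payload"

    per_mss = pacing_events_per_second(RATE, MSS)
    per_skb = pacing_events_per_second(RATE, SKB_PAYLOAD)

    print(f"one timer per MSS:      {per_mss:,.0f} events/sec")
    print(f"one timer per 64KB skb: {per_skb:,.0f} events/sec")
    print(f"reduction factor:       {per_mss / per_skb:.1f}x")
```

The same aggregation that saves timers at 40GigE is what produces the
multi-packet bursts a low-rate shaper then has to split.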