[tsvwg] fq_codel deployment size

Dave Taht <dave@taht.net> Thu, 07 November 2019 17:51 UTC

From: Dave Taht <dave@taht.net>
To: "alex.burr@ealdwulf.org.uk" <alex.burr@ealdwulf.org.uk>
Cc: Greg White <g.white@cablelabs.com>, Jonathan Morton <chromatix99@gmail.com>, "tsvwg@ietf.org" <tsvwg@ietf.org>
References: <8321f975-dfe7-694c-b5cc-09fa371b9b61@mti-systems.com> <B58A5572-510E-42C7-8181-42A0BE298393@gmail.com> <D2E12331-F504-4D5F-B8E7-A1A5E98DDF7E@cablelabs.com> <2275E6A5-C8F8-477F-A24A-3E6168917DDF@gmail.com> <55F724CD-6E74-40D9-8416-D1918C2008DD@cablelabs.com> <BBE7C7A9-0222-4D84-BF27-8D5CAE2F995E@gmail.com> <6f189711-ffa0-90f4-fd16-3464ba4df3ce@mti-systems.com> <4A706B11-3239-4DAC-BE85-0B4BFF2D8FF8@heistp.net> <8B28ECE4-FF4B-4BB2-ACBE-80B30708F97E@cablelabs.com> <AAEA9AC2-B8A1-4837-A7C9-8EEA21A7C523@gmx.de> <D5D560CB-BC47-45BE-811E-E73E2D4909E3@cablelabs.com> <090EDC6E-7B69-401D-931D-E9C3101E68DD@gmail.com> <1053591400.593303.1573131284597@mail.yahoo.com>
Date: Thu, 07 Nov 2019 09:50:47 -0800
In-Reply-To: <1053591400.593303.1573131284597@mail.yahoo.com> (alex's message of "Thu, 7 Nov 2019 12:54:44 +0000 (UTC)")
Message-ID: <878sorh8y0.fsf_-_@taht.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/UyvpwUiNw0obd_EylBBV7kDRIHs>

"alex.burr@ealdwulf.org.uk" <alex.burr@ealdwulf.org.uk> writes:

> On Thursday, November 7, 2019, 3:35:06 AM GMT, Jonathan Morton
> <chromatix99@gmail.com> wrote: 
>
>> > - if the CoDel queue is upgraded to perform Immediate AQM on L4S
>> > flows, the latency spike can be largely avoided.
>
>> This is not relevant, since the point of this exercise is to
>> establish the extent of L4S' compatibility with existing, unmodified
>> networks, in which Codel happens to be the most widely deployed AQM.
>> You cannot reasonably expect the entire Internet to "upgrade" to L4S
>> compatible AQMs before you can safely begin deployment.
>
> I wonder if Codel actually is the most widely deployed AQM. This is

Codel is almost never deployed by itself.

fq_codel is the most widely deployed, enabled AQM and, so far as
I can tell, the only one with ECN enabled by default.

It is the default in Apple's iOS (you have to jailbreak your iPhone to
see it) on all interfaces, the default in OSX, and the default in nearly
every Linux distribution (the last to add it was RHEL8). One thing
I've lost a lot of sleep over is the heavy use of containers and
network namespaces nowadays (billions of instances), where I fear
fq_codel is the only thing keeping the internet from exploding, as
the box is essentially a "router" when used as such.

The wifi version is the default (in Linux) on QCA's ath9k and ath10k
chips, the mt76, and most recently, many of Intel's chips as well.

So, that's kind of over a billion boxes.

Merely counting up the major wifi vendors shipping a version: Google
Chromebooks, Google Wifi, Google Fiber, Evenroute, eero, Meter, Meraki,
TP-Link, Netgear, etc...

I hate to wave figures around without doing detailed research, but I can
comfortably say the wifi version is in the 10s of millions a year....

As for using fq_codel in QoS/SQM systems, it's got huge penetration, but
I doubt more than a fraction of a percent of users (outside the gaming
and clued business markets using ubnt and/or pfSense) enable it.

Free.fr shared details with me on a portion of their deployment, which
spans a few million DSL home routers. Of the 36,000 samples that I have,
the top-level bits were:

* It's on the uplink only, as the downlink invisibly multiplexes TV.
* It is configured in the 3 class QoS system so common in SQM,
  with fq_codel on each class: priority, best effort and background.
* They have a *tight* - BQL-like - interface to their DSL chip, so
  they don't need to shape at all: the system automagically gets
  pushback and copes with DSL problems, hence extremely low CPU costs.
  (I have extreme frustration with not seeing BQL deploy widely elsewhere)
* 48% had exerted a CE at some point in this boot cycle.
* A fairly typical drop and mark ratio is about 20/1 - meaning the
  majority of connections aren't using ECN.
* The customers are happy. Since 2012 :)
* They wish strongly that they could somehow get this into their fiber
  deployment; instead they use small fifos and hope for the best.

A randomly picked piece of data from their reference QoS
implementation looks like this. You can infer the speed of the uplink
from the size of the target variable (target is set to 1.5 MTU at their
link rate):

qdisc prio 1: root refcnt 2 bands 2 priomap  1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1
 Sent 25418296010 bytes 152822679 pkt (dropped 329184, overlimits 0 requeues 481924) 
 backlog 0b 0p requeues 481924 
qdisc fq_codel 8001: parent 21:1 limit 10240p flows 1024 quantum 1538 target 61.0ms interval 100.0ms ecn 
 Sent 25308497918 bytes 152490355 pkt (dropped 327079, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
  maxpacket 44385 drop_overlimit 0 new_flow_count 17308870 ecn_mark 20738
  new_flows_len 0 old_flows_len 9
qdisc drr 21: parent 1:2 
 Sent 25418296010 bytes 152822679 pkt (dropped 329184, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
qdisc fq_codel 8002: parent 21:2 limit 10240p flows 1024 quantum 1538 target 61.0ms interval 100.0ms ecn 
 Sent 109798092 bytes 332324 pkt (dropped 2105, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
  maxpacket 1500 drop_overlimit 0 new_flow_count 369 ecn_mark 0
  new_flows_len 1 old_flows_len 4
qdisc fq_codel 11: parent 1:1 limit 10240p flows 1024 quantum 1538 target 61.0ms interval 100.0ms ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
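
To make the "infer the speed from the target" remark concrete, here is
a minimal sketch in Python. It parses the fq_codel 8001 stanza quoted
above and works out the link rate at which 1.5 MTU takes one target to
serialize, plus the drop-to-mark ratio from the counters. The function
names and the 1500 byte MTU are assumptions made for illustration (the
quantum of 1538 suggests the on-wire frame may be larger), and the
regexes only handle a target printed in ms, as in this sample.

```python
import re

# Excerpt of the `tc -s qdisc` output quoted above (fq_codel 8001 only).
SAMPLE = """\
qdisc fq_codel 8001: parent 21:1 limit 10240p flows 1024 quantum 1538 target 61.0ms interval 100.0ms ecn
 Sent 25308497918 bytes 152490355 pkt (dropped 327079, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 44385 drop_overlimit 0 new_flow_count 17308870 ecn_mark 20738
  new_flows_len 0 old_flows_len 9
"""


def parse_fq_codel(text):
    """Pull target (ms), dropped and ecn_mark out of one fq_codel stanza."""
    target_ms = float(re.search(r"target ([\d.]+)ms", text).group(1))
    dropped = int(re.search(r"dropped (\d+)", text).group(1))
    ecn_mark = int(re.search(r"ecn_mark (\d+)", text).group(1))
    return {"target_ms": target_ms, "dropped": dropped, "ecn_mark": ecn_mark}


def implied_rate_bps(target_ms, mtu_bytes=1500, mtu_multiple=1.5):
    """Rate at which mtu_multiple MTU-sized packets serialize in target_ms.

    mtu_bytes=1500 is an assumption; the source only says "1.5MTU".
    """
    bits = mtu_multiple * mtu_bytes * 8
    return bits / (target_ms / 1000.0)


def drop_to_mark_ratio(dropped, ecn_mark):
    """Drops per CE mark; infinite if nothing was ever marked."""
    return dropped / ecn_mark if ecn_mark else float("inf")


stats = parse_fq_codel(SAMPLE)
rate = implied_rate_bps(stats["target_ms"])
ratio = drop_to_mark_ratio(stats["dropped"], stats["ecn_mark"])

print(f"implied uplink rate: {rate / 1000:.0f} kbit/s")
print(f"drop:mark ratio: {ratio:.1f}:1")
```

Under those assumptions this sample works out to an uplink of roughly
295 kbit/s and a bit under 16 drops per mark, the same ballpark as the
20/1 figure above.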

The router itself runs multiple services locally and its tcp is cubic
with ecn enabled.

I know of about 14 other ISPs that have done some deployment but they
aren't willing to talk about it. One was overjoyed to share data with me
(as most of their network was fully fq_codeled and even "caked" at the
peering points) until they ran some rrul tests and realized how bloated
(seconds) some of their wireless services were. I figure they'll get
back to me after they fix that....

I've been looking for someone to collaborate with on a paper on this
dataset for some time. (I've got another fun research project going on
which is looking at bufferbloat in coffeeshops. Aside from Google
Starbucks, it's pretty dismal worldwide.)

> not to dispute your point, which is correct - proposing an update to
> Codel doesn't address existing deployments.
> It's just that I'm aware that there is an awful lot of kit out there
> which has an implementation of RED (or WRED) in hardware - eg for
> TR-059 [1], which mandates WRED support in BRAS. That's from having
> worked at a silicon vendor, so it doesn't give me visibility of actual
> usage. I understand a lot of it isn't switched on, but if even a small

I wish it was switched on, but it seems too hard to tune for most.

> fraction of it was, it would be a pretty big deployment. It would be
> interesting if anyone has actual numbers. Of course, to be relevant

I have (in the US) only seen one (DSL) ISP that appears to have enabled
WRED in the BRAS, but my data set is limited.

> to this discussion, it would need to be configured to do ECN marking
> as well. 


> [1] https://www.broadband-forum.org/technical/download/TR-059.pdf
>
> Alex