Re: [aqm] L4S status update

Dave Täht <dave@taht.net> Wed, 30 November 2016 06:16 UTC

To: Jonathan Morton <chromatix99@gmail.com>, Bob Briscoe <ietf@bobbriscoe.net>
References: <be67928d-e1f7-2495-147d-1d42d6783cc8@bobbriscoe.net> <f6b89407-14d8-b532-b793-7490cb5a2117@kit.edu> <f16c9830-f97a-64e0-76e6-66f146576616@bobbriscoe.net> <627c9db2-a41f-917d-e639-c27df25bdc51@kit.edu> <39e0cfa7-2650-9d98-a933-7e7c016dc276@bobbriscoe.net> <E615574E-2C76-4BAB-9481-E5FCF84659B3@gmail.com>
From: Dave Täht <dave@taht.net>
Message-ID: <fd286d4f-66a9-1d40-bc44-4a52195c3482@taht.net>
Date: Tue, 29 Nov 2016 22:15:37 -0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0) Gecko/20100101 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <E615574E-2C76-4BAB-9481-E5FCF84659B3@gmail.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/aqm/tyHjy863nufij9MDmiKdfyV2J8o>
Cc: tcpm IETF list <tcpm@ietf.org>, AQM IETF list <aqm@ietf.org>, tsvwg IETF list <tsvwg@ietf.org>, "Bless, Roland (TM)" <roland.bless@kit.edu>, TCP Prague List <tcpPrague@ietf.org>
Subject: Re: [aqm] L4S status update


On 11/28/16 5:20 PM, Jonathan Morton wrote:

>> Even worse, in Jan 2017, I am told that fq_CoDel will become hard coded into the Linux WiFi drivers, without even a framework to dynamically load any alternative(s). Of course, we can add such a framework, but we are seeing Linux become the next major middlebox problem. It might be excusable if there were not sound alternatives available,... but there are.

A) As soon as the need for a pluggable framework arises, the linux wifi
architecture is now simple enough to add one. As it is, the results are
so spectacular compared to the previous "std" behavior that it will take
some doing to prove a different point. Reductions in latency under load
of over an order of magnitude, bandwidth improvements of 5x, stuff like
that... ( https://lwn.net/Articles/705884/ )

Given that this effort took 3 years out of multiple people's lives, and
a year of slow, steady patching out of the old wifi engine, with
thousands of hours of testing and refinement, and a summer lost to
fixing bugs, on next to no budget...

http://blog.cerowrt.org/post/crypto_fq_bug/

I hope that more people give it a go soon (most of it is already
available in nightly builds of lede-project for ar71xx-based platforms,
and nearing shipment elsewhere in things like the Turris Omnia). There
is at least one bug left to fix, always.

The ECN support works beautifully, in particular, and this is the first
time in my life, since working on wifi in 1998, that I'd feel
comfortable deploying a single-AP, multiple-station solution in a
commercial multi-customer environment - (after a bit more testing in the
yurtlab campground testbed) - before now, only p2p was sanely saleable.

B) Secondly, FQ technology has always existed in wifi - many WISP
vendors altered the MAC to behave via TDMA - or called something TDMA
that was really SFQ. Some, like ubnt, have even gone so far as
implementing full duplex over separate channels in their high end
airstream products, on top of that. FQ has long been a part of meraki's
product line. I don't know how cisco does their magic, but...

C) Airtime fairness first appeared as an optional feature in cisco and
at least one other manufacturer's gear at least 6 years back. The
problem it solves was first described in 2003.
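
For anyone who has not run into the problem airtime fairness addresses,
here is a minimal back-of-the-envelope sketch. It is idealized (no MAC
overhead, retries, or aggregation), the function names are mine, and the
PHY rates are made-up illustrative numbers, not measurements. It only
compares "every station gets an equal number of equal-sized packets"
against "every station gets an equal share of the air".

```python
def packet_fair_throughput(rates_mbps):
    """Per-station throughput when stations take turns sending one
    equal-sized packet each. One round costs sum(1/rate) time per bit,
    so every station ends up with the same (low) throughput."""
    time_per_bit = sum(1.0 / r for r in rates_mbps)
    return [1.0 / time_per_bit for _ in rates_mbps]


def airtime_fair_throughput(rates_mbps):
    """Per-station throughput when each station gets 1/n of the airtime
    and uses it at its own PHY rate."""
    n = len(rates_mbps)
    return [r / n for r in rates_mbps]


if __name__ == "__main__":
    # Hypothetical rates: two fast stations and one slow one.
    rates = [150.0, 150.0, 6.0]
    print("packet-fair :", packet_fair_throughput(rates))
    print("airtime-fair:", airtime_fair_throughput(rates))
```

Under these assumptions the single slow station drags everyone down to
roughly its own speed in the packet-fair case, while in the airtime-fair
case it only costs the others its share of the air.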

So it is ironically the inclusion of an aqm for the first time that is
the most novel feature of this make-wifi-fast work (on top of fq, and
with airtime fairness), integrated tightly. Very little work on aqm
technology prior to now has been applied to highly contended, half
duplex, lossy layer 2 medium systems, and I expect there will be much
more work to be done here.
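
To make "airtime fairness" a bit more concrete, here is a toy sketch of
one way to do it: deficit round-robin where the deficit is counted in
microseconds of airtime instead of bytes. This is an illustration under
my own assumptions (the class, the function names, the quantum and the
rates are all hypothetical), not the code in the patchset, and it leaves
out the per-station fq and aqm entirely.

```python
from collections import deque


class Station:
    def __init__(self, name, rate_mbps):
        self.name = name
        self.rate_mbps = rate_mbps   # assumed fixed PHY rate
        self.queue = deque()         # pending packet sizes, in bytes
        self.deficit_us = 0.0        # airtime credit, in microseconds


def airtime_us(size_bytes, rate_mbps):
    # bits / (Mbit/s) gives microseconds; ignores all MAC overhead
    return size_bytes * 8 / rate_mbps


def schedule(stations, quantum_us, rounds):
    """Run `rounds` passes of airtime-based deficit round-robin.
    Returns the airtime (in microseconds) consumed by each station."""
    used = {s.name: 0.0 for s in stations}
    for _ in range(rounds):
        for s in stations:
            if not s.queue:
                s.deficit_us = 0.0   # idle stations do not bank credit
                continue
            s.deficit_us += quantum_us
            # Send while the head packet fits in the remaining credit.
            while s.queue and airtime_us(s.queue[0], s.rate_mbps) <= s.deficit_us:
                cost = airtime_us(s.queue.popleft(), s.rate_mbps)
                s.deficit_us -= cost
                used[s.name] += cost
    return used


if __name__ == "__main__":
    fast = Station("fast", 150.0)
    slow = Station("slow", 6.0)
    for s in (fast, slow):
        s.queue.extend([1500] * 1000)   # backlog of 1500-byte packets
    print(schedule([fast, slow], quantum_us=2000.0, rounds=10))
```

With both stations backlogged, each one ends up with the same amount of
airtime, so the fast station simply gets many more packets through in
its share.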

So... it is far from "just linux" you need to deal with - and, within
linux, far less than all of it - the initial patchset is for the ath9k
chipset only, with the ath10k running a bit behind, and the mt76 still
floating in limbo. We are still in search of other chipsets worth
optimizing, with the hope we'll find something that ships in android
devices, as well as at least one other 802.11ac chipset. From a queue
management perspective there is mu-mimo next (which requires gang
scheduling of up to 16 stations), reducing txop size under contention,
and dealing with currently excessive layer 2 retries sanely under
contention.

BUT: from an engineering perspective, there are two other things wrong
with wifi that are more severe than these latter problems. The first
one - excessive channel scans - was seriously detrimental to voip and
videoconferencing and the air around the user - and is on its way to
being improved - bug and pending fix described here:

http://blog.cerowrt.org/post/disabling_channel_scans/

The second huge problem is that rate control - now that there is such a
large search space in wireless-n and ac - is no longer as quickly
effective as it needs to be, taking 10-20sec or more to settle in on a
good rate, when 100ms would be ideal. Work on fixing that (which also
leverages all the work above) is hopefully coming along again
(google for minstrel-blues for some of the more public details).

These last four problems are much more severe than coming up with a
different fq, aqm, or airtime fairness algorithm is, and thus, most
worth focusing on next. Hopefully - someone *else* focusing on next -
I'm tired of working on microsecond timescales and would like to find a
new gig that relied on no timescale shorter than a season.

...

Certainly I hope that by january an ath9k laptop and an ath9k AP will
behave better than any linux system has since 802.11b, and will
certainly be superior to current LTE in all respects - but it will
probably be another 3-6 months before the major distros pick it up, and
we have not identified another common laptop wifi chipset that can be
improved in these ways.

(the core work mostly applies to APs, and does help a lot no matter the
client chipset or OS. I also hope to find someone working on an enodeb
to start applying the same algorithms in this timeframe)


> This tight integration is because it was necessary to solve some serious, long-standing problems with Linux wifi, which couldn’t be solved satisfactorily at the qdisc layer because information about wifi-specific things was needed - and there were *no* practical alternatives which actually solved the problem, otherwise we’d have used them.
> 
> Wifi is also a last-mile technology, and it is often the bottleneck in several types of practical deployment.  Large conferences are a particular example.  I’m rather looking forward to seeing the first large conference to deploy the new Linux wifi stuff, and seeing whether it has made the typical load there easier to cope with.  It probably has.

Hopefully the upcoming SCALE conference will be our first major
deployment test. If not, battlemesh, wlan slovenija, or sudoroom.

Or for all I know, open-mesh, eero, portal, meraki, and/or plume have
already picked up the code.


> 
>  - Jonathan Morton
> 