Re: [aqm] Last Call: <draft-ietf-aqm-fq-codel-05.txt> (FlowQueue-Codel) to Experimental RFC

Bob Briscoe <research@bobbriscoe.net> Mon, 21 March 2016 18:04 UTC

To: Toke Høiland-Jørgensen <toke@toke.dk>
References: <20160303172022.12971.79276.idtracker@ietfa.amsl.com> <56EBDA04.3020500@bobbriscoe.net> <8737rox89h.fsf@alrua-desktop.borgediget.toke.dk>
From: Bob Briscoe <research@bobbriscoe.net>
Message-ID: <56F037AE.90608@bobbriscoe.net>
Date: Mon, 21 Mar 2016 18:04:30 +0000
In-Reply-To: <8737rox89h.fsf@alrua-desktop.borgediget.toke.dk>
Archived-At: <http://mailarchive.ietf.org/arch/msg/aqm/y_Fdc-lVcLWmdfe0wueq2OQcY4M>
Cc: mls.ietf@gmail.com, draft-ietf-aqm-fq-codel@ietf.org, aqm@ietf.org, ietf@ietf.org, aqm-chairs@ietf.org
Subject: Re: [aqm] Last Call: <draft-ietf-aqm-fq-codel-05.txt> (FlowQueue-Codel) to Experimental RFC

Toke,

Sorry for not yet sending the follow-up. Straight after that email, I 
got roped into becoming a makeshift ambulance driver and then ... long 
story...

Thanks for taking my comments constructively, as intended. Responses 
embedded.

On 18/03/16 12:47, Toke Høiland-Jørgensen wrote:
> Hi Bob
>
> Thank you for your timely and constructive comments. Please see the
> inline responses below.
>
>> My main concern is with applicability. In particular, the sentence in
>> section 7 on Deployment Status: "We believe it to be a safe default
>> and encourage people running Linux to turn it on: ...". and a similar
>> sentiment repeated in the conclusions. "and we believe it to be safe
>> to turn on by default, as has already happened in a number of Linux
>> distributions."
>>
>> Can one of the authors explain why a solution with the limitations in
>> section 6 can still be described as "safe"?
> "We believe it to be a safe default" means that we have not seen any of
> the theoretical limitations we have documented in section 6 be a concern
> *in practice* in any of the extensive number of deployments FQ-CoDel has
> seen already. And that the benefits of turning on FQ-CoDel are
> sufficient that nudging people in that direction is a good idea.
This is perhaps because "we" (ie the people looking) tend to have 
significantly more bandwidth than the majority of Internet users (those 
in the developing world). When you have less bandwidth, long-running 
flows last longer, so they tend to overlap more. Given bloat problems 
are only seen intermittently in the first place [Hohlfeld14], the 
average person isn't going to see these limitations very often. But if 
you are a homeworker using a VPN (for instance), you will be dogged by 
these problems all the time.
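
To put rough numbers on the "less bandwidth, more overlap" point, here is a back-of-envelope sketch. The file size, link rates and start rate are made-up illustrative values, not measurements, and the estimate ignores the extra slowdown that overlapping flows cause each other (so it understates the effect on the slow link).

```python
def transfer_seconds(size_megabytes, link_mbps):
    """Time to move one file over an otherwise idle link."""
    return size_megabytes * 8 / link_mbps


def mean_concurrent_flows(size_megabytes, link_mbps, starts_per_second):
    """Little's law: mean flows in progress = start rate * mean duration.

    Assumes each flow gets the whole link, which is optimistic.
    """
    return starts_per_second * transfer_seconds(size_megabytes, link_mbps)


# Hypothetical workload: one 10 MB transfer started every 5 minutes.
SIZE_MB = 10
STARTS_PER_SECOND = 1 / 300

fast = mean_concurrent_flows(SIZE_MB, 100, STARTS_PER_SECOND)  # 100 Mb/s link
slow = mean_concurrent_flows(SIZE_MB, 2, STARTS_PER_SECOND)    # 2 Mb/s link

print(f"100 Mb/s: {transfer_seconds(SIZE_MB, 100):.1f} s per transfer, "
      f"{fast:.4f} flows in progress on average")
print(f"  2 Mb/s: {transfer_seconds(SIZE_MB, 2):.1f} s per transfer, "
      f"{slow:.4f} flows in progress on average")
```

With the same workload, each transfer lasts 50 times longer on the slow link, so the chance that another flow is already in progress when a new one starts rises by about the same factor.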

So the main problem here is with the assumption that the test has to be 
"whether we observe these limitations in practice".

Few people observed problems with NATs at the time they were introduced 
(otherwise they wouldn't have sold successfully). So those arguing 
against them tended to be ignored by mainstream comms engineers. But 
then the "theoretical" limitations started to bite. And we ended up 
having to make do with a subset of the potential of the Internet. Those 
sounding the warning bells could see the potential of the Internet, and 
they could see how NATs would close that off. Those ignoring the warning 
bells believed they were right to only be concerned with the here and now.

My concern is about precluding future desirable developments in 
application behaviour. It will be rare to observe such cases by random 
inspection; they may not appear while using existing applications on 
existing high speed links. But, they will occur very frequently in 
scenarios prone to them. That's often the nature of side-effects.

My concern is particularly about fq technology in the network precluding 
improvements in the quality of regular best efforts service that we can 
expect through changes in applications and transports alone.

When I was arguing against FQ_CoDel (back in 2013 at the latency 
workshop - you were there too), numerous people were saying that 
FQ_CoDel is much more subtle than regular FQ. At which point I quietened 
down, because I trusted enough of those people. However, in the recent 
tests with HAS (criticised at length elsewhere), one thing that can be 
said with certainty was that FQ_CoDel just becomes a regular fq 
scheduler when you have two or more long-running flows that can always 
keep their queues from emptying. Whatever instantaneous rate the 
application tries to run at, FQ overrides it and runs at 1/N of the 
capacity. That is not good for a video coming off a camera at a variable 
information rate. FQ skims off all the peaks, so the VBR codec adapts 
down to the worst-case peak rate, not the worst-case average rate.
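
A toy sketch of the peak-skimming effect follows. It is not FQ_CoDel itself: it models an idealised per-flow fair scheduler as a max-min allocation in each time interval, and it assumes a real-time source that cannot defer a peak to a later interval (no queueing is modelled). The link capacity and the video's demand trace are invented for illustration.

```python
def fq_share(demands, capacity):
    """Max-min fair split of `capacity` among flows with the given demands."""
    alloc = [0.0] * len(demands)
    remaining = capacity
    # Serve the smallest demands first; unused share passes to the others.
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for k, i in enumerate(order):
        share = remaining / (len(order) - k)
        alloc[i] = min(demands[i], share)
        remaining -= alloc[i]
    return alloc


CAPACITY = 10.0               # Mb/s, hypothetical bottleneck
GREEDY = float("inf")         # a long-running flow that takes all it can get

# Hypothetical VBR video demand per 1-second interval (Mb/s).
video_demand = [2, 3, 8, 2, 7, 2]

# Rate the video flow actually receives in each interval.
delivered = [fq_share([d, GREEDY], CAPACITY)[0] for d in video_demand]

mean_demand = sum(video_demand) / len(video_demand)
print("fair share per flow:", CAPACITY / 2, "Mb/s")
print("mean video demand:  ", mean_demand, "Mb/s")
print("demanded per second:", video_demand)
print("delivered per second:", delivered)
```

In this trace the video's mean demand (4 Mb/s) is below its 5 Mb/s fair share, yet the 8 and 7 Mb/s peaks are both clipped to 5, so only 19 of the 24 Mb demanded get through on time. To avoid that, the codec has to hold its peaks, not its average, under 1/N of the capacity.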

>> Indeed, these sentences seem rather Orwellian.
> I can assure you that we are not attempting to exert "draconian control
> by propaganda, surveillance, misinformation, denial of truth, and
> manipulation of the past" (quoting
> https://en.wikipedia.org/wiki/Orwellian here). But thank you for
> implying it :)
Well, stating the limitations in the draft, then denying their truth in 
the conclusions by using the word "safe" to describe them is classic 
Orwellian Newspeak 
<https://en.wikipedia.org/wiki/Newspeak#To_control_thought>.
>
>> Would it not be correct instead to say that FQ_CoDel has been made the
>> default in a number of Linux distributions despite not being safe in
>> some circumstances?
> At the time it was made the default in OpenWrt (several years ago now,
> if memory serves me right), there was not a whole lot of real-world
> deployment experience, due to the chicken-and-egg problem of not wanting
> to change the default before we have gathered more experience. However,
> today the situation is quite different, thanks in part to the boldness
> of the OpenWrt devs. So no, I do not believe that to be the case any
> longer.
The experience that led me to understand this problem was when a bunch 
of colleagues tried to set up a start-up (a few years ago now) to sell a 
range of "equitable quality" video codecs (ie constant quality variable 
bit-rate instead of constant bit-rate variable quality). Then, the first 
ISP they tried to sell to had WFQ in its Broadband remote access 
servers. Even though this was between users, not flows, when video was the 
dominant traffic, this overrode the benefits of their cool codecs (which 
would have delivered twice as many videos with the same quality over the 
same capacity).

Now, by your test, you will never see the limitations these videos 
suffered. Because they never got developed. Because the developers gave 
up. You can think of FQ_CoDel as nice well-meaning people (the Linux 
community) creating a new middlebox problem.

>
>> 2. Default?
>>
>> If a draft saying "We believe it to be a safe default..." is published as an
>> RFC, it means "The IETF/IESG/etc believes..."
>> Only one solution can be default, so if the IETF says that FQ_CoDel is a safe
>> default, and no other AQM RFC makes any claim to being a safe default (which
>> they do not at the moment), it could be read as the IETF recommending FQ_CoDel
>> for default status and, by implication, other AQMs (like PIE, say) are not
>> recommended for default status.
> This is certainly not my reading. This is an experimental RFC saying "we
> believe it to be safe as a default" not a standards track RFC saying
> "this should be the default". This is an important difference; we are
> not mandating anything, but rather expressing our honest opinion on
> the applicability of FQ-CoDel as a default, should anyone wish to make
> it one in their domain.
>
>> As far as I know, unlike the listed FQ_CoDel limitations, no
>> limitations of PIE have been identified. I don't think anyone is
>> claiming that the performance of FQ_CoDel is awesomely better than
>> PIE. May be a bit better, may be a bit worse, depending on
>> circumstances, and depending on which you value most out of low
>> queuing delay, high utilization, or low loss.
> Well, for CoDel and PIE that is certainly true. But FQ-CoDel in many
> cases reduces latency under load by an order of magnitude compared to
> both of them, while improving throughput.
OK, I have seen such figures, and it makes sense that FQ will give 
single RTT flows very low latency.

My concern is that of course the IESG will want to sign off an RFC with 
this cool performance, given they read that the limitations are not 
important. Whereas I believe the limitations have been downplayed.
>
>> So, if the authors want the IETF to recommend a default AQM on the
>> basis of safety (and I agree safety is the most important factor when
>> choosing a default), the most likely candidate would be PIE, wouldn't
>> it? FQ_CoDel has unintended side-effects, which implies it is not a
>> good candidate for default; it should only be configured deliberately
>> by those who can live with the side-effects.
> I'm not sure it would be possible for the AQM group to agree on a
> recommendation for a default. But I suppose it might be a good
> bikeshedding exercise. And as noted above, this is not what we intend to
> do in this case.
If we don't want the IETF (or the AQM WG) to make this call, we should 
make it clear that we are not making this call.

My concern is that, years down the line, when the context has been lost, 
these sentences could be interpreted as making this call.
For comparison, consider how we have been trying to understand what 
RFC2309 (the RED manifesto) intended to say.

>
>> 3. A Detail
>>
>> I also have a concern about the way the limitations are written
>> (typically, each limitation is stated, followed by a arm-waving
>> qualification attempting to create an impression that there is not
>> really a limitation). To keep the thread clean, I'll send that in a
>> follow-up email.
> It is certainly not our intention to "create an impression that there is
> not really a limitation". Rather, we are trying to suggest ways in which
> each limitation can be mitigated by people who are concerned about it,
> but still want to realise the benefits of deploying FQ-CoDel. Sure, some
> of those proposals are not exactly at the "running code" stage, but
> dismissing them as arm-waving is hardly fair.
>
> I'll add, as I noted initially, that many of the limitations we have
> noted are of a theoretical nature (in the sense that we are not aware of
> any deployments where they have caused issue in practice). This does not
> make it any less important to document them, of course, and we have been
> grateful for the feedback from the working group that the section grew
> out of (you yourself were among the people providing this feedback, I
> believe). However, this also means that it is difficult to do more than
> point out each issue. We can't quantify them, for instance.
>
> If you have concrete suggestions for language that would make things
> clearer, do tell (though I suppose that's exactly what you'll do in your
> follow-up mail). :)
See the next email (like I promised before).

Cheers


Bob


[Hohlfeld14] Hohlfeld, O., Pujol, E., Ciucu, F., Feldmann, A. & Barford, 
P., "A QoE Perspective on Sizing Network Buffers," In: Proc. Internet 
Measurement Conf (IMC'14) pp.333-346 ACM (November 2014)

>
> -Toke

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/