Re: [tsvwg] Traffic protection as a hard requirement for NQB

Bob Briscoe <> Mon, 09 September 2019 23:24 UTC

To: Jonathan Morton <>
Cc: "Black, David" <>, Sebastian Moeller <>, "" <>
From: Bob Briscoe <>
Date: Tue, 10 Sep 2019 00:24:31 +0100


On 06/09/2019 01:12, Jonathan Morton wrote:
>> On 6 Sep, 2019, at 2:13 am, Bob Briscoe <> wrote:
>> [BB]: Picky point: I said NQB has 3ms of physical buffer.
> Which you explicitly defined at a 120Mbps drain rate.  But with both BE and NQB queues saturated, each will drain at only 60Mbps, so the same amount of NQB buffer works out to 6ms in that case.
> It would only remain at 3ms if there was no BE traffic competing while the NQB queue was saturated - a somewhat unlikely scenario if the NQB marking is used as intended.

[BB] You're right that I said it like this, and I was wrong.

A better way of implementing this would be a 3ms limit on sojourn time, 
which would remain at 3ms whatever the drain-rate.
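To be concrete, here's a toy sketch (Python as pseudocode; the names are mine, not any real qdisc API) of a sojourn-time limit checked on dequeue. Each packet is stamped on enqueue; on dequeue, anything that has sat in the queue longer than 3ms is dropped:

```python
import collections
import time

SOJOURN_LIMIT = 0.003  # 3 ms, expressed in time, so independent of drain rate


class NqbQueue:
    """Toy NQB queue with a sojourn-time limit instead of a byte limit."""

    def __init__(self, now=time.monotonic):
        self.now = now                 # injectable clock, for testing
        self.q = collections.deque()   # entries: (enqueue_timestamp, packet)

    def enqueue(self, pkt):
        self.q.append((self.now(), pkt))

    def dequeue(self):
        while self.q:
            t_enq, pkt = self.q.popleft()
            if self.now() - t_enq <= SOJOURN_LIMIT:
                return pkt
            # packet has been queued longer than 3 ms: drop it, try the next
        return None
```

Unlike a physical (byte) buffer limit, nothing here depends on how fast the queue happens to be draining: halve the drain rate and the limit is still 3ms.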
>> Yes, this is a possible attack, altho some picky details are questionable, e.g. I think BBRv2 will run significantly under 60Mb/s because of the interaction between its bandwidth probes and the very shallow buffer (that can be tried experimentally). But in general I admit there will be a gain to the attacker here.
>> However, you cannot put this solely down to incentive misalignment.
> The gain to the attacker *is* the incentive - which I did my best to illustrate through the narrative device of NUR, their motivations, and the evolution of their methods of maximising throughput for themselves.
[BB] Yes, I've said you are right that there is an incentive. But here 
I'm saying you have constructed motives that are not /solely/ due to 
self-interest.

With bandwidth sharing (and in game theory and economics more generally) 
it's normal to deal with self-interest separately from malice, because 
self-interest is pervasive, whereas setting out to deprive others 
irrespective of gain for yourself is not so common.

For instance, if A gets 10Mb/s more than B, any rational individual is 
expected to derive value from the extra 10Mb/s they got. However, it's 
less common for someone to also derive additional value from the mere 
fact that B got less. That's the preserve of sociopathic individuals or 
where A and B happen to be business competitors at a higher layer (in 
addition to the basic lower layer contention for network resources).

The NUR scenario you've constructed contains a mix of self-interest and 
malice. I'm not saying that makes the scenario invalid, just that the 
draft seems to be trying to follow the more usual approach of separating 
the two types of motivation.
> Another form of gain would be if an attacker wanted to *degrade* service in the NQB category specifically.  This does not require that they gain throughput for themselves, only that latency and/or packet loss become unacceptable for legitimate NQB traffic.  This could be done with a DDoS in which most (but not all) of the flood traffic is marked NQB.  This would ensure that the BE queue is saturated (reducing NQB's share to 50% and pushing the latency up to 6ms) and force NQB traffic to share the drop rate required to fit the NQB-marked flood traffic down the remaining half.
>> And you're basing NUR's attack on BBRv2's exploitation of a min loss threshold (for whatever value of threshold is chosen), as if BBRv2 cannot be the cause of the tragedy of the commons because Google asserts it is not, then applying it to an NQB queue allows you to blame NQB for the tragedy. Rather a disingenuous argument isn't it?
> I have not yet had the opportunity to try out BBRv2 in practice and in competition with other CC algos.  However, it is a reasonable choice for NUR simply because it achieves near-perfect link utilisation when uncontended.
[BB] That is rather starry-eyed. I believe that would only work with a 
deeper buffer and stable conditions. We'd have to try it, but if the 
bandwidth probes hit a shallow buffer (experiencing high loss), I 
believe BBRv2 is designed to run at lower utilization than with a deep 
buffer.
> I specifically conceived NUR as a plausible RFC-ignorant actor, concerned only with their own performance.  But they are starting with established leading-edge technology and "tuning" it to their needs, rather than simply blasting packets without CC at all.  The latter would probably break their own CDN in short order - it's possible they tried it, and backed off when they saw the problems it caused themselves.
> NUR's subsequent tweak to accept 10% loss is *not* attributable to Google
[BB] I didn't say that. I said that the starting assumption, that 
loss below some level can be ignored, is attributable to Google.
> - they chose a 1% default for good reason - but is simply NUR's own sociopathy showing itself when they started to encounter contention on some last-mile links.  Because BBRv2 understands ECN (to some extent, anyway),
[BB] My strawman NQB qdisc does not support any form of AQM, let alone 
ECN. Why do you assume that NQB gives out ECN signals?

I am trying to check that NQB stands on its own merits, with no 
dependency on L4S.

> it would not run into significant loss when encountering the NQB-aware qdisc you described while still marked BE.
[BB] Eh? How does BBRv2 get classified into the NQB queue if its DSCP is 
still marked BE?
> Because it is the sender that chooses what marking to apply and which CC algo to employ, users at the receiving end have little choice but to accept it - aside from discarding the rest of their paid-for subscriptions and moving to a competitor.
> Meanwhile, the tragedy of the commons mainly occurs when other people start copying NUR's example.  Until then, the bad effects are confined to the relatively short periods when one of NUR's videos is being downloaded in the same household, *and* the periodic bursts in which BBRv2 is conducting a bandwidth probe.
>> Wouldn't it be sufficient to use a Cubic flow or even Reno with NQB, which would get about 75% of 60Mb/s, which would be higher than getting 1/24 of 120Mb/s with QB. It would even be higher than 33% of 120Mb/s if there were only two long-running flows in the QB queue.
> Yes, and that is what later movers following NUR's example might notice, if they bother to actually measure the effect, which most might not - they're just blindly copying some tweak they don't really understand.  I'm not sure that it really helps your argument.
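[BB] For the record, the arithmetic behind the figures I quoted, assuming the strawman 120Mb/s link and taking ~75% as an assumed (not measured) utilization for Cubic/Reno against the shallow NQB buffer:

```python
LINK = 120.0  # Mb/s, strawman bottleneck rate

# One flow marked NQB while competing BE traffic saturates the other queue:
# equal-weight WRR gives each queue half the link.
nqb_drain = LINK / 2                  # 60 Mb/s drain rate for the NQB queue
cubic_in_nqb = 0.75 * nqb_drain       # 45 Mb/s at an assumed ~75% utilization

# The same flow left as QB, sharing with 23 other long-running flows:
qb_1_of_24 = LINK / 24                # 5 Mb/s

# Or sharing the QB queue so that it gets a third of the link:
qb_one_third = LINK / 3               # 40 Mb/s

assert cubic_in_nqb > qb_1_of_24
assert cubic_in_nqb > qb_one_third
```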
My argument is against saying NQB MUST implement traffic protection. I'm 
happy to help improve understanding of the incentives around NQB, but my 
argument concerns IETF procedure, irrespective of whether there are 
effective attacks or incentives.

You've not addressed my argument yet. I was hoping you might at some point.
>> Nonetheless, you have not considered the question of how often NUR's tweak will make throughput worse by using NQB, in all the cases where the QB queue has 0 or 1 long-running flow in it (likely the more prevalent cases for most users). Wouldn't users report this in the forums which would put others off trying it out?
> If there's 0-1 BE flows contending for the link, then NUR will get similar throughput in either case when using BBRv2, because WRR allows either child queue to grow into capacity left unused by the other, and BBR tries to keep the bottleneck queue empty on average.
[BB] In your "starry-eyed" view of BBRv2, it perfectly utilizes 
bandwidth no matter how shallow the buffer. Given your argument depends 
on this, you ought to check. I think you'll find it underutilizes when 
the buffer is shallow, which will make throughput worse in the common 
cases with 0 or 1 long-running QB flow.
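For clarity, the work-conserving WRR behaviour we're both assuming can be sketched like this (toy Python, equal weights, packet-granularity service slots; not any real scheduler's API). Each backlogged queue gets half the slots; a slot declined by an empty queue passes straight to the other:

```python
def wrr_serve(backlogs, slots):
    """Equal-weight round-robin over named queues, work-conserving:
    a service slot skipped by an empty queue is offered to the next queue
    rather than wasted. Returns packets served per queue."""
    served = {name: 0 for name in backlogs}
    names = list(backlogs)
    i = 0
    for _ in range(slots):
        for _ in range(len(names)):          # look for a backlogged queue
            name = names[i % len(names)]
            i += 1
            if backlogs[name] > 0:
                backlogs[name] -= 1
                served[name] += 1
                break
        else:
            break                            # all queues empty: stop early
    return served


# Both queues saturated: each drains at half the link rate.
assert wrr_serve({"nqb": 100, "qb": 100}, 100) == {"nqb": 50, "qb": 50}

# QB queue empty: NQB grows into the whole link (work conservation).
assert wrr_serve({"nqb": 100, "qb": 0}, 100) == {"nqb": 100, "qb": 0}
```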

> This might be less true if some other CC algo was used, but I note that the 6ms capacity of the NQB side is not very different from the 10ms target of the BE side, unless the path latency is exceptionally low.
[BB] You can note that 6ms is close to 10ms if you want, but you're 
comparing apples with mashed potatoes. The strawman configuration was as 
follows:

               NQB    QB
Buffer limit   3ms    200ms
AQM target     n/a    10ms

A buffer limit (absolute cap to any variations) is not comparable to an 
AQM target (long-running convergence).

Only the 200ms is dependent on the drain rate. 10ms is absolute, and so 
is 3ms now I've suggested it would be a sojourn limit.
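To spell out the drain-rate dependence (assuming the 200ms figure means a buffer sized in bytes at the full 120Mb/s rate):

```python
def ms_to_bytes(ms, rate_mbps):
    """Queue depth in bytes corresponding to a given delay at a given drain rate."""
    return ms / 1000.0 * rate_mbps * 1e6 / 8


# A 200 ms buffer limit sized at 120 Mb/s is a fixed number of bytes...
qb_bytes = ms_to_bytes(200, 120)                  # ~3,000,000 bytes
assert abs(qb_bytes - 3_000_000) < 1

# ...so when the drain rate halves to 60 Mb/s, the same bytes are worth 400 ms:
delay_at_60 = qb_bytes / (60e6 / 8)
assert abs(delay_at_60 - 0.400) < 1e-9

# By contrast, a 3 ms sojourn limit and a 10 ms AQM target are defined
# directly in time, so they stay at 3 ms and 10 ms at any drain rate.
```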

Your scenario was potentially enlightening, but can I ask that you 
reconstruct it, this time using our new mutual understanding of the 
strawman qdisc. Please also separate out the self-interest and malice 
parts of the motivations and please consider a range of likely 
background traffic, not just the one that makes your scenario work.
>> More generally, attempts to take more throughput at the expense of other applications of the same customer, have come and gone over the years, and so far none has taken hold. Again, probably because users report the negative effects on your own other apps in the forums.
> As I noted, NUR's attitude here is "who cares about other traffic while they're watching our videos; they can get to it when they're done with us".  Their customer support hotline is outsourced to Bangalore, and involves a rigid script being followed in a very heavy accent, and no access to any means of actually solving the customer's problem.
> I'm sure you've encountered a similar setup in the real world.
[BB] Similar setups have come and gone, but none has endured so far in 
the real world. I also used to continually push this 'project fear' 
line, but I decided to take time out to consider why none of these 
setups had endured.

I suspect it's for two reasons: i) network schedulers control 
inter-customer capacity; ii) flows have two ends.

Because of i), setups like NUR only alter intra-customer shares of 
capacity. Because of ii), altho service providers choose congestion 
controllers, end-customers (at the other end) choose service providers.

Whenever a service provider takes a larger share, it can only be at the 
expense of other service providers being used by the same end-customer. 
So, end-customers have no reason to prefer the NURs of this world, which 
wither and die.

>> My argument is that implementers can decide whether traffic protection is worthwhile, and it's not the IETF's place to tell them to.
> But the IETF gets to say whether your specification is approved for publication as an RFC.  As part of that decision, they must consider the possible negative effects on the Internet as well as the benefits.  That's the process we're going through right now, and which I am attempting to help inform.
[BB] Understood and thank you. However, the IETF tries not to 
over-constrain an implementation, beyond the needs of interoperability. 
That's a major difference between the roles of IETF and Linux. Mandating 
security over-constrains implementers wherever the security concern is 
not universal.


>   - Jonathan Morton

Bob Briscoe