Re: [tsvwg] Traffic protection as a hard requirement for NQB

Sebastian Moeller <moeller0@gmx.de> Tue, 10 September 2019 09:39 UTC

From: Sebastian Moeller <moeller0@gmx.de>
Date: Tue, 10 Sep 2019 11:38:38 +0200
Cc: Jonathan Morton <chromatix99@gmail.com>, "Black, David" <David.Black@dell.com>, "tsvwg@ietf.org" <tsvwg@ietf.org>
To: Bob Briscoe <ietf@bobbriscoe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/wgAR8PwhxIGiHqTo6bKeXxyjnYY>
Subject: Re: [tsvwg] Traffic protection as a hard requirement for NQB


> On Sep 10, 2019, at 01:24, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> Jonathan,
> 
> On 06/09/2019 01:12, Jonathan Morton wrote:
>>> On 6 Sep, 2019, at 2:13 am, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>> 
>>> [BB]: Picky point: I said NQB has 3ms of physical buffer.
>> Which you explicitly defined at a 120Mbps drain rate.  But with both BE and NQB queues saturated, each will drain at only 60Mbps, so the same amount of NQB buffer works out to 6ms in that case.
>> 
>> It would only remain at 3ms if there was no BE traffic competing while the NQB queue was saturated - a somewhat unlikely scenario if the NQB marking is used as intended.
> 
> [BB] You're right that I said it like this, and I was wrong.
> 
> A better way of implementing this would be a 3ms limit on sojourn time, which would remain at 3ms whatever the drain-rate.

	[SM] That assumes the 3ms figure has any bearing on the actual intended use of the NQB mark. As written in another post, in reality we are talking about 125 to 250 milliseconds of worst-case buffering.
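
	[SM] For illustration, here is a small back-of-the-envelope sketch (my own arithmetic; only the 3 ms / 120 Mb/s figures come from this thread) of why a byte-sized buffer limit scales with the drain rate while a sojourn-time limit does not:

# Illustrative only: a fixed-size buffer expressed as queuing delay at
# different drain rates (3 ms / 120 Mb/s are the figures from this thread).
def buffer_bytes(delay_s, rate_bps):
    """Buffer size (bytes) that drains in delay_s at rate_bps."""
    return delay_s * rate_bps / 8

def sojourn_s(buf_bytes, rate_bps):
    """Worst-case sojourn time (s) of that buffer at a given drain rate."""
    return buf_bytes * 8 / rate_bps

buf = buffer_bytes(0.003, 120e6)       # 3 ms at 120 Mb/s -> 45,000 bytes
print(sojourn_s(buf, 60e6) * 1000)     # ~6 ms once WRR halves the drain rate
# A 3 ms sojourn-time limit, by contrast, stays at 3 ms at any drain rate,
# which is the point Bob makes above.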



>> 
>>> Yes, this is a possible attack, altho some picky details are questionable, e.g. I think BBRv2 will run significantly under 60Mb/s because of the interaction between its bandwidth probes and the very shallow buffer (that can be tried experimentally). But in general I admit there will be a gain to the attacker here.
>>> 
>>> However, you cannot put this solely down to incentive misalignment.
>> The gain to the attacker *is* the incentive - which I did my best to illustrate through the narrative device of NUR, their motivations, and the evolution of their methods of maximising throughput for themselves.
> [BB] Yes, I've said you are right that there is an incentive. But here I'm saying you have constructed motives that are not /solely/ due to self-interest.

	[SM] Does it really matter why the incentives are not aligned, unless knowing why lets us fix them?

> 
> With bandwidth sharing (and in game theory and economics more generally) it's normal to deal with self-interest separately from malice, because self-interest is pervasive, whereas setting out to deprive others irrespective of gain for yourself is not so common.

	[SM] Please do not forget the DDoS-as-a-service crowd (or rather "network stress-testing as a service"): they will game NQB out of self-interest, to reduce the cost of their operations; the malice part is left to their customers.


> 
> For instance, if A gets 10Mb/s more than B, any rational individual is expected to derive value from the extra 10Mb/s they got. However, it's less common for someone to also derive additional value from the mere fact that B got less. That's the preserve of sociopathic individuals or where A and B happen to be business competitors at a higher layer (in addition to the basic lower layer contention for network resources).
> 
> The NUR scenario you've constructed contains a mix of self-interest and malice. I'm not saying that makes the scenario invalid, just that the draft seems to be trying to follow the more usual approach of separating the two types of motivation.
>> 
>> Another form of gain would be if an attacker wanted to *degrade* service in the NQB category specifically.  This does not require that they gain throughput for themselves, only that latency and/or packet loss become unacceptable for legitimate NQB traffic.  This could be done with a DDoS in which most (but not all) of the flood traffic is marked NQB.  This would ensure that the BE queue is saturated (reducing NQB's share to 50% and pushing the latency up to 6ms) and force NQB traffic to share the drop rate required to fit the NQB-marked flood traffic down the remaining half.
>> 
>>> And you're basing NUR's attack on BBRv2's exploitation of a min loss threshold (for whatever value of threshold is chosen), as if BBRv2 cannot be the cause of the tragedy of the commons because Google asserts it is not, then applying it to an NQB queue allows you to blame NQB for the tragedy. Rather a disingenuous argument isn't it?
>> I have not yet had the opportunity to try out BBRv2 in practice and in competition with other CC algos.  However, it is a reasonable choice for NUR simply because it achieves near-perfect link utilisation when uncontended.
> [BB] That is rather starry eyed. I believe that would only work with a deeper buffer and stable conditions. We'd have to try it, but if the bandwidth probes hit a shallow buffer (experiencing high loss), I believe BBRv2 is designed to run at lower utilization than with a deep buffer.
>> 
>> I specifically conceived NUR as a plausible RFC-ignorant actor, concerned only with their own performance.  But they are starting with established leading-edge technology and "tuning" it to their needs, rather than simply blasting packets without CC at all.  The latter would probably break their own CDN in short order - it's possible they tried it, and backed off when they saw the problems it caused themselves.
>> 
>> NUR's subsequent tweak to accept 10% loss is *not* attributable to Google
> [BB] I didn't say that. I said that starting from the assumption that loss below some level can be ignored, is attributable to Google.

	[SM] This stance of BBR is quite problematic in itself, given that it purposefully ignores drops that were intended to signal a slow-down...
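
	[SM] To put a rough number on the asymmetry (a sketch using the Mathis/Reno approximation, which models the standard loss-based response, not BBR itself; the numbers are my own illustration):

# Approximate Reno-style throughput: rate ~ (MSS/RTT) * 1.22/sqrt(p).
# This only shows how far standards-compliant flows have already backed off
# at the loss rates a loss-tolerant sender is willing to sustain.
from math import sqrt

def mathis_rate_mbps(mss_bytes=1500, rtt_s=0.02, loss=0.01):
    return (mss_bytes * 8 / rtt_s) * (1.22 / sqrt(loss)) / 1e6

print(round(mathis_rate_mbps(loss=0.01), 1))   # ~7.3 Mb/s at 1% loss, 20 ms RTT
print(round(mathis_rate_mbps(loss=0.10), 1))   # ~2.3 Mb/s at 10% loss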


>> - they chose a 1% default for good reason - but is simply NUR's own sociopathy showing itself when they started to encounter contention on some last-mile links.  Because BBRv2 understands ECN (to some extent, anyway),
> [BB] My strawman NQB qdisc does not support any form of AQM, let alone ECN. Why do you assume that NQB gives out ECN signals?

	[SM] I would very much prefer that we discuss the actual intended use-cases for NQB instead of carefully constructed strawman configurations.

> 
> I am trying to check that NQB stands on its own merits, with no dependency on L4S.

	[SM] The draft cautiously argues:
"7.  Relationship to L4S
   The dual-queue mechanism described in this draft is intended to be compatible with [I-D.ietf-tsvwg-l4s-arch]."

But it does not mention any other queueing design at all, so I take this as a reserved way of saying that NQB is mainly designed to complement L4S.


> 
>> it would not run into significant loss when encountering the NQB-aware qdisc you described while still marked BE.
> [BB] Eh? How does BBRv2 get classified into the NQB queue if its DSCP is BE?

	[SM] If a flow accepts the 10% loss from Jonathan's scenario before slowing down, it will encounter the same loss in the QB queue, since by design it will push up against that queue's ceiling, being a TCP after all.

>> 
>> Because it is the sender that chooses what marking to apply and which CC algo to employ, users at the receiving end have little choice but to accept it - aside from discarding the rest of their paid-for subscriptions and moving to a competitor.
>> 
>> Meanwhile, the tragedy of the commons mainly occurs when other people start copying NUR's example.  Until then, the bad effects are confined to the relatively short periods when one of NUR's videos is being downloaded in the same household, *and* the periodic bursts in which BBRv2 is conducting a bandwidth probe.
>> 
>>> Wouldn't it be sufficient to use a Cubic flow or even Reno with NQB, which would get about 75% of 60Mb/s, which would be higher than getting 1/24 of 120Mb/s with QB. It would even be higher than 33% of 120Mb/s if there were only two long-running flows in the QB queue.
>> Yes, and that is what later movers following NUR's example might notice, if they bother to actually measure the effect, which most might not - they're just blindly copying some tweak they don't really understand.  I'm not sure that it really helps your argument.
> My argument is against saying NQB MUST implement traffic protection.

	[SM] Yes, but once you drop that requirement, the whole based-on-observable-behavior justification for NQB disintegrates.


> I'm happy to help improve understanding of the incentives around NQB,

	[SM] But you just admitted above that, in reality, the incentives for QB flows to stay away from the NQB marking are dubious; if you still want to salvage the whole NQB concept, that indicates you need to leverage the observable-behavior angle, no?

> but my argument concerns IETF procedure, irrespective of whether there are effective attacks or incentives.

	[SM] Unless the IETF is in the business of building better DDoS attack vectors, I fail to see how queue protection can be anything but mandatory.

> 
> You've not addressed my argument yet. I was hoping you might at some point.
>> 
>>> Nonetheless, you have not considered the question of how often NUR's tweak will make throughput worse by using NQB, in all the cases where the QB queue has 0 or 1 long-running flow in it (likely the more prevalent cases for most users). Wouldn't users report this in the forums which would put others off trying it out?
>> If there's 0-1 BE flows contending for the link, then NUR will get similar throughput in either case when using BBRv2, because WRR allows either child queue to grow into capacity left unused by the other, and BBR tries to keep the bottleneck queue empty on average.
> [BB] In your "starry-eyed" view of BBRv2, it perfectly utilizes bandwidth no matter how shallow the buffer. Given your argument depends on this, you ought to check. I think you'll find it underutilizes when the buffer is shallow, which will make throughput worse in the common cases with 0 or 1 long-running QB flow.

	[SM] Jonathan proposed a heavily tuned TCP based on BBRv2, not the canonical BBRv2.
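
	[SM] (As an aside for readers following along: a minimal sketch of the work-conserving round-robin behavior Jonathan refers to above, i.e. either child queue can use capacity the other leaves idle. Class names and equal weights are my own illustration, not taken from any draft.)

# Toy work-conserving round-robin over an NQB and a QB child queue
# (equal weights, matching the 50/50 split discussed earlier in the thread).
from collections import deque

class TwoQueueRR:
    def __init__(self):
        self.q = {"nqb": deque(), "qb": deque()}
        self._next = "nqb"

    def enqueue(self, cls, pkt):
        self.q[cls].append(pkt)

    def dequeue(self):
        # Serve the queue whose turn it is; if it is empty, serve the other,
        # so an idle queue's share is never wasted (work conservation).
        other = "qb" if self._next == "nqb" else "nqb"
        for cls in (self._next, other):
            if self.q[cls]:
                self._next = "qb" if cls == "nqb" else "nqb"
                return cls, self.q[cls].popleft()
        return None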

> 
>> This might be less true if some other CC algo was used, but I note that the 6ms capacity of the NQB side is not very different from the 10ms target of the BE side, unless the path latency is exceptionally low.
> [BB] You can note that 6ms is close to 10ms if you want, but you're comparing apples with mashed potatoes. The strawman configuration was as follows:
> 
>                 NQB     QB
> Buffer limit    3ms     200ms
> AQM target      n/a     10ms
> 
> A buffer limit (absolute cap to any variations) is not comparable to an AQM target (long-running convergence).
> 
> Only the 200ms is dependent on the drain rate. 10ms is absolute, and so is 3ms now I've suggested it would be a sojourn limit.
> 
> 
> Your scenario was potentially enlightening, but can I ask that you reconstruct it, this time using our new mutual understanding of the strawman qdisc. Please also separate out the self-interest and malice parts of the motivations and please consider a range of likely background traffic, not just the one that makes your scenario work.

	[SM] Would it not be a better use of everybody's time if we shelved the strawman qdisc discussion and instead discussed these issues in light of the real AQM configurations intended for real users of this feature? As far as I can see, and I will happily repeat myself until ACKed/NACKed, with the 125-250ms proposed in the dualQ draft this whole sub-discussion changes considerably, even for shorter flows.
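
	[SM] To make concrete why the difference matters (my own arithmetic, using the example rates from earlier in this thread):

# Standing queue represented by different delay targets at the example rates
# from this thread (illustrative arithmetic only).
for rate_mbps in (60, 120):
    for delay_ms in (3, 125, 250):
        kbytes = rate_mbps * 1e6 * delay_ms / 1000 / 8 / 1000
        print(f"{delay_ms:>4} ms at {rate_mbps:>3} Mb/s -> {kbytes:,.0f} kB of queue")
# 3 ms at 120 Mb/s is 45 kB; 250 ms at 120 Mb/s is 3,750 kB -- a very
# different regime for the attack and incentive scenarios discussed above.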


>> 
>>> More generally, attempts to take more throughput at the expense of other applications of the same customer, have come and gone over the years, and so far none has taken hold. Again, probably because users report the negative effects on your own other apps in the forums.
>> As I noted, NUR's attitude here is "who cares about other traffic while they're watching our videos; they can get to it when they're done with us".  Their customer support hotline is outsourced to Bangalore, and involves a rigid script being followed in a very heavy accent, and no access to any means of actually solving the customer's problem.
>> 
>> I'm sure you've encountered a similar setup in the real world.
> [BB] Similar setups have come and gone, but none has endured so far in the real world. I also used to continually push this 'project fear' line, but I decided to take time out to consider why none of these setups had endured.

	[SM] As an aside, quite a number of CDNs are willing to use non-standard TCP behavior (like IW >> 10) to improve their perceived value proposition and are still doing so (see https://www.researchgate.net/publication/328519615_Demystifying_TCP_Initial_Window_Configurations_of_Content_Distribution_Networks). This is not as extreme a modification as what Jonathan proposes, but it seems to endure over time...
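
	[SM] For a sense of why that tweak pays off for a CDN (idealized slow-start arithmetic of my own; no loss or receive-window limits assumed):

# First-flight data and idealized slow-start rounds for a small web object.
from math import ceil, log2

mss = 1460                                   # bytes per segment
obj_segments = ceil(300_000 / mss)           # a ~300 kB object
for iw in (10, 40):
    rounds = ceil(log2(obj_segments / iw + 1))   # doubling per RTT, no loss
    print(f"IW{iw}: {iw * mss / 1000:.1f} kB in the first RTT, "
          f"~{rounds} RTTs to send 300 kB")
# IW10: 14.6 kB first flight, ~5 RTTs; IW40: 58.4 kB first flight, ~3 RTTs.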

> 
> I suspect it's for two reasons: i) network schedulers control inter-customer capacity; ii) flows have two ends.
> 
> Because of i), setups like NUR only alter intra-customer shares of capacity.
> Because of ii), altho service providers choose congestion controllers, end-customers (at the other end) choose service providers.
> 
> Whenever a service provider takes a larger share, it can only be at the expense of other service providers being used by the same end-customer. So, end-customers have no reason to prefer the NURs of this world, which wither and die.

	[SM] I would expect that if two typical end-customers use two competing video sources and one of them stutters in concurrent use, many will dump the stuttering one (even if the stutter is caused by the reckless behavior of the other source). I think that folding the consequences of reckless behavior back onto the mischievous actor might bring about a change of behavior faster...

> 
>> 
>>> My argument is that implementers can decide whether traffic protection is worthwhile, and it's not the IETF's place to tell them to.
>> But the IETF gets to say whether your specification is approved for publication as an RFC.  As part of that decision, they must consider the possible negative effects on the Internet as well as the benefits.  That's the process we're going through right now, and which I am attempting to help inform.
> [BB] Understood and thank you. However, the IETF tries not to over-constrain an implementation, beyond the needs of interoperability.

	[SM] I venture a guess here, not being an IETF member, that the IETF will constrain an implementation if not doing so can have drastic consequences for other flows.


> That's a major difference between the roles of IETF and Linux. Mandating security over-constrains implementers wherever the security concern is not universal.

	[SM] This sounds great, but in effect you risk establishing a new, cheaper DoS vector, which has security implications of its own. Unless your argument is that, with the NQB queue being easy to target and disrupt, DoS attacks will require significantly less traffic and hence the rest of the Internet is better off?

Regards
	Sebastian

> 
> Regards
> 
> 
> 
> Bob
>> 
>>  - Jonathan Morton
>> 
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
>