Re: [tsvwg] Traffic protection as a hard requirement for NQB

Jonathan Morton <chromatix99@gmail.com> Thu, 05 September 2019 19:52 UTC

Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 11.5 \(3445.9.1\))
From: Jonathan Morton <chromatix99@gmail.com>
In-Reply-To: <50404eb0-fa36-d9aa-5e4c-9728e7cb1469@bobbriscoe.net>
Date: Thu, 05 Sep 2019 22:52:41 +0300
Cc: "Black, David" <David.Black@dell.com>, Sebastian Moeller <moeller0@gmx.de>, "tsvwg@ietf.org" <tsvwg@ietf.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <5AFC259E-80F3-445C-B5B3-C04913B23AB1@gmail.com>
References: <CE03DB3D7B45C245BCA0D24327794936306BBE54@MX307CL04.corp.emc.com> <56b804ee-478d-68c2-2da1-2b4e66f4a190@bobbriscoe.net> <AE16A666-6FF7-48EA-9D15-19350E705C19@gmx.de> <CE03DB3D7B45C245BCA0D24327794936306D4F3F@MX307CL04.corp.emc.com> <50404eb0-fa36-d9aa-5e4c-9728e7cb1469@bobbriscoe.net>
To: Bob Briscoe <ietf@bobbriscoe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/dGb7VMulDCwgLKON3zlcPO9s-2o>
Subject: Re: [tsvwg] Traffic protection as a hard requirement for NQB
Precedence: list

> On 5 Sep, 2019, at 9:23 pm, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> Config: 
> 	• Scheduler: 
> 		• WRR with weight 0.5 for NQB on a 120Mb/s link. That gives at least 60Mb/s for NQB flows. 
> 		• Scheduler quantum: 1500B. 
> 	• Buffering:
> 		• The NQB buffer is fairly shallow (30 packets or 3ms at 120Mb/s).
> 		• The QB buffer is deeper (say 200ms) with an AQM target delay of say 10ms.

This seems like a reasonable implementation, on the face of it.

> Now, why do you think "the resulting latency is (at least initially) lower" for a QB flow that marks itself NQB? Why do you think incentives are misaligned (i.e. a tragedy of the commons)?

Okay, let's extend your traffic scenario a bit by assuming there's a significant amount of QB traffic, perhaps from a number of 50GB game updates in progress to various PCs and consoles in the household.  This could be as many as a couple of dozen bulk, saturating flows at a time, with the load being more-or-less continuous for an hour or so.  This traffic originates from gaming-related companies, so they know better than to risk interfering with actual gaming traffic carrying the NQB marking.

The above PHB template does a good job of isolating the NQB traffic from the QB (best-effort) traffic in this case, in which everyone behaves nicely.

But now let's introduce an adversary; let's call them Netflix's Unscrupulous Rival (NUR).  They are not in the gaming (interactive entertainment) business, but Video On Demand (passive entertainment).  Their USP is that they get the complete video file onto the subscriber's PC in the minimum possible time; this incidentally also removes the load from their servers as early as possible.  In short, they have chugged the entire cask of Flow Completion Time Kool-Aid, and their flows are multiple gigabytes each.

In service to this goal, NUR have selected BBRv2 as their CC algorithm because, lacking the traditional TCP sawtooth, it achieves higher throughput than most, and without needing to open multiple connections (which means reduced server load and client software complexity).  And, because they're unscrupulous and literally don't care about competing traffic - how important can it be if they're busy watching our video? - they've increased the packet loss threshold to 10% from the default of 1%; they don't mind retransmitting a few packets if it gets the job done faster.  Since we may assume that the AQM in this PHB implements RFC 3168 ECN, this tweak doesn't actually have much effect in this particular case, but it does elsewhere, on the many networks still using dumb FIFOs.

What NUR now notices is that, at some times and with some particular subscribers, their throughput is maybe a tenth of what they expect.  This is unacceptable, so they investigate.  And then they switch on NQB marking to see what happens.  After all, BBRv2 is advertised as not building a queue, so it's allowed, right?

This gives them almost uncontended access to the 60Mbps reserved for NQB traffic, instead of having to share the 120Mbps total pipe with a couple of dozen other bulk flows.  NUR is ecstatic and rolls it out across their entire system.

And there's your incentive to mis-mark as NQB.
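
The size of that incentive can be sketched with some back-of-envelope arithmetic.  The snippet below is only an idealised illustration: it assumes exactly 24 competing bulk flows ("a couple of dozen"), a perfectly equal split among saturating flows, and negligible genuine NQB load, none of which is guaranteed in practice.  The function and constant names are made up for this example.

```python
def per_flow_share_mbps(capacity_mbps: float, n_flows: int) -> float:
    """Idealised equal split of a bottleneck among n saturating flows."""
    if n_flows < 1:
        raise ValueError("need at least one flow")
    return capacity_mbps / n_flows


LINK_MBPS = 120.0   # link rate from the quoted config
NQB_WEIGHT = 0.5    # WRR weight for the NQB queue from the quoted config
BULK_FLOWS = 24     # assumption: "a couple of dozen" game-update flows

# Honest marking: the NUR flow is one more bulk flow in the QB queue.
# Assumption: the NQB queue is nearly idle, so QB sees almost the whole link.
honest_mbps = per_flow_share_mbps(LINK_MBPS, BULK_FLOWS + 1)

# Mis-marking: the NUR flow has the NQB queue's guaranteed share to itself.
# Assumption: genuine NQB traffic uses a negligible fraction of that share.
mismarked_mbps = LINK_MBPS * NQB_WEIGHT

speedup = mismarked_mbps / honest_mbps
print(f"honest: {honest_mbps:.1f} Mbps, mis-marked: {mismarked_mbps:.1f} Mbps, "
      f"ratio: {speedup:.1f}x")
```

Under these assumptions the mis-marked flow sees roughly an order of magnitude more capacity than it would by marking honestly.  The real gain depends on how many bulk flows are active and on how much genuine NQB traffic is present.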

Now, what effect does this have on the actual NQB traffic?  Well, BBRv2 spends most of its time pacing at just below the detected path capacity, but periodically probes for additional bandwidth by pacing at a higher rate for an RTT or so.  Usually this will exceed the capacity allocated to NQB and start queuing, which imposes some delay both on itself and on other NQB traffic sharing the same queue.  This will cap out at 6ms (a full buffer of 30 packets of 1500 bytes each, draining at 60Mbps) - ten times the worst-case delay you calculated for NQB traffic alone - at which point packet loss will begin to occur.  Because the NUR flow is very long and not application-limited, tail loss is not a factor for most of its lifetime, and they have configured BBRv2 to be exceptionally tolerant of loss before initiating congestive backoff.

In short, NQB traffic may periodically experience an additional 6ms delay and 10% packet loss due to this mis-marked NUR flow, depending on some other factors.
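
The 6ms figure follows directly from the buffer size and the drain rate, and the same arithmetic gives a rough idea of how quickly a probing flow fills the buffer.  The sketch below assumes 1500-byte packets (consistent with the 1500B scheduler quantum and the "30 packets or 3ms at 120Mb/s" figure in the quoted config).  The probe gain of 1.25 is an assumption about how far above the service rate the flow paces while probing, and the fill-time estimate further assumes the queue starts empty and that the flow is the only significant load on the NQB queue.  All names are hypothetical.

```python
def drain_time_ms(packets: int, packet_bytes: int, rate_mbps: float) -> float:
    """Time to transmit a backlog of `packets` at `rate_mbps`, in milliseconds."""
    bits = packets * packet_bytes * 8
    return bits / (rate_mbps * 1e6) * 1e3


def fill_time_ms(packets: int, packet_bytes: int,
                 service_mbps: float, probe_gain: float) -> float:
    """Time for an initially empty buffer to fill when the arrival rate is
    probe_gain * service_mbps, so the queue grows at the excess rate."""
    excess_mbps = service_mbps * (probe_gain - 1.0)
    if excess_mbps <= 0:
        return float("inf")  # arrivals do not exceed service; no queue builds
    return drain_time_ms(packets, packet_bytes, excess_mbps)


BUFFER_PACKETS = 30   # NQB buffer depth from the quoted config
PACKET_BYTES = 1500   # assumption: full-size packets
PROBE_GAIN = 1.25     # assumption: pacing gain while probing for bandwidth

full_link_ms = drain_time_ms(BUFFER_PACKETS, PACKET_BYTES, 120.0)
nqb_share_ms = drain_time_ms(BUFFER_PACKETS, PACKET_BYTES, 60.0)
time_to_fill = fill_time_ms(BUFFER_PACKETS, PACKET_BYTES, 60.0, PROBE_GAIN)

print(f"full buffer at 120 Mbps: {full_link_ms:.1f} ms")
print(f"full buffer at  60 Mbps: {nqb_share_ms:.1f} ms")
print(f"time to fill while probing: {time_to_fill:.1f} ms")
```

With these inputs the buffer holds 3ms of data at the full link rate but 6ms at the NQB share.  A probe at the assumed gain would fill it in roughly 24ms, which is comparable to or shorter than a typical Internet RTT, so a probe lasting "an RTT or so" can plausibly reach the loss point.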

As for the "tragedy of the commons", NUR then proudly publishes an investor report and a white paper explaining how they quintupled their throughput with this one weird trick.  Other unscrupulous Internet companies take note of this and try it themselves, usually with less competence and without measuring the actual extent to which they can expect improvements in practice.  After all, there's no law against it, and it worked for NUR, didn't it?  They're a big famous Internet company, so they must be right!

Consequently, the NQB queue becomes increasingly full of QB traffic.  The latency advantage of the NQB marking is thus eroded, with 6ms delays and significant packet loss becoming the rule rather than the exception.  That is not all that much better than the 10ms target delay on the other side of the scheduler.

Do you now see?

 - Jonathan Morton
