Re: [tsvwg] Traffic protection as a hard requirement for NQB

Jonathan Morton <chromatix99@gmail.com> Fri, 06 September 2019 00:12 UTC

Cc: "Black, David" <David.Black@dell.com>, Sebastian Moeller <moeller0@gmx.de>, "tsvwg@ietf.org" <tsvwg@ietf.org>
To: Bob Briscoe <ietf@bobbriscoe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/plm81ziRltHPK-y3oD0Tn_z6ycE>

> On 6 Sep, 2019, at 2:13 am, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> [BB]: Picky point: I said NQB has 3ms of physical buffer.

Which you explicitly defined at a 120Mbps drain rate.  But with both BE and NQB queues saturated, each will drain at only 60Mbps, so the same amount of NQB buffer works out to 6ms in that case.

It would only remain at 3ms if there were no BE traffic competing while the NQB queue was saturated - a somewhat unlikely scenario if the NQB marking is used as intended.
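
To make the arithmetic explicit, here is a minimal Python sketch. It assumes the 3ms figure describes a fixed amount of buffer sized at the full 120Mbps link rate, and that the scheduler halves the NQB drain rate when both queues are saturated. The function names are purely illustrative.

```python
def buffer_bytes(depth_ms, drain_mbps):
    """Bytes of buffer that take depth_ms to drain at drain_mbps."""
    # Mbps * ms = kilobits; * 1000 gives bits; / 8 gives bytes.
    return depth_ms * drain_mbps * 1000 / 8


def drain_time_ms(size_bytes, drain_mbps):
    """Time in ms to drain size_bytes at drain_mbps."""
    return size_bytes * 8 / (drain_mbps * 1000)


# Buffer sized as 3ms at the full link rate (assumption, see above).
nqb_buffer = buffer_bytes(3, 120)

print("buffer size (bytes):", nqb_buffer)
print("delay at 120Mbps (ms):", drain_time_ms(nqb_buffer, 120))
print("delay at 60Mbps (ms):", drain_time_ms(nqb_buffer, 60))
```

The same buffer holds twice as much queuing delay once the drain rate is halved, which is where the 6ms figure comes from.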

> Yes, this is a possible attack, altho some picky details are questionable, e.g. I think BBRv2 will run significantly under 60Mb/s because of the interaction between its bandwidth probes and the very shallow buffer (that can be tried experimentally). But in general I admit there will be a gain to the attacker here.
> 
> However, you cannot put this solely down to incentive misalignment.

The gain to the attacker *is* the incentive - which I did my best to illustrate through the narrative device of NUR, their motivations, and the evolution of their methods of maximising throughput for themselves.

Another form of gain would be if an attacker wanted to *degrade* service in the NQB category specifically.  This does not require that they gain throughput for themselves, only that latency and/or packet loss become unacceptable for legitimate NQB traffic.  This could be done with a DDoS in which most (but not all) of the flood traffic is marked NQB.  This would ensure that the BE queue is saturated (reducing NQB's share to 50% and pushing the latency up to 6ms) and force NQB traffic to share the drop rate required to fit the NQB-marked flood traffic down the remaining half.
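
A rough sketch of the drop rate involved, in Python. The traffic loads are hypothetical numbers chosen only for illustration, and the model assumes that drops fall uniformly on all NQB-marked packets regardless of which flow they belong to.

```python
def nqb_drop_fraction(nqb_offered_mbps, nqb_share_mbps):
    """Fraction of NQB-marked traffic that must be dropped to fit the share."""
    if nqb_offered_mbps <= nqb_share_mbps:
        return 0.0
    return 1 - nqb_share_mbps / nqb_offered_mbps


# Hypothetical loads: 5Mbps of legitimate NQB traffic plus 295Mbps of
# NQB-marked flood. The BE-marked part of the flood saturates the BE queue,
# so NQB is held to half of the 120Mbps link.
legit_mbps, flood_mbps, nqb_share_mbps = 5, 295, 60

drop = nqb_drop_fraction(legit_mbps + flood_mbps, nqb_share_mbps)
print(f"drop fraction on NQB-marked packets: {drop:.0%}")
print(f"legitimate NQB goodput: {legit_mbps * (1 - drop):.1f} Mbps")
```

Under these assumed loads, legitimate NQB traffic loses about four packets in five even though it is nowhere near filling the NQB share on its own.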

> And you're basing NUR's attack on BBRv2's exploitation of a min loss threshold (for whatever value of threshold is chosen), as if BBRv2 cannot be the cause of the tragedy of the commons because Google asserts it is not, then applying it to an NQB queue allows you to blame NQB for the tragedy. Rather a disingenuous argument isn't it?

I have not yet had the opportunity to try out BBRv2 in practice and in competition with other CC algos.  However, it is a reasonable choice for NUR simply because it achieves near-perfect link utilisation when uncontended.

I specifically conceived NUR as a plausible RFC-ignorant actor, concerned only with their own performance.  But they are starting with established leading-edge technology and "tuning" it to their needs, rather than simply blasting packets without CC at all.  The latter would probably break their own CDN in short order - it's possible they tried it, and backed off when they saw the problems it caused themselves.

NUR's subsequent tweak to accept 10% loss is *not* attributable to Google - they chose a 1% default for good reason - but is simply NUR's own sociopathy showing itself when they started to encounter contention on some last-mile links.  Because BBRv2 understands ECN (to some extent, anyway), it would not run into significant loss when encountering the NQB-aware qdisc you described while still marked BE.

Because it is the sender that chooses what marking to apply and which CC algo to employ, users at the receiving end have little choice but to accept it - aside from discarding the rest of their paid-for subscriptions and moving to a competitor.

Meanwhile, the tragedy of the commons mainly occurs when other people start copying NUR's example.  Until then, the bad effects are confined to the relatively short periods when one of NUR's videos is being downloaded in the same household, *and* the periodic bursts in which BBRv2 is conducting a bandwidth probe.

> Wouldn't it be sufficient to use a Cubic flow or even Reno with NQB, which would get about 75% of 60Mb/s, which would be higher than getting 1/24 of 120Mb/s with QB. It would even be higher than 33% of 120Mb/s if there were only two long-running flows in the QB queue.

Yes, and that is what later movers following NUR's example might notice, if they bother to actually measure the effect, which most might not - they're just blindly copying some tweak they don't really understand.  I'm not sure that it really helps your argument.
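
For reference, the figures in the quoted paragraph work out as follows. The fractions are taken as quoted rather than derived here, and the dictionary labels are only illustrative.

```python
LINK_MBPS = 120
NQB_SHARE_MBPS = LINK_MBPS / 2  # NQB share when the BE queue is also saturated

# Throughput in Mbps for each case mentioned in the quoted paragraph.
scenarios = {
    "75% of 60 (NQB)": 0.75 * NQB_SHARE_MBPS,
    "1/24 of 120 (QB)": LINK_MBPS / 24,
    "33% of 120 (QB)": 0.33 * LINK_MBPS,
}

for label, mbps in scenarios.items():
    print(f"{label}: {mbps:.1f} Mbps")
```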

> Nonetheless, you have not considered the question of how often NUR's tweak will make throughput worse by using NQB, in all the cases where the QB queue has 0 or 1 long-running flow in it (likely the more prevalent cases for most users). Wouldn't users report this in the forums which would put others off trying it out?

If there are 0-1 BE flows contending for the link, then NUR will get similar throughput in either case when using BBRv2, because WRR allows either child queue to grow into capacity left unused by the other, and BBR tries to keep the bottleneck queue empty on average.  This might be less true if some other CC algo were used, but I note that the 6ms capacity of the NQB side is not very different from the 10ms target of the BE side, unless the path latency is exceptionally low.
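
The work-conserving behaviour of WRR can be sketched with a simple fluid model in Python. This is not a packet-level scheduler: it assumes equal weights, treats demands as steady rates, and the function name and example loads are hypothetical.

```python
def wrr_allocate(demands, weights, capacity):
    """Fluid-model weighted share: unused capacity flows to the other queues."""
    alloc = {q: 0.0 for q in demands}
    active = set(demands)
    remaining = capacity
    while active and remaining > 1e-9:
        total_w = sum(weights[q] for q in active)
        # Queues whose outstanding demand fits within their weighted share.
        satisfied = {
            q for q in active
            if demands[q] - alloc[q] <= remaining * weights[q] / total_w
        }
        if not satisfied:
            # Every remaining queue is saturated: split what is left by weight.
            for q in active:
                alloc[q] += remaining * weights[q] / total_w
            break
        for q in satisfied:
            remaining -= demands[q] - alloc[q]
            alloc[q] = demands[q]
        active -= satisfied
    return alloc


weights = {"NQB": 1, "BE": 1}
# Both queues saturated: each is held to half of the 120Mbps link.
print(wrr_allocate({"NQB": 200, "BE": 200}, weights, 120))
# BE nearly idle: NQB grows into the capacity BE leaves unused.
print(wrr_allocate({"NQB": 200, "BE": 10}, weights, 120))
```

With a light BE load the NQB side gets almost the whole link, so the 50% split only applies when both queues are saturated.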

> More generally, attempts to take more throughput at the expense of other applications of the same customer, have come and gone over the years, and so far none has taken hold. Again, probably because users report the negative effects on your own other apps in the forums.

As I noted, NUR's attitude here is "who cares about other traffic while they're watching our videos; they can get to it when they're done with us".  Their customer support hotline is outsourced to Bangalore, and involves a rigid script being followed in a very heavy accent, and no access to any means of actually solving the customer's problem.

I'm sure you've encountered a similar setup in the real world.

> My argument is that implementers can decide whether traffic protection is worthwhile, and it's not the IETF's place to tell them to.

But the IETF gets to say whether your specification is approved for publication as an RFC.  As part of that decision, they must consider the possible negative effects on the Internet as well as the benefits.  That's the process we're going through right now, and one that I am attempting to help inform.

 - Jonathan Morton