Re: [tsvwg] Traffic protection as a hard requirement for NQB

Sebastian Moeller <moeller0@gmx.de> Fri, 06 September 2019 09:52 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <ad360251-638e-1c7e-3b9d-2838cc5917ef@bobbriscoe.net>
Date: Fri, 06 Sep 2019 11:51:52 +0200
Cc: "tsvwg@ietf.org" <tsvwg@ietf.org>
Message-Id: <7B407AE2-7B5C-4310-9FA7-F399E98B7086@gmx.de>
References: <CE03DB3D7B45C245BCA0D24327794936306BBE54@MX307CL04.corp.emc.com> <56b804ee-478d-68c2-2da1-2b4e66f4a190@bobbriscoe.net> <AE16A666-6FF7-48EA-9D15-19350E705C19@gmx.de> <ad360251-638e-1c7e-3b9d-2838cc5917ef@bobbriscoe.net>
To: Bob Briscoe <ietf@bobbriscoe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/T5nqt7J6kmqGnBHCaFzMAd2v4hU>
Subject: Re: [tsvwg] Traffic protection as a hard requirement for NQB
Precedence: list


> On Sep 5, 2019, at 23:46, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> Sebastian, (sorry, I mis-spelled your name previously), inline...
> 
> On 03/09/2019 10:14, Sebastian Moeller wrote:
>> Dear Bob,
>> 
>> allow me to chime in.
>> 
>>> On Sep 2, 2019, at 16:47, Bob Briscoe <in@bobbriscoe.net> wrote:
>>> 
>>> David,
>>> 
>>> Thanks for your closing remarks on the NQB adoption call.
>>> You say your last point is open for discussion, so I will dive straight in to start that discussion.
>>> 
>>> On 30/08/2019 16:40, Black, David wrote:
>>>> 	• [snip]
>>>> 	• The criticisms on this list of the “queue protection” requirement in the draft are largely accurate.   The draft needs at least an Editor’s Note that this material will be revised, as while the DOCSIS mechanism is an example of how to do queue protection, it is not appropriate to require implementation of that mechanism.   A plausible plan that I have discussed with the authors is to write a set of functional/behavioral requirements for NQB “traffic protection” that can be satisfied by a “queue protection” mechanism such as the DOCSIS mechanism, or by a suitably configured FQ AQM implementation. [snip]
>>>>  In addition, related to item 2), my expectation (which is open to further discussion) that “traffic protection” will be a “MUST” requirement, perhaps with some well-specified exceptions (including explanations of why the exceptions are ok).   This is because “traffic protection” (e.g., “queue protection” or a suitably configured FQ AQM) appears to be necessary in general to keep queue-building traffic out of the NQB traffic aggregate, as allowing such traffic degrades the properties of the NQB PHB.
>>> I think we should be wary of making traffic protection a hard requirement at such an early stage in our knowledge of the NQB behaviour. I believe this is a case where the market, not the IETF, ought to decide whether protection is required.
>> 	[SM] The draft says "... it is worthwhile to note that the NQB designation and marking would be intended to convey verifiable traffic behavior, not needs or wants." In light of this requirement it seems obvious that a hop willing to honor that DSCP should/must actually verify the traffic behavior, no? Requiring behavior to accord with a set of requirements but not enforcing these requirements seems very, very optimistic.
> [BB] Just because the behaviour is verifiable does not mean it always has to be verified.

> For instance, if someone claims a car is 17 years old, you might trust it is likely to be truthful because the speaker knows you could verify it. But it doesn't mean you have to actually look up the engine and chassis serial numbers to check the date of manufacture.

	[SM] I fail to see how this analogy fits the issue at hand. The NQB draft defines NQB-ness as "the NQB designation and marking would be intended to convey verifiable traffic behavior, not needs or wants." IMHO, either get rid of this section or mandate the verification.

> 
> In the case of NQB, there is less need to verify behaviour because applications or users have no incentive to mismark - they would be worse off.

	[SM] I still do not believe that to be true, and no amount of thought experiments is going to change that, as this is an empirical question that needs data.
As I wrote in another mail, any application that can gracefully deal with the shallow buffer of the NQB queue will be fine misclassifying itself into it, even if its behavior would be verifiably capacity-seeking. The only argument that there is no incentive to mismark is the shallowness of the buffer; once a sender copes with that (by following Jonathan's BBR example), the argument that "for NQB flows, the NQB queue provides better performance (considering latency, loss and throughput) than the QB queue; and for QB flows, the QB queue provides better performance (considering latency, loss and throughput) than the NQB queue." is shown to be overly optimistic unless backed by queue protection.

> 
> However, there is still the risk of accidents or malice.

	[SM] With malice actually being a marketplace with its own economics. IMHO that factor alone should make queue protection mandatory, as this proposal will make denial-of-service attacks much cheaper (no need to clog the full pipe as long as one can drive the NQB queue into dropping mode often enough, and that, in your example, requires at most 31 full-MTU packets...).
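
To put rough numbers on the "cheap burst" concern above, here is a minimal sketch of a shallow tail-drop FIFO shared by a sparse flow and a burst sender. Only the 31 full-MTU packets per burst come from this thread; the 50 Mbit/s link, the 10-packet buffer, the 100 ms burst period and the 20 ms victim spacing are assumptions picked purely for illustration, and the model ignores any AQM or scheduler coupling.

```python
def attack_rate_mbps(burst_pkts=31, pkt_bytes=1500, burst_period_ms=100.0):
    """Average rate of a sender emitting one burst per period (Mbit/s)."""
    return burst_pkts * pkt_bytes * 8 / (burst_period_ms * 1000.0)


def simulate(attack, link_mbps=50.0, buf_pkts=10, burst_pkts=31,
             burst_period_ms=100.0, victim_period_ms=20.0,
             pkt_bytes=1500, duration_ms=1000.0):
    """Tail-drop FIFO shared by a sparse 'victim' flow and an optional burst sender.

    All parameter values are illustrative assumptions, not taken from the draft.
    """
    service_ms = pkt_bytes * 8 / (link_mbps * 1000.0)  # ms to transmit one packet
    arrivals = []  # (time_ms, tie_break, source)
    for j in range(int(duration_ms / victim_period_ms)):
        # The victim sends 1 ms after each period start, i.e. shortly after a burst.
        arrivals.append((j * victim_period_ms + 1.0, 1, "victim"))
    if attack:
        for k in range(int(duration_ms / burst_period_ms)):
            for _ in range(burst_pkts):
                arrivals.append((k * burst_period_ms, 0, "attacker"))
    arrivals.sort()

    stats = {src: {"sent": 0, "dropped": 0, "max_delay_ms": 0.0}
             for src in ("victim", "attacker")}
    backlog_ms = 0.0  # unfinished work in the queue, in ms of transmission time
    last_t = 0.0
    for t, _, src in arrivals:
        backlog_ms = max(0.0, backlog_ms - (t - last_t))  # drain since last event
        last_t = t
        s = stats[src]
        s["sent"] += 1
        if backlog_ms + service_ms > buf_pkts * service_ms + 1e-9:
            s["dropped"] += 1                  # buffer full: tail drop
        else:
            s["max_delay_ms"] = max(s["max_delay_ms"], backlog_ms)
            backlog_ms += service_ms
    return stats


if __name__ == "__main__":
    print("attacker average rate [Mbit/s]:", attack_rate_mbps())
    print("without attack:", simulate(False)["victim"])
    print("with attack:   ", simulate(True)["victim"])
```

Under these assumed numbers the burst sender needs only a few Mbit/s on average, and most of each burst is dropped at the queue itself. Whether the sparse flow sees loss or only added delay depends on how its packets line up with the bursts, so this is an illustration of the cost argument, not a measurement.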

> Nonetheless, the harm from any accidental or malicious NQB mismarking will be confined to other applications of the same customer {Note 1}.

	[SM] The draft mentions access links, but it does not restrict itself to access links, so other links need to be covered as well, no? And in the DoS case it is not one of the customer's applications that degrades the NQB queue but the attacker's traffic.

> 
> Given there's no incentive to mismark,

	[SM] Which I am not convinced to be true yet.

> and the effect of accidents or malice are contained, an operator might decide that traffic protection is unnecessary - more cost than benefit.

	[SM] Sure, but in that case our hypothetical operator will have no problem whatsoever simply ignoring the RFC's MUST/SHOULD language. But making it a SHOULD means the operator can do so and actually crow about it to its end customers.

> 
> The point about verifiability is merely that, if an operator does choose to protect flows from each other, an implementation can objectively determine which flows are out of compliance, because NQB behaviour is verifiable. Objective verifiability is neutral, which is important in jurisdictions with net neutrality legislation.

	[SM] If a tree falls in a forest and no one is there to witness it, does it still make a sound? Only the bottleneck is guaranteed to have sufficient information to actually perform the verification; if that bottleneck does not do this, it is game over. The one exception is the very narrow model of NQB scheduling only on the access link, in which case the observable behavior at the end user will be strongly correlated with the behavior at the bottleneck (but even there it is not exactly clear which packets were dropped by the upstream NQB scheduler and why).
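
As a rough illustration of what "performing the verification" at the bottleneck could look like, here is a minimal per-flow check. It is a hypothetical leaky-bucket sketch, not the DOCSIS queue protection algorithm and not anything specified in the draft; the class name, the drain rate (125 bytes/ms, i.e. 1 Mbit/s) and the 3000-byte threshold are all assumed values.

```python
class FlowMonitor:
    """Hypothetical per-flow leaky bucket used to flag queue-building senders."""

    def __init__(self, drain_bytes_per_ms=125.0, threshold_bytes=3000.0):
        self.drain = drain_bytes_per_ms      # assumed allowance per flow
        self.threshold = threshold_bytes     # assumed tolerated burst size
        self.state = {}                      # flow id -> (level_bytes, last_ms)

    def on_packet(self, flow, t_ms, size_bytes):
        """Return True if the packet keeps the flow within its allowance."""
        level, last = self.state.get(flow, (0.0, t_ms))
        # Drain the bucket for the time elapsed, then add this packet.
        level = max(0.0, level - (t_ms - last) * self.drain) + size_bytes
        self.state[flow] = (level, t_ms)
        return level <= self.threshold


if __name__ == "__main__":
    mon = FlowMonitor()
    # Sparse flow: one 200-byte packet every 20 ms.
    sparse = [mon.on_packet("voip", 20.0 * i, 200) for i in range(5)]
    # Bursty flow: five full-MTU packets back to back.
    bursty = [mon.on_packet("bulk", 0.0, 1500) for _ in range(5)]
    print("sparse:", sparse)
    print("bursty:", bursty)
```

The point of the sketch is only that such a check needs to see every packet of the flow as it arrives at the queue, which is why the node running it has to be the bottleneck itself.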

> 
> 
> {Note 1}: The draft says NQB is targeted at access links. That implies NQB scheduling is between the classes of one customer and that a higher level in the network scheduling hierarchy isolates customers from each other.

	[SM] Given your explanation, the draft should perhaps clarify that it only targets access links; otherwise this argument is not very convincing.

> 
>> 
>> 
>>> The draft claims incentives can be aligned by an implementation being arranged to ensure that NQB traffic benefits from NQB marking and QB traffic benefits from QB marking. Incentives are hard to guess, so that may or may not be true. However, I don't think we (the tsvwg/IETF) can state categorically that it is not true.
>> 	[SM] I would have thought the burden is on those proposing a laxer enforcement model to prove that that model is sufficient?
> [BB] See previous email to you and David Black.

	[SM] Still not convinced.

>> 
>>> The draft makes the point that, even if incentives are aligned, queue-building traffic could be mismarked as NQB, either accidentally or maliciously. That's a sound reason for an implementer to include traffic protection, but I don't think it's a good reason for us (the tsvwg/IETF) to require them to.
>>> 
>>> While there is no operational experience of NQB deployments, I think the market (i.e. most early-adopting operators) will want the warm feeling of some form of traffic protection. But as we get more experience we might find incentives really are aligned. And we might find accidents and malice are not a significant problem.
>> 	[SM] @ the chairs, how hard would it be to retroactively change from a SHOULD to a MUST? If it is easy, I agree with Bob that a SHOULD would be nice, but if it is hard I would vote for a MUST, as that seems to be the safer option.
>> 
>>> So I think the current 'SHOULD' is the right call. It could be beefed up with warnings on the risks of not providing protection - not least the risk no early adopter will want to use such an implementation.
>>> 
>>> 
>>> 
>>> As an analogy, when TCP congestion control was first developed, it was known that end-systems could run a subverted TCP algorithm or just use unresponsive UDP. At that time, the view could have been taken that per-flow scheduling would have to be a 'MUST' requirement for all Internet bottlenecks.
>>> As it has turned out, the Internet does have /per-user/ scheduling at bottlenecks,
>> 	[SM] Question: is this universally true? I am not 100% sure. My access link is limited by my contracted rate, but due to oversubscription there is no guarantee that, on a congested link between my home and my ISP's traffic shaper/the wider internet, I get a share that reflects my "per-user" fraction of the shared capacity.
>> 	I fail to see any scheduler beyond my ISP's traffic shaper that has a holistic enough view to classify traffic based on "user" (think NATed IPv4 address, but variable-length IPv6 prefixes, plus a router's IPv6 /64); could you elaborate please?
> [BB] In DSL, DOCSIS, mobile or satellite access networks, a scheduler both limits a customer to their contracted access rate and shares out the capacity during over-subscription periods. How these schedulers actually do that is secret sauce in many cases {Note 2}, but they all do it somehow.

	[SM] In DSL networks the "scheduler" might actually just be the fact that the sync rate between CPE and DSLAM sets an upper limit and that the maximum sync rate can be programmed from the ISP's side. Such configurations are hopefully rare, though.
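
For readers who want a concrete picture of "limit each customer to the contracted rate and share capacity during over-subscription", here is one textbook way to do it: max-min fair sharing with per-customer caps. This is only a sketch under that assumption; real access schedulers are, as noted above, often undisclosed, and the function name and the example rates are made up.

```python
def share_capacity(demands, caps, capacity):
    """Max-min fair allocation; each customer is limited to min(demand, cap).

    demands, caps: dicts keyed by customer, in the same rate unit as capacity.
    A customer's demand is its aggregate, however many flows it has open.
    """
    want = {c: min(demands[c], caps[c]) for c in demands}
    alloc = {c: 0.0 for c in demands}
    active = {c for c in want if want[c] > 0}
    remaining = float(capacity)
    while active and remaining > 1e-12:
        fair = remaining / len(active)
        # Customers whose remaining need fits within the current fair share.
        satisfied = {c for c in active if want[c] - alloc[c] <= fair}
        if not satisfied:
            # Everyone wants more than the fair share: split evenly and stop.
            for c in active:
                alloc[c] += fair
            remaining = 0.0
        else:
            for c in satisfied:
                remaining -= want[c] - alloc[c]
                alloc[c] = want[c]
            active -= satisfied
    return alloc


if __name__ == "__main__":
    # Rates in Mbit/s; customer "a" asks for more than its contracted 50.
    demands = {"a": 100, "b": 10, "c": 50}
    caps = {"a": 50, "b": 50, "c": 50}
    print(share_capacity(demands, caps, capacity=90))
```

Because the allocation is per customer, opening more flows does not raise a customer's share at this scheduler. A plain sync-rate limit, as in the DSL case above, would only provide the cap and none of the sharing.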

> 
> If another link deeper into the network becomes the bottleneck (e.g. a core or peering link), then there is typically no scheduler arbitrating between Internet customers {Note 3}. Then it's down to TCP between flows, regardless of which customer they are from. However, a core bottleneck rarely happens by design - usually it would be due to a screw up.

	[SM] Well, in my reality such deeper bottlenecks are not such rare unicorns; there often are predictable (time-of-day dependent) capacity exhaustions on the transit links to/from an ISP's network that introduce unwanted packet loss and delay. But as I predicted, beyond the access link's upstream end (loosely defined) no network node knows of users directly (although IPv4 dst addresses and IPv6 dst prefixes should give more remote hops some handle on endpoints/endpoint networks as well). But I admit that this is somewhat tangential to our topic.

> 
> The main exception is campus networks (e.g. Universities, corporates), where the access link with the Internet doesn't usually schedule between users. Solutions range from a simple FIFO (i.e. the transport layers will be determining shares per flow, not per user) to complex often DPI-based prioritization solutions that might recognize users and/or applications.
> 
> 
> {Note 2}: When I worked for BT, we described BT's wholesale solution here:
> https://riteproject.files.wordpress.com/2015/12/rite-deliverable-3-3-public1.pdf#10
> (it had already been published elsewhere, due to the regulatory need for transparency in the UK market).

	[SM] Thanks, this seems to confirm my intuition derived from observations of my own access link properties.

> 
> {Note 3}: But one customer cannot get a huge share of the core capacity by opening loads of TCP flows, because each customer is still capped by their access scheduler as well.
> 
> 
>> 
>>> but there has been little need for /per-flow/ scheduling for capacity sharing (yes, FQ exists, but it's not needed for capacity sharing).
>>> 
>>> In the TCP case, it turned out that a delicate balance of incentives proved sufficient to allow most Internet equipment to be simpler and cheaper. There is a poorly understood balance of incentives in the NQB case. So let's not require equipment to be more complex than it might need to be, at least not yet.
>> 	[SM] I believe for malicious actors NQB will be an attractive DSCP, as it promises to allow doing harm even at low bandwidth (just send a low average rate in lumpy bursts and I would expect an NQB-honoring L4S scheduler to get into trouble*, without requiring a large offensive traffic load, keeping it cheap and hard to detect yet sufficiently disruptive to rob the low latency queue of its intended functionality).
> [BB] Yes, if a malicious actor could get control of either end, it would be fairly easy for them to cause more queuing.

	[SM] Why would the malicious actor need control over either end to just send bursts of NQB-marked packets to one of the endpoints' IP addresses? All that seems to be needed is the IP address of the endpoint, and the low-bandwidth DoS game is on... (this actually promises to take the (D) out of (D)DoS again, as the burst rates required to drive the NQB queue into tail drop seem rather low).


> On the other hand it would be confined within one customer's service, so not a particularly high profile attack for a script kiddie to boast about.

	[SM] On-line gamers do exactly that: as far as I know, they order and pay for (D)DoS attacks on individual other users they want to disadvantage during game-play. The fact that DDoS as a service exists indicates to me that there is sufficient demand for malicious action to predict that, if NQB gets adopted by on-line games, NQB queues will get targeted. In a sense the proposed NQB example from your other mail serves the game traffic on a silver platter, in a simple, attack-sensitive queue that any outside entity can conveniently address without having to flood the whole link, potentially allowing the attack to continue under the ISP's "radar".

> 
> As I said in my other mail to you & David, I'm not saying there's not a risk.

	[SM] The discussion is about the level of risk and how to mitigate it, since as far as I can see the proposed NQB scheme will increase the attack surface of users behind an NQB-honoring scheduler.


> I'm saying it's not the IETF's place to mandate that implementers MUST protect against this risk. It's not a standards matter. Implementers will provide protection if operators want it. And if some operators don't want it, there will be a market for lower cost solutions without protection. Implementers are perfectly able to make these decisions themselves. The IETF has no mandate to meddle in this market.

	[SM] David's proposal was "MUST implement, SHOULD use", which seems like it would please you. I believe it should be a categorical MUST for implementation and use, as the more I think about it the more it becomes clear that this thing dangerously enables convenient, targeted attacks on important services like VoIP.

Best Regards
	Sebastian

> 
>> 
>> 
>> *) This is a hypothesis I have not confirmed in any way, but it seems to be in line with how I understand dual queue aqm to work (so it might be more of a reflection of my level of understanding and not so much of dual queue aqm). Please correct me if this seems wrong.
> Cheers
> 
> 
> 
> Bob
> 
> 
>> 
>>> 
>>> 
>>> Bob
>>> 
>>> 
>>>>  Thanks, --David (TSVWG co-chair, will be shepherd for NQB draft).
>>>> ----------------------------------------------------------------
>>>> David L. Black, Senior Distinguished Engineer
>>>> Dell EMC, 176 South St., Hopkinton, MA  01748
>>>> +1 (774) 350-9323 New    Mobile: +1 (978) 394-7754
>>>> David.Black@dell.com
>>>> ----------------------------------------------------------------
>>>>  
>>> -- 
>>> ________________________________________________________________
>>> Bob Briscoe
>>> http://bobbriscoe.net/
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
>
