Re: [tsvwg] Traffic protection as a hard requirement for NQB

Bob Briscoe <ietf@bobbriscoe.net> Tue, 10 September 2019 21:25 UTC

To: Sebastian Moeller <moeller0@gmx.de>
Cc: "tsvwg@ietf.org" <tsvwg@ietf.org>
References: <CE03DB3D7B45C245BCA0D24327794936306BBE54@MX307CL04.corp.emc.com> <56b804ee-478d-68c2-2da1-2b4e66f4a190@bobbriscoe.net> <AE16A666-6FF7-48EA-9D15-19350E705C19@gmx.de> <ad360251-638e-1c7e-3b9d-2838cc5917ef@bobbriscoe.net> <7B407AE2-7B5C-4310-9FA7-F399E98B7086@gmx.de>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <cfdd0693-7ed3-d3cf-9409-07a43c4f51ca@bobbriscoe.net>
Date: Tue, 10 Sep 2019 22:25:29 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/tKs4fMkgWU6N7uSVkfSjQOjK2u0>
Subject: Re: [tsvwg] Traffic protection as a hard requirement for NQB

Sebastian,

On 06/09/2019 10:51, Sebastian Moeller wrote:
>
>> On Sep 5, 2019, at 23:46, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>
>> Sebastian, (sorry, I mis-spelled your name previously), inline...
>>
>> On 03/09/2019 10:14, Sebastian Moeller wrote:
>>> Dear Bob,
>>>
>>> allow me to chime in.
>>>
>>>> On Sep 2, 2019, at 16:47, Bob Briscoe <in@bobbriscoe.net> wrote:
>>>>
>>>> David,
>>>>
>>>> Thanks for your closing remarks on the NQB adoption call.
>>>> You say your last point is open for discussion, so I will dive straight in to start that discussion.
>>>>
>>>> On 30/08/2019 16:40, Black, David wrote:
>>>>> 	• [snip]
>>>>> 	• The criticisms on this list of the “queue protection” requirement in the draft are largely accurate.   The draft needs at least an Editor’s Note that this material will be revised, as while the DOCSIS mechanism is an example of how to do queue protection, it is not appropriate to require implementation of that mechanism.   A plausible plan that I have discussed with the authors is to write a set of functional/behavioral requirements for NQB “traffic protection” that can be satisfied by a “queue protection” mechanism such as the DOCSIS mechanism, or by a suitably configured FQ AQM implementation. [snip]
>>>>>   In addition, related to item 2), my expectation (which is open to further discussion) is that “traffic protection” will be a “MUST” requirement, perhaps with some well-specified exceptions (including explanations of why the exceptions are ok).   This is because “traffic protection” (e.g., “queue protection” or a suitably configured FQ AQM) appears to be necessary in general to keep queue-building traffic out of the NQB traffic aggregate, as allowing such traffic degrades the properties of the NQB PHB.
>>>> I think we should be wary of making traffic protection a hard requirement at such an early stage in our knowledge of the NQB behaviour. I believe this is a case where the market, not the IETF, ought to decide whether protection is required.
>>> 	[SM] The draft says "... it is worthwhile to note that the NQB designation and marking would be intended to convey verifiable traffic behavior, not needs or wants." In the light of this requirement it seems obvious that a hop willing to honor that DSCP should/must actually verify the traffic behavior, no? Requiring behavior to accord with a set of requirements but not enforcing these requirements seems very very optimistic.
>> [BB] Just because the behaviour is verifiable does not mean it always has to be verified.
>> For instance, if someone claims a car is 17 years old, you might trust it is likely to be truthful because the speaker knows you could verify it. But it doesn't mean you have to actually look up the engine and chassis serial numbers to check the date of manufacture.
> 	[SM] I fail to see how this is a fitting analogy to this issue at hand.
[BB] It is a highly fitting analogy, so if you fail to see it, please 
try harder. My wife just read this, and said, "What's not to understand? 
That's an extremely relevant analogy."

In life, most trust is based on verifiability, but rarely actually verified.
> The NQB draft defines NQB-ness as " the NQB designation and marking would be intended to convey verifiable traffic behavior, not needs or wants." IMHO either get rid of this section or mandate the verification.
No, it's important that it is verifiable. It's also important that it 
doesn't have to be verified, for reasons I've given in other emails.

For example, the case of a mobile uplink where:
a) apps are verified before being allowed into the app-store
b) and the only place where the operator could apply verification prior 
to the bottleneck radio link would be on the UE

Here continual run-time verification is both complex (b) and unnecessary 
(a). So it is not appropriate to mandate it.

I'm afraid I cannot continue spending time on your responses if you just 
snip the text of my arguments that you haven't argued against and then 
continue to argue against the conclusion I've drawn from those arguments 
with no rationale, just IYHO assertions.

>
>> In the case of NQB, there is less need to verify behaviour because applications or users have no incentive to mismark - they would be worse off.
> 	[SM] I still do not believe that to be true, and no amount of Gedankenexperimente is going to change that, as this is rather a data question.
> As I wrote in another mail, any application that can gracefully deal with the shallow buffer of the NQB queue will be fine to misclassify itself into it even if its behaviors would be verifiably capacity-seeking. The only argument for misalignment of incentives to mismark is the shallowness of the buffer, once that is solved (by following Jonathan's BBR example) the argument that "for NQB flows, the NQB queue provides better performance (considering latency, loss and throughput) than the QB queue; and for QB flows, the QB queue provides better performance (considering latency, loss and throughput) than the NQB queue." is shown to be overly optimistic unless backed by queue protection.
[BB] Well, let's see what Jonathan comes up with in response to my 
request to reconstruct his argument, given he had misunderstood the config.

But remember that, even if the incentives are not clear-cut, I am saying 
that it is still not appropriate for the IETF to mandate a protection 
mechanism.

>
>> However, there is still the risk of accidents or malice.
> 	[SM] With malice actually being a market place with its own economics. IMHO that factor alone should make queue protection mandatory, as this proposal will make denial-of-service attacks much cheaper (no need to clog the full pipe as long as one can drive the NQB queue into dropping mode often enough, and that, in your example, requires at most 31 full-MTU packets...)
[BB] You seem to be treating me as if the economics of DDoS and traffic 
security is not my area of expertise.

I assume you are aware that nearly every RFC is vulnerable to DDoS. So 
now please explain why each RFC does not mandate protection against DDoS?

It's not an oversight. It's deliberate, because one doesn't have to 
implement protection at every point in a network and at all times (think 
of the example of app-store verification above).


>
>> Nonetheless, the harm from any accidental or malicious NQB mismarking will be confined to other applications of the same customer {Note 1}.
> 	[SM] The draft mentions access links, it does not restrict itself to access links though, so these need to be covered as well, no?
I guess so. But that's not relevant to whether 'MUST protect' is necessary.

To prove MUST is necessary, you have to prove protection is necessary in 
every scenario. That means you have to argue against the cases where I 
have shown it's /not/ necessary. Unless you have done that, there's no 
point in you introducing more cases where you think protection /is/ necessary.

> But in the DOS case it is not an application of the customer that degrades the NQB queue but the attacker's.
[BB] My sentence is about the application that is harmed, not the 
application doing the harm.
>
>> Given there's no incentive to mismark,
> 	[SM] Which I am not convinced to be true yet.
>
>> and the effects of accidents or malice are contained, an operator might decide that traffic protection is unnecessary - more cost than benefit.
> 	[SM] Sure, but in that case our hypothetical operator will have no problem whatsoever to simply ignore the RFC's MUST/SHOULD language. But making it SHOULD means the operator can do so and actually crow about it to its end customers.
[BB] The argument right from the start has been about SHOULD/MUST 
implement, not SHOULD/MUST enable. But it's similar. If it's SHOULD 
implement, the implementer can crow about it. But there's still a market 
for NQB without protection (e.g. for a mobile uplink).
>
>> The point about verifiability is merely that, if an operator does choose to protect flows from each other, an implementation can objectively determine which flows are out of compliance, because NQB behaviour is verifiable. Objective verifiability is neutral, which is important in jurisdictions with net neutrality legislation.
> 	[SM] If a tree falls in a forest and no one is there to witness it, does it still make a sound? Only the bottleneck is guaranteed to have sufficient information to actually perform the verification, if that bottleneck does not do this it is game over;
[BB] Yes.

That is still not an argument for 'MUST protect', which would force 
implementations of NQB to protect in order to claim RFC compliance, even 
if protection is not appropriate for the deployment scenario of the 
particular NQB implementation.
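
To illustrate what "objectively verifiable" could mean for an implementation that does choose to protect, here is a minimal sketch. It is not the DOCSIS queue protection algorithm and not anything specified in the draft. It assumes a hypothetical per-flow leaky bucket: each flow's level drains at an allowed rate, and a packet is judged out of compliance when it pushes the level past a burst budget. The class name FlowAudit, the rate and the budget are all invented for illustration.

```python
class FlowAudit:
    """Hypothetical per-flow leaky bucket; not taken from the NQB draft."""

    def __init__(self, allowed_rate_bytes_per_s, burst_budget_bytes):
        self.rate = allowed_rate_bytes_per_s   # assumed per-flow drain rate
        self.budget = burst_budget_bytes       # assumed tolerated burst
        self.level = {}                        # flow id -> bucket level (bytes)
        self.last_seen = {}                    # flow id -> time of last packet (s)

    def on_packet(self, flow, size_bytes, now_s):
        """Return True if this packet keeps the flow within its budget."""
        elapsed = now_s - self.last_seen.get(flow, now_s)
        # Drain the bucket for the time since the flow's previous packet.
        level = max(0.0, self.level.get(flow, 0.0) - elapsed * self.rate)
        level += size_bytes
        self.level[flow] = level
        self.last_seen[flow] = now_s
        return level <= self.budget


audit = FlowAudit(allowed_rate_bytes_per_s=125000.0, burst_budget_bytes=3000.0)

# A paced flow: one 200-byte packet every 20 ms.
paced = [audit.on_packet("voip", 200, i * 0.02) for i in range(50)]

# A bursty flow: ten 1500-byte packets back to back.
burst = [audit.on_packet("bulk", 1500, 0.0) for _ in range(10)]
```

With these assumed numbers every packet of the paced flow passes, while the bursty flow is flagged from its third packet onwards. The test depends only on the flow's own observable behaviour, which is the sense in which such a check is neutral. Whether to run a check like this at all remains a deployment decision.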

> with the exception of the very narrow model of NQB-scheduling only on the access link in which case that observable behavior at the end-user will be strongly correlated with the behavior at the bottleneck (but even there it is not exactly clear which packets were dropped by the upstream NQB-scheduler and why).
[BB] Access links are the primary point, and are often expected to be the 
only point, on a path through the Internet where NQB would be deployed. See 
earlier response about how operators design their networks to ensure the 
bottleneck is usually at a single point that they can easily control.
>
>>
>> {Note 1}: The draft says NQB is targeted at access links. That implies NQB scheduling is between the classes of one customer and that a higher level in the network scheduling hierarchy isolates customers from each other.
> 	[SM] Given your explanation, the draft maybe should clarify that it only targets access links, otherwise this argument is not very convincing.
[BB] At present the draft concerns the bottleneck link of a path, which 
it says is frequently the access network. I think an NQB 
per-hop-behaviour could be applicable to any bottleneck, but the 
discussion of incentives and protection in the draft is mostly specific 
to cases where the bottleneck is an access network link (as in the three 
use-cases given).

So if you mean that the authors could say that the protection discussion 
applies primarily to access networks, I would agree.
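
As a rough illustration of the containment point in {Note 1}, here is a minimal sketch. It assumes a shallow NQB buffer measured in packets, pure tail drop, and no draining during the burst; the function name nqb_drops, the buffer size and the traffic mix are all invented for illustration and are not from the draft. It contrasts one NQB queue per customer (isolated by a higher level of the scheduling hierarchy) with a single NQB queue shared by all customers.

```python
from collections import Counter


def nqb_drops(packets, buffer_pkts, per_customer):
    """Count tail drops per customer for one burst of NQB-marked packets.

    packets      -- customer ids, one entry per arriving packet, in order
    buffer_pkts  -- assumed NQB buffer depth in packets
    per_customer -- True: one NQB queue per customer; False: one shared queue
    """
    occupancy = Counter()
    drops = Counter()
    for customer in packets:
        queue = customer if per_customer else "shared"
        if occupancy[queue] < buffer_pkts:
            occupancy[queue] += 1      # packet fits in the buffer
        else:
            drops[customer] += 1       # buffer full, so tail drop
    return dict(drops)


# Customer A receives a 40-packet mismarked burst; customer B sends 2 packets.
arrivals = ["A"] * 40 + ["B"] * 2

isolated = nqb_drops(arrivals, buffer_pkts=10, per_customer=True)
shared = nqb_drops(arrivals, buffer_pkts=10, per_customer=False)
```

Under these assumptions, with per-customer queues only A loses packets. With a shared queue B loses its packets too. That is the difference between harm confined to one customer's own applications and harm spread across customers.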


Regards



Bob
>
>>>
>>>> The draft claims incentives can be aligned by an implementation being arranged to ensure that NQB traffic benefits from NQB marking and QB traffic benefits from QB marking. Incentives are hard to guess, so that may or may not be true. However, I don't think we (the tsvwg/IETF) can state categorically that it is not true.
>>> 	[SM] I would have thought it would be on those claiming a laxer enforcement model to prove that that model is sufficient?
>> [BB] See previous email to you and David Black.
> 	[SM] Still not convinced.
>
>>>> The draft makes the point that, even if incentives are aligned, queue-building traffic could be mismarked as NQB, either accidentally or maliciously. That's a sound reason for an implementer to include traffic protection, but I don't think it's a good reason for us (the tsvwg/IETF) to require them to.
>>>>
>>>> While there is no operational experience of NQB deployments, I think the market (i.e. most early-adopting operators) will want the warm feeling of some form of traffic protection. But as we get more experience we might find incentives really are aligned. And we might find accidents and malice are not a significant problem.
>>> 	[SM] @ the chairs, how hard would it be to retroactively change from a SHOULD to  a MUST? If it is easy I agree with Bob that a SHOULD would be nice, but if it is hard I would vote for a MUST as that seems to be the safer option.
>>>
>>>> So I think the current 'SHOULD' is the right call. It could be beefed up with warnings on the risks of not providing protection - not least the risk no early adopter will want to use such an implementation.
>>>>
>>>>
>>>>
>>>> As an analogy, when TCP congestion control was first developed, it was known that end-systems could run a subverted TCP algorithm or just use unresponsive UDP. At that time, the view could have been taken that per-flow scheduling would have to be a 'MUST' requirement for all Internet bottlenecks.
>>>> As it has turned out, the Internet does have /per-user/ scheduling at bottlenecks,
>>> 	[SM] Question is this universally true? I am not 100% sure. My access link is limited by my contracted rate, but due to oversubscription there is no guarantee that on a congested link between my home and my ISP's traffic-shaper/the wider internet I get a share that reflects my "per-user" fraction of the shared capacity.
>>> 	I fail to see any scheduler beyond my ISP's traffic shaper that has a holistic-enough view to classify traffic based on "user" (think NATed IPv4 address, but variable length IPv6 prefixes, plus a router's ipv6/64). Could you elaborate please?
>> [BB] In DSL, DOCSIS, mobile or satellite access networks, a scheduler both limits a customer to their contracted access rate and shares out the capacity during over-subscription periods. How these schedulers actually do that is secret sauce in many cases {Note 2}, but they all do it somehow.
> 	[SM] in DSL networks the "scheduler" might actually just be the fact that the sync rate between CPE and DSLAM sets an upper limit and that the max sync rate can be programmed from the ISPs side. Such configurations are hopefully rare though.
>
>> If another link deeper into the network becomes the bottleneck (e.g. a core or peering link), then there is typically no scheduler arbitrating between Internet customers {Note 3}. Then it's down to TCP between flows, regardless of which customer they are from. However, a core bottleneck rarely happens by design - usually it would be due to a screw up.
> 	[SM] Well, in my reality such deeper bottlenecks are not such rare unicorns, there often are predictable (time of day dependent) capacity exhaustions on the transit links to/from an ISP's network that introduce unwanted packet loss and delay. But as I predicted, beyond the access link's upstream end (loosely defined) no network node knows of users directly (albeit using IPv4 dst addresses and IPv6 dst prefixes should give more remote hops also some handle on endpoints/endpoint-networks). But I admit that this is somewhat tangential to our topic.
>
>> The main exception is campus networks (e.g. Universities, corporates), where the access link with the Internet doesn't usually schedule between users. Solutions range from a simple FIFO (i.e. the transport layers will be determining shares per flow, not per user) to complex often DPI-based prioritization solutions that might recognize users and/or applications.
>>
>>
>> {Note 2}: When I worked for BT, we described BT's wholesale solution here:
>> https://riteproject.files.wordpress.com/2015/12/rite-deliverable-3-3-public1.pdf#10
>> (it had already been published elsewhere, due to the regulatory need for transparency in the UK market).
> 	[SM] Thanks this seems to confirm my intuition derived from observations of my own access link properties.
>
>> {Note 3}: But one customer cannot get a huge share of the core capacity by opening loads of TCP flows, because each customer is still capped by their access scheduler as well.
>>
>>
>>>> but there has been little need for /per-flow/ scheduling for capacity sharing (yes, FQ exists, but it's not needed for capacity sharing).
>>>>
>>>> In the TCP case, it turned out that a delicate balance of incentives proved sufficient to allow most Internet equipment to be simpler and cheaper. There is a poorly understood balance of incentives in the NQB case. So let's not require equipment to be more complex than it might need to be, at least not yet.
>>> 	[SM] I believe for malicious actors NQB will be an attractive DSCP, as it promises to allow doing harm even at low bandwidth (just send a low average rate in lumpy bursts and I would expect an NQB-honoring L4S scheduler to get into trouble*, without requiring a large offensive traffic load, keeping it cheap and hard to detect yet sufficiently disruptive to rob the low latency queue of its intended functionality).
>> [BB] Yes, if a malicious actor could get control of either end, it would be fairly easy for them to cause more queuing.
> 	[SM] Why would the malicious actor need control over either end to just send bursts of NQB-marked packets to one of the endpoints'  IP-addresses? All needed seems to be the IP-address of the end-point and the low bandwidth DOS game is on... (this actually promises to take the (D) out of (D)DOS again as the required traffic burst rates to drive the NQB queue into tail drop seem rather low).
>
>
>> On the other hand it would be confined within one customer's service, so not a particularly high profile attack for a script kiddie to boast about.
> 	[SM] On-line gamers do exactly that, they order and pay for (D)DOS attacks on individual other users they want to disadvantage during game-play, as far as I know. The fact that DDOS as a service exists indicates to me, that there exists sufficient demand for malicious action, to predict that if NQB gets adopted by on-line games NQB queues will get targeted. In a sense the proposed NQB example from your other mail, serves the game-traffic on a silver platter in a simple attack sensitive queue that has a convenient way to be addressed from any outside entity, without having to flood the whole segment link, potentially allowing the attack to continue under the ISP's "radar".
>
>> As I said in my other mail to you & David, I'm not saying there's not a risk.
> 	[SM] The discussion is about the level of risk and how to mitigate it, because as far as I can see the proposed NQB scheme will increase the attack surface of users behind an NQB-honoring scheduler.
>
>
>> I'm saying it's not the IETF's place to mandate that implementers MUST protect against this risk. It's not a standards matter. Implementers will provide protection if operators want it. And if some operators don't want it, there will be a market for lower cost solutions without protection. Implementers are perfectly able to make these decisions themselves. The IETF has no mandate to meddle in this market.
> 	[SM] David's proposal was MUST implement SHOULD use which seems like it would please you. I believe it should be a categorical MUST for implementation and use, as the more I think about it the more it becomes clear that this thing is dangerously allowing targeted attacks on important services like VoIP in a convenient way.
>
> Best Regards
> 	Sebastian
>
>>>
>>> *) This is a hypothesis I have not confirmed in any way, but it seems to be in line with how I understand dual queue aqm to work (so it might be more of a reflection of my level of understanding and not so much of dual queue aqm). Please correct me if this seems wrong.
>> Cheers
>>
>>
>>
>> Bob
>>
>>
>>>> u
>>>>
>>>> Bob
>>>>
>>>>
>>>>>   Thanks, --David (TSVWG co-chair, will be shepherd for NQB draft).
>>>>> ----------------------------------------------------------------
>>>>> David L. Black, Senior Distinguished Engineer
>>>>> Dell EMC, 176 South St., Hopkinton, MA  01748
>>>>> +1 (774) 350-9323 New    Mobile: +1 (978) 394-7754
>>>>> David.Black@dell.com
>>>>> ----------------------------------------------------------------
>>>>>   
>>>> -- 
>>>> ________________________________________________________________
>>>> Bob Briscoe
>>>> http://bobbriscoe.net/
>> -- 
>> ________________________________________________________________
>> Bob Briscoe                               http://bobbriscoe.net/
>>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/