Re: [tsvwg] L4S dual-queue re-ordering and VPNs

Bob Briscoe <ietf@bobbriscoe.net> Tue, 18 May 2021 08:07 UTC

To: Sebastian Moeller <moeller0@gmx.de>
Cc: TSVWG <tsvwg@ietf.org>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <d76d9f2b-e801-0084-2e22-fd74ae8bd83a@bobbriscoe.net>
Date: Tue, 18 May 2021 09:06:51 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/64uTT5R5YilYm0jVD17cm0voDNo>
Subject: Re: [tsvwg] L4S dual-queue re-ordering and VPNs

Sebastian,

I think this thread has converged on agreement on every point - so no 
need to reply.
See [BB2] for a couple of clarifications.

On 09/05/2021 22:07, Sebastian Moeller wrote:
> Hi Bob,
>
>
> more below, prefixed [SM2].
>
>> On May 9, 2021, at 18:50, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>
>> Sebastian,
>>
>> Hopefully some of my responses to David's response to you were useful. A little more added below, [BB]...
>>
>> On 09/05/2021 00:59, Sebastian Moeller wrote:
>>> Hi Bob,
>>>
>>>
>>>> On May 9, 2021, at 00:12, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>>>
>>>> Sebastian,
>>>>
>>>> On 08/05/2021 19:09, Sebastian Moeller wrote:
>>>>> Hi Bob, list,
>>>>>
>>>>> see [SM] below
>>>>>
>>>>>> On May 8, 2021, at 16:26, Bob Briscoe <in@bobbriscoe.net> wrote:
>>>>>>
>>>>>> Thank you, Sebastian, for picking up this inter-area inconsistency, and thanks, Pete, for the test data.
>>>>>>
>>>>>> I would have thought that the delay-delta between two AQMs (as in the DualQ) will in general be much less than the delay-delta between a low delay and best efforts Diffserv behaviour, where the best efforts generally has no AQM at all, and is therefore prone to the delay of the whole buffer, which may be bloated.
>>>>> 	[SM] Well possible, but not that relevant: as far as I understand the Linux source code, wireguard and ipsec/xfrm only propagate the ECN bits between inner and outer layers and do not propagate the other 6 DSCP bits (of the former TOS byte).
>>>> [BB] For a patch that adds DSCP propagation to the outer of a wireguard tunnel, see https://lists.zx2c4.com/pipermail/wireguard/2019-March/004026.html
>>> [SM] But see actual wireguard implementation in the Linux kernel (https://elixir.bootlin.com/linux/v5.12.2/source/drivers/net/wireguard/send.c):
>>>
>>> skb_queue_walk(&packets, skb) {
>>> 	/* 0 for no outer TOS: no leak. TODO: at some later point, we
>>> 	 * might consider using flowi->tos as outer instead.
>>> 	 */
>>> 	PACKET_CB(skb)->ds = ip_tunnel_ecn_encap(0, ip_hdr(skb), skb);
>>> 	PACKET_CB(skb)->nonce =
>>> 		atomic64_inc_return(&keypair->sending_counter) - 1;
>>>
>>> 	if (unlikely(PACKET_CB(skb)->nonce >= REJECT_AFTER_MESSAGES))
>>> 		goto out_invalid;
>>> }
>>>
>>>
>>> See the comment and the first zero argument to ip_tunnel_ecn_encap? That is where current wireguard ignores the DSCP and enforces DSCP 0. I have actually looked into the source while trying to understand this issue.
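>>>
>>> To make explicit what that hard-coded 0 does: roughly (my own illustrative sketch of the call's purpose, not the kernel source, and ignoring any special-casing of an inner CE mark), the outer DS field takes its DSCP from the first argument and copies only the two ECN bits from the inner header, so the inner DSCP never reaches the outer IP header:
>>>
>>> #define ECN_MASK 0x03	/* low two bits of the DS field */
>>>
>>> /* hypothetical helper, for illustration only */
>>> static unsigned char outer_ds_sketch(unsigned char outer_tos, unsigned char inner_tos)
>>> {
>>> 	/* DSCP from the caller-supplied outer TOS (0 above), ECN bits from the inner header */
>>> 	return (unsigned char)((outer_tos & ~ECN_MASK) | (inner_tos & ECN_MASK));
>>> }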
>> I understood when you said it's not in the mainline. My intent in pointing to that patch was to show that other products/distros that do propagate DSCPs have been based on wg.
> 	[SM2] Your link points to a post in the wireguard mailing list with exactly zero indication that a) the patch went anywhere, and b) that it might have made it into other distributions. So what additional information makes you believe that these other products/distros, based on a patched wireguard, actually exist?
>
>>>
>>>> Also, there are many more VPNs on the Internet than the software you happen to use.
>>> 	[SM] True, and nothing I claimed to be otherwise. I do mention though that for end-users wireguard  and OpenVPN are common and attractive VPN implementations since they are supported by commercial VPN endpoints (which makes them a bit more attractive than pure ipsec) and the clients are free to use.
>>>
>>>> Throughout the world there's a thriving market in VPN services for home office working, corporate satellite offices, inter-business collaborations, and so on. These VPNs propagate the DSCP to the outer,
>>> 	[SM] What fraction of them does that, and what fraction allows to disable that?
>>>
>>>> so they can use Diffserv service like this one in the published spec of the 'Wholesale Broadband Connect' service my previous employer (BT) offers in the UK. See section 5.8 for the DSCPs supported:
>>>> https://www.bt.com/bt-plc/assets/documents/sinet/sins/downloads/472v2p11.pdf
>>>> This is just one ISP's wholesale product sold to retail ISPs, who then build on top the services they sell to their customers, such as VoIP, corporate VPNs, etc.
>>> 	[SM] This is interesting data, but orthogonal to my point and report. If such VPNs do this, they need to solve the re-ordering issue.
>> [BB] That was my point - it's not just an L4S issue.
> 	[SM2] Sure, but as before with RTT-bias, L4S adds its own significant increase in relevant re-ordering even if DSCPs do not come into play, and that is an L4S issue.
>
>>> BTW, the document does not go into any details of the VPN, so no way to figure out whether it actually protects against replay attacks.
>> [BB] Hope response to David covers this.
> 	[SM2] Honestly, this feels like a distraction, introducing another class of "VPNs" to distract from the issue that "early re-ordering" is not as benign as you made it sound...
>
>>>>> In fact the issue I see is that L4S introduces a novel re-ordering condition that has not existed before and hence is on nobody's radar.
>>>> [BB] When all the flows are within a VPN, the reordering within a flow in the DualQ is indistinguishable from reordering between flows, as far as the VPN sequencing is concerned. Reordering between flows within a VPN is not a new phenomenon at all - it's expected.
>>> 	[SM] Encrypted traffic is in no way different from other internet traffic, hence some re-ordering along a network path has to be accepted, true; network operators would prefer that "some" to be larger, end-users/protocols would prefer it to be smaller. But the observed up to ~50ms speed-up that ECT(1) flows see over ECT(0) in the DualQ AQM (with default targets) is just far outside of the expected "some" for VPNs. It is unfortunate (for L4S) that replay windows are sensitive to expedited early packets, while ACK streams seem more affected by delayed packets*.
>> [BB] Agreed.
>>
>>> *) that is the theory; there has been little data to actually confirm that hypothesis
>>>
>>>
>>>>>> So, wherever a VPN includes flows using different DSCPs, and there is a Diffserv-enabled bottleneck between the ends of the VPN, the VPN's replay window will need to cater for considerably more than 50ms delay-delta within the VPN. More like at least 200ms, and possibly 1-2s in some cases of bloat.
>>>>> 	[SM] Sure, but neither ipsec nor wireguard do that, outer DSCP for wireguard seems to be fixed to 0...
>>>>>
>>>>>> David himself has written about the reordering problem when a WebRTC application encapsulates TCP, SCTP and RTP flows with different DSCPs within UDP [RFC7657]. Indeed, datagram transport layer security (DTLS) is a common encapsulation for WebRTC flows. And DTLS also recommends a default replay window of 64 (see https://tools.ietf.org/html/rfc6347#section-4.1.2.6 ).
>>>>> 	[SM] Interesting case but different from the typical end-user uses a VPN case in which the issue will potentially crop up. VPNs (full or split tunnel) from end users to VPN providers or into the office are quite common nowadays and I bet most will not propagate dscps to the outer layer, so let's treat these as an independent category.
>>>> [BB] You just lost your bet. Not a good idea to bet that no-one uses a technology that is being sold all round the world, especially not on the IETF list for that technology.
>>> 	[SM] So you have numbers of relative usage of the different VPNs from end users home networks? If so please post a link to the source of that knowledge.
>>>   My hypothesis is that most end users will use either OpenVPN or wireguard which both do not seem to propagate DSCP bits around by default...
>> [BB] I only know what I know from talking with customers and potential customers, esp. in the finance sector, which was where I got called in more, due to latency questions. I don't know of any market surveys.
> 	[SM2] Well, see my condition "VPNs (full or split tunnel) from end users to VPN providers or into the office"; I am talking about end-users here, and I should have made that clearer. And in my experience ISPs either do not accept arbitrary DSCPs from mere end-users; at best they keep the DSCP but give no differential treatment, at worst they re-mark as they see fit... (my experience is limited, so this is anecdotal only, I claim no generality here)

[BB2] You're in general correct. I already pointed that out a couple of 
posts ago in this thread: 
https://mailarchive.ietf.org/arch/msg/tsvwg/qtEJqivzZUDQeyLehdlLbI3leF8/


>>>>>> - I'm not trying to say low replay windows won't affect the DualQ -
>>>>> 	[SM] The other way around: DualQ requires an increased replay window; it is L4S that is the root cause of the configuration change here, since it constitutes a new mechanism for re-ordering.
>>>> [BB] Here we go again. Let's find a problem that already exists, show L4S suffers from it, then blame L4S. This is tiresome.
>>> 	[SM] That is not how I see the issue. While this is not a conceptually novel problem L4S created, it is an issue that L4S makes significantly worse compared to the status quo.
>> [BB] I thought you agreed it would likely be worse with Diffserv (which is status quo)
> 	[SM2] Not what I intended to convey. In my setting of end-users using an encrypted VPN from their home link, with either OpenVPN or, nowadays more often, wireguard, DSCP propagation is not happening by default and hence cannot be a significant cause of concern; I hope that clarifies my position.

[BB2] This is a view of a world populated by techies who choose their 
own VPN software. I accept that the VPN products that you select from 
don't propagate the DSCP. But we also need to think of all the 
commercial VPN products that IT departments choose and install on behalf 
of their end-users so they can access the work intranet from home or 
while on the road. That's all I'm trying to explain here.

Regards



Bob

>>> Not my responsibility that L4S has many of these problematic spots (RTT-bias, equitable sharing between C and L queues, lack of even a single protocol that fully meets the requirements, rfc3168 incompatibility, the list goes on).
>>>
>>>
>>>>>> l4sops and aqm-dualq-coupled should certainly recommend a large enough replay window.
>>>>> 	[SM] Well, that is a stop-gap measure, really. IMHO the IETF recommends propagating ECN bits, so the IETF should make sure that this recommended behavior does not cause unexpected negative side-effects. Telling VPN users to change because of someone else's experiment is a bit lame. Also it will be hard to actually reach the affected users and tell them not to worry about replay attacks, but simply to enlarge the replay window instead...
>> [BB] See last email to Jonathan, about VPN replay protection being second line of defence behind replay protection in e2e security protocols.
>>
>> But I agree that telling VPN users to change for an experiment is lame.
> 	[SM2] Unexpected, but good!
>
>
>> Fortunately, I doubt anyone will notice if they leave their replay window as it is.
>> Although this is not a fifth order low probability as in the spurious retransmits problem, it is a third order low probability, with the same first two points as for spurious retransmits:
>> 1) multiple bottlenecks on a path are rare
>> 2) the first bottleneck has to be a classic AQM and the second an L4S DualQ
>> 3) running a VPN with replay protection enabled, while coinciding with both 1 & 2.
> 	[SM2] We might be talking past each other, but Pete's scenario is much simpler: one ECT(1) "greedy" flow and one ECT(0) "greedy" flow inside an ipsec tunnel with replay protection pass through an L4S AQM, and depending on the replay-window size the ECT(0) flow gets less than expected throughput, while the ECT(1) flow gets more.
> Specifically this means that:
> 1) the number of bottlenecks is pretty irrelevant, one is sufficient, it does need to be DualQ though.
> 2) is irrelevant since one L4S AQM at one bottleneck is sufficient
> 3) is required for the new failure mode, but the relevant 1) & 2) are considerably weaker constraints than in your list.
>
> You might be talking about a different problem here, but I am still concerned about simple functionality under normal expected conditions, I am not looking into attempts at malice here.
>
>> I should maybe add a point zero that Jonathan often makes about FQ AQMs:
>> 0) bottlenecks tend to be congested only transiently
> 	[SM2] Which is enough to cause the issue, and the classic queue is expected to see periodic peaks (the data transmitted into the network between the AQM emitting a CE/drop and the sender learning about that)
>
>>>>> 	Also, if I might add, it demonstrates why FQ actually is a pretty decent solution in general: sure, equitable sharing is by no means guaranteed to be the optimal solution, but at the same time, given the limited information at the AQM, it might be the least worst it can do...
>>>>>
>>>>>
>>>>>> - I'm just saying that there are other established technologies that reduce queuing delay for a subset of traffic, and from current insanely high levels, so they will be a longer pole in the tent than the DualQ. Then, as long as the replay window of VPNs is large enough for those established technologies, it will be large enough for the DualQ experiment.
>>>>> 	[SM] How that? Pete demonstrated that recommended default values of 32 or 64 packets are already enough to see the issue, but the same 32/64 packets seem to work reasonably well for the re-ordering one might encounter on the existing internet.
>>>> [BB] How do you know? As Pete said, this is the sort of problem that it would be very difficult to diagnose.
>>> 	[SM] For example wireguard will throw an error:
>>>
>>> net_dbg_ratelimited("%s: Packet has invalid nonce %llu (max %llu)\n",
>>> 	peer->device->dev->name,
>>> 	PACKET_CB(skb)->nonce,				
>>> 	keypair->receiving_counter.counter);
>>>
>>> logging the reception of a "stale" packet (as will OpenVPN). So the "noticing something is afoot" part is not that subtle; IMHO it is getting to the "where and why" the error was thrown that is difficult to diagnose.
>> [BB] I meant how do you know "the same 32/64 packets seem to work reasonably well for the re-ordering one might encounter on the existing internet"?
> 	[SM2] My hypothesis here is that networks tend to be designed to give reasonable performance with TCP, which already is re-ordering sensitive, but I have no hard data to back up that hypothesis yet. Why 32/64? Here I agree with you: it was convenient for some implementations to simply interpret a 32- or 64-bit integer as a bitfield, but it seems that works reasonably well (not perfectly, though).
>
>> Log messages abound for all sorts of bumps and farts. Few admins spend their time looking for problems to solve.
> 	[SM2] Might be true, but people I know tend to take error messages from security/privacy applications more seriously than other notifications. If you do not care whether encryption actually works, you are probably much better off with plain encapsulation without encryption; less work/CPU load.
>
>>>> A quick search found this Cisco troubleshooting guide for replay drops.
>>>> https://www.cisco.com/c/en/us/support/docs/ip/internet-key-exchange-ike/116858-problem-replay-00.html
>>>> You'll see this text in the problem statement:
>>>> "Certain QoS features, such as Low Latency Queueing (LLQ), can cause IPSec packet delivery to become out-of-order and dropped by the receiving endpoint due to a replay check failure."
>>>>
>>>> If a VoIP flow within a VPN went through a low latency queue, and a BE flow within the VPN didn't, the BE flow would experience replay drops as it pushed up the delay in its own queue. That would sort-of act like an AQM - pretty cool! (except no burst tolerance) ...
>>>> ...I'm distracting myself. My point was going to be that the drops would always be from the higher delay queue. Normally no-one would feel a need to check why there were some packet drops from a BE flow. So this could well be going on unnoticed right under our noses (speculation).
>>>>
>>>>> As I indicated during the re-ordering of CE discussion some time ago, declaring a specific level of re-ordering benign requires a set of assumptions, and these might or might not hold... in the VPN with replay-protection case (which seems to be the recommended mode) these assumptions do not seem to hold.
>>>> [BB] I'm saying the number 64 for a replay window is insane at today's packet rates (it's perfectly sane for implementation efficiency, but perfectly insane in terms of sensitivity to reordering).
>>> 	[SM] Mmmh, that is the point, Pete's data shows the issue already at 10 Mbps, which is not a high packet rate, see http://sce.dnsmgr.net/results/l4s-2020-11-11T120000-final/l4s-s9-tunnel-reordering/l4s-s9-tunnel-reordering-ns-ipsec-replay-win-32-dualpi2-10Mbit-20ms_tcp_delivery_with_rtt.svg.
>> [BB] I mean 64 is insane today (without L4S). See email to Jonathan with pointer to reordering degree on the Internet today, and example where even 150μs jitter between two bonded links could be used by an attacker to trigger VPN drops at just 100Mb/s.
> 	[SM] Not sure this is the right venue to discuss recommendations for default replay-window sizes, the relevant audience simply is not here. This is the right venue though to argue that L4S should do its best not to increase re-ordering gratuitously... in the end applications are only tolerant to re-ordering to a certain degree, so we should think in terms of an acceptable re-ordering budget, and I do not think we should invest that budget to help out L4S side-effects.
>
>
>
>>> At 10 Mbps a 64 packet window covers up to* (32*1538*8) / (10 * 1000^2) * 1000 = 39.37 ms. It takes a bit of reordering along the path to exceed that replay window...
>>>
>>>
>>> *) A fixed packet window obviously has a time-equivalent duration at the bottleneck rate that depends on the packet size distribution inside the current window.
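>>>
>>> As a minimal sketch of that back-of-the-envelope conversion (my own illustration in C, assuming every packet is a full-size 1538-byte frame, which is what makes the window's time coverage an upper bound):
>>>
>>> #include <stdio.h>
>>>
>>> /* Time (ms) covered by a replay window of `pkts` packets at `rate_bps`,
>>>  * assuming all packets are `pkt_bytes` long. */
>>> static double window_ms(unsigned pkts, double pkt_bytes, double rate_bps)
>>> {
>>> 	return pkts * pkt_bytes * 8.0 / rate_bps * 1000.0;
>>> }
>>>
>>> int main(void)
>>> {
>>> 	printf("%.2f ms\n", window_ms(32, 1538, 10e6));	/* prints 39.37 ms */
>>> 	return 0;
>>> }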
>> Why 32 in your formula, not 64?
> 	[SM2] Good point, sorry. I initially posted a link to the replay-win-64 graph and had calculations for 64 packets; when I changed that to 32 I failed to update all references. The reason for using 32 is simply that the issue at hand is more subtle in the 64-packet case (and gets confounded with L4S's default biased sharing, almost always giving ECT(1) flows an extra helping of capacity, but since the effect of L4S on re-ordering is an independent issue, let's ignore that here).
>
> Regards
> 	Sebastian
>
>
>>
>>
>> Bob
>>
>>>
>>>>>> In the context of the IETF, irrespective of the L4S experiment, the IETF needs to fix this bigger inconsistency between the standards tracks of its transport and security areas. I'll leave David to escalate this to the ADs if appropriate. Because Pete's right - it may not be easy for admins to identify the cause of this problem, and admins and security implementers don't tend to reach out for advice in transport RFCs.
>>>>> 	[SM] And no wordsmithing in any RFC is guaranteed to reach the current operators/users of such tunnels any time soon.
>>>> [BB] I wasn't meaning wordsmithing. There's a technical conflict to be resolved.
>>> 	[SM] Well, one could try to get better recommendations for how to set the replay window into other ipsec/replay-window RFCs.
>>> That in turn leads to the question of how these would look.
>>> For L4S Pete's proposal of something like
>>> (2 * C-queue latency target [ms]) / ((average packet size [bit]) / (bottleneck rate [bit/ms]))
>>> or for 25 ms and 10 mbps:
>>> (2 * 25) / ((1538*8) / (10 * 1000^2) * 1000)  = 40.6371911573
>>> might do, at least for dealing with L4S, provided the C-queue's latency target were not a free configuration parameter of the L4S AQM that can be set to arbitrary values...
>>> But I might be overlooking something here, so please explain your solution to this conundrum.
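>>>
>>> For concreteness, Pete's rule of thumb spelled out as code (my own sketch; the 2x factor, an assumed average packet size and the bottleneck rate are the only inputs, and the function name is purely illustrative):
>>>
>>> #include <math.h>
>>> #include <stdio.h>
>>>
>>> /* Replay window (packets) needed to tolerate 2x the C-queue latency
>>>  * target at a given bottleneck rate, assuming avg_bytes per packet. */
>>> static unsigned replay_window_pkts(double target_ms, double avg_bytes, double rate_bps)
>>> {
>>> 	double per_pkt_ms = avg_bytes * 8.0 / rate_bps * 1000.0;
>>> 	return (unsigned)ceil(2.0 * target_ms / per_pkt_ms);
>>> }
>>>
>>> int main(void)
>>> {
>>> 	/* 25 ms target, 1538 B packets, 10 Mb/s -> 41 (40.64 rounded up) */
>>> 	printf("%u packets\n", replay_window_pkts(25.0, 1538.0, 10e6));
>>> 	return 0;
>>> }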
>>>
>>>
>>> And that still leaves the question of how to let operators of existing replay-protected tunnels know that they are supposed to fix somebody else's experiment by adapting to new configurations.
>>>
>>>
>>> Regards
>>> 	Sebastian
>>>
>>>
>>>>
>>>> Bob
>>>>
>>>>> Best Regards
>>>>> 	Sebastian
>>>>>
>>>>>
>>>>>> Bob
>>>>>>
>>>>>> On 08/05/2021 07:45, Pete Heist wrote:
>>>>>>> I've added some additional tests at 10 and 20Mbps, and re-worked the
>>>>>>> writeup to include a table of the results:
>>>>>>>
>>>>>>> https://github.com/heistp/l4s-tests/#dropped-packets-for-tunnels-with-replay-protection-enabled
>>>>>>>
>>>>>>> I noticed that this issue seems to affect tunnels with replay window
>>>>>>> sizes of 32 and 64 packets regardless of the bottleneck bandwidth,
>>>>>>> likely because the peak C sojourn times can also increase as the
>>>>>>> bandwidth decreases. IMO, this seems like a safety concern from the
>>>>>>> standpoint that the deployment of DualPI2 can cause harm to
>>>>>>> conventional traffic, in IPsec tunnels using common defaults in
>>>>>>> particular, beyond that which is caused by DualPI2 itself.
>>>>>>>
>>>>>>> It may be fixed by increasing the window size or disabling replay
>>>>>>> protection, but it may not be easy for admins or users to identify the
>>>>>>> source of this problem when it occurs, or know who to contact about it.
>>>>>>>
>>>>>>> Pete
>>>>>>>
>>>>>>> On Sat, 2021-05-08 at 02:01 +0000, Black, David wrote:
>>>>>>>> [posting as an individual, not a WG chair]
>>>>>>>> Linking together a couple of related points:
>>>>>>>>
>>>>>>>>> [SM] Current Linux kernels seem to use a window of ~8K packets, while
>>>>>>>>> OpenVPN defaults to 64 packets, Linux ipsec seems to default to
>>>>>>>>> either 32 or 64. 8K should be reasonably safe, but 64 seems less
>>>>>>>>> safe.
>>>>>>>> Common VPN design practice here appears to be picking a plausible
>>>>>>>> default size (which can be reconfigured and change from release to
>>>>>>>> release) for the accounting window to detect replay, hence this:
>>>>>>>>
>>>>>>>>>>   But, in any case, it seems to me that protocols that need to be
>>>>>>>>>> robust to out-of-order delivery would need to consider being robust
>>>>>>>>>> to re-ordering in time units anyway, and so would naturally need to
>>>>>>>>>> scale that functionality as packet rates increase.
>>>>>>>> may not happen in a smooth fashion.  As Sebastian writes:
>>>>>>>>
>>>>>>>>> [SM] The thing is these methods aim to avoid Mallory fudging with the
>>>>>>>>> secure connection between Alice and Bob (not ours), and need to
>>>>>>>>> track packet by packet, that is not easily solved efficiently with a
>>>>>>>>> simple time-out
>>>>>>>> That's correct, and use of a simple time-out by itself is prohibited
>>>>>>>> for obvious security reasons.  For more details on a specific example,
>>>>>>>> see Section 3.4.3 of RFC 4303 (ESP), which specifies the ESP anti-
>>>>>>>> replay mechanism (could be used as a reference in writing text on how
>>>>>>>> L4S interacts with anti-replay)  ... and the observant reader will
>>>>>>>> notice that this section is a likely source of the anti-replay 32 and
>>>>>>>> 64 packet values for Linux IPsec:
>>>>>>>> https://datatracker.ietf.org/doc/html/rfc4303#section-3.4.3 .
>>>>>>>>
>>>>>>>> Thanks, --David
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Sebastian Moeller
>>>>>>>> Sent: Wednesday, May 5, 2021 5:21 PM
>>>>>>>> To: Greg White
>>>>>>>> Cc: TSVWG
>>>>>>>> Subject: Re: [tsvwg] L4S dual-queue re-ordering and VPNs
>>>>>>>>
>>>>>>>>
>>>>>>>> [EXTERNAL EMAIL]
>>>>>>>>
>>>>>>>> Hi Greg,
>>>>>>>>
>>>>>>>> thanks for your response, more below prefixed [SM].
>>>>>>>>
>>>>>>>>> On May 3, 2021, at 19:35, Greg White <g.white@CableLabs.com> wrote:
>>>>>>>>>
>>>>>>>>> I'm not familiar with the replay attack mitigations used by VPNs, so
>>>>>>>>> can't comment on whether this would indeed be an issue for some VPN
>>>>>>>>> implementations.
>>>>>>>> [SM] I believe this to be an issue for at least those VPNs that use UDP
>>>>>>>> and defend against replay attacks (including ipsec, wireguard,
>>>>>>>> OpenVPN). All more or less seem to use the same approach with a limited
>>>>>>>> accounting window to allow out-of-order delivery of packets. The head
>>>>>>>> of the window typically seems to be advanced to the packet with the
>>>>>>>> highest "sequence" number, hence all of these are sensitive to the
>>>>>>>> kind of packet re-ordering the L4S ecn id draft argues was benign...
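>>>>>>>>
>>>>>>>> As a rough sketch of that style of check (my own illustration of the
>>>>>>>> common bitmap scheme along the lines of RFC 4303 / RFC 6479, not any
>>>>>>>> particular implementation): the highest sequence number seen so far is
>>>>>>>> the head of the window, and anything more than WINDOW packets behind
>>>>>>>> it is rejected, which is exactly what bites the delayed queue when
>>>>>>>> other packets jump ahead.
>>>>>>>>
>>>>>>>> #include <stdbool.h>
>>>>>>>> #include <stdint.h>
>>>>>>>>
>>>>>>>> #define WINDOW 64	/* replay window size in packets */
>>>>>>>>
>>>>>>>> struct replay_state {
>>>>>>>> 	uint64_t head;	/* highest sequence number accepted so far */
>>>>>>>> 	uint64_t bitmap;	/* bit i set => (head - i) already seen */
>>>>>>>> };
>>>>>>>>
>>>>>>>> static bool replay_check_update(struct replay_state *st, uint64_t seq)
>>>>>>>> {
>>>>>>>> 	if (seq > st->head) {
>>>>>>>> 		uint64_t shift = seq - st->head;
>>>>>>>> 		/* advance the head; bits for skipped packets become 0 */
>>>>>>>> 		st->bitmap = (shift >= WINDOW) ? 0 : st->bitmap << shift;
>>>>>>>> 		st->bitmap |= 1;	/* mark seq itself as seen */
>>>>>>>> 		st->head = seq;
>>>>>>>> 		return true;
>>>>>>>> 	}
>>>>>>>> 	if (st->head - seq >= WINDOW)
>>>>>>>> 		return false;	/* fell behind the window: dropped as a "replay" */
>>>>>>>> 	if (st->bitmap & ((uint64_t)1 << (st->head - seq)))
>>>>>>>> 		return false;	/* genuine duplicate */
>>>>>>>> 	st->bitmap |= (uint64_t)1 << (st->head - seq);
>>>>>>>> 	return true;
>>>>>>>> }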
>>>>>>>>
>>>>>>>>
>>>>>>>>>   A quick search revealed
>>>>>>>>> (https://www.wireguard.com/protocol/) that Wireguard apparently has a window of about
>>>>>>>>> 2000 packets, so perhaps it isn't an immediate issue for that VPN
>>>>>>>>> software?
>>>>>>>> [SM] Current Linux kernels seem to use a window of ~8K packets, while
>>>>>>>> OpenVPN defaults to 64 packets, Linux ipsec seems to default to either
>>>>>>>> 32 or 64. 8K should be reasonably safe, but 64 seems less safe.
>>>>>>>>
>>>>>>>>> But, if it is an issue for a particular algorithm, perhaps another
>>>>>>>>> solution to address condition b would be to use a different "head of
>>>>>>>>> window" for ECT1 packets compared to ECT(0)/NotECT packets?
>>>>>>>> [SM] Without arguing whether that might or might not be a good idea, it
>>>>>>>> is not what is done today, so all deployed end-points will treat all
>>>>>>>> packets the same, but at least wireguard and Linux ipsec will propagate
>>>>>>>> the ECN value at en- and decapsulation, so are probably affected by the
>>>>>>>> issue.
>>>>>>>>
>>>>>>>>> In your 100 Gbps case, I guess you are assuming that A) the
>>>>>>>>> bottleneck between the two tunnel endpoints is 100 Gbps, B) a single
>>>>>>>>> VPN tunnel is consuming the entirety of that 100 Gbps link, and C)
>>>>>>>>> that there is a PI2 AQM targeting 20ms of buffering delay in that 100
>>>>>>>>> Gbps link?  If so, I'm not sure that I agree that this is likely in
>>>>>>>>> the near term.
>>>>>>>> [SM] Yes, the back-of-an-envelope worst-case estimate is not terribly
>>>>>>>> concerning, I agree, but the point remains that a fixed 20ms delay
>>>>>>>> target will potentially cause the issue with increasing link speeds...
>>>>>>>>
>>>>>>>>
>>>>>>>>>   But, in any case, it seems to me that protocols that need to be
>>>>>>>>> robust to out-of-order delivery would need to consider being robust
>>>>>>>>> to re-ordering in time units anyway, and so would naturally need to
>>>>>>>>> scale that functionality as packet rates increase.
>>>>>>>> [SM] The thing is these methods aim to avoid Mallory fudging with the
>>>>>>>> secure connection between Alice and Bob (not ours), and need to track
>>>>>>>> packet by packet; that is not easily solved efficiently with a simple
>>>>>>>> time-out (at least not as far as I can see, but I do not claim
>>>>>>>> expertise in cryptology or security engineering). But I am certain, if
>>>>>>>> you have a decent new algorithm to enhance RFC2401 and/or RFC6479 the
>>>>>>>> crypto community might be delighted to hear them. ;)
>>>>>>>>
>>>>>>>>> I'm happy to include text in the L4Sops draft on this if the WG
>>>>>>>>> agrees it is useful to include it, and someone provides text that
>>>>>>>>> would fit the bill.
>>>>>>>> [SM] I wonder whether a section in L4S-OPs à la "make sure to
>>>>>>>> configure a sufficiently large replay window to allow for ~20ms
>>>>>>>> reordering" would be enough, or whether the whole discussion would not
>>>>>>>> also be needed in
>>>>>>>> https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-14#appendix-B.1
>>>>>>>> widening the re-ordering scope from the
>>>>>>>> existing "Risk of reordering Classic CE packets" subpoint 3.?
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>          Sebastian
>>>>>>>>
>>>>>>>>
>>>>>>>>> -Greg
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/3/21, 1:44 AM, "tsvwg on behalf of Sebastian Moeller"
>>>>>>>>> <tsvwg-bounces@ietf.org on behalf of moeller0@gmx.de> wrote:
>>>>>>>>>
>>>>>>>>>     Dear All,
>>>>>>>>>
>>>>>>>>>     we had a few discussions in the past about L4S' dual queue design
>>>>>>>>> and the consequences of packets of a single flow being accidentally
>>>>>>>>> steered into the wrong queue.
>>>>>>>>>     So far we mostly discussed the consequence of steering all packets
>>>>>>>>> marked CE into the LL-queue (and
>>>>>>>>> https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-14#appendix-B.1
>>>>>>>>> Risk of reordering Classic CE packets:
>>>>>>>>> only discusses this point); there the argument is that this
>>>>>>>>> condition should be rare and should also be relatively benign, as an
>>>>>>>>> occasional packet arriving too early should not trigger the 3-DupACK mechanism.
>>>>>>>>> While I would have liked to see hard data confirming the two hypotheses,
>>>>>>>>> let's accept that argument for the time being.
>>>>>>>>>
>>>>>>>>>     BUT, there is a traffic class that is actually sensitive to
>>>>>>>>> packets arriving out-of-order and too early: VPNs. Most VPNs try to
>>>>>>>>> secure against replay attacks by maintaining a replay window and only
>>>>>>>>> accept packets that fall within that window. Now, as far as I can
>>>>>>>>> see, most replay window algorithms use a bounded window and use the
>>>>>>>>> highest received sequence number to set the "head" of the window and
>>>>>>>>> hence will trigger replay attack mitigation, if the too-early-packets
>>>>>>>>> move the replay window forward such that "in-order-packets" from the
>>>>>>>>> longer queue fall behind the replay window.
>>>>>>>>>
>>>>>>>>>     Wireguard is an example of a modern VPN affected by this issue,
>>>>>>>>> since it supports ECN and propagates ECN bits between inner and outer
>>>>>>>>> headers on en- and decapsulation.
>>>>>>>>>
>>>>>>>>>     I can see two conditions that trigger this:
>>>>>>>>>     a) the arguably relatively rare case of an already CE-marked
>>>>>>>>> packet hitting an L4S AQM (but we have no real number on the
>>>>>>>>> likelihood of that happening)
>>>>>>>>>     b) the arguably more and more common situation (if L4S actually
>>>>>>>>> succeeds in the field) of an ECT(1) sub-flow zipping past
>>>>>>>>> ECT(0)/NotECT sub-flows (all within the same tunnel outer flow)
>>>>>>>>>
>>>>>>>>>     I note that neither single-queue rfc3168 nor FQ AQMs (rfc3168 or
>>>>>>>>> not) are affected by that issue since they do not cause similar re-
>>>>>>>>> ordering.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     QUESTIONS @ALL:
>>>>>>>>>
>>>>>>>>>     1)  Are we all happy with that and do we consider this to be
>>>>>>>>> acceptable collateral damage?
>>>>>>>>>
>>>>>>>>>     2) If yes, should the L4S OPs draft contain text to recommend end-
>>>>>>>>> points how to cope with that new situation?
>>>>>>>>>          If yes, how? Available options are IMHO to eschew the use of
>>>>>>>>> ECN on tunnels, or to recommend increased replay window sizes, but
>>>>>>>>> with a Gigabit link and L4S classic target of around 20ms, we would
>>>>>>>>> need to recommend a replay window of:
>>>>>>>>>> = ((1000^3 [b/s]) / (1538 [B/packet] * 8 [b/B])) *
>>>>>>>>>> (20[ms]/1000[ms]) = 1625.48764629 [packets]
>>>>>>>>>     or with a power of two algorithm 2048, which is quite a bit larger
>>>>>>>>> than the old default of 64...
>>>>>>>>>          But what if the L4S AQM is located on a backbone link with
>>>>>>>>> considerably higher bandwidth, like 10 Gbps or even 100 Gbps? IMHO a
>>>>>>>>> replay window of 1625 * 100 = 162500 seems a bit excessive
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     Also the following text in
>>>>>>>>> https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-14#appendix-A.1.7
>>>>>>>>>
>>>>>>>>>     "  Should work in tunnels:  Unlike Diffserv, ECN is defined to
>>>>>>>>> always
>>>>>>>>>           work across tunnels.  This scheme works within a tunnel that
>>>>>>>>>           propagates the ECN field in any of the variant ways it has
>>>>>>>>> been
>>>>>>>>>           defined, from the year 2001 [RFC3168] onwards.  However, it
>>>>>>>>> is
>>>>>>>>>           likely that some tunnels still do not implement ECN
>>>>>>>>> propagation at
>>>>>>>>>           all."
>>>>>>>>>
>>>>>>>>>     Seems like it could need additions to reflect the just described
>>>>>>>>> new issue.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     Best Regards
>>>>>>>>>          Sebastian
>>>>>>>>>
>>>>>>>>>
>>>>>> -- 
>>>>>> ________________________________________________________________
>>>>>> Bob Briscoe                               http://bobbriscoe.net/
>>>>>>
>>>> -- 
>>>> ________________________________________________________________
>>>> Bob Briscoe                               http://bobbriscoe.net/
>> -- 
>> ________________________________________________________________
>> Bob Briscoe                               http://bobbriscoe.net/

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/