Re: [tsvwg] L4S dual-queue re-ordering and VPNs

Bob Briscoe <ietf@bobbriscoe.net> Sun, 09 May 2021 15:59 UTC

To: "Black, David" <David.Black@dell.com>, Sebastian Moeller <moeller0@gmx.de>
Cc: TSVWG <tsvwg@ietf.org>
References: <68F275F9-8512-4CD9-9E81-FE9BEECD59B3@cablelabs.com> <1DB719E5-55B5-4CE2-A790-C110DB4A1626@gmx.de> <MN2PR19MB40452C9DD1164609A005139583569@MN2PR19MB4045.namprd19.prod.outlook.com> <e15d732f64bf983975dbe507092b39f0744f7f74.camel@heistp.net> <1efe0dfb-afb6-0aa4-dcff-fb4ddeb46b8f@bobbriscoe.net> <689EAC46-9873-40BC-A8EE-12060336FB19@gmx.de> <a6ae94e9-4bdb-222b-d206-7f35fc807948@bobbriscoe.net> <42ECC889-440C-4DE1-BD8C-C983387E460C@gmx.de> <MN2PR19MB40456D762230A2583DE6919383559@MN2PR19MB4045.namprd19.prod.outlook.com>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <49f7fca5-9fb3-0026-0a9c-47bab888da3e@bobbriscoe.net>
Date: Sun, 09 May 2021 16:58:54 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/_7SleeloZwykvh36iQhNl9x4Rq0>
Subject: Re: [tsvwg] L4S dual-queue re-ordering and VPNs

David,

On 09/05/2021 03:51, Black, David wrote:
> Top posting to try to add some insight.
>
> There are multiple different sorts of networking technology that use the virtual private network (VPN) term.  The specific technology under discussion here is secure VPNs that use IPsec, TLS or DTLS, and specifically secure VPNs that provide an anti-replay service.

[BB] Correct.

> The BT WBC offering that Bob provided a reference to may be a VPN, but it is not a secure VPN (e.g., the only mention of IPsec in the referenced document occurs in cautionary text about the impact of tunneling on MTU).

[BB] I thought I'd made it clear, sorry (Sebastian also misunderstood so 
it must be me). I only offered BT's WBC (and the similar products of 
other operators) as proof that operators offer Diffserv service on the 
access link.

Then Retail ISPs and other companies use it as a component part of their 
service packages like VoIP, or corporate VPNs, or VPNs between 
collaborative companies, or whatever.

Most users wouldn't be aware if their VPN or VoIP was using Diffserv 
underneath or not. But if you've ever wondered why the VoIP that certain 
ISPs offer doesn't suffer from latency under load, whereas an 
Over-The-Top one does, it might be 'cos the former is built over 
Diffserv in the access network.

I can't immediately find an example of a corporate VPN product that uses 
the above Diffserv service.

PlusNet is one example of a UK retail ISP that uses Diffserv (years ago 
I tried to get them into AQM, but they prefer their way). Here's their 
page about it:
https://www.plus.net/help/broadband/about-traffic-prioritisation/ . 
You'll see they detect VPN traffic in the network and apply a DSCP to 
the VPN (rather than the VPN asking for it). So this is /not/ an example 
of a case where there might be different DSCPs within a single VPN.

>
> As Sebastian has pointed out, IPsec (see RFC 4301) was designed based on Diffserv and ECN as they existed at that time (e.g., RFC 4301 cites RFCs 2474, 2475 and 3168), and RFC 4301 covers the interaction of Diffserv QoS differences with anti-replay, including providing mechanisms to deal with that interaction, so I'm not seeing a significant inconsistency between Diffserv and IPsec.  FWIW, I would not expect to, as I worked on this area of RFC 4301, e.g., as the author of RFC 2983.

[BB] All correct.

>
> I observe that appropriate L4S usage of a Guard DSCP would result in L4S fitting cleanly into the RFC 4301 provisions for reordering caused by Diffserv, and on that basis I'll politely decline Bob's implicit invitation to revise RFC 4301 (IPsec Architecture) in order to support the L4S experiment's choice of identifier.

[BB] No revision of 4301 is needed (except I'd point out that RFC6040 
already updated it). What did you think my implicit invitation was? It 
was so implicit I must have missed it.

The issue is whether the non-mandatory advice in 4301 is being followed:
* scale the replay window with bit-rate,
* separate DSCPs per SA.

On this last one, did you see my question earlier in the thread asking 
whether someone knows whether the 'SHOULD' in RFC4301 recommending one 
SA per DSCP is generally adhered to? I imagine implementers might cut 
corners on that SHOULD, due to the complexity of setting up multiple SAs.
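
As a rough illustration of the first bullet (this is not text from RFC
4301), the arithmetic used elsewhere in this thread can be written as a
small Python sketch. The function names are made up for illustration, and
the 1538-byte on-the-wire packet size is simply the figure Sebastian used,
assumed here as a default. It converts between a delay-delta in ms and a
replay window in packets at a given bottleneck rate:

```python
import math


def replay_window_packets(rate_bps, delay_delta_ms, packet_bytes=1538):
    """Packets that can overtake a delayed packet within delay_delta_ms.

    Assumes every packet is packet_bytes long on the wire (an assumption
    taken from the figures quoted in this thread).
    """
    packets_per_ms = rate_bps / (packet_bytes * 8) / 1000
    return packets_per_ms * delay_delta_ms


def window_duration_ms(window_packets, rate_bps, packet_bytes=1538):
    """Time span covered by a fixed-size packet window at rate_bps."""
    return window_packets * packet_bytes * 8 / rate_bps * 1000


def next_power_of_two(n):
    """Smallest power of two >= n, for implementations that need one."""
    return 1 << (math.ceil(n) - 1).bit_length()


if __name__ == "__main__":
    # 1 Gb/s bottleneck, 20 ms delay-delta
    print(round(replay_window_packets(1e9, 20), 2))
    print(next_power_of_two(replay_window_packets(1e9, 20)))
    # 32-packet window at 10 Mb/s
    print(round(window_duration_ms(32, 10e6), 2))
```

Under these assumptions a 20 ms delay-delta at 1 Gb/s needs about 1625
packets (2048 as a power of two), while a 32-packet window at 10 Mb/s
covers only about 39 ms.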


Bob

>
> Thanks, --David
>
> -----Original Message-----
> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Sebastian Moeller
> Sent: Saturday, May 8, 2021 7:59 PM
> To: Bob Briscoe
> Cc: TSVWG
> Subject: Re: [tsvwg] L4S dual-queue re-ordering and VPNs
>
>
> [EXTERNAL EMAIL]
>
> Hi Bob,
>
>
>> On May 9, 2021, at 00:12, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>
>> Sebastian,
>>
>> On 08/05/2021 19:09, Sebastian Moeller wrote:
>>> Hi Bob, list,
>>>
>>> see [SM] below
>>>
>>>> On May 8, 2021, at 16:26, Bob Briscoe <in@bobbriscoe.net> wrote:
>>>>
>>>> Thank you, Sebastian, for picking up this inter-area inconsistency, and thanks, Pete, for the test data.
>>>>
>>>> I would have thought that the delay-delta between two AQMs (as in the DualQ) will in general be much less than the delay-delta between a low delay and best efforts Diffserv behaviour, where the best efforts generally has no AQM at all, and is therefore prone to the delay of the whole buffer, which may be bloated.
>>> 	[SM] Quite possible, but not that relevant: as far as I understand the Linux source code, wireguard and ipsec/xfrm only propagate the ECN bits between inner and outer layers and do not propagate the other 6 DSCP bits (of the former TOS byte).
>> [BB] For a patch that adds DSCP propagation to the outer of a wireguard tunnel, see https://lists.zx2c4.com/pipermail/wireguard/2019-March/004026.html
> [SM] But see the actual wireguard implementation in the Linux kernel (https://elixir.bootlin.com/linux/v5.12.2/source/drivers/net/wireguard/send.c):
>
> 	skb_queue_walk(&packets, skb) {
> 		/* 0 for no outer TOS: no leak. TODO: at some later point, we
> 		 * might consider using flowi->tos as outer instead.
> 		 */
> 		PACKET_CB(skb)->ds = ip_tunnel_ecn_encap(0, ip_hdr(skb), skb);
> 		PACKET_CB(skb)->nonce =
> 				atomic64_inc_return(&keypair->sending_counter) - 1;
> 		if (unlikely(PACKET_CB(skb)->nonce >= REJECT_AFTER_MESSAGES))
> 			goto out_invalid;
> 	}
>
>
> See the comment and the first zero argument to ip_tunnel_ecn_encap? That is where current wireguard ignores the DSCP and enforces DSCP 0. I actually looked into the source while trying to understand this issue.
>
>
>> Also, there are many more VPNs on the Internet than the software you happen to use.
> 	[SM] True, and I did not claim otherwise. I do mention, though, that for end-users wireguard and OpenVPN are common and attractive VPN implementations, since they are supported by commercial VPN endpoints (which makes them a bit more attractive than pure IPsec) and the clients are free to use.
>
>> Throughout the world there's a thriving market in VPN services for home office working, corporate satellite offices, inter-business collaborations, and so on. These VPNs propagate the DSCP to the outer,
> 	[SM] What fraction of them does that, and what fraction allows that to be disabled?
>
>> so they can use Diffserv service like this one in the published spec of the 'Wholesale Broadband Connect' service my previous employer (BT) offers in the UK. See section 5.8 for the DSCPs supported:
>> https://www.bt.com/bt-plc/assets/documents/sinet/sins/downloads/472v2p11.pdf
>> This is just one ISP's wholesale product sold to retail ISPs, who then build on top the services they sell to their customers, such as VoIP, corporate VPNs, etc.
> 	[SM] This is interesting data, but orthogonal to my point and report. If such VPNs do this, they need to solve the re-ordering issue. BTW, the document does not go into any details of the VPN, so no way to figure out whether it actually protects against replay attacks.
>
>>> In fact the issue I see is that L4S introduces a novel re-ordering condition that has not existed before and that hence is on nobody's radar.
>> [BB] When all the flows are within a VPN, the reordering within a flow in the DualQ is indistinguishable from reordering between flows, as far as the VPN sequencing is concerned. Reordering between flows within a VPN is not a new phenomenon at all - it's expected.
> 	[SM] Encrypted traffic is in no way different from other internet traffic, hence some re-ordering along a network path has to be accepted, true; network operators would prefer that "some" to be larger, end-users/protocols would prefer that "some" to be smaller. But the observed up to ~50ms speed-up that ECT(1) flows see over ECT(0) in the DualQ AQM (with default targets) is just far outside of the expected "some" for VPNs. It is unfortunate (for L4S) that replay windows are sensitive to expedited early packets, while ACK streams seem more affected by delayed packets*.
>
> *) that is the theory; there has been little data to actually confirm that hypothesis
>
>
>>>> So, wherever a VPN includes flows using different DSCPs, and there is a Diffserv-enabled bottleneck between the ends of the VPN, the VPN's replay window will need to cater for considerably more than 50ms delay-delta within the VPN. More like at least 200ms, and possibly 1-2s in some cases of bloat.
>>> 	[SM] Sure, but neither ipsec nor wireguard do that, outer DSCP for wireguard seems to be fixed to 0...
>>>
>>>> David himself has written about the reordering problem when a WebRTC application encapsulates TCP, SCTP and RTP flows with different DSCPs within UDP [RFC7657]. Indeed, datagram transport layer security (DTLS) is a common encapsulation for WebRTC flows. And DTLS also recommends a default replay window of 64 (see https://tools.ietf.org/html/rfc6347#section-4.1.2.6 ).
>>> 	[SM] Interesting case but different from the typical end-user uses a VPN case in which the issue will potentially crop up. VPNs (full or split tunnel) from end users to VPN providers or into the office are quite common nowadays and I bet most will not propagate dscps to the outer layer, so let's treat these as an independent category.
>> [BB] You just lost your bet. Not a good idea to bet that no-one uses a technology that is being sold all round the world, especially not on the IETF list for that technology.
> 	[SM] So you have numbers of relative usage of the different VPNs from end users home networks? If so please post a link to the source of that knowledge.
>   My hypothesis is that most end users will use either OpenVPN or wireguard which both do not seem to propagate DSCP bits around by default...
>
>
>>>> - I'm not trying to say low replay windows won't affect the DualQ -
>>> 	[SM] The other way around: DualQ requires an increased replay window; it is L4S that is the root cause of the configuration change here, since it constitutes a new mechanism for re-ordering.
>> [BB] Here we go again. Let's find a problem that already exists, show L4S suffers from it, then blame L4S. This is tiresome.
> 	[SM] That is not how I see the issue. While this is not a conceptually novel problem that L4S created, it is an issue that L4S makes significantly worse compared to the status quo. It is not my responsibility that L4S has many of these problematic spots (RTT-bias, equitable sharing between C and L queues, lack of even a single protocol that fulfils all the requirements, rfc3168 incompatibility, the list goes on).
>
>
>>>> l4sops and aqm-dualq-coupled should certainly recommend a large enough replay window.
>>> 	[SM] Well, that is a stop-gap measure, really. IMHO the IETF recommends to propagate ECN bits, so the IETF should make sure that this recommended behavior does not cause unexpected negative side-effects. Telling VPN users to change because of someone else's experiment is a bit lame. Also it will be hard to actually get to the affected users and tell them not to worry about replay attacks, but simply enlarge the replay-window instead...
>>> 	Also, if I might add, it demonstrates why FQ actually is a pretty decent solution in general, sure equitable sharing is by no means guaranteed to be the optimal solution, but at the same time, given the limited information at the AQM it might be the least worst it can do...
>>>
>>>
>>>> - I'm just saying that there are other established technologies that reduce queuing delay for a subset of traffic, and from current insanely high levels, so they will be a longer pole in the tent than the DualQ. Then, as long as the replay window of VPNs is large enough for those established technologies, it will be large enough for the DualQ experiment.
>>> 	[SM] How so? Pete demonstrated that recommended default values of 32 or 64 packets are already enough to see the issue, but the same 32/64 packets seem to work reasonably well for the re-ordering one might encounter on the existing internet.
>> [BB] How do you know? As Pete said, this is the sort of problem that it would be very difficult to diagnose.
> 	[SM] For example wireguard will throw an error:
>
> net_dbg_ratelimited("%s: Packet has invalid nonce %llu (max %llu)\n",
> 	peer->device->dev->name,
> 	PACKET_CB(skb)->nonce,				
> 	keypair->receiving_counter.counter);
>
> logging the reception of a "stale" packet (as will OpenVPN). So the "noticing something is afoot" part is not that subtle, IMHO getting to the "where and why" the error was thrown is difficult to diagnose.
>
>
>> A quick search found this Cisco troubleshooting guide for replay drops.
>> https://www.cisco.com/c/en/us/support/docs/ip/internet-key-exchange-ike/116858-problem-replay-00.html
>> You'll see this text in the problem statement:
>> "Certain QoS features, such as Low Latency Queueing (LLQ), can cause IPSec packet delivery to become out-of-order and dropped by the receiving endpoint due to a replay check failure."
>>
>> If a VoIP flow within a VPN went through a low latency queue, and a BE flow within the VPN didn't, the BE flow would experience replay drops as it pushed up the delay in its own queue. That would sort-of act like an AQM - pretty cool! (except no burst tolerance) ...
>> ...I'm distracting myself. My point was going to be that the drops would always be from the higher delay queue. Normally no-one would feel a need to check why there were some packet drops from a BE flow. So this could well be going on unnoticed right under our noses (speculation).
>>
>>> As I indicated during the re-ordering of CE discussion some time ago, declaring a specific level of re-ordering benign requires a set of assumptions, and these might or might not hold... in the VPN with replay-protection case (which seems to be the recommended mode) these assumptions do not seem to hold.
>> [BB] I'm saying the number 64 for a replay window is insane at today's packet rates (it's perfectly sane for implementation efficiency, but perfectly insane in terms of sensitivity to reordering).
> 	[SM] Mmmh, that is the point: Pete's data shows the issue already at 10 Mbps, which is not a high packet rate, see http://sce.dnsmgr.net/results/l4s-2020-11-11T120000-final/l4s-s9-tunnel-reordering/l4s-s9-tunnel-reordering-ns-ipsec-replay-win-32-dualpi2-10Mbit-20ms_tcp_delivery_with_rtt.svg . At 10 Mbps a 32 packet window covers up to* (32*1538*8) / (10 * 1000^2) * 1000 = 39.37 ms. It takes a bit of reordering along the path to exceed that replay window...
>
>
> *) A fixed packet window obviously will have a duration at bottleneck rate equivalent that depends on the packet size distribution inside the current window.
>
>
>>>> In the context of the IETF, irrespective of the L4S experiment, the IETF needs to fix this bigger inconsistency between the standards tracks of its transport and security areas. I'll leave David to escalate this to the ADs if appropriate. Because Pete's right - it may not be easy for admins to identify the cause of this problem, and admins and security implementers don't tend to reach out for advice in transport RFCs.
>>> 	[SM] And no wordsmithing in any RFC is guaranteed to reach the current operators/users of such tunnels any time soon.
>> [BB] I wasn't meaning wordsmithing. There's a technical conflict to be resolved.
> 	[SM] Well, one could try to get better recommendations for how to set the replay window into other ipsec/replay-window RFCs.
> That in turn leads to the question of how these would look.
> For L4S, Pete's proposal of something like
> (2 * C-queue latency target [ms]) / ((average packet size [bit]) / (bottleneck rate [bit/ms]))
> or for 25 ms and 10 Mbps:
> (2 * 25) / ((1538*8) / (10 * 1000^2) * 1000)  = 40.6371911573
> might do, at least for dealing with L4S, IFF the C-queue's latency target were not a free configuration parameter of the L4S AQM that can be set to arbitrary values...
> But I might be overlooking something here, so please explain your solution to this conundrum.
>
>
> And that still leaves the question of how to let operators of existing replay-protected tunnels know that they are supposed to fix somebody else's experiment by adapting to new configurations.
>
>
> Regards
> 	Sebastian
>
>
>>
>>
>> Bob
>>
>>> Best Regards
>>> 	Sebastian
>>>
>>>
>>>> Bob
>>>>
>>>> On 08/05/2021 07:45, Pete Heist wrote:
>>>>> I've added some additional tests at 10 and 20Mbps, and re-worked the
>>>>> writeup to include a table of the results:
>>>>>
>>>>> https://github.com/heistp/l4s-tests/#dropped-packets-for-tunnels-with-replay-protection-enabled
>>>>>
>>>>> I noticed that this issue seems to affect tunnels with replay window
>>>>> sizes of 32 and 64 packets regardless of the bottleneck bandwidth,
>>>>> likely because the peak C sojourn times can also increase as the
>>>>> bandwidth decreases. IMO, this seems like a safety concern from the
>>>>> standpoint that the deployment of DualPI2 can cause harm to
>>>>> conventional traffic, in IPsec tunnels using common defaults in
>>>>> particular, beyond that which is caused by DualPI2 itself.
>>>>>
>>>>> It may be fixed by increasing the window size or disabling replay
>>>>> protection, but it may not be easy for admins or users to identify the
>>>>> source of this problem when it occurs, or know who to contact about it.
>>>>>
>>>>> Pete
>>>>>
>>>>> On Sat, 2021-05-08 at 02:01 +0000, Black, David wrote:
>>>>>> [posting as an individual, not a WG chair]
>>>>>> Linking together a couple of related points:
>>>>>>
>>>>>>> [SM] Current Linux kernels seem to use a window of ~8K packets, while
>>>>>>> OpenVPN defaults to 64 packets, Linux ipsec seems to default to
>>>>>>> either 32 or 64. 8K should be reasonably safe, but 64 seems less
>>>>>>> safe.
>>>>>> Common VPN design practice here appears to be picking a plausible
>>>>>> default size (which can be reconfigured and change from release to
>>>>>> release) for the accounting window to detect replay, hence this:
>>>>>>
>>>>>>>>   But, in any case, it seems to me that protocols that need to be
>>>>>>>> robust to out-of-order delivery would need to consider being robust
>>>>>>>> to re-ordering in time units anyway, and so would naturally need to
>>>>>>>> scale that functionality as packet rates increase.
>>>>>> may not happen in a smooth fashion.  As Sebastian writes:
>>>>>>
>>>>>>> [SM] The thing is these methods aim to avoid Mallory fudging with the
>>>>>>> secure connection between Alice and Bob (not ours), and need to
>>>>>>> track packet by packet, that is not easily solved efficiently with a
>>>>>>> simple time-out
>>>>>> That's correct, and use of a simple time-out by itself is prohibited
>>>>>> for obvious security reasons.  For more details on a specific example,
>>>>>> see Section 3.4.3 of RFC 4303 (ESP), which specifies the ESP anti-
>>>>>> replay mechanism (could be used as a reference in writing text on how
>>>>>> L4S interacts with anti-replay)  ... and the observant reader will
>>>>>> notice that this section is a likely source of the anti-replay 32 and
>>>>>> 64 packet values for Linux IPsec:
>>>>>> https://datatracker.ietf.org/doc/html/rfc4303#section-3.4.3 .
>>>>>>
>>>>>> Thanks, --David
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Sebastian Moeller
>>>>>> Sent: Wednesday, May 5, 2021 5:21 PM
>>>>>> To: Greg White
>>>>>> Cc: TSVWG
>>>>>> Subject: Re: [tsvwg] L4S dual-queue re-ordering and VPNs
>>>>>>
>>>>>>
>>>>>> [EXTERNAL EMAIL]
>>>>>>
>>>>>> Hi Greg,
>>>>>>
>>>>>> thanks for your response, more below prefixed [SM].
>>>>>>
>>>>>>> On May 3, 2021, at 19:35, Greg White <g.white@CableLabs.com> wrote:
>>>>>>>
>>>>>>> I'm not familiar with the replay attack mitigations used by VPNs, so
>>>>>>> can't comment on whether this would indeed be an issue for some VPN
>>>>>>> implementations.
>>>>>> [SM] I believe this to be an issue for at least those VPNs that use UDP
>>>>>> and defend against replay attacks (including ipsec, wireguard,
>>>>>> OpenVPN). All more or less seem to use the same approach with a limited
>>>>>> accounting window to allow out-of-order delivery of packets. The head
>>>>>> of the window typically seems to be advanced to the packet with the
>>>>>> highest "sequence" number, hence all of these are sensitive for the
>>>>>> kind of packet re-ordering the L4S ecn id draft argues was benign...
>>>>>>
>>>>>>
>>>>>>>   A quick search revealed
>>>>>>> (https://www.wireguard.com/protocol/) that Wireguard apparently has
>>>>>>> a window of about
>>>>>>> 2000 packets, so perhaps it isn't an immediate issue for that VPN
>>>>>>> software?
>>>>>> [SM] Current Linux kernels seem to use a window of ~8K packets, while
>>>>>> OpenVPN defaults to 64 packets, Linux ipsec seems to default to either
>>>>>> 32 or 64. 8K should be reasonably safe, but 64 seems less safe.
>>>>>>
>>>>>>> But, if it is an issue for a particular algorithm, perhaps another
>>>>>>> solution to address condition b would be to use a different "head of
>>>>>>> window" for ECT1 packets compared to ECT(0)/NotECT packets?
>>>>>> [SM] Without arguing whether that might or might not be a good idea, it
>>>>>> is not what is done today, so all deployed end-points will treat all
>>>>>> packets the same but at least wireguard and linux ipsec will propagate
>>>>>> ECN value at en- and decapsulation, so are probably affected by the
>>>>>> issue.
>>>>>>
>>>>>>> In your 100 Gbps case, I guess you are assuming that A) the
>>>>>>> bottleneck between the two tunnel endpoints is 100 Gbps, B) a single
>>>>>>> VPN tunnel is consuming the entirety of that 100 Gbps link, and C)
>>>>>>> that there is a PI2 AQM targeting 20ms of buffering delay in that 100
>>>>>>> Gbps link?  If so, I'm not sure that I agree that this is likely in
>>>>>>> the near term.
>>>>>> [SM] Yes, the back-of-an-envelope worst case estimate is not terribly
>>>>>> concerning, I agree, but the point remains that a fixed 20ms delay
>>>>>> target will potentially cause the issue with increasing link speeds...
>>>>>>
>>>>>>
>>>>>>>   But, in any case, it seems to me that protocols that need to be
>>>>>>> robust to out-of-order delivery would need to consider being robust
>>>>>>> to re-ordering in time units anyway, and so would naturally need to
>>>>>>> scale that functionality as packet rates increase.
>>>>>> [SM] The thing is these methods aim to avoid Mallory fudging with the
>>>>>> secure connection between Alice and Bob (not ours), and need to track
>>>>>> packet by packet, that is not easily solved efficiently with a simple
>>>>>> time-out (at least not as far as I can see, but I do not claim
>>>>>> expertise in cryptology or security engineering). But I am certain, if
>>>>>> you have a decent new algorithm to enhance RFC2401 and/or RFC6479 the
>>>>>> crypto community might be delighted to hear them. ;)
>>>>>>
>>>>>>> I'm happy to include text in the L4Sops draft on this if the WG
>>>>>>> agrees it is useful to include it, and someone provides text that
>>>>>>> would fit the bill.
>>>>>> [SM] I wonder whether a section on L4S-OPs a la, "make sure to
>>>>>> configure a sufficiently large replay window to allow for ~20ms
>>>>>> reordering" would be enough, or whether the whole discussion would not
>>>>>> also be needed in
>>>>>> https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-14#appendix-B.1
>>>>>> widening the re-ordering scope from the
>>>>>> existing "Risk of reordering Classic CE packets" subpoint 3.?
>>>>>>
>>>>>> Regards
>>>>>>          Sebastian
>>>>>>
>>>>>>
>>>>>>> -Greg
>>>>>>>
>>>>>>>
>>>>>>> On 5/3/21, 1:44 AM, "tsvwg on behalf of Sebastian Moeller"
>>>>>>> <tsvwg-bounces@ietf.org on behalf of moeller0@gmx.de> wrote:
>>>>>>>
>>>>>>>     Dear All,
>>>>>>>
>>>>>>>     we had a few discussions in the past about L4S' dual queue design
>>>>>>> and the consequences of packets of a single flow being accidentally
>>>>>>> steered into the wrong queue.
>>>>>>>     So far we mostly discussed the consequence of steering all packets
>>>>>>> marked CE into the LL-queue (and
>>>>>>> https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-14#appendix-B.1
>>>>>>> "Risk of reordering Classic CE packets" only discusses this point);
>>>>>>> there the argument is that this condition should be rare and should
>>>>>>> also be relatively benign, as an occasional packet arriving too early
>>>>>>> should not trigger the 3 DupACK mechanism.
>>>>>>> While I would have liked to see hard data confirming the two hypotheses,
>>>>>>> let's accept that argument for the time being.
>>>>>>>
>>>>>>>     BUT, there is a traffic class that is actually sensitive to
>>>>>>> packets arriving out-of-order and too early: VPNs. Most VPNs try to
>>>>>>> secure against replay attacks by maintaining a replay window and only
>>>>>>> accept packets that fall within that window. Now, as far as I can
>>>>>>> see, most replay window algorithms use a bounded window and use the
>>>>>>> highest received sequence number to set the "head" of the window and
>>>>>>> hence will trigger replay attack mitigation, if the too-early-packets
>>>>>>> move the replay window forward such that "in-order-packets" from the
>>>>>>> longer queue fall behind the replay window.
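
The mechanism described in the paragraph above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions,
not the algorithm of any particular VPN: ReplayWindow and count_drops are
hypothetical names, a set stands in for the bitmap that real
implementations (e.g. RFC 4303 section 3.4.3) use, the tunnel is assumed
to alternate L4S and Classic packets, and Classic packets are assumed to
be delayed by a fixed number of packet slots.

```python
class ReplayWindow:
    """Sliding anti-replay window whose head is the highest sequence seen."""

    def __init__(self, size=64):
        self.size = size
        self.highest = 0      # highest sequence number accepted so far
        self.seen = set()     # accepted numbers still inside the window

    def accept(self, seq):
        if seq > self.highest:
            # A newer packet moves the head of the window forward.
            self.highest = seq
            floor = self.highest - self.size
            self.seen = {s for s in self.seen if s > floor}
            self.seen.add(seq)
            return True
        if seq <= self.highest - self.size:
            return False      # fell behind the window: treated as a replay
        if seq in self.seen:
            return False      # genuine duplicate inside the window
        self.seen.add(seq)
        return True


def count_drops(window, lag, total=1000):
    """Drops when Classic packets (even numbers) arrive `lag` slots late.

    Odd sequence numbers model L4S packets that are not delayed.
    """
    rw = ReplayWindow(window)
    arrivals = sorted(
        range(1, total + 1),
        key=lambda s: (s + lag if s % 2 == 0 else s, s),
    )
    return sum(0 if rw.accept(s) else 1 for s in arrivals)


if __name__ == "__main__":
    for lag in (10, 100):
        print("lag", lag, "drops", count_drops(window=64, lag=lag))
```

In this toy model nothing is dropped while the lag stays below the window
size; once the Classic packets lag by more than the window, they are
rejected even though no packet was ever replayed.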
>>>>>>>
>>>>>>>     Wireguard is an example of a modern VPN affected by this issue,
>>>>>>> since it supports ECN and propagates ECN bits between inner and outer
>>>>>>> headers on en- and decapsulation.
>>>>>>>
>>>>>>>     I can see two conditions that trigger this:
>>>>>>>     a) the arguably relatively rare case of an already CE-marked
>>>>>>> packet hitting an L4S AQM (but we have no real number on the
>>>>>>> likelihood of that happening)
>>>>>>>     b) the arguably more and more common situation (if L4S actually
>>>>>>> succeeds in the field) of an ECT(1) sub-flow zipping past
>>>>>>> ECT(0)/NotECT sub-flows (all within the same tunnel outer flow)
>>>>>>>
>>>>>>>     I note that neither single-queue rfc3168 or FQ AQMs (rfc3168 or
>>>>>>> not) are affected by that issue since they do not cause similar re-
>>>>>>> ordering.
>>>>>>>
>>>>>>>
>>>>>>>     QUESTIONS @ALL:
>>>>>>>
>>>>>>>     1)  Are we all happy with that and do we consider this to be
>>>>>>> acceptable collateral damage?
>>>>>>>
>>>>>>>     2) If yes, should the L4S OPs draft contain text to recommend end-
>>>>>>> points how to cope with that new situation?
>>>>>>>          If yes, how? Available options are IMHO to eschew the use of
>>>>>>> ECN on tunnels, or to recommend increased replay window sizes, but
>>>>>>> with a Gigabit link and L4S classic target of around 20ms, we would
>>>>>>> need to recommend a replay window of:
>>>>>>>     ((1000^3 [b/s]) / (1538 [B/packet] * 8 [b/B])) * (20 [ms] / 1000 [ms/s]) = 1625.48764629 [packets]
>>>>>>>     or with a power of two algorithm 2048, which is quite a bit larger
>>>>>>> than the old default of 64...
>>>>>>>          But what if the L4S AQM is located on a back-bone link with
>>>>>>> considerably higher bandwidth, like 10 Gbps or even 100 Gbps? IMHO a
>>>>>>> replay window of 1625 * 100 = 162500 seems a bit excessive
>>>>>>>
>>>>>>>
>>>>>>>     Also the following text in
>>>>>>> https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-14#appendix-A.1.7
>>>>>>>
>>>>>>>     "  Should work in tunnels:  Unlike Diffserv, ECN is defined to always
>>>>>>>           work across tunnels.  This scheme works within a tunnel that
>>>>>>>           propagates the ECN field in any of the variant ways it has been
>>>>>>>           defined, from the year 2001 [RFC3168] onwards.  However, it is
>>>>>>>           likely that some tunnels still do not implement ECN propagation at
>>>>>>>           all."
>>>>>>>
>>>>>>>     Seems like it could need additions to reflect the just described
>>>>>>> new issue.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>     Best Regards
>>>>>>>          Sebastian
>>>>>>>
>>>>>>>
>>>> -- 
>>>> ________________________________________________________________
>>>> Bob Briscoe                               http://bobbriscoe.net/
>>>>
>> -- 
>> ________________________________________________________________
>> Bob Briscoe                               http://bobbriscoe.net/

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/