Re: [tsvwg] L4S dual-queue re-ordering and VPNs

Bob Briscoe <in@bobbriscoe.net> Sat, 08 May 2021 14:26 UTC

To: Pete Heist <pete@heistp.net>, "Black, David" <David.Black@dell.com>, Sebastian Moeller <moeller0@gmx.de>, Greg White <g.white@CableLabs.com>
Cc: TSVWG <tsvwg@ietf.org>
References: <68F275F9-8512-4CD9-9E81-FE9BEECD59B3@cablelabs.com> <1DB719E5-55B5-4CE2-A790-C110DB4A1626@gmx.de> <MN2PR19MB40452C9DD1164609A005139583569@MN2PR19MB4045.namprd19.prod.outlook.com> <e15d732f64bf983975dbe507092b39f0744f7f74.camel@heistp.net>
From: Bob Briscoe <in@bobbriscoe.net>
Message-ID: <1efe0dfb-afb6-0aa4-dcff-fb4ddeb46b8f@bobbriscoe.net>
Date: Sat, 08 May 2021 15:26:22 +0100
In-Reply-To: <e15d732f64bf983975dbe507092b39f0744f7f74.camel@heistp.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/efDNDVn1PGX6amtfA1emFHWD2fI>

Thank you, Sebastian, for picking up this inter-area inconsistency, and 
thanks, Pete, for the test data.

I would have thought that the delay-delta between two AQMs (as in the
DualQ) will in general be much less than the delay-delta between a
low-delay and a best-effort Diffserv behaviour, where the best-effort
queue generally has no AQM at all and is therefore prone to the delay of
the whole buffer, which may be bloated.

So, wherever a VPN includes flows using different DSCPs, and there is a 
Diffserv-enabled bottleneck between the ends of the VPN, the VPN's 
replay window will need to cater for considerably more than 50ms 
delay-delta within the VPN. More like at least 200ms, and possibly 1-2s 
in some cases of bloat.
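
As a rough illustration of how these delay-deltas translate into replay
window sizes, the window has to cover about as many packets as the tunnel
can deliver during the delay-delta. The sketch below assumes a single
tunnel filling the bottleneck with full-size 1538-byte frames (the frame
size Sebastian used in his estimate quoted below); the function names and
the example rates are chosen purely for illustration.

```python
import math


def replay_window_packets(link_bps, delay_delta_s, frame_bytes=1538):
    """Packets that can arrive during delay_delta_s at link_bps.

    Worst-case assumption: one tunnel fills the link with full-size frames.
    """
    packets = link_bps * delay_delta_s / (frame_bytes * 8)
    return math.ceil(packets)


def next_power_of_two(n):
    """Smallest power of two >= n, for bitmap-based window implementations."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


if __name__ == "__main__":
    for rate_mbps in (10, 100, 1000):
        for delta_ms in (20, 50, 200, 1000):
            need = replay_window_packets(rate_mbps * 1e6, delta_ms / 1000)
            print(f"{rate_mbps:>5} Mb/s, {delta_ms:>4} ms: "
                  f">= {need} packets "
                  f"(power of two: {next_power_of_two(need)})")
```

For 1 Gb/s and 20ms this gives 1626 packets, or 2048 when rounded up to a
power of two, which agrees with Sebastian's figure. Under the same
assumptions, even a 10 Mb/s bottleneck with a 200ms delay-delta needs more
than the common default of 64.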

David himself has written about the reordering problem when a WebRTC
application encapsulates TCP, SCTP and RTP flows with different DSCPs
within UDP [RFC7657]. Indeed, Datagram Transport Layer Security (DTLS)
is a common encapsulation for WebRTC flows, and DTLS also recommends a
default replay window of 64 packets (see
https://tools.ietf.org/html/rfc6347#section-4.1.2.6 ).

- I'm not trying to say low replay windows won't affect the DualQ - 
l4sops and aqm-dualq-coupled should certainly recommend a large enough 
replay window.

- I'm just saying that there are other established technologies that
reduce queuing delay for a subset of traffic, starting from current
insanely high levels, so they will be a longer pole in the tent than the
DualQ. As long as the replay window of VPNs is large enough for those
established technologies, it will be large enough for the DualQ experiment.
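
To make the failure mode concrete, here is a minimal sketch of a
sliding-window anti-replay check in the general style of RFC 4303
Section 3.4.3: the head of the window follows the highest sequence number
accepted, and anything a full window or more behind it is rejected. It is
a simplified model, not the code of any particular VPN. The class name,
the traffic pattern (odd-numbered packets standing in for the slower
queue) and the modelling of the delay-delta as a fixed number of packet
times are all assumptions made for illustration.

```python
class ReplayWindow:
    """Simplified anti-replay window: tracks the highest sequence number
    accepted and which of the last `size` numbers have been received."""

    def __init__(self, size=64):
        self.size = size
        self.top = 0        # highest sequence number accepted so far
        self.seen = set()   # accepted numbers still inside the window

    def accept(self, seq):
        """Return True if the packet is accepted, False if it is dropped."""
        if seq > self.top:
            self.top = seq  # window head advances to the newest packet
            self.seen.add(seq)
            # forget numbers that have fallen out of the window
            self.seen = {s for s in self.seen if s > self.top - self.size}
            return True
        if seq <= self.top - self.size:
            return False    # too old: behind the window
        if seq in self.seen:
            return False    # genuine duplicate
        self.seen.add(seq)
        return True


def count_drops(window_size, delay_slots, total=2_000):
    """Hypothetical pattern: odd-numbered packets take the slower queue and
    arrive `delay_slots` packet times later than even-numbered ones."""
    arrivals = sorted(range(1, total + 1),
                      key=lambda s: s + (delay_slots if s % 2 else 0))
    win = ReplayWindow(window_size)
    return sum(not win.accept(s) for s in arrivals)


if __name__ == "__main__":
    for size, delay in ((64, 10), (64, 100), (2048, 100)):
        print(f"window {size:>4}, delay-delta {delay:>3} packet times: "
              f"{count_drops(size, delay)} dropped")
```

In this model a 64-packet window drops most of the delayed packets once
the delay-delta reaches 100 packet times, while a delay-delta of 10 packet
times, or a 2048-packet window, drops none. None of the dropped packets
is a replay; each one simply arrived after the head of the window had
moved on.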

Irrespective of the L4S experiment, the IETF needs to fix this bigger
inconsistency between the standards tracks of its transport and security
areas. I'll leave David to escalate this to the ADs if appropriate.
Pete's right that it may not be easy for admins to identify the cause of
this problem, and admins and security implementers don't tend to look to
transport RFCs for advice.


Bob

On 08/05/2021 07:45, Pete Heist wrote:
> I've added some additional tests at 10 and 20Mbps, and re-worked the
> writeup to include a table of the results:
>
> https://github.com/heistp/l4s-tests/#dropped-packets-for-tunnels-with-replay-protection-enabled
>
> I noticed that this issue seems to affect tunnels with replay window
> sizes of 32 and 64 packets regardless of the bottleneck bandwidth,
> likely because the peak C sojourn times can also increase as the
> bandwidth decreases. IMO, this seems like a safety concern from the
> standpoint that the deployment of DualPI2 can cause harm to
> conventional traffic, in IPsec tunnels using common defaults in
> particular, beyond that which is caused by DualPI2 itself.
>
> It may be fixed by increasing the window size or disabling replay
> protection, but it may not be easy for admins or users to identify the
> source of this problem when it occurs, or know who to contact about it.
>
> Pete
>
> On Sat, 2021-05-08 at 02:01 +0000, Black, David wrote:
>> [posting as an individual, not a WG chair]
>> Linking together a couple of related points:
>>
>>> [SM] Current Linux kernels seem to use a window of ~8K packets, while
>>> OpenVPN defaults to 64 packets, Linux ipsec seems to default to
>>> either 32 or 64. 8K should be reasonably safe, but 64 seems less
>>> safe.
>> Common VPN design practice here appears to be picking a plausible
>> default size (which can be reconfigured and change from release to
>> release) for the accounting window to detect replay, hence this:
>>
>>>>   But, in any case, it seems to me that protocols that need to be
>>>> robust to out-of-order delivery would need to consider being robust
>>>> to re-ordering in time units anyway, and so would naturally need to
>>>> scale that functionality as packet rates increase.
>> may not happen in a smooth fashion.  As Sebastian writes:
>>
>>> [SM] The thing is these methods aim to avoid Mallory fudging with the
>>> secure connection between Alice and Bob (not ours), and need to
>>> track packet by packet, that is not easily solved efficiently with a
>>> simple time-out
>> That's correct, and use of a simple time-out by itself is prohibited
>> for obvious security reasons.  For more details on a specific example,
>> see Section 3.4.3 of RFC 4303 (ESP), which specifies the ESP anti-
>> replay mechanism (could be used as a reference in writing text on how
>> L4S interacts with anti-replay)  ... and the observant reader will
>> notice that this section is a likely source of the anti-replay 32 and
>> 64 packet values for Linux IPsec:
>> https://datatracker.ietf.org/doc/html/rfc4303#section-3.4.3 .
>>
>> Thanks, --David
>>
>> -----Original Message-----
>> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Sebastian Moeller
>> Sent: Wednesday, May 5, 2021 5:21 PM
>> To: Greg White
>> Cc: TSVWG
>> Subject: Re: [tsvwg] L4S dual-queue re-ordering and VPNs
>>
>> Hi Greg,
>>
>> thanks for your response, more below prefixed [SM].
>>
>>> On May 3, 2021, at 19:35, Greg White <g.white@CableLabs.com> wrote:
>>>
>>> I'm not familiar with the replay attack mitigations used by VPNs, so
>>> can't comment on whether this would indeed be an issue for some VPN
>>> implementations.
>> [SM] I believe this to be an issue for at least those VPNs that use UDP
>> and defend against replay attacks (including ipsec, wireguard,
>> OpenVPN). All more or less seem to use the same approach with a limited
>> accounting window to allow out-of-order delivery of packets. The head
>> of the window typically seems to be advanced to the packet with the
>> highest "sequence" number, hence all of these are sensitive to the
>> kind of packet re-ordering the L4S ecn id draft argues was benign...
>>
>>
>>>   A quick search revealed
>>> (https://www.wireguard.com/protocol/) that Wireguard apparently has a
>>> window of about 2000 packets, so perhaps it isn't an immediate issue
>>> for that VPN software?
>> [SM] Current Linux kernels seem to use a window of ~8K packets, while
>> OpenVPN defaults to 64 packets, Linux ipsec seems to default to either
>> 32 or 64. 8K should be reasonably safe, but 64 seems less safe.
>>
>>> But, if it is an issue for a particular algorithm, perhaps another
>>> solution to address condition b would be to use a different "head of
>>> window" for ECT(1) packets compared to ECT(0)/NotECT packets?
>> [SM] Without arguing whether that might or might not be a good idea, it
>> is not what is done today, so all deployed end-points will treat all
>> packets the same, but at least wireguard and linux ipsec will propagate
>> the ECN value at en- and decapsulation, so are probably affected by the
>> issue.
>>
>>> In your 100 Gbps case, I guess you are assuming that A) the
>>> bottleneck between the two tunnel endpoints is 100 Gbps, B) a single
>>> VPN tunnel is consuming the entirety of that 100 Gbps link, and C)
>>> that there is a PI2 AQM targeting 20ms of buffering delay in that 100
>>> Gbps link?  If so, I'm not sure that I agree that this is likely in
>>> the near term.
>> [SM] Yes, the back-of-an-envelope worst case estimate is not terribly
>> concerning, I agree, but the point remains that a fixed 20ms delay
>> target will potentially cause the issue with increasing link speeds...
>>
>>
>>>   But, in any case, it seems to me that protocols that need to be
>>> robust to out-of-order delivery would need to consider being robust
>>> to re-ordering in time units anyway, and so would naturally need to
>>> scale that functionality as packet rates increase.
>> [SM] The thing is these methods aim to avoid Mallory fudging with the
>> secure connection between Alice and Bob (not ours), and need to track
>> packet by packet, that is not easily solved efficiently with a simple
>> time-out (at least not as far as I can see, but I do not claim
>> expertise in cryptology or security engineering). But I am certain, if
>> you have a decent new algorithm to enhance RFC2401 and/or RFC6479 the
>> crypto community might be delighted to hear about it. ;)
>>
>>> I'm happy to include text in the L4Sops draft on this if the WG
>>> agrees it is useful to include it, and someone provides text that
>>> would fit the bill.
>> [SM] I wonder whether a section in L4S-OPs a la, "make sure to
>> configure a sufficiently large replay window to allow for ~20ms
>> reordering" would be enough, or whether the whole discussion would not
>> also be needed in
>> https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-14#appendix-B.1
>> widening the re-ordering scope from the existing "Risk of reordering
>> Classic CE packets" subpoint 3.?
>>
>> Regards
>>          Sebastian
>>
>>
>>> -Greg
>>>
>>>
>>> On 5/3/21, 1:44 AM, "tsvwg on behalf of Sebastian Moeller"
>>> <tsvwg-bounces@ietf.org on behalf of moeller0@gmx.de> wrote:
>>>
>>>     Dear All,
>>>
>>>     we had a few discussions in the past about L4S' dual queue design
>>> and the consequences of packets of a single flow being accidentally
>>> steered into the wrong queue.
>>>     So far we mostly discussed the consequence of steering all packets
>>> marked CE into the LL-queue (and
>>> https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-14#appendix-B.1
>>> "Risk of reordering Classic CE packets" only discusses this point);
>>> there the argument is that this condition should be rare and should
>>> also be relatively benign, as an occasional packet arriving too early
>>> should not trigger the 3 DupACK mechanism. While I would have liked
>>> to see hard data confirming the two hypotheses, let's accept that
>>> argument for the time being.
>>>
>>>     BUT, there is a traffic class that is actually sensitive to
>>> packets arriving out-of-order and too early: VPNs. Most VPNs try to
>>> secure against replay attacks by maintaining a replay window and only
>>> accept packets that fall within that window. Now, as far as I can
>>> see, most replay window algorithms use a bounded window and use the
>>> highest received sequence number to set the "head" of the window and
>>> hence will trigger replay attack mitigation, if the too-early-packets
>>> move the replay window forward such that "in-order-packets" from the
>>> shorter queue fall behind the replay window.
>>>
>>>     Wireguard is an example of a modern VPN affected by this issue,
>>> since it supports ECN and propagates ECN bits between inner and outer
>>> headers on en- and decapsulation.
>>>
>>>     I can see two conditions that trigger this:
>>>     a) the arguably relatively rare case of an already CE-marked
>>> packet hitting an L4S AQM (but we have no real number on the
>>> likelihood of that happening)
>>>     b) the arguably more and more common situation (if L4S actually
>>> succeeds in the field) of an ECT(1) sub-flow zipping past
>>> ECT(0)/NotECT sub-flows (all within the same tunnel outer flow)
>>>
>>>     I note that neither single-queue rfc3168 nor FQ AQMs (rfc3168 or
>>> not) are affected by that issue since they do not cause similar
>>> re-ordering.
>>>
>>>
>>>     QUESTIONS @ALL:
>>>
>>>     1)  Are we all happy with that and do we consider this to be
>>> acceptable collateral damage?
>>>
>>>     2) If yes, should the L4S OPs draft contain text recommending how
>>> end-points should cope with that new situation?
>>>          If yes, how? Available options are IMHO to eschew the use of
>>> ECN on tunnels, or to recommend increased replay window sizes, but
>>> with a Gigabit link and L4S classic target of around 20ms, we would
>>> need to recommend a replay window of:
>>>     >= ((1000^3 [b/s]) / (1538 [B/packet] * 8 [b/B])) *
>>>        (20 [ms] / 1000 [ms/s]) = 1625.49 [packets]
>>>     or with a power of two algorithm 2048, which is quite a bit larger
>>> than the old default of 64...
>>>          But what if the L4S AQM is located on a back-bone link with
>>> considerably higher bandwidth, like 10 Gbps or even 100 Gbps? IMHO a
>>> replay window of 1625 * 100 = 162500 seems a bit excessive.
>>>
>>>
>>>     Also the following text in
>>> https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-14#appendix-A.1.7
>>>
>>>     "  Should work in tunnels:  Unlike Diffserv, ECN is defined to
>>>           always work across tunnels.  This scheme works within a
>>>           tunnel that propagates the ECN field in any of the variant
>>>           ways it has been defined, from the year 2001 [RFC3168]
>>>           onwards.  However, it is likely that some tunnels still do
>>>           not implement ECN propagation at all."
>>>
>>>     Seems like it could need additions to reflect the just described
>>> new issue.
>>>
>>>
>>>
>>>     Best Regards
>>>          Sebastian
>>>
>>>
>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/
