Re: [tsvwg] L4S dual-queue re-ordering and VPNs

Pete Heist <pete@heistp.net> Thu, 06 May 2021 15:31 UTC

From: Pete Heist <pete@heistp.net>
To: Sebastian Moeller <moeller0@gmx.de>, Greg White <g.white@CableLabs.com>
Cc: TSVWG <tsvwg@ietf.org>
Date: Thu, 06 May 2021 17:31:26 +0200
In-Reply-To: <1DB719E5-55B5-4CE2-A790-C110DB4A1626@gmx.de>
References: <68F275F9-8512-4CD9-9E81-FE9BEECD59B3@cablelabs.com> <1DB719E5-55B5-4CE2-A790-C110DB4A1626@gmx.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/EkOev7mBYSo-4cbfO7Z1N-7Itvk>

Hi,

To test this out, I added a scenario to the l4s-tests repo with two
flows competing in an IPsec tunnel through DualPI2:

https://github.com/heistp/l4s-tests/#dropped-packets-for-tunnels-with-replay-protection-enabled

It shows that non-L4S traffic through the C queue can see drops when
packets arrive outside the replay window, essentially the concern that
Sebastian raised. The effect here is reduced throughput for a CUBIC
flow competing with Prague, depending on the replay window in use.
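To illustrate the mechanism, here's a simplified model (my sketch, not
the actual IPsec code; real implementations also track a per-packet
bitmap per RFC 6479): once L-queue packets have advanced the window
head, delayed C-queue packets fall outside it and are dropped:

```python
# Simplified anti-replay model: accept a packet only if its sequence
# number is within `window` of the highest sequence number seen so far.
# This only models the window-advance behavior that causes the drops.
def deliver(arrival_order, window):
    highest = 0
    delivered = []
    for seq in arrival_order:
        if seq > highest:
            highest = seq          # head of window advances
        if seq > highest - window:
            delivered.append(seq)  # within window: accepted
        # else: dropped as a suspected replay
    return delivered

# 100 L-queue packets (seqs 101..200) overtake 100 C-queue packets
# (seqs 1..100) still sitting in the deeper classic queue.
arrivals = list(range(101, 201)) + list(range(1, 101))

print(len(deliver(arrivals, 64)))    # -> 100: every C-queue packet dropped
print(len(deliver(arrivals, 512)))   # -> 200: everything accepted
```

With a 64-packet window, all 100 delayed C-queue packets land behind
the head and are discarded; widening the window to 512 accepts them all.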

For this 100 Mbps case, I tried five values for replay-window
(0, 32, 64, 128, 256). It looks like it's good to account for peak
sojourn times through the C queue, perhaps 50 ms? If so, a replay
window of 512 packets might be better here. As I understand it, the
default for IPsec is typically 32 or 64 packets, so some existing
deployed tunnels may need to be reconfigured, depending on their
bandwidths.

The default, compiled-in value for WireGuard appears to now be 8192
packets(?), so there I wouldn't expect this to be a problem until the
tunneled traffic is at least ~2 Gbps, but that's just a guess, made by
estimating what bandwidth can exceed 8192 packets in 50 ms.
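The arithmetic behind both of these estimates (assuming full-size
1538-byte frames and the 50 ms peak C-queue sojourn above; both are
assumptions, not measurements) is just:

```python
PKT_BITS = 1538 * 8   # bits per full-size Ethernet frame (assumption)
SOJOURN = 0.050       # assumed peak C-queue sojourn time, seconds

def window_for(bandwidth_bps):
    """Packets that can arrive within one sojourn time, rounded up to
    a power of two, as replay windows typically are sized."""
    pkts = bandwidth_bps / PKT_BITS * SOJOURN
    size = 1
    while size < pkts:
        size *= 2
    return size

def bandwidth_for(window_pkts):
    """Bandwidth at which `window_pkts` packets fit in one sojourn time."""
    return window_pkts / SOJOURN * PKT_BITS

print(window_for(100e6))          # -> 512 packets at 100 Mbps
print(bandwidth_for(8192) / 1e9)  # ~2.0 Gbps for an 8192-packet window
```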

Thanks Sebastian for bringing this up...

Pete

On Wed, 2021-05-05 at 23:20 +0200, Sebastian Moeller wrote:
> Hi Greg,
> 
> thanks for your response, more below prefixed [SM].
> 
> > On May 3, 2021, at 19:35, Greg White <g.white@CableLabs.com> wrote:
> > 
> > I'm not familiar with the replay attack mitigations used by VPNs, so
> > can't comment on whether this would indeed be an issue for some VPN
> > implementations.
> 
> [SM] I believe this to be an issue for at least those VPNs that use
> UDP and defend against replay attacks (including IPsec, WireGuard, and
> OpenVPN). All of them seem to use more or less the same approach, with
> a limited accounting window to allow out-of-order delivery of packets.
> The head of the window is typically advanced to the packet with the
> highest "sequence" number, hence all of these are sensitive to the
> kind of packet re-ordering the L4S ECN id draft argues is benign...
> 
> 
> >  A quick search revealed (https://www.wireguard.com/protocol/ ) that
> > Wireguard apparently has a window of about 2000 packets, so perhaps
> > it isn't an immediate issue for that VPN software?
> 
> [SM] Current Linux kernels seem to use a window of ~8K packets, while
> OpenVPN defaults to 64 packets, and Linux IPsec seems to default to
> either 32 or 64. 8K should be reasonably safe, but 64 seems less so.
> 
> > But, if it is an issue for a particular algorithm, perhaps another
> > solution to address condition b would be to use a different "head of
> > window" for ECT1 packets compared to ECT(0)/NotECT packets?  
> 
> [SM] Without arguing whether that might or might not be a good idea,
> it is not what is done today, so all deployed end-points will treat
> all packets the same; but at least WireGuard and Linux IPsec do
> propagate the ECN value at en- and decapsulation, so they are probably
> affected by the issue.
> 
> > In your 100 Gbps case, I guess you are assuming that A) the
> > bottleneck between the two tunnel endpoints is 100 Gbps, B) a single
> > VPN tunnel is consuming the entirety of that 100 Gbps link, and C)
> > that there is a PI2 AQM targeting 20ms of buffering delay in that 100
> > Gbps link?  If so, I'm not sure that I agree that this is likely in
> > the near term.
> 
> [SM] Yes, the back-of-the-envelope worst-case estimate is not terribly
> concerning, I agree, but the point remains that a fixed 20 ms delay
> target will potentially cause the issue at increasing link speeds...
> 
> 
> >  But, in any case, it seems to me that protocols that need to be
> > robust to out-of-order delivery would need to consider being robust
> > to re-ordering in time units anyway, and so would naturally need to
> > scale that functionality as packet rates increase.
> 
> [SM] The thing is, these methods aim to keep Mallory from messing
> with the secure connection between Alice and Bob (not ours), and need
> to track traffic packet by packet; that is not easily solved
> efficiently with a simple time-out (at least not as far as I can see,
> but I do not claim expertise in cryptology or security engineering).
> But I am certain that if you have a decent new algorithm to enhance
> RFC 2401 and/or RFC 6479, the crypto community would be delighted to
> hear about it. ;)
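For reference, the RFC 6479 bitmap approach these implementations
roughly follow looks like the sketch below (my simplification for
illustration, not any kernel's actual code):

```python
# Simplified RFC 6479-style anti-replay check: a bitmap covering the
# last WINDOW sequence numbers below the highest one seen so far.
WINDOW = 64

class ReplayWindow:
    def __init__(self):
        self.highest = 0
        self.seen = 0  # bitmap: bit i set => (highest - i) was received

    def accept(self, seq):
        if seq + WINDOW <= self.highest:
            return False              # behind the window: reject
        if seq > self.highest:
            shift = seq - self.highest
            self.seen = (self.seen << shift) & ((1 << WINDOW) - 1)
            self.seen |= 1            # bit 0 = this packet
            self.highest = seq
            return True
        bit = 1 << (self.highest - seq)
        if self.seen & bit:
            return False              # duplicate: replay
        self.seen |= bit
        return True

w = ReplayWindow()
assert w.accept(1) and w.accept(3) and w.accept(2)  # mild reordering OK
assert not w.accept(3)                              # replay rejected
assert w.accept(1000)                               # head jumps forward
assert not w.accept(900)                            # now behind the window
```

The last two lines are exactly the dual-queue failure mode: one packet
with a much higher sequence number drags the head forward, and in-order
packets that were merely delayed then look like replays.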
> 
> > I'm happy to include text in the L4Sops draft on this if the WG
> > agrees it is useful to include it, and someone provides text that
> > would fit the bill.
> 
> [SM] I wonder whether a section in L4S-ops along the lines of "make
> sure to configure a sufficiently large replay window to allow for
> ~20ms of reordering" would be enough, or whether the whole discussion
> would not also be needed in
> https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-14#appendix-B.1
> widening the re-ordering scope from the existing "Risk of reordering
> Classic CE packets" subpoint 3?
> 
> Regards
>         Sebastian
> 
> 
> > 
> > -Greg
> > 
> > 
> > On 5/3/21, 1:44 AM, "tsvwg on behalf of Sebastian Moeller"
> > <tsvwg-bounces@ietf.org on behalf of moeller0@gmx.de> wrote:
> > 
> >    Dear All,
> > 
> >    we had a few discussions in the past about L4S' dual queue design
> > and the consequences of packets of a single flow being accidentally
> > steered into the wrong queue.
> >    So far we have mostly discussed the consequence of steering all
> > packets marked CE into the LL queue (and
> > https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-14#appendix-B.1
> > "Risk of reordering Classic CE packets" only discusses this point);
> > there the argument is that this condition should be rare and also
> > relatively benign, as an occasional packet arriving too early should
> > not trigger the 3-DupACK mechanism. While I would have liked to see
> > hard data confirming these two hypotheses, let's accept the argument
> > for the time being.
> > 
> >    BUT, there is a traffic class that is actually sensitive to
> > packets arriving out-of-order and too early: VPNs. Most VPNs try to
> > secure against replay attacks by maintaining a replay window and only
> > accept packets that fall within that window. Now, as far as I can
> > see, most replay-window algorithms use a bounded window, use the
> > highest received sequence number to set the "head" of the window,
> > and hence will trigger replay-attack mitigation if the too-early
> > packets move the replay window forward such that in-order packets
> > from the longer (classic) queue fall behind the replay window.
> > 
> >    Wireguard is an example of a modern VPN affected by this issue,
> > since it supports ECN and propagates ECN bits between inner and outer
> > headers on en- and decapsulation. 
> > 
> >    I can see two conditions that trigger this:
> >    a) the arguably relatively rare case of an already CE-marked
> > packet hitting an L4S AQM (but we have no real number on the
> > likelihood of that happening)
> >    b) the arguably more and more common situation (if L4S actually
> > succeeds in the field) of an ECT(1) sub-flow zipping past
> > ECT(0)/NotECT sub-flows (all within the same tunnel outer flow)
> > 
> >    I note that neither single-queue RFC 3168 AQMs nor FQ AQMs
> > (RFC 3168 or not) are affected by this issue, since they do not
> > cause similar re-ordering.
> > 
> > 
> >    QUESTIONS @ALL:
> > 
> >    1)  Are we all happy with that and do we consider this to be
> > acceptable collateral damage?
> > 
> >    2) If yes, should the L4S OPs draft contain text to recommend end-
> > points how to cope with that new situation? 
> >         If yes, how? Available options are IMHO to eschew the use
> > of ECN on tunnels, or to recommend increased replay window sizes;
> > but with a Gigabit link and an L4S classic target of around 20 ms,
> > we would need to recommend a replay window of:
> >    >= ((1000^3 [b/s]) / (1538 [B/packet] * 8 [b/B]))
> >       * (20 [ms] / 1000 [ms]) = 1625.48764629 [packets]
> >    or, with a power-of-two algorithm, 2048, which is quite a bit
> > larger than the old default of 64...
> >         But what if the L4S AQM is located on a backbone link with
> > considerably higher bandwidth, like 10 Gbps or even 100 Gbps? IMHO
> > a replay window of 1625 * 100 = 162500 seems a bit excessive.
> > 
> > 
> >    Also the following text in
> > https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-14#appendix-A.1.7
> > 
> >    "  Should work in tunnels:  Unlike Diffserv, ECN is defined to
> > always
> >          work across tunnels.  This scheme works within a tunnel that
> >          propagates the ECN field in any of the variant ways it has
> > been
> >          defined, from the year 2001 [RFC3168] onwards.  However, it
> > is
> >          likely that some tunnels still do not implement ECN
> > propagation at
> >          all."
> > 
> >    It seems like this could need additions to reflect the issue
> > just described.
> > 
> > 
> > 
> >    Best Regards
> >         Sebastian
> > 
> > 
>