Re: [IPsec] Discussion of draft-pwouters-ipsecme-multi-sa-performance

Paul Wouters <paul@nohats.ca> Wed, 19 October 2022 14:46 UTC

Date: Wed, 19 Oct 2022 10:46:20 -0400
From: Paul Wouters <paul@nohats.ca>
To: Valery Smyslov <smyslov.ietf@gmail.com>
cc: 'Steffen Klassert' <steffen.klassert@secunet.com>, 'IPsecME WG' <ipsec@ietf.org>
In-Reply-To: <057901d8e395$2822ef40$7868cdc0$@gmail.com>
Message-ID: <a1efaa10-71d9-a6df-4354-63d92e861b4e@nohats.ca>
References: <15eb01d8dd7e$fdf158e0$f9d40aa0$@gmail.com> <20221014111548.GJ2602992@gauss3.secunet.de> <03ca01d8e232$3bbace10$b3306a30$@gmail.com> <3c58c2e0-f022-15fb-2ebb-658fa51275a6@nohats.ca> <048a01d8e2f6$cedc83e0$6c958ba0$@gmail.com> <50fbd76f-6aad-e422-9d95-afbcd6db87ba@nohats.ca> <057901d8e395$2822ef40$7868cdc0$@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipsec/G9iL-SK93gldvx98mScL0IAw24w>

On Wed, 19 Oct 2022, Valery Smyslov wrote:

>> Requesting to install 1 million Child SA's until the remote server falls over.
>> Perhaps less extremely, to contain the number of resources a sysadmin
>> allocates to a specific "multi CPU" tunnel.
>
> I still fail to understand how this protection mechanism works.
> One side suggests 10, the other suggests 1 million; according to your draft
> the negotiated value is the largest one, i.e. 1 million. How can the peer with
> limited capabilities protect itself from installing 1 million SAs?

By not doing more than its local policy says to do.

> Fail negotiation of the IKE SA? Then it is a very bad protection mechanism,
> because it completely prevents these peers from communicating.
> I would have understood if the smallest value were selected,
> but this is not what the draft says.

Earlier drafts agreed on the smaller value, but that causes problems
both for ACQUIREs and for rekeys and simultaneous new Child SA
establishments that are in flight at the same time.

Note that before you can get 1M Child SAs, you need to have authenticated,
so the abuse is easily tracked to an entity and can simply be prevented
by returning TS_MAX_QUEUE at the local policy maximum (e.g. 40 instead of 1M).
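
A minimal sketch of that local-policy backstop, assuming the notify
names from the draft (CPU_QUEUES, TS_MAX_QUEUE) and the cap of 40 from
the example above; the function itself is hypothetical:

```python
# Responder-side limit: peers may negotiate a large CPU_QUEUES value,
# but a responder never installs more per-CPU Child SAs than its own
# local policy allows -- beyond that it answers TS_MAX_QUEUE.
TS_MAX_QUEUE = "TS_MAX_QUEUE"  # notify returned when the local cap is hit

def handle_create_child_sa(installed: int, local_policy_max: int) -> str:
    """Decide whether to accept one more per-CPU Child SA."""
    if installed >= local_policy_max:
        return TS_MAX_QUEUE   # e.g. cap at 40, even if the peer asked for 1M
    return "INSTALL"

# An attacker who negotiated CPU_QUEUES = 1,000,000 still only gets the
# local maximum installed:
results = [handle_create_child_sa(n, 40) for n in range(45)]
print(results.count("INSTALL"))      # 40 accepted
print(results.count(TS_MAX_QUEUE))   # 5 refused
```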

> I think that the real protection mechanism in your draft against installing too many
> SAs is the TS_MAX_QUEUE notify and it makes CPU_QUEUES really redundant and useless.

I feel we keep repeating our arguments. Without CPU_QUEUES both peers
keep guessing at what the best solution is.

> This doesn't work this way. The presence of HSM in the system usually
> means that there are some reasons (e.g. certification requirements) to use it for
> all traffic and not for part of it. So, the system with HSM is usually
> unable to install more SAs than the HSM can handle.

If you have an HSM, either on the CPU or on the NIC, there will be
constraints. If the number of ESP keys is limited, you have to limit the
number of Child SAs if you are forced for compliance to go through the
HSM. There is nothing this draft can change about that.

>> Right. All of these are reasons to not put in limitations too tightly,
>> and the notify conveys roughly what each end is willing to put into this
>> connection, but there might be slight changes, temporary or not. We feel
>> it is still better to convey what both sides consider "ideal", over just
>> sending CREATE_CHILD_SAs and getting them to fail.
>
> So, we disagree here :-) I think this information is really hard to use
> (you admitted that it is unreliable) and the real mechanism to
> limit the number of SAs is the TS_MAX_QUEUE notify.

That's the hard fail you should never reach. Without CPU_QUEUES, when my
machine has 20 CPUs and the peer has 2, and I want to start all my Child
SAs at once (without using ACQUIREs), I would end up setting up the first
child and then sending out 19 more CREATE_CHILD_SAs (if window size > 1),
and they would all come back failing. If using PFS, that wouldn't be
great.
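
That scenario can be sketched as a toy model (pure illustration: the
function names and the simplified accounting are mine, not from the
draft):

```python
def create_children_blind(my_cpus: int, peer_tolerates: int):
    """Fire off one CREATE_CHILD_SA per local CPU with no hint from the peer."""
    ok = failed = 0
    for i in range(my_cpus):
        if i < peer_tolerates:
            ok += 1
        else:
            failed += 1  # wasted round trip, plus wasted DH work with PFS
    return ok, failed

def create_children_with_cpu_queues(my_cpus: int, peer_announced: int):
    """With CPU_QUEUES, the negotiated value is the larger of the two,
    and a peer that sent the notify has committed to handling it."""
    target = max(my_cpus, peer_announced)
    return target, 0  # no failed exchanges in the common case

print(create_children_blind(20, 2))             # (2, 18): 18 failures
print(create_children_with_cpu_queues(20, 2))   # (20, 0): none
```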

>> The common case will be both peers present what they want and (within
>> reason) will establish. No failed CREATE_CHILD_SAs happen unless there
>> was an unexpected change on one of the peers. Whereas in your proposal,
>> there will be failed CREATE_CHILD_SAs as part of the normal process.
>
> This is not true. In my proposal, failed CREATE_CHILD_SAs will happen only in
> the situation when one side requests more SAs than the other can handle.

What happens when the other peer does not support this at all? Or do you
still propose some notify to indicate support? One without a number?

With no CPU_QUEUES at all, we have the same situation as we have now.
There is no indication that the peer won't delete our first Child SA when
it receives an identical second Child SA. If you send the notify but
without a number indication, this issue goes away, but why not indicate
(basically for free) what the ideal number is?
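
To illustrate that ambiguity, here is a toy model (entirely
hypothetical; real peers are remote IKE daemons, not Python objects) of
a legacy peer that treats a duplicate Child SA as a reconnect:

```python
class Peer:
    """Toy responder: optionally deletes the older of two identical SAs."""
    def __init__(self, deletes_duplicates: bool):
        self.deletes_duplicates = deletes_duplicates
        self.sas = []

    def create_child_sa(self, selectors: str) -> int:
        if self.deletes_duplicates and selectors in self.sas:
            self.sas.remove(selectors)  # "same client reconnecting" logic
        self.sas.append(selectors)
        return len(self.sas)            # how many SAs the peer now holds

legacy = Peer(deletes_duplicates=True)
legacy.create_child_sa("10.0.1.0/24<->10.0.2.0/24")
count = legacy.create_child_sa("10.0.1.0/24<->10.0.2.0/24")
print(count)  # 1: the second identical SA silently replaced the first
```

The initiator only discovers this behavior after the fact, which is why
an explicit signal up front is useful.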

> So, if one peer has 4 CPUs and the other has 10, and the first peer has no problems
> installing 10 SAs, then they will end up with 10 SAs without any failed CREATE_CHILD_SAs:
> they will just request creating new SAs until both have all their CPUs in use.

Yes.

Note both sides will need to specify 12 (for window size 1) to keep
10 working ones, unless you are not going to count rekeying ones in your
maximum. E.g. what if peer A starts a rekey for CPU 3, and peer B starts
a rekey for CPU 1 at the same time? You end up briefly with 12 Child SAs.
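
As a quick arithmetic sketch of that headroom (numbers taken from the
example above, window size 1 assumed):

```python
# Transient peak during simultaneous rekeys, per the example above.
working = 10          # steady-state per-CPU Child SAs to keep
rekeys_in_flight = 2  # peer A rekeys CPU 3 while peer B rekeys CPU 1
peak = working + rekeys_in_flight
print(peak)  # 12: the announced maximum must briefly accommodate this
```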

> Failed CREATE_CHILD_SAs only happen if one of the peers cannot handle
> more SAs than it has already installed.

Installed or committed to in flight, yes.

>> Note that you don't have to bring up CPUs on-demand via ACQUIREs. You
>> can also fire off all of them at once (after the initial child sa is
>> established). With CPU_QUEUES, you know whether to send 5 or 10 of
>> these. With your method, you have to bring them up one by one to see
>> if you can bring up more or not.
>
> Not exactly. In your proposal when peers exchanged CPU_QUEUES notifies,
> they select the maximum of two values as the expected number of SAs.
> With this logic the peer that proposed the smaller value (because
> it cannot handle more) would still reject the excessive requests.

It is not about "cannot handle". The 4 CPU node can (should!) still handle
8 if it wants to optimize talking to 8 CPU peers. I don't think, in
general, there will be many "cannot handle" cases. Child SAs are not
that expensive in kernel memory. And if you want to talk to 8 CPU
nodes, you had better support 8 (or 10, see above) even if you only
have 4 CPUs. You basically negotiate "how many do we need to cover all
our CPUs" by picking the maximum value (within reason).

> So, actually with your proposal the peer that proposed the larger value
> doesn't know for sure if all its requests will be successful.

It does. Basically, you will do the maximum of the two, based on the
fact that if you send the notify to allow per-CPU Child SAs, you surely
can afford to install a few dozen of them if needed.

>> conn one
>>  	[stuff]
>>  	leftsubnet=10.0.1.0/24
>>  	rightsubnet=10.0.2.0/24
>>
>> conn two
>>  	[same stuff so shared IKE SA with conn one]
>>  	leftsubnet=192.0.1.0/24
>>  	rightsubnet=192.0.2.0/24
>>
>> If you put up 10 copies of conn one, and then start doing the first
>> (so fallback sa) of conn two and get TS_UNACCEPTABLE, what error
>> condition did you hit? Does the peer only accept 10 connections per
>> IKE peer, or did conn two have subnet parameters that didn't match the
>> peer?
>
> The latter. Otherwise NO_ADDITIONAL_SAS would be returned.

It would be good to know whether the error is "no more of this one child"
versus "no more Child SAs at all". I don't think returning the same
error for all these failure cases is good. The peer will not know the
reason for the failure and might end up retrying things it should not,
or not retrying things it should (e.g. a conn three).
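
A sketch of the initiator-side distinction being argued for here; the
mapping of notifies to actions is my interpretation of this thread, not
normative IKEv2 behaviour:

```python
def on_child_sa_error(notify: str) -> str:
    """Map a CREATE_CHILD_SA error notify to a retry decision."""
    if notify == "TS_UNACCEPTABLE":
        # selectors for *this* conn don't match the peer's policy:
        # retrying the same conn is pointless, but "conn three" may work
        return "give up on this conn, try other conns"
    if notify == "NO_ADDITIONAL_SAS":
        # the peer has no room for any more Child SAs on this IKE SA
        return "stop creating SAs, maybe retry later"
    if notify == "TS_MAX_QUEUE":
        # per-CPU limit reached: keep what we have, don't add more copies
        return "keep existing per-CPU SAs, stop adding"
    return "unknown notify, treat as fatal for this exchange"

print(on_child_sa_error("TS_MAX_QUEUE"))
```

Collapsing all three cases into one error code makes this decision
impossible, which is the point being made above.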

> But if the peer starts creating an 11th copy of the first connection
> and gets back TS_UNACCEPTABLE, this would mean that no more
> per-CPU SAs are allowed.

Is this a temporary or a permanent error? Should the peer retry after a
while? Without knowing the desired numbers, it won't know.

>>> The drawback of the Fallback SA is that it needs a special processing.
>>> Normally we delete SAs when they are idle for a long time
>>> to conserve resources, but the draft says it must not be done with the Fallback SA.
>>
>> Yes. But honestly that is a pretty tiny code change in IKEv2. Again, I
>> will let Steffen and Antony talk about performance, but I think
>> "re-steering" packets is hugely expensive and slow, especially if it
>> needs to be done dynamically, e.g. first you have to determine which CPU
>> to steer it to, and then steer the packet.
>
> In my proposal the first step is pretty cheap - you store the CPU ID
> which has an active SA in the SA stub entry.

I will let Steffen talk about this.

>> There is no way for you to know in advance if the peer will send you a
>> delete for the older Child SA when establishing the new Child SA, thus
>> defeating your purpose. I know RFC 7296 says you can have multiple
>> identical Child SAs, but in practice a bunch of software just assumes
>> these are the same client reconnecting and that the previous Child SA
>> state was lost. This proposed mechanism therefore wants to explicitly
>> state "we are going to do multiple identical SAs, can you support me".
>
> That behavior is really broken, but in the worst case
> the peers will end up with a single Child SA, as well as
> with your proposal. I assume not all implementations
> out there are broken :-)

So now the peer has to keep track of whether its duplicate SAs lead to
removal of its initial SA. What if the remote peer supports this, but
it idled out the initial one and sent a delete?

The draft started with a concept of "we must clearly signal this
support because of how people implemented 7296".

>> But you simply don't know if the duplicate SA is going to lead to a
>> deletion of the older SA.
>
> Ok, but once I see this broken behavior I can stop creating more SAs.

See above. It might be a bit hairy to properly detect this.

>>>> The problem currently is that when an identical Child SA
>>>> is successfully negotiated, implementations differ on what they do.
>>>> Some allow this, some delete the older one. The goal of this draft
>>>> is to make the desire for multiple identical Child SAs very explicit.
>>>
>>> RFC 7296 explicitly allows multiple Child SAs with identical selectors,
>>> so if implementations immediately delete them, then they are either broken
>>> or have reasons to do it (e.g. have no resources).
>>
>> Broken or not, they are not going to get "fixed". It was a design
>> choice.
>
> Do you know for sure that they are not going to get "fixed"?
> If so, then probably there is a way to tell their authors that
> this behavior is broken, and probably they can change their mind?

uniqueids=yes started in freeswan a long long time ago before the Second
Age of Openswan and strongSwan, and the Third Age of libreswan :-)

For example, Apple clients pretty aggressively reconnect on network
failures (e.g. WiFi switching, a person entering an elevator), and if
you don't start deleting duplicates you end up with a LOT of unused SAs.
There are also VPN services that are pay-per-user and want to avoid the
same user connecting multiple times. An explicit signal would be helpful.

We could rephrase the payload so it can be completely optional, if you
would be okay with that, so that we could each implement to our own
desire. We would then also need to write out a bit of your explanations
in the draft for clarification.

Paul