Re: [IPsec] Discussion of draft-pwouters-ipsecme-multi-sa-performance

Valery Smyslov <smyslov.ietf@gmail.com> Wed, 19 October 2022 08:31 UTC

From: Valery Smyslov <smyslov.ietf@gmail.com>
To: 'Paul Wouters' <paul@nohats.ca>
Cc: 'Steffen Klassert' <steffen.klassert@secunet.com>, 'IPsecME WG' <ipsec@ietf.org>
References: <15eb01d8dd7e$fdf158e0$f9d40aa0$@gmail.com> <20221014111548.GJ2602992@gauss3.secunet.de> <03ca01d8e232$3bbace10$b3306a30$@gmail.com> <3c58c2e0-f022-15fb-2ebb-658fa51275a6@nohats.ca> <048a01d8e2f6$cedc83e0$6c958ba0$@gmail.com> <50fbd76f-6aad-e422-9d95-afbcd6db87ba@nohats.ca>
In-Reply-To: <50fbd76f-6aad-e422-9d95-afbcd6db87ba@nohats.ca>
Date: Wed, 19 Oct 2022 11:31:16 +0300
Message-ID: <057901d8e395$2822ef40$7868cdc0$@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipsec/RzU1vptNm73xfdGtO7K2rrEmrTE>

Hi Paul,

> On Tue, 18 Oct 2022, Valery Smyslov wrote:
> 
> >>> implementation with say 10 CPUs. Does it make any difference for this implementation
> >>> If it receives CPU_QUEUES with 100 or with 1000? It seems to me that in both cases
> >>> it will follow its own local policy for limiting the number of per-CPU SAs,
> >>> most probably capping it to 10.
> >>
> >> That would be a mistake. You always want to allow a few more than the
> >> CPUs you have. The maximum is mostly to protect against DoS attacks.
> >
> > How it protects against DoS attacks, can you elaborate?
> 
> Requesting to install 1 million Child SA's until the remote server falls over.
> Perhaps less extremely, to contain the number of resources a sysadmin
> allocates to a specific "multi CPU" tunnel.

I still fail to understand how this protection mechanism works.
One side suggests 10, the other suggests 1 million; according to your draft
the negotiated value is the larger one, i.e. 1 million. How can the peer with
limited capabilities protect itself from installing 1 million SAs?
By failing the IKE SA negotiation? That would be a very poor protection mechanism,
because it completely prevents these peers from communicating.
I would have understood if the smaller value were selected,
but this is not what the draft says.
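To make the asymmetry concrete, here is a tiny sketch (my own illustration, not text from the draft; the function names are hypothetical):

```python
# Hypothetical sketch of the CPU_QUEUES negotiation outcome as I read
# the draft: the negotiated number of per-CPU SAs is the larger of the
# two announced values, so the constrained peer gains no protection.

def negotiated_sas_draft(initiator_cpu_queues, responder_cpu_queues):
    """Draft behavior as I read it: take the larger announced value."""
    return max(initiator_cpu_queues, responder_cpu_queues)

def negotiated_sas_alternative(initiator_cpu_queues, responder_cpu_queues):
    """What would actually protect the weaker peer: take the smaller value."""
    return min(initiator_cpu_queues, responder_cpu_queues)

# A peer with 10 CPUs facing a peer announcing 1 million:
assert negotiated_sas_draft(10, 1_000_000) == 1_000_000        # no protection
assert negotiated_sas_alternative(10, 1_000_000) == 10         # bounded
```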

I think that the real protection against installing too many SAs
in your draft is the TS_MAX_QUEUE notify, which makes CPU_QUEUES redundant and useless.

> >> If you only have 10 CPUs, but the other end has 50, there shouldn't
> >> be much issue to install 50 SA's. Not sure if we said so in the draft,
> >
> > I'm not so sure. For example, if you use HSM, you are limited by its capabilities.
> 
> Sure. Maybe put the HSM on the fallback SA and not on the per-CPU SAs if
> you don't have an option to use it for all.

It doesn't work this way. The presence of an HSM in a system usually
means that there are reasons (e.g. certification requirements) to use it for
all traffic, not just part of it. So a system with an HSM is usually
unable to install more SAs than the HSM can handle.

> >> but you could even omit installing 40 of the outgoing SA's since you
> >> would never be expected to use them anyway. but you have to install all
> >> 50 incoming ones because the peer might use them.
> >
> > And what should one do when unable to install all 50 (for any reason)?
> > And how is the protocol expected to deal with the number of CPUs
> > changing over the lifetime of the IKE SA? As far as I understand, some modern
> > systems allow adding or removing CPUs on the fly.
> 
> Right. All of these are reasons to not put in limitations too tightly,
> and the notify conveys roughly what each end is willing to put into this
> connection, even though there might be slight changes, temporary or not. We feel
> it is still better to convey what both sides consider "ideal", over just
> sending CREATE_CHILD_SAs and getting them to fail.

So, we disagree here :-) I think this information is really hard to use
(you admitted that it is unreliable), and the real mechanism to
limit the number of SAs is the TS_MAX_QUEUE notify.

> >>> So why we need CPU_QUEUES?
> >>
> >> TS_MAX_QUEUE is conveying an irrecoverable error condition. It should
> >> never happen.
> >
> > That's not what the draft says:
> >
> >   The responder may at any time reject
> >   additional Child SAs by returning TS_MAX_QUEUE.
> >
> > So, my reading is that this notify can be sent at any time if peer
> > is not willing to create more per-CPU SAs. And sending this notify
> > doesn't cause deletion of IKE SA and all its Child SAs (it is my guess).
> 
> Right. It is still something expected. Say there were 4 CPUs and we
> committed to 4, but now one CPU was removed. So TS_MAX_QUEUE tells the
> peer not to try for the 4th one. For whatever reason we cannot do it
> anymore. This prevents that peer from keeping to try this every second.
> Perhaps it will still try every hour. Or perhaps it will let the peer
> run an additional one once it gets that 4th CPU back.
> 
> The common case will be both peers present what they want and (within
> reason) will establish. No failed CREATE_CHILD_SAs happen unless there
> was an unexpected change on one of the peers. Where in your proposal,
> there will be a failed CREATE_CHILD_SAs as part of the normal process.

This is not true. In my proposal a failed CREATE_CHILD_SA will happen only
when one side requests more SAs than the other can handle.
So, if one peer has 4 CPUs and the other has 10, and the first peer has no problem
installing 10 SAs, then they will end up with 10 SAs without any failed CREATE_CHILD_SAs:
they will just keep requesting new SAs until both have all their CPUs in use.

A CREATE_CHILD_SA fails only if one of the peers cannot handle
more SAs than it has already installed.
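A rough sketch of the incremental logic I have in mind (hypothetical names; "peer capacity" stands in for whatever limit causes the responder to return TS_MAX_QUEUE or a similar notify):

```python
# Hypothetical sketch of the incremental approach: a peer simply
# requests additional per-CPU Child SAs until the responder signals it
# cannot handle more; at most one CREATE_CHILD_SA fails per peer.

def bring_up_per_cpu_sas(local_cpus, peer_capacity):
    """Returns (installed_sas, failed_requests)."""
    installed, failed = 0, 0
    for _ in range(local_cpus):
        if installed < peer_capacity:   # responder accepts the Child SA
            installed += 1
        else:                           # responder returns TS_MAX_QUEUE
            failed += 1
            break                       # stop; don't keep pounding the peer
    return installed, failed

# 4-CPU peer talking to a peer that can handle 10: no failures at all.
assert bring_up_per_cpu_sas(4, 10) == (4, 0)
# 10-CPU peer talking to a peer that can only handle 4: one failure, then stop.
assert bring_up_per_cpu_sas(10, 4) == (4, 1)
```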

> > If my reading is wrong and this is a fatal error (or what do you mean by "irrecoverable"?),
> > then the protocol is worse than I thought for devices that for any reason cannot afford
> > installing unlimited number of SAs (e.g. if they use HSM with
> > limited memory). In this case they cannot even tell the peer
> > that they have limited resources.
> 
> I meant fatal as "do not attempt to do this again, I am out of
> resources". Maybe you call that more a temporary error. What I was
> trying to convey is that the error means "resources all in use, don't
> keep trying this for now". If you feel that is a "temporary" error,
> that's fine with me. As long as the peer wouldn't keep trying this
> for other CPUs but is smart enough to realize this one failure means
> not to keep pounding the peer with more CREATE_CHILD_SA requests.

OK, that's exactly what I proposed as the mechanism to limit 
the number of SAs. And in my opinion this mechanism makes CPU_QUEUES redundant and useless.

> >> Where as CPU_QUEUES tells you how many per-CPU child SAs
> >> you can do. This is meant to reduce the number of in-flight CREATE_CHILD_SA's
> >> that will never become successful.
> >
> > It seems to me that it's enough to have one CREATE_CHILD_SA with the proper
> > error notify to indicate that the peer is unwilling to create more SAs.
> > I'm not sure this is a big saving.
> 
> Note that you don't have to bring up CPUs on-demand via ACQUIREs. You
> can also fire off all of them at once (after the initial child sa is
> established). With CPU_QUEUES, you know whether to send 5 or 10 of
> these. With your method, you have to bring them up one by one to see
> if you can bring up more or not.

Not exactly. In your proposal, when the peers have exchanged CPU_QUEUES notifies,
they select the larger of the two values as the expected number of SAs.
With this logic the peer that proposed the smaller value (because
it cannot handle more) would still reject the excessive requests.
So, actually, with your proposal the peer that proposed the larger value
doesn't know for sure whether all its requests will be successful.
It can fire them all off, but the result will be the same as with
my proposal - some of them may fail.

> >> We did it to distinguish between "too many of the same child sa" versus
> >> other errors in cases of multiple subnets / child SAs under the same IKE
> >> peer. Rethinking it, I am no longer able to reproduce why we think it
> >> was required :)
> >
> > I believe TS_UNACCEPTABLE is well suited for this purpose. You know for sure that the TS itself is OK,
> > since you have already installed SA(s) with the same TS; it is not a fatal error notify,
> > it is standardized in RFC 7296, and it does not prevent creating SAs with other TS.
> 
> If the peers have two connections:
> 
> conn one
>  	[stuff]
>  	leftsubnet=10.0.1.0/24
>  	rightsubnet=10.0.2.0/24
> 
> conn two
>  	[same stuff so shared IKE SA with conn one]
>  	leftsubnet=192.0.1.0/24
>  	rightsubnet=192.0.2.0/24
> 
> If you put up 10 copies of conn one, and then start doing the first
> (so fallback sa) of conn two and get TS_UNACCEPTABLE, what error
> condition did you hit? Does the peer only accept 10 connections per
> IKE peer, or did conn two have subnet parameters that didn't match the
> peer?

The latter. Otherwise NO_ADDITIONAL_SAS would be returned.
But if the peer starts creating an 11th copy of the first connection
and gets back TS_UNACCEPTABLE, this would mean that no more
per-CPU SAs are allowed.
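The disambiguation I have in mind could be sketched like this (my own illustration, reusing the RFC 7296 notify names; the parameters are hypothetical):

```python
# Hypothetical sketch of how a responder can disambiguate errors using
# only existing RFC 7296 notify types: NO_ADDITIONAL_SAS for a hard
# per-IKE-SA limit, TS_UNACCEPTABLE for everything selector-related,
# including "no more copies of this particular (per-CPU) Child SA".

def responder_error(total_child_sas, max_child_sas,
                    ts_matches_policy, copies_of_this_ts, max_copies_per_ts):
    if total_child_sas >= max_child_sas:
        return "NO_ADDITIONAL_SAS"      # hard per-IKE-SA limit reached
    if not ts_matches_policy:
        return "TS_UNACCEPTABLE"        # selectors don't match local policy
    if copies_of_this_ts >= max_copies_per_ts:
        return "TS_UNACCEPTABLE"        # no more per-CPU copies of this SA
    return None                         # accept the Child SA

# conn two's subnets don't match local policy: plain TS_UNACCEPTABLE.
assert responder_error(10, 100, False, 0, 10) == "TS_UNACCEPTABLE"
# 11th copy of conn one: also TS_UNACCEPTABLE, but the initiator already
# knows this TS is acceptable, so it can infer the per-CPU limit was hit.
assert responder_error(10, 100, True, 10, 10) == "TS_UNACCEPTABLE"
```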

> >> The idea of the fallback SA is that you always have at least one child
> >> SA guaranteed to be up that can encrypt and send a packet. It can be
> >> installed to not be per-CPU. It's a guarantee that you will never need
> >> to wait (and cache?) 1 RTT's time worth of packets, which can be a lot
> >> of packets. You don't want dynamic resteering. Just have the fallback
> >> SA "be ready" in case there is no per-cpu SA.
> >
> > The drawback of the Fallback SA is that it needs a special processing.
> > Normally we delete SAs when they are idle for a long time
> > to conserve resources, but the draft says it must not be done with the Fallback SA.
> 
> Yes. But honestly that is a pretty tiny code change in IKEv2. Again, I
> will let Steffen and Antony talk about performance, but I think
> "re-steering" packets is hugely expensive and slow, especially if it
> needs to be done dynamically, eg first you have to determine which CPU
> to steer it to, and then steer the packet. 

In my proposal the first step is pretty cheap - you store, in the SA stub entry,
the ID of the CPU that has an active SA.

> Then maybe remember this
> choice for a while because you cannot do this lookup for each packet.
> Then if that SA dies and you need to find another one, that's a whole
> other error path you need to traverse.

If this SA dies you have to update all SA stub entries that reference it.
I admit that this is costly, but it does not happen frequently.
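A minimal sketch of the steering idea (hypothetical data structures of my own; a real implementation would live in the kernel's SA lookup path):

```python
# Hypothetical sketch of the steering proposal: each SA "stub" entry
# caches the ID of the CPU that holds an active per-CPU SA, so the
# per-packet step is a single table lookup; only when an SA dies do we
# pay the cost of rewriting the stub entries that reference it.

class SteeringTable:
    def __init__(self):
        self.stubs = {}          # flow id -> CPU id holding an active SA
        self.fallback_cpu = 0    # CPU owning the always-available fallback SA

    def steer(self, flow_id):
        """Per-packet path: one cheap lookup; fallback if no per-CPU SA."""
        return self.stubs.get(flow_id, self.fallback_cpu)

    def sa_died(self, cpu_id):
        """Rare path: rewrite every stub that referenced the dead SA."""
        for flow_id, cpu in list(self.stubs.items()):
            if cpu == cpu_id:
                self.stubs[flow_id] = self.fallback_cpu

table = SteeringTable()
table.stubs = {1: 3, 2: 3, 3: 7}
assert table.steer(1) == 3           # cheap cached lookup
assert table.steer(99) == 0          # unknown flow -> fallback SA
table.sa_died(3)                     # SA on CPU 3 died: costly but rare
assert table.steer(1) == 0 and table.steer(3) == 7
```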

> > Yes. The idea is that If one peer supports per-CPU SAs and the
> > other doesn't, they will still be able to communicate and have multiple SAs.
> 
> There is no way for you to know in advance if the peer will send you a
> delete for the older child SA when establishing the new child SA, thus
> defeating your purpose. I know RFC 7296 says you can have multiple
> identical child SAs, but in practice a bunch of software just assumes
> these are the same client reconnecting and the previous child SA state
> was lost. This proposed mechanism therefore wants to explicitly state
> "we are going to do multiple identical SAs, can you support me".

That behavior is really broken, but in the worst case
the peers will end up with a single Child SA, just as
with your proposal. I assume not all implementations
out there are broken :-)

> > For example, if the supporting system has several weak CPUs,
> > while the unsupporting one has much more powerful CPU,
> > then multiple SAs will help to improve performance -
> > the supporting system will distribute the load over its weak CPUs,
> > while for the unsupporting one the load will be small enough even for a single CPU.
> 
> But you simply don't know if the duplicate SA is going to lead to a
> deletion of the older SA.

Ok, but once I see this broken behavior I can stop creating more SAs.

> >> The problem currently is that when an identical child SA
> >> is successfully negotiated, implementations differ on what they do.
> >> Some allow this, some delete the older one. The goal of this draft
> >> is to make the desire for multiple identical child SAs very explicit.
> >
> > RFC 7296 explicitly allows multiple Child SAs with identical selectors,
> > so if implementations immediately delete them, then they are either broken
> > or have reasons to do it (e.g. have no resources).
> 
> Broken or not, they are not going to get "fixed". It was a design
> choice.

Do you know for sure that they are not going to get "fixed"?
If so, then perhaps there is a way to tell their authors that
this behavior is broken, and perhaps they will change their minds?

Regards,
Valery.

> Paul