Re: [IPsec] Discussion of draft-pwouters-ipsecme-multi-sa-performance

Paul Wouters <paul@nohats.ca> Tue, 18 October 2022 15:40 UTC

Date: Tue, 18 Oct 2022 11:40:40 -0400
From: Paul Wouters <paul@nohats.ca>
To: Valery Smyslov <smyslov.ietf@gmail.com>
cc: 'Steffen Klassert' <steffen.klassert@secunet.com>, 'Michael Richardson' <mcr+ietf@sandelman.ca>, 'IPsecME WG' <ipsec@ietf.org>
In-Reply-To: <048901d8e2f6$cddca010$6995e030$@gmail.com>
Message-ID: <e46dbd3-e21-4b1-7622-8fe639c58cd8@nohats.ca>
References: <15eb01d8dd7e$fdf158e0$f9d40aa0$@gmail.com> <10861.1665504183@localhost> <161701d8dd8c$8d042a50$a70c7ef0$@gmail.com> <20221014101504.GI2602992@gauss3.secunet.de> <03c901d8e232$3850ef20$a8f2cd60$@gmail.com> <1ba3323-5fb1-8350-3acb-5d87ad8f2b16@nohats.ca> <048901d8e2f6$cddca010$6995e030$@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"; format="flowed"
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipsec/upBuxIGrx8RfYuJpsESmbDVVoi8>
Subject: Re: [IPsec] Discussion of draft-pwouters-ipsecme-multi-sa-performance

On Tue, 18 Oct 2022, Valery Smyslov wrote:

>> I think this is implementation specific. You could install a temporary
>> rule into the SPD that would give the fallback SA more priority than the
>> per-CPU policy installed, so it wouldn't generate ACQUIRES for a while.
>
> Why for a while? And for how long? There is no indication
> from the peer whether inability to create more SAs is temporary
> or permanent, so we may end up with wasting resources when one peer will be constantly
> trying to install per-CPU SA with no success.

That's why we propose the hint of NUM_QUEUES :)
And why we have:

    The TS_MAX_QUEUE notify conveys that the peer is unwilling to create more
    additional Child SAs for this particular Traffic Selector set.

Each peer knows how many of these Child SAs it has. So if it
receives a few Delete/Notifies, perhaps then it can install more again
if needed.
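
A minimal sketch of that bookkeeping, in Python, for illustration only.
The class and method names are hypothetical and come from neither the
draft nor any implementation. It assumes a peer stops requesting
additional Child SAs once it receives TS_MAX_QUEUE, and is willing to
try again once a Delete has brought its count below the point where the
refusal happened.

```python
class ChildSaBudget:
    """Per-CPU Child SA count for one Traffic Selector set (hypothetical helper)."""

    def __init__(self):
        self.installed = 0      # per-CPU Child SAs currently installed
        self.peer_limit = None  # count we had when the peer sent TS_MAX_QUEUE

    def on_child_sa_installed(self):
        self.installed += 1

    def on_ts_max_queue(self):
        # The peer refused another Child SA: remember how many we had then.
        self.peer_limit = self.installed

    def on_child_sa_deleted(self):
        self.installed = max(0, self.installed - 1)

    def may_request_more(self):
        # Either no refusal was seen yet, or a Delete freed a slot since.
        return self.peer_limit is None or self.installed < self.peer_limit
```

With three SAs installed and a TS_MAX_QUEUE received, may_request_more()
returns False until at least one of those SAs has been deleted.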

>> Perhaps userland could also decide to terminate another per-CPU SA that
>> is idle. Although I think the advised policy is stated to install at
>> least one per-CPU SA per CPU (and allow a few more to catch any race
>> conditions and rekeys).
>
> Why more SAs than the number of CPUs are needed (not counting the Fallback SA)?
> If you rekey per-CPU SAs proactively you will always have a per-CPU SA ready.
> Or what do you mean by "race conditions" and why only few more SAs
> help in this case?

During rekey, you briefly have two SAs installed to prevent packet
leaks/fallback.  If both ends rekey at once, you might have two SAs
in flight plus the existing SA for a brief moment in time.

Also when one end has 2 CPUs and the other 4 CPUs, the end with 2 CPUs
will have more than one SA per CPU for the duration of the tunnel.

We tried to put the exact maximum in place in earlier drafts, accounting
for everything, but testing showed that this was not workable for us.
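
As a rough illustration of the counting involved, here is a small Python
sketch. It assumes, as a simplification that is not the draft's rule,
that the peers end up with one per-CPU SA for each CPU of the larger
side, spread evenly over the local CPUs. The function names are made up
for this example.

```python
import math


def max_sas_on_one_cpu(local_cpus, peer_cpus):
    """Largest number of per-CPU SAs that land on a single local CPU."""
    # Assumption: one per-CPU SA per CPU of whichever peer has more CPUs.
    total = max(local_cpus, peer_cpus)
    return math.ceil(total / local_cpus)


def copies_during_rekey(both_ends_rekey):
    """Copies of one SA briefly installed while it is being rekeyed."""
    # Existing SA plus one replacement, or plus two replacements when
    # both ends start a rekey at the same time.
    return 3 if both_ends_rekey else 2
```

For the 2-CPU versus 4-CPU case, max_sas_on_one_cpu(2, 4) gives 2, and
a simultaneous rekey gives copies_during_rekey(True) == 3 on top of
that, which is why a little headroom is simpler than an exact cap.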

>> Clearly, if for part of the connection, you are
>> using the fallback SA, you are running at suboptimal speed which is not
>> a situation you should remain in.
>
> Why? If the peer is unwilling (or unable) to install more SAs, then
> you will be in this suboptimal situation forever.

That is a different situation? Failure is always an option :)

I'm not sure I understand your point?

Using the fallback SA is, well, a fallback that ideally won't trigger. But
in case something happens (e.g. a CPU gets added to the running machine),
it is preferable to send packets out slowly rather than drop them
completely. (On a high-speed link of 100 Gbps, you cannot cache all the
packets to cover the 1 RTT it takes IKE to add the additional Child SA.)
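
A hedged Python sketch of both points: the send-side choice between a
per-CPU SA and the fallback SA, and the amount of data one RTT
represents. The names select_sa and bytes_in_one_rtt, the dict-based SA
table, and the 20 ms RTT are assumptions for illustration. This is not
how any particular kernel implements it.

```python
def select_sa(cpu_id, per_cpu_sas, fallback_sa):
    """Return (sa, need_acquire) for a packet sent from cpu_id.

    per_cpu_sas maps a CPU id to its own SA; fallback_sa is the shared
    SA that is always installed.
    """
    sa = per_cpu_sas.get(cpu_id)
    if sa is not None:
        return sa, False
    # No per-CPU SA for this CPU (e.g. it was just hot-added): send on
    # the fallback SA now and ask IKE to negotiate a per-CPU SA.
    return fallback_sa, True


def bytes_in_one_rtt(link_bps, rtt_ms):
    """Bytes a saturated link carries during one round trip."""
    return link_bps * rtt_ms // 8000  # bits/s * ms -> bytes
```

At 100 Gbps and an assumed 20 ms RTT, bytes_in_one_rtt(100_000_000_000, 20)
is 250,000,000 bytes, i.e. about 250 MB that would have to be queued
instead of being sent over the fallback SA.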

>  I don't think
> it's a wrong situation you want to escape from ASAP, I think it's
> a normal situation in general.

We disagree. Maybe we should call the fallback SA the emergency SA ? :)

>> Indeed. The idea is that no matter what, you can encrypt the packets and
>> send them, even if at sub-optimal speeds. We don't want packets to have
>> to wait another RTT for the per-CPU SA to establish. That would cause a
>> lot of issues (slow TCP retransmits, UDP application retransmits, etc).
>> Once the first SA is up, you have a working IPsec tunnel and no more
>> packets should be dropped or wait for SA's to establish.
>
> I understand all these considerations. My proposal is to use
> other existing per-CPU SA in this case. So, no special Fallback SA
> is needed (I understand that re-steering packet to a different CPU requires locking,
> but in my understanding using the Fallback SA requires locking as well).

To me that seems more complicated and issue-prone, but I'll let Steffen
speak on this as an implementer.

>>>   The Fallback Child SA MUST NOT be deleted when idle, as
>>>   it is likely to be idle if enough per-CPU Child SAs are installed.
>>>
>>> I think that these BCP14 requirements make the fallback SA very special.
>>
>> Yes it does. It ensures there is a fully working (albeit slow) IPsec
>> connection.
>
> And the "specialityness" of this SA worries me. I think that the same
> functionality can be achieved without introducing this special SA.

How would you guarantee that at least 1 per-CPU SA is always available to
be steered to by other CPUs? What if that one is rekeying, how are you
going to sync that to all the other CPU configurations?

Paul