Re: [IPsec] Discussion of draft-pwouters-ipsecme-multi-sa-performance

Valery Smyslov <smyslov.ietf@gmail.com> Wed, 19 October 2022 08:31 UTC

From: Valery Smyslov <smyslov.ietf@gmail.com>
To: 'Paul Wouters' <paul@nohats.ca>
Cc: 'Steffen Klassert' <steffen.klassert@secunet.com>, 'Michael Richardson' <mcr+ietf@sandelman.ca>, 'IPsecME WG' <ipsec@ietf.org>
References: <15eb01d8dd7e$fdf158e0$f9d40aa0$@gmail.com> <10861.1665504183@localhost> <161701d8dd8c$8d042a50$a70c7ef0$@gmail.com> <20221014101504.GI2602992@gauss3.secunet.de> <03c901d8e232$3850ef20$a8f2cd60$@gmail.com> <1ba3323-5fb1-8350-3acb-5d87ad8f2b16@nohats.ca> <048901d8e2f6$cddca010$6995e030$@gmail.com> <e46dbd3-e21-4b1-7622-8fe639c58cd8@nohats.ca>
In-Reply-To: <e46dbd3-e21-4b1-7622-8fe639c58cd8@nohats.ca>
Date: Wed, 19 Oct 2022 11:31:06 +0300
Message-ID: <057801d8e395$223f84d0$66be8e70$@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipsec/sFvDhyMTOIdt0mvUu052K1VUTSM>
Subject: Re: [IPsec] Discussion of draft-pwouters-ipsecme-multi-sa-performance

Hi Paul,

> On Tue, 18 Oct 2022, Valery Smyslov wrote:
> 
> >> I think this is implementation specific. You could install a temporary
> >> rule into the SPD that would give the fallback SA higher priority than the
> >> installed per-CPU policy, so it wouldn't generate ACQUIREs for a while.
> >
> > Why for a while? And for how long? There is no indication
> > from the peer whether its inability to create more SAs is temporary
> > or permanent, so we may end up wasting resources, with one peer constantly
> > trying to install per-CPU SAs without success.
> 
> That's why we propose the hint of NUM_QUEUES :)
> And why we have:
> 
>     The TS_MAX_QUEUE notify conveys that the peer is unwilling to create more
>     additional Child SAs for this particular Traffic Selector set.
> 
> Each peer knows how many of these Child SAs it has. So if it
> receives a few Delete/Notifies, perhaps it can then install more again
> if needed.

OK, but the same behavior can be implemented without NUM_QUEUES.
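As a rough illustration of that point, here is a minimal Python sketch of how an initiator might infer the peer's limit purely from a TS_MAX_QUEUE rejection and later Delete notifications, with no NUM_QUEUES hint. The class and method names are hypothetical (not from the draft or any implementation), and it assumes the peer's limit stays stable for the lifetime of the IKE SA.

```python
class ChildSATracker:
    """Hypothetical bookkeeping for per-CPU Child SAs under one IKE SA.

    No limit is announced up front (no NUM_QUEUES); the limit is inferred
    from the first TS_MAX_QUEUE rejection received from the peer.
    """

    def __init__(self) -> None:
        self.active = 0            # per-CPU Child SAs currently installed
        self.learned_limit = None  # None until the peer sends TS_MAX_QUEUE

    def may_request(self) -> bool:
        # Ask for another per-CPU Child SA only while below the inferred limit.
        return self.learned_limit is None or self.active < self.learned_limit

    def on_created(self) -> None:
        self.active += 1

    def on_ts_max_queue(self) -> None:
        # The peer refused: it is unwilling to go beyond what is installed now.
        self.learned_limit = self.active

    def on_deleted(self) -> None:
        # A Delete frees a slot, so may_request() becomes true again.
        self.active = max(0, self.active - 1)
```

After, say, four successful creations followed by a TS_MAX_QUEUE, `may_request()` returns False and no further ACQUIRE-driven requests are sent until a Delete brings `active` back below four.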

> >> Perhaps userland could also decide to terminate another per-CPU SA that
> >> is idle. Although I think the advised policy is stated to install at
> >> least one per-CPU SA per CPU (and allow a few more to catch any race
> >> conditions and rekeys).
> >
> > Why are more SAs than the number of CPUs needed (not counting the Fallback SA)?
> > If you rekey per-CPU SAs proactively, you will always have a per-CPU SA ready.
> > Or what do you mean by "race conditions", and why would only a few more SAs
> > help in this case?
> 
> During rekey, you briefly have two SAs installed to prevent packet
> leaks/fallback.  If both ends rekey at once, you might have two SAs
> in flight plus the existing SA for a brief moment in time.

Yes, but they are all transient, so I don't count them.

> Also when one end has 2 CPUs and the other 4 CPUs, the end with 2 CPUs
> will have more than one SA per CPU for the duration of the tunnel.

True, but this is unrelated to race conditions.
To put it more precisely: why do you need more SAs than
the larger of the two peers' CPU counts (not counting the fallback SA
and any transient SAs that are created in the process of rekeying)?
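The count being asked about can be written down as a small calculation. This Python sketch is a hypothetical model, not taken from the draft: in steady state there is one per-CPU Child SA for each CPU of the peer with more CPUs, plus the fallback SA; a simultaneous rekey of one per-CPU SA by both ends briefly adds two more SAs (the two in flight next to the existing one, as described in the quoted text).

```python
def steady_state_sas(local_cpus: int, peer_cpus: int) -> int:
    """Per-CPU Child SAs for the larger CPU count, plus one fallback SA."""
    return max(local_cpus, peer_cpus) + 1


def peak_sas(local_cpus: int, peer_cpus: int, rekeys_in_flight: int) -> int:
    """Transient upper bound while some per-CPU SAs are being rekeyed.

    Assumes the worst case for every rekey: both ends initiate at once,
    so two new SAs coexist with the old one (two extra SAs per rekey).
    Rekeying of the fallback SA itself is not modelled.
    """
    return steady_state_sas(local_cpus, peer_cpus) + 2 * rekeys_in_flight
```

For peers with 2 and 4 CPUs this gives 5 SAs in steady state and at most 7 while one per-CPU SA is being rekeyed simultaneously by both ends; only the steady-state figure is the number in question here.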

> We tried to put the exact maximum in place in earlier drafts, accounting
> for everything, but testing showed that this was not
> workable for us.

I'm still curious why it was not workable...

> >> Clearly, if for part of the connection, you are
> >> using the fallback SA, you are running at suboptimal speed which is not
> >> a situation you should remain in.
> >
> > Why? If the peer is unwilling (or unable) to install more SAs, then
> > you will be in this suboptimal situation forever.
> 
> That is a different situation? Failure is always an option :)
> 
> I'm not sure I understand your point?
> 
> Using the fallback SA is, well, a fallback that ideally won't trigger. But
> in case something happens (e.g. a CPU gets added to the running machine),
> it is preferable to send packets out slowly rather than drop them
> completely. (On a high-speed link of 100 Gbps, you cannot cache all the
> packets to cover 1 RTT for IKE to add the additional Child SA.)
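A rough worked example of the buffering problem: the data that would have to be held while IKE completes one round trip is the link rate multiplied by the RTT. The 50 ms RTT used below is an assumed value for illustration only, not a figure from this thread.

```python
def bytes_per_rtt(link_gbps: float, rtt_ms: float) -> float:
    """Bytes arriving at full line rate during one round trip."""
    bits = link_gbps * 1e9 * (rtt_ms / 1000.0)
    return bits / 8


# 100 Gbps link, assumed 50 ms RTT for the CREATE_CHILD_SA exchange
needed = bytes_per_rtt(100, 50)
print(f"{needed / 1e6:.0f} MB")  # 625 MB
```

Even at a 1 ms RTT the same formula gives 12.5 MB per flow direction, which illustrates why queueing packets until the new Child SA is up is not attractive at these speeds.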

You assume that the peers are able to handle a number of SAs
equal to the larger of their two CPU counts.
I am describing a situation where the peer advertising fewer CPUs cannot
handle that many SAs (e.g. it uses an HSM and can only install a limited
number of SAs). In this situation this peer rejects creating additional
SAs with TS_MAX_QUEUE, so the peers end up with a smaller number of SAs.
This configuration is sub-optimal for the peer that has more CPUs,
but in my opinion it is a normal situation, not an error situation.
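The HSM scenario can be modelled in a few lines. In this hypothetical Python sketch, the initiator (the peer with more CPUs) keeps requesting per-CPU Child SAs until the responder, whose HSM can hold only `hsm_limit` of them (an assumed parameter), answers with TS_MAX_QUEUE. The outcome is a stable, smaller set of SAs rather than an error.

```python
from typing import Tuple


def responder_accepts(installed: int, hsm_limit: int) -> bool:
    # Hypothetical responder policy: accept while the HSM has free slots,
    # otherwise answer the CREATE_CHILD_SA request with TS_MAX_QUEUE.
    return installed < hsm_limit


def negotiate(initiator_cpus: int, responder_cpus: int, hsm_limit: int) -> Tuple[int, int]:
    """Return (per-CPU SAs installed, initiator CPUs without their own SA)."""
    wanted = max(initiator_cpus, responder_cpus)
    installed = 0
    while installed < wanted:
        if not responder_accepts(installed, hsm_limit):
            break  # TS_MAX_QUEUE received: a normal, stable end state
        installed += 1
    uncovered = max(0, initiator_cpus - installed)
    return installed, uncovered
```

With an 8-CPU initiator, a 2-CPU responder and an HSM limit of 4, `negotiate(8, 2, 4)` ends with 4 per-CPU SAs and 4 initiator CPUs that have no dedicated SA; with a limit of 16 and a 4-CPU initiator, every CPU gets one.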

> > I don't think
> > it's an erroneous situation you want to escape from ASAP; I think it's
> > a normal situation in general.
> 
> We disagree. Maybe we should call the fallback SA the emergency SA ? :)

Will this improve the situation I described above? :-)

> >> Indeed. The idea is that no matter what, you can encrypt the packets and
> >> send them, even if at sub-optimal speeds. We don't want packets to have
> >> to wait another RTT for the per-CPU SA to establish. That would cause a
> >> lot of issues (slow TCP retransmits, UDP application retransmits, etc).
> >> Once the first SA is up, you have a working IPsec tunnel and no more
> >> packets should be dropped or wait for SAs to establish.
> >
> > I understand all these considerations. My proposal is to use
> > another existing per-CPU SA in this case. So no special Fallback SA
> > is needed (I understand that re-steering packets to a different CPU requires locking,
> > but my understanding is that using the Fallback SA requires locking as well).
> 
> To me that seems more complicated and issue prone, but I'll let Steffen
> speak on this as an implementer.

OK.
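The two designs under discussion can be contrasted with a small sketch of outbound SA selection. In this hypothetical Python model (not kernel code and not from the draft), `sas` maps CPU ids to the per-CPU SA installed for that CPU. The draft's approach sends on the dedicated fallback SA when the current CPU has none; the alternative proposed above re-steers the packet to another CPU's existing per-CPU SA. The locking that both variants would need in a real implementation is omitted.

```python
from typing import Dict, Optional


def select_sa_fallback(cpu: int, sas: Dict[int, str], fallback: str) -> str:
    # Draft's approach: the CPU's own per-CPU SA, else the dedicated fallback SA.
    return sas.get(cpu, fallback)


def select_sa_steer(cpu: int, sas: Dict[int, str]) -> Optional[str]:
    # Alternative approach: the CPU's own per-CPU SA, else re-steer to another
    # CPU's existing per-CPU SA (here simply the one with the lowest CPU id).
    if cpu in sas:
        return sas[cpu]
    if sas:
        return sas[min(sas)]
    # No per-CPU SA at all: the caller would trigger an ACQUIRE, as for any SA.
    return None
```

The difference shows up only when a CPU lacks its own SA: the first function always has an answer because the fallback SA is never removed, while the second depends on at least one per-CPU SA still existing.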

> >>>   The Fallback Child SA MUST NOT be deleted when idle, as
> >>>   it is likely to be idle if enough per-CPU Child SAs are installed.
> >>>
> >>> I think that these BCP14 requirements make the fallback SA very special.
> >>
> >> Yes it does. It ensures there is a fully working (albeit slow) IPsec
> >> connection.
> >
> > And the "specialness" of this SA worries me. I think that the same
> > functionality can be achieved without introducing this special SA.
> 
> How would you guarantee that at least 1 per-CPU SA is always available to
> be steered to by other CPUs? What if that one is rekeying; how are you
> going to sync that to all the other CPU configurations?

If the only remaining per-CPU SA is deleted (e.g. due to lack of traffic),
it will be re-created when an outgoing packet appears, as with any ordinary SA.
The syncing is costly, but it happens infrequently.
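The lifecycle described in this reply, where an idle per-CPU SA may be deleted and the next outgoing packet simply causes it to be re-created, can be sketched as follows. This is a hypothetical Python model; `IDLE_TIMEOUT` and the `acquire` callback are assumptions standing in for whatever idle-expiry and ACQUIRE mechanisms a real IKE daemon and kernel use.

```python
from typing import Callable, Dict, Optional

IDLE_TIMEOUT = 300.0  # seconds; hypothetical idle-expiry value


class PerCpuSAs:
    """Toy model of per-CPU SAs with no dedicated fallback SA."""

    def __init__(self, acquire: Callable[[int], None]) -> None:
        self.last_used: Dict[int, float] = {}  # CPU id -> last time its SA sent traffic
        self.acquire = acquire                 # hypothetical hook asking IKE for a Child SA

    def install(self, cpu: int, now: float) -> None:
        self.last_used[cpu] = now

    def expire_idle(self, now: float) -> None:
        # Any idle per-CPU SA may be deleted, including the last remaining one.
        idle = [c for c, t in self.last_used.items() if now - t > IDLE_TIMEOUT]
        for cpu in idle:
            del self.last_used[cpu]

    def send(self, cpu: int, now: float) -> Optional[int]:
        """Return the CPU whose SA carries the packet, or None if an ACQUIRE was raised."""
        if cpu in self.last_used:
            target = cpu
        elif self.last_used:
            target = min(self.last_used)  # re-steer to another CPU's SA
        else:
            self.acquire(cpu)             # nothing left: re-create on demand
            return None
        self.last_used[target] = now
        return target
```

The cost of this design sits in the `None` branch: the packet that triggers the ACQUIRE must be queued or dropped for one IKE round trip, which is exactly the case the fallback SA is meant to avoid; the argument here is that it happens rarely enough to be acceptable.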

Regards,
Valery.

> Paul