Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-performance

Valery Smyslov <smyslov.ietf@gmail.com> Fri, 17 November 2023 14:42 UTC

From: Valery Smyslov <smyslov.ietf@gmail.com>
To: 'Paul Wouters' <paul@nohats.ca>
Cc: 'Tero Kivinen' <kivinen@iki.fi>, ipsec@ietf.org
References: <25912.18413.748999.323996@fireball.acr.fi> <04b101da16d7$f9421c10$ebc65430$@gmail.com> <70a73330-73dc-ddc9-1fad-9975f3bf12c7@nohats.ca>
In-Reply-To: <70a73330-73dc-ddc9-1fad-9975f3bf12c7@nohats.ca>
Date: Fri, 17 Nov 2023 17:42:40 +0300
Message-ID: <069c01da1964$515c04e0$f4140ea0$@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Thread-Index: AQHKU10JL1+d2T+4V+QjI0HzVVzOTwLGUlP6Ap/aLyqwcnaTcA==
Content-Language: ru
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipsec/kjg9YjCDgUXPs2AnqJVAdZ7IN_I>
Subject: Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-performance
Precedence: list

Hi Paul,

I snipped parts where we are in agreement.

> > 2. Section 2
> >
> >   There are a number of practical reasons why most Implementations have
> >   to limit a Child SA to only one specific hardware resource, but a key
> >   limitation is that sharing the crypto state, counters and sequence
> >   numbers between multiple CPUs is not feasible without a significant
> >   performance penalty.
> >
> > Shouldn't it be clarified, that the performance problems arise
> > if you use an SA by several CPUs at the same time? I don't think
> > there are problems if you use the SA by several CPUs at different time.
> > Consider you have an SA with a traffic one packet per hour.
> > Each time it is processed up by a different CPU, then the resulted
> > state is stored in the shared memory. So, perhaps
> >
> > s/one specific hardware resource/one specific hardware resource at any given time
> 
> I changed it to:
> 
>         but a key limitation is that sharing the crypto state, counters
>         and sequence numbers between multiple CPUs that are trying to
>         use these shared states at the same time is not feasible without
>         a significant performance penalty.

Fine with me.

> > 3. Section 3
> >
> >   If multiple Child SAs with the same Traffic Selectors are
> >   desired, the initiator will add the SA_RESOURCE_INFO notify payload
> >   to the Exchange negotiating the Child SA (eg IKE_AUTH or
> >   CREATE_CHILD_SA).  If this initial Child SA will be tied to a
> >   specific resource, it MAY indicate this by including an identifier in
> >   the Notification Data.  A responder that is willing to have multple
> >   Child SAs for the same Traffic Selectors will respond by also adding
> >   the SA_RESOURCE_INFO notify payload in which it MAY add a non-zero
> >   notify data payload.
> >
> > This text is a bit inconsistent with IKEv2 specification.
> > In my reading the text implies that unless you exchange SA_RESOURCE_INFO,
> > you cannot initiate multiple SAs with same selector, which is wrong -
> > you can do it at any time if you follow RFC 7296.
> > Only if you want to follow this draft (i.e. - associate each Child SA with
> > some resource, and be able to limit their number with TS_MAX_QUEUE)
> > you have to negotiate. I think that this subtle thing should be expressed more accurately.
> 
> Changed to:
> 
>  	If multiple Child SAs with the same Traffic Selectors that
>          are bound to a single resource are desired, [...]

OK.

> > 4. Section 3
> >
> >   These resource-
> >   specific Child SAs MUST be negotiated with identical Child SA
> >   properties that were negotiated for the initial Child SA.  This
> >   includes cryptographic algorithms, Traffic Selectors, Mode (e.g.
> >   transport mode), compression usage, etc.  However, each Child SA does
> >   have its own keying material that is individually derived according
> >   to the regular IKEv2 process.
> >
> > I think that MUST is over-restrictive if we talk about crypto algorithms.
> > For crypto algorithms herhaps something along the lines:
> >
> > SHOULD be negotiated with the same crypto algorithms;
> > if they differ, then they MUST provide identical level of protection.
> 
> I don't agree. We are basically making "clones", so I see no reason why
> to make this more complicated by allowing things to be more different in
> various ways. The negotiation for the first child and second child
> should follow from the same configuration so it should really also end
> up returning the same crypto algorithms. Unless you would make choics
> based on trigger packet on specific resource, but that's something I
> am happy to not create another knob for. If others really feel similar
> to Valery, please speak out. We would also than need to make a clear
> list of all the knobs that MUST be the same and all the ones that SHOULD
> be the same. I'd rather not make that list :P

I see your point. I don't insist, but also don't want to eliminate this option.
For example (as it was pointed out on the list) when different resources
are optimized for different algorithms.

> > (I agree that Mode and Traffic Selectors MUST be the same, not sure about compression).
> 
> 
> > 5. Section 3
> >
> >   During the CREATE_CHILD_SA rekey for the Child SA, the
> >   SA_RESOURCE_INFO notification MAY be included, but regardless of
> >   whether or not it is included, the rekeyed Child SA MUST be bound to
> >   the same resource(s) as the Child SA that is being rekeyed.
> >
> > Isn't binding a local matter? Doesn't peer bother to what resource you bound
> > your end of an SA? Then why is there this MUST? What happens if I bound new SA
> > to a different CPU - peer never know this, so how it will check that you follow this MUST?
> > I think that instead of this requirement there should be just a recommendation for implementers
> > (with no BCP14 language).
> i
> changed MUST to should.

I can live with this, although I would prefer to put it a bit differently:

   During the CREATE_CHILD_SA rekey for the Child SA, the
   SA_RESOURCE_INFO notification MAY be included, but regardless of
   whether or not it is included, the old Child SA should be
   bound to a resource(or resources), and it is usually the same resource(s) 
   the rekeyed SA was bound to.

> > 6. Section 4
> >
> >   A simple distribution could be to install one additional Child SA on
> >   each CPU.  An implementation MAY ensure that one Child SA can be used
> >   by all CPUs ...
> >
> > I believe it should be "can" instead of "MAY", since it is a local matter.
> 
> I do think local matters can still have BCP14 language. eg MUST use
> strong random, or MUST use a secure time source. I would rather change
> this MAY to a SHOULD than to a "can" :P

It was a long discussion about this SA :-)

Going back to the language: "MUST" is the easiest keyword to use,
and "MAY" is the most subtle. Your example is not a good one.

The "MUST" for local actions is often really important for protocol 
(e.g. for its security, like using strong random). "MAY" here doesn't 
influence protocol at all (since it is optional, the protocol behavior
doesn't depend on it), so using "MAY" is inappropriate in my opinion. 
"Can" or "may" are OK.

> > 7. Section 4
> >
> >   When the number of queue or CPU resources are different between the
> >   peers, the peer with the least amount of resources MAY decide to not
> >   install a second outbound Child SA for the same resource as it will
> >   never use it to send traffic.
> >
> > Again, I think it should be "can" instead of "MAY", since it is a local matter.
> 
> Here I don't really care, so changed to "may" (instead of "can" as it
> feels better to me, but someone with better english please speak out
> which is better)

OK.

> > 8. Section 4
> >
> >   If per-CPU SADB_ACQUIRE messages are implemented (see Section 6), the
> >   Traffic Selector (TSi) entry containing the information of the
> >   trigger packet SHOULD be included in the TS set.  This information
> >   MAY be used by the peer to select the most optimal target CPU to
> >   install the additional Child SA on.
> >
> > Then, this SHOULD is already specified in RFC 7296, Section 2.9:
> >
> >   To enable the responder to choose the appropriate range in this case,
> >   if the initiator has requested the SA due to a data packet, the
> >   initiator SHOULD include as the first Traffic Selector in each of TSi
> >   and TSr a very specific Traffic Selector including the addresses in
> >   the packet triggering the request.
> >
> > Why repeat this requirement there? Perhaps just reference Section 2.9 of RFC 7296?
> 
> Because the example is specific to CPU-bound resources here and tweaking
> the decision based on that. I added the reference to 2.9 though.

Thank you.

> > 9. Section 5.1, Section 5.2
> >
> >   *  Protocol ID (1 octet) - MUST be 0.  MUST be ignored if not 0.
> >   *  SPI Size (1 octet) - MUST be 0.  MUST be ignored if not 0.
> >
> > What "MUST be ignored if not 0"? The value of the field or the notify message itself?
> 
> I think it is clear that it is on the bullet line about Protocol ID, and
> not that we are not talking about higher level things.

OK.

> > If the non-zero value must be ignored, then why specify that it MUST be 0?
> 
> Normally we do that so that future implementations can be written that
> won't be dropped by current implementations seeing a new valid value
> that older implementations do not support. I looked at RFC7296 Section
> 3.10 and it uses similar language , eg:
> 
>     [protocol id] ......
>        If the SPI field is empty, this field MUST be
>        sent as zero and MUST be ignored on receipt.

This is a better wording and I mostly raised an issue to nudge you to use it or something like that :-)
It clarifies that the field is set to 0 when sending and must be ignored in all cases on receipt.

>     o  SPI Size (1 octet) - Length in octets of the SPI as defined by the
>        IPsec protocol ID or zero if no SPI is applicable.  For a
>        notification concerning the IKE SA, the SPI Size MUST be zero and
>        the field must be empty.
> 
> So I guess 7296 says we cannot ignore a bad SPI Size, but can ignore a
> bad protocol ID. But it is unclear on what to do when a notify payload
> contains weird things. RFC 7296 Section 2.21 does not make it more clear
> on what we should do.

I believe it is covered by INVALID_SYNTAX. Section 3.10.1:

   INVALID_SYNTAX                            7
       Indicates the IKE message that was received was invalid because
       some type, length, or value was out of range or because the
       request was rejected for policy reasons.

> But thinking about this now, I think this is all wrong. The
> SA_RESOURCE_INFO notify is about a Child SA, so it should really
> have a valid Protocol ID (eg 3 for ESP) and should have a proper
> spi size and length specified that matches the Child SA. So I changed
> it to:
> 
>             Protocol ID (1 octet) - this field MUST contain either (2)
>                 to indicate AH or (3) to indicate ESP.</t>
> 
>             SPI Size (1 octet) - Length in octets of the SPI as defined
>                 by the IPsec protocol ID.</t>

I'm not sure if this change is justified. From RFC 7296, Section 3.10: 

   o  Protocol ID (1 octet) - If this notification concerns an existing
      SA whose SPI is given in the SPI field, this field indicates the
      type of that SA.  For notifications concerning Child SAs, this
      field MUST contain either (2) to indicate AH or (3) to indicate
      ESP.  Of the notifications defined in this document, the SPI is
      included only with INVALID_SELECTORS, REKEY_SA, and
      CHILD_SA_NOT_FOUND.  If the SPI field is empty, this field MUST be
      sent as zero and MUST be ignored on receipt.

In case of SA_RESOURCE_INFO, the child SA does not yet exist (it is being created),
so formally we cannot reference its SPI. Then an additional check would be
needed - whether Protocol ID and SPI in the SA payload and in SA_RESOURCE_INFO
are the same.

> > And how the receiver should interpret the notify message in this case?
> > I think there is an inconsistency. I suggest to remove the second MUSTs,
> > thus the receiver will reject the notify if it is has non-zero protocol and SPI,
> > as it should do with any other such notifications.
> 
> I cannot find this guidance in 7296, where did you find it? But as I
> said, it does not apply here anyway.

I treat the following text as a guidance here (Section 3.10.1):

   INVALID_SYNTAX                            7
       Indicates the IKE message that was received was invalid because
       some type, length, or value was out of range or because the
       request was rejected for policy reasons.

> > 10. Section 5.1
> >
> >   *  Optional Payload Data.  This value MAY be set to convey the local
> >      identity of the resource.  The value SHOULD be a unique identifier
> >      and the peer SHOULD only use it for debugging purposes.
> >
> > This is an optional field used for debugging purposes. Why any BCP14 language here?
> 
> optional use can still have BCP14, eg an optional nonce MUST use secure
> random :P I changed the first MAY to lowercase, but leaving the SHUOLD
> in, as the whole point is to be able to distinguish/recognise the
> resource seperately from each other.

See my comment above on the use of BCP14 for local actions.

And what about the last SHOULD? :-)

> > And if this is a value, associated with a resource, then how it may be unique
> > in a situation when peer associates several SAs with a single resource
> > (e.g. if it has fewer resources then its peer, but doesn't mind creating as many
> > SAs as its peer wants)?
> 
> You can pick a random value or a counter or whatever you want? I'm not
> sure we need text to explain that?

No, my point was that if this value identifies (somehow, possibly indirectly)
the resource the SA is bound to, then it cannot be unique if multiple SAs
are bound to it. Or you need some kind of map (multimap).

> > 12. Section 6
> >
> >   Implementations supporting per-CPU SAs SHOULD extend their local SPD
> >   selector, and the mechanism of on-demand negotiation that is
> >   triggered by traffic to include a CPU (or queue) identifier in their
> >   SADB_ACQUIRE message from the SPD to the IKE daemon.
> >
> > I think that using BCP14 language is not justified here (it is not protocol behavior).
> > It should be "should" instead.
> 
> I think it is similar to "MUST use secure random source". It is an
> assumption or requirement of proper functioning of the specified
> protocol.

I disagree. "MUST use secure random source" - must do it each time
the protocol runs. "SHOULD extend their local SPD" - requirement
for implementers of this protocol extension, not an actions for executing 
during protocol run.

> > 13. Section 6
> >
> >   And
> >   bringing the deleted per-CPU Child SA up again immediately after
> >   receiving the Delete Notify might cause an infinite loop between the
> >   peers.  Another issue of not bringing up all its per-CPU Child SAs is
> >   that if the peer acts similarly, the two peers might end up with only
> >   the first Child SA without ever activating any per-CPU Child SAs.
> >
> > I think more should be said about this situation and how to avoid it.
> > Perhaps advising such implementations to not delete per-CPU Child SAs
> > if they have not been rekeyed.
> 
> I don't like that advise. I would still like to see that idle child SA's
> get removed so we don't collect too many useless Child SAs.

Then advise needed of how to deal with loops.

> > Then the second case will never happen.
> > As for the first case, perhaps advise all implementations to not
> > delete SAs immediately once they are created...
> 
> I don't think that is a problem we have. Even if we use a short idle
> timer of a few seconds, at least it is not a wire speed loop request.

So, if SAs are deleted and re-created every few seconds - is it OK?

> > 14. Section 6
> >
> >   Implementations might support dynamically moving a per-CPU Child SAs
> >   from one CPU to another CPU.  If this method is supported,
> >   implementations must be careful to move both the inbound and outbound
> >   SAs.
> >
> > If this is done "on the fly", then the data in SA_RESOURCE_INFO (if it is provided)
> > would become invalid, killing its usefulness for debugging.
> 
> Not always. Eg if the postfix mail server process migrates to another
> CPU, it might make sense to migrate the Child SA there as well. And
> the debugging would still be valid if mail traffic triggered that SA.

Then I failed to understand the purpose of the data field in SA_RESOURCE_INFO.
If it is not related to a CPU, then what it is related to? Or do you mean that there
is some local mapping between this value and COU number, that is changed
every time SA is migrated to a different CPU?

> >  If it is done
> > as a result of SA rekey, then it contradicts to the last sentence in Section 3
> > (well, I already suggested to remove it, see comment 5).
> 
> I think these are two different atomic processes. rekeying and sa migration.

Why they cannot be done at the same time?

Put it differently - if it is ever allowed to migrate SA from one CPU to another, 
why it is prohibited to do this at the time the SA is being rekeyed (bind
new SA to a different CPU)?

> > 15. Section 6
> >
> >   If this method is supported,
> >   implementations must be careful to move both the inbound and outbound
> >   SAs.
> >
> > Is it justified? Why inbound and outbound SAs cannot be bound to a different resources?
> 
> I don't think that makes sense from a performance point of view?
> I'm happy to hear a few more opinions on this.

>From my understanding incoming and outgoing SAs are completely independent.
I can see situations when there is a value to bound them to the same CPU
(your example with potfix works), but I don't think this is a universal 
requirement for all situations. For example, if both peers are SGW,
separated from traffic generation applications, then what is the value
of this requirement?

> > 16. Section 6
> >
> >   If the IPsec endpoint is a gateway, it can move the inbound SA
> >   and outbound SA independently from each other.
> >
> > This contradict to the previous sentence.
> 
> It does say "must be careful". The gateway case is different from the
> endpoint case where the packet is created on the host that is also the
> ipsec endpoint.

Sure. That's why I'd like to see more flexible recommendations.

> Perhaps Antony has some changes that would make everyone happier?

Please.

> > 17. Section 6
> >
> >   If the IPsec endpoint
> >   is the same host responsible for generating the traffic, the inbound
> >   and outbound SAs SHOULD remain as a pair on the same CPU.
> >
> > Using BCP14 language is not justified here (it is a local matter and
> > not a protocol behavior, so this is just a recommendation
> > for implementers). And some justification for this recommendation
> > would be helpful.
> 
> Again, it would cause performance issues which is at the core of this
> protocol, so I think it is justified?

That's why I raised the point :-) BCP14 language usually is used when
interoperability or security is the issue, here we have an optimization
for (sometimes) better performance. In my opinion, "should" suffice.

> > 18. Section 6
> >
> >   If a host
> >   previously skipped installing an outbound SA because it would be an
> >   unused duplicate outbound SA, it will have to create and add the
> >   previously skipped outbound SA to the SAD with the new CPU ID.
> >
> > This sentence unclear for me. First, I fail to understand why this is needed.
> > Then, I fail to understand how to do this, in particular -
> > how to get SPIs and keying material of the skipped (i.e. deleted) SA.
> > I believe this piece of text should be elaborated.
> 
> The idea is that if you move Child SA #1 away from Child SA #2, and
> #2 did not install an outbound, the resource might be left without
> any outbound which would be wrong/bad.

OK, I see your point. Still wonder whether it is always needed
(it seems that you may recover from this situation or live with it,
having some performance penalty) and how to do it if the outbound
SA was deleted (no information has been left from it). Do you
advise always save outbound SAs when they are deleted as not needed
(i.e. make them instead dormant)?

> > 19. Section 6
> >
> >   The
> >   inbound SA may not have CPU ID in the SAD.
> >
> > I fail to understand what this sentence is related to?
> 
> It is a bit unclear to me too. Perhaps Antony can clarify.

Please.

> > 20. Section 6
> >
> >   To support this, the IKE software might have to hold
> >   on to the key material longer than it normally would, as it might
> >   actively attempt to destroy key material from memory that the IKE
> >   daemon no longer needs access to.
> >
> > In my opinion this requirement is problematic. It highly depends on
> > concrete architecture. In some cases IKE daemon has never access to the Child SA
> > keying material - it only provides values for its generating,
> > which is done in a HSM. And in this case IKE daemon has no control
> > over this material (and it never needs to have an access to it or control over it).
> 
> Whatever method used to keep a pointer to the key is what would be
> needed. I dont think it matters wether that is a crypto module fd
> or just a memory pointer.

Both methods require keeping the information from deleted SAs.

> > So, I think that this should be elaborated, in particular justifying
> > why do you need to re-create outbound SA which you have already deemed unneeded.
> 
> To fill in a hole left on a resource if your moving child sa happens to
> be the last outbound SA on that resource, which then ends up with no
> outbound because the last remaining child sa did not install its
> outbound.

I see. 

> > I wonder whether TS_MAX_QUEUE is a permanent or temporary error.
> 
> I think it can depend. If too many customers with too many CPUs, perhaps
> it is a fairly permanent error. But if customers are removed or CPUs
> added, perhaps it can go away again too. If used as a resource
> constraint, eg say "each customer up to 1024", then it might be more
> or less permanent for that customer.

So, what do you recommend for the peer received this notify?
Should it try to create more SAs later or not?

And one more note - please clarify that this error notification
is non-fatal: there is no need to tear down IKE SA if you received it.

> > 22. Section 7
> >
> >   Similar to how an implementation should limit the number of half-open
> >   SAs to limit the impact of a denial of service attack, an
> >   implementation SHOULD limit the maximum number of additional Child
> >   SAs allowed per unique TSi/TSr.
> >
> > In section 6 you said:
> >
> >   Peers SHOULD be lenient with the maximum number of Child SAs they
> >   allow for a given TSi/TSr combination to account for corner cases.
> >
> > and later
> >
> >  As additional Child SAs consume  little additional overhead, ...
> >
> > So, some wordsmithing to make these pieces of text in concert would be helpful.
> 
> Happy to hear suggestions. I think the overal idea is "dont try to get
> too close to the real number of CPUs, allow at least several times the
> number of CPUs of common Big Appliances. Eg mostly use it as a safety
> cap.

Perhaps:

   While implementations are advised to be lenient with the maximum number of Child SAs they
   allow for a given TSi/TSr combination (see Section 6), they SHOULD limit the maximum number 
   of these Child SAs similar to how they should limit the number of half-open SAs 
   to limit the impact of a denial of service attack

> > 23. Section 7
> >
> >   This trust relationship is usually not present for the Remote Access
> >   VPN type deployments, and allowing per-CPU Child SA's is NOT
> >   RECOMMENDED in these scenarios.  Therefor, it is also NOT RECOMMENDED
> >   to allow per-CPU Child SAs per default.
> >
> > In general, I don't think that this statement is universally true. If it is a corporate
> > Remote Access scenario, then clients are administered by the same people
> > as gateways. Thus, I don't think that advising not use this extension
> > in Remote Access scenario due to the lack of trust to clients is always justified.
> 
> Isn't that covered by "usually not present". I am thinking of $5/month
> customers, not million dollar customers. You have a suggestion for the
> text to make it work for you as well?

I would just remove the "NOT RECOMMENDED" for Remote Access.

> > 24. Section 7
> >
> >   The SA_RESOURCE_INFO notify contains an optional data payload that
> >   can be used by the peer to identify the Child SA belonging to a
> >   specific resource.  The notify data SHOULD NOT be an identier that
> >   can be used to gain information about the hardware.  For example,
> >   using the CPU number itself as identier might give an attacker
> >   knowledge which packets are handled by which CPU ID and it might
> >   optimize a brute force attack against the system.
> >
> > Isn't it protected by IKE SA? I believe this is only possible with MitM attack
> > (only initiator is affected) or when CRQC is used with RFC 8784 scenario for the initial SA.
> > In the first case the SA won't be created. The second case is currently unrealistic.
> > Am I missing something? If not, then I suggest to just describe this
> > threat giving no advices (i.e. remove SHOULD NOT), since it seems to be
> > theoretical threat. Just let the implementers know that this is possible.
> 
> Again, I was thinking of $5/month people that you should not really trust
> to share your hardware details with.

OK I see your point. You just don't trust the peer with whom you 
just established a secure connection to share your hardware details :-)

> The -03 should appear shortly. Once we resolve the last few issues we
> can do another update.

Sorry for being tedious :-)

Regards,
Valery.
 
> Paul

[IPsec] WGLC of draft-ietf-ipsecme-multi-sa-perfo… Tero Kivinen
Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-p… Christian Hopps
Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-p… Yoav Nir
Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-p… Michael Richardson
Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-p… Daniel Xu
Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-p… Sahana Prasad
Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-p… Daniel Migault
Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-p… Yoav Nir
Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-p… Valery Smyslov
Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-p… Michael Richardson
Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-p… Yoav Nir
Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-p… Paul Wouters
Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-p… Paul Wouters
Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-p… Valery Smyslov