Re: [IPsec] draft-liu-ipsecme-ikev2-mtu-dect early TSVAREA review

Hi,

Thanks for the feedback. Please find my comments and questions inline.

Yours,
Daniel

On Fri, Jan 13, 2023 at 8:41 PM touch@strayalpha.com <touch@strayalpha.com>
wrote:

> Hi, Daniel,
>
> On Jan 13, 2023, at 2:12 PM, Daniel Migault <mglt.ietf@gmail.com> wrote:
>
> Hi Joe,
>
> Thanks for the comment. There are some terminologies we were not using
> properly, so thank you for the clarification. Please find inline our
> clarification and implementation of your concerns.
>
> Yours,
> Daniel
>
> On Sun, Jan 8, 2023 at 11:45 AM touch@strayalpha.com <touch@strayalpha.com>
> wrote:
>
>> Hi, Daniel,
>>
>> The abstract clearly states a goal that is not achievable (of avoiding
>> reassembly). The best way to avoid the impact of mid-tunnel fragmentation
>> is to use IPv4 as a tunnel header the way that IPv6 would be - with DF=1.
>> However, even so, the egress always needs to handle reassembly as long as
>> there is even source fragmentation.
>>
>
> I understand the comment to mean that our goal is read as avoiding
> reassembly entirely, which would imply that reassembly need not even be
> implemented.
> This is not our intention. Reassembly happens the same way it happens
> today. The only thing we add is that the egress node notifies the ingress
> node that reassembly is happening. The ingress node may or may not take
> action to prevent reassembly of subsequent packets tunneled
> over the IPsec tunnel. In that sense "avoid" needs to be understood as
> reducing how often reassembly occurs.
>
> We may agree the best way to avoid mid tunnel fragmentation is to set
> DF=1. But in our case we cannot meet this condition.
> The current text in the abstract is
>
> OLD:
> This document considers an ingress and an egress security gateway
> connected over an IPv4 network.
> The Tunnel Link Packet have their Don't Fragment (DF) set to 0.
>
> Is the text below clearer?
>
> NEW:
> This document considers an ingress and an egress security gateway
> connected over an IPv4 network with the Tunnel Link Packet Don't Fragment
> (DF) set to 0.
>
>
> That is better English, but still technically ill-advised. Any solution
> that requires IPv4 DF=0 then requires generation of unique IDs that don’t
> wrap in ways that could cause mis-reassembly per RFC 6864.
>
> The introduction mentions the rationale for why we cannot rely on setting
> DF=1. Typically, some routers do not check the MTU and silently discard the
> packet without returning an ICMP PTB error, and in many deployments the
> ICMP PTB, if sent, is blocked and never received. This prohibits the use of
> DF=1 with IPv4.
>
>
> You have described the reason why PLPMTUD exists, which is not a rationale
> for continuing to use on-path IPv4 fragmentation.
>

>
>> I appreciate what you WANT to do - but again, it is not possible. You
>> have two behaviors - either use inner fragmentation (which won’t work for
>> transit traffic where IPv4 DF=1 or any IPv6) or reduce the tunnel MTU.
>>
>> But the tunnel MTU is defined by EMTU_R of the tunnel egress, not EMTU_S
>> of the tunnel ingress. If you reduce the tunnel MTU, you’re just going to
>> end up black-holing packets arriving at the tunnel ingress.
>>
>>  OK. I misunderstood the tunnel MTU: the tunnel MTU is EMTU_R, and this is
> not what we are changing. What we had written might be confusing.
> When I said EMTU_R, I was considering the router only, without any
> consideration of the tunnel. From the terminology section of
> intarea-tunnel I did not read that EMTU_R applies to a tunnel environment,
> and I took it to be the MTU associated with the router's interface for
> incoming packets.
>
> Here is what we actually meant:
>
>
> We are ensuring that packets encapsulated by the ingress interface
> do not exceed the tunnel MAP. My understanding is that the tunnel MAP is
> the largest IP packet the source can send that will not be fragmented by
> the network between the ingress and egress interfaces. As it is not
> fragmented, there are no fragments to reassemble.
>
>
> Please review intarea-tunnels.
>
> Setting Ingress send size to MAP doesn’t avoid source fragmentation, which
> thus doesn’t avoid reassembly. It just sets the size of each fragment to
> avoid on-path fragmentation - which avoids the need for DF=0. So setting
> DF=0 is exactly what you don’t need.
>
> To do so, we set the MTU of the router associated with the ingress
> interface to the tunnel MAP. This corresponds to setting tunMTU = tunMAP
> in Figure 11 of intarea.
>
> Suppose an IP packet is sent by the source and meets that router.
> * The packet has DF=1. If it is larger than that MTU (= tunMAP), the
> packet is discarded and an ICMP PTB message is sent back to the source. The
> source will proceed to source fragmentation.
>
>
> When the IP packet gets to the router, the link should have an MTU of the
> tunnel EMTU_R, not MAP.
>
I agree that EMTU_R is the largest possible value for the MTU.
However, setting it to a smaller value may also be possible. If not, to
better understand, do we have an example of a packet that cannot be
processed if the MTU is set to tunMAP but that can be processed if the MTU
is set to EMTU_R?
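
To make the question concrete, here is a minimal sketch of how I read the
two settings. The sizes (tunMAP = 1400, EMTU_R = 9000, 60 bytes of
encapsulation overhead) and the function names are made up for
illustration, and the fragment count ignores per-fragment headers and
8-byte alignment. If I read the description above correctly, the two
settings differ for a DF=1 packet whose size falls between tunMAP and
EMTU_R:

```python
from dataclasses import dataclass


@dataclass
class Packet:
    size: int  # inner IP packet size in bytes
    df: bool   # True when the Don't Fragment bit is set


def admit(packet: Packet, link_mtu: int) -> str:
    """Decision of the router hosting the tunnel ingress (illustrative)."""
    if packet.size <= link_mtu:
        return "forward"
    if packet.df:
        return "drop+PTB"        # DF=1 and too big: discard, try to send PTB
    return "inner-fragment"      # DF=0 and too big: fragment before the tunnel


def outer_fragments(inner_size: int, overhead: int, tun_map: int) -> int:
    """Tunnel packets produced by source fragmentation at the ingress.

    Simplification: per-fragment header overhead and alignment are ignored.
    """
    outer = inner_size + overhead
    return -(-outer // tun_map)  # ceiling division


# Hypothetical values, chosen only to show the difference.
TUN_MAP = 1400
EMTU_R = 9000
OVERHEAD = 60
pkt = Packet(size=3000, df=True)

print(admit(pkt, TUN_MAP))                           # drop+PTB
print(admit(pkt, EMTU_R))                            # forward
print(outer_fragments(pkt.size, OVERHEAD, TUN_MAP))  # 3
```

With MTU = tunMAP the packet never enters the tunnel and everything depends
on the PTB reaching the source; with MTU = EMTU_R it enters the tunnel and
the ingress source-fragments the tunnel packet into chunks no larger than
MAP.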

> If the packet arrives with DF=1, then if it’s smaller than the tunnel
> EMTU_R, it will pass. If not, the router has no choice but to drop the
> packet (and try to send an ICMP PTB if that’s possible).
>
> If the packet passes, it is wrapped with the tunnel header. THAT packet
> header still needs to be source fragmented if the result is larger than MAP
> into chunks no larger than MAP.
>
>
The rest of this text is incorrect:
>
> * The packet has DF=0. If it is larger than that MTU, the router fragments
> it into fragments smaller than tunMAP, thus performing inner fragmentation.
>
> * Any packet smaller than the MTU = tunMAP is sent to the ingress
> interface and encapsulated.
>
>
> I agree that we MUST ensure that ICMP PTB messages are received by the
> source and lead to source fragmentation; otherwise, this will result in
> black-holing traffic sized between the tunnel MAP and the original MTU of
> the router.
>
>
> I’m not following whether the ‘router’ here is the ingress or egress or
> along the path.
>
> This is the ingress router indicating its MTU to the source sending the
inner IP packet.

> The router where the ingress occurs (if that’s a router, e.g., and not a
> host) would send PTB upstream if the forwarded packet is larger than the
> tunnel EMTU_R.
>
> agree

> Routers along the tunnel path would send PTB to the ingress of the tunnel
> if the tunnel packets (or source fragments, if fragmented) were larger than
> their next-hop MTU.
>
> agree with the ingress of the tunnel packet being the ingress interface.

> If the packet received is larger than the tunnel egress EMTU_R, then it
> won’t be reassembled. There is no such thing as a “reassembly too big”
> error at the receiver, so no ICMP would be issued. At best, you might get a
> reassembly timeout ICMP - but note that would NOT tell you how big the
> tunnel EMTU_R is.
>

Just to make sure I understand correctly: if the inner (encrypted) packet
is larger than EMTU_R, no ICMP PTB is sent for that inner packet; that is,
no message is sent to the source. The reason is that the inner packet is
encrypted.
On the other hand, if DF=1, an ICMP PTB is sent to the ingress interface.

>
>
> Note that by setting DF=1 you are supposed to be able to handle this kind
> of situation. So I do not see this as a major issue.
>
> Two important points: the tunnel ingress is NOT the one that should ever
>> send PTB back; that’s the job of the router where/if that tunnel ingress
>> resides; second, you cannot claim to get around an ICMP black hole
>> situation by creating a new ICMP black hole situation.
>>
>> With the mechanism we clarified, the ICMP PTB is sent by the router where
> the ingress interface is.
>
>
> See above; whether that happens has nothing to do with tunnel source
> fragmentation or the tunnel MAP. As far as the router is concerned, whether
> a tunnel can transit a packet depends ONLY on the tunnel EMTU_R.
>
> agree

> Regarding the blackholing situation, in the first case, it results from
> the transit network, which is out of the control of the administrator of
> the source. On the other hand, the administrator of the source is able to
> ensure that ICMP packets sent by the ingress router will be received by the
> source. This only happens when DF=1, which supposes that MTU can be handled
> as well as source fragmentation.
>
>
>
>> The rest of the document is rife with further errors, e.g.:
>>
>> The last sentence of the abstract is incomprehensible.
>>
>>  The last sentence of the abstract is:
>
> """
> The ingress security gateway is expected to either perform (when possible)
> Inner Fragmentation of to update Tunnel MTU.
> """
> Is the wording below clearer?
> """
> The ingress security gateway is expected to either use inner fragmentation
>  or consider the tunnel MAP as its  tunnel MTU.
>
>
> That is incorrect as explained above.
>
> I won’t bother with the remainder of this post, as continuing to fix text
> errors won’t help until a viable mechanism is proposed.
>
> Joe
>
> """
>>
>> In the intro, the claims about IPv4 ID are incorrect. As noted in
>> RFC6864, speed is not the issue but rather the expected reordering.
>> Additionally, there’s already a timeout, so there is no requirement for
>> indefinitely kept state. Further, given source fragmentation, such issues
>> are irrelevant.
>>
>>
> We mention high rates because an ID collision requires a number of packets.
> For example, a link carrying 3 packets is very unlikely to result in an ID
> collision. draft-ietf-intarea-tunnels, in Section 4.1.4, mentions high-speed
> nodes as well, so it is unclear what is wrong in our formulation below - but
> I remain open to a clearer one.
>
> """
> Then, as detailed in {{?RFC4963}}, {{!RFC6864}} or {{!RFC8900}}, the
> 16-bit IPv4 identification field is not large enough to prevent duplication
> making fragmentation not sufficiently robust at high data rates.
> """
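
To illustrate why we tie this to data rates, here is a rough
back-of-the-envelope sketch. It assumes a saturated link, fixed 1500-byte
packets, and a single ID space shared by all packets (one source address,
destination address and protocol); the function name is mine and the
figures are only indicative:

```python
ID_SPACE = 2 ** 16  # distinct values of the 16-bit IPv4 Identification field


def seconds_to_wrap(link_bps: float, packet_bytes: int) -> float:
    """Time for a saturated link to cycle through the whole ID space."""
    packets_per_second = link_bps / (packet_bytes * 8)
    return ID_SPACE / packets_per_second


# Illustrative link rates: 10 Mbit/s, 1 Gbit/s, 10 Gbit/s.
for rate in (10e6, 1e9, 10e9):
    wrap = seconds_to_wrap(rate, 1500)
    print(f"{rate / 1e6:>8.0f} Mbit/s: ID space wraps in {wrap:.3f} s")
```

Under these assumptions the ID space wraps in under a second at 1 Gbit/s,
which is shorter than typical reassembly timeouts, while at 10 Mbit/s it
takes more than a minute.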
>
> DF=1 leads to black holing ONLY in the absence of PLPMTUD, which is the
>> appropriate solution for IPsec tunnels anyway.
>>
> Again, we are not considering the case where DF=1. Even if we were
> considering DF=1, I am inclined to think that IPsec does not have PLPMTUD,
> which seems to indicate that DF=1 is subject to blackholing.
> As far as I know there is no PLPMTUD defined for IPsec. The only document
> I found was the following one:
>
> https://www.ietf.org/archive/id/draft-spiriyath-ipsecme-dynamic-ipsec-pmtu-01.txt
>
> That said, with IPsec, IKEv2 and ESP may take different paths, so having
> a PLPMTUD is likely to impact ESP and IKEv2. In addition, we are concerned
> not with the MTU but with the tunMAP.
> In our case, we simply define a notification for IKEv2, which seems much
> simpler.
>
> On the other hand, tunnel MTU does not prevent reassembly from occurring.
> It is correct that in addition to tunMAP we may also provide tunMTU to
> prevent blackholing from happening at the egress gateway. The ingress node
> will then be able to distinguish if the packet is discarded or fragmented.
>
> Note that even if DF=0, black-holing could still occur if the Tunnel
>> Transit packet exceeds the tunnel egress EMTU_R.
>>
>> The notification may look like:
>  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |           Link Path Maximum Atomic Packet Value               |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |                        EMTU_R                                 |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
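
As a minimal sketch of how the two fields in the diagram above could be
serialized: the two 32-bit widths are read off the diagram, network byte
order is an assumption, and the function names are hypothetical. No Notify
Message Type is shown since none has been assigned; this covers only the
notification data:

```python
import struct

# Two unsigned 32-bit integers, network byte order (assumed).
_FMT = "!II"


def pack_notification_data(link_path_map: int, emtu_r: int) -> bytes:
    """Serialize the Link Path MAP value followed by EMTU_R."""
    return struct.pack(_FMT, link_path_map, emtu_r)


def unpack_notification_data(data: bytes) -> tuple[int, int]:
    """Parse the 8-byte notification data back into (MAP, EMTU_R)."""
    if len(data) != struct.calcsize(_FMT):
        raise ValueError("expected exactly 8 bytes of notification data")
    return struct.unpack(_FMT, data)


# Hypothetical values for illustration only.
data = pack_notification_data(1400, 9000)
print(data.hex())                      # 0000057800002328
print(unpack_notification_data(data))  # (1400, 9000)
```

Carrying both values would let the ingress distinguish packets that will
be fragmented (larger than MAP) from packets that will be discarded
(larger than EMTU_R).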
>
>> Joe
>>
>> —
>> Dr. Joe Touch, temporal epistemologist
>> www.strayalpha.com
>>
>> On Jan 4, 2023, at 7:21 PM, Daniel Migault <mglt.ietf@gmail.com> wrote:
>>
>> Hi Joe,
>>
>> We are waiting for your feedback, but I just want to check that we have the
>> same understanding and that we will receive feedback.
>>
>> We would like to understand if the terminology used in the document is
>> aligned with the one of intarea-tunnels and if we agree that
>> intarea-tunnels and IPsec have different perspectives on
>> handling fragmentation. I do not see what we are proposing very different
>> from what IPsec has been doing for years.
>>
>> I also think that everything is explained in the introduction (2 text
>> pages), so you do not have to read the full document. The document is
>> available here:
>> https://datatracker.ietf.org/doc/draft-liu-ipsecme-ikev2-mtu-dect/
>>
>> Yours,
>> Daniel
>>
>> On Sat, Nov 26, 2022 at 9:25 AM Daniel Migault <mglt.ietf@gmail.com>
>> wrote:
>>
>>> Hi Joe,
>>>
>>> So we just published an update of our draft. We tried to capture the
>>> complete idea in the introduction, to avoid the need to read the complete
>>> draft. I think we partly aligned with the tunnel document. The current
>>> version only describes the security gateway as a node and does not split
>>> it between a router and an interface. I think for the remainder of the
>>> document we are taking the exact terminology from the tunnel draft.
>>>
>>> We believe that IKEv2 and the tunnel document have different visions and
>>> tried to highlight this also.
>>>
>>> One big clarification in my point of view is that the previous version
>>> confused MTU with MAP.
>>>
>>> We are happy to get your feedback.
>>>
>>> Yours,
>>> Daniel
>>>
>>> On Mon, Oct 31, 2022 at 5:32 PM touch@strayalpha.com <
>>> touch@strayalpha.com> wrote:
>>>
>>>> On Oct 31, 2022, at 11:07 AM, Daniel Migault <mglt.ietf@gmail.com>
>>>> wrote:
>>>>
>>>>
>>>> - the tunnel has two DIFFERENT relevant MTUs
>>>>> the egress reassembly MTU (EMTU_R), which is the only thing that
>>>>> should drive the “tunnel MTU”
>>>>>
>>>>> the tunnel MTU, which the ingress needs to know for source
>>>>> fragmentation, but is NOT relevant to the
>>>>> origin MTU upstream of the ingress
>>>>>
>>>>> Will read the draft - but we believe that it is better to generate one
>>>> IPsec packet for every inner IP packet as opposed to two. This is why we
>>>> are proposing to adjust the MTU so the outer packet matches the limit of
>>>> the EMTU_R - and fragmentation be avoided.
>>>>
>>>>
>>>> That doc explains why this effort isn’t useful. As I noted to Tero,
>>>> there’s no ICMP message that says “bigger than I’d like”. PTB means
>>>> “packets larger than this will be dropped”. That’s not what’s going on
>>>> here, so it’s the wrong message to support.
>>>>
>>>> There is no message that supports what you’re trying to do - perhaps
>>>> because there can’t and shouldn’t be.
>>>>
>>>> Joe
>>>>
>>>
>>>
>>> --
>>> Daniel Migault
>>> Ericsson
>>>
>>
>>
>> --
>> Daniel Migault
>> Ericsson
>>
>>
>>
>
> --
> Daniel Migault
> Ericsson
>
>
>

-- 
Daniel Migault
Ericsson