Re: [Int-dir] int-dir review of draft-ietf-ipsecme-ikev2-fragmentation

Valery Smyslov <svan@elvis.ru> Thu, 08 May 2014 12:59 UTC

Message-ID: <1D14A92EE52B4AD7B21ACAA590F8279F@buildpc>
From: Valery Smyslov <svan@elvis.ru>
To: Ole Troan <otroan@employees.org>, ipsec-chairs@tools.ietf.org, kathleen.moriarty.ietf@gmail.com
References: <5A97FAD4-3E5B-436F-84C8-D0DC2FDA5E20@employees.org>
Date: Thu, 08 May 2014 16:59:06 +0400
MIME-Version: 1.0
Content-Type: text/plain; format="flowed"; charset="iso-8859-1"; reply-type="original"
Content-Transfer-Encoding: 7bit
Archived-At: http://mailarchive.ietf.org/arch/msg/int-dir/Ed2n4tsYf_8GDtCKezoYGZYa_dk
Cc: Brian Haberman <brian@innovationslab.net>, int-dir@ietf.org
Subject: Re: [Int-dir] int-dir review of draft-ietf-ipsecme-ikev2-fragmentation
Precedence: list

Hi Ole,

Thank you for your review. Please find my comments inline.

> chairs, authors,
>
> Brian asked me as part of the int-directorate to review this document.
>
>
> Joe Touch had some good comments on this from the transport perspective.
> (http://www.ietf.org/mail-archive/web/ipsec/current/msg08653.html)
>
> Architecturally this opens up a larger debate. Any UDP application
> protocol that depends on IP fragmentation and doesn't do segmentation
> itself must be changed... if the premise is correct that IP
> fragmentation cannot work correctly.
>
> there are a few approaches open to us:
> - consider this a bug in the network, and require that to be fixed
> - start work on a replacement of UDP that supports segmentation, SCTP?
> - fix the applications.
>
> I think this document solves a real problem. as a general recommendation
> I think the document should not re-specify Path MTU discovery. it should
> use RFC1981 and make application specific considerations referencing
> RFC4821 (section 9, section 10.4)
>
> I would recommend against recursive fragmentation.

Sorry, I don't understand what you mean by "recursive fragmentation".
If you are referring to fragmenting fragments, then there is no such thing.

> I would also question the need to do path MTU discovery for a protocol
> that only does a few message exchanges? at least the document should
> clearly point out that there are three options for implementations of
> this:
> - implementations that do not do PMTUD and use the minimum message
>   sizes (576/1280)
> - implementations that depend on PMTUD (1191/1981) with a fallback to
>   minimum
> - implementations that use PMTUD/PLMTUD (1191/1981/4821) with
>   application probes

I completely agree with you here. In fact, the document tries to make
exactly the same point:

   In most cases PMTU discovery will not be
   needed, as using values, recommended in section Section 2.5.1, should
   suffice, so there is no requirement to support PMTU discovery in IKE.
   However it is RECOMMENDED to be supported, especially in environments
   where PMTU size are smaller, than those listed in Section 2.5.1, for
   example due to the presence of intermediate tunnels.

(Note that in the previous version of the draft PMTUD was marked as
"MAY", but it was pointed out during IESG evaluation that "MAY" doesn't
mean anything, so I changed it to "RECOMMENDED" but tried to indicate
clearly that it is not required to implement.)

As for the possible ways to perform PMTU discovery:

Classical PMTUD (1191/1981) has some issues with connectionless
protocols. Both RFCs mention that with UDP it is often hard to get ICMP
information from the kernel to the application (in Sections 5.2 and 6.2
respectively). Note also that ICMP may be blocked or simply dropped by
broken network devices (just as they often drop IP fragments), so it
cannot be relied upon. Nevertheless, the draft states that if PMTU
information is available, it SHOULD be used (in Section 2.5.1).
I think I can make this more explicit.

And regarding PLMTUD (4821): the draft doesn't (re)invent its own way
to perform PMTU discovery; it actually uses the 4821 approach, modified
so that probing is done downward instead of upward. The reason for this
modification is that probing upward with IKE is equivalent to not doing
PLMTUD at all. The peculiarity of IKE is that during the lifetime of an
IKE SA each peer sends very few large messages (if no EAP
authentication is involved, there will be one large message from each
peer), and all of them are sent at the beginning of the IKE SA
lifetime, during SA establishment. All other messages sent over the IKE
SA are typically less than 500 bytes (with a few possible exceptions).
So, if one uses the RFC4821 algorithm, probing starts from the smallest
allowed MTU size. I assume here that probing is done using application
probes, as recommended in Section 10.4, and that IKE Fragmentation is
used to construct probes of various sizes (i.e., you just fragment the
outgoing message into IKE Fragments of the desired size, send all of
them, and wait for a response). As you start probing from the smallest
fragments, the very first probe succeeds and so does the IKE exchange
(it is the IKE_AUTH exchange, the first and most often the only
exchange that sends large messages). Once it succeeds, the IKE SA is
established and during its lifetime there will be no more large
messages (again with a few possible exceptions), so there is no point
in doing PMTUD any more, as small messages won't be fragmented anyway.

As a result, performing unmodified PLMTUD from 4821 in the case of IKE
is roughly equivalent to not performing it at all and always using the
smallest allowed MTU size.
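To make the argument above concrete, here is a minimal Python model, assuming that the large IKE messages themselves are the only probes, that the smallest candidate size always fits the path, and that the candidate sizes are hypothetical. Each large message gets one chance to try a bigger size, so with a single large message per peer (IKE_AUTH without EAP) the only size ever used is the smallest one.

```python
from typing import List

def upward_probe(num_large_messages: int, candidates: List[int],
                 path_mtu: int) -> List[int]:
    """Model RFC 4821-style upward probing where large IKE messages are the
    probes. Returns, for each large message, the size it was delivered with."""
    sizes = sorted(candidates)
    if sizes[0] > path_mtu:
        raise ValueError("the smallest candidate is assumed to always fit")
    level = 0            # index of the size to try for the next large message
    delivered = []
    for _ in range(num_large_messages):
        if sizes[level] <= path_mtu:
            delivered.append(sizes[level])
            if level + 1 < len(sizes):
                level += 1   # success: the next large message may go bigger
        else:
            level -= 1       # probe lost: resend at the last size that worked
            delivered.append(sizes[level])
    return delivered

# A typical IKE SA without EAP: one large message per peer (IKE_AUTH).
print(upward_probe(1, [576, 1280, 1500], path_mtu=1500))  # [576]
```

Only with several large messages (for example, with EAP) would the model ever reach the larger sizes.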

To make PLMTUD work for IKE, the draft suggests that probing be done
downward, as it is in classic PMTUD (1191/1981). So, we first fragment
the outgoing message using the largest allowed MTU size (the link MTU)
and wait for a response. If no response is received after a few
retransmissions, we assume that the fragments are too large and
refragment the original message with a smaller fragment size, and so on
until we get a response. If the application is capable of receiving
ICMP information from the kernel, this approach becomes classic PMTUD;
if it is not (or if ICMP doesn't reach the sender), it takes longer,
but it still works.
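The downward search described above can be sketched roughly as follows. This is an illustration only: `send_and_wait` is a hypothetical callback standing for "send this fragment set with the usual IKE retransmissions and report whether a response arrived", and the thresholds are taken as bytes of the original message per fragment, ignoring header overhead. Each attempt refragments the original message from scratch (fragments are never split again), and a threshold is skipped if it would not strictly increase Total Fragments, as the draft requires.

```python
from typing import Callable, List, Optional

def fragment(message: bytes, threshold: int) -> List[bytes]:
    """Split the ORIGINAL message into chunks of at most `threshold` bytes."""
    return [message[i:i + threshold] for i in range(0, len(message), threshold)]

def probe_downward(message: bytes, thresholds: List[int],
                   send_and_wait: Callable[[List[bytes]], bool]) -> Optional[int]:
    """Try the largest threshold first and step down until a response arrives.
    Returns the threshold that worked, or None if every size failed."""
    last_total = 0
    for threshold in sorted(thresholds, reverse=True):
        frags = fragment(message, threshold)
        if len(frags) <= last_total:
            continue  # Total Fragments must strictly increase between sets
        last_total = len(frags)
        if send_and_wait(frags):
            return threshold
    return None

# Hypothetical path that silently drops any fragment larger than 1000 bytes.
tried = []
def fake_send(frags: List[bytes]) -> bool:
    tried.append(len(frags))
    return all(len(f) <= 1000 for f in frags)

print(probe_downward(bytes(3000), [1400, 1200, 900, 500], fake_send))  # 900
print(tried)  # [3, 4]
```

Note that 1200 is never tried here: it would produce the same three fragments as 1400, so the receiver could not tell the two sets apart.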

So, I think implementers should be given two options:
 - do not do PMTUD and use the minimum message sizes
 - combine PLMTUD with downward probing and PMTUD (1191/1981)

Doing PLMTUD with upward probing (as in 4821) is almost equivalent to
the first option.

> comments (marked @@)
> ----------------------------------
>
>
> 1. Introduction:
> @@ it would be good if the document made it clear that for IPv4 these
> NAT devices are broken (ref RFCXXX), and that for IPv6 throwing away
> fragments is an operational choice, and as far as I know there is no
> evidence that this happens on the open Internet. (it certainly happens
> on network borders, e.g. in front of services, where the operator has
> control and is confident no services using fragmentation exist).
>
>   The problem is that some network devices,
>   specifically some NAT boxes, don't allow IP fragments to pass
>   through.  This apparently blocks IKE communication and, therefore,
>   prevents peers from establishing IPsec SA.  This problem is valid for
>   both IPv4 and IPv6 [FRAGDROP].

How about the following text:

   The problem is that some network devices,
   specifically some NAT boxes, don't allow IP fragments to pass
   through.  This apparently blocks IKE communication and, therefore,
   prevents peers from establishing IPsec SA.  This problem is valid for
   both IPv4 and IPv6 and may be caused either by deficiencies of the
   devices [RFCXXX] or by operational choice [FRAGDROP].

And could you please recommend which RFC could be referenced as RFCXXX?
I failed to find an appropriate reference.

> @@ the document would benefit from a few more definite articles. (and
> language review in general). nothing that reduces understandability
> though.
>
>   e.g. "Initiator may first try to send
>   unfragmented message and resend it fragmented only if it didn't
>   receive response after several retransmissions, or it may always send
>   messages fragmented (but see Section 3), or it may fragment only
>   large messages and messages causing large responses."

I'm not completely sure what you mean by "definite articles",
but how about the following:

    The Initiator may use various policies regarding fragmentation.
    It may first try to send an unfragmented message and fragment it
    only if it didn't receive a response after several retransmissions.
    Another option is to always fragment outgoing messages (but see
    Section 3). Or it may fragment only large messages and messages
    causing large responses.
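The three Initiator policies in the proposed text can be sketched as a single decision function. All names and parameters below (the policy enum, the threshold, the retransmission limit, and the estimated response size) are hypothetical illustrations of local policy, not anything the draft defines.

```python
from enum import Enum

class FragPolicy(Enum):
    ON_RETRANSMIT = "fragment only after unanswered retransmissions"
    ALWAYS = "always fragment outgoing messages"
    LARGE_ONLY = "fragment large messages and those causing large responses"

def should_fragment(policy: FragPolicy, request_len: int,
                    expected_response_len: int, threshold: int,
                    unanswered_retransmissions: int,
                    retransmit_limit: int) -> bool:
    """Decide whether the Initiator sends (or resends) a request fragmented.
    `expected_response_len` is only the Initiator's own estimate."""
    if policy is FragPolicy.ALWAYS:
        return True
    if policy is FragPolicy.LARGE_ONLY:
        return request_len > threshold or expected_response_len > threshold
    # ON_RETRANSMIT: start unfragmented, switch after enough silence.
    return unanswered_retransmissions >= retransmit_limit
```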

> 2.4 Using IKE fragmentation
>
> @@ this seems a little at odds with a section full of 2119 language.
>    "In general the following guidelines are applicable for Initiator:"
>    "In general the following guidelines are applicable for Responder:"

Mmm, how can guidelines be written without using RFC2119 words?

> @@ vague. replace with recommendation to base the decision on the link
> and path MTU?

OK, this is reasonable.

>   o  Initiator MAY fragment outgoing message if it has some knowledge
>      (possibly from lower layer or from configuration) that either
>      request or response message will be fragmented by IP level or if
>      unfragmented message was sent and no response was received after
>      several retransmissions.
>
>   o  Responder MAY respond to unfragmented message with fragmented
>      response if it has some knowledge (possibly from lower layer or
>      from configuration) that response message will be fragmented by IP
>      level.
>
> @@ this section does not define "fragmentation threshold"

Good catch, thank you. I'll fix it.

> 2.5.  Fragmenting Message
>
> @@ could "fragment number" and "total fragments" be single octets?

I agree that in most real-life use cases 8 bits will suffice.
However, the Length field in the IKE header is 32 bits,
so in theory IKE messages may be up to 4 GB in size.
Currently the size of an IKE message is limited to 64 KB
by the UDP transport, but this may change in the future,
and I didn't want to introduce a potential bottleneck here.
Note also that making these fields 16 bits long
preserves the 32-bit alignment of the payload data, which
may be beneficial for implementations.
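The alignment point can be checked with Python's `struct` module. This sketch assumes the header layout under discussion: the 4-byte generic payload header (Next Payload, a byte holding the Critical bit and reserved bits, Payload Length) followed by Fragment Number and Total Fragments. The 8-bit variant is hypothetical and shown only for comparison.

```python
import struct

# ">" = network byte order with no implicit padding.
EFP_HEADER_16 = struct.Struct(">BBHHH")  # 16-bit Fragment Number / Total Fragments
EFP_HEADER_8 = struct.Struct(">BBHBB")   # hypothetical 8-bit alternative

def pack_efp_header(next_payload: int, payload_length: int,
                    fragment_number: int, total_fragments: int) -> bytes:
    """Pack the assumed Encrypted Fragment Payload header (C bit cleared)."""
    return EFP_HEADER_16.pack(next_payload, 0, payload_length,
                              fragment_number, total_fragments)

print(EFP_HEADER_16.size, EFP_HEADER_8.size)  # 8 6
```

With 16-bit fields the header is 8 bytes, so the encrypted data that follows stays on a 4-byte boundary; with 8-bit fields it would be 6 bytes and also cap a message at 255 fragments instead of 65535.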

> 2.5.1
>
> @@ I'm not sure text like... adds value.
>
>   "Sender MAY use other values if they are
>   appropriate."

The idea is not to limit implementers to the recommended values
listed above if an implementation is intended to work in an environment
where these values are not applicable (for example, some networks
with extremely low MTU, below 576 bytes).
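As a rough worked example of what the recommended sizes leave for actual message content, here is a sketch assuming that the limit applies to the whole IP datagram (576 for IPv4, 1280 for IPv6), no IPv4 options or IPv6 extension headers, a 4-byte non-ESP marker only when NAT traversal is in use, and an 8-byte Encrypted Fragment Payload header. The cipher parameters (16-byte IV and block, 12-byte ICV as with HMAC-SHA1-96) are just an example.

```python
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8
NON_ESP_MARKER = 4   # present only when IKE runs on UDP port 4500 (NAT-T)
IKE_HEADER = 28
EFP_HEADER = 8       # generic payload header + Fragment Number + Total Fragments

def max_fragment_content(datagram_limit: int, ip_header: int, nat_t: bool,
                         iv_len: int, icv_len: int, block_size: int) -> int:
    """Largest chunk of the original message that fits into one IKE Fragment
    message whose whole IP datagram must not exceed `datagram_limit`."""
    overhead = (ip_header + UDP_HEADER + (NON_ESP_MARKER if nat_t else 0)
                + IKE_HEADER + EFP_HEADER + iv_len + icv_len)
    room = datagram_limit - overhead
    if room <= 0:
        return 0
    ciphertext = room - room % block_size  # encrypted part is block-aligned
    return max(ciphertext - 1, 0)          # one byte is taken by Pad Length

print(max_fragment_content(576, IPV4_HEADER, True, 16, 12, 16))    # 479
print(max_fragment_content(1280, IPV6_HEADER, False, 16, 12, 16))  # 1167
```

A network with a smaller MTU would simply plug a smaller `datagram_limit` into the same arithmetic.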

> 2.5.2
>
> @@ I hope "refragment" doesn't mean that the individual fragments are
> fragmented again? but rather that the original message is fragmented
> into a set of smaller fragments?
>
>   "While doing probes, node MUST start from
>   larger values and refragment message with next smaller value if it
>   doesn't receive response in a reasonable time after several
>   retransmissions. "

The latter: the original message is fragmented into a set of smaller
fragments. I can make this more explicit:

   While doing probes, node MUST start from
   larger values and refragment original message with next smaller value
   if it doesn't receive response in a reasonable time after several
   retransmissions.

> @@ please give better advice. "reasonable time" is hard to implement.
>   "doesn't receive response in a reasonable time after"

This value is related to the IKE retransmission timer, and RFC5996
deliberately refrains from giving any specific value for it (Section 2.4):

   The number of retries and length of timeouts are not covered in this
   specification because they do not affect interoperability.  It is
   suggested that messages be retransmitted at least a dozen times over
   a period of at least several minutes before giving up on an SA, but
   different environments may require different rules.  To be a good
   network citizen, retransmission times MUST increase exponentially to
   avoid flooding the network and making an existing congestion
   situation worse.

I do not think that this document should give any specific value either,
but I probably need to explain this better, as well as its relation to
the RFC5996 timers.
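One way to read "a reasonable time after several retransmissions" in terms of the RFC5996 timers is sketched below. The initial timeout (0.5 s), the factor, and the number of retransmissions per fragment size are hypothetical local policy values; the only protocol requirement shown is that retransmission times increase exponentially.

```python
from typing import List

def retransmission_schedule(initial_timeout: float, retries: int,
                            factor: float = 2.0) -> List[float]:
    """Wait (in seconds) before each retransmission, growing exponentially
    as RFC5996 Section 2.4 requires. The values are local policy."""
    if factor <= 1.0:
        raise ValueError("retransmission times must increase")
    return [initial_timeout * factor ** i for i in range(retries)]

def step_down_after(schedule: List[float], retries_per_size: int) -> float:
    """Total waiting time spent on one fragment size before a downward-probing
    sender refragments the original message with the next smaller value."""
    return sum(schedule[:retries_per_size])

sched = retransmission_schedule(0.5, 4)
print(sched)                      # [0.5, 1.0, 2.0, 4.0]
print(step_down_after(sched, 3))  # 3.5
```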

> @@ should also reference RFC1981

OK.

> @@ oh, so it is recursive fragmentation. that makes me somewhat 
> uncomfortable.
>
>   "In case of PMTU discovery Total Fragments field is used to
>   distinguish between different sets of fragments, i.e. the sets that
>   were obtained by fragmenting original message using different
>   fragmentation thresholds.  As sender will start from larger fragments
>   and then make them smaller, the value in Total Fragments field will
>   increase with each new try.  When selecting next smaller value of
>   fragmentation threshold, sender MUST ensure that the value in Total
>   Fragments field is really increased.  This requirement should not
>   become a problem for the sender, as PMTU discovery in IKE is supposed
>   to be coarse-grained, so difference between previous and next
>   fragmentation thresholds will be significant anyway.  The necessity
>   to distinguish between the sets is vital for receiver as receiving
>   any valid fragment from newer set will mean that it have to start
>   reassembling over and not to mix fragments from different sets."

Sorry, I don't see "recursive fragmentation" here. This is an
explanation of PLMTUD probing with a downward search. Fragmented
messages are never fragmented again. Recursive (nested) fragmentation
is not used; moreover, it is impossible with this specification.
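On the receiver side, the quoted text means that a newer set simply replaces an older one; nothing is ever nested. A minimal sketch for a single exchange, assuming fragments have already passed their integrity check and that Fragment Number starts at 1; the class and method names are hypothetical.

```python
from typing import Dict, Optional

class Reassembler:
    """Reassemble one IKE message. A valid fragment from a set with a larger
    Total Fragments value discards the partial set and restarts reassembly;
    fragments from an older (smaller) set are ignored, so sets never mix."""

    def __init__(self) -> None:
        self.total = 0
        self.parts: Dict[int, bytes] = {}

    def add(self, number: int, total: int, data: bytes) -> Optional[bytes]:
        if total < self.total or not 1 <= number <= total:
            return None                    # stale set or bad numbering: drop
        if total > self.total:
            self.total = total             # newer set: start over
            self.parts = {}
        self.parts[number] = data          # duplicates just overwrite
        if len(self.parts) == self.total:
            return b"".join(self.parts[i] for i in range(1, self.total + 1))
        return None
```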

> section 2.6
>
> @@ section 2.5 states that the EFP MUST be the last payload message,
> but section 2.6 specifies that it may not be the first. inconsistency?
>
>   The Encrypted Fragment Payload, similarly to the Encrypted Payload,
>   if present in a message, MUST be the last payload in the message.
>
>   Note, that it is possible for this payload to be not the first (and
>   the only) payload in the message (see Section 2.5.3).

No inconsistency here. The EFP MUST be the last payload in the message;
that's true. Currently, in all cases it is also the first and only
payload in the message. But IKE allows constructing messages in which
other payloads precede the Encrypted Payload (and, therefore, the
Encrypted Fragment Payload) if the message is intended to be only
partially encrypted. In this case the EFP won't be the first and only
payload, but it will still be the last. Currently no such mixed
messages are defined in IKE and its extensions, but they are not
prohibited and may be defined in future IKE extensions.
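The rule can be stated as a simple check on the payload chain. This is only a sketch: payloads are represented by hypothetical string labels, and "Notify" stands for any unencrypted payload that a future extension might place in front of the EFP.

```python
from typing import Sequence

EFP = "EncryptedFragment"  # hypothetical label for the Encrypted Fragment Payload

def efp_position_ok(payloads: Sequence[str]) -> bool:
    """True if an Encrypted Fragment Payload, when present, is the last
    payload. Unencrypted payloads may precede it; nothing may follow it."""
    if EFP not in payloads:
        return True
    return payloads.index(EFP) == len(payloads) - 1
```

Because `index` finds the first occurrence, a chain with two EFPs is also rejected.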

> @@ please recommend a default timeout interval.
>   If receiver doesn't get all IKE Fragment Messages needed to
>   reassemble original Message for some Exchange within a timeout
>   interval, it acts according with Section 2.1 of [RFC5996], i.e.
>   retransmits the fragmented request Message (in case of Initiator) or
>   deems Exchange to have failed.

The timeout interval here is the same as the initiator timeout from
Section 2.4 of RFC5996. And again, I don't think this document should
specify it, as it is not specified in RFC5996. However, what this
timeout is should be explained better.

I think I can address most of your comments in a new version of the
document in a few days. However, tomorrow is a state holiday here,
so it probably won't be before Tuesday.

Thank you,
Valery Smyslov.
