Re: [IPv6] FW: I-D Action: draft-templin-6man-ipid-ext-00.txt

Tom Herbert <tom@herbertland.com> Tue, 12 December 2023 17:30 UTC

From: Tom Herbert <tom@herbertland.com>
Date: Tue, 12 Dec 2023 09:30:39 -0800
Message-ID: <CALx6S37oj0=A_==oJSHLVsc_k70RkTTJv4Cg--HnDOWKngWMjw@mail.gmail.com>
To: "Templin (US), Fred L" <Fred.L.Templin@boeing.com>
Cc: Christian Huitema <huitema@huitema.net>, IPv6 List <ipv6@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/4pMbIWVl7oYSKZRF-uVyichFP64>

Hi Fred,

Here are some comments on the latest draft.

Both the abstract and the introduction lead by discussing IPv4
fragmentation. They should only be talking about IPv6 here.

The definitions of "source" and "destination" are confusing and not
typical. Typically, a source host is the source of an IP packet and
is identified by the source address; a destination host is the
destination of an IP packet and is addressed by the destination
address. In the presence of a Routing Header, the "final destination"
is the last address in the route list.
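
The conventional meaning of "final destination" is easy to state
precisely. A minimal sketch (the list representation of the route is
hypothetical, not something from the draft):

```python
def final_destination(dest_addr, route_list):
    """Conventional IPv6 terminology: without a Routing Header, the
    Destination Address field names the final destination; with one,
    the final destination is the last address in the route list."""
    return route_list[-1] if route_list else dest_addr
```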

"router" is the common term for "intermediate systems". I suggest just
using "router" instead.

"Upper layer protocols often achieve greater performance by
configuring segment sizes that exceed the path Maximum Transmission
Unit (MTU)." I am not at all convinced this is true, especially for
TCP, which has been optimized, in both the protocol and its
implementations, to make segment size equal the path MTU. In any
case, I think this discussion is unnecessary in the draft-- the
motivation for increasing the size of the fragment identifier is that
the identifier is too small for high speed networks. Quantifying "too
small" would be good here: 16 bits in IPv4 is obviously a problem,
but under what conditions is the 32-bit IPv6 Identification too
small?
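
To sketch what "too small" might mean (my back-of-envelope
arithmetic, not from the draft or any measurement): the
Identification field must not wrap while earlier fragments carrying
the same ID could still be sitting in a reassembly queue.

```python
def wrap_seconds(id_bits, link_bps, pkt_bytes=1500):
    """Time for a fragment Identification counter to wrap if one ID
    is consumed per packet at line rate (a worst-case assumption)."""
    pkts_per_sec = link_bps / 8 / pkt_bytes
    return (2 ** id_bits) / pkts_per_sec

# 16-bit IPv4 ID at 10 Gbps: wraps in under 0.1 s -- clearly too small.
print(wrap_seconds(16, 10e9))
# 32-bit IPv6 ID at 100 Gbps: ~515 s, still above a 60-120 s
# reassembly timeout; at 800 Gbps it drops to ~64 s, inside typical
# timeouts, so ID reuse between the same host pair becomes plausible.
print(wrap_seconds(32, 100e9))
print(wrap_seconds(32, 800e9))
```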

"Index/P/S             a control octet that identifies the components
of an IP Parcel [I-D.templin-intarea-parcels]"

This creates a dependency on a much larger draft. I suggest just
reserve these bits and define them in IP parcels as an update to this
draft.

"The Extended Fragment Header is included in a Per-Fragment
Destination Options Header following the Hop-by-Hop Options (if
present) but before the Routing Header (if present)"

I don't see how this can work if a Routing Header is present. If the
fragment option is in a Destination Options header before the Routing
Header, that implies each segment endpoint (each intermediate
destination in the route list) would need to do reassembly, since
only the reassembled packet contains the Routing Header. It's
actually worse than that: the first hop would reassemble the packet
and then try to forward it, but the reassembled packet is likely to
exceed the MTU, so it can't be forwarded, and the node can't fragment
it again because it's not the source of the packet. Fragmentation
really needs to happen after the Routing Header, which is the
recommended ordering for the Fragment header in RFC 8200.
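
For reference, RFC 8200 Section 4.1's recommended extension header
order places the Fragment header after the Routing header; expressed
as a quick sanity check:

```python
# RFC 8200 Section 4.1 recommended extension header order.
RFC8200_ORDER = [
    "Hop-by-Hop Options",
    "Destination Options (for intermediate destinations)",
    "Routing",
    "Fragment",
    "Authentication",
    "Encapsulating Security Payload",
    "Destination Options (for the final destination)",
]

# Fragment comes after Routing, so segment endpoints can process the
# route list without reassembling the packet first.
assert RFC8200_ORDER.index("Fragment") > RFC8200_ORDER.index("Routing")
```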

Congestion and packet loss management, fragment retransmission,
capabilities negotiation suggested by probing, and fragment
acknowledgments all fall under the auspices of the transport layer. If
we're introducing these in the network layer then I think there needs
to be more depth in the description and consideration of transport
layer requirements.
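
To make the loss/retransmission interaction concrete (my arithmetic,
with an assumed uniform, independent per-packet loss rate): a segment
carried as n fragments is lost if any one fragment is lost, and then
the whole segment must be resent.

```python
def segment_loss(p, n):
    """Probability a segment of n fragments needs retransmission,
    assuming independent per-fragment loss probability p."""
    return 1 - (1 - p) ** n

p = 0.01  # assumed 1% per-packet loss on the path
for n in (1, 2, 4, 43):  # 43 fragments ~= a 64 KB segment over 1500 B MTU
    print(n, round(segment_loss(p, n), 3))
# At n=43, roughly a third of segments need retransmission, and each
# retransmission resends all 43 fragments, which are themselves lossy.
```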

As an example, consider the interaction with TCP slow start. When a
host starts sending to a destination, is it allowed to immediately
send packets composed of 64 fragments? If it does, the sender is
basically bypassing slow start and isn't being very TCP friendly.
Even if fragmentation provides some performance benefit to the source
host in this case, it may be getting that benefit at the expense of
others. When we look at the performance of a protocol we really need
to consider the effects on the network as a whole, not just at the
endpoints of communication.
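
A quick sketch of the slow-start concern, with hypothetical but
plausible numbers (RFC 6928's initial window of 10 segments, and the
64-fragment packets discussed above):

```python
INIT_CWND_SEGMENTS = 10      # typical TCP initial congestion window
FRAGMENTS_PER_SEGMENT = 64   # hypothetical 64-fragment packets
MTU_BYTES = 1500

first_flight_pkts = INIT_CWND_SEGMENTS * FRAGMENTS_PER_SEGMENT
first_flight_bytes = first_flight_pkts * MTU_BYTES
# 640 MTU-sized packets (~960 KB) hit the network before any
# congestion feedback, versus 10 packets (~15 KB) without
# fragmentation -- a 64x larger initial burst.
print(first_flight_pkts, first_flight_bytes)
```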

Also, it's not clear to me what the application is for these
transport layer aspects. For instance, we know that running two
independent congestion control loops for the same packet wreaks havoc
on the upper-layer protocol, which in this case is TCP (high
variance, unnecessary retransmissions, etc.). I don't believe
transport layer aspects of fragmentation are useful with TCP or
QUIC-- do you have a use case in mind for these?

Tom

On Mon, Dec 11, 2023 at 1:13 PM Templin (US), Fred L
<Fred.L.Templin@boeing.com> wrote:
>
> Tom et al, there have been some significant changes to the draft that bring it more
> in line with both the comments on the list and some of my other writings. I think it
> may be worth another look now if you have time and energy.
>
> Thanks - Fred
>
> > -----Original Message-----
> > From: ipv6 <ipv6-bounces@ietf.org> On Behalf Of Templin (US), Fred L
> > Sent: Friday, December 08, 2023 11:01 AM
> > To: Tom Herbert <tom@herbertland.com>
> > Cc: Christian Huitema <huitema@huitema.net>; IPv6 List <ipv6@ietf.org>
> > Subject: Re: [IPv6] FW: I-D Action: draft-templin-6man-ipid-ext-00.txt
> >
> > Tom, the service backs off during periods of congestive loss and can resume a more
> > aggressive profile when congestion subsides - the service is therefore adaptive. And,
> > the service is verified to improve performance for TCP and generic UDP as shown in
> > the iperf3 graphs in my Intarea charts. In fact, TCP does best of all.
> >
> > Thank you - Fred
> >
> > > -----Original Message-----
> > > From: Tom Herbert <tom@herbertland.com>
> > > Sent: Friday, December 08, 2023 9:12 AM
> > > To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> > > Cc: Christian Huitema <huitema@huitema.net>; IPv6 List <ipv6@ietf.org>
> > > Subject: Re: [IPv6] FW: I-D Action: draft-templin-6man-ipid-ext-00.txt
> > >
> > > On Fri, Dec 8, 2023 at 7:37 AM Templin (US), Fred L
> > > <Fred.L.Templin@boeing.com> wrote:
> > > >
> > > > Christian, I am working with the DTN LTP over UDP transport, and what I have found is
> > > > the performance is increased only by increasing the segment size even if that size exceeds
> > > > the path MTU. I have shown performance increases with segment sizes all the way up to
> > > > 64KB even over 1500B path MTUs, and I believe that still larger segment sizes (over paths
> > > > with sufficient MTUs) would do even better. This was also a well-known characteristic of
> > > > NFS over UDP back in the early days, and I believe we will find other transports today that
> > > > would benefit from larger packets.
> > > >
> > > > I have tried many ways to apply the "conventional wisdom" you have expressed to LTP/UDP
> > > > but have seen no appreciable performance increases using those methods. I tried using
> > > > sendmmsg()/recvmmsg() and they did nothing to improve performance. I then implemented
> > > > GSO/GRO and again the performance increase if any was minimal. I even implemented a
> > > > first pass at IP parcels and sent 64KB parcels with ~1500B segments over an OMNI interface
> > > > and that did give some minor performance increase due to the reduction in header
> > > > overhead but nothing within the realm of simply sending larger packets where the
> > > > performance increases were multiplicative.
> > > >
> > > > I object to categorizing this as a transport issue - this is an Internetworking issue where
> > > > large packet sizes currently are not well supported especially when they exceed the path
> > > > MTU. I believe many transports will benefit from using larger packets, and that a robust
> > > > fragmentation and reassembly service is essential for performance maximization in the
> > > > Internet, and my drafts clearly explain why that is so.
> > >
> > > Fred,
> > >
> > > For transport protocols dealing with segments the interaction with
> > > fragmentation can't be ignored. Consider if there is a 1% packet loss
> > > in a path for a flow. If one segment equals one path MTU (no
> > > fragmentation), then 1% of segments are dropped; if one segment equals
> > > two MTUs with fragmentation, then 2% of the segments are dropped; if
> > > one segment equals four MTUs, then 4% are dropped; if one segment
> > > equals 32 MTUs, then 32% of segments are dropped. Dropped segments need
> > > to be retransmitted and those retransmitted segments are subject to
> > > packet loss also so the goodput for the connection can quickly drop
> > > off a cliff when using fragmentation. As I mentioned this is
> > > exacerbated by the fact that the fragments themselves can be the
> > > source of congestion causing packet loss in the network.
> > >
> > > I think your argument that fragmentation is essential to the Internet
> > > would be stronger if you can show why packet loss isn't a big problem
> > > for transport protocols that use segments as the unit of congestion
> > > control and retransmission. Also, your focus for analysis seems to be
> > > on LTP, but if you want to make a general argument that fragmentation
> > > is essential for the whole Internet I suggest showing how TCP and QUIC
> > > behave when their segments are fragmented with varying amounts of
> > > packet loss in the path.
> > >
> > > Tom
> > >
> > >
> > >
> > >
> > >
> > > >
> > > > Fred
> > > >
> > > > > -----Original Message-----
> > > > > From: Christian Huitema <huitema@huitema.net>
> > > > > Sent: Thursday, December 07, 2023 3:59 PM
> > > > > To: Tom Herbert <tom@herbertland.com>; Templin (US), Fred L <Fred.L.Templin@boeing.com>
> > > > > Cc: IPv6 List <ipv6@ietf.org>
> > > > > Subject: Re: [IPv6] FW: I-D Action: draft-templin-6man-ipid-ext-00.txt
> > > > >
> > > > > On 12/7/2023 11:51 AM, Tom Herbert wrote:
> > > > > > On Thu, Dec 7, 2023 at 7:58 AM Templin (US), Fred L
> > > > > > <Fred.L.Templin=40boeing.com@dmarc.ietf.org>  wrote:
> > > > > >> Tom, to the point on performance:
> > > > > >>
> > > > > >>> Please provide references to these studies. Also, note IP
> > > > > >>> fragmentation is only one possibility, PMTUD and transport layer
> > > > > >>> segmentation is another and that latter seems more prevalent.
> > > > > >> If by transport layer segmentation you mean GSO/GRO, it is not the same thing
> > > > > >> as IP fragmentation at all. GSO/GRO provide a means for the application of the
> > > > > >> source to transfer a block of data containing multiple MTU- or smaller-sized
> > > > > >> segments to the kernel in a single system call, then the kernel breaks the
> > > > > >> segments out into individual packets that are all no larger than the path MTU
> > > > > >> and sends them to the destination. The destination kernel then gathers them
> > > > > >> up and posts them to the local application in a reassembled buffer possibly
> > > > > >> as large as that used by the original source. But, if some packets are lost,
> > > > > >> the destination kernel instead sends up what it has gathered so far which
> > > > > >> may be less than the block used by the original source.
> > > > > >>
> > > > > >> IP fragmentation is very different and operates on a single large transport
> > > > > >> layer segment instead of multiple smaller ones. And, the studies I am referring
> > > > > >> to show that performance was most positively affected by increasing the
> > > > > >> segment size even to larger than the path MTU. I implemented GSO/GRO
> > > > > >> in the ion-dtn LTP/UDP implementation and noted that the performance
> > > > > >> increase I saw was very minor and related to more efficient packaging
> > > > > >> and not a system call bottleneck. Conversely, when I increased the segment
> > > > > >> sizes to larger than the path MTU and intentionally invoked IP fragmentation
> > > > > >> the performance increase was dramatic. You can see this in the charts I
> > > > > >> showed at IETF118 intarea here:
> > > > > >>
> > > > > >> https://datatracker.ietf.org/meeting/118/materials/slides-118-intarea-identification-extension-for-the-internet-protocol-00
> > > > >
> > > > > I don't doubt your experience, but this is not what we saw with QUIC. In
> > > > > the early stages of QUIC development, performance was gated by the
> > > > > cost of the UDP socket API. I have benchmarks showing that sendmsg was
> > > > > accounting for 70 to 80% of CPU on sender side. Using GSO was key to
> > > > > lowering that, with one single call to sendmsg for 64K worth of data.
> > > > >
> > > > >
> > > > > >> Again, GSO/GRO address performance limitations of the application/kernel
> > > > > >> system call interface which seems to have a positive performance effect for
> > > > > >> some applications. But, IP fragmentation addresses a performance limitation
> > > > > >> of transport layer protocols in allowing the transport protocol to use larger
> > > > > >> segment sizes and therefore have fewer segments to deal with.
> > > > >
> > > > > At the cost of very inefficient error correction, repeating 64K bytes if
> > > > > 1500 bytes are lost. The processing cost of retransmissions with
> > > > > selective acknowledgement is not large, it hardly shows in the flame
> > > > > graphs. Also, the next more important cost after sendmsg/recvmsg is the
> > > > > cost of encryption. If the application had to resend 64KB, it also has
> > > > > to encrypt 64KB again, and that costs more than re-encrypting 1500B.
> > > > > Given that, I am not sure that for QUIC we would see a lower CPU by
> > > > > delegating fragmentation to the IP stack.
> > > > >
> > > > > That does not mean that larger packets would not result in lower CPU
> > > > > load. It would, but only if the larger packet size did not involve
> > > > > fragmentation, reassembly, and the overhead caused by the occasional
> > > > > loss of a fragment.
> > > > >
> > > > > > Hi Fred,
> > > > > >
> > > > > > Fewer segments, but NOT fewer packets. The net amount of work in the
> > > > > > system is unchanged when sending larger segments instead of smaller so
> > > > > > there won't be any material performance differences other than maybe
> > > > > > implementation effects at the host and no effect at routers. Segments
> > > > > > are the unit of congestion management and retransmission in a
> > > > > > transport protocol, but fragments are transparent to the transport
> > > > > > protocol-- this distinction can cause material issues in performance.
> > > > > >
> > > > > > It's pretty easy to see why this is. Consider that the minimum number
> > > > > > of segments for a connection would be to use 64K segments and fragment
> > > > > > them. For a 1500 MTU one segment then would be sent in 43 fragments.
> > > > > > The problem is that if just one fragment is dropped in a segment then
> > > > > > the whole segment is retransmitted. Furthermore, the fragments
> > > > > > themselves are likely to be the cause of the congestion at routers. So
> > > > > > there is a high likelihood of creating congestion in the network and
> > > > > > needing a lot of retransmissions. Even if CWND goes to one, each
> > > > > > connection can still send 43 packets and SACKs don't help because
> > > > > > there's no granularity at 64K segments so congestion control really
> > > > > > wouldn't be effective. The net effect is likely to be very poor TCP
> > > > > > performance.
> > > > >
> > > > > Yes. That's actually a known issue with GSO, and why GSO is typically
> > > > > limited to no more than 64K. If the sender does not implement some form
> > > > > of pacing, the segments will be sent back to back, causing short peaks
> > > > > of traffic that can cause queues to fill up and overflow. But it is
> > > > > difficult to delegate this pacing to the kernel, because the API only
> > > > > expresses the pacing in "milliseconds between packets". Segmentation in
> > > > > the kernel or the drivers would have the same issues.
> > > > >
> > > > > > While I think there might be some incidental positive performance
> > > > > > effects in host implementation by using fragmentation, I really don't
> > > > > > see how it addresses any fundamental performance limitation in a
> > > > > > transport layer protocol like TCP. In fact, I don't see how IP
> > > > > > fragmentation could possibly be better than doing PMTUD with SACKs
> > > > > > especially on the Internet.
> > > > >
> > > > > Yet another issue is that Fred is not the only one with that particular
> > > > > bad idea. The UDP options defined in TSVWG include a
> > > > > segmentation/fragmentation option that looks very similar. The two bad
> > > > > ideas would probably have to be reconciled in a single bad idea.
> > > > >
> > > > > In any case, Fred is making arguments related to transport, which means
> > > > > this draft ought to be discussed in TSVWG.
> > > > >
> > > > > -- Christian Huitema
> > > > >
> > > > >
> > > > >
> > > >
> >
> > --------------------------------------------------------------------
> > IETF IPv6 working group mailing list
> > ipv6@ietf.org
> > Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
> > --------------------------------------------------------------------