Re: [IPv6] FW: I-D Action: draft-templin-6man-ipid-ext-00.txt

Tom Herbert <> Fri, 08 December 2023 17:12 UTC

From: Tom Herbert <>
Date: Fri, 08 Dec 2023 09:11:57 -0800
To: "Templin (US), Fred L" <>
Cc: Christian Huitema <>, IPv6 List <>

On Fri, Dec 8, 2023 at 7:37 AM Templin (US), Fred L
<> wrote:
> Christian, I am working with the DTN LTP over UDP transport, and what I have
> found is that performance increases only by increasing the segment size, even
> when that size exceeds the path MTU. I have measured performance gains with
> segment sizes all the way up to 64KB, even over 1500B path MTUs, and I believe
> that still larger segment sizes (over paths with sufficient MTUs) would do
> even better. This was also a well-known characteristic of NFS over UDP in the
> early days, and I believe we will find other transports today that would
> benefit from larger packets.
>
> I have tried many ways to apply the "conventional wisdom" you have expressed
> to LTP/UDP but have seen no appreciable performance increase from those
> methods. I tried using sendmmsg()/recvmmsg(), and they did nothing to improve
> performance. I then implemented GSO/GRO, and again the performance increase,
> if any, was minimal. I even implemented a first pass at IP parcels and sent
> 64KB parcels with ~1500B segments over an OMNI interface; that did give some
> minor performance increase due to the reduction in header overhead, but
> nothing near what I saw from simply sending larger packets, where the
> performance increases were multiplicative.
>
> I object to categorizing this as a transport issue; it is an internetworking
> issue, in that large packet sizes are currently not well supported, especially
> when they exceed the path MTU. I believe many transports will benefit from
> using larger packets and that a robust fragmentation and reassembly service is
> essential for performance maximization in the Internet, and my drafts clearly
> explain why that is so.


For transport protocols dealing in segments, the interaction with
fragmentation can't be ignored. Consider a path with 1% packet loss for
a flow. If one segment equals one path MTU (no fragmentation), then 1%
of segments are dropped. If one segment spans two MTUs with
fragmentation, then about 2% of segments are dropped; at four MTUs,
about 4%; and at 32 MTUs, roughly 27% (a segment is lost if any of its
fragments is, so the loss rate is 1 - 0.99^32). Dropped segments need
to be retransmitted, and those retransmitted segments are subject to
packet loss as well, so the goodput for the connection can quickly drop
off a cliff when using fragmentation. As I mentioned, this is
exacerbated by the fact that the fragments themselves can be a source
of congestion, causing packet loss in the network.
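
To make that arithmetic concrete, here is a back-of-the-envelope model
(a Python sketch; it assumes independent per-packet loss, which real
paths only approximate):

    # Probability that a segment is lost when it is carved into n
    # fragments: the segment survives only if every fragment does.
    def segment_loss(p, n):
        return 1.0 - (1.0 - p) ** n

    for n in (1, 2, 4, 32):
        print(f"{n:2d} fragment(s)/segment -> "
              f"{segment_loss(0.01, n):.1%} of segments lost")
    # 1 -> 1.0%, 2 -> 2.0%, 4 -> 3.9%, 32 -> 27.5%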

I think your argument that fragmentation is essential to the Internet
would be stronger if you could show why packet loss isn't a big problem
for transport protocols that use segments as the unit of congestion
control and retransmission. Also, your analysis focuses on LTP; if you
want to make the general argument that fragmentation is essential for
the whole Internet, I suggest showing how TCP and QUIC behave when
their segments are fragmented, with varying amounts of packet loss in
the path. A first-order model of what to expect is sketched below.
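
As a starting point for that kind of analysis, consider the first-order
goodput model (again a sketch; it ignores timeouts and the congestion
response, both of which only make fragmentation look worse):

    # Each attempt sends n fragments and succeeds with probability
    # (1 - p) ** n, so with whole-segment retransmission the fraction
    # of transmitted bytes that are useful is just that probability.
    def goodput_efficiency(p, n):
        return (1.0 - p) ** n

    for p in (0.001, 0.01, 0.05):
        print(f"loss {p:.1%}: " + ", ".join(
            f"n={n}: {goodput_efficiency(p, n):.0%}" for n in (1, 4, 44)))
    # loss 0.1%: n=1: 100%, n=4: 100%, n=44: 96%
    # loss 1.0%: n=1: 99%,  n=4: 96%,  n=44: 64%
    # loss 5.0%: n=1: 95%,  n=4: 81%,  n=44: 10%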


> Fred
> > -----Original Message-----
> > From: Christian Huitema <>
> > Sent: Thursday, December 07, 2023 3:59 PM
> > To: Tom Herbert <>; Templin (US), Fred L <>
> > Cc: IPv6 List <>
> > Subject: Re: [IPv6] FW: I-D Action: draft-templin-6man-ipid-ext-00.txt
> >
> > On 12/7/2023 11:51 AM, Tom Herbert wrote:
> > > On Thu, Dec 7, 2023 at 7:58 AM Templin (US), Fred L
> > > <>  wrote:
> > >> Tom, to the point on performance:
> > >>
> > >>> Please provide references to these studies. Also, note that IP
> > >>> fragmentation is only one possibility; PMTUD and transport layer
> > >>> segmentation is another, and the latter seems more prevalent.
> > >> If by transport layer segmentation you mean GSO/GRO, it is not the same thing
> > >> as IP fragmentation at all. GSO/GRO provide a means for the application at the
> > >> source to transfer a block of data containing multiple MTU-sized or smaller
> > >> segments to the kernel in a single system call; the kernel then breaks the
> > >> segments out into individual packets that are all no larger than the path MTU
> > >> and sends them to the destination. The destination kernel then gathers them
> > >> up and posts them to the local application in a reassembled buffer, possibly
> > >> as large as the one used by the original source. But if some packets are lost,
> > >> the destination kernel instead sends up what it has gathered so far, which
> > >> may be less than the block used by the original source.
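
On Linux, the send side of the mechanism described above is exposed
through the UDP_SEGMENT socket option. A minimal sketch (assuming
Linux 4.18 or later; the option value is hard-coded from <linux/udp.h>
because Python's socket module does not export it):

    import socket

    UDP_SEGMENT = 103  # from <linux/udp.h>; not exported by Python

    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    # Ask the kernel to carve each send into 1452-byte UDP payloads,
    # leaving room for the IPv6 and UDP headers in a 1500-byte MTU.
    sock.setsockopt(socket.IPPROTO_UDP, UDP_SEGMENT, 1452)

    # One system call hands the kernel 44 segments' worth of data; the
    # kernel (or NIC) emits 44 separate packets, each within the MTU.
    sock.sendto(b"x" * (44 * 1452), ("2001:db8::1", 9000))
    # The receive-side analog (GRO) is the UDP_GRO option, value 104.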
> > >>
> > >> IP fragmentation is very different: it operates on a single large transport
> > >> layer segment instead of multiple smaller ones. And the studies I am referring
> > >> to show that performance was most positively affected by increasing the
> > >> segment size, even to larger than the path MTU. I implemented GSO/GRO
> > >> in the ion-dtn LTP/UDP implementation and noted that the performance
> > >> increase I saw was very minor, related to more efficient packaging rather
> > >> than a system call bottleneck. Conversely, when I increased the segment
> > >> sizes to larger than the path MTU and intentionally invoked IP fragmentation,
> > >> the performance increase was dramatic. You can see this in the charts I
> > >> showed at IETF 118 intarea here:
> > >>
> > >>
> >
> > I don't doubt your experience, but this is not what we saw with QUIC. In
> > the early stages of QUIC development, performance was gated by the
> > cost of the UDP socket API. I have benchmarks showing that sendmsg was
> > accounting for 70 to 80% of CPU on the sender side. Using GSO was key to
> > lowering that, with one single call to sendmsg for 64K worth of data.
> >
> >
> > >> Again, GSO/GRO address performance limitations of the application/kernel
> > >> system call interface, which seems to have a positive performance effect for
> > >> some applications. But IP fragmentation addresses a performance limitation
> > >> of transport layer protocols, allowing the transport protocol to use larger
> > >> segment sizes and therefore have fewer segments to deal with.
> >
> > At the cost of very inefficient error correction: retransmitting 64K bytes if
> > 1500 bytes are lost. The processing cost of retransmissions with
> > selective acknowledgement is not large; it hardly shows in the flame
> > graphs. Also, the next most important cost after sendmsg/recvmsg is the
> > cost of encryption. If the application has to resend 64KB, it also has
> > to encrypt 64KB again, and that costs more than re-encrypting 1500B.
> > Given that, I am not sure that for QUIC we would see lower CPU usage by
> > delegating fragmentation to the IP stack.
> >
> > That does not mean that larger packets would not result in lower CPU
> > load. They would, but only if the larger packet size did not involve
> > fragmentation, reassembly, and the overhead caused by the occasional
> > loss of a fragment.
> >
> > > Hi Fred,
> > >
> > > Fewer segments, but NOT fewer packets. The net amount of work in the
> > > system is unchanged when sending larger segments instead of smaller
> > > ones, so there won't be any material performance differences other
> > > than maybe implementation effects at the host, and no effect at
> > > routers. Segments are the unit of congestion management and
> > > retransmission in a transport protocol, but fragments are transparent
> > > to the transport protocol; this distinction can cause material
> > > performance issues.
> > >
> > > It's pretty easy to see why this is. Consider that the way to minimize
> > > the number of segments for a connection would be to use 64K segments
> > > and fragment them. For a 1500-byte MTU, one segment would then be sent
> > > as roughly 44 fragments. The problem is that if just one fragment of a
> > > segment is dropped, the whole segment is retransmitted. Furthermore,
> > > the fragments themselves are likely to be the cause of congestion at
> > > routers. So there is a high likelihood of creating congestion in the
> > > network and needing a lot of retransmissions. Even if CWND goes to one,
> > > each connection can still send 44 packets back to back, and SACKs don't
> > > help because there is no sub-segment granularity with 64K segments, so
> > > congestion control really wouldn't be effective. The net effect is
> > > likely to be very poor TCP performance.
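
The burst arithmetic behind those numbers (a one-liner sketch, assuming
the full 64 KB segment and a 1500-byte MTU, ignoring per-fragment
header overhead):

    MTU = 1500           # bytes per fragment on the wire (roughly)
    SEG = 64 * 1024      # transport segment size in bytes

    frags = -(-SEG // MTU)   # ceiling division
    print(frags)             # 44 fragments per segment
    # Even at cwnd = 1 segment, each window is a 44-packet burst, and
    # losing any one of the 44 costs the whole 64 KB segment.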
> >
> > Yes. That's actually a known issue with GSO, and why GSO is typically
> > limited to no more than 64K. If the sender does not implement some form
> > of pacing, the segments will be sent back to back, causing short peaks
> > of traffic that can cause queues to fill up and overflow. But it is
> > difficult to delegate this pacing to the kernel, because the API only
> > expresses the pacing in "milliseconds between packets". Segmentation in
> > the kernel or the drivers would have the same issues.
> >
> > > While I think there might be some incidental positive performance
> > > effects in host implementation by using fragmentation, I really don't
> > > see how it addresses any fundamental performance limitation in a
> > > transport layer protocol like TCP. In fact, I don't see how IP
> > > fragmentation could possibly be better than doing PMTUD with SACKs
> > > especially on the Internet.
> >
> > Yet another issue is that Fred is not the only one with that particular
> > bad idea. The UDP options defined in TSVWG include a
> > segmentation/fragmentation option that looks very similar. The two bad
> > ideas would probably have to be reconciled into a single bad idea.
> >
> > In any case, Fred is making arguments related to transport, which means
> > this draft ought to be discussed in TSVWG.
> >
> > -- Christian Huitema