Re: [IPv6] FW: I-D Action: draft-templin-6man-ipid-ext-00.txt

Tom Herbert <tom@herbertland.com> Tue, 12 December 2023 23:48 UTC

References: <13091d25c5874d5ba27b2de77d337646@boeing.com> <CALx6S371iasRTW+gzjgCPT1BY-KxZZau2Fu3qGYnoHpiu3o9tQ@mail.gmail.com> <BN0P110MB14205F118B67DD0225A18634A38BA@BN0P110MB1420.NAMP110.PROD.OUTLOOK.COM> <CALx6S36TZqh9h4aZ-o5gkY5Hp1Md2w5gPwpyO4weWeVwqXC5yQ@mail.gmail.com> <c0d3f33b-1193-470a-9f72-2c39dcbacb4f@huitema.net> <BN0P110MB1420A66D481B00EF33487E36A38AA@BN0P110MB1420.NAMP110.PROD.OUTLOOK.COM> <CALx6S36KyazyS3d6GkkvTJr=s9SnA7WT_RJpEztLjmnrQNMrRQ@mail.gmail.com> <BN0P110MB1420BABB15276F252600F998A38AA@BN0P110MB1420.NAMP110.PROD.OUTLOOK.COM> <BN0P110MB142015F7908AF483AB8E5EF2A38FA@BN0P110MB1420.NAMP110.PROD.OUTLOOK.COM>
In-Reply-To: <BN0P110MB142015F7908AF483AB8E5EF2A38FA@BN0P110MB1420.NAMP110.PROD.OUTLOOK.COM>
From: Tom Herbert <tom@herbertland.com>
Date: Tue, 12 Dec 2023 15:48:18 -0800
Message-ID: <CALx6S358SfLxD-ERaXtUhHGo=R+ZvnB5-Tv0-Thhcvk96bvGpA@mail.gmail.com>
To: "Templin (US), Fred L" <Fred.L.Templin@boeing.com>
Cc: Christian Huitema <huitema@huitema.net>, IPv6 List <ipv6@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/eByGtU9_3WZ6DBXTi8c_HDtXf2A>
Subject: Re: [IPv6] FW: I-D Action: draft-templin-6man-ipid-ext-00.txt

On Mon, Dec 11, 2023 at 1:13 PM Templin (US), Fred L
<Fred.L.Templin@boeing.com> wrote:
>
> Tom et al, there have been some significant changes to the draft that bring it more
> in line with both the comments on the list and some of my other writings. I think it
> may be worth another look now if you have time and energy.

Fred,

From the draft: "However, the 4-octet (32-bit) Identification field of
the Fragment Header may be too small to ensure reassembly integrity at
sufficiently high data rates, especially when the source resets the
starting sequence number often to maintain an unpredictable profile
[RFC7739]."

I might be missing something, but I'm not sure I see that this is a
practical problem. AFAICT generating a collision in the Identification
values is not easy with a reasonable generator (a simple per-destination
counter or an invertible hash function).

For instance, suppose we connect two hosts with 1 Tbit/s connections
over a network and one side is doing nothing but sourcing fragmented
packets to the other (we're nowhere close to this configuration
today; it's probably ten years off). Since 1280 bytes is the minimum
IPv6 MTU, at most about 98 million unique fragment identifiers are
needed per second. The default reassembly timeout in Linux is 30
seconds, so by this logic fewer than three billion values can be
outstanding at once. That is below the roughly 4.3 billion values a
32-bit field provides, so wraparound wouldn't happen. This is the most
extreme case I could think of for burning through the number space.
Note, this logic wouldn't hold up for IPv4 since the minimum MTU is
smaller.
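
As a rough check of that arithmetic, here is a minimal sketch. The
constant and function names are made up for illustration, and the
inputs (1 Tbit/s, 1280-byte packets, a 30-second timeout) are simply
the assumptions stated above, not measured values.

```python
# Back-of-the-envelope check of the Identification-space estimate.
# Names are illustrative; the inputs are the figures assumed in the text.
LINK_RATE_BPS = 10**12       # assumed 1 Tbit/s link
MIN_MTU_BYTES = 1280         # IPv6 minimum MTU
REASSEMBLY_TIMEOUT_S = 30    # timeout figure quoted in the text
ID_SPACE = 2**32             # 32-bit Identification field


def ids_per_second(rate_bps: int, packet_bytes: int) -> float:
    """Upper bound: every minimum-MTU packet on the wire uses a fresh ID."""
    return rate_bps / (packet_bytes * 8)


def ids_outstanding(rate_bps: int, packet_bytes: int, timeout_s: float) -> float:
    """IDs issued within one reassembly timeout window."""
    return ids_per_second(rate_bps, packet_bytes) * timeout_s


per_second = ids_per_second(LINK_RATE_BPS, MIN_MTU_BYTES)
outstanding = ids_outstanding(LINK_RATE_BPS, MIN_MTU_BYTES, REASSEMBLY_TIMEOUT_S)

print(f"IDs per second:  {per_second:,.0f}")
print(f"IDs outstanding: {outstanding:,.0f}")
print(f"Fraction of 32-bit space: {outstanding / ID_SPACE:.2f}")
print("wraps within timeout" if outstanding >= ID_SPACE else "no wraparound")
```

The bound is deliberately pessimistic, since a fragmented packet
occupies at least two packets on the wire but consumes only one
identifier. The result scales linearly with both the link rate and the
timeout, so those two inputs are worth re-checking for any particular
deployment.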

Tom

>
> Thanks - Fred
>
> > -----Original Message-----
> > From: ipv6 <ipv6-bounces@ietf.org> On Behalf Of Templin (US), Fred L
> > Sent: Friday, December 08, 2023 11:01 AM
> > To: Tom Herbert <tom@herbertland.com>
> > Cc: Christian Huitema <huitema@huitema.net>; IPv6 List <ipv6@ietf.org>
> > Subject: Re: [IPv6] FW: I-D Action: draft-templin-6man-ipid-ext-00.txt
> >
> > Tom, the service backs off during periods of congestive loss and can resume a more
> > aggressive profile when congestion subsides - the service is therefore adaptive. And,
> > the service is verified to improve performance for TCP and generic UDP as shown in
> > the iperf3 graphs in my Intarea charts. In fact, TCP does best of all.
> >
> > Thank you - Fred
> >
> > > -----Original Message-----
> > > From: Tom Herbert <tom@herbertland.com>
> > > Sent: Friday, December 08, 2023 9:12 AM
> > > To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> > > Cc: Christian Huitema <huitema@huitema.net>; IPv6 List <ipv6@ietf.org>
> > > Subject: Re: [IPv6] FW: I-D Action: draft-templin-6man-ipid-ext-00.txt
> > >
> > > On Fri, Dec 8, 2023 at 7:37 AM Templin (US), Fred L
> > > <Fred.L.Templin@boeing.com> wrote:
> > > >
> > > > Christian, I am working with the DTN LTP over UDP transport, and what I have found is
> > > > the performance is increased only by increasing the segment size even if that size exceeds
> > > > the path MTU. I have shown performance increases with segment sizes all the way up to
> > > > 64KB even over 1500B path MTUs, and I believe that still larger segment sizes (over paths
> > > > with sufficient MTUs) would do even better. This was also a well-known characteristic of
> > > > NFS over UDP back in the early days, and I believe we will find other transports today that
> > > > would benefit from larger packets.
> > > >
> > > > I have tried many ways to apply the "conventional wisdom" you have expressed to LTP/UDP
> > > > but have seen no appreciable performance increases using those methods. I tried using
> > > > sendmmsg()/recvmmsg() and they did nothing to improve performance. I then implemented
> > > > GSO/GRO and again the performance increase if any was minimal. I even implemented a
> > > > first pass at IP parcels and sent 64KB parcels with ~1500B segments over an OMNI interface
> > > > and that did give some minor performance increase due to the reduction in header
> > > > overhead but nothing within the realm of simply sending larger packets where the
> > > > performance increases were multiplicative.
> > > >
> > > > I object to categorizing this as a transport issue - this is an Internetworking issue where
> > > > large packet sizes currently are not well supported especially when they exceed the path
> > > > MTU. I believe many transports will benefit from using larger packets, and that a robust
> > > > fragmentation and reassembly service is essential for performance maximization in the
> > > > Internet, and my drafts clearly explain why that is so.
> > >
> > > Fred,
> > >
> > > For transport protocols dealing with segments the interaction with
> > > fragmentation can't be ignored. Consider a path with 1% packet loss
> > > for a flow. If one segment equals one path MTU (no fragmentation),
> > > then 1% of segments are dropped. If one segment equals two MTUs with
> > > fragmentation, then up to 2% of the segments are dropped; if one
> > > segment equals four MTUs, then up to 4% are dropped; if one segment
> > > equals 32 MTUs, then up to 32% are dropped. Dropped segments need
> > > to be retransmitted and those retransmitted segments are subject to
> > > packet loss also so the goodput for the connection can quickly drop
> > > off a cliff when using fragmentation. As I mentioned this is
> > > exacerbated by the fact that the fragments themselves can be the
> > > source of congestion causing packet loss in the network.
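
To make the scaling concrete, here is a minimal sketch that assumes
each fragment is lost independently with probability p. That is an
idealization (real loss is often bursty), and the function name is
hypothetical. Under that assumption a segment carried in n fragments is
lost with probability 1 - (1 - p)^n, which stays close to n * p while
n * p is small and falls somewhat below it as n grows.

```python
def segment_loss_probability(p: float, n: int) -> float:
    """Chance that at least one of n fragments is lost.

    Assumes independent per-fragment loss with probability p; losing
    any single fragment means the whole segment must be retransmitted.
    """
    return 1.0 - (1.0 - p) ** n


# 1% per-packet loss, with segments spanning 1, 2, 4 and 32 MTUs
for n in (1, 2, 4, 32):
    exact = segment_loss_probability(0.01, n)
    linear = 0.01 * n  # simple n * p estimate, an upper bound
    print(f"{n:2d} fragments: {exact:.1%} lost (n*p estimate {linear:.0%})")
```

The n * p figures are upper bounds on the exact value under this
model, so the qualitative point is unchanged: segment loss grows almost
in proportion to the number of fragments per segment.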
> > >
> > > I think your argument that fragmentation is essential to the Internet
> > > would be stronger if you can show why packet loss isn't a big problem
> > > for transport protocols that use segments as the unit of congestion
> > > control and retransmission. Also, your focus for analysis seems to be
> > > on LTP, but if you want to make a general argument that fragmentation
> > > is essential for the whole Internet I suggest showing how TCP and QUIC
> > > behave when their segments are fragmented with varying amounts of
> > > packet loss in the path.
> > >
> > > Tom
> > >
> > >
> > >
> > >
> > >
> > > >
> > > > Fred
> > > >
> > > > > -----Original Message-----
> > > > > From: Christian Huitema <huitema@huitema.net>
> > > > > Sent: Thursday, December 07, 2023 3:59 PM
> > > > > To: Tom Herbert <tom@herbertland.com>; Templin (US), Fred L <Fred.L.Templin@boeing.com>
> > > > > Cc: IPv6 List <ipv6@ietf.org>
> > > > > Subject: Re: [IPv6] FW: I-D Action: draft-templin-6man-ipid-ext-00.txt
> > > > >
> > > > > On 12/7/2023 11:51 AM, Tom Herbert wrote:
> > > > > > On Thu, Dec 7, 2023 at 7:58 AM Templin (US), Fred L
> > > > > > <Fred.L.Templin=40boeing.com@dmarc.ietf.org>  wrote:
> > > > > >> Tom, to the point on performance:
> > > > > >>
> > > > > >>> Please provide references to these studies. Also, note IP
> > > > > >>> fragmentation is only one possibility, PMTUD and transport layer
> > > > > >>> segmentation is another and that latter seems more prevalent.
> > > > > >> If by transport layer segmentation you mean GSO/GRO, it is not the same thing
> > > > > >> as IP fragmentation at all. GSO/GRO provide a means for the application of the
> > > > > >> source to transfer a block of data containing multiple MTU- or smaller-sized
> > > > > >> segments to the kernel in a single system call, then the kernel breaks the
> > > > > >> segments out into individual packets that are all no larger than the path MTU
> > > > > >> and sends them to the destination. The destination kernel then gathers them
> > > > > >> up and posts them to the local application in a reassembled buffer possibly
> > > > > >> as large as that used by the original source. But, if some packets are lost,
> > > > > >> the destination kernel instead sends up what it has gathered so far which
> > > > > >> may be less than the block used by the original source.
> > > > > >>
> > > > > >> IP fragmentation is very different and operates on a single large transport
> > > > > >> layer segment instead of multiple smaller ones. And, the studies I am referring
> > > > > >> to show that performance was most positively affected by increasing the
> > > > > >> segment size even to larger than the path MTU. I implemented GSO/GRO
> > > > > >> in the ion-dtn LTP/UDP implementation and noted that the performance
> > > > > >> increase I saw was very minor and related to more efficient packaging
> > > > > >> and not a system call bottleneck. Conversely, when I increased the segment
> > > > > >> sizes to larger than the path MTU and intentionally invoked IP fragmentation
> > > > > >> the performance increase was dramatic. You can see this in the charts I
> > > > > >> showed at IETF118 intarea here:
> > > > > >>
> > > > > >> https://datatracker.ietf.org/meeting/118/materials/slides-118-intarea-identification-extension-for-the-internet-protocol-00
> > > > >
> > > > > I don't doubt your experience, but this is not what we saw with QUIC. In
> > > > > > the early stages of QUIC development, performance was gated by the
> > > > > cost of the UDP socket API. I have benchmarks showing that sendmsg was
> > > > > accounting for 70 to 80% of CPU on sender side. Using GSO was key to
> > > > > lowering that, with one single call to sendmsg for 64K worth of data.
> > > > >
> > > > >
> > > > > >> Again, GSO/GRO address performance limitations of the application/kernel
> > > > > >> system call interface which seems to have a positive performance effect for
> > > > > >> some applications. But, IP fragmentation addresses a performance limitation
> > > > > >> of transport layer protocols in allowing the transport protocol to use larger
> > > > > >> segment sizes and therefore have fewer segments to deal with.
> > > > >
> > > > > At the cost of very inefficient error correction, repeating 64K bytes if
> > > > > 1500 bytes are lost. The processing cost of retransmissions with
> > > > > selective acknowledgement is not large, it hardly shows in the flame
> > > > > graphs. Also, the next more important cost after sendmsg/recvmsg is the
> > > > > cost of encryption. If the application had to resend 64KB, it also has
> > > > > to encrypt 64KB again, and that costs more than re-encrypting 1500B.
> > > > > Given that, I am not sure that for QUIC we would see a lower CPU by
> > > > > delegating fragmentation to the IP stack.
> > > > >
> > > > > That does not mean that larger packets would not result in lower CPU
> > > > > load. It would, but only if the larger packet size did not involve
> > > > > fragmentation, reassembly, and the overhead caused by the occasional
> > > > > loss of a fragment.
> > > > >
> > > > > > Hi Fred,
> > > > > >
> > > > > > Fewer segments, but NOT fewer packets. The net amount of work in the
> > > > > > system is unchanged when sending larger segments instead of smaller so
> > > > > > there won't be any material performance differences other than maybe
> > > > > > implementation effects at the host and no effect at routers. Segments
> > > > > > are the unit of congestion management and retransmission in a
> > > > > > transport protocol, but fragments are transparent to the transport
> > > > > > protocol-- this distinction can cause material issues in performance.
> > > > > >
> > > > > > It's pretty easy to see why this is. Consider that the minimum number
> > > > > > of segments for a connection would be to use 64K segments and fragment
> > > > > > them. For a 1500 MTU one segment then would be sent in 43 fragments.
> > > > > > The problem is that if just one fragment is dropped in a segment then
> > > > > > the whole segment is retransmitted. Furthermore, the fragments
> > > > > > themselves are likely to be the cause of the congestion at routers. So
> > > > > > there is a high likelihood of creating congestion in the network and
> > > > > > needing a lot of retransmissions. Even if CWND goes to one, each
> > > > > > connection can still send 43 packets and SACKs don't help because
> > > > > > there's no granularity at 64K segments so congestion control really
> > > > > > wouldn't be effective. The net effect is likely to be very poor TCP
> > > > > > performance.
> > > > >
> > > > > Yes. That's actually a known issue with GSO, and why GSO is typically
> > > > > limited to no more than 64K. If the sender does not implement some form
> > > > > of pacing, the segments will be sent back to back, causing short peaks
> > > > > of traffic that can cause queues to fill up and overflow. But it is
> > > > > difficult to delegate this pacing to the kernel, because the API only
> > > > > expresses the pacing in "milliseconds between packets". Segmentation in
> > > > > the kernel or the drivers would have the same issues.
> > > > >
> > > > > > While I think there might be some incidental positive performance
> > > > > > effects in host implementation by using fragmentation, I really don't
> > > > > > see how it addresses any fundamental performance limitation in a
> > > > > > transport layer protocol like TCP. In fact, I don't see how IP
> > > > > > fragmentation could possibly be better than doing PMTUD with SACKs
> > > > > > especially on the Internet.
> > > > >
> > > > > Yet another issue is that Fred is not the only one with that particular
> > > > > bad idea. The UDP options defined in TSVWG include a
> > > > > > segmentation/fragmentation option that looks very similar. The two bad
> > > > > ideas would probably have to be reconciled in a single bad idea.
> > > > >
> > > > > In any case, Fred is making arguments related to transport, which means
> > > > > this draft ought to be discussed in TSVWG.
> > > > >
> > > > > -- Christian Huitema
> > > > >
> > > > >
> > > > >
> > > >
> >