Re: [Int-area] IP Parcels improves performance for end systems

"Joel M. Halpern" <jmh@joelhalpern.com> Thu, 24 March 2022 19:11 UTC

Message-ID: <bd1e4a5a-2d09-3875-2135-6f6b6743a9cf@joelhalpern.com>
Date: Thu, 24 Mar 2022 15:11:01 -0400
To: "Templin (US), Fred L" <Fred.L.Templin@boeing.com>
Cc: int-area <int-area@ietf.org>
References: <90a1ce8325a448ab81f63c844f98d6a6@boeing.com>
From: "Joel M. Halpern" <jmh@joelhalpern.com>
In-Reply-To: <90a1ce8325a448ab81f63c844f98d6a6@boeing.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/bUmbuzbdQlcEzSaH6nFC1cC37LY>
Subject: Re: [Int-area] IP Parcels improves performance for end systems

I do remember token ring.  (I was working from 1983 for folks who 
delivered 50 megabits starting in 1976, and built some of the best FDDI 
around at the time.)

I am not claiming that increasing the MTU from 1500 to 9K did nothing.
I am claiming that diminishing returns have distinctly set in.
If the Data Center folks (who tend these days to have the highest
demand) really wanted a link with a 64K MTU, they would have one.  They
don't.  They prefer to use Ethernet.
Increasing the MTU further runs into many obstacles, including such
issues as error detection code coverage, application-desired
communication sizes, retransmission costs, and so on.
Yes, they can all be overcome.   But the returns get smaller and smaller.
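To put rough numbers on that (purely illustrative - assume a fixed
per-packet cost of about 100 ns and a 100 Gb/s link running flat out):

    1500-byte MTU:  ~8.3M packets/sec  ->  ~0.83 CPU-seconds of per-packet work per second
    9000-byte MTU:  ~1.4M packets/sec  ->  ~0.14
    64KB packets:   ~190K packets/sec  ->  ~0.02

The first jump buys back roughly 0.7 CPU-seconds per second; the next
one buys back only about 0.12, and from an already small base.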

So absent real evidence that there is a problem needing the network 
stack and protocol to change, I just don't see this (IP Parcels) as 
providing enough benefit to justify the work.

Yours,
Joel

On 3/24/2022 3:05 PM, Templin (US), Fred L wrote:
> Hi Joel,
> 
>> -----Original Message-----
>> From: Joel M. Halpern [mailto:jmh@joelhalpern.com]
>> Sent: Thursday, March 24, 2022 11:41 AM
>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
>> Cc: int-area <int-area@ietf.org>
>> Subject: Re: [Int-area] IP Parcels improves performance for end systems
>>
>> This exchange seems to assume facts not in evidence.
> 
> It is a fact that back in the 1980's the architects took simple token ring,
> changed the over-the-wire coding to 4B/5B, replaced the copper with
> fiber and then boosted the MTU by a factor of 3 and called it FDDI. They
> were able to claim what at the time was an astounding 100Mbps (i.e., in
> comparison to the 10Mbps Ethernet of the day), but the performance
> gain was largely due to the increase in the MTU. They told me: "Fred,
> go figure out the path MTU problem", and they said: "go talk to Jeff
> Mogul out in Palo Alto who knows something about it". But, then, the
> Path MTU discovery group took a left turn at Albuquerque and left the
> Internet as a tiny MTU wasteland. We have the opportunity to fix all
> of that now - so, let's get it right for once.
> 
> Fred
> 
> 
>>
>> And the whole premise is spending resources in other parts of the
>> network for a marginal diminishing return in the hosts.
>>
>> It simply does not add up.
>>
>> Yours,
>> Joel
>>
>> On 3/24/2022 2:19 PM, Templin (US), Fred L wrote:
>>>> The category 1) links are not yet in existence, but once parcels start to
>>>> enter the mainstream innovation will drive the creation of new kinds of
>>>> data links (1TB Ethernet?) that will be rolled out as new hardware.
>>>
>>> I want to put a gold star next to the above. AFAICT, pushing the MTU and
>>> implementing IP parcels can get us to 1TB Ethernet practically overnight.
>>> Back in the 1980's, FDDI proved that pushing to larger MTUs could boost
>>> throughput without changing the speed of light, so why wouldn't the same
>>> concept work for Ethernet in the modern era?
>>>
>>> Fred
>>>
>>>> -----Original Message-----
>>>> From: Int-area [mailto:int-area-bounces@ietf.org] On Behalf Of Templin (US), Fred L
>>>> Sent: Thursday, March 24, 2022 9:45 AM
>>>> To: Tom Herbert <tom@herbertland.com>
>>>> Cc: int-area <int-area@ietf.org>; Eggert, Lars <lars@netapp.com>; lars@eggert.org
>>>> Subject: Re: [Int-area] IP Parcels improves performance for end systems
>>>>
>>>> Hi Tom - responses below:
>>>>
>>>>> -----Original Message-----
>>>>> From: Tom Herbert [mailto:tom@herbertland.com]
>>>>> Sent: Thursday, March 24, 2022 9:09 AM
>>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
>>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area <int-area@ietf.org>; lars@eggert.org
>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end systems
>>>>>
>>>>> On Thu, Mar 24, 2022 at 7:27 AM Templin (US), Fred L
>>>>> <Fred.L.Templin@boeing.com> wrote:
>>>>>>
>>>>>> Tom - see below:
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Tom Herbert [mailto:tom@herbertland.com]
>>>>>>> Sent: Thursday, March 24, 2022 6:22 AM
>>>>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
>>>>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area <int-area@ietf.org>; lars@eggert.org
>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end systems
>>>>>>>
>>>>>>> On Wed, Mar 23, 2022 at 10:47 AM Templin (US), Fred L
>>>>>>> <Fred.L.Templin@boeing.com> wrote:
>>>>>>>>
>>>>>>>> Tom, looks like you have switched over to HTML which can be a real conversation-killer.
>>>>>>>>
>>>>>>>> But, to some points you raised that require a response:
>>>>>>>>
>>>>>>>>> You can't turn off UDP checksums for IPv6 (except for the narrow case of encapsulation).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> That sounds like a good reason to continue to use IPv4 – at least as far as end system
>>>>>>>>
>>>>>>>> addressing is concerned – right?
>>>>>>>
>>>>>>>
>>>>>>> Not at all. All NICs today provide checksum offload and so it's
>>>>>>> basically zero cost to perform the UDP checksum. The fact that we
>>>>>>> don't have to do extra checks on the UDPv6 checksum field to see if
>>>>>>> it's zero actually is a performance improvement over UDPv4. (btw, I
>>>>>>> will present an implementation of the Internet checksum at TSVWG Friday;
>>>>>>> this will include discussion of checksum offloads).
>>>>>>
>>>>>> Actually, my assertion wasn't good to begin with, because for IPv6, even if UDP
>>>>>> checksums are turned off, the OMNI encapsulation layer includes a checksum
>>>>>> that ensures the integrity of the IPv6 header. Turning UDP checksums off for IPv6
>>>>>> when OMNI encapsulation is used is perfectly fine.
>>>>>>
>>>>> I assume you are referring to RFC6935 and RFC6936, which allow the UDPv6 checksum
>>>>> to be zero for tunneling under a very constrained set of conditions.
>>>>>
>>>>>>>>> If it's a standard per packet Internet checksum then a lot of HW could do it. If it's something like CRC32 then probably not.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The integrity check is covered in RFC5327, and I honestly haven’t had a chance to
>>>>>>>>
>>>>>>>> look at that myself yet.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> LTP is a nice experiment, but I'm more interested as to the interaction between IP parcels and TCP or QUIC.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Please be aware that while LTP may seem obscure at the moment, that may be changing now
>>>>>>>>
>>>>>>>> that the core DTN standards have been published. As DTN use becomes more widespread, I
>>>>>>>>
>>>>>>>> think we will see LTP also come into wider adoption.
>>>>>>>
>>>>>>>
>>>>>>> My assumption is that IP parcels is intended to be a general solution
>>>>>>> for all protocols. Maybe in the next draft you could discuss the
>>>>>>> details of TCP in IP parcels including how to offload the TCP
>>>>>>> checksum.
>>>>>>
>>>>>> I could certainly add that. For TCP, each of the concatenated segments would
>>>>>> include its own TCP header with checksum field included. Any hardware that
>>>>>> knows the structure of an IP Parcel can then simply do the TCP checksum
>>>>>> offload function for each segment.
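As a rough sketch of the arithmetic involved (illustrative only - this assumes
the parcel has already been parsed into an in-memory array of segment buffers,
which is not the draft's wire format and not any particular NIC's offload
interface), the per-segment work is just the usual RFC 1071 ones'-complement sum:

    /* Illustrative sketch only: per-segment TCP checksums over a parcel that
     * has (hypothetically) been parsed into an in-memory array of segment
     * buffers.  The arithmetic is the standard RFC 1071 ones'-complement sum;
     * the struct below is not the draft's wire format. */
    #include <stdint.h>
    #include <stddef.h>

    static uint32_t csum_partial(const uint8_t *p, size_t len, uint32_t sum)
    {
        while (len > 1) {                         /* sum 16-bit big-endian words */
            sum += ((uint32_t)p[0] << 8) | p[1];
            p += 2;
            len -= 2;
        }
        if (len)
            sum += (uint32_t)p[0] << 8;           /* pad an odd trailing byte */
        return sum;
    }

    static uint16_t csum_fold(uint32_t sum)
    {
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);   /* end-around carry */
        return (uint16_t)~sum;
    }

    struct parcel_seg {                           /* hypothetical in-memory view */
        const uint8_t *tcp;                       /* TCP header + payload, csum field zeroed */
        uint16_t len;                             /* TCP length for the pseudo-header */
    };

    /* 'pseudo_sum' is the ones'-complement partial sum of the source address,
     * destination address and protocol; the per-segment TCP length is added here. */
    static void parcel_tcp_csums(const struct parcel_seg *segs, size_t n,
                                 uint32_t pseudo_sum, uint16_t *out)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = csum_fold(csum_partial(segs[i].tcp, segs[i].len,
                                            pseudo_sum + segs[i].len));
    }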
>>>>>
>>>>> To be honest, the odds of ever getting support in NIC hardware for IP
>>>>> parcels are extremely slim. Hardware vendors are driven by economics,
>>>>> so the only way they would do that would be to demonstrate widespread
>>>>> deployment of the protocol. But even then, with all the legacy
>>>>> hardware in deployment it will take many years before there's any
>>>>> appreciable traction. IMO, the better approach is to figure out how to
>>>>> leverage the existing hardware features for use with IP parcels.
>>>>
>>>> There will be two kinds of links that will need to be "Parcel-capable":
>>>> 1) Edge network (physical) links that natively forward large parcels, and
>>>> 2) OMNI (virtual) links that forward parcels using encapsulation and
>>>> fragmentation.
>>>>
>>>> The category 1) links are not yet in existence, but once parcels start to
>>>> enter the mainstream innovation will drive the creation of new kinds of
>>>> data links (1TB Ethernet?) that will be rolled out as new hardware. And
>>>> that new hardware can be made to understand the structure of parcels
>>>> from the beginning. The category 2) links might take a large parcel from
>>>> the upper layers on the local node (or one that has been forwarded by
>>>> a parcel-capable link) and break it down into smaller sub-parcels then
>>>> apply IP fragmentation to each sub-parcel and send the fragments to an
>>>> OMNI link egress node. You know better than me how checksum offload
>>>> could be applied in an environment like that.
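A minimal sketch of just the grouping step for the category 2) case, assuming
(hypothetically) that the parcel is available as an in-memory array of segments;
the OMNI encapsulation and fragmentation steps themselves are per the drafts and
are not shown:

    /* Sketch of the sub-parcel grouping only (hypothetical in-memory view).
     * Whole segments are packed greedily into sub-parcels of at most 'limit'
     * bytes, e.g. ~64KB so that each sub-parcel can be fragmented and
     * reassembled as an ordinary IP packet. */
    #include <stdint.h>
    #include <stddef.h>

    struct seg {
        const uint8_t *buf;
        size_t len;
    };

    /* Returns how many leading segments fit in the next sub-parcel and writes
     * the resulting byte count to *bytes_out. */
    static size_t next_subparcel(const struct seg *segs, size_t nsegs,
                                 size_t limit, size_t *bytes_out)
    {
        size_t i = 0, bytes = 0;

        while (i < nsegs && bytes + segs[i].len <= limit) {
            bytes += segs[i].len;
            i++;
        }
        if (i == 0 && nsegs > 0) {                /* oversized single segment: send it alone */
            bytes = segs[0].len;
            i = 1;
        }
        *bytes_out = bytes;
        return i;
    }

Each sub-parcel would then be encapsulated and handed to IP fragmentation as a
unit, with the fragments reassembled at the OMNI link egress.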
>>>>
>>>>>>>>> There was quite a bit of work and discussion on this in Linux. I believe the deviation from the standard was motivated by some
>>>>>>>>
>>>>>>>>> deployed devices required the IPID be set on receive, and setting IPID with DF equals to 1 is thought to be innocuous. You may
>>>>>>>>
>>>>>>>>> want to look at Alex Duyck's papers on UDP GSO, he wrote a lot of code in this area.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> RFC6864 has quite a bit to say about coding IP ID with DF=1 – mostly in the negative.
>>>>>>>>
>>>>>>>> But, what I have seen in the Linux code seems to indicate that there is not even any
>>>>>>>>
>>>>>>>> coordination between the GSO source and the GRO destination – instead, GRO simply
>>>>>>>>
>>>>>>>> starts gluing together packets that appear to have consecutive IP IDs without ever first
>>>>>>>>
>>>>>>>> checking that they were sent by a peer that was earnestly doing GSO. These aspects
>>>>>>>>
>>>>>>>> would make it very difficult to work GSO/GRO into an IETF standard, plus it doesn’t
>>>>>>>>
>>>>>>>> work for IPv6 at all where there is no IP ID included by default. IP Parcels addresses
>>>>>>>>
>>>>>>>> all of these points, and can be made into a standard.
>>>>>>>
>>>>>>>
>>>>>>> Huh? GRO/GSO works perfectly fine with IPv6.
>>>>>>
>>>>>> Where is the spec for that? My understanding is that GSO/GRO leverages the
>>>>>> IP ID for IPv4. But, for IPv6, there is no IP ID unless you include a Fragment Header.
>>>>>> Does IPv6 somehow do GSO/GRO differently?
>>>>>>
>>>>>
>>>>> GRO and GSO don't use the IPID to match a flow. The primary match is
>>>>> the TCP 4-tuple.
>>>>
>>>> Correct, the 5-tuple (src-ip, src-port, dst-ip, dst-port, proto) is what is used
>>>> to match the flow. But, you need more than that in order to correctly paste
>>>> back together with GRO the segments of an original ULP buffer that was
>>>> broken down by GSO - you need Identifications and/or other markings in
>>>> the IP headers to give a reassembly context. Otherwise, GRO might end
>>>> up gluing together old and new pieces of ULP data and/or impart a lot of
>>>> reordering. IP Parcels have well behaved Identifications and Parcel IDs so
>>>> that the original ULP buffer context is honored during reassembly.
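A simplified model of that coalescing test (not the Linux code itself, just the
general shape of the check): a packet joins an in-progress super-packet only if
it matches the flow key and carries the next expected TCP sequence number, and
parcels make the equivalent grouping explicit with Identifications and Parcel IDs:

    /* Simplified model of the coalescing test, not the Linux implementation. */
    #include <stdint.h>
    #include <stdbool.h>

    struct flow_key {                     /* IPv4 shown for brevity */
        uint32_t saddr, daddr;
        uint16_t sport, dport;
        uint8_t  proto;
    };

    struct gro_ctx {
        struct flow_key key;
        uint32_t next_seq;                /* sequence number expected next */
    };

    static bool same_flow(const struct flow_key *a, const struct flow_key *b)
    {
        return a->saddr == b->saddr && a->daddr == b->daddr &&
               a->sport == b->sport && a->dport == b->dport &&
               a->proto == b->proto;
    }

    static bool can_coalesce(struct gro_ctx *ctx, const struct flow_key *k,
                             uint32_t seq, uint32_t payload_len)
    {
        if (!same_flow(&ctx->key, k) || seq != ctx->next_seq)
            return false;                 /* different flow or out-of-order: start a new context */
        ctx->next_seq = seq + payload_len;
        return true;
    }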
>>>>
>>>>> There's also another possibility with IPv6-- use jumbograms. For
>>>>> instance, instead of GRO reassembling segments up to a 64K packet, it
>>>>> could be modified to reassemble up to a 4G packet using IPv6
>>>>> jumbograms where one really big packet is given to the stack.
>>>>>
>>>>> But we probably don't even need jumbograms for that. In Linux, GRO
>>>>> might be taught to reassemble up to a 4G super packet and set a flag bit
>>>>> in the skbuf to ignore the IP payload field and get the length from
>>>>> the skbuf len field (as though a jumbogram was received). This trick
>>>>> would work for IPv4 and IPv6 and GSO as well. It should also work for TSO
>>>>> if the device takes the IP payload length to be that for each segment.
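For reference, the jumbogram mechanism referred to here is RFC 2675: the IPv6
Payload Length field is set to zero and an 8-byte Hop-by-Hop Options header
carries a Jumbo Payload option with a 32-bit length. A minimal sketch of
building that header (illustrative only):

    /* RFC 2675 jumbogram sketch: the IPv6 Payload Length field is set to 0 and
     * an 8-byte Hop-by-Hop Options header carries the Jumbo Payload option
     * with a 32-bit length instead. */
    #include <stdint.h>

    static void build_jumbo_hbh(uint8_t hbh[8], uint8_t next_hdr, uint32_t jumbo_len)
    {
        hbh[0] = next_hdr;                     /* Next Header */
        hbh[1] = 0;                            /* Hdr Ext Len: 0 => 8 octets total */
        hbh[2] = 0xC2;                         /* Option Type: Jumbo Payload */
        hbh[3] = 4;                            /* Opt Data Len */
        hbh[4] = (uint8_t)(jumbo_len >> 24);   /* 32-bit Jumbo Payload Length, */
        hbh[5] = (uint8_t)(jumbo_len >> 16);   /* network byte order           */
        hbh[6] = (uint8_t)(jumbo_len >> 8);
        hbh[7] = (uint8_t)jumbo_len;
    }

The skbuf-flag trick described above gets the same >64K effect inside the stack
without ever putting a jumbogram on the wire.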
>>>>
>>>> Yes, I was planning to give that a try to see what kind of performance
>>>> can be gotten with GSO/GRO when you exceed 64KB. But, my concern
>>>> with GSO/GRO is that the reassembly is (relatively) unguided and
>>>> haphazard and can result in mis-ordered concatenations. And, there is
>>>> no protocol by which the GRO receiver can infer that the things it is
>>>> gluing together actually originated from a sender that is earnestly doing
>>>> GSO. So, I do not see how GSO/GRO as I see it in the implementation
>>>> could be made into a standard, whereas there is a clear path for
>>>> standardizing IP parcels.
>>>>
>>>> Another thing I forgot to mention is that in my experiments with GSO/GRO
>>>> I found that it won't let me set a GSO segment size that would cause the
>>>> resulting IP packets to exceed the path MTU (i.e., it won't allow fragmentation).
>>>> I fixed that by configuring IPv4-in-IPv6 encapsulation per RFC2473 and then
>>>> allowed the IPv6 layer to apply fragmentation to the encapsulated packet.
>>>> That way, I can use IPv4 GSO segment sizes up to ~64KB.
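For anyone wanting to reproduce that behavior: UDP GSO is exposed to user space
through the UDP_SEGMENT socket option (Linux 4.18 and later). A minimal sketch;
the 1400-byte segment size and the helper name are arbitrary choices for
illustration, and the stack rejects segment sizes whose resulting packets would
not fit the MTU, which matches the limitation described above:

    /* Exercising Linux UDP GSO from user space (UDP_SEGMENT, Linux >= 4.18).
     * A single large send is split by the stack into 1400-byte UDP datagrams. */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <stddef.h>

    #ifndef UDP_SEGMENT
    #define UDP_SEGMENT 103               /* from linux/udp.h */
    #endif

    static int send_with_gso(int fd, const struct sockaddr *dst, socklen_t dlen,
                             const char *buf, size_t len)
    {
        int gso_size = 1400;              /* payload bytes per UDP segment */

        if (setsockopt(fd, IPPROTO_UDP, UDP_SEGMENT,
                       &gso_size, sizeof(gso_size)) < 0)
            return -1;
        return (int)sendto(fd, buf, len, 0, dst, dlen);
    }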
>>>>
>>>> Fred
>>>>
>>>>>
>>>>> Tom
>>>>>
>>>>>> Thanks - Fred
>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Fred
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> From: Tom Herbert [mailto:tom@herbertland.com]
>>>>>>>> Sent: Wednesday, March 23, 2022 9:37 AM
>>>>>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
>>>>>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area <int-area@ietf.org>; lars@eggert.org
>>>>>>>> Subject: Re: [EXTERNAL] Re: [Int-area] IP Parcels improves performance for end systems
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 23, 2022, 9:54 AM Templin (US), Fred L <Fred.L.Templin@boeing.com> wrote:
>>>>>>>>
>>>>>>>> Hi Tom,
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Tom Herbert [mailto:tom@herbertland.com]
>>>>>>>>> Sent: Wednesday, March 23, 2022 6:19 AM
>>>>>>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
>>>>>>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area@ietf.org; lars@eggert.org
>>>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end systems
>>>>>>>>>
>>>>>>>>> On Tue, Mar 22, 2022 at 10:38 AM Templin (US), Fred L
>>>>>>>>> <Fred.L.Templin@boeing.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Tom, see below:
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Tom Herbert [mailto:tom@herbertland.com]
>>>>>>>>>>> Sent: Tuesday, March 22, 2022 10:00 AM
>>>>>>>>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
>>>>>>>>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area@ietf.org
>>>>>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end systems
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 22, 2022 at 7:42 AM Templin (US), Fred L
>>>>>>>>>>> <Fred.L.Templin@boeing.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Lars, I did a poor job of answering your question. One of the most important aspects of
>>>>>>>>>>>>
>>>>>>>>>>>> IP Parcels in relation to TSO and GSO/GRO is that transports get to use a full 4MB buffer
>>>>>>>>>>>>
>>>>>>>>>>>> instead of the 64KB limit in current practices. This is possible due to the IP Parcel jumbo
>>>>>>>>>>>>
>>>>>>>>>>>> payload option encapsulation which provides a 32-bit length field instead of just a 16-bit.
>>>>>>>>>>>>
>>>>>>>>>>>> By allowing the transport to present the IP layer with a buffer of up to 4MB, it reduces
>>>>>>>>>>>>
>>>>>>>>>>>> the overhead, minimizes system calls and interrupts, etc.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> So, yes, IP Parcels is very much about improving the performance for end systems in
>>>>>>>>>>>>
>>>>>>>>>>>> comparison with current practice (GSO/GRO and TSO).
>>>>>>>>>>>
>>>>>>>>>>> Hi Fred,
>>>>>>>>>>>
>>>>>>>>>>> The nice thing about TSO/GSO/GRO is that they don't require any
>>>>>>>>>>> changes to the protocol, since they are just implementation techniques. Also,
>>>>>>>>>>> they're one-sided optimizations, meaning for instance that TSO can be
>>>>>>>>>>> used at the sender without requiring GRO to be used at the receiver.
>>>>>>>>>>> My understanding is that IP parcels requires new protocol that would
>>>>>>>>>>> need to be implemented on both endpoints and possibly in some routers.
>>>>>>>>>>
>>>>>>>>>> It is not entirely true that the protocol needs to be implemented on both
>>>>>>>>>> endpoints. Sources that send IP Parcels send them into a Parcel-capable path
>>>>>>>>>> which ends at either the final destination or a router for which the next hop is
>>>>>>>>>> not Parcel-capable. If the Parcel-capable path extends all the way to the final
>>>>>>>>>> destination, then the Parcel is delivered to the destination which knows how
>>>>>>>>>> to deal with it. If the Parcel-capable path ends at a router somewhere in the
>>>>>>>>>> middle, the router opens the Parcel and sends each enclosed segment as an
>>>>>>>>>> independent IP packet. The final destination is then free to apply GRO to the
>>>>>>>>>> incoming IP packets even if it does not understand Parcels.
>>>>>>>>>>
>>>>>>>>>> IP Parcels is about efficient shipping and handling just like the major online
>>>>>>>>>> retailer service model I described during the talk. The goal is to deliver the
>>>>>>>>>> fewest and largest possible parcels to the final destination rather than
>>>>>>>>>> delivering lots of small IP packets. It is good for the network and good for
>>>>>>>>>> the end systems both. If this were not true, then Amazon would send the
>>>>>>>>>> consumer 50 small boxes with 1 item each instead of 1 larger box with all
>>>>>>>>>> 50 items inside. And, we all know what they would choose to do.
>>>>>>>>>>
>>>>>>>>>>> Do you have data that shows the benefits of IP Parcels in light of
>>>>>>>>>>> these requirements?
>>>>>>>>>>
>>>>>>>>>> I have data that shows that GSO/GRO is good for packaging sizes up to 64KB
>>>>>>>>>> even if the enclosed segments will require IP fragmentation upon transmission.
>>>>>>>>>> The data implies that even larger packaging sizes (up to a maximum of 4MB)
>>>>>>>>>> would be better still.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Fred,
>>>>>>>>>
>>>>>>>>> You seem to be only looking at the problem from a per packet cost
>>>>>>>>> point of view. There is also per byte cost, particularly in the
>>>>>>>>> computation of the TCP/UDP checksum. The cost is hidden in modern
>>>>>>>>> implementations by checksum offload, and for segmentation offload we
>>>>>>>>> have methods to preserve the utility of checksum offload. IP parcels
>>>>>>>>> will have to also leverage checksum offload, because if the checksum
>>>>>>>>> is not offloaded then the cost of computing the payload checksum in
>>>>>>>>> CPU would dwarf any benefits we'd get by using segments larger than
>>>>>>>>> 64K.
>>>>>>>>
>>>>>>>> There is plenty of opportunity to apply hardware checksum offload since
>>>>>>>> the structure of a Parcel will be very standard. My experiments have been
>>>>>>>> with a protocol called LTP which is layered over UDP/IP as some other
>>>>>>>> upper layer protocols are. LTP includes a segment-by-segment checksum
>>>>>>>> that is used at its level in the absence of lower layer integrity checks, so
>>>>>>>> for larger Parcels LTP would use that and turn off UDP checksums
>>>>>>>> altogether.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> You can't turn off UDP checksums for IPv6 (except for the narrow case of encapsulation).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> As far as I am aware, there are currently no hardware
>>>>>>>> checksum offload implementations available for calculating the
>>>>>>>> LTP checksums.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> If it's a standard per packet Internet checksum then a lot of HW could do it. If it's something like CRC32 then probably not.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> LTP is a nice experiment, but I'm more interested as to the interaction between IP parcels and TCP or QUIC.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Speaking of standard, AFAICT GSO/GRO are doing something very
>>>>>>>> non-standard. GSO seems to be coding the IP ID field in the IPv4
>>>>>>>> headers of packets with DF=1 which goes against RFC 6864. When
>>>>>>>> DF=1, GSO cannot simply claim the IP ID and code it as if there were
>>>>>>>> some sort of protocol. Or, if it does, there would be no way to
>>>>>>>> standardize it.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> There was quite a bit of work and discussion on this in Linux. I believe the deviation from the standard was motivated by some deployed
>>>>>>>> devices required the IPID be set on receive, and setting IPID with DF equals to 1 is thought to be innocuous. You may want to look at Alex
>>>>>>>> Duyck's papers on UDP GSO, he wrote a lot of code in this area.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Tom
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Fred
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Tom
>>>>>>>>>
>>>>>>>>>> Fred
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Tom
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks - Fred
>>>>>>>>>>>>
>>>>>>
>>>>
>