Re: [Int-area] IP Parcels improves performance for end systems

Dino Farinacci <farinacci@gmail.com> Thu, 24 March 2022 22:08 UTC

From: Dino Farinacci <farinacci@gmail.com>
In-Reply-To: <e26833ac-ec3f-7088-bd47-42ae16bf478e@joelhalpern.com>
Date: Thu, 24 Mar 2022 15:08:19 -0700
Cc: "Templin (US), Fred L" <Fred.L.Templin@boeing.com>, int-area <int-area@ietf.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <2540B9D4-D2CB-4972-8726-7C719C22E3D9@gmail.com>
References: <e43f96d026424f18ab303d2ff82fd0e2@boeing.com> <e3c22441d7374ddf9f10ca50f62b8802@boeing.com> <f46f072f-3464-5f45-09ce-c6e8a1ac769d@joelhalpern.com> <BY3PR13MB4787C6873AD47B791FED18189A199@BY3PR13MB4787.namprd13.prod.outlook.com> <e0408cc1e5554c24820fed7bcc93374d@boeing.com> <BY3PR13MB47876C5647884AEEE7347FC89A199@BY3PR13MB4787.namprd13.prod.outlook.com> <9d4168a611034c169b68e047d32818cc@boeing.com> <e26833ac-ec3f-7088-bd47-42ae16bf478e@joelhalpern.com>
To: "Joel M. Halpern" <jmh@joelhalpern.com>
X-Mailer: Apple Mail (2.3693.60.0.1.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/FUfhZRLE8_cJDYSbTsaXZKkQBo0>
Subject: Re: [Int-area] IP Parcels improves performance for end systems

Right, moving the problem does not fix the problem and changes the cost/benefit ratio as well.

Dino

> On Mar 24, 2022, at 2:30 PM, Joel M. Halpern <jmh@joelhalpern.com> wrote:
> 
> Fundamentally, Fred, by not having the host send things in timely pieces you have created work.  Having some other platform do that work does not mean it does not need to get done.  It still does.  And since getting such big pieces costs latency, I cannot see how the savings in I/O operations at the host (paid for with I/Os and processing to break things up) make sense as an abstract network model.  As a model for an interface between a host and a smart NIC?  Maybe.  I will leave that to the NIC vendors, who have been playing all sorts of clever tricks for years without IP needing to get involved, and with minimal and simple modifications to the host applications.
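To put a rough number on the latency point above: the time needed to clock a buffer onto a link grows linearly with its size. The sketch below assumes a 10 Gb/s link, an illustrative figure that does not come from the thread, and compares a 1500-byte packet, a 64 KB buffer, and a 4 MB parcel. The function name is hypothetical.

```python
def serialization_delay_us(size_bytes: int, link_bps: float) -> float:
    """Time to clock size_bytes onto a link of link_bps, in microseconds."""
    return size_bytes * 8 / link_bps * 1e6


LINK_BPS = 10e9  # assumption: 10 Gb/s link, chosen only for illustration

SIZES = [
    ("1500 B packet", 1500),
    ("64 KB buffer", 64 * 1024),
    ("4 MB parcel", 4 * 1024 * 1024),
]

for label, size in SIZES:
    print(f"{label}: {serialization_delay_us(size, LINK_BPS):.1f} us")
```

Under that assumption the three sizes take roughly 1.2 us, 52 us, and 3.4 ms respectively. This is the delay a receiver would wait for the last byte of each unit, before any queueing or processing is counted.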
> 
> I fear we are getting into repeating ourselves.
> 
> Yours,
> Joel
> 
> On 3/24/2022 4:54 PM, Templin (US), Fred L wrote:
>> Again, expect the breaking/reassembling to happen mostly near the edges of the network.
>> And, not necessarily on dedicated router platforms (in fact, probably not on dedicated
>> router platforms). Implications of loss at the IP fragment level are discussed in my
>> recent APNIC article:
>> https://blog.apnic.net/2022/02/18/omni-an-adaptation-layer-for-the-internet/
>> But, in terms of Parcel reassembly, reordering is not a problem and strict reassembly
>> is not required. It is OK if a Parcel that is broken up in transit gets delivered as
>> multiple smaller parcels, and even if some of the segments within a parcel are
>> delivered in a slightly reordered position from the way they were originally
>> transmitted. ULPs have sequence numbers and the like to put the segments
>> back together in the proper order. It all works, and again, it does not impact
>> the vast majority of the deployed base.
>> Fred
>>> -----Original Message-----
>>> From: Haoyu Song [mailto:haoyu.song@futurewei.com]
>>> Sent: Thursday, March 24, 2022 1:42 PM
>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>; Joel M. Halpern <jmh@joelhalpern.com>
>>> Cc: int-area <int-area@ietf.org>
>>> Subject: RE: [Int-area] IP Parcels improves performance for end systems
>>> 
>>> Understood. But some router or other device will still need to break up and reassemble the parcels. In a high-speed network, this is much more
>>> challenging than at the host, because of the buffering, scheduling, packet loss, and out-of-order issues mentioned earlier.
>>> 
>>> Haoyu
>>> 
>>> -----Original Message-----
>>> From: Templin (US), Fred L <Fred.L.Templin@boeing.com>
>>> Sent: Thursday, March 24, 2022 1:12 PM
>>> To: Haoyu Song <haoyu.song@futurewei.com>; Joel M. Halpern <jmh@joelhalpern.com>
>>> Cc: int-area <int-area@ietf.org>
>>> Subject: RE: [Int-area] IP Parcels improves performance for end systems
>>> 
>>> Hi, no it is not the case that routers deep within the network will be asked to forward jumbos - that is not what we are after. Routers in the
>>> core will continue to forward common-sized IP packets the way they have always done - nothing within that realm needs to change.
>>> 
>>> Where parcels will have a visible footprint is at or near the edges near where the end systems live. Everything else in between will continue
>>> to see plain old IP packets the way they have always done.
>>> 
>>> Thanks - Fred
>>> 
>>>> -----Original Message-----
>>>> From: Haoyu Song [mailto:haoyu.song@futurewei.com]
>>>> Sent: Thursday, March 24, 2022 12:27 PM
>>>> To: Joel M. Halpern <jmh@joelhalpern.com>; Templin (US), Fred L
>>>> <Fred.L.Templin@boeing.com>
>>>> Cc: int-area <int-area@ietf.org>
>>>> Subject: [EXTERNAL] RE: [Int-area] IP Parcels improves performance for
>>>> end systems
>>>> 
>>>> 
>>>> I have a similar concern. IP parcels make me worried about the
>>>> buffering and scheduling of such huge parcels in network routers: the
>>>> ratio of buffer size to bandwidth keeps shrinking, packet loss and
>>>> reordering can happen after a parcel is broken up in the network, and
>>>> parcel reassembly in the network could be even harder than in the host.
>>>> Is there any analysis or evaluation of the impact on the network? How
>>>> will the routers be upgraded to support IP parcels?
>>>> On the other hand, NICs are becoming smarter and more powerful, and can
>>>> efficiently offload many data manipulation functions.
>>>> Do we really want to optimize the host and complicate the network?
>>>> 
>>>> Best regards,
>>>> Haoyu
>>>> 
>>>> -----Original Message-----
>>>> From: Int-area <int-area-bounces@ietf.org> On Behalf Of Joel M.
>>>> Halpern
>>>> Sent: Thursday, March 24, 2022 11:41 AM
>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
>>>> Cc: int-area <int-area@ietf.org>
>>>> Subject: Re: [Int-area] IP Parcels improves performance for end
>>>> systems
>>>> 
>>>> This exchange seems to assume facts not in evidence.
>>>> 
>>>> And the whole premise is spending resources in other parts of the network for a marginal diminishing return in the hosts.
>>>> 
>>>> It simply does not add up.
>>>> 
>>>> Yours,
>>>> Joel
>>>> 
>>>> On 3/24/2022 2:19 PM, Templin (US), Fred L wrote:
>>>>>> The category 1) links are not yet in existence, but once parcels
>>>>>> start to enter the mainstream innovation will drive the creation of
>>>>>> new kinds of data links (1TB Ethernet?) that will be rolled out as new hardware.
>>>>> 
>>>>> I want to put a gold star next to the above. AFAICT, pushing the MTU
>>>>> and implementing IP parcels can get us to 1TB Ethernet practically overnight.
>>>>> Back in the 1980's, FDDI proved that pushing to larger MTUs could
>>>>> boost throughput without changing the speed of light, so why
>>>>> wouldn't the same concept work for Ethernet in the modern era?
>>>>> 
>>>>> Fred
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Int-area [mailto:int-area-bounces@ietf.org] On Behalf Of
>>>>>> Templin (US), Fred L
>>>>>> Sent: Thursday, March 24, 2022 9:45 AM
>>>>>> To: Tom Herbert <tom@herbertland.com>
>>>>>> Cc: int-area <int-area@ietf.org>; Eggert, Lars <lars@netapp.com>;
>>>>>> lars@eggert.org
>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end
>>>>>> systems
>>>>>> 
>>>>>> Hi Tom - responses below:
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Tom Herbert [mailto:tom@herbertland.com]
>>>>>>> Sent: Thursday, March 24, 2022 9:09 AM
>>>>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
>>>>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area <int-area@ietf.org>;
>>>>>>> lars@eggert.org
>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end
>>>>>>> systems
>>>>>>> 
>>>>>>> On Thu, Mar 24, 2022 at 7:27 AM Templin (US), Fred L
>>>>>>> <Fred.L.Templin@boeing.com> wrote:
>>>>>>>> 
>>>>>>>> Tom - see below:
>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Tom Herbert [mailto:tom@herbertland.com]
>>>>>>>>> Sent: Thursday, March 24, 2022 6:22 AM
>>>>>>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
>>>>>>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area
>>>>>>>>> <int-area@ietf.org>; lars@eggert.org
>>>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end
>>>>>>>>> systems
>>>>>>>>> 
>>>>>>>>> On Wed, Mar 23, 2022 at 10:47 AM Templin (US), Fred L
>>>>>>>>> <Fred.L.Templin@boeing.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Tom, looks like you have switched over to HTML which can be a real conversation-killer.
>>>>>>>>>> 
>>>>>>>>>> But, to some points you raised that require a response:
>>>>>>>>>> 
>>>>>>>>>>> You can't turn off UDP checksums for IPv6 (except for the narrow case of encapsulation).
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> That sounds like a good reason to continue to use IPv4 - at
>>>>>>>>>> least as far as end system addressing is concerned - right?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Not at all. All NICs today provide checksum offload and so it's
>>>>>>>>> basically zero cost to perform the UDP checksum. The fact that
>>>>>>>>> we don't have to do extra checks on the UDPv6 checksum field to
>>>>>>>>> see if it's zero actually is a performance improvement over UDPv4.
>>>>>>>>> (btw, I will present implementation of the Internet checksum at
>>>>>>>>> TSVGWG Friday, this will include discussion of checksum offloads).
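For reference, the Internet checksum discussed here is the 16-bit ones' complement sum described in RFC 1071. The following is a minimal software sketch of it, meant to show the per-byte work that checksum offload takes off the CPU. It is not taken from any NIC driver or from the Linux stack, and the function name is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Example words from RFC 1071: 0001 f203 f4f5 f6f7
sample = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(sample)
print(hex(checksum))

# A receiver that sums the data together with the checksum gets zero.
print(internet_checksum(sample + checksum.to_bytes(2, "big")))
```

Every payload byte passes through the loop once, which is why the cost scales with buffer size rather than with packet count when it is done in software.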
>>>>>>>> 
>>>>>>>> Actually, my assertion wasn't good to begin with because for IPv6
>>>>>>>> even if UDP checksums are turned off the OMNI encapsulation layer
>>>>>>>> includes a checksum that ensures the integrity of the IPv6 header.
>>>>>>>> UDP checksums off for IPv6 when OMNI encapsulation is used is perfectly fine.
>>>>>>>> 
>>>>>>> I assume you are referring to RFC6935 and RFC6936 that allow the
>>>>>>> UDPv6 checksum to be zero for tunneling under a very constrained set of conditions.
>>>>>>> 
>>>>>>>>>>> If it's a standard per packet Internet checksum then a lot of HW could do it. If it's something like CRC32 then probably not.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> The integrity check is covered in RFC5327, and I honestly
>>>>>>>>>> haven't had a chance to look at that myself yet.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> LTP is a nice experiment, but I'm more interested as to the interaction between IP parcels and TCP or QUIC.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Please be aware that while LTP may seem obscure at the moment
>>>>>>>>>> that may be changing now that the core DTN standards have been
>>>>>>>>>> published. As DTN use becomes more widespread I think we can see
>>>>>>>>>> LTP also come into wider adoption.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>   My assumption is that IP parcels is intended to be a general
>>>>>>>>> solution of all protocols. Maybe in the next draft you could
>>>>>>>>> discuss the details of TCP in IP parcels including how to
>>>>>>>>> offload the TCP checksum.
>>>>>>>> 
>>>>>>>> I could certainly add that. For TCP, each of the concatenated
>>>>>>>> segments would include its own TCP header with checksum field
>>>>>>>> included. Any hardware that knows the structure of an IP Parcel
>>>>>>>> can then simply do the TCP checksum offload function for each segment.
>>>>>>> 
>>>>>>> To be honest, the odds of ever getting support in NIC hardware for
>>>>>>> IP parcels are extremely slim. Hardware vendors are driven by
>>>>>>> economics, so the only way they would do that would be to
>>>>>>> demonstrate widespread deployment of the protocol. But even then,
>>>>>>> with all the legacy hardware in deployment it will take many years
>>>>>>> before there's any appreciable traction. IMO, the better approach
>>>>>>> is to figure out how to leverage the existing hardware features for use with IP parcels.
>>>>>> 
>>>>>> There will be two kinds of links that will need to be "Parcel-capable":
>>>>>> 1) Edge network (physical) links that natively forward large
>>>>>> parcels, and
>>>>>> 2) OMNI (virtual) links that forward parcels using encapsulation
>>>>>> and fragmentation.
>>>>>> 
>>>>>> The category 1) links are not yet in existence, but once parcels
>>>>>> start to enter the mainstream innovation will drive the creation of
>>>>>> new kinds of data links (1TB Ethernet?) that will be rolled out as
>>>>>> new hardware. And that new hardware can be made to understand the
>>>>>> structure of parcels from the beginning. The category 2) links
>>>>>> might take a large parcel from the upper layers on the local node
>>>>>> (or one that has been forwarded by a parcel-capable link) and break
>>>>>> it down into smaller sub-parcels then apply IP fragmentation to
>>>>>> each sub-parcel and send the fragments to an OMNI link egress node.
>>>>>> You know better than me how checksum offload could be applied in an environment like that.
>>>>>> 
>>>>>>>>>>> There was quite a bit of work and discussion on this in Linux.
>>>>>>>>>>> I believe the deviation from the standard was motivated by
>>>>>>>>>>> some deployed devices that required the IPID be set on receive, and
>>>>>>>>>>> setting IPID with DF equal to 1 is thought to be innocuous.
>>>>>>>>>>> You may want to look at Alex Duyck's papers on UDP GSO, he wrote a lot of code in this area.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> RFC6864 has quite a bit to say about coding IP ID with DF=1 - mostly in the negative.
>>>>>>>>>> But, what I have seen in the Linux code seems to indicate that
>>>>>>>>>> there is not even any coordination between the GSO source and the
>>>>>>>>>> GRO destination - instead, GRO simply starts gluing together packets
>>>>>>>>>> that appear to have consecutive IP IDs without ever first checking
>>>>>>>>>> that they were sent by a peer that was earnestly doing GSO. These
>>>>>>>>>> aspects would make it very difficult to work GSO/GRO into an IETF
>>>>>>>>>> standard, plus it doesn't work for IPv6 at all where there is no
>>>>>>>>>> IP ID included by default. IP Parcels addresses all of these points,
>>>>>>>>>> and can be made into a standard.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Huh? GRO/GSO works perfectly fine with IPv6.
>>>>>>>> 
>>>>>>>> Where is the spec for that? My understanding is that GSO/GRO
>>>>>>>> leverages the IP ID for IPv4. But, for IPv6, there is no IP ID unless you include a Fragment Header.
>>>>>>>> Does IPv6 somehow do GSO/GRO differently?
>>>>>>>> 
>>>>>>> 
>>>>>>> GRO and GSO don't use the IPID to match a flow. The primary match
>>>>>>> is the TCP 4-tuple.
>>>>>> 
>>>>>> Correct, the 5-tuple (src-ip, src-port, dst-ip, dst-port, proto) is
>>>>>> what is used to match the flow. But, you need more than that in
>>>>>> order to correctly paste back together with GRO the segments of an
>>>>>> original ULP buffer that was broken down by GSO - you need
>>>>>> Identifications and/or other markings in the IP headers to give a reassembly context.
>>>>>> Otherwise, GRO might end up gluing together old and new pieces of
>>>>>> ULP data and/or impart a lot of reordering. IP Parcels have well
>>>>>> behaved Identifications and Parcel IDs so that the original ULP buffer context is honored during reassembly.
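The flow-matching and reassembly-context question can be illustrated with a toy coalescer. The sketch below keys on the 5-tuple and only glues a segment onto the previous one when its transport sequence number is exactly contiguous, so a gap or stale segment starts a new buffer. This is an illustration under stated assumptions, not the Linux GRO implementation and not the IP Parcels mechanism. `Segment` and `coalesce` are hypothetical names, and `seq` stands in for a TCP-style byte sequence number.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    flow: tuple      # (src_ip, src_port, dst_ip, dst_port, proto)
    seq: int         # sequence number of the first payload byte
    payload: bytes


def coalesce(segments):
    """Merge contiguous, in-order segments of the same flow into larger buffers."""
    merged = []      # entries are [flow, start_seq, bytearray]
    open_run = {}    # flow -> index in merged of the run still being extended
    for seg in segments:
        idx = open_run.get(seg.flow)
        if idx is not None:
            _, start, buf = merged[idx]
            if start + len(buf) == seg.seq:   # exactly contiguous: extend the run
                buf.extend(seg.payload)
                continue
        # New flow, or a gap/reordering: start a fresh run instead of gluing
        open_run[seg.flow] = len(merged)
        merged.append([seg.flow, seg.seq, bytearray(seg.payload)])
    return [(flow, start, bytes(buf)) for flow, start, buf in merged]


A = ("10.0.0.1", 1234, "10.0.0.2", 80, "tcp")
B = ("10.0.0.3", 5678, "10.0.0.2", 80, "tcp")
out = coalesce([
    Segment(A, 0, b"abc"),
    Segment(A, 3, b"def"),
    Segment(B, 100, b"xy"),
    Segment(A, 10, b"zz"),   # gap after byte 6, so it is not merged
])
print(out)
```

In this toy model the sequence number supplies the reassembly context. A transport whose headers carry no such number would need some other marking to play that role.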
>>>>>> 
>>>>>>> There's also another possibility with IPv6-- use jumbograms. For
>>>>>>> instance, instead of GRO reassembling segments up to a 64K packet,
>>>>>>> it could be modified to reassemble up to a 4G packet using IPv6
>>>>>>> jumbograms where one really big packet is given to the stack.
>>>>>>> 
>>>>>>> But we probably don't even need jumbograms for that. In Linux, GRO
>>>>>>> might be taught to reassemble up to a 4G super packet and set a flag
>>>>>>> bit in the skbuf to ignore the IP payload length field and get the length
>>>>>>> from the skbuf len field (as though a jumbogram was received).
>>>>>>> This trick would work for IPv4 and IPv6 and GSO as well. It should
>>>>>>> also work for TSO if the device takes the IP payload length to be that for each segment.
>>>>>> 
>>>>>> Yes, I was planning to give that a try to see what kind of
>>>>>> performance can be gotten with GSO/GRO when you exceed 64KB. But,
>>>>>> my concern with GSO/GRO is that the reassembly is (relatively)
>>>>>> unguided and haphazard and can result in mis-ordered
>>>>>> concatenations. And, there is no protocol by which the GRO receiver
>>>>>> can verify that the things it is gluing together actually originated
>>>>>> from a sender that is earnestly doing GSO. So, I do not see how
>>>>>> GSO/GRO as I see it in the implementation could be made into a
>>>>>> standard, whereas there is a clear path for standardizing IP parcels.
>>>>>> 
>>>>>> Another thing I forgot to mention is that in my experiments with
>>>>>> GSO/GRO I found that it won't let me set a GSO segment size that
>>>>>> would cause the resulting IP packets to exceed the path MTU (i.e., it won't allow fragmentation).
>>>>>> I fixed that by configuring IPv4-in-IPv6 encapsulation per RFC2473
>>>>>> and then allowed the IPv6 layer to apply fragmentation to the encapsulated packet.
>>>>>> That way, I can use IPv4 GSO segment sizes up to ~64KB.
>>>>>> 
>>>>>> Fred
>>>>>> 
>>>>>>> 
>>>>>>> Tom
>>>>>>> 
>>>>>>>> Thanks - Fred
>>>>>>>> 
>>>>>>>>> Tom
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Fred
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> From: Tom Herbert [mailto:tom@herbertland.com]
>>>>>>>>>> Sent: Wednesday, March 23, 2022 9:37 AM
>>>>>>>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
>>>>>>>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area
>>>>>>>>>> <int-area@ietf.org>; lars@eggert.org
>>>>>>>>>> Subject: Re: [EXTERNAL] Re: [Int-area] IP Parcels improves
>>>>>>>>>> performance for end systems
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Mar 23, 2022, 9:54 AM Templin (US), Fred L <Fred.L.Templin@boeing.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Tom,
>>>>>>>>>> 
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Tom Herbert [mailto:tom@herbertland.com]
>>>>>>>>>>> Sent: Wednesday, March 23, 2022 6:19 AM
>>>>>>>>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
>>>>>>>>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area@ietf.org;
>>>>>>>>>>> lars@eggert.org
>>>>>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for
>>>>>>>>>>> end systems
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Mar 22, 2022 at 10:38 AM Templin (US), Fred L
>>>>>>>>>>> <Fred.L.Templin@boeing.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Tom, see below:
>>>>>>>>>>>> 
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Tom Herbert [mailto:tom@herbertland.com]
>>>>>>>>>>>>> Sent: Tuesday, March 22, 2022 10:00 AM
>>>>>>>>>>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
>>>>>>>>>>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area@ietf.org
>>>>>>>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for
>>>>>>>>>>>>> end systems
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Mar 22, 2022 at 7:42 AM Templin (US), Fred L
>>>>>>>>>>>>> <Fred.L.Templin@boeing.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Lars, I did a poor job of answering your question. One of
>>>>>>>>>>>>>> the most important aspects of IP Parcels in relation to TSO and
>>>>>>>>>>>>>> GSO/GRO is that transports get to use a full 4MB buffer instead
>>>>>>>>>>>>>> of the 64KB limit in current practices. This is possible due to
>>>>>>>>>>>>>> the IP Parcel jumbo payload option encapsulation which provides
>>>>>>>>>>>>>> a 32-bit length field instead of just a 16-bit.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> By allowing the transport to present the IP layer with a
>>>>>>>>>>>>>> buffer of up to 4MB, it reduces the overhead, minimizes system
>>>>>>>>>>>>>> calls and interrupts, etc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So, yes, IP Parcels is very much about improving the
>>>>>>>>>>>>>> performance for end systems in comparison with current practice
>>>>>>>>>>>>>> (GSO/GRO and TSO).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Fred,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The nice thing about TSO/GSO/GRO is that they don't require
>>>>>>>>>>>>> any changes to the protocol, as they are just implementation
>>>>>>>>>>>>> techniques. They are also one-sided optimizations, meaning for
>>>>>>>>>>>>> instance that TSO can be used at the sender without requiring GRO to be used at the receiver.
>>>>>>>>>>>>> My understanding is that IP parcels requires a new protocol
>>>>>>>>>>>>> that would need to be implemented on both endpoints and possibly in some routers.
>>>>>>>>>>>> 
>>>>>>>>>>>> It is not entirely true that the protocol needs to be
>>>>>>>>>>>> implemented on both endpoints. Sources that send IP Parcels
>>>>>>>>>>>> send them into a Parcel-capable path which ends at either the
>>>>>>>>>>>> final destination or a router for which the next hop is not
>>>>>>>>>>>> Parcel-capable. If the Parcel-capable path extends all the
>>>>>>>>>>>> way to the final destination, then the Parcel is delivered to
>>>>>>>>>>>> the destination which knows how to deal with it. If the
>>>>>>>>>>>> Parcel-capable path ends at a router somewhere in the middle,
>>>>>>>>>>>> the router opens the Parcel and sends each enclosed segment
>>>>>>>>>>>> as an independent IP packet. The final destination is then
>>>>>>>>>>>> free to apply GRO to the incoming IP packets even if it does not understand Parcels.
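The "router opens the Parcel" step described above can be sketched as follows. The layout is an assumption made purely for illustration: the parcel payload is treated as a plain concatenation of segments that all share one segment length, with only the final segment allowed to be shorter. The thread does not spell out the actual wire format, and `open_parcel` is a hypothetical name.

```python
def open_parcel(payload: bytes, segment_len: int) -> list[bytes]:
    """Split a concatenated parcel payload into its segments.

    Assumed layout (illustrative only): equal-length segments laid end to
    end, where only the last segment may be shorter than segment_len.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [payload[i:i + segment_len] for i in range(0, len(payload), segment_len)]


# A 10-byte payload with 4-byte segments yields segments of 4, 4, and 2 bytes.
segments = open_parcel(b"0123456789", 4)
print(segments)

# Each segment would then be forwarded as an independent IP packet;
# concatenating them again recovers the original payload.
print(b"".join(segments) == b"0123456789")
```

A router at the end of the parcel-capable path would attach ordinary IP and transport headers to each segment before forwarding. Those header details are omitted here because the thread does not specify them.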
>>>>>>>>>>>> 
>>>>>>>>>>>> IP Parcels is about efficient shipping and handling just like
>>>>>>>>>>>> the major online retailer service model I described during
>>>>>>>>>>>> the talk. The goal is to deliver the fewest and largest
>>>>>>>>>>>> possible parcels to the final destination rather than
>>>>>>>>>>>> delivering lots of small IP packets. It is good for the
>>>>>>>>>>>> network and good for the end systems both. If this were not
>>>>>>>>>>>> true, then Amazon would send the consumer 50 small boxes with
>>>>>>>>>>>> 1 item each instead of 1 larger box with all
>>>>>>>>>>>> 50 items inside. And, we all know what they would choose to do.
>>>>>>>>>>>> 
>>>>>>>>>>>>> Do you have data that shows the benefits of IP Parcels in
>>>>>>>>>>>>> light of these requirements?
>>>>>>>>>>>> 
>>>>>>>>>>>> I have data that shows that GSO/GRO is good for packaging
>>>>>>>>>>>> sizes up to 64KB even if the enclosed segments will require IP fragmentation upon transmission.
>>>>>>>>>>>> The data implies that even larger packaging sizes (up to a
>>>>>>>>>>>> maximum of 4MB) would be better still.
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Fred,
>>>>>>>>>>> 
>>>>>>>>>>> You seem to be only looking at the problem from a per packet
>>>>>>>>>>> cost point of view. There is also per byte cost, particularly
>>>>>>>>>>> in the computation of the TCP/UDP checksum. The cost is hidden
>>>>>>>>>>> in modern implementations by checksum offload, and for
>>>>>>>>>>> segmentation offload we have methods to preserve the utility
>>>>>>>>>>> of checksum offload. IP parcels will have to also leverage
>>>>>>>>>>> checksum offload, because if the checksum is not offloaded
>>>>>>>>>>> then the cost of computing the payload checksum in CPU would
>>>>>>>>>>> dwarf any benefits we'd get by using segments larger than 64K.
>>>>>>>>>> 
>>>>>>>>>> There is plenty of opportunity to apply hardware checksum
>>>>>>>>>> offload since the structure of a Parcel will be very standard.
>>>>>>>>>> My experiments have been with a protocol called LTP which is
>>>>>>>>>> layered over UDP/IP as some other upper layer protocols are.
>>>>>>>>>> LTP includes a segment-by-segment checksum that is used at its
>>>>>>>>>> level in the absence of lower layer integrity checks, so for
>>>>>>>>>> larger Parcels LTP would use that and turn off UDP checksums altogether.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> You can't turn off UDP checksums for IPv6 (except for the narrow case of encapsulation).
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> As far as I am aware, there are currently no hardware checksum
>>>>>>>>>> offload implementations available for calculating the LTP
>>>>>>>>>> checksums.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> If it's a standard per packet Internet checksum then a lot of HW could do it. If it's something like CRC32 then probably not.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> LTP is a nice experiment, but I'm more interested as to the interaction between IP parcels and TCP or QUIC.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Speaking of standard, AFAICT GSO/GRO are doing something very
>>>>>>>>>> non-standard. GSO seems to be coding the IP ID field in the
>>>>>>>>>> IPv4 headers of packets with DF=1 which goes against RFC 6864.
>>>>>>>>>> When DF=1, GSO cannot simply claim the IP ID and code it as if
>>>>>>>>>> there were some sort of protocol. Or, if it does, there would
>>>>>>>>>> be no way to standardize it.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> There was quite a bit of work and discussion on this in Linux.
>>>>>>>>>> I believe the deviation from the standard was motivated by some
>>>>>>>>>> deployed devices that required the IPID be set on receive, and setting
>>>>>>>>>> IPID with DF equal to 1 is thought to be innocuous. You may want to
>>>>>>>>>> look at Alex Duyck's papers on UDP GSO, he wrote a lot of code in this area.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Tom
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Fred
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Tom
>>>>>>>>>>> 
>>>>>>>>>>>> Fred
>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Tom
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks - Fred
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Int-area mailing list
>>>>>>>>>>>>>> Int-area@ietf.org
>>>>>>>>>>>>>> https://www.ietf.org/mailman/listinfo/int-area
>>>>>>>> 
>>>>>> 
> 
> _______________________________________________
> Int-area mailing list
> Int-area@ietf.org
> https://www.ietf.org/mailman/listinfo/int-area