Re: [Int-area] IP Parcels improves performance for end systems

Tom Herbert <tom@herbertland.com> Thu, 24 March 2022 19:51 UTC

MIME-Version: 1.0
References: <90a1ce8325a448ab81f63c844f98d6a6@boeing.com> <bd1e4a5a-2d09-3875-2135-6f6b6743a9cf@joelhalpern.com>
In-Reply-To: <bd1e4a5a-2d09-3875-2135-6f6b6743a9cf@joelhalpern.com>
From: Tom Herbert <tom@herbertland.com>
Date: Thu, 24 Mar 2022 12:51:27 -0700
Message-ID: <CALx6S37yoA-Bmz0QPZX2_07SeLgZQAnbMUeDbkNNKXCvANeibw@mail.gmail.com>
To: "Joel M. Halpern" <jmh@joelhalpern.com>
Cc: "Templin (US), Fred L" <Fred.L.Templin@boeing.com>, int-area <int-area@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000a6f6f905dafc2cc9"
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/_XWFbez6c5wcmj91lOgT5widKSM>
Subject: Re: [Int-area] IP Parcels improves performance for end systems

On Thu, Mar 24, 2022, 3:11 PM Joel M. Halpern <jmh@joelhalpern.com> wrote:

> I do remember token ring.  (I was working from 1983 for folks who
> delivered 50 megabits starting in 1976, and built some of the best FDDI
> around at the time.)
>
> I am not claiming that increasing the MTU from 1500 to 9K did nothing.
> I am claiming that diminishing returns have distinctly set in.
> If the Data Center folks (who tend these days to have the highest
> demand) really want a 64K link, they would have one.


Joel,

Indeed. Google, for one, is looking into it, at least insofar as getting
bigger packets for GRO/GSO.
See https://netdevconf.info/0x15/session.html?BIG-TCP

Tom

> They don't.  They
> prefer to use Ethernet.
> The improvement via increasing the MTU further runs into many obstacles,
> including such issues as error detection code coverage, application
> desired communication size, retransmission costs, and on and on.
> Yes, they can all be overcome.   But the returns get smaller and smaller.
>
> So absent real evidence that there is a problem needing the network
> stack and protocol to change, I just don't see this (IP Parcels) as
> providing enough benefit to justify the work.
>
>
> Yours,
> Joel
>
> On 3/24/2022 3:05 PM, Templin (US), Fred L wrote:
> > Hi Joel,
> >
> >> -----Original Message-----
> >> From: Joel M. Halpern [mailto:jmh@joelhalpern.com]
> >> Sent: Thursday, March 24, 2022 11:41 AM
> >> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> >> Cc: int-area <int-area@ietf.org>
> >> Subject: Re: [Int-area] IP Parcels improves performance for end systems
> >>
> >> This exchange seems to assume facts not in evidence.
> >
> > It is a fact that back in the 1980's the architects took simple token ring,
> > changed the over-the-wire coding to 4B/5B, replaced the copper with
> > fiber and then boosted the MTU by a factor of 3 and called it FDDI. They
> > were able to claim what at the time was an astounding 100Mbps (i.e., in
> > comparison to the 10Mbps Ethernet of the day), but the performance
> > gain was largely due to the increase in the MTU. They told me: "Fred,
> > go figure out the path MTU problem", and they said: "go talk to Jeff
> > Mogul out in Palo Alto who knows something about it". But, then, the
> > Path MTU discovery group took a left turn at Albuquerque and left the
> > Internet as a tiny MTU wasteland. We have the opportunity to fix all
> > of that now - so, let's get it right for once.
> >
> > Fred
> >
> >
> >>
> >> And the whole premise is spending resources in other parts of the
> >> network for a marginal diminishing return in the hosts.
> >>
> >> It simply does not add up.
> >>
> >> Yours,
> >> Joel
> >>
> >> On 3/24/2022 2:19 PM, Templin (US), Fred L wrote:
> >>>> The category 1) links are not yet in existence, but once parcels
> start to
> >>>> enter the mainstream innovation will drive the creation of new kinds
> of
> >>>> data links (1TB Ethernet?) that will be rolled out as new hardware.
> >>>
> >>> I want to put a gold star next to the above. AFAICT, pushing the MTU
> and
> >>> implementing IP parcels can get us to 1TB Ethernet practically
> overnight.
> >>> Back in the 1980's, FDDI proved that pushing to larger MTUs could boost
> >>> throughput without changing the speed of light, so why wouldn't the
> same
> >>> concept work for Ethernet in the modern era?
> >>>
> >>> Fred
> >>>
> >>>> -----Original Message-----
> >>>> From: Int-area [mailto:int-area-bounces@ietf.org] On Behalf Of
> Templin (US), Fred L
> >>>> Sent: Thursday, March 24, 2022 9:45 AM
> >>>> To: Tom Herbert <tom@herbertland.com>
> >>>> Cc: int-area <int-area@ietf.org>; Eggert, Lars <lars@netapp.com>;
> lars@eggert.org
> >>>> Subject: Re: [Int-area] IP Parcels improves performance for end
> systems
> >>>>
> >>>> Hi Tom - responses below:
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Tom Herbert [mailto:tom@herbertland.com]
> >>>>> Sent: Thursday, March 24, 2022 9:09 AM
> >>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> >>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area <int-area@ietf.org>;
> lars@eggert.org
> >>>>> Subject: Re: [Int-area] IP Parcels improves performance for end
> systems
> >>>>>
> >>>>> On Thu, Mar 24, 2022 at 7:27 AM Templin (US), Fred L
> >>>>> <Fred.L.Templin@boeing.com> wrote:
> >>>>>>
> >>>>>> Tom - see below:
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Tom Herbert [mailto:tom@herbertland.com]
> >>>>>>> Sent: Thursday, March 24, 2022 6:22 AM
> >>>>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> >>>>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area <int-area@ietf.org>;
> lars@eggert.org
> >>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end
> systems
> >>>>>>>
> >>>>>>> On Wed, Mar 23, 2022 at 10:47 AM Templin (US), Fred L
> >>>>>>> <Fred.L.Templin@boeing.com> wrote:
> >>>>>>>>
> >>>>>>>> Tom, looks like you have switched over to HTML which can be a
> real conversation-killer.
> >>>>>>>>
> >>>>>>>> But, to some points you raised that require a response:
> >>>>>>>>
> >>>>>>>>> You can't turn off UDP checksums for IPv6 (except for the narrow case of encapsulation).
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> That sounds like a good reason to continue to use IPv4 – at least
> as far as end system
> >>>>>>>>
> >>>>>>>> addressing is concerned – right?
> >>>>>>>
> >>>>>>>
> >>>>>>> Not at all. All NICs today provide checksum offload and so it's
> >>>>>>> basically zero cost to perform the UDP checksum. The fact that we
> >>>>>>> don't have to do extra checks on the UDPv6 checksum field to see if
> >>>>>>> it's zero actually is a performance improvement over UDPv4. (btw, I
> >>>>>>> will present implementation of the Internet checksum at TSVGWG
> Friday,
> >>>>>>> this will include discussion of checksum offloads).
> >>>>>>
> >>>>>> Actually, my assertion wasn't good to begin with because for IPv6
> even if UDP
> >>>>>> checksums are turned off the OMNI encapsulation layer includes a
> checksum
> >>>>>> that ensures the integrity of the IPv6 header. UDP checksums off
> for IPv6 when
> >>>>>> OMNI encapsulation is used is perfectly fine.
> >>>>>>
> >>>>> I assume you are referring to RFC6935 and RFC6936, which allow the UDPv6
> >>>>> checksum to be zero for tunneling under a very constrained set of conditions.
> >>>>>
> >>>>>>>>> If it's a standard per packet Internet checksum then a lot of HW
> could do it. If it's something like CRC32 then probably not.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> The integrity check is covered in RFC5327, and I honestly haven’t
> had a chance to
> >>>>>>>>
> >>>>>>>> look at that myself yet.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> LTP is a nice experiment, but I'm more interested as to the
> interaction between IP parcels and TCP or QUIC.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Please be aware that while LTP may seem obscure at the moment
> that may be changing now
> >>>>>>>>
> >>>>>>>> that the core DTN standards have been published. As DTN use
> becomes more widespread I
> >>>>>>>>
> >>>>>>>> think we can see LTP also come into wider adoption.
> >>>>>>>
> >>>>>>>
> >>>>>>>    My assumption is that IP parcels is intended to be a general solution
> >>>>>>> for all protocols. Maybe in the next draft you could discuss the
> >>>>>>> details of TCP in IP parcels including how to offload the TCP
> >>>>>>> checksum.
> >>>>>>
> >>>>>> I could certainly add that. For TCP, each of the concatenated
> segments would
> >>>>>> include its own TCP header with checksum field included. Any
> hardware that
> >>>>>> knows the structure of an IP Parcel can then simply do the TCP
> checksum
> >>>>>> offload function for each segment.
> >>>>>
> >>>>> To be honest, the odds of ever getting support in NIC hardware for IP
> >>>>> parcels are extremely slim. Hardware vendors are driven by economics,
> >>>>> so the only way they would do that would be to demonstrate widespread
> >>>>> deployment of the protocol. But even then, with all the legacy
> >>>>> hardware in deployment it will take many years before there's any
> >>>>> appreciable traction. IMO, the better approach is to figure out how
> to
> >>>>> leverage the existing hardware features for use with IP parcels.
> >>>>
> >>>> There will be two kinds of links that will need to be
> "Parcel-capable":
> >>>> 1) Edge network (physical) links that natively forward large parcels,
> and
> >>>> 2) OMNI (virtual) links that forward parcels using encapsulation and
> >>>> fragmentation.
> >>>>
> >>>> The category 1) links are not yet in existence, but once parcels
> start to
> >>>> enter the mainstream innovation will drive the creation of new kinds
> of
> >>>> data links (1TB Ethernet?) that will be rolled out as new hardware.
> And
> >>>> that new hardware can be made to understand the structure of parcels
> >>>> from the beginning. The category 2) links might take a large parcel
> from
> >>>> the upper layers on the local node (or one that has been forwarded by
> >>>> a parcel-capable link) and break it down into smaller sub-parcels then
> >>>> apply IP fragmentation to each sub-parcel and send the fragments to an
> >>>> OMNI link egress node. You know better than me how checksum offload
> >>>> could be applied in an environment like that.
> >>>>
> >>>>>>>>> There was quite a bit of work and discussion on this in Linux. I believe the deviation from the standard was motivated by some
> >>>>>>>>> deployed devices that required the IPID be set on receive, and setting IPID with DF equal to 1 is thought to be innocuous. You may
> >>>>>>>>> want to look at Alex Duyck's papers on UDP GSO; he wrote a lot of code in this area.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> RFC6864 has quite a bit to say about coding IP ID with DF=1 –
> mostly in the negative.
> >>>>>>>>
> >>>>>>>> But, what I have seen in the linux code seems to indicate that
> there is not even any
> >>>>>>>>
> >>>>>>>> coordination between the GSO source and the GRO destination –
> instead, GRO simply
> >>>>>>>>
> >>>>>>>> starts gluing together packets that appear to have consecutive IP
> IDs without ever first
> >>>>>>>>
> >>>>>>>> checking that they were sent by a peer that was earnestly doing
> GSO. These aspects
> >>>>>>>>
> >>>>>>>> would make it very difficult to work GSO/GRO into an IETF
> standard, plus it doesn’t
> >>>>>>>>
> >>>>>>>> work for IPv6 at all where there is no IP ID included by default.
> IP Parcels addresses
> >>>>>>>>
> >>>>>>>> all of these points, and can be made into a standard.
> >>>>>>>
> >>>>>>>
> >>>>>>> Huh? GRO/GSO works perfectly fine with IPV6.
> >>>>>>
> >>>>>> Where is the spec for that? My understanding is that GSO/GRO
> leverages the
> >>>>>> IP ID for IPv4. But, for IPv6, there is no IP ID unless you include
> a Fragment Header.
> >>>>>> Does IPv6 somehow do GSO/GRO differently?
> >>>>>>
> >>>>>
> >>>>> GRO and GSO don't use the IPID to match a flow. The primary match is
> >>>>> the TCP 4-tuple.
> >>>>
> >>>> Correct, the 5-tuple (src-ip, src-port, dst-ip, dst-port, proto) is what is used
> >>>> to match the flow. But, you need more than that in order to correctly
> paste
> >>>> back together with GRO the segments of an original ULP buffer that was
> >>>> broken down by GSO - you need Identifications and/or other markings in
> >>>> the IP headers to give a reassembly context. Otherwise, GRO might end
> >>>> up gluing together old and new pieces of ULP data and/or impart a lot
> of
> >>>> reordering. IP Parcels have well behaved Identifications and Parcel
> IDs so
> >>>> that the original ULP buffer context is honored during reassembly.
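
The disagreement above is about what a receiver needs beyond the flow tuple in order to glue segments back together safely. As a rough illustration only, the sketch below models coalescing for a TCP-like flow: segments are matched on the 5-tuple and merged only when the next segment starts exactly where the aggregate ends. The `Segment` type, the `coalesce` function, and the contiguity rule are assumptions made for this example; this is not the Linux GRO code, it ignores sequence-number wraparound, and it says nothing about how IP Parcels would mark segments.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (src-ip, src-port, dst-ip, dst-port, proto), as listed in the thread.
FlowKey = Tuple[str, int, str, int, str]


@dataclass
class Segment:
    flow: FlowKey
    seq: int        # sequence number of the first payload byte (hypothetical field)
    payload: bytes


def coalesce(segments: List[Segment]) -> List[Segment]:
    """Merge arriving segments per flow while they remain contiguous."""
    finished: List[Segment] = []
    open_by_flow: Dict[FlowKey, Segment] = {}
    for seg in segments:
        agg = open_by_flow.get(seg.flow)
        if agg is not None and agg.seq + len(agg.payload) == seg.seq:
            # Same flow and exactly contiguous: extend the open aggregate.
            agg.payload += seg.payload
            continue
        if agg is not None:
            # Gap or reordering: close the old aggregate, start a new one.
            finished.append(agg)
        open_by_flow[seg.flow] = Segment(seg.flow, seg.seq, seg.payload)
    finished.extend(open_by_flow.values())
    return finished


if __name__ == "__main__":
    flow = ("192.0.2.1", 1234, "192.0.2.2", 80, "tcp")
    arrived = [
        Segment(flow, 0, b"a" * 100),
        Segment(flow, 100, b"b" * 100),
        Segment(flow, 300, b"c" * 50),   # bytes 200..299 are missing
    ]
    for agg in coalesce(arrived):
        print(agg.seq, len(agg.payload))
```

In this toy model the sequence number plays the role of the reassembly context; a datagram protocol without such a field would need some other marking, which is the gap the Identification and Parcel ID argument is pointing at.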
> >>>>
> >>>>> There's also another possibility with IPv6-- use jumbograms. For
> >>>>> instance, instead of GRO reassembling segments up to a 64K packet, it
> >>>>> could be modified to reassemble up to a 4G packet using IPv6
> >>>>> jumbograms where one really big packet is given to the stack.
> >>>>>
> >>>>> But we probably don't even need jumbograms for that. In Linux, GRO
> >>>>> might be taught to reassemble up to 4G super packet and set a flag
> bit
> >>>>> in the skbuf to ignore the IP payload field and get the length from
> >>>>> the skbuf len field (as though a jumbogram was received). This trick
> >>>>> would work for IPv4 and IPv6 and GSO as well. It should also work with TSO
> >>>>> if the device takes the IP payload length to be that for each segment.
> >>>>
> >>>> Yes, I was planning to give that a try to see what kind of performance
> >>>> can be gotten with GSO/GRO when you exceed 64KB. But, my concern
> >>>> with GSO/GRO is that the reassembly is (relatively) unguided and
> >>>> haphazard and can result in mis-ordered concatenations. And, there is
> >>>> no protocol by which the GRO receiver can imply that the things it is
> >>>> gluing together actually originated from a sender that is earnestly
> doing
> >>>> GSO. So, I do not see how GSO/GRO as I see it in the implementation
> >>>> could be made into a standard, whereas there is a clear path for
> >>>> standardizing IP parcels.
> >>>>
> >>>> Another thing I forgot to mention is that in my experiments with
> GSO/GRO
> >>>> I found that it won't let me set a GSO segment size that would cause
> the
> >>>> resulting IP packets to exceed the path MTU (i.e., it won't allow
> fragmentation).
> >>>> I fixed that by configuring IPv4-in-IPv6 encapsulation per RFC2473
> and then
> >>>> allowed the IPv6 layer to apply fragmentation to the encapsulated
> packet.
> >>>> That way, I can use IPv4 GSO segment sizes up to ~64KB.
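
To give a feel for what that workaround costs on the wire, here is a small arithmetic sketch of how many IPv6 fragments one encapsulated packet turns into. It assumes a plain RFC2473-style tunnel packet with no extension headers other than the Fragment header, so each fragment carries a 40-byte IPv6 header plus an 8-byte Fragment header, and every fragment except the last carries a multiple of 8 payload bytes. The function name and the MTU values are illustrative, not taken from the thread.

```python
IPV6_HEADER = 40       # fixed IPv6 header, in bytes
FRAGMENT_HEADER = 8    # IPv6 Fragment extension header, in bytes


def ipv6_fragment_count(inner_packet_len: int, path_mtu: int) -> int:
    """Fragments needed to carry one encapsulated packet over path_mtu."""
    # Payload room per fragment, rounded down to a multiple of 8 bytes.
    per_fragment = (path_mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("path MTU too small to carry any fragment payload")
    # Integer ceiling division.
    return -(-inner_packet_len // per_fragment)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, ipv6_fragment_count(65535, mtu))
```

Under these assumptions a maximum-size 65535-byte IPv4 packet becomes 46 fragments at a 1500-byte path MTU and 8 fragments at 9000 bytes.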
> >>>>
> >>>> Fred
> >>>>
> >>>>>
> >>>>> Tom
> >>>>>
> >>>>>> Thanks - Fred
> >>>>>>
> >>>>>>> Tom
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Fred
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> From: Tom Herbert [mailto:tom@herbertland.com]
> >>>>>>>> Sent: Wednesday, March 23, 2022 9:37 AM
> >>>>>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> >>>>>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area <int-area@ietf.org>;
> lars@eggert.org
> >>>>>>>> Subject: Re: [EXTERNAL] Re: [Int-area] IP Parcels improves
> performance for end systems
> >>>>>>>>
> >>>>>>>> On Wed, Mar 23, 2022, 9:54 AM Templin (US), Fred L <
> Fred.L.Templin@boeing.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi Tom,
> >>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: Tom Herbert [mailto:tom@herbertland.com]
> >>>>>>>>> Sent: Wednesday, March 23, 2022 6:19 AM
> >>>>>>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> >>>>>>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area@ietf.org;
> lars@eggert.org
> >>>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end
> systems
> >>>>>>>>>
> >>>>>>>>> On Tue, Mar 22, 2022 at 10:38 AM Templin (US), Fred L
> >>>>>>>>> <Fred.L.Templin@boeing.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Tom, see below:
> >>>>>>>>>>
> >>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>> From: Tom Herbert [mailto:tom@herbertland.com]
> >>>>>>>>>>> Sent: Tuesday, March 22, 2022 10:00 AM
> >>>>>>>>>>> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> >>>>>>>>>>> Cc: Eggert, Lars <lars@netapp.com>; int-area@ietf.org
> >>>>>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for
> end systems
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Mar 22, 2022 at 7:42 AM Templin (US), Fred L
> >>>>>>>>>>> <Fred.L.Templin@boeing.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Lars, I did a poor job of answering your question. One of the
> most important aspects of
> >>>>>>>>>>>>
> >>>>>>>>>>>> IP Parcels in relation to TSO and GSO/GRO is that transports
> get to use a full 4MB buffer
> >>>>>>>>>>>>
> >>>>>>>>>>>> instead of the 64KB limit in current practices. This is
> possible due to the IP Parcel jumbo
> >>>>>>>>>>>>
> >>>>>>>>>>>> payload option encapsulation which provides a 32-bit length
> field instead of just a 16-bit.
> >>>>>>>>>>>>
> >>>>>>>>>>>> By allowing the transport to present the IP layer with a
> buffer of up to 4MB, it reduces
> >>>>>>>>>>>>
> >>>>>>>>>>>> the overhead, minimizes system calls and interrupts, etc.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> So, yes, IP Parcels is very much about improving the
> performance for end systems in
> >>>>>>>>>>>>
> >>>>>>>>>>>> comparison with current practice (GSO/GRO and TSO).
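
The buffer-size argument above comes down to simple arithmetic, sketched below. A 16-bit length field tops out at 65535 bytes, while a 32-bit field could in principle encode far more; the 4MB figure is the limit quoted in the message and is taken here to mean 4 * 1024 * 1024 bytes, which is an assumption. The `handoffs` helper is hypothetical and only counts how many times a transport would hand a buffer to the IP layer, not the real per-call cost.

```python
MAX_16BIT_LEN = 2**16 - 1          # largest value of a 16-bit length field
MAX_32BIT_LEN = 2**32 - 1          # largest value of a 32-bit length field
PARCEL_LIMIT = 4 * 1024 * 1024     # the 4MB figure from the thread (assumed MiB)


def handoffs(total_bytes: int, buffer_limit: int) -> int:
    """Number of transport-to-IP handoffs needed to move total_bytes."""
    return -(-total_bytes // buffer_limit)   # integer ceiling division


if __name__ == "__main__":
    one_gib = 2**30
    print("64KB buffers:", handoffs(one_gib, MAX_16BIT_LEN))
    print("4MB buffers: ", handoffs(one_gib, PARCEL_LIMIT))
```

Moving 1 GiB takes 16385 handoffs with 65535-byte buffers and 256 with 4MB buffers, which is the per-packet saving being claimed; it does not account for per-byte costs such as checksumming.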
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Fred,
> >>>>>>>>>>>
> >>>>>>>>>>> The nice thing about TSO/GSO/GRO is that they don't require any
> >>>>>>>>>>> changes to the protocol, as they are just implementation techniques; also,
> >>>>>>>>>>> they're one-sided optimizations, meaning for instance that TSO can be
> >>>>>>>>>>> used at the sender without requiring GRO to be used at the
> receiver.
> >>>>>>>>>>> My understanding is that IP parcels requires new protocol that
> would
> >>>>>>>>>>> need to be implemented on both endpoints and possibly in some
> routers.
> >>>>>>>>>>
> >>>>>>>>>> It is not entirely true that the protocol needs to be
> implemented on both
> >>>>>>>>>> endpoints. Sources that send IP Parcels send them into a
> Parcel-capable path
> >>>>>>>>>> which ends at either the final destination or a router for
> which the next hop is
> >>>>>>>>>> not Parcel-capable. If the Parcel-capable path extends all the
> way to the final
> >>>>>>>>>> destination, then the Parcel is delivered to the destination
> which knows how
> >>>>>>>>>> to deal with it. If the Parcel-capable path ends at a router
> somewhere in the
> >>>>>>>>>> middle, the router opens the Parcel and sends each enclosed
> segment as an
> >>>>>>>>>> independent IP packet. The final destination is then free to
> apply GRO to the
> >>>>>>>>>> incoming IP packets even if it does not understand Parcels.
> >>>>>>>>>>
> >>>>>>>>>> IP Parcels is about efficient shipping and handling just like
> the major online
> >>>>>>>>>> retailer service model I described during the talk. The goal is
> to deliver the
> >>>>>>>>>> fewest and largest possible parcels to the final destination
> rather than
> >>>>>>>>>> delivering lots of small IP packets. It is good for the network
> and good for
> >>>>>>>>>> the end systems both. If this were not true, then Amazon would
> send the
> >>>>>>>>>> consumer 50 small boxes with 1 item each instead of 1 larger
> box with all
> >>>>>>>>>> 50 items inside. And, we all know what they would choose to do.
> >>>>>>>>>>
> >>>>>>>>>>> Do you have data that shows the benefits of IP Parcels in
> light of
> >>>>>>>>>>> these requirements?
> >>>>>>>>>>
> >>>>>>>>>> I have data that shows that GSO/GRO is good for packaging sizes
> up to 64KB
> >>>>>>>>>> even if the enclosed segments will require IP fragmentation
> upon transmission.
> >>>>>>>>>> The data implies that even larger packaging sizes (up to a
> maximum of 4MB)
> >>>>>>>>>> would be better still.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Fred,
> >>>>>>>>>
> >>>>>>>>> You seem to be only looking at the problem from a per packet cost
> >>>>>>>>> point of view. There is also per byte cost, particularly in the
> >>>>>>>>> computation of the TCP/UDP checksum. The cost is hidden in modern
> >>>>>>>>> implementations by checksum offload, and for segmentation
> offload we
> >>>>>>>>> have methods to preserve the utility of checksum offload. IP
> parcels
> >>>>>>>>> will have to also leverage checksum offload, because if the
> checksum
> >>>>>>>>> is not offloaded then the cost of computing the payload checksum
> in
> >>>>>>>>> CPU would dwarf any benefits we'd get by using segments larger
> than
> >>>>>>>>> 64K.
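
The per-byte cost mentioned above is visible in a plain software version of the Internet checksum: every payload byte passes through the loop once. The sketch below follows the one's-complement algorithm described in RFC 1071; the function name is chosen for this example, and a real stack would use an optimized or offloaded routine rather than a Python loop.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, then complemented."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")
    csum = internet_checksum(sample)
    print(hex(csum))
    # Appending the checksum makes the whole buffer verify to zero.
    print(internet_checksum(sample + csum.to_bytes(2, "big")))
```

Because the work grows linearly with payload size, a 4MB buffer costs roughly 64 times what a 64KB buffer does when computed on the CPU, which is why offload matters for the larger sizes.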
> >>>>>>>>
> >>>>>>>> There is plenty of opportunity to apply hardware checksum offload
> since
> >>>>>>>> the structure of a Parcel will be very standard. My experiments
> have been
> >>>>>>>> with a protocol called LTP which is layered over UDP/IP as some
> other
> >>>>>>>> upper layer protocols are. LTP includes a segment-by-segment
> checksum
> >>>>>>>> that is used at its level in the absence of lower layer integrity
> checks, so
> >>>>>>>> for larger Parcels LTP would use that and turn off UDP checksums
> >>>>>>>> altogether.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> You can't turn off UDP checksums for IPv6 (except for the narrow case of encapsulation).
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> As far as I am aware, there are currently no hardware
> >>>>>>>> checksum offload implementations available for calculating the
> >>>>>>>> LTP checksums.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> If it's a standard per packet Internet checksum then a lot of HW
> could do it. If it's something like CRC32 then probably not.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> LTP is a nice experiment, but I'm more interested as to the
> interaction between IP parcels and TCP or QUIC.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Speaking of standard, AFAICT GSO/GRO are doing something very
> >>>>>>>> non-standard. GSO seems to be coding the IP ID field in the IPv4
> >>>>>>>> headers of packets with DF=1 which goes against RFC 6864. When
> >>>>>>>> DF=1, GSO cannot simply claim the IP ID and code it as if there
> were
> >>>>>>>> some sort of protocol. Or, if it does, there would be no way to
> >>>>>>>> standardize it.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> There was quite a bit of work and discussion on this in Linux. I believe the deviation from the standard was motivated by some
> >>>>>>>> deployed devices that required the IPID be set on receive, and setting IPID with DF equal to 1 is thought to be innocuous. You may
> >>>>>>>> want to look at Alex Duyck's papers on UDP GSO; he wrote a lot of code in this area.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Tom
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Fred
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Tom
> >>>>>>>>>
> >>>>>>>>>> Fred
> >>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Tom
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks - Fred
> >>>>>>>>>>>>
> >>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>> Int-area mailing list
> >>>>>>>>>>>> Int-area@ietf.org
> >>>>>>>>>>>> https://www.ietf.org/mailman/listinfo/int-area
> >>>>>>
> >>>>
> >
>