Re: [Int-area] IP parcels

"Templin (US), Fred L" <Fred.L.Templin@boeing.com> Wed, 22 December 2021 00:25 UTC

From: "Templin (US), Fred L" <Fred.L.Templin@boeing.com>
To: Tom Herbert <tom@herbertland.com>
CC: "touch@strayalpha.com" <touch@strayalpha.com>, "int-area@ietf.org" <int-area@ietf.org>
Date: Wed, 22 Dec 2021 00:25:11 +0000
Message-ID: <23d97642b9954a9a9cce0342c5434ff8@boeing.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/2ueNRomtnBjJ_ah0g0Azz3B_dhQ>
Subject: Re: [Int-area] IP parcels

Tom, sorry getting ready to shut down for the evening but a new draft version is out.
Please see if this answers questions, or if not we can continue to discuss.

Thanks - Fred

> -----Original Message-----
> From: Tom Herbert [mailto:tom@herbertland.com]
> Sent: Tuesday, December 21, 2021 1:52 PM
> To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> Cc: touch@strayalpha.com; int-area@ietf.org
> Subject: [EXTERNAL] Re: IP parcels
> 
> 
> On Tue, Dec 21, 2021 at 1:17 PM Templin (US), Fred L
> <Fred.L.Templin@boeing.com> wrote:
> >
> > Tom, thanks for these questions. A simple picture might help understanding:
> >
> > A <- net #1 -> X <- Internet -> Y <- net #2 - > B
> >
> Fred,
> 
> Thanks. The picture is helpful.
> 
> Presumably, in this picture the typical scenario would be that the MTUs
> of net #1 and net #2 are greater than the MTU of the Internet, and
> could in fact be much greater. So A sends a large packet to B, and over
> its local network that packet gets the benefits of a larger MTU. Such
> packets could then be fragmented at X to send over the Internet. Y
> reassembles the packet to enjoy the benefits of a larger MTU on
> network #2.
> 
> If my understanding is correct, X is basically fragmenting packets
> whose length exceeds the MTU of the Internet. That seems to have the
> same functionality as IP fragmentation, but I suppose the point of IP
> parcels is that the packet may be larger than 64K or it allows
> intermediate nodes to do fragmentation with IPv6.
> 
> But what is happening at Y is interesting to me. Y is trying to
> opportunistically reassemble packets in the network. As I mentioned
> the problem with doing that is that Y may or may not see all the
> packets of a parcel, or an attacker could purposely send one-off
> fragments to try to exhaust memory, or there might just be too many
> user flows going through the device that are candidates for
> reassembly-- for any of these cases it seems pretty easy to move to a
> state that substantially reduces the benefits of in-network
> reassembly. And if a device does get into this mode where it's only
> able to reassemble some small fraction of the traffic, then it seems
> like the effect would be to introduce delay on parcel fragments that
> never actually get reassembled (which would seem to violate the
> principle that if an opportunistic optimization becomes ineffective,
> the effects should be no worse than had the optimization not existed at
> all). Do you have any data on the cost/benefits of reassembling at Y?
> 
> Tom
> 
> > Here, when A sends a parcel to B, it travels first over net #1 to a "first hop
> > middlebox" X. If the path MTU from A to X is smaller than the size of the
> > parcel, then A has to split the parcel into sub-parcels no larger than 64KB
> > and transmit each of the sub-parcels to X using IP fragmentation if necessary.
> > If the path MTU from A to X is at least as large as the parcel, then A can send
> > the whole parcel to X, i.e., even if the path MTU and parcel size exceed 64KB.
> >
> > Once X gets the sub-parcels, X should perform "sticky reassembly" by concatenating
> > sub-parcels of the same parcel opportunistically; but it does not need to strictly
> > reassemble the full parcel because the sub-parcels can go forward independently
> > if necessary. X then needs to somehow convey the parcel (or sub-parcels) over the
> > Internet to Y and the same considerations as above apply. Then, the same thing
> > happens for Y conveying finally on to B.
> >
> > So, in terms of fragmentation and reassembly at the IP layer, all that is required
> > is that all of A, X, Y and B are capable of reassembling up to 64KB since that is the
> > largest size IP packet that can be reassembled. If any or all of the hops support a
> > larger packet size (jumbo) then great and the parcel can be forwarded in one piece.
> >
> > But, at the parcel layer it is somewhat different. Breaking a parcel into sub-parcels
> > means that a single parcel can be broken down to a maximum of 64 sub-parcels.
> > Each of these sub-parcels should be marked with a common short Parcel ID plus
> > a monotonically incrementing 32-bit packet Identifier so the next hop can know
> > that they were all originally part of the same parcel. Then, the next hop applies
> > "opportunistic reassembly" as you called it and tries to paste the parcel back together.
> > If the whole parcel gets reassembled to its original form, then that is great; otherwise,
> > it can continue forward as sub-parcels to the next hop.
> >
> > And it continues in this fashion from source to destination. If the "sticky" parcel
> > reassembly works well then the whole original parcel might be delivered to the
> > final destination in one piece. But, in the worst case, the final destination might
> > get up to 64 singleton parcels - and then, it can do "sticky reassembly" on these
> > since it is the final destination!
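
A minimal sketch of the sub-parcel bookkeeping described above, offered only as an illustration and not as the draft's algorithm. It assumes a parcel is a list of transport segments, that each sub-parcel carries the shared Parcel ID plus an Identifier that increases by one per piece, and that a hop merges only pieces whose Identifiers are consecutive. The names SubParcel, split_parcel and sticky_reassemble are hypothetical.

```python
from dataclasses import dataclass
from typing import List

MAX_PIECES = 64  # the thread's stated maximum number of sub-parcels per parcel


@dataclass
class SubParcel:
    parcel_id: int         # short ID common to every piece of one parcel
    identifier: int        # 32-bit packet Identifier, one step per piece
    segments: List[bytes]  # the transport segments carried by this piece


def split_parcel(parcel_id: int, first_identifier: int,
                 segments: List[bytes], per_piece: int) -> List[SubParcel]:
    """Break a parcel into sub-parcels of at most `per_piece` segments each."""
    if per_piece < 1 or not 1 <= len(segments) <= MAX_PIECES:
        raise ValueError("need 1..64 segments and per_piece >= 1")
    pieces = []
    for n, start in enumerate(range(0, len(segments), per_piece)):
        identifier = (first_identifier + n) & 0xFFFFFFFF
        pieces.append(SubParcel(parcel_id, identifier,
                                segments[start:start + per_piece]))
    return pieces


def sticky_reassemble(pieces: List[SubParcel]) -> List[SubParcel]:
    """Concatenate whatever adjacent pieces are on hand; never wait for gaps."""
    merged: List[SubParcel] = []
    expected = None  # Identifier that would extend the last merged piece
    # Simplification: plain sorting ignores 32-bit Identifier wraparound.
    for piece in sorted(pieces, key=lambda p: (p.parcel_id, p.identifier)):
        if merged and merged[-1].parcel_id == piece.parcel_id \
                and piece.identifier == expected:
            merged[-1].segments.extend(piece.segments)
        else:
            merged.append(SubParcel(piece.parcel_id, piece.identifier,
                                    list(piece.segments)))
        expected = (piece.identifier + 1) & 0xFFFFFFFF
    return merged
```

If every piece is present, sticky_reassemble returns one sub-parcel holding the original segment list; if a middle piece is missing, the pieces on either side of the gap come back unmerged and can go forward independently, which is the "sticky" behavior described in the text.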
> >
> > In fact, the whole reason for doing "sticky reassembly" at middleboxes is to make
> > the largest possible parcels to send to next hops where the path MTU is greater
> > than 64K. We don't have many paths like that yet, but parcels might provide
> > motivation for trending in that direction.
> >
> > I was originally thinking I would capture all this in the OMNI spec instead of the
> > IP parcels spec, but I see that a lot of this should probably also go in IP parcels.
> > What do you think?
> >
> > Thanks - Fred
> >
> > > -----Original Message-----
> > > From: Tom Herbert [mailto:tom@herbertland.com]
> > > Sent: Tuesday, December 21, 2021 10:01 AM
> > > To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> > > Cc: touch@strayalpha.com; int-area@ietf.org
> > > Subject: Re: IP parcels
> > >
> > > On Tue, Dec 21, 2021 at 6:24 AM Templin (US), Fred L
> > > <Fred.L.Templin@boeing.com> wrote:
> > > >
> > > > Tom, reading your message makes me think you have not read my drafts. The
> > > > answers to the perceived issues you are raising are all there. I do not see anything
> > > > new in what you are saying to make me believe otherwise.
> > > >
> > > >
> > >
> > > Fred,
> > >
> > > I did read your draft. I might be misunderstanding it. Here are points
> > > I don't think I understand; if these are in the draft, please reference
> > > the precise section:
> > >
> > > - A clear explanation why GSO/GRO are not sufficient to solve the
> > > problem. The draft highlights these to show the advantages of
> > > sending/receiving large data units, but it's not clear to me why a
> > > change to the protocol is required to get the same or somehow better
> > > effect. As a side note, it should be pointed out that GSO/GRO and
> > > similar mechanisms are opportunistic optimizations. For instance, the
> > > TCP congestion window and receive window have to be large enough for
> > > GSO to be effective for sending on a connection. As an anecdote, there
> > > was an incident early on at YouTube where they extensively used GSO
> > > for serving video which under normal circumstances is a great savings
> > > in CPU utilization. But one day there was a hiccup on the Internet
> > > that caused all the connections to go to slow start. So now instead of
> > > sending 64K at a time the servers were sending two segments at a time
> > > for all the connections (this was before the work to raise initcwnd);
> > > so now instead of servers running at 50% CPU, they needed 150% CPU and
> > > so were dropping a lot of packets, and recovering took quite a bit of
> > > time making unhappy customers (moral of this story: always provision
> > > your servers to handle the worst case scenario where opportunistic
> > > optimizations become ineffective!)
> > >
> > > - What is the exact algorithm for reassembly of parcels? Searching the
> > > document for reassemble only comes up with "when the OAL source or
> > > final destination receives the fragments or whole parcels, it
> > > reassembles if necessary"
> > >
> > > - What are the ramifications of middleboxes performing reassembly on
> > > behalf of a host? The document says "then rejoined into one or more
> > > parcels at a last-hop middlebox to be forwarded to the final
> > > destination". I'm not sure what a "last-hop middlebox" means in a
> > > normative context, but this does appear to be an intermediate network
> > > node which would seem to be susceptible to the issues of stateful
> > > intermediate network nodes that I previously raised.
> > >
> > > - Is in-order delivery of segments within a parcel maintained? The
> > > draft states "While not desirable, reordering of segments within
> > > parcels and individual segment loss are possible.  But, what matters
> > > is that the number of parcels delivered to the final destination
> > > should be kept to a minimum, and that loss or receipt of individual
> > > segments (and not parcel size) determines the retransmission unit".
> > > Why is it so critical that the number of parcels delivered to the
> > > final destination be kept to a minimum? As I mentioned, hosts are
> > > already used to dealing with reassembly, it seems like the best method
> > > is still to send packets at path MTU which is what TCP is doing with
> > > PMTUD.
> > >
> > > - How are IP parcels substantially different from fragmentation?  Is
> > > the idea that individual segments in an IP parcel can be lost without
> > > losing the whole parcel? Is the idea that parcels can make up a >64K
> > > super packet? What if a segment in a parcel is greater than an MTU in
> > > the path, is an intermediate node breaking up a parcel expected to
> > > fragment the segment, or send a PTB?
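
The GSO anecdote in the first bullet above can be put in rough numbers. The sketch below counts how many trips through the stack it takes to send a fixed amount of data when each send carries 64K versus two segments. The 1460-byte MSS and the 1 MiB transfer size are assumptions chosen for illustration; they do not come from the thread.

```python
def sends_needed(total_bytes: int, bytes_per_send: int) -> int:
    """Number of send operations needed, rounding up (ceiling division)."""
    return -(-total_bytes // bytes_per_send)


MSS = 1460          # assumed TCP maximum segment size, in bytes
TOTAL = 1 << 20     # assumed transfer size: 1 MiB

with_gso = sends_needed(TOTAL, 64 * 1024)    # 64K handed to the stack per send
slow_start = sends_needed(TOTAL, 2 * MSS)    # only two segments per send

print(with_gso, slow_start, slow_start / with_gso)
```

Under these assumptions the same megabyte takes 16 sends with 64K units and 360 sends at two segments each, about 22 times as many per-send costs, which is the kind of cliff the anecdote warns about when an opportunistic optimization stops applying.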
> > >
> > > Tom
> > >
> > >
> > >
> > > >
> > > > Thanks - Fred
> > > >
> > > >
> > > >
> > > > From: Tom Herbert [mailto:tom@herbertland.com]
> > > > Sent: Monday, December 20, 2021 4:14 PM
> > > > To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> > > > Cc: touch@strayalpha.com; int-area@ietf.org
> > > > Subject: Re: IP parcels
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Dec 20, 2021 at 3:11 PM Templin (US), Fred L <Fred.L.Templin@boeing.com> wrote:
> > > >
> > > > Tom, in modern reassembly it is not going to wait for the MSL for all fragments
> > > > to arrive anymore; either they all get there after a very small inter-fragment
> > > > delay, or you send an immediate FRAGREP and possibly also a PTB soft error
> > > > then quickly declare the reassembly dead if that doesn’t help. And, you make
> > > > sure to inspect IDs of received fragments before admitting them into the
> > > > reassembly cache so you don’t end up caching garbage that will just have to
> > > > be discarded later.
> > > >
> > > >
> > > >
> > > > Fred,
> > > >
> > > >
> > > >
> > > > It doesn't matter in the sense that reassembly is a non-work-conserving mechanism.
> > > > In order to perform reassembly, packet fragments need to be held, which means memory
> > > > will be consumed, and since memory is a finite resource it needs to be managed.
> > > > Managing memory means that some policy is needed for when to time out a reassembly or
> > > > which fragment train to discard under memory pressure. A network that implements some
> > > > arbitrary policy can cause problems on unsuspecting hosts. For instance, there are
> > > > mechanisms for hosts to try to guess what the timeout is in a NAT box and send a
> > > > keepalive packet before an idle NAT state is evicted. So this is just a guess that may
> > > > or may not be right, and in fact there might not even be a NAT in the path, in which
> > > > case the host is just wasting energy sending keepalives. Also, the second we introduce
> > > > a new exhaustible resource in the path, that becomes yet another denial of service
> > > > vector (consider the case that an attacker spoofs a whole bunch of IP parcels).
> > > >
> > > >
> > > >
> > > > Unless the network can coordinate very specifically with the host about what it's
> > > > doing on behalf of the host stack, I think it's much better for the network to just
> > > > focus on forwarding packets without delay and let the host handle the details of
> > > > receive processing, reassembly, security, etc.
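
To make the memory-management point concrete, here is a small sketch of the kind of policy a reassembling middlebox would have to choose. It is an illustration under stated assumptions, not anything taken from the draft: the cache has a fixed byte budget, holds each fragment train for at most a fixed timeout, and "fails open" by telling the caller to forward a fragment untouched when the budget is exhausted. The class name and its parameters are hypothetical.

```python
from collections import OrderedDict


class BoundedReassemblyCache:
    """Hold fragments for reassembly within a byte budget and a timeout."""

    def __init__(self, max_bytes: int, timeout: float):
        self.max_bytes = max_bytes
        self.timeout = timeout
        self.used = 0
        # key -> (time first fragment arrived, list of fragments);
        # insertion order equals arrival order if `now` never goes backwards.
        self.entries = OrderedDict()

    def offer(self, key, fragment: bytes, now: float) -> bool:
        """Return True if held; False means forward the fragment unmodified."""
        if self.used + len(fragment) > self.max_bytes:
            return False  # fail open: no memory, so do not delay the packet
        if key not in self.entries:
            self.entries[key] = (now, [])
        self.entries[key][1].append(fragment)
        self.used += len(fragment)
        return True

    def expire(self, now: float) -> list:
        """Release trains held for `timeout` or longer so they can go forward."""
        released = []
        while self.entries:
            key, (first_seen, frags) = next(iter(self.entries.items()))
            if now - first_seen < self.timeout:
                break  # everything after this entry is newer
            del self.entries[key]
            self.used -= sum(len(f) for f in frags)
            released.extend(frags)
        return released
```

Both knobs are arbitrary from the host's point of view, which is the concern raised above: an attacker who fills the budget with spoofed fragments pushes every legitimate flow onto the fail-open path, and any fragment that is held but never completed has simply been delayed by up to the timeout.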
> > > >
> > > >
> > > >
> > > > Tom
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Fred
> > > >
> > > >
> > > >
> > > > From: Tom Herbert [mailto:tom@herbertland.com]
> > > > Sent: Monday, December 20, 2021 1:06 PM
> > > > To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> > > > Cc: touch@strayalpha.com; int-area@ietf.org
> > > > Subject: Re: IP parcels
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Dec 20, 2021 at 12:03 PM Templin (US), Fred L <Fred.L.Templin@boeing.com> wrote:
> > > >
> > > > Tom, sorry I will try to use my words more carefully; I am using GSO/GRO also for
> > > > a UDP-based transport protocol – not QUIC but something similar. I like GSO/GRO
> > > > very much; I am glad the service is available and I want to see it continue. But, my
> > > > understanding of the services is that they leverage the IP ID field in whole IPv4
> > > > packets that are not eligible for fragmentation and those are limitations I am
> > > > seeking to improve on.
> > > >
> > > > I want to enable a facility similar to GSO/GRO that works for both IPv4 and IPv6
> > > > packets and allows for lower layers to fragment if necessary. And, I want to use
> > > > a well-behaved 32-bit IPv6 ID instead of the 16-bit IPv4 one where the use is not
> > > > well defined when DF=1.
> > > >
> > > >
> > > >
> > > > There has been a lot of work in this area. For instance, you might want to take a look at
> > > > https://www.youtube.com/watch?v=ccUeG1dAhbw
> > > >
> > > >
> > > >
> > > > About reassembly, that would only happen on the end systems themselves or on
> > > >
> > > > a very capable device that is very close to the end systems; I would not want for
> > > >
> > > > a high-speed core router to have to reassemble.
> > > >
> > > >
> > > >
> > > > Even so, an intermediate device close to the end system still has to provide service
> > > > to more than one host. Reassembly requires memory to store fragments. A host would need
> > > > enough memory to service all of its own flows, but an intermediate node would need that
> > > > amount of memory times the number of hosts it serves to perform reassembly. This is a
> > > > fundamental scaling problem of stateful services in the network: inevitably the network
> > > > nodes cannot scale to the number of users or flows that require service. In the
> > > > best-case scenario, when resources are not available the network won't attempt the
> > > > stateful operation and will just forward the packet unimpeded (which is fine because
> > > > hosts will never rely on this class of optimization). In the worst-case scenario, the
> > > > network will take a detrimental action such as forcibly breaking a connection (e.g.,
> > > > this is what can happen when a NAT evicts a TCP connection because it has run out of
> > > > memory). IMO, maintaining state in the network is a bad, albeit unfortunately
> > > > prevalent, idea.
> > > >
> > > >
> > > >
> > > > Tom
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Again, GSO/GRO is nice work and much respect is due to those who made it possible.
> > > >
> > > >
> > > >
> > > > Fred
> > > >
> > > >
> > > >
> > > > From: Tom Herbert [mailto:tom@herbertland.com]
> > > > Sent: Monday, December 20, 2021 9:20 AM
> > > > To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> > > > Cc: touch@strayalpha.com; int-area@ietf.org
> > > > Subject: Re: [Int-area] [EXTERNAL] Re: IP parcels
> > > >
> > > >
> > > >
> > > > The world is not just TCP anymore. QUIC and other UDP-based transports have already
> > > > shown performance increases using facilities like GSO/GRO which are essentially a short
> > > > term and non-standard implementation of what parcels promise to do in the long term.
> > > >
> > > >
> > > >
> > > > Fred,
> > > >
> > > >
> > > >
> > > > Can you explain why GSO/GRO aren't sufficient and are only short-term solutions? We've
> > > > been using these for almost twenty years now with good effect. These are widely deployed
> > > > with TCP, TSO works well to offload transmit, LRO is defined and is in much better shape
> > > > to offload RX now that programmable devices are emerging. For TCP it's hard to see how
> > > > IP parcels would help significantly, but even for UDP we now have UDP GSO, sendmmsg, and
> > > > recvmmsg that mitigate the cost of system calls and interrupts to which the draft refers.
> > > > The reason these aren't standards in IETF is because they're implementation techniques
> > > > and not protocol (although I will point out that GSO/GRO/sendmmsg/recvmmsg are in all
> > > > Linux devices so that effectively makes it a de facto implementation standard).
> > > >
> > > >
> > > >
> > > > I am also concerned about the idea that intermediate devices would perform reassembly.
> > > > This has a whole bunch of implications: middleboxes are no longer work-conserving, and
> > > > it seems to carry the implicit requirement that the middlebox be in the path of every
> > > > packet in a parcel (i.e., even in the case of the last hop performing reassembly). Also,
> > > > as simply a matter of resources and capabilities, hosts are in a much better position to
> > > > perform tasks like reassembly. I don't readily see that having intermediate devices
> > > > perform reassembly would be a win for hosts, and even if it were, host implementations
> > > > still would need the capability to perform reassembly themselves since they will never
> > > > rely on the network to always do it for them.
> > > >
> > > >
> > > >
> > > > Tom
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Thanks - Fred
> > > >
> > > >
> > > >
> > > > From: touch@strayalpha.com [mailto:touch@strayalpha.com]
> > > > Sent: Sunday, December 19, 2021 11:53 AM
> > > > To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> > > > Cc: int-area@ietf.org; Wes Eddy <wes@mti-systems.com>
> > > > Subject: Re: [Int-area] IP parcels
> > > >
> > > >
> > > >
> > > > Hi, Fred (et al.),
> > > >
> > > >
> > > >
> > > > On Dec 19, 2021, at 10:21 AM, Templin (US), Fred L <Fred.L.Templin@boeing.com> wrote:
> > > >
> > > >
> > > >
> > > > Joe, your insistence on using html makes it impossible to respond to all of your points inline
> > > >
> > > > which is the reason for my top-posts.
> > > >
> > > >
> > > >
> > > > I use MacOS mail, IOS mail, and Thunderbird on Windows, all using default configurations,
> > > > FWIW. I appear to be able to post inside everyone else’s responses. I don’t know if the
> > > > IETF’s mailers are munging formats, though.
> > > >
> > > >
> > > >
> > > > I’ve made my position clear. However:
> > > >
> > > >
> > > >
> > > > - You still haven’t shown any evidence that end systems need to do all this extra work
> > > > so they can somehow run faster, nor that this will be noticeably faster than large
> > > > (i.e., 20-60KB) IPv4 packets.
> > > >
> > > > - You still haven’t shown any reason why this is feasible; in fact, below you add the
> > > > idea of on-path fragmentation, which is largely deprecated because fragments won’t
> > > > traverse tunnels (in your case, notably for single chunks larger than 64KB). Never mind
> > > > that the fragmentation is both expensive and slow-path at routers.
> > > >
> > > > - You have claimed that both routers and transports will somehow adopt this when we
> > > > can’t even get reasonably large MTUs that already fit within IPv4 across heterogeneous
> > > > enterprises.
> > > >
> > > >
> > > >
> > > > IPv4 is over; even if you don’t think so, any way forward with larger packets starts with:
> > > >
> > > >                a) getting ~64KB IP packets across the net
> > > >
> > > >                b) after (a), prove that >64KB are needed based on the IPv6 jumbo approach
> > > >
> > > >
> > > >
> > > > Any way forward with a lot of small packets inside one large one (where both chunks and
> > > > total length are less than 64K) starts by proving there’s a need and fixing how TCP
> > > > interacts with its inherent burstiness and loss correlation.
> > > >
> > > >
> > > >
> > > > Only THEN will this issue be worth more discussion.
> > > >
> > > >
> > > >
> > > > Joe
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Parcels that contain a single segment whether 64K or considerably less are still sent as
> > > > (singleton) parcels and not ordinary packets. That way, nodes in the network can know
> > > > that it is OK to encapsulate and fragment since by asserting its interest in receiving parcels
> > > > the destination has also subscribed to being able to reassemble up to a full 64K.
> > > >
> > > > Parcels do not set (Payload Length / Total Length) to 0; they set it to the length of the
> > > > first element of the parcel (which is also the same length of each non-final element of
> > > > the parcel). What happens then is that network equipment will see a unit with an L3
> > > > length that may be considerably shorter than the L2 length. You are right that legacy
> > > > routers might not like this (or, they might truncate the packet according to L3 length),
> > > > and so for paths that might traverse legacy routers the first-hop node that recognizes
> > > > parcels instead encapsulates the parcel in an IPv4 or IPv6 header then performs (source)
> > > > fragmentation if necessary. These IP fragments will then travel through legacy routers
> > > > just fine.
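
The length rule in the paragraph above implies that a receiver can recover the segment boundaries from two numbers: the L3 length field (the length of the first segment) and the number of payload bytes that actually arrived. The sketch below shows that arithmetic only. It assumes header bytes have already been subtracted from both values and that the final segment is no longer than the others; the function name is hypothetical and the draft may well signal the layout differently.

```python
from typing import List


def segment_lengths(first_len: int, total_len: int) -> List[int]:
    """Split `total_len` payload bytes into segments of `first_len` bytes.

    Every non-final segment has length `first_len`; the final segment takes
    whatever remains and is therefore no longer than the others.
    """
    if first_len <= 0 or total_len < first_len:
        raise ValueError("total_len must cover at least one full segment")
    full, rest = divmod(total_len, first_len)
    lengths = [first_len] * full
    if rest:
        lengths.append(rest)
    return lengths


print(segment_lengths(1000, 3500))
```

A legacy router that trusted only the L3 length would see just the first 1000 bytes of that 3500-byte unit, which is the truncation risk the paragraph mentions and the reason given for encapsulating parcels on paths that may cross such routers.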
> > > >
> > > >
> > > >
> > > > About RFC793bis, you and Wes Eddy know far more about its status than I do; I only
> > > > noted that this is something with TCP implications and so made mention of it in case
> > > > there is still room for a few more engine tweaks while the hood is still open.
> > > >
> > > > About IPv4, I am currently running IPv4 edge networks with IPv4-in-IPv6 tunnel endpoints
> > > > connected to an IPv6 transit network and it works really well. End systems get to use
> > > > smaller addresses and smaller headers, and they can talk to remote correspondents using
> > > > IPv4 as if they were all on the same IPv4 network. So yes, I think we might still want to
> > > > consider IPv4 for edge networks like that.
> > > >
> > > > About getting 64K packets across, only the edge networks or end systems see them as
> > > > large packets; in the core they are typically broken up into something much smaller by
> > > > ingress nodes that apply segmentation/fragmentation. We don’t need the core to move
> > > > to jumbo links; we only need that at the edges. ATM taught us that.
> > > >
> > > > About our “nail”, end systems get to see larger packets/parcels and get to take advantage
> > > > of the reduced interrupts and system call overhead they provide. That is what makes it
> > > > worthwhile.
> > > >
> > > >
> > > >
> > > > Fred
> > > >
> > > >
> > > >
> > > > From: touch@strayalpha.com [mailto:touch@strayalpha.com]
> > > > Sent: Saturday, December 18, 2021 8:13 PM
> > > > To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> > > > Cc: int-area@ietf.org; Wes Eddy <wes@mti-systems.com>
> > > > Subject: Re: [Int-area] IP parcels
> > > >
> > > >
> > > >
> > > > Hi, Fred,
> > > >
> > > >
> > > >
> > > > If you have one segment that’s less than 64K, you don’t need the parcel option at all.
> > > >
> > > >
> > > >
> > > > If you have something longer than 64K, either as a single segment or multiple smaller
> > > > segments, by setting total length to 0, you end up being dropped by legacy routers, which
> > > > either ignore options they don’t understand or drop packets with options they don’t
> > > > support.
> > > >
> > > > RFC793bis does talk about IPv6 jumbos, but this new work is out of scope for RFC793bis -
> > > > furthermore, it’s too late. It has passed WGLC, IETF LC, and is currently in IESG review
> > > > for publication.
> > > >
> > > > You also haven’t addressed why the IETF should be taking up this *new* work for IPv4,
> > > > which I thought also had been considered ineligible.
> > > >
> > > > But overall, again, what’s the point? We can’t even get 64K IP packets through the
> > > > Internet; making them larger doesn’t make that easier or more likely. Such large sizes
> > > > are of diminishing benefit; routers already forward at 40Gbps per link for minimal
> > > > packets and end systems have other problems that this exacerbates.
> > > >
> > > >
> > > >
> > > > This seems a lot like a huge hammer in search of a nail. Where’s the nail?
> > > >
> > > >
> > > >
> > > > Joe
> > > >
> > > >
> > > >
> > > > —
> > > >
> > > > Joe Touch, temporal epistemologist
> > > >
> > > > www.strayalpha.com
> > > >
> > > >
> > > >
> > > > On Dec 18, 2021, at 7:18 PM, Templin (US), Fred L <Fred.L.Templin@boeing.com> wrote:
> > > >
> > > >
> > > >
> > > > Joe, I never said that I wanted to restrict this to small transport segments; in fact, I want
> > > > just the opposite – I want large segments. A perfectly legal parcel is one which includes 1
> > > > ~64KB segment - another legal parcel is one which includes 64 of them! If you want bigger
> > > > segments than that, then true jumbos are necessary and this spec does not preclude that.
> > > >
> > > > About RFC793(bis), I see there is a section on Jumbos and IP parcels is (sort of) an application
> > > > of Jumbos.
> > > >
> > > >
> > > >
> > > > Fred
> > > >
> > > >
> > > >
> > > > From: touch@strayalpha.com [mailto:touch@strayalpha.com]
> > > > Sent: Saturday, December 18, 2021 4:57 PM
> > > > To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> > > > Cc: int-area@ietf.org; Wes Eddy <wes@mti-systems.com>
> > > > Subject: [EXTERNAL] Re: [Int-area] IP parcels
> > > >
> > > >
> > > >
> > > >
> > > > Hi, Fred,
> > > >
> > > >
> > > >
> > > > Regarding 793bis, new ideas are out of scope. It’s supposed to be a roll-in of existing items only.
> > > >
> > > >
> > > >
> > > > Never mind the problems below, which “TCP will find a way” doesn’t magically fix.
> > > >
> > > >
> > > >
> > > > The problem is this:
> > > >
> > > > - end systems need to send larger transport segments (not just IP segments)
> > > >
> > > > - if they can do that, they should, filling up to the largest IP payload
> > > >
> > > >
> > > >
> > > > Having an IP packet have the opportunity to include lots of small transport packets
> > > > assumes transport packets either work better or faster when they’re small. It’s the
> > > > opposite.
> > > >
> > > >
> > > >
> > > > Joe
> > > >
> > > >
> > > >
> > > > —
> > > >
> > > > Joe Touch, temporal epistemologist
> > > >
> > > > www.strayalpha.com
> > > >
> > > >
> > > >
> > > > On Dec 18, 2021, at 4:42 PM, Templin (US), Fred L <Fred.L.Templin@boeing.com> wrote:
> > > >
> > > >
> > > >
> > > > Joe, TCP will find a way to adapt – it always has. I also see that TCP is currently undergoing
> > > > a second edition revision so the timing seems right to consider IP parcels in the analysis.
> > > > I am Cc’ing the second edition editor for his information – Wesley, please consider this
> > > > new concept called “IP Parcels” as it relates to your document.
> > > >
> > > > Here is the latest draft version – it expands on the “Motivation” section and adds a number
> > > > of important features such as a new “Parcels Permitted” TCP option:
> > > >
> > > >
> > > >
> > > > https://datatracker.ietf.org/doc/draft-templin-intarea-parcels/
> > > >
> > > >
> > > >
> > > > Fred
> > > >
> > > >
> > > >
> > > > From: touch@strayalpha.com [mailto:touch@strayalpha.com]
> > > > Sent: Friday, December 17, 2021 6:01 PM
> > > > To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
> > > > Cc: int-area@ietf.org
> > > > Subject: Re: [Int-area] IP parcels
> > > >
> > > >
> > > >
> > > > Hi, Fred,
> > > >
> > > >
> > > >
> > > > My first concern is the use of an IP option at all, because *any* option forces processing onto the slow path.
> > > >
> > > >
> > > >
> > > > From TCP’s viewpoint, it seems like you’ve just created a nightmare for SACK and ECN, basically because you will encourage drops of large bursts of packets.
> > > >
> > > >
> > > >
> > > > This will also increase the burstiness of TCP, rather than letting the ACKs support pacing.
> > > >
> > > >
> > > >
> > > > Any part of the system that currently coalesces TCP packets is likely to generate errors here, because it might see only the first TCP segment.
> > > >
> > > >
> > > >
> > > > However, AFAICT the most significant consideration is that the issue with per-packet performance is at the TCP and UDP layers, not as much at the IP layer.
> > > >
> > > >
> > > >
> > > > So what problem is this trying to solve?
> > > >
> > > >
> > > >
> > > > Joe
> > > >
> > > > —
> > > >
> > > > Joe Touch, temporal epistemologist
> > > >
> > > > www.strayalpha.com
> > > >
> > > >
> > > >
> > > > On Dec 17, 2021, at 5:06 PM, Templin (US), Fred L <Fred.L.Templin@boeing.com> wrote:
> > > >
> > > >
> > > >
> > > > Here's one that should help with shipping, just in time for Christmas. Thanks
> > > > to everyone for the past and future list exchanges.
> > > >
> > > > Fred
> > > >
> > > > -----Original Message-----
> > > > From: I-D-Announce [mailto:i-d-announce-bounces@ietf.org] On Behalf Of internet-drafts@ietf.org
> > > > Sent: Friday, December 17, 2021 5:00 PM
> > > > To: i-d-announce@ietf.org
> > > > Subject: I-D Action: draft-templin-intarea-parcels-00.txt
> > > >
> > > >
> > > > A New Internet-Draft is available from the on-line Internet-Drafts directories.
> > > >
> > > >
> > > >        Title           : IP Parcels
> > > >        Author          : Fred L. Templin
> > > >        Filename        : draft-templin-intarea-parcels-00.txt
> > > >        Pages           : 8
> > > >        Date            : 2021-12-17
> > > >
> > > > Abstract:
> > > >   IP packets (both IPv4 and IPv6) are understood to contain a unit of
> > > >   data which becomes the retransmission unit in case of loss.  Upper
> > > >   layer protocols such as the Transmission Control Protocol (TCP)
> > > >   prepare data units known as "segments", with traditional arrangements
> > > >   including a single segment per packet.  This document presents a new
> > > >   construct known as the "IP Parcel" which permits a single packet to
> > > >   carry multiple segments.  The parcel can be opened at middleboxes on
> > > >   the path with the included segments broken out into individual
> > > >   packets, then rejoined into one or more repackaged parcels to be
> > > >   forwarded further toward the final destination.  Reordering of
> > > >   segments within parcels is unimportant; what matters is that the
> > > >   number of parcels delivered to the final destination should be kept
> > > >   to a minimum, and that loss or receipt of individual segments (and
> > > >   not parcel size) determines the retransmission unit.
> > > >
> > > >
> > > > The IETF datatracker status page for this draft is:
> > > > https://datatracker.ietf.org/doc/draft-templin-intarea-parcels/
> > > >
> > > > There is also an htmlized version available at:
> > > > https://datatracker.ietf.org/doc/html/draft-templin-intarea-parcels-00
> > > >
> > > >
> > > > Internet-Drafts are also available by rsync at rsync.ietf.org::internet-drafts
> > > >
> > > >
> > > > _______________________________________________
> > > > I-D-Announce mailing list
> > > > I-D-Announce@ietf.org
> > > > https://www.ietf.org/mailman/listinfo/i-d-announce
> > > > Internet-Draft directories: http://www.ietf.org/shadow.html
> > > > or ftp://ftp.ietf.org/ietf/1shadow-sites.txt
> > > >
> > > > _______________________________________________
> > > > Int-area mailing list
> > > > Int-area@ietf.org
> > > > https://www.ietf.org/mailman/listinfo/int-area
> >