Re: [Int-area] Call for WG adoption of draft-templin-intarea-parcels-10

Tom Herbert <tom@herbertland.com> Tue, 12 July 2022 19:54 UTC

From: Tom Herbert <tom@herbertland.com>
Date: Tue, 12 Jul 2022 12:53:55 -0700
Message-ID: <CALx6S34x0ut88GF+-XgA9gAf1aRqOcj5eFo+xPji_LaX218qjw@mail.gmail.com>
To: "Templin (US), Fred L" <Fred.L.Templin@boeing.com>
Cc: Joel Halpern <jmh.direct@joelhalpern.com>, "int-area@ietf.org" <int-area@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/Gr3jEG2jg1XQ_wGSSBcjnL3pb9E>

On Tue, Jul 12, 2022 at 9:23 AM Templin (US), Fred L <
Fred.L.Templin@boeing.com> wrote:

> Tom, I think the fact that ULP segment size need not closely conform to
> the path MTU in order to get optimum performance (i.e., even if
> fragmentation is needed) challenges conventional wisdom and needs to be
> repeated early and often. I think the fact that upper layers including
> multiple segments per system call can increase performance also bears
> repeating. To be truthful, I have not yet sent an actual IP parcel over
> the wire but that would be a logical next step. Some Linux kernel hacking
> will be necessary, but very much in the realm of possibility.
>

Fred,

Without running code, it's really hard to make the argument that IP parcels
improve host performance.


>
>
> You suggested earlier that a parcel could include a full checksum for the
> entire parcel along with individual checksums for the separate segments
> and yes that works. Is your thinking that the “master” checksum would then
> be verified in hardware while the “per-segment” checksums are verified by
> upper layers?
>

No, just the master checksum. Having the upper layers verify the checksums
means that it would be done in the host CPU which makes it a non-starter
because of performance.


> The problem with having a master checksum for the full parcel is that the
> entire parcel would be discarded if there is even just a single bit error
> in just one of the segments. In other words, many good segments could be
> thrown away instead of just the one bad segment. That would make the
> retransmission unit significantly larger than the loss unit which is
> something we are trying to avoid.
>

Right, but AFAIK a failure to verify an integrity check in pretty much all
protocols means the packet is corrupt, so drop it. In this new model, parts
of a packet would be accepted and parts might be dropped. I think that's
going to be a hard sell in itself. However, this problem only occurs if an
intermediate node performs reassembly, so if only the end host can
reassemble then there's no issue. Segments with a good packet checksum are
accepted, those with bad ones are dropped, and checksums are properly
offloaded.

Tom
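
The trade-off between one master checksum and per-segment checksums
discussed above can be illustrated with a small sketch. It assumes the
standard Internet checksum (RFC 1071) and the 6 x 1440-octet parcel size
used elsewhere in this thread; the function names, the payload pattern and
the flat parcel layout are hypothetical and are not taken from the draft.

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement Internet checksum (RFC 1071) over a byte string."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add each 16-bit big-endian word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF


def split_segments(payload: bytes, seg_size: int) -> list:
    """Cut a parcel payload into fixed-size segments (the last may be shorter)."""
    return [payload[i:i + seg_size] for i in range(0, len(payload), seg_size)]


def failed_segments(payload: bytes, seg_size: int, expected: list) -> list:
    """Return the indices of segments whose checksum does not match."""
    segments = split_segments(payload, seg_size)
    return [i for i, (seg, want) in enumerate(zip(segments, expected))
            if internet_checksum(seg) != want]


SEG_SIZE = 1440      # segment size used in the thread
NUM_SEGS = 6
sent = bytes((7 * i) % 256 for i in range(SEG_SIZE * NUM_SEGS))  # arbitrary test pattern

# Sender side: one master checksum plus one checksum per segment.
master = internet_checksum(sent)
per_segment = [internet_checksum(s) for s in split_segments(sent, SEG_SIZE)]

# Flip a single bit inside segment 3 to model corruption in transit.
damaged = bytearray(sent)
damaged[3 * SEG_SIZE + 10] ^= 0x01
received = bytes(damaged)

master_ok = internet_checksum(received) == master
bad = failed_segments(received, SEG_SIZE, per_segment)
good_octets = (NUM_SEGS - len(bad)) * SEG_SIZE

print("master checksum ok:", master_ok)
print("segments failing the per-segment check:", bad)
print("octets still deliverable with per-segment checks:", good_octets)
```

With one flipped bit, the master checksum fails and the whole parcel would
be dropped, while the per-segment check isolates the single bad segment.
The sketch says nothing about whether NIC hardware can offload the
per-segment case, which is the objection raised in this message.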



>
> Fred
>
>
>
> *From:* Tom Herbert [mailto:tom@herbertland.com]
> *Sent:* Tuesday, July 12, 2022 8:53 AM
> *To:* Templin (US), Fred L <Fred.L.Templin@boeing.com>
> *Cc:* Joel Halpern <jmh.direct@joelhalpern.com>; int-area@ietf.org
> *Subject:* Re: [Int-area] Call for WG adoption of
> draft-templin-intarea-parcels-10
>
>
>
>
>
> On Tue, Jul 12, 2022 at 8:23 AM Templin (US), Fred L <
> Fred.L.Templin@boeing.com> wrote:
>
> Joel, I can show you an orders-of-magnitude performance speed-up when I
> send large blocks of data using larger segment sizes that invoke
> fragmentation and reassembly. I can also show a significant speed-up when
> system calls pass multiple larger segments in a single system call instead
> of one at a time. This is on real systems with real data, and not in
> simulations.
>
>
>
> Fred,
>
>
>
> You are making an argument that larger segment sizes are more efficient.
> Yes, we know that. The argument that you really should be making is that
> "IP parcels" is necessary and sufficient to get those benefits and why the
> on-wire protocol change is justified for the benefits. For instance, when
> you say "real systems with real data", are these systems running IP
> parcels, or is this an extrapolation using existing techniques? If it's
> the latter case then this really isn't very helpful in justifying IP
> parcels.
>
>
>
> Tom
>
>
>
>
>
> About links with larger MTUs, I am specifically NOT saying that we need to
> wait until we have links with MTU>64K. What I am saying is that parcels
> would pave the way toward evolution of links with larger MTUs than what we
> have in the current practice allowing a path forward for future
> innovation. But, parcels are still good even for the smallish MTUs in
> widescale deployment today.
>
>
>
> Fred
>
>
>
> *From:* Joel Halpern [mailto:jmh.direct@joelhalpern.com]
> *Sent:* Tuesday, July 12, 2022 7:44 AM
> *To:* Templin (US), Fred L <Fred.L.Templin@boeing.com>
> *Cc:* int-area@ietf.org
> *Subject:* [EXTERNAL] Re: [Int-area] Re: Call for WG adoption of
> draft-templin-intarea-parcels-10
>
>
>
> Fred, I understood full well that you only envision a small number of
> reassembly devices.  After all, on any given path only one device will
> likely reassemble.  Still, that device will be spending a lot of resources
> in a very expensive part of the path (fast path forwarding) to provide a
> small benefit to some hosts.
>
> Fundamentally you are asking the architecture to spend those resources for
> a use case that you have not explained.  "I have proof" is not relevant.
> Without knowing the scenarios and the assumptions, it does not help us to
> judge.  It is worse than the case in the early days of the MANET working
> group where the competing proposal repeatedly said "my simulation shows ..."
>
> Fundamentally, it is not the network's job to reassemble packets for a
> host.  If you want NICs to do that, as Tom has said, that's fine.  It is a
> private matter between the host and the NIC.  But you are asking for
> functionality in the network.
>
> I note also that you are assuming that hosts have links that support
> actual MTUs larger than 64K.  I know of no link that has those properties
> in current use.  (I am vaguely familiar with HIPPI and FiberChannel.
> Neither appears to be relevant.)
>
> Yours,
>
> Joel
>
> On 7/12/2022 10:02 AM, Templin (US), Fred L wrote:
>
> Joel, you are misunderstanding what nodes would be involved in reassembly;
> this would not be at every single IP layer router in the path. It would
> only be at possibly 0, 1 or 2 adaptation layer middleboxes in the path
> from source to destination. And, then most likely only at a near-end
> middlebox very near the destination that happens to know the destination
> would prefer to receive larger parcels.
>
> About segment size, I have proof that using segment sizes significantly
> larger than the path MTU can often produce dramatic performance increases
> even when fragmentation is intentionally invoked. I also have proof that
> packaging multiple segments in the same system call can drive performance
> even higher and without reducing the segment size. IP parcels takes it the
> logical next step of allowing multiple segments to travel together in the
> same packet, which may or may not be subject to fragmentation and
> reassembly. But, let’s not get so hung up on the middlebox question that
> we forget the benefits for end-to-end.
>
>
>
> Fred
>
>
>
> *From:* Joel Halpern [mailto:jmh@joelhalpern.com]
> *Sent:* Monday, July 11, 2022 4:02 PM
> *To:* Templin (US), Fred L <Fred.L.Templin@boeing.com>
> *Cc:* int-area@ietf.org
> *Subject:* Re: [Int-area] Re: Call for WG adoption of
> draft-templin-intarea-parcels-10
>
>
>
> No, intermediate reassembly is not an optimization.
>
> First, it is a bad idea.  It is very painful for routers to perform
> reassembly.  They have to burn expensive resources managing such
> attempted reassembly.  It has major cost even if the router decides to
> give up and forward the pieces.
>
> And second, unless one makes some unstated assumptions, in the absence of
> such reassembly the sending host will be throttled to the receiving host
> rate.  So the benefit of the entire system is markedly reduced.
>
> Net: we should not adopt this draft.
>
> Yours,
>
> Joel
>
> On 7/11/2022 6:41 PM, Templin (US), Fred L wrote:
>
> Tom,
>
>
>
> > Why would someone put six segments in a parcel if they already have a
> > 9K link MTU? Why not just send one segment in 9K?
>
> This is the mindset that we need to overcome. We have had it drilled into
> our heads that MSS must be the same as the path MTU, but it does not need
> to be that way. If the MSS is smaller than the path MTU, but we can send
> multiple segments in a single parcel that more closely approaches the size
> of the path MTU then amortization savings are possible.
>
>
>
> > The algorithm isn't the problem, it's supporting new protocols and
> > multiple checksums in a packet in hardware.
>
> But Tom, how hard can this be? Instead of running the Internet checksum 1
> time over N octets of data simply run it M times over N/M octet chunks of
> the data in succession but still in a single pass. You spoke before of
> NICs adapting to support TCP jumbograms – if they can do that, why not a
> very straightforward application of Internet checksum? I haven’t looked at
> this in a long while, but isn’t this also similar to what UDP-lite did?
>
>
>
> > Either you're trivializing reassembly or maybe you're thinking of some
> > new method that somehow avoids all the pitfalls and problems we've had
> > with reassembly over the years!
>
> Intermediate node parcel reassembly is really just an optimization to try
> to pass the largest possible parcels on to the next hop instead of passing
> many smaller ones. It is really just a concatenation of segments of
> sub-parcels belonging to the same original parcel. Reordering is
> unimportant – it is OK to concatenate sub-parcels 3,8,5,2 in that order
> and without even waiting for any other sub-parcels to show up. The
> application will simply perceive it as a case of network reordering and
> the upper layer protocol will do the correct thing with the sequence
> numbers. AFAICT, the only hard requirement is that the final sub-parcel
> must not be concatenated as an intermediate sub-parcel.
>
> This stuff will all work, and it will work for the betterment of the
> Internet.
>
>
>
> Fred
>
>
>
> *From:* Tom Herbert [mailto:tom@herbertland.com]
> *Sent:* Monday, July 11, 2022 2:57 PM
> *To:* Templin (US), Fred L <Fred.L.Templin@boeing.com>
> *Cc:* Richard Li <richard.li@futurewei.com>; Juan Carlos Zuniga
> (juzuniga) <juzuniga=40cisco.com@dmarc.ietf.org>; int-area@ietf.org
> *Subject:* Re: [EXTERNAL] Re: [Int-area] Call for WG adoption of
> draft-templin-intarea-parcels-10
>
>
>
>
> On Mon, Jul 11, 2022 at 2:20 PM Templin (US), Fred L <
> Fred.L.Templin@boeing.com> wrote:
>
> Tom, some rejoinders:
>
> > Yes, I agree if the packet is fragmented by the network then this is a
> > nice feature. However, today we already have this property from a host
> > perspective by just sending "small" packets.
>
> It can be readily shown that some applications get much greater
> performance by sending larger packets that trigger
> fragmentation/reassembly than by sending smaller packets that do not.
> Multiple order of magnitude performance increases are indeed possible.
>
> > I'm not sure the savings qualify as significant. 9K MTUs are becoming
> > common in data centers and the standard TCP/IPv6 header is 80 bytes so
> > that's already less than 1% overhead.
>
>
>
> I think 9K is only a starting point, and IP parcels pave the way to much
> larger link MTUs, possibly even in excess of 64KB. And, doing the math,
> even for just a 9K link sending a single parcel that contains 6x 1440
> octet segments would save 5 * 60 == 300 octets in
>
>
>
> Why would someone put six segments in a parcel if they already have a 9K
> link MTU? Why not just send one segment in 9K?
>
>
>
> comparison with sending 6x 1500 octet packets with 60 octets of IP/TCP
> headers per packet. For links with larger MTUs, the savings for sending
> parcels with lots of segments (up to 64) becomes even greater.
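
The arithmetic in this exchange can be checked with a short sketch. It
uses only the figures quoted in the thread (60 octets of IP/TCP headers,
1440-octet segments, six segments, and an 80-byte TCP/IPv6 header on a 9K
packet). The function names are hypothetical, and the parcel figure
assumes that any parcel option or per-segment checksum fields are ignored,
so the real saving may be somewhat smaller.

```python
def wire_octets_separate(n_segments: int, seg_octets: int, hdr_octets: int) -> int:
    """Octets on the wire when each segment travels in its own packet."""
    return n_segments * (seg_octets + hdr_octets)


def wire_octets_parcel(n_segments: int, seg_octets: int, hdr_octets: int) -> int:
    """Octets on the wire when all segments share one set of headers.

    Assumption: parcel options and per-segment checksum fields are ignored.
    """
    return hdr_octets + n_segments * seg_octets


def header_overhead(hdr_octets: int, packet_octets: int) -> float:
    """Fraction of a packet taken up by headers."""
    return hdr_octets / packet_octets


HDR = 60     # IP/TCP header octets per packet, as quoted above
SEG = 1440   # segment payload octets
N = 6        # segments per parcel

separate = wire_octets_separate(N, SEG, HDR)   # six 1500-octet packets
parcel = wire_octets_parcel(N, SEG, HDR)       # one parcel carrying six segments
saved = separate - parcel

print(f"separate packets: {separate} octets")
print(f"single parcel:    {parcel} octets")
print(f"saved:            {saved} octets ({saved / separate:.1%} of the total)")

# The counter-argument: one 9K packet with an 80-byte TCP/IPv6 header.
print(f"9K single-segment header overhead: {header_overhead(80, 9000):.2%}")
```

Both positions are numerically consistent: the parcel saves (N - 1) * 60
octets relative to six separate packets, while a single 9K segment already
keeps header overhead under 1%.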
>
>
>
> > As I already mentioned, this is addressed by the BiGTCP work
> > (https://lwn.net/Articles/884104). Sending or receiving multi-megabyte
> > TCP segments in one system call is now feasible. Also, it's inevitable
> > that NIC vendors will apply this also to be able to offload TCP
> > jumbograms. Given this is just software that doesn't require hardware
> > change or on-the-wire protocols to change, it's immediately deployable
> > with just a software change which is a huge benefit to datacenter
> > operators.
>
> As I have said, IP parcels has the same advantage within the host
> system-call (user-space to kernel-space) context. But, IP parcels goes a
> step further to provide efficient packaging over-the-wire, whereas the
> approach you are referring to opens the box inside the kernel and sends
> individual packets instead of aggregates.
>
>
>
> > All modern NIC HW can deal with offloading a single checksum per packet,
> > it's going to be a major effort for them to offload multiple checksums
> > like IP parcels needs. Without checksum offload, this would be a
> > non-starter for a lot of deployments.
>
> Check the latest spec (now at -12 and likely to stay that way until
> IETF114). Any H/W checksum that can run over the first segment of a packet
> should be possible to make run over the N-1 additional segments of the
> same packet (parcel) by applying the very familiar Internet checksum
> algorithm.
>
>
>
> The algorithm isn't the problem, it's supporting new protocols and
> multiple checksums in a packet in hardware.
>
>
>
>
>
> > I'm not convinced of that. For instance, I'm skeptical that intermediate
> > devices trying to reassemble packets that aren't addressed to themselves
> > could ever be robust or efficient (i.e. complexity, non-work conserving
> > resource requirements, security issues with reassembly, multi-path that
> > causes latency increase, potential DoS vector, etc.). Can you comment on
> > this?
>
> Perhaps what is confusing this matter is that the intermediate devices
> referred to here most certainly do not refer to all routers in the path.
> Instead, what is intended here is an OMNI intermediate device, of which
> there may be something on the order of 0, 1, or 2 of them on the path
> between the OMNI source and destination even though there may be many 10’s
> or even 100’s of ordinary IP routers on the path. And, again, this is not
> a strict reassembly case – instead, it is an opportunistic “combine if
> convenient; else forward” swift decision.
>
>
>
> Either you're trivializing reassembly or maybe you're thinking of some new
> method that somehow avoids all the pitfalls and problems we've had with
> reassembly over the years! Consider that many NIC vendors have tried, and
> largely failed, to get any sort of device reassembly widely deployed (e.g.
> IP reassembly, TCP segmentation reassembly, etc.). The reason they failed
> is because they can't give the host stack transparency and control over the
> reassembly process.
>
>
>
> By its nature, reassembly can only be done with at least two packets. That
> means a device performing reassembly has to receive one packet, hold it,
> and wait for the following packet to perform reassembly. That makes
> reassembly, unlike fragmentation, a non-work conserving process. Many
> issues and policies arise from this. For instance, what happens if a packet
> is held and the following packet is never seen? (usually implies a
> reassembly timer). What happens if a packet is received OOO and is already
> forwarded, but the preceding packet is then received, do we try to
> reassemble that one? (the solution here seems to be to maintain some sort
> of flow state)? What about overlapping fragments and the security issues
> around that?
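
The point that reassembly is not work-conserving can be sketched with a
toy reassembly table. This models conventional offset-based fragment
reassembly, not the parcel concatenation described in the draft. The class
names, the (source, destination, identification) key and the 30-second
timeout are all assumptions made for illustration, and overlapping
fragments are deliberately left unhandled.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Optional


@dataclass
class Pending:
    """Fragments held for one datagram that is not yet complete."""
    first_seen: float
    fragments: Dict[int, bytes] = field(default_factory=dict)  # offset -> payload
    total_len: Optional[int] = None   # known only once the last fragment arrives


class Reassembler:
    """Toy reassembly table; overlapping fragments are not handled."""

    def __init__(self, timeout: float = 30.0):   # timeout value is an arbitrary assumption
        self.timeout = timeout
        self.pending: Dict[Hashable, Pending] = {}

    def receive(self, key, offset: int, payload: bytes,
                more: bool, now: float) -> Optional[bytes]:
        """Hold one fragment; return the datagram if it is now complete, else None."""
        entry = self.pending.setdefault(key, Pending(first_seen=now))
        entry.fragments[offset] = payload
        if not more:
            entry.total_len = offset + len(payload)
        if entry.total_len is None:
            return None                  # last fragment not seen yet: keep holding
        data = bytearray()
        while len(data) < entry.total_len:
            piece = entry.fragments.get(len(data))
            if not piece:
                return None              # gap in the middle: keep holding
            data += piece
        del self.pending[key]            # complete: release the held state
        return bytes(data)

    def expire(self, now: float) -> List[Hashable]:
        """Drop state for datagrams whose fragments have been held too long."""
        stale = [k for k, e in self.pending.items()
                 if now - e.first_seen > self.timeout]
        for k in stale:
            del self.pending[k]
        return stale


r = Reassembler(timeout=30.0)
flow_a = ("2001:db8::1", "2001:db8::2", 42)   # (source, destination, identification)
flow_b = ("2001:db8::1", "2001:db8::2", 43)

held = r.receive(flow_a, 0, b"A" * 8, more=True, now=0.0)    # nothing can be forwarded yet
done = r.receive(flow_a, 8, b"B" * 4, more=False, now=0.1)   # second fragment completes it
r.receive(flow_b, 0, b"C" * 8, more=True, now=1.0)           # its follow-up never arrives
dropped = r.expire(now=40.0)                                 # timer reclaims the held state

print(held, done, dropped)
```

Even this minimal version needs per-flow state, a buffer and a timer,
which is the cost being argued about; the policy questions raised above
(already-forwarded out-of-order packets, overlapping fragments) would each
add further state.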
>
>
>
> IMO, if the WG does pursue this, I believe a lot of the effort will be in
> specifying how reassembly in intermediate nodes works.
>
>
>
> Tom
>
>
>
>
>
> Thanks - Fred
>
>
>
> *From:* Tom Herbert [mailto:tom@herbertland.com]
> *Sent:* Monday, July 11, 2022 1:34 PM
> *To:* Templin (US), Fred L <Fred.L.Templin@boeing.com>
> *Cc:* Richard Li <richard.li@futurewei.com>; Juan Carlos Zuniga
> (juzuniga) <juzuniga=40cisco.com@dmarc.ietf.org>; int-area@ietf.org
> *Subject:* [EXTERNAL] Re: [Int-area] Call for WG adoption of
> draft-templin-intarea-parcels-10
>
>
>
>
> On Mon, Jul 11, 2022 at 12:22 PM Templin (US), Fred L <
> Fred.L.Templin@boeing.com> wrote:
>
> Richard and others, thank you for these comments and for the ensuing
> discussion that took place over the time I was away on vacation. Strange
> how the timing hit when I was away from the office and off the grid - I
> was on a camping trip in Canada not far from where Steve Deering lives
> although I did not visit him.
>
> In any event, I was able to push out a new draft version ahead of the
> deadline that may address some (but likely not all) of your concerns:
>
> https://datatracker.ietf.org/doc/draft-templin-intarea-parcels/
>
> The major change is that the draft now talks about interactions with upper
> layer protocols including TCP and UDP, whereas the previous draft versions
> were silent regarding upper layer protocol framing.
>
> To others who have commented, I beg to differ and maintain that IP parcels
> do represent a significant improvement over the current state of affairs
> and over just regular IP jumbograms. In particular:
>
>
>
> Hi Fred, some comments in line.
>
>
>
>
>
> 1) IP parcels make it so that the loss unit is a single segment instead of
> the entire packet/parcel, and loss of a segment often results in
> retransmission of just that segment instead of the entire packet/parcel.
>
> Yes, I agree if the packet is fragmented by the network then this is a
> nice feature. However, today we already have this property from a host
> perspective by just sending "small" packets.
>
>
>
>
>
> 2) IP parcels are more efficient than sending a single segment per IP
> packet, since the parcel includes a single IP header plus single full
> {TCP,UDP} header for possibly many segments. This can result in
> significant savings in terms of bits over the wire for omitting
> unnecessary header bytes.
>
>
>
> I'm not sure the savings qualify as significant. 9K MTUs are becoming
> common in data centers and the standard TCP/IPv6 header is 80 bytes so
> that's already less than 1% overhead.
>
>
>
> Consider the postal service analogy; when many items can be sent together
> in a single package/parcel there is a large savings in shipping and
> handling costs compared with shipping each individual item separately.
>
> As I already mentioned, this is addressed by the BiGTCP work
> (https://lwn.net/Articles/884104). Sending or receiving multi-megabyte
> TCP segments in one system call is now feasible. Also, it's inevitable that
> NIC vendors will apply this also to be able to offload TCP jumbograms.
> Given this is just software that doesn't require hardware change or
> on-the-wire protocols to change, it's immediately deployable with just a
> software change which is a huge benefit to datacenter operators.
>
>
>
> 3) IP parcels improve large packet integrity by including a separate
> checksum for each segment instead of a single checksum for the entire
> packet.
>
> All modern NIC HW can deal with offloading a single checksum per packet,
> it's going to be a major effort for them to offload multiple checksums
> like IP parcels needs. Without checksum offload, this would be a
> non-starter for a lot of deployments.
>
>
>
> This means that large parcels (up to a few MB) can be sent in one piece
> over links with sufficiently large MTU without requiring the link itself
> to provide strong integrity checks over the entire length of the parcel.
> This means that link MTUs significantly larger than 9KB are now safely
> possible.
>
> 4) IP parcels offer all of the efficiency advantages to upper layers as
> are offered by GSO/GRO, etc. but also provide benefits 1) through 3) above
> that are not offered by GSO/GRO.
>
>
>
> Most of this is doable in GSO/GRO.
>
>
>
>
>
> 5) Plus, the idea is just plain neat. Better packaging is good. More
> efficient handling is good. Reduced header overhead is good. SAFE larger
> MTUs are good. The idea itself is good.
>
>
>
> I'm not convinced of that. For instance, I'm skeptical that intermediate
> devices trying to reassemble packets that aren't addressed to themselves
> could ever be robust or efficient (i.e. complexity, non-work conserving
> resource requirements, security issues with reassembly, multi-path that
> causes latency increase, potential DoS vector, etc.). Can you comment on
> this?
>
>
>
> Tom
>
>
>
>
>
> Fred
>
>
>
> *From:* Int-area [mailto:int-area-bounces@ietf.org] *On Behalf Of *Richard
> Li
> *Sent:* Friday, July 01, 2022 3:11 PM
> *To:* Juan Carlos Zuniga (juzuniga) <juzuniga=40cisco.com@dmarc.ietf.org>
> *Cc:* int-area@ietf.org
> *Subject:* Re: [Int-area] Call for WG adoption of
> draft-templin-intarea-parcels-10
>
>
>
> Chairs and Authors,
>
>
>
> I always like every new idea and effort to improve the Internet
> performance, and thus I have read this draft with a great interest. The
> following are my observations/comments/questions. If they don’t make any
> sense to you, please accept my apology, and disregard them.
>
>
>
> 1.      The text “multiple upper layer protocol segments” is ambiguous.
> It seems that you really mean “multiple segments from ‘the same’ upper
> layer protocol”, doesn’t it? It seems that multiple segments from different
> upper layer protocols are not allowed in your parcel.
>
>
>
> 2.      Is the following a fair statement? All segments in the same
> packet come from the same application identified by the 5-tuple (source
> address, destination address, source port, destination port, protocol
> number).
>
>
>
> 3.      Segment size
>
> You require that their sizes be the same except for the last one. Is this
> required for easy implementation or what? Do you require it for any other
> reasons?
>
>
>
> 4.      TTL issue
>
> You described how parcels are forwarded over the Internetwork, and in
> particular you described what the ingress/egress middlebox does about
> parcels. I understand that the ingress middlebox may break the parcel into
> smaller ones, which may rejoin at the egress middlebox. My question is
> about TTL. As different smaller parcels may traverse along different paths,
> as a result their TTLs may be different when they reach the egress
> middlebox . How does the egress middlebox set up the TTL value? Please
> provide more descriptions.
>
>
>
> 5.      Reordering at the egress middlebox
>
> The parcels would arrive one after another, and therefore the egress
> middlebox would “wait” for a little bit to identify and pick up enough
> parcels/packets for their rejoining and repackaging. A description of the
> egress middlebox behavior would be useful and helpful, in particular I
> would like to know more about the waiting time if any, and how you deal
> with the reordering and loss.
>
>
>
> 6.      IPv4 option
>
> Does IETF still allow to change/add IPv4 option fields? I might be wrong,
> but aren’t they frozen? Also, do commercial routers still care about IPv4
> options?
>
>
>
> 7.      IPv6 option
>
> This draft defines a hop-by-hop option, which will require every
> intermediate IPv6 router to inspect this option. There have been some
> discussions on the pros/cons of the Hop-by-Hop IPv6 Option. Is there any
> feedback from the 6man WG?
>
>
>
> 8.      Parcel Path Qualification
>
> This draft has described a method for parcel path qualification probe from
> end to end. It is nice to have it, but it is unreliable simply for the
> following reason: a probe parcel goes along one specific path, and your
> real application parcels may take different paths.
>
>
>
> 9.      Integrity
>
> First paragraph of Section 7. More explanation/elaboration should be
> useful. I might have missed it in previous paragraphs, but if I do, please
> provide a reference to it such as “as described in …”.
>
>
>
> 10.   Implementation Status
>
> In section 10. TSO’s performance gain and Parcel’s gain should be regarded
> as two different things. Since this draft is adding a hop-by-hop option,
> every intermediate router is required to process the hop-by-hop option,
> which will, theoretically speaking, lead to performance downgrade. Of
> course, the whole performance would depend on many other factors, such as
> the total numbers of routing table lookups and number of segments.
>
>
>
> 11.   General observation
>
> This proposal essentially tries to solve a problem caused by the MTU. If
> the MTU were very large, one would simply put the whole data in a single
> packet. Since the MTU is limited, a packet has to be cut into many smaller
> pieces (segments). In the existing specification, when an intermediate
> router sees a packet whose size is larger than the MTU, the router is
> expected to fragment it so that the fragments can be forwarded. Here let
> me call it “fragmentation as needed”. In reality, however, some (if not
> all) commercial routers don’t do “fragmentation as needed”; instead of
> fragmenting the packet they simply discard it in order to achieve wire
> speed. This draft defines a new way to address the MTU issue: when a
> router sees a packet whose size is larger than the MTU, the router is
> asked to fragment it in a prescribed way (fragment it into pre-packaged
> segments). If I may, let me call it “fragmentation as prescribed”. Both
> “fragmentation as needed” and “fragmentation as prescribed” require
> support from intermediate routers. Like fragmentation as needed,
> fragmentation as prescribed may degrade the performance of intermediate
> routers. What is more, intermediate routers/boxes may perform “rejoining
> and repackaging”, which will adversely impact the performance of the
> intermediate routers/boxes.
>
>
>
>
>
> Best regards,
>
>
>
> Richard
>
>
>
>
>
>
>
> *From:* Int-area <int-area-bounces@ietf.org> *On Behalf Of *Juan Carlos
> Zuniga (juzuniga)
> *Sent:* Wednesday, June 22, 2022 12:25 PM
> *To:* int-area@ietf.org
> *Subject:* [Int-area] Call for WG adoption of
> draft-templin-intarea-parcels-10
>
>
>
> Dear IntArea WG,
>
>
>
> We are starting a 2-week call for adoption of the IP-Parcels draft:
>
> https://www.ietf.org/archive/id/draft-templin-intarea-parcels-10.html
>
>
>
> The document has been discussed for some time and it has received multiple
> comments.
>
>
>
> If you have an opinion on whether this document should be adopted by the
> IntArea WG please indicate it on the list by the end of Wednesday July 6th.
>
>
>
> Thanks,
>
>
>
> Juan-Carlos & Wassim
>
> (IntArea WG chairs)
>
>
>
> _______________________________________________
> Int-area mailing list
> Int-area@ietf.org
> https://www.ietf.org/mailman/listinfo/int-area
>
>
>