Re: [Int-area] [EXTERNAL] Re: Call for WG adoption of draft-templin-intarea-parcels-10

"Templin (US), Fred L" <Fred.L.Templin@boeing.com> Mon, 11 July 2022 21:20 UTC

From: "Templin (US), Fred L" <Fred.L.Templin@boeing.com>
To: Tom Herbert <tom@herbertland.com>
CC: Richard Li <richard.li@futurewei.com>, "Juan Carlos Zuniga (juzuniga)" <juzuniga=40cisco.com@dmarc.ietf.org>, "int-area@ietf.org" <int-area@ietf.org>
Thread-Topic: [EXTERNAL] Re: [Int-area] Call for WG adoption of draft-templin-intarea-parcels-10
Thread-Index: AQHYhmyuUVV50YYT0UulhDx8TNe16K1qHUHAgA+FE6CAAJEwAP//kS2A
Date: Mon, 11 Jul 2022 21:19:59 +0000
Message-ID: <5dcb3714b5da4d9bbfc4fe0e7545bd87@boeing.com>
References: <SJ0PR11MB57692D589B130307F4C9C823D1B29@SJ0PR11MB5769.namprd11.prod.outlook.com> <BYAPR13MB2279C0291E3AC5998958824D87BD9@BYAPR13MB2279.namprd13.prod.outlook.com> <f90f462f9df94e2a82a1b8afe598a204@boeing.com> <CALx6S35Ute7f3D_zdbxauBiR+KG7J5gqSwXhH+MQcfQGuHGHzQ@mail.gmail.com>
In-Reply-To: <CALx6S35Ute7f3D_zdbxauBiR+KG7J5gqSwXhH+MQcfQGuHGHzQ@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/mbRu1PkHwZx2o1uSXB_u-m4uG38>
Subject: Re: [Int-area] [EXTERNAL] Re: Call for WG adoption of draft-templin-intarea-parcels-10

Tom, some rejoinders:

>Yes, I agree if the packet is fragmented by the network then this is a nice feature.
>However, today we already have this property from a host perspective by just
>sending "small" packets.

It can be readily shown that some applications get much greater performance by
sending larger packets that trigger fragmentation/reassembly than by sending
smaller packets that do not. Performance increases of multiple orders of magnitude
are indeed possible.

>I'm not sure the savings qualify as significant. 9K MTUs are becoming common in data centers
>and the standard TCP/IPv6 header is 80 bytes so that's already less than 1% overhead.

I think 9K is only a starting point, and IP parcels pave the way to much larger link MTUs,
possibly even in excess of 64KB. And, doing the math, even for just a 9K link, sending a
single parcel that contains 6x 1440-octet segments would save 5 * 60 == 300 octets in
comparison with sending 6x 1500-octet packets with 60 octets of IP/TCP headers per
packet. For links with larger MTUs, the savings from sending parcels with many segments
(up to 64) become even greater.
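
To make the arithmetic above concrete, here is a minimal Python sketch. It assumes, as the numbers above do, a 60-octet IP/TCP header and 1440-octet segments, and it ignores any per-segment fields that the parcel format itself may add. The function name `compare` is purely illustrative.

```python
def compare(num_segments, segment_len=1440, header_len=60):
    """Return (octets as separate packets, octets as one parcel, octets saved).

    Assumption: a parcel carries exactly one IP/TCP header for all of its
    segments; per-segment overhead defined by the draft is not modeled.
    """
    packets_total = num_segments * (header_len + segment_len)
    parcel_total = header_len + num_segments * segment_len
    return packets_total, parcel_total, packets_total - parcel_total


# The 9K-link case from the text: six segments save 5 * 60 octets.
print(compare(6))   # (9000, 8700, 300)
# The largest segment count mentioned above: 64 segments save 63 * 60 octets.
print(compare(64))  # (96000, 92220, 3780)
```

Under these assumptions the saving is always (N - 1) * 60 octets, so it grows linearly with the number of segments per parcel.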

>As I already mentioned, this is addressed by the BiGTCP work (https://lwn.net/Articles/884104).
>Sending or receiving multi-megabytes TCP segments in one system call is now feasible. Also, it's
>inevitable that NIC vendors will apply this also to be able to offload TCP jumbo grams. Given this
>is just software that doesn't require hardware change or on-the-wire protocols to change, it's
>immediately deployable with just a software change which is a huge benefit to datacenter operators.

As I have said, IP parcels have the same advantage within the host system-call (user-space
to kernel-space) context. But IP parcels go a step further to provide efficient packaging
over the wire, whereas the approach you are referring to opens the box inside the
kernel and sends individual packets instead of aggregates.

>All modern NIC HW can deal with offloading a single checksum per packet; it's going to be
>a major effort for them to offload multiple checksums as IP parcels need. Without checksum
>offload, this would be a non-starter for a lot of deployments.

Check the latest spec (now at -12 and likely to stay that way until IETF 114). It should be
possible to make any H/W checksum that runs over the first segment of a packet also run over
the N-1 additional segments of the same packet (parcel) by applying the very familiar Internet
checksum algorithm.
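
As a rough illustration of running the same checksum routine over each segment in turn, here is a software sketch of the Internet checksum (the one's-complement sum described in RFC 1071) applied per segment. It is only a sketch: it omits the TCP/UDP pseudo-header, the helper names are hypothetical, and the exact coverage and placement of each per-segment checksum are whatever the draft specifies, not what is shown here.

```python
from typing import List


def internet_checksum(data: bytes) -> int:
    """One's-complement Internet checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def per_segment_checksums(payload: bytes, segment_len: int) -> List[int]:
    """Hypothetical helper: one checksum per segment_len-sized slice.

    The last slice may be shorter than the others.
    """
    return [
        internet_checksum(payload[i:i + segment_len])
        for i in range(0, len(payload), segment_len)
    ]
```

The same `internet_checksum` routine is simply invoked once per segment, which is the point being made above: nothing beyond the familiar algorithm is needed for segments 2 through N.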

>I'm not convinced of that. For instance, I'm skeptical that intermediate devices trying to reassemble
>packets that aren't addressed to themselves could ever be robust or efficient (i.e. complexity, non-work
>conserving resource requirements, security issues with reassembly, multi-path that causes latency
>increase, potential DoS vector, etc.). Can you comment on this?

Perhaps what is confusing this matter is that the intermediate devices referred to
here are most certainly not all routers in the path. Instead, what is intended
here is an OMNI intermediate device, of which there may be on the order
of 0, 1, or 2 on the path between the OMNI source and destination even
though there may be tens or even hundreds of ordinary IP routers on the path.
And, again, this is not a strict reassembly case – instead, it is an opportunistic
“combine if convenient; else forward” swift decision.
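
One way to picture a “combine if convenient; else forward” decision is the following Python sketch. It is an illustrative model only, not the behavior specified for OMNI intermediate devices: the `Parcel` class, the flow key, the output-queue model, and the `enqueue` function are all assumptions made for the example. The 64-segment limit is the figure mentioned earlier in this message.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Parcel:
    flow: Tuple[str, str, int, int, int]  # hypothetical flow key
    segments: List[bytes]

    def size(self) -> int:
        return sum(len(s) for s in self.segments)


def enqueue(queue: List[Parcel], parcel: Parcel,
            max_payload: int, max_segments: int = 64) -> None:
    """Merge parcel into the queue tail if that is cheap; otherwise forward it.

    No timers and no waiting: the only candidate for combining is a parcel
    of the same flow that already sits at the tail of the output queue.
    """
    if queue:
        tail = queue[-1]
        if (tail.flow == parcel.flow
                and tail.size() + parcel.size() <= max_payload
                and len(tail.segments) + len(parcel.segments) <= max_segments):
            tail.segments.extend(parcel.segments)  # combine: convenient
            return
    queue.append(parcel)  # else: forward unchanged
```

In this model a parcel is never held back in the hope that a companion will arrive; if nothing suitable is already queued, it goes out as-is.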

Thanks - Fred

From: Tom Herbert [mailto:tom@herbertland.com]
Sent: Monday, July 11, 2022 1:34 PM
To: Templin (US), Fred L <Fred.L.Templin@boeing.com>
Cc: Richard Li <richard.li@futurewei.com>; Juan Carlos Zuniga (juzuniga) <juzuniga=40cisco.com@dmarc.ietf.org>; int-area@ietf.org
Subject: [EXTERNAL] Re: [Int-area] Call for WG adoption of draft-templin-intarea-parcels-10


On Mon, Jul 11, 2022 at 12:22 PM Templin (US), Fred L <Fred.L.Templin@boeing.com> wrote:
Richard and others, thank you for these comments and for the ensuing discussion that
took place over the time I was away on vacation. Strange how the timing hit when I
was away from the office and off the grid - I was on a camping trip in Canada not far
from where Steve Deering lives although I did not visit him.

In any event, I was able to push out a new draft version ahead of the deadline that
may address some (but likely not all) of your concerns:

https://datatracker.ietf.org/doc/draft-templin-intarea-parcels/

The major change is that the draft now talks about interactions with upper layer
protocols including TCP and UDP, whereas the previous draft versions were silent
regarding upper layer protocol framing.

To others who have commented, I beg to differ and maintain that IP parcels do
represent a significant improvement over the current state of affairs and over
just regular IP jumbograms. In particular:

Hi Fred, some comments in line.


1) IP parcels make it so that the loss unit is a single segment instead of the entire
packet/parcel, and loss of a segment often results in retransmission of just that
segment instead of the entire packet/parcel.

Yes, I agree if the packet is fragmented by the network then this is a nice feature. However, today we already have this property from a host perspective by just sending "small" packets.


2) IP parcels are more efficient than sending a single segment per IP packet, since
the parcel includes a single IP header plus single full {TCP,UDP} header for possibly
many segments. This can result in significant savings in terms of bits over the wire
for omitting unnecessary header bytes.

I'm not sure the savings qualify as significant. 9K MTUs are becoming common in data centers and the standard TCP/IPv6 header is 80 bytes so that's already less than 1% overhead.

Consider the postal service analogy; when
many items can be sent together in a single package/parcel there is a large savings
in shipping and handling costs compared with shipping each individual item separately.

As I already mentioned, this is addressed by the BiGTCP work (https://lwn.net/Articles/884104). Sending or receiving multi-megabytes TCP segments in one system call is now feasible. Also, it's inevitable that NIC vendors will apply this also to be able to offload TCP jumbo grams. Given this is just software that doesn't require hardware change or on-the-wire protocols to change, it's immediately deployable with just a software change which is a huge benefit to datacenter operators.

3) IP parcels improve large packet integrity by including a separate checksum for
each segment instead of a single checksum for the entire packet.

All modern NIC HW can deal with offloading a single checksum per packet; it's going to be a major effort for them to offload multiple checksums as IP parcels need. Without checksum offload, this would be a non-starter for a lot of deployments.

This means that
large parcels (up to a few MB) can be sent in one piece over links with sufficiently
large MTU without requiring the link itself to provide strong integrity checks over
the entire length of the parcel. This means that link MTUs significantly larger than
9KB are now safely possible.

4) IP parcels offer all of the efficiency advantages to upper layers as are offered
by GSO/GRO, etc. but also provide benefits 1) through 3) above that are not
offered by GSO/GRO.

Most of this is doable in GSO/GRO.


5) Plus, the idea is just plain neat. Better packaging is good. More efficient
handling is good. Reduced header overhead is good. SAFE larger MTUs are
good. The idea itself is good.

I'm not convinced of that. For instance, I'm skeptical that intermediate devices trying to reassemble packets that aren't addressed to themselves could ever be robust or efficient (i.e. complexity, non-work conserving resource requirements, security issues with reassembly, multi-path that causes latency increase, potential DoS vector, etc.). Can you comment on this?

Tom


Fred

From: Int-area [mailto:int-area-bounces@ietf.org] On Behalf Of Richard Li
Sent: Friday, July 01, 2022 3:11 PM
To: Juan Carlos Zuniga (juzuniga) <juzuniga=40cisco.com@dmarc.ietf.org>
Cc: int-area@ietf.org
Subject: Re: [Int-area] Call for WG adoption of draft-templin-intarea-parcels-10

Chairs and Authors,

I always like every new idea and effort to improve Internet performance, and thus I have read this draft with great interest. The following are my observations/comments/questions. If they don’t make any sense to you, please accept my apology and disregard them.

1.      The text “multiple upper layer protocol segments” is ambiguous. It seems that you really mean “multiple segments from ‘the same’ upper layer protocol”, doesn’t it? It seems that multiple segments from different upper layer protocols are not allowed in your parcel.

2.      Is the following a fair statement? All segments in the same packet come from the same application identified by the 5-tuple (source address, destination address, source port, destination port, protocol number).

3.      Segment size
You require that their sizes be the same except for the last one. Is this required for ease of implementation, or for some other reason?

4.      TTL issue
You described how parcels are forwarded over the Internetwork, and in particular you described what the ingress/egress middlebox does about parcels. I understand that the ingress middlebox may break the parcel into smaller ones, which may rejoin at the egress middlebox. My question is about TTL. As different smaller parcels may traverse different paths, their TTLs may differ when they reach the egress middlebox. How does the egress middlebox set the TTL value? Please provide more description.

5.      Reordering at the egress middlebox
The parcels would arrive one after another, and therefore the egress middlebox would “wait” for a little bit to identify and pick up enough parcels/packets for their rejoining and repackaging. A description of the egress middlebox behavior would be useful and helpful, in particular I would like to know more about the waiting time if any, and how you deal with the reordering and loss.

6.      IPv4 option
Does the IETF still allow IPv4 option fields to be changed or added? I might be wrong, but aren’t they frozen? Also, do commercial routers still care about IPv4 options?

7.      IPv6 option
This draft defines a hop-by-hop option, which will require every intermediate IPv6 router to inspect this option. There have been some discussions on the pros/cons of the IPv6 Hop-by-Hop option. Is there any feedback from the 6man WG?

8.      Parcel Path Qualification
This draft has described a method for parcel path qualification probe from end to end. It is nice to have it, but it is unreliable simply for the following reason: a probe parcel goes along one specific path, and your real application parcels may take different paths.

9.      Integrity
First paragraph of Section 7. More explanation/elaboration would be useful. I might have missed it in previous paragraphs, but if I did, please provide a reference to it such as “as described in …”.

10.   Implementation Status
In Section 10, TSO’s performance gain and the parcel’s gain should be regarded as two different things. Since this draft adds a hop-by-hop option, every intermediate router is required to process it, which will, theoretically speaking, lead to performance degradation. Of course, the overall performance would depend on many other factors, such as the total number of routing table lookups and the number of segments.

11.   General observation
This proposal essentially tries to solve a problem caused by the MTU. If the MTU were very large, one would simply put the whole data in a single packet. Since the MTU is limited, a packet has to be cut into many smaller pieces (segments). In the existing specification, when an intermediate router sees a packet larger than the MTU, the router is expected to fragment it so that the fragments can be forwarded. Here let me call it “fragmentation as needed”. In reality, however, some (if not all) commercial routers don’t do “fragmentation as needed”; instead of fragmenting the packet, they simply discard it in order to achieve wire speed. This draft defines a new way to address the MTU issue: when a router sees a packet larger than the MTU, the router is asked to fragment it in a prescribed way (fragment it into pre-packaged segments). If I may, let me call it “fragmentation as prescribed”. Both “fragmentation as needed” and “fragmentation as prescribed” require support from intermediate routers. Like fragmentation as needed, fragmentation as prescribed may degrade the performance of intermediate routers. What is more, intermediate routers/boxes may perform “rejoining and repackaging”, which will adversely impact their performance.


Best regards,

Richard



From: Int-area <int-area-bounces@ietf.org> On Behalf Of Juan Carlos Zuniga (juzuniga)
Sent: Wednesday, June 22, 2022 12:25 PM
To: int-area@ietf.org
Subject: [Int-area] Call for WG adoption of draft-templin-intarea-parcels-10

Dear IntArea WG,

We are starting a 2-week call for adoption of the IP-Parcels draft:
https://www.ietf.org/archive/id/draft-templin-intarea-parcels-10.html

The document has been discussed for some time and it has received multiple comments.

If you have an opinion on whether this document should be adopted by the IntArea WG please indicate it on the list by the end of Wednesday July 6th.

Thanks,

Juan-Carlos & Wassim
(IntArea WG chairs)

_______________________________________________
Int-area mailing list
Int-area@ietf.org
https://www.ietf.org/mailman/listinfo/int-area