Re: [tsvwg] A review of draft-ietf-tsvwg-udp-options-12

Joseph Touch <touch@strayalpha.com> Tue, 15 June 2021 15:12 UTC

From: Joseph Touch <touch@strayalpha.com>
In-Reply-To: <CACL_3VGOVTjzOBBCS4b+4X_cTFX6T=gYO4_htvr2idzQGUP+oQ@mail.gmail.com>
Date: Tue, 15 Jun 2021 08:11:56 -0700
Cc: TSVWG <tsvwg@ietf.org>
Message-Id: <C67EE01E-A41F-4BF5-BE1E-33E9F01D0B72@strayalpha.com>
References: <CACL_3VGb_9P5SfPGRJtf1ZBvEhgywc2ZEGr-qbgNOMXV20rFeA@mail.gmail.com> <CACL_3VHyoRr5ju8203DiLTUo-658DCj7ud+1dQE2o0hUPVhF0A@mail.gmail.com> <7D766992-AEEB-434F-BB1D-3817EE07DE61@strayalpha.com> <1BBDBD80-3A53-4700-A79F-9A3AE4876F2B@strayalpha.com> <CACL_3VEXCT-sSNhtncVK26DPQefDLJhqEijgDke4Q7DmhRrpTQ@mail.gmail.com> <67E79ED1-14DE-4127-83AF-D17E8C72F362@strayalpha.com> <CACL_3VGOVTjzOBBCS4b+4X_cTFX6T=gYO4_htvr2idzQGUP+oQ@mail.gmail.com>
To: "C. M. Heard" <heard@pobox.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/ZYTc0X5GqcQiq3B3CEUM1B2lbrk>

Mike,

Zero-copy networking has been around since roughly 1991 (James Sterbenz's PhD dissertation on “Axon” was arguably one of the first, and the IBM 360 had zero copy for memory-storage transfers back in the 1960s), and it has been used in networking stacks since then. Yes, it’s more prevalent in datacenter and HPC (high-performance computing) environments, where RDMA is one variant, but it can also be supported by at least some network cards that offload packet processing.

It requires few per-packet decisions to confirm support, which is why some simple decisions we can make - ones that don’t constrain us - help. Those below include making OCS come first and allowing FRAG to come immediately after it, so the receiver doesn’t need to walk the TLV chain to find FRAG. I agree we should not design in ways that complicate processing for non-zero-copy endpoints, but if the work is either small (as it was previously) or effectively none (as it would be with the structure below), there’s no reason to ignore it.
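A minimal sketch of the fast-path check this ordering would allow, assuming (as in the quoted message below) a 4-byte OCS laid out as KIND/LEN/16-bit checksum at the start of the option area, followed immediately by a 6-byte FRAG header. The kind values `KIND_OCS` and `KIND_FRAG` and the function name are hypothetical placeholders, not the draft's assignments. Only two fixed positions are inspected; if either does not match, the receiver falls back to a full TLV walk.

```python
from typing import Optional

# Hypothetical option kind values: placeholders only, not the draft's assignments.
KIND_OCS = 2
KIND_FRAG = 3

OCS_LEN = 4   # KIND, LEN, 16-bit checksum (uniform 4-byte layout proposed in the thread)
FRAG_LEN = 6  # FRAG header size assumed in the thread's example


def fast_path_frag_data_offset(opts: bytes) -> Optional[int]:
    """Return the offset of fragment data within the option area when OCS is
    first and FRAG immediately follows it; otherwise return None so the caller
    can fall back to walking the full TLV chain."""
    if len(opts) < OCS_LEN + FRAG_LEN:
        return None
    # Check OCS at bytes 0-1 and FRAG at bytes 4-5; nothing else is parsed.
    if opts[0] != KIND_OCS or opts[1] != OCS_LEN:
        return None
    if opts[OCS_LEN] != KIND_FRAG or opts[OCS_LEN + 1] != FRAG_LEN:
        return None
    return OCS_LEN + FRAG_LEN


# Example: OCS (checksum zeroed), FRAG (remaining header bytes zeroed), then data.
pkt = bytes([KIND_OCS, OCS_LEN, 0, 0, KIND_FRAG, FRAG_LEN, 0, 0, 0, 0]) + b"payload"
print(fast_path_frag_data_offset(pkt))  # prints 10
```

The point of the fixed ordering is that this check costs two byte comparisons per option regardless of how many other options the packet carries.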

It might be useful to note that zero-copy support exists in most modern OSes, including Linux, macOS, and Windows. 

Joe

> On Jun 14, 2021, at 10:10 PM, C. M. Heard <heard@pobox.com> wrote:
> 
> On Sun, Jun 13, 2021 at 9:31 PM Joseph Touch wrote:
>> On Jun 13, 2021, at 7:20 PM, C. M. Heard wrote:
>>> I for one would appreciate further discussion of these last points. I admit that I have failed to grasp Joe's message on the RDMA thread, and I would appreciate some time to think about it.
>> 
>> Sure - here’s how it all works. Note that this is relevant mostly for long transfers with persistent UDP fragmentation; if that is assumed to be ‘adjusted’ at the app layer (as QUIC does), then we don’t need zero-copy support...
>> 
>> - right now, UDP data can be zero-copied when received into user space, starting with the user data
>> - if we add options, UDP data can still be zero-copied because it hasn’t moved (it still begins the payload)
>> - however, fragments are different because (esp. given the merging of FRAG and LITE) they don’t start at the beginning of data
>> 	- they always start after OCS (which I think we should make fit the uniform KIND/LEN/OCS format of 4 bytes)
>> 	- if FRAG comes next, then we can move the fragment content around a little and still support zero-copy
>> 
>> 		notably, we move the first 10 bytes of the fragment to the end
>> 			4 for OCS
>> 			6 for FRAG (assuming FRAG includes KIND/OPTLEN/FRAGOFFSET/ID/FRAGLEN)
>> 		that way we can zero-copy the fragment packet into place, then just copy those last 10 bytes over OCS and the FRAG header
>> 
>> This method assumes that we try to keep FRAG early in the packet - preferably right after OCS. The later it comes, the more additional bytes we need to move to “fix” the copy (beyond the 10 bytes noted above).
>> 
>> —
>> 
>> This method is the only reason we would want to allow options after non-terminal fragments - basically to keep the fragment toward the front of the packet, using the rule that post-noninitial frag options still operate on the fragment, rather than waiting for reassembly. The exception is the terminal fragment, where post-terminal fragment options operate on the reassembled packet.
> 
> I'm not understanding this AT ALL, and I apologize if there is well-known stuff of which I am embarrassingly ignorant. That being said:
> 
> EVERY description of a zero-copy receive describes something involving MTUs and highly constrained header length that allow the user data in a TCP segment or UDP packet to be mapped to one or more kernel pages. Here is one example:
> 
> PATH to TCP 4K MTU and RX zerocopy <https://netdevconf.info/0x14/pub/slides/62/Implementing%20TCP%20RX%20zero%20copy.pdf>
> 
> In every case that I have found, the solutions apply only to a highly constrained environment, such as a data center, and not over the Internet writ large. Some even involve requiring the application to process the transport headers, which is surely not an outcome that we wish in general.
> 
> If I am wrong -- and it would most assuredly not be the first time -- I am eager to be disabused, preferably with a complete and open description of a zero-copy technology without such shortcomings.
> 
> But if my conclusions are substantially correct, I don't think that TSVWG should expend effort on zero copy for UDP fragment reassembly. Transport options for UDP need to apply across the general Internet.
> 
> NOTE: the unfavorable conclusions that I make about zero-copy do NOT apply to checksum offload; the advantages and applicability of that technology (especially with OCS now defined to be an equivalent to the CCO proposal) are readily apparent, even though they are not realizable in every implementation.
> 
> Thanks
> 
> Mike Heard
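The byte relocation described in the quoted message (move the first 10 bytes of fragment data to the end, place the whole packet so the OCS+FRAG header sits where those bytes belong, then copy the trailing 10 bytes back over the header) can be sketched as below. This is an illustration of the byte layout only, assuming a 10-byte header (4 for OCS, 6 for FRAG) as in the thread. `build_fragment_body` and `place_fragment` are hypothetical helpers, the header bytes are placeholders rather than a real OCS/FRAG encoding, and Python slice assignment merely stands in for zero-copy placement. Because each packet's 10-byte trailer lands on the first 10 bytes of the next fragment's slot, this sketch assumes fragments are placed in offset order and that each carries at least 10 data bytes; handling reordering is not covered.

```python
HDR_LEN = 10  # 4 bytes OCS + 6 bytes FRAG, per the quoted example


def build_fragment_body(header: bytes, frag_data: bytes) -> bytes:
    """Sender side (hypothetical): OCS+FRAG header, then the fragment data
    with its first HDR_LEN bytes moved to the end."""
    if len(header) != HDR_LEN or len(frag_data) < HDR_LEN:
        raise ValueError("header must be HDR_LEN bytes; data at least HDR_LEN")
    return header + frag_data[HDR_LEN:] + frag_data[:HDR_LEN]


def place_fragment(buf: bytearray, frag_offset: int, body: bytes) -> None:
    """Receiver side (hypothetical): write the whole body at frag_offset, so
    the header occupies the slot of the first HDR_LEN data bytes, then copy
    the trailing HDR_LEN bytes from the buffer back over the header."""
    if frag_offset + len(body) > len(buf):
        raise ValueError("buffer needs HDR_LEN bytes of slack past the data")
    n = len(body) - HDR_LEN  # number of fragment data bytes
    buf[frag_offset:frag_offset + len(body)] = body  # stands in for zero-copy placement
    buf[frag_offset:frag_offset + HDR_LEN] = buf[frag_offset + n:frag_offset + n + HDR_LEN]


data = bytes(range(40))  # payload we expect after reassembly
frag_size = 20
buf = bytearray(len(data) + HDR_LEN)  # slack for the last fragment's trailer
for off in range(0, len(data), frag_size):  # in offset order
    body = build_fragment_body(b"H" * HDR_LEN, data[off:off + frag_size])
    place_fragment(buf, off, body)
reassembled = bytes(buf[:len(data)])
print(reassembled == data)  # True
```

Keeping FRAG right after OCS is what bounds the fix-up copy at a fixed 10 bytes; any option placed between them would have to be relocated the same way.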