Re: [tsvwg] RDMA Support by UDP FRAG Option
"C. M. Heard" <heard@pobox.com> Fri, 18 June 2021 20:18 UTC
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/RZULHOKRgrSYIvsI-5Hg7sNtSS8>
On Fri, Jun 18, 2021 at 2:36 AM Gorry Fairhurst wrote:
> On 17/06/2021 18:30, Tom Herbert wrote:
> > On Thu, Jun 17, 2021 at 7:41 AM Gorry Fairhurst wrote:
> >> When [the] dust has settled, I expect we should see an updated
> >> fragmentation text. I suspect we should consider the final format for
> >> suitability for offload (perhaps Tom Herbert would look?) and for ease
> >> of processing in message nbuffere (Tom Jones offered to look).
> >
> > Please look at draft-herbert-udp-space-hdr-01 for my proposal as to
> > how the UDP surplus space should be formatted. If there is interest, I
> > could update draft to include considerations for UDP options
> > fragmentation and reassembly offload as well as header/data split
> > which is needed for zero-copy receive (i.e. packet headers are DMA'ed
> > into one set of buffers to be processed by the stack, payload is
> > DMA'ed to another set of buffers to be processed by the application).
> >
> Thanks, I saw that a while ago, thanks for reminding people.
>

Indeed. I wanted to undertake a detailed analysis of the effects of the primary competing formats on checksum offload, and I was looking around for the treatment in Appendix A of draft-herbert-udp-space-hdr-01, especially the discussion of how common offload hardware works, when the reminder came in. Since draft-herbert-udp-space-hdr-01 does not take into account the OCS pseudo-header (which is one 16-bit word consisting of the surplus area length), I'm going to do the analysis from scratch in order to account for this point.

My main conclusion is that for the purposes of checksum offload, IT DOESN'T MATTER whether OCS is at a fixed position in an option header or in a TLV. The differences are very minor.

What follows is a fairly detailed comparison of how the offload logic differs between the format proposed in draft-ietf-tsvwg-udp-options-12 and that in draft-herbert-udp-space-hdr-01.
Those who do not care to pick through the entire analysis should skip down to the CONCLUSIONS section below.

Tom's draft describes typical hardware support for transmit checksums as follows:

In generic checksum offload, for each packet the host indicates to the device the starting offset where the checksum calculation begins and the offset of the field to write the resultant checksum. The extent of the checksum coverage is assumed to be the end of the packet. In particular, this means that even if the UDP checksum is being offloaded, the UDP surplus space is included in the device's computation.

As draft-herbert-udp-space-hdr-01 notes, there are (in general) two transmit checksums to be computed: the UDP checksum and the OCS. The idea is to use the hardware support to compute whichever checksum covers more data and relegate the other to the host CPU.

When the packet is a standard UDP datagram with an options trailer in the surplus area, OCS is typically the one that covers fewer bytes, so the host CPU prepares and formats the option trailer either as described in draft-ietf-tsvwg-udp-options-12 or as in draft-herbert-udp-space-hdr-01 and computes the OCS as specified in draft-ietf-tsvwg-udp-options-12 (i.e., including the 1-word pseudo-header). Then the host CPU calculates the 1s complement sum of a modified UDP pseudo-header in which the length is the IP payload length (not the UDP length) and puts this sum in the UDP checksum field. Finally, the host instructs the offload engine to calculate the 1s complement sum over the entire IP payload and store its complement in the UDP checksum field. Clearly, nothing in the offload logic depends on the details of how the trailer is formatted, only that the 1s complement sum of all 16-bit words therein adds up to the 1s complement of the trailer length.
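
The transmit procedure just described for a standard UDP datagram with an options trailer can be sketched as follows. This is a minimal Python illustration, not code from either draft: the addresses, ports, trailer bytes, and function names (`ones_sum`, `fill_ocs`, `offload_udp_checksum`, `reference_udp_checksum`) are all assumptions made for the example, the offload engine is emulated in software, IPv4 is assumed for the pseudo-header, and the UDP data length is assumed to be even so that the trailer starts on a 16-bit boundary.

```python
import struct

def ones_sum(data: bytes, start: int = 0) -> int:
    """1s complement sum of big-endian 16-bit words, folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = start + sum(w for (w,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def pseudo_sum(src: bytes, dst: bytes, length: int) -> int:
    """Sum of an IPv4 UDP pseudo-header (protocol 17) with the given length."""
    return ones_sum(src + dst + struct.pack("!BBH", 0, 17, length))

def fill_ocs(trailer: bytearray, ocs_off: int) -> None:
    """Host CPU: compute OCS over the trailer plus the 1-word pseudo-header
    (the surplus area length) and store it at the even offset ocs_off."""
    trailer[ocs_off:ocs_off + 2] = b"\x00\x00"
    total = ones_sum(bytes(trailer), len(trailer))
    struct.pack_into("!H", trailer, ocs_off, ~total & 0xFFFF)

def offload_udp_checksum(src, dst, sport, dport, data, trailer):
    """Seed the UDP checksum field, then emulate the offload engine."""
    udp_len = 8 + len(data)
    ip_payload_len = udp_len + len(trailer)
    seed = pseudo_sum(src, dst, ip_payload_len)  # IP payload length, not UDP length
    hdr = struct.pack("!HHHH", sport, dport, udp_len, seed)
    payload = hdr + data + bytes(trailer)
    # Engine: sum from the UDP header to the end of the packet, store complement.
    return (~ones_sum(payload) & 0xFFFF) or 0xFFFF

def reference_udp_checksum(src, dst, sport, dport, data):
    """Standard UDP checksum: covers header and data only, uses the UDP length."""
    udp_len = 8 + len(data)
    hdr = struct.pack("!HHHH", sport, dport, udp_len, 0)
    total = ones_sum(hdr + data, pseudo_sum(src, dst, udp_len))
    return (~total & 0xFFFF) or 0xFFFF

SRC, DST = bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2])
DATA = b"hello!"  # even length (assumption of this sketch)
TRAILER = bytearray(b"\x02\x04\x00\x00\x01\x01\x01\x00")  # illustrative bytes only
fill_ocs(TRAILER, 2)
```

Once OCS is filled in, the trailer words plus the trailer length sum to all ones, so `offload_udp_checksum(SRC, DST, 5000, 6000, DATA, TRAILER)` gives the same value as `reference_udp_checksum(SRC, DST, 5000, 6000, DATA)` regardless of how the trailer bytes are laid out.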

When the packet is a UDP fragment (including a degenerate one that is both an initial and terminal fragment), the host CPU calculates the UDP header checksum in the standard way (including using the UDP length of 8 in the pseudo-header). It then formats the trailer area either as described in draft-ietf-tsvwg-udp-options-12 or as in draft-herbert-udp-space-hdr-01 and populates the OCS field with the surplus area length (i.e., IP payload length - 8). Note that in the case when OCS is a TLV (draft-ietf-tsvwg-udp-options-12) the issue of alignment may arise: the OCS pseudo-header needs to be byte-swapped if the OCS checksum field is on an odd byte boundary. This can be avoided by judicious use of NOPs. The host CPU then instructs the offload engine to calculate the Internet checksum over the surplus area and store the complement in the OCS checksum field. Some offload hardware may not be able to store at an arbitrary byte boundary, but any hardware alignment requirements can again be accommodated by judicious use of NOPs. In this case some details of the offload logic are dependent on the surplus area format, but it is clear, I think, that either of the competing formats (draft-ietf-tsvwg-udp-options-12 or draft-herbert-udp-space-hdr-01) can be accommodated.

It's easy to accommodate optional UDP checksum (UDP CS=0) and (conditionally) optional OCS: just omit the unneeded checksum computation(s). Conditionally optional OCS could be accommodated in the format proposed in draft-herbert-udp-space-hdr-01 by allowing OCS (therein called USH checksum) equal to zero to indicate absence.

The draft describes typical hardware support for receive checksums as follows:

In the most generic form of receive checksum offload, a device performs a running checksum calculation across a packet as it is received. That is, it performs a running ones complement addition over two byte words as they are received.
The device then provides the computed value, referred to as the "checksum complete" value, to the host in the meta data (receive descriptor) for the packet. The host can use this value to verify one or more packet checksums contained in the packet.

In order to use the raw checksum provided in this way, the host CPU needs to do the following:

1.) Compensate for any lower layer headers that may have been included in the raw checksum. This could be the IPv4/IPv6 header and any IPv6 extension headers that the NIC does not strip off. I'm not going to dwell further on this point because the effects are the same under the different approaches to UDP options.

2.) Determine whether the received packet is a standard UDP packet with an options trailer or a UDP fragment, and also whether the UDP checksum is present or absent, and if the UDP checksum is absent, whether OCS is also absent. Determining whether the packet is a standard UDP packet with an options trailer or a UDP fragment (possibly a degenerate one that is both an initial and terminal fragment) is easy: just check to see if UDP Length = 8. Determining whether the UDP checksum is present is easy: just check to see if UDP checksum = 0. Determining whether OCS is present is a bit more difficult with the TLV format in draft-ietf-tsvwg-udp-options-12, but not much so, PROVIDED THAT OCS IS ALWAYS THE FIRST OPTION, apart from any NOPs that may be present for alignment.

Let's consider first the case where both checksums are present. The first step for the host CPU is to compute the ones complement sum of a pseudo-header that is similar to the standard UDP pseudo-header but uses the IP payload length in place of the UDP length. When that is added to the raw checksum coming from the offload engine (following compensation for lower layer headers) the result will be zero if both the UDP checksum and the OCS are correct or both have errors that happen to cancel (see draft-fairhurst-udp-options-cco-00).
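
The combined receive check, and the way offsetting errors can slip through it, can be illustrated with a small sketch. As before this is only a Python illustration under stated assumptions: the "checksum complete" value is emulated in software and is assumed to cover exactly the IP payload, the pseudo-header is IPv4, the UDP data length is even, and the packet builder, trailer bytes, and function names are hypothetical. Note that in 1s complement arithmetic "zero" shows up as 0xFFFF after folding.

```python
import struct

def fold(total: int) -> int:
    """Fold carries back in (end-around carry) until the value fits 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def ones_sum(data: bytes, start: int = 0) -> int:
    """1s complement sum of big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    return fold(start + sum(w for (w,) in struct.iter_unpack("!H", data)))

def pseudo_sum(src: bytes, dst: bytes, length: int) -> int:
    return ones_sum(src + dst + struct.pack("!BBH", 0, 17, length))

def build_packet(src, dst, data, trailer, ocs_off):
    """Build a valid IP payload (UDP header + data + trailer) for the demo."""
    trailer = bytearray(trailer)
    trailer[ocs_off:ocs_off + 2] = b"\x00\x00"
    ocs = ~ones_sum(bytes(trailer), len(trailer)) & 0xFFFF
    struct.pack_into("!H", trailer, ocs_off, ocs)
    udp_len = 8 + len(data)
    hdr = struct.pack("!HHHH", 5000, 6000, udp_len, 0)
    total = ones_sum(hdr + data, pseudo_sum(src, dst, udp_len))
    udp_cs = (~total & 0xFFFF) or 0xFFFF
    return hdr[:6] + struct.pack("!H", udp_cs) + data + bytes(trailer)

def combined_ok(payload, src, dst) -> bool:
    """Raw sum from the (emulated) NIC plus a pseudo-header that uses the
    IP payload length; passes when the total is 1s complement zero."""
    raw = ones_sum(payload)
    return fold(raw + pseudo_sum(src, dst, len(payload))) == 0xFFFF

def ocs_ok(payload) -> bool:
    """Verify OCS alone: sum the trailer with its length as pseudo-header."""
    (udp_len,) = struct.unpack_from("!H", payload, 4)
    trailer = payload[udp_len:]
    return ones_sum(trailer, len(trailer)) == 0xFFFF

def corrupt_both(payload, ocs_pos):
    """Add 1 to the UDP checksum and subtract 1 from OCS (1s complement)."""
    buf = bytearray(payload)
    (udp_cs,) = struct.unpack_from("!H", buf, 6)
    (ocs,) = struct.unpack_from("!H", buf, ocs_pos)
    struct.pack_into("!H", buf, 6, fold(udp_cs + 1))
    struct.pack_into("!H", buf, ocs_pos, fold(ocs + 0xFFFE))
    return bytes(buf)

SRC, DST = bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2])
GOOD = build_packet(SRC, DST, b"hello!", b"\x02\x04\x00\x00\x01\x01\x01\x00", 2)
BAD = corrupt_both(GOOD, 8 + 6 + 2)  # OCS field: UDP header + data + offset 2
```

Here `combined_ok` accepts both `GOOD` and `BAD`, while `ocs_ok` accepts only `GOOD`: the two errors in `BAD` cancel in the combined sum, which is why one of the two checksums still has to be verified on its own.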
So if this check passes, the host CPU needs to separately verify either the UDP checksum or the OCS in order to ensure that both are valid.

When the packet is a standard UDP datagram with an options trailer in the surplus area, OCS is usually the cheaper one to verify, and the calculation required to do so is very little different whether the trailer uses the TLV format in draft-ietf-tsvwg-udp-options-12 or the format in draft-herbert-udp-space-hdr-01. Assertions to the contrary notwithstanding, IT IS NOT NECESSARY TO PARSE the OCS TLV in order to perform the OCS validation. One need only compute the Internet checksum over the trailer area using the trailer length as the pseudo-header, same as for the trailer format described in draft-herbert-udp-space-hdr-01.

When the packet is a UDP fragment (including a degenerate one that is both an initial and terminal fragment), the host CPU validates the UDP header checksum in the standard way (including using the UDP length of 8 in the pseudo-header). There is no difference in the two formats for this.

If UDP CS=0 and OCS is present, then the host CPU needs to validate OCS only. When the packet is a standard UDP datagram with an options trailer in the surplus area, the cheapest thing to do is likely just to calculate OCS directly. When the packet is a UDP fragment, then the host CPU can profitably use the output of the offload engine by adding the 1's complement of the 1's complement checksum of the UDP header and then adding the OCS pseudo-header. In both cases the amount of work is identical (or nearly so) for both formats.

CONCLUSIONS:

1.) For checksum offload, the packet formats in draft-ietf-tsvwg-udp-options-12 and draft-herbert-udp-space-hdr-01 differ only at the edges. For the former it is necessary to impose a constraint on the placement of the OCS TLV in order for a receiver to safely determine whether it is present in the case where UDP CS=0.
For the latter this placement constraint is automatic, but on the other hand there is now an alignment requirement when the options are in a trailer, which is something that the receiver needs to check. To me, that's six of one and half a dozen of the other, a wash. I do concede that there is a question of what happens if an OCS TLV shows up later in the packet when UDP CS=0. My sense is that the receiver should just discard the option in that case and not check OCS.

2.) In both cases the receive logic would be cleaner if we reduced this to two cases: UDP CS<>0 and OCS is required (as in draft-ietf-tsvwg-udp-options-12) or UDP CS=0 and OCS is not required (and ignored if present). Then there would be no need to constrain the placement of the OCS TLV (since, as I point out above, it's always possible to check it without parsing the option TLVs). The alignment annoyance with the draft-herbert-udp-space-hdr-01 trailer format remains. I personally have trouble seeing a lot of value in UDP CS=0 with OCS present, no matter how hard I squint; but if the WG really wants it, I'm OK to leave it in.

3.) If both checksums are always present whenever either is, a substantive simplification suggests itself, namely, not to worry about offsetting errors in UDP CS and OCS. That weakens the check, but not by a lot, and it streamlines both implementations that use offload and those that do not. Again, though, this is just a suggestion; it is not a point on which I feel strongly.

Gorry also wrote:
> I think it would be good for the WG to focus on how to finish
> draft-ietf-tsvwg-udp-options, but I do seem some opportunities to use
> some of these ideas for making the fragment header - because that also
> places all data in the "option".
>

I agree that we need to focus on getting this deliverable finished.
I'm rapidly coming to the conclusion that to some extent at least -- and certainly for checksum offload -- the disagreements that Joe, Tom, and I have about the packet formats amount largely to bike-shedding. I'm going to shut up for a while, review the 50+ messages in the "A review of draft-ietf-tsvwg-udp-options-12" thread, and come back with some substantive suggestions for convergence.

Mike Heard