Re: [tsvwg] RDMA Support by UDP FRAG Option

"C. M. Heard" <heard@pobox.com> Fri, 18 June 2021 20:18 UTC

To: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
Cc: Tom Herbert <tom@herbertland.com>, Joseph Touch <touch@strayalpha.com>, TSVWG <tsvwg@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/RZULHOKRgrSYIvsI-5Hg7sNtSS8>

On Fri, Jun 18, 2021 at 2:36 AM Gorry Fairhurst wrote:

> On 17/06/2021 18:30, Tom Herbert wrote:
> > On Thu, Jun 17, 2021 at 7:41 AM Gorry Fairhurst wrote:
> >> When [the] dust has settled, I expect we should see an updated
> >> fragmentation text.  I suspect we should consider the final format for
> >> suitability for offload (perhaps Tom Herbert would look?) and for ease
> >> of processing in message buffers (Tom Jones offered to look).
> >
> > Please look at draft-herbert-udp-space-hdr-01 for my proposal as to
> > how the UDP surplus space should be formatted. If there is interest, I
> > could update draft to include considerations for UDP options
> > fragmentation and reassembly offload as well as header/data split
> > which is needed for zero-copy receive (i.e. packet headers are DMA'ed
> > into one set of buffers to be processed by the stack, payload is
> > DMA'ed to another set of buffers to be processed by the application).
>
>
> Thanks, I saw that a while ago, thanks for reminding people.
>

Indeed. I wanted to undertake a detailed analysis of the effects of the
primary competing formats on checksum offload, and I was looking at the
treatment in Appendix A of draft-herbert-udp-space-hdr-01, especially the
discussion of how common offload hardware works, when the reminder came
in.

Since draft-herbert-udp-space-hdr-01 does not take into account the OCS
pseudo-header (which is one 16-bit word consisting of the surplus area
length), I'm going to do the analysis from scratch in order to account for
this point. My main conclusion is that, for the purposes of checksum
offload, IT DOESN'T MATTER whether OCS is at a fixed position in an option
header or in a TLV. The differences are very minor.

What follows is a fairly detailed comparison of how the offload logic
differs between the format proposed in draft-ietf-tsvwg-udp-options-12 and
that in draft-herbert-udp-space-hdr-01. Those who do not care to pick
through the entire analysis should skip down to the CONCLUSIONS section
below.

Tom's draft describes typical hardware support for transmit checksums as
follows:

   In generic checksum offload, for each packet the host indicates to
   the device the starting offset where the checksum calculation begins
   and the offset of the field to write the resultant checksum. The
   extent of the checksum coverage is assumed to be the end of the
   packet. In particular, this means that even if the UDP checksum is
   being offloaded, the UDP surplus space is included in the device's
   computation.
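A minimal Python model of that transmit interface may help make the later steps concrete. It assumes IPv4 and treats the device as summing big-endian 16-bit words from a start offset to the end of the packet. The names `ones_sum`, `tx_offload` and `pseudo_v4` are illustrative, not any driver's API, and the model skips the RFC 768 rule that a computed UDP checksum of zero is sent as 0xFFFF.

```python
def ones_sum(data: bytes, initial: int = 0) -> int:
    """RFC 1071 ones-complement sum of big-endian 16-bit words.

    An odd trailing byte is zero-padded; `initial` lets a caller fold in a
    pseudo-header sum. Returns a value in 0..0xFFFF.
    """
    data = bytes(data) + (b"\x00" if len(data) % 2 else b"")
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total > 0xFFFF:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tx_offload(packet: bytearray, start: int, store: int) -> None:
    """Hypothetical model of generic transmit checksum offload.

    Sums from `start` to the END of the packet (the device has no notion of
    the UDP length) and writes the complement at `store`. Whatever the host
    pre-loaded at `store` is part of the sum before it is overwritten.
    """
    csum = ~ones_sum(packet[start:]) & 0xFFFF
    packet[store:store + 2] = csum.to_bytes(2, "big")


def pseudo_v4(src: bytes, dst: bytes, length: int) -> bytes:
    """IPv4 UDP pseudo-header: source, destination, zero, protocol 17, length."""
    return src + dst + bytes([0, 17]) + length.to_bytes(2, "big")


# Usage: a plain UDP datagram (no options), ports 1234 -> 5678, payload b"ping".
SRC, DST = bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2])
udp = bytearray((1234).to_bytes(2, "big") + (5678).to_bytes(2, "big")
                + (12).to_bytes(2, "big") + b"\x00\x00" + b"ping")
# Host pre-loads the (uncomplemented) pseudo-header sum; the device does the rest.
udp[6:8] = ones_sum(pseudo_v4(SRC, DST, len(udp))).to_bytes(2, "big")
tx_offload(udp, start=0, store=6)
assert ones_sum(pseudo_v4(SRC, DST, len(udp)) + udp) == 0xFFFF
```

Because the device always sums to the end of the packet, any surplus area is swept into whatever checksum it computes, which is the property the rest of the analysis has to work around.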


As draft-herbert-udp-space-hdr-01 notes, there are (in general) two
transmit checksums to be computed: the UDP checksum and the OCS. The idea
is to use the hardware support to compute whichever checksum covers more
data and relegate the other to the host CPU.

When the packet is a standard UDP datagram with an options trailer in the
surplus area, OCS is typically the one that covers fewer bytes, so the host
CPU prepares and formats the option trailer either as described in
draft-ietf-tsvwg-udp-options-12 or as in draft-herbert-udp-space-hdr-01 and
computes the OCS as specified in draft-ietf-tsvwg-udp-options-12 (i.e.,
including the 1-word pseudo-header). Then the host CPU computes the 1s
complement sum of a modified UDP pseudo-header in which the length is the
IP payload length (not the UDP length) and puts that sum in the UDP
checksum field. Finally, the host instructs the offload engine to calculate
the Internet checksum over the entire IP payload and store the complement
in the UDP checksum field.
Clearly, nothing in the offload logic depends on the details of how the
trailer is formatted, only that the 1s complement sum of all 16-bit words
therein adds up to the 1s complement of the trailer length.
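The steps just described can be sketched in Python as follows. This illustrates the arithmetic under stated assumptions and is not either draft's normative procedure: IPv4, an even UDP length so the trailer begins on a 16-bit boundary, and a toy trailer made of an OCS-like TLV (kind 2, length 4, 16-bit checksum) followed by a NOP and an EOL byte, with the kind values used only as placeholders. The helper names are hypothetical.

```python
def ones_sum(data: bytes, initial: int = 0) -> int:
    """RFC 1071 ones-complement sum of big-endian 16-bit words."""
    data = bytes(data) + (b"\x00" if len(data) % 2 else b"")
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total > 0xFFFF:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tx_offload(packet: bytearray, start: int, store: int) -> None:
    """Generic offload model: checksum from `start` to the end of the packet."""
    packet[store:store + 2] = (~ones_sum(packet[start:]) & 0xFFFF).to_bytes(2, "big")


def pseudo_v4(src: bytes, dst: bytes, length: int) -> bytes:
    return src + dst + bytes([0, 17]) + length.to_bytes(2, "big")


SRC, DST = bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2])
payload = b"ping"
udp_len = 8 + len(payload)                 # 12, so the trailer starts even
trailer = bytearray([2, 4, 0, 0, 1, 0])     # toy OCS TLV, NOP, EOL (assumed)

# Step 1: host computes OCS over the trailer; the pseudo-header is its length.
trailer[2:4] = (~ones_sum(trailer, initial=len(trailer)) & 0xFFFF).to_bytes(2, "big")

# Step 2: host pre-loads the UDP checksum field with the sum of a
# pseudo-header that carries the IP payload length, not the UDP length.
ip_payload = bytearray((1234).to_bytes(2, "big") + (5678).to_bytes(2, "big")
                       + udp_len.to_bytes(2, "big") + b"\x00\x00"
                       + payload + trailer)
ip_payload[6:8] = ones_sum(pseudo_v4(SRC, DST, len(ip_payload))).to_bytes(2, "big")

# Step 3: the device sums the ENTIRE IP payload, trailer included.
tx_offload(ip_payload, start=0, store=6)

# The trailer's words sum to the complement of its length, cancelling the
# extra length in the pseudo-header, so the result is the ordinary UDP checksum.
assert ones_sum(pseudo_v4(SRC, DST, udp_len) + ip_payload[:udp_len]) == 0xFFFF
assert ones_sum(ip_payload[udp_len:], initial=len(trailer)) == 0xFFFF
```

Nothing in `tx_offload` or in step 2 looks inside the trailer; only the final property of step 1 matters.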

When the packet is a UDP fragment (including a degenerate one that is both
an initial and terminal fragment), the host CPU calculates the UDP header
checksum in the standard way (including using the UDP length of 8 in the
pseudo-header). It then formats the trailer area either as described in
draft-ietf-tsvwg-udp-options-12 or as in draft-herbert-udp-space-hdr-01 and
populates the OCS field with the surplus area length (i.e., IP payload
length - 8). Note that in the case when OCS is a TLV
(draft-ietf-tsvwg-udp-options-12) the issue of alignment may arise: the OCS
pseudo-header needs to be byte-swapped if the OCS checksum field is on an
odd byte boundary. This can be avoided by judicious use of NOPs. The host
CPU then instructs the offload engine to calculate the Internet checksum
over the surplus area and store the complement in the OCS checksum field.
Some offload hardware may not be able to store at an arbitrary byte
boundary, but any hardware alignment requirements can again be accommodated
by judicious use of NOPs. In this case some details of the offload logic
are dependent on the surplus area format, but it is clear, I think, that
either of the competing formats (draft-ietf-tsvwg-udp-options-12 or
draft-herbert-udp-space-hdr-01) can be accommodated.
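For the fragment case, a similar Python sketch follows. It assumes IPv4, places a toy OCS-like TLV (kind 2, length 4, checksum) at the start of the surplus area so the checksum field sits at an even offset (the alignment that NOPs would otherwise provide), and treats the remaining surplus bytes as opaque rather than reproducing either draft's fragment format. The odd-offset byte-swap case is not modeled, and the helper names are hypothetical.

```python
def ones_sum(data: bytes, initial: int = 0) -> int:
    """RFC 1071 ones-complement sum of big-endian 16-bit words."""
    data = bytes(data) + (b"\x00" if len(data) % 2 else b"")
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total > 0xFFFF:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tx_offload(packet: bytearray, start: int, store: int) -> None:
    """Generic offload model: checksum from `start` to the end of the packet."""
    packet[store:store + 2] = (~ones_sum(packet[start:]) & 0xFFFF).to_bytes(2, "big")


def pseudo_v4(src: bytes, dst: bytes, length: int) -> bytes:
    return src + dst + bytes([0, 17]) + length.to_bytes(2, "big")


SRC, DST = bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2])

# UDP header of a fragment: UDP Length = 8, so the "datagram" is header-only.
hdr = bytearray((1234).to_bytes(2, "big") + (5678).to_bytes(2, "big")
                + (8).to_bytes(2, "big") + b"\x00\x00")
# Host computes the header checksum in the standard way, with length 8.
hdr[6:8] = (~ones_sum(pseudo_v4(SRC, DST, 8) + hdr) & 0xFFFF).to_bytes(2, "big")

surplus = bytearray([2, 4, 0, 0]) + b"FRAGDATA"   # toy OCS TLV + opaque bytes
ocs_field = 2                                     # even offset within surplus
# Host pre-loads the OCS field with the surplus length (the OCS pseudo-header).
surplus[ocs_field:ocs_field + 2] = len(surplus).to_bytes(2, "big")

ip_payload = hdr + surplus
# The device covers only the surplus area: start at byte 8, store into OCS.
tx_offload(ip_payload, start=8, store=8 + ocs_field)

assert ones_sum(pseudo_v4(SRC, DST, 8) + ip_payload[:8]) == 0xFFFF
assert ones_sum(ip_payload[8:], initial=len(surplus)) == 0xFFFF
```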

It's easy to accommodate optional UDP checksum (UDP CS=0) and
(conditionally) optional OCS: just omit the unneeded checksum
computation(s). Conditionally optional OCS could be accommodated in the
format proposed in draft-herbert-udp-space-hdr-01 by allowing OCS (therein
called USH checksum) equal to zero to indicate absence.

The draft describes typical hardware support for receive checksums as
follows:

   In the most generic form of receive checksum offload, a device
   performs a running checksum calculation across a packet as it is
   received. That is, it performs a running ones complement addition
   over two byte words as they are received. The device then provides
   the computed value, referred to as the "checksum complete" value, to
   the host in the meta data (receive descriptor) for the packet. The
   host can use this value to verify one or more packet checksums
   contained in the packet.
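To make the receive side concrete, here is a hypothetical Python model of a device accumulating a "checksum complete" value as bytes arrive in arbitrarily sized pieces. It is a sketch, not any NIC's actual behavior; it only shows that a byte-wise running sum, with even-offset bytes weighted as the high half of a 16-bit word, yields the same value as an RFC 1071 sum over the whole frame.

```python
def ones_sum(data: bytes, initial: int = 0) -> int:
    """RFC 1071 ones-complement sum of big-endian 16-bit words."""
    data = bytes(data) + (b"\x00" if len(data) % 2 else b"")
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total > 0xFFFF:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


class ChecksumComplete:
    """Hypothetical running receive checksum, fed in arbitrary chunks."""

    def __init__(self) -> None:
        self.total = 0
        self.offset = 0  # byte offset from the start of the frame

    def feed(self, chunk: bytes) -> None:
        for b in chunk:
            # Even offsets are the high byte of a word, odd offsets the low byte.
            self.total += (b << 8) if self.offset % 2 == 0 else b
            self.offset += 1

    def value(self) -> int:
        """The 'checksum complete' value handed to the host in the descriptor."""
        total = self.total
        while total > 0xFFFF:
            total = (total & 0xFFFF) + (total >> 16)
        return total


frame = bytes(range(1, 30))                        # 29 bytes: odd on purpose
nic = ChecksumComplete()
for piece in (frame[:3], frame[3:10], frame[10:]):  # odd-sized DMA chunks
    nic.feed(piece)
assert nic.value() == ones_sum(frame)
```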


In order to use the raw checksum provided in this way, the host CPU needs
to do the following:

1.) Compensate for any lower layer headers that may have been included in
the raw checksum. This could include the IPv4/IPv6 header and any IPv6
extension headers that the NIC does not
strip off. I'm not going to dwell further on this point because the effects
are the same under the different approaches to UDP options.

2.) Determine whether the received packet is a standard UDP packet with an
options trailer and also whether the UDP checksum is present or absent, and
if the UDP checksum is absent, whether OCS is also absent. Determining
whether the packet is a standard UDP packet with an options trailer or a UDP
fragment (possibly a degenerate one that is both an initial and terminal
fragment) is easy: just check to see if UDP Length = 8. Determining whether
the UDP checksum is present is also easy: just check to see if UDP checksum = 0.
Determining whether OCS is present is a bit more difficult with the TLV
format in draft-ietf-tsvwg-udp-options-12, but not much so, PROVIDED THAT
OCS IS ALWAYS THE FIRST OPTION, apart from any NOPs that may be present for
alignment.
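Steps 1 and 2 might be sketched in Python as below. Assumptions: lower-layer headers have even length (true for IPv4 and IPv6), the option kinds for NOP and OCS are taken as 1 and 2 purely as placeholders (the real values belong to the draft), and OCS is treated as required whenever the UDP checksum is nonzero. When UDP CS = 0, OCS counts as present only if it is the first option after any NOPs, which is the placement constraint called for above. `compensate` and `classify` are hypothetical names.

```python
def ones_sum(data: bytes, initial: int = 0) -> int:
    """RFC 1071 ones-complement sum of big-endian 16-bit words."""
    data = bytes(data) + (b"\x00" if len(data) % 2 else b"")
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total > 0xFFFF:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def oc_add(a: int, b: int) -> int:
    """Ones-complement addition of two 16-bit values."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def compensate(raw: int, lower_hdrs: bytes) -> int:
    """Step 1: remove even-length lower-layer headers from the raw sum
    by adding the ones complement of their sum."""
    return oc_add(raw, ~ones_sum(lower_hdrs) & 0xFFFF)


NOP_KIND, OCS_KIND = 1, 2  # placeholder kind values (assumption)


def classify(udp_hdr: bytes, surplus: bytes) -> tuple[bool, bool, bool]:
    """Step 2: return (is_fragment, udp_cs_present, ocs_present)."""
    is_fragment = int.from_bytes(udp_hdr[4:6], "big") == 8
    udp_cs_present = int.from_bytes(udp_hdr[6:8], "big") != 0
    if udp_cs_present:
        return is_fragment, True, True  # OCS required when UDP CS != 0
    # UDP CS = 0: OCS is present only if it is the first option after NOPs.
    i = 0
    while i < len(surplus) and surplus[i] == NOP_KIND:
        i += 1
    ocs_present = i < len(surplus) and surplus[i] == OCS_KIND
    return is_fragment, False, ocs_present
```

Note that `classify` never needs to walk past the first non-NOP option, which is what keeps the TLV case cheap.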

Let's consider first the case where both checksums are present. The first
step for the host CPU is to compute the ones complement sum of a
pseudo-header that is similar to the standard UDP pseudo-header but uses
the IP payload length in place of the UDP length. When that is added to the
raw checksum coming from the offload engine (following compensation for
lower layer headers) the result will be zero if both the UDP checksum and
the OCS are correct or both have errors that happen to cancel
(see draft-fairhurst-udp-options-cco-00). So if this check passes, the host
CPU needs to separately verify either the UDP checksum or the OCS in order
to ensure that both are valid.
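The combined check and its blind spot can be demonstrated with a short Python sketch. It builds a toy IPv4 packet (a UDP datagram with payload b"ping" plus an OCS-like trailer, layouts assumed for illustration), verifies it with a single addition of a pseudo-header carrying the IP payload length, and then introduces two offsetting corruptions, +1 in a payload word and -1 in a trailer word, which the combined check misses but a separate OCS check catches. Helper names are hypothetical.

```python
def ones_sum(data: bytes, initial: int = 0) -> int:
    """RFC 1071 ones-complement sum of big-endian 16-bit words."""
    data = bytes(data) + (b"\x00" if len(data) % 2 else b"")
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total > 0xFFFF:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def oc_add(a: int, b: int) -> int:
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def pseudo_v4(src: bytes, dst: bytes, length: int) -> bytes:
    return src + dst + bytes([0, 17]) + length.to_bytes(2, "big")


SRC, DST = bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2])


def build_packet() -> bytearray:
    """Toy UDP datagram (UDP length 12) plus 6-byte trailer, both checksums set."""
    trailer = bytearray([2, 4, 0, 0, 1, 0])  # toy OCS TLV, NOP, EOL
    trailer[2:4] = (~ones_sum(trailer, len(trailer)) & 0xFFFF).to_bytes(2, "big")
    udp = bytearray((1234).to_bytes(2, "big") + (5678).to_bytes(2, "big")
                    + (12).to_bytes(2, "big") + b"\x00\x00" + b"ping")
    udp[6:8] = (~ones_sum(pseudo_v4(SRC, DST, 12) + udp) & 0xFFFF).to_bytes(2, "big")
    return udp + trailer


def combined_ok(ip_payload: bytes, raw: int) -> bool:
    """One addition covers both checksums: pseudo-header with IP payload length."""
    return oc_add(raw, ones_sum(pseudo_v4(SRC, DST, len(ip_payload)))) == 0xFFFF


def ocs_ok(surplus: bytes) -> bool:
    return ones_sum(surplus, len(surplus)) == 0xFFFF


good = build_packet()
assert combined_ok(good, ones_sum(good)) and ocs_ok(good[12:])

bad = build_packet()
bad[11] += 1               # 'g' -> 'h': payload word 0x6e67 -> 0x6e68 (+1)
bad[16:18] = b"\x00\xff"   # trailer word 0x0100 -> 0x00ff (-1)
assert combined_ok(bad, ones_sum(bad))  # the combined check is fooled
assert not ocs_ok(bad[12:])             # the separate OCS check is not
```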

When the packet is a standard UDP datagram with an options trailer in the
surplus area, OCS is usually the cheaper one to verify, and the calculation
required to do so is very little different whether the trailer uses the TLV
format in draft-ietf-tsvwg-udp-options-12 or the format in
draft-herbert-udp-space-hdr-01. Assertions to the contrary notwithstanding,
IT IS NOT NECESSARY TO PARSE the OCS TLV in order to perform the OCS
validation. One need only compute the Internet checksum over the trailer
area using the trailer length as the pseudo-header, same as for the trailer
format described in draft-herbert-udp-space-hdr-01.
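A Python sketch of that point: `ocs_ok` never looks at option kinds or lengths; it just sums the whole surplus area plus its length. The three toy layouts (OCS TLV first, OCS TLV after two NOPs, and a checksum in a fixed leading position standing in for, not reproducing, the draft-herbert-udp-space-hdr-01 header) are assumptions for illustration, each keeping the checksum field at an even offset. `with_ocs` is a hypothetical test helper.

```python
def ones_sum(data: bytes, initial: int = 0) -> int:
    """RFC 1071 ones-complement sum of big-endian 16-bit words."""
    data = bytes(data) + (b"\x00" if len(data) % 2 else b"")
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total > 0xFFFF:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ocs_ok(surplus: bytes) -> bool:
    """Validate OCS without parsing: the surplus area plus its length
    (the 1-word OCS pseudo-header) must sum to ones-complement zero."""
    return ones_sum(surplus, initial=len(surplus)) == 0xFFFF


def with_ocs(surplus: bytes, field: int) -> bytes:
    """Fill a zeroed checksum field at an even offset (hypothetical helper)."""
    buf = bytearray(surplus)
    buf[field:field + 2] = (~ones_sum(buf, len(buf)) & 0xFFFF).to_bytes(2, "big")
    return bytes(buf)


tlv_first = with_ocs(bytes([2, 4, 0, 0, 1, 0]), field=2)
tlv_after_nops = with_ocs(bytes([1, 1, 2, 4, 0, 0, 0, 0]), field=4)
fixed_hdr = with_ocs(bytes([0, 0]) + b"options!", field=0)
# One function validates all three layouts, wherever the field sits.
assert all(ocs_ok(s) for s in (tlv_first, tlv_after_nops, fixed_hdr))
assert not ocs_ok(tlv_first[:-1] + b"\x01")  # a corrupted byte is caught
```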

When the packet is a UDP fragment (including a degenerate one that is both
an initial and terminal fragment), the host CPU validates the UDP header
checksum in the standard way (including using the UDP length of 8 in the
pseudo-header). There is no difference in the two formats for this.
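A minimal receive-side sketch of the fragment header check, assuming IPv4 and using hypothetical helper names. The pseudo-header length is 8, the UDP Length; substituting the IP payload length would make a valid header fail.

```python
def ones_sum(data: bytes, initial: int = 0) -> int:
    """RFC 1071 ones-complement sum of big-endian 16-bit words."""
    data = bytes(data) + (b"\x00" if len(data) % 2 else b"")
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total > 0xFFFF:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_v4(src: bytes, dst: bytes, length: int) -> bytes:
    return src + dst + bytes([0, 17]) + length.to_bytes(2, "big")


def frag_udp_hdr_ok(src: bytes, dst: bytes, udp_hdr: bytes) -> bool:
    """Check a fragment's 8-byte UDP header; the pseudo-header length is 8."""
    return ones_sum(pseudo_v4(src, dst, 8) + udp_hdr) == 0xFFFF


SRC, DST = bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2])
hdr = bytearray([0x04, 0xD2, 0x16, 0x2E, 0x00, 0x08, 0x00, 0x00])
hdr[6:8] = (~ones_sum(pseudo_v4(SRC, DST, 8) + hdr) & 0xFFFF).to_bytes(2, "big")
assert frag_udp_hdr_ok(SRC, DST, bytes(hdr))
```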

If UDP CS=0 and OCS is present, then the host CPU needs to validate OCS
only. When the packet is a standard UDP datagram with an options trailer in
the surplus area, the cheapest thing to do is likely just to calculate OCS
directly. When the packet is a UDP fragment, the host CPU can profitably
use the output of the offload engine by adding the 1's complement of the
1's complement sum of the UDP header (i.e., subtracting the header's
contribution) and then adding the OCS pseudo-header. In both cases the
amount of work is identical (or nearly so) for both formats.
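The fragment path with UDP CS = 0 might look like this in Python. It assumes the raw value has already been compensated for lower-layer headers and uses a toy OCS-like TLV followed by opaque bytes; no address pseudo-header is involved because OCS uses only the surplus length. The function name is hypothetical.

```python
def ones_sum(data: bytes, initial: int = 0) -> int:
    """RFC 1071 ones-complement sum of big-endian 16-bit words."""
    data = bytes(data) + (b"\x00" if len(data) % 2 else b"")
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total > 0xFFFF:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def oc_add(a: int, b: int) -> int:
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def frag_ocs_ok_from_offload(raw: int, udp_hdr: bytes, surplus_len: int) -> bool:
    """Reuse the device sum over the IP payload: subtract the UDP header
    (add the complement of its sum), add the OCS pseudo-header, expect zero."""
    s = oc_add(raw, ~ones_sum(udp_hdr) & 0xFFFF)
    return oc_add(s, surplus_len) == 0xFFFF


hdr = bytes([0x04, 0xD2, 0x16, 0x2E, 0x00, 0x08, 0x00, 0x00])  # UDP CS = 0
surplus = bytearray([2, 4, 0, 0]) + b"FRAGDATA"
surplus[2:4] = (~ones_sum(surplus, len(surplus)) & 0xFFFF).to_bytes(2, "big")
raw = ones_sum(hdr + surplus)  # what checksum-complete would report
assert frag_ocs_ok_from_offload(raw, hdr, len(surplus))
assert ones_sum(surplus, len(surplus)) == 0xFFFF  # same verdict computed directly
```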

CONCLUSIONS:

1.) For checksum offload, the packet formats in
draft-ietf-tsvwg-udp-options-12 and draft-herbert-udp-space-hdr-01 differ
only at the edges. For the former it is necessary to impose a constraint on
the placement of the OCS TLV in order for a receiver to safely determine
whether it is present in the case where UDP CS=0. For the latter this
placement constraint is automatic, but on the other hand there is now an
alignment requirement when the options are in a trailer, which is something
that the receiver needs to check. To me, that's six of one, half a dozen of
the other: a wash. I do concede that there is
a question of what happens if an OCS TLV shows up later in the packet when
UDP CS = 0. My sense is that the receiver should just discard the option in
that case and not check OCS.

2.) In both cases the receive logic would be cleaner if we reduced this to
two cases: UDP CS<>0 and OCS is required (as in
draft-ietf-tsvwg-udp-options-12) or UDP CS=0 and OCS is not required (and
ignored if present). Then there would be no need to constrain the placement
of the OCS TLV (since, as I point out above, it's always possible to check
it without parsing the option TLVs). The alignment annoyance with the
draft-herbert-udp-space-hdr-01 trailer format remains. I personally have trouble
seeing a lot of value in UDP CS=0 with OCS present, no matter how hard I
squint; but if the WG really wants it, I'm OK to leave it in.

3.) If both checksums are always present whenever either is, a substantive
simplification suggests itself, namely, not to worry about offsetting
errors in UDP CS and OCS. That weakens the check, but not by a lot, and it
streamlines both implementations that use offload and those that do not.
Again, though, this is just a suggestion; it is not a point on which I feel
strongly.
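If suggestions 2 and 3 were adopted, the receive logic could collapse to roughly the following Python sketch. It models the proposal in this message, not draft-ietf-tsvwg-udp-options-12, and assumes IPv4, a raw value already compensated for lower-layer headers, and a toy fragment with an OCS-like TLV. Because the pseudo-header uses the IP payload length, the same single addition covers standard datagrams and fragments. `rx_simplified` is a hypothetical name.

```python
def ones_sum(data: bytes, initial: int = 0) -> int:
    """RFC 1071 ones-complement sum of big-endian 16-bit words."""
    data = bytes(data) + (b"\x00" if len(data) % 2 else b"")
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total > 0xFFFF:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def oc_add(a: int, b: int) -> int:
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def pseudo_v4(src: bytes, dst: bytes, length: int) -> bytes:
    return src + dst + bytes([0, 17]) + length.to_bytes(2, "big")


def rx_simplified(src: bytes, dst: bytes, ip_payload: bytes, raw: int) -> bool:
    """UDP CS = 0: nothing is checked and any OCS is ignored.
    UDP CS != 0: OCS is required and one combined addition checks both,
    accepting the small risk of offsetting errors (suggestion 3)."""
    if int.from_bytes(ip_payload[6:8], "big") == 0:
        return True
    p = ones_sum(pseudo_v4(src, dst, len(ip_payload)))
    return oc_add(raw, p) == 0xFFFF


SRC, DST = bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2])
hdr = bytearray([0x04, 0xD2, 0x16, 0x2E, 0x00, 0x08, 0x00, 0x00])
hdr[6:8] = (~ones_sum(pseudo_v4(SRC, DST, 8) + hdr) & 0xFFFF).to_bytes(2, "big")
surplus = bytearray([2, 4, 0, 0]) + b"FRAGDATA"  # toy OCS TLV + opaque bytes
surplus[2:4] = (~ones_sum(surplus, len(surplus)) & 0xFFFF).to_bytes(2, "big")
frag = bytes(hdr + surplus)
bad = frag[:-1] + bytes([frag[-1] ^ 1])           # flip one bit
no_cs = bytes([0x04, 0xD2, 0x16, 0x2E, 0x00, 0x08, 0x00, 0x00]) + b"anything"
```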

Gorry also wrote:

> I think it would be good for the WG to focus on how to finish
> draft-ietf-tsvwg-udp-options, but I do see some opportunities to use
> some of these ideas for making the fragment header - because that also
> places all data in the "option".
>

I agree that we need to focus on getting this deliverable finished. I'm
rapidly coming to the conclusion that to some extent at least -- and
certainly for checksum offload -- the disagreements that Joe, Tom, and I
have about the packet formats amount largely to bike-shedding. I'm going to
shut up for a while, review the 50+ messages in the "A review of
draft-ietf-tsvwg-udp-options-12" thread, and come back with some substantive
suggestions for convergence.

Mike Heard