Re: [tsvwg] RDMA Support by UDP FRAG Option

Tom Herbert <tom@herbertland.com> Sat, 19 June 2021 22:07 UTC

From: Tom Herbert <tom@herbertland.com>
Date: Sat, 19 Jun 2021 15:07:30 -0700
Message-ID: <CALx6S36FK7NVzMTdh+aUSpBdXrfT5C=KsAwoVBR8gU06E0TW5g@mail.gmail.com>
To: Joseph Touch <touch@strayalpha.com>
Cc: "C. M. Heard" <heard@pobox.com>, Gorry Fairhurst <gorry@erg.abdn.ac.uk>, TSVWG <tsvwg@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/Tza4WX9Q3_Bq0RVCEoa6z7mn4bU>
Subject: Re: [tsvwg] RDMA Support by UDP FRAG Option

On Sat, Jun 19, 2021, 2:58 PM Joseph Touch <touch@strayalpha.com> wrote:

>
>
> > On Jun 19, 2021, at 1:40 PM, Tom Herbert <tom@herbertland.com> wrote:
> >
> > On Sat, Jun 19, 2021 at 12:56 PM Joseph Touch <touch@strayalpha.com> wrote:
> >>
> >> Tom,
> >>
> >>> On Jun 19, 2021, at 10:11 AM, Tom Herbert <tom@herbertland.com> wrote:
> >>>
> >>> ...
> >>> There is another serious problem with transport checksums and use of
> >>> the UDP surplus area. Most of this discussion has presumed that the
> >>> UDP checksum is the one in the packet being offloaded, but that may
> >>> not be the case. Consider the case where a sender sends a TCP packet
> >>> that is encapsulated GRE/UDP or VXLAN (a very common use case in
> >>> virtual networks where VMs send packets on their virtual networks).
> >>> The stack will attempt to offload the innermost checksum which is the
> >>> TCP checksum.
> >>
> >> I’m assuming this means TCP in GRE/UDP (it seems ambiguous).
> >>
> >>> The TCP checksum is the one offloaded regardless of
> >>> whether or not the outer UDP checksum is zero (if it's non-zero then
> >>> the stack would set it using local checksum offload (LCO)).
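
As a rough illustration of the local checksum offload (LCO) idea mentioned above, following the description in the Linux kernel's checksum-offloads document: once the device has filled in the inner checksum, the ones-complement sum from the inner checksum start to the end of the packet equals the complement of the value the stack seeded into the checksum field, so the outer UDP checksum can be computed without reading the inner payload. The sketch uses made-up addresses and ports and a 42-byte stand-in for the VXLAN, inner Ethernet, and inner IPv4 headers; the helper names are hypothetical, and it assumes the inner checksum starts at an even offset.

```python
import struct


def ones_sum(data: bytes, acc: int = 0) -> int:
    """16-bit ones-complement sum (data zero-padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    for i in range(0, len(data), 2):
        acc += (data[i] << 8) | data[i + 1]
    while acc >> 16:
        acc = (acc & 0xFFFF) + (acc >> 16)
    return acc


def csum(data: bytes, acc: int = 0) -> int:
    """Internet checksum: complement of the ones-complement sum."""
    return ~ones_sum(data, acc) & 0xFFFF


# Inner TCP segment (illustrative values). The checksum field is seeded with
# the uncomplemented pseudo-header sum, as a stack does before TX offload.
payload = b"hello, tsvwg"
inner_pseudo = bytes([10, 0, 0, 1, 10, 0, 0, 2, 0, 6]) + struct.pack("!H", 20 + len(payload))
seed = ones_sum(inner_pseudo)
inner = struct.pack("!HHIIHHHH", 12345, 80, 1, 0, 0x5010, 65535, seed, 0) + payload

# Outer UDP header (checksum field still zero) and stand-in encapsulation bytes.
encap = bytes(range(42))  # hypothetical VXLAN + inner Ethernet + inner IPv4
udp_len = 8 + len(encap) + len(inner)
outer_udp = struct.pack("!HHHH", 40000, 4789, udp_len, 0)
outer_pseudo = bytes([192, 0, 2, 1, 192, 0, 2, 2, 0, 17]) + struct.pack("!H", udp_len)

# LCO: sum only up to the inner checksum start, then add the complement of the seed.
acc = ones_sum(outer_pseudo + outer_udp + encap)
lco_outer = csum(b"", acc + (~seed & 0xFFFF))

# Emulated device: checksum from the TCP header to the end, stored at offset 16.
inner_final = inner[:16] + struct.pack("!H", csum(inner)) + inner[18:]

# Reference: outer checksum computed the long way over the final bytes.
full_outer = csum(outer_pseudo + outer_udp + encap + inner_final)
print(hex(lco_outer), hex(full_outer))
```

The two printed values should match, which is why the stack can set the outer checksum in software while leaving the inner one to the device.
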
> >>
> >> Can you explain this? Packets are both defined and processed from the
> >> outside in; to do anything else might not yield a meaningful result.
> >>
> > It is described in
> > https://www.kernel.org/doc/html/latest/networking/checksum-offloads.html
> >
> >>> The
> >>> offloaded TCP checksum computation would start the computation at the
> >>> first byte of the TCP header through the end of the whole packet.
> >>
> >> If TCP interprets “end” to mean anything beyond what the UDP header
> >> indicates, it is that TCP stack that is broken.
> >>
> > The assumption is that there are no bits in the packet beyond the
> > transport layer on transmit. This is a valid assumption since there
> > are currently no use cases where this is true;
>
> That is the fundamental error; this is not a valid assumption.
>
> > UDP options would be
> > the first instance of supporting trailers.
>
> TCP has had trailers too, at one point.
>
> > On the receive side, stacks
> > will properly handle checksum offload with data in the surplus space.
> >
> >> (Presumably we’re talking about non-UDP fragmented packets; with UDP
> >> fragmentation, the TCP operation should never happen until the UDP
> >> fragments are reassembled).
> >>
> >>> So
> >>> unless the surplus area is properly checksummed then the computed TCP
> >>> checksum will be invalid and the packet will be dropped at the
> >>> receiver.
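
The failure mode described above can be sketched as follows. A device that checksums from the start of the inner transport header to the end of the packet folds any trailing surplus bytes into the result, unless those bytes sum to zero in ones-complement arithmetic. The 40-byte segment, the surplus bytes, and the helper names are all made up for illustration; the trailing compensation word is only a stand-in for a checksum over the surplus area, not the option format from the UDP options draft, and the sketch assumes the surplus area starts at an even offset.

```python
import struct


def ones_sum(data: bytes, acc: int = 0) -> int:
    """16-bit ones-complement sum (data zero-padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    for i in range(0, len(data), 2):
        acc += (data[i] << 8) | data[i + 1]
    while acc >> 16:
        acc = (acc & 0xFFFF) + (acc >> 16)
    return acc


def csum(data: bytes) -> int:
    """Internet checksum: complement of the ones-complement sum."""
    return ~ones_sum(data) & 0xFFFF


def offload_to_end(packet: bytes, csum_start: int) -> int:
    """Emulate a device that sums from csum_start through the last byte."""
    return csum(packet[csum_start:])


segment = bytes(range(1, 41))            # stand-in for inner TCP header + payload
surplus = b"\x01\x02\x03\x04\x05\x06"    # arbitrary bytes after the transport data

expected = csum(segment)                                 # what the receiver verifies
with_surplus = offload_to_end(segment + surplus, 0)      # what the device would write

# Append a word that makes the surplus area sum to zero (ones-complement).
compensated = surplus + struct.pack("!H", csum(surplus))
with_compensation = offload_to_end(segment + compensated, 0)

print(hex(expected), hex(with_surplus), hex(with_compensation))
```

In this sketch the uncompensated surplus changes the offloaded value, so a receiver that checksums only the segment would see a mismatch, while the compensated surplus leaves the value unchanged.
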
> >>
> >> See above.
> >>
> >>> This is not just a problem for offload; I
> >>> believe that this wouldn't work properly in existing software stacks
> >>> without some major changes.
> >>
> >> Same problem, same answer.
> >>
> >>> So to make all uses of transport checksum computation and offload
> >>> reasonably robust, when the UDP surplus area is being used both the
> >>> UDP checksum and checksum over the surplus area MUST always be set.
> >>
> >> If protocol stacks that try to peek ahead in layers don’t follow the
> >> rules, there’s not much we can do, ever.
> >>
> >>> FYI, here is some nice background on checksum offload:
> >>> https://www.kernel.org/doc/html/latest/networking/checksum-offloads.html
> >>
> >> There is an error in draft-herbert-remotecsumoffload in Section 2.1; it
> >> states that the UDP checksum is over the upper layer packet length (it is
> >> not). RFC768 defines the UDP checksum over the pseudoheader, UDP header,
> >> and data (not referring to the IP packet except for pseudo header info). In
> >> fact, it also refers to the pseudoheader as using the UDP Length, not the
> >> IP length (adjusted or not).
> >>
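
A minimal sketch of the RFC 768 reading described above: the checksum covers the pseudo header (which carries the UDP Length), the UDP header, and the data, so any bytes in the IP payload past UDP Length do not enter the sum. The function name, addresses, and ports are hypothetical and chosen only for illustration.

```python
import socket
import struct


def ones_sum(data: bytes, acc: int = 0) -> int:
    """16-bit ones-complement sum (data zero-padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    for i in range(0, len(data), 2):
        acc += (data[i] << 8) | data[i + 1]
    while acc >> 16:
        acc = (acc & 0xFFFF) + (acc >> 16)
    return acc


def udp_checksum(src_ip: str, dst_ip: str, ip_payload: bytes) -> int:
    """UDP checksum bounded by the UDP Length field, not the IP payload length."""
    _src_port, _dst_port, udp_len, _old = struct.unpack("!HHHH", ip_payload[:8])
    datagram = ip_payload[:udp_len]                      # ignore bytes past UDP Length
    zeroed = datagram[:6] + b"\x00\x00" + datagram[8:]   # checksum field set to zero
    pseudo = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 17, udp_len)            # zero, protocol 17, UDP Length
    )
    value = ~ones_sum(pseudo + zeroed) & 0xFFFF
    return value or 0xFFFF                               # a computed zero is sent as all ones


data = b"hello"
datagram = struct.pack("!HHHH", 5000, 6000, 8 + len(data), 0) + data
trailing = b"\xde\xad\xbe\xef"                           # bytes beyond UDP Length

plain = udp_checksum("192.0.2.1", "192.0.2.2", datagram)
with_trailing = udp_checksum("192.0.2.1", "192.0.2.2", datagram + trailing)
print(hex(plain), hex(with_trailing))
```

Both calls return the same value because the trailing bytes lie outside the span that UDP Length defines.
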
> >> The same error appears in the draft-herbert-vxlan-rco in the definition
> >> of packet_csum.
> >>
> >> If either doc reflects how offloading is implemented, then those are
> >> bugs that should be fixed.
> >>
> >> In TCPM we’ve identified a number of additional issues with TCP
> >> offload, notably regarding how they incorrectly coalesce packets with
> >> different TCP headers.
> >>
> >> If we’re not calling out these behaviors as the bugs they are, there’s
> >> little point in doing much of anything in the IETF.
> >>
> > You're welcome to call these behaviors bugs if you want, but the fact
> > is they are correct behavior and robust behavior for all currently
> > defined IETF protocols as evidenced by the fact that the Internet runs
> > on billions of devices with these behaviors.
>
> “Currently works” is not the same as “correct”.
>
> Correct follows the spec; the above notes do not. They were never in WG
> docs, or they presumably would have been corrected.
>
> > The problem you are
> > hitting is that we have over thirty years of deployment and
> > implementation experience with Internet protocols that follow some
> > basic principles and conventions, like using protocol headers and not
> > trailers.
>
> It’s not about headers vs. trailers. It’s about whether you follow the
> specs or not.
>
> > So while a protocol that diverges from those principles and
> > conventions might be academically correct on paper, in deployment it
> > may be replete with a myriad of issues which is what we see when
> > delving into the details of how UDP options interact with real stacks
> > and devices. If the goal is to produce a deployable and performant
> > protocol, which I believe is the purpose of IETF, then we need to take
> > realities of deployment and implementation into account.
>
> We need to start by not continuing to refer to false claims in unpublished
> drafts.
>
> When we find errors, we should fix them, not propagate them.
>

Joe,

A nice thing about an open source project like Linux is that anyone who
thinks there's a bug can submit a patch, and it will be accepted *if*
they can justify the patch to the maintainers. Good luck, if you want to
take that on!

Tom


> Joe