Re: [tsvwg] A review of draft-ietf-tsvwg-udp-options-12

Tom Herbert <tom@herbertland.com> Tue, 15 June 2021 15:56 UTC

To: Joseph Touch <touch@strayalpha.com>
Cc: "C. M. Heard" <heard@pobox.com>, TSVWG <tsvwg@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/Pnbqx0PkHnM_UmGjjQcH70e8PJM>

On Tue, Jun 15, 2021 at 8:12 AM Joseph Touch <touch@strayalpha.com> wrote:
>
> Mike,
>
> Zero-copy networking has been around since roughly 1991 (James Sterbenz’s PhD dissertation on “Axon” was arguably one of the first, and the IBM 360 had zero copy for memory storage transfers back in the 1960s), and it has been used in networking stacks since that time. Yes, it’s more prevalent in datacenter and HPC (high-performance computing) environments, where RDMA is one variant, but it can be supported by at least some network cards that offload packet processing.
>
> It requires few per-packet decisions to confirm support, which is why some simple decisions we can make (they don’t constrain us) help. Those below include making OCS come first and enabling FRAG to come immediately after, so that the entire TLV chain between the two doesn’t need to be walked. I agree we should not design in ways that complicate processing for non-zero-copy endpoints, but if the work is either small (as it was previously) or effectively none (as it would be with the structure below), there’s no utility in ignoring it.
>
> It might be useful to note that zero-copy support exists in most modern OSes, including Linux, macOS, and Windows.
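To illustrate why the ordering matters, here is a minimal Python sketch of locating FRAG in a UDP option area. It assumes TCP-style conventions (EOL=0 and NOP=1 as single-byte options, all others KIND/LEN), the 4-byte KIND/LEN/OCS format proposed later in this thread, and hypothetical codepoints OCS_KIND and FRAG_KIND; none of these values are taken from the draft. When OCS is first and FRAG immediately follows, the check is constant-time; otherwise the whole TLV chain has to be walked.

```python
EOL, NOP = 0, 1    # single-byte options, following TCP conventions (assumption)
OCS_KIND = 2       # hypothetical codepoint for illustration
FRAG_KIND = 0xF0   # hypothetical codepoint for illustration
OCS_LEN = 4        # uniform KIND/LEN/OCS format


def walk_tlvs(opts: bytes):
    """General path: yield (kind, start, length) for each option in wire order."""
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == EOL:
            return
        if kind == NOP:
            i += 1
            continue
        if i + 1 >= len(opts):
            raise ValueError("truncated option")
        length = opts[i + 1]
        if length < 2 or i + length > len(opts):
            raise ValueError("malformed option length")
        yield kind, i, length
        i += length


def find_frag(opts: bytes):
    """Return the offset of FRAG within the option area, or None."""
    # Fast path: OCS first and FRAG right after it, a constant-time check.
    if (len(opts) > OCS_LEN and opts[0] == OCS_KIND
            and opts[1] == OCS_LEN and opts[OCS_LEN] == FRAG_KIND):
        return OCS_LEN
    # Slow path: walk every TLV until FRAG (or the end) is found.
    for kind, start, _length in walk_tlvs(opts):
        if kind == FRAG_KIND:
            return start
    return None
```

For example, `find_frag(bytes([2, 4, 0, 0, 0xF0, 6, 0, 0, 0, 0]))` takes the fast path, while inserting two NOPs between OCS and FRAG forces the chain walk.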
>
Joe,

Please see the paper Mike referred to, which concisely describes
how zero-copy receive is supported in Linux. Zero-copy receive was not
easy to pull off and does require a constrained environment. Zero-copy
send was easier, since we have splice, and MSG_ZEROCOPY has been added
as the deck describes.

IMO, if you want UDP options to integrate seamlessly and to get
ubiquitous support in the OS and hardware (leveraging checksum
offload, zero copy, hardware fragmentation/reassembly, offloaded
authentication/crypto, header/data split, etc.), the best direction
would be to make UDP with options look as much like TCP as possible,
at least with regard to packet format, i.e., use the same option
format, have a header length field, use headers instead of trailers,
etc.
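The difference between a header-length field and trailers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from any stack: TCP's 4-bit Data Offset lets a parser (or a header/data-split NIC) find the payload from the front of the segment, while the trailer design places UDP options in the surplus area between the UDP Length value and the end of the IP payload, so the parser must compare two lengths and look at the end of the packet.

```python
import struct


def tcp_payload_offset(segment: bytes) -> int:
    """TCP: the Data Offset field (upper nibble of byte 12) is the header
    length, options included, in 32-bit words."""
    return (segment[12] >> 4) * 4


def udp_option_area(ip_payload: bytes):
    """UDP options as trailers: the option (surplus) area runs from the UDP
    Length value to the end of the IP payload. Returns (start, end)."""
    (udp_len,) = struct.unpack_from("!H", ip_payload, 4)
    if udp_len < 8 or udp_len > len(ip_payload):
        raise ValueError("bad UDP Length")
    return udp_len, len(ip_payload)
```

For example, a TCP header with Data Offset 8 puts the payload at byte 32, while a UDP datagram carrying `b"hello"` (UDP Length 13) followed by 4 option bytes has its options at bytes 13 to 17 of the IP payload.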

Tom

> Joe
>
> On Jun 14, 2021, at 10:10 PM, C. M. Heard <heard@pobox.com> wrote:
>
> On Sun, Jun 13, 2021 at 9:31 PM Joseph Touch wrote:
>>
>> On Jun 13, 2021, at 7:20 PM, C. M. Heard wrote:
>>>
>>> I for one would appreciate further discussion of these last points. I admit that I have failed to grasp Joe's message on the RDMA thread, and I would appreciate some time to think about it.
>>
>>
>> Sure - here’s how it all works. Note that this is relevant mostly for long transfers with persistent UDP fragmentation; if that is assumed to be ‘adjusted’ at the app layer (as QUIC does), then we don’t need zero-copy support...
>>
>> - right now, UDP data can be zero-copied when received into user space, starting with the user data
>> - if we add options, UDP data can still be zero-copied because it hasn’t moved (it still begins the payload)
>> - however, fragments are different because (especially given the merging of FRAG and LITE) they don’t start at the beginning of the data
>> - they always start after OCS (which I think we should make fit the uniform KIND/LEN/OCS format of 4 bytes)
>> - if the FRAG comes next, then we can move the frag content around a little and still support zero-copy
>>
>> Notably, we move the first 10 bytes of the fragment to the end:
>>   - 4 for OCS
>>   - 6 for FRAG (assuming FRAG includes KIND/OPTLEN/FRAGOFFSET/ID/FRAGLEN)
>> That way we can zero-copy the fragment into place, then just copy those last 10 bytes over OCS and the FRAG header.
>>
>> This method assumes that we try to keep FRAG early in the packet, preferably right after OCS. The later it comes, the more additional bytes we need to move to “fix” the copy (beyond the 10 bytes noted above).
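The rotate-and-fix-up scheme described above can be sketched in Python, with a bytearray standing in for the reassembly buffer and a slice assignment standing in for the DMA. The 4 + 6 = 10 byte header size comes from the message; the FRAG field layout (KIND, LEN, 16-bit offset, 16-bit ID) and both codepoints are hypothetical choices for illustration only, not the draft's format.

```python
import struct

OCS_LEN = 4                   # uniform KIND/LEN/OCS, as proposed in the message
FRAG_LEN = 6                  # FRAG size assumed in the message
HDR_LEN = OCS_LEN + FRAG_LEN  # 10 bytes displaced to the tail of each fragment
OCS_KIND = 2                  # hypothetical codepoint, not taken from the draft
FRAG_KIND = 0xF0              # hypothetical codepoint, not taken from the draft


def build_fragment(data: bytes, offset: int, frag_id: int) -> bytes:
    """Sender: OCS, then FRAG, then the data with its first HDR_LEN bytes moved to the end."""
    if len(data) < HDR_LEN:
        raise ValueError("fragment data must be at least HDR_LEN bytes")
    ocs = bytes([OCS_KIND, OCS_LEN, 0, 0])  # checksum left at zero in this sketch
    # Hypothetical FRAG layout: KIND(1) LEN(1) OFFSET(2) ID(2).
    frag = struct.pack("!BBHH", FRAG_KIND, FRAG_LEN, offset, frag_id)
    return ocs + frag + data[HDR_LEN:] + data[:HDR_LEN]


def place_fragment(buf: bytearray, frag: bytes) -> int:
    """Receiver: one bulk copy into the final position, then a fixed HDR_LEN-byte fix-up."""
    if frag[0] != OCS_KIND or frag[OCS_LEN] != FRAG_KIND:
        raise ValueError("this fast path requires FRAG immediately after OCS")
    _kind, _optlen, offset, _frag_id = struct.unpack_from("!BBHH", frag, OCS_LEN)
    data_len = len(frag) - HDR_LEN
    if offset + len(frag) > len(buf):
        raise ValueError("buffer needs HDR_LEN bytes of slack past the last fragment")
    # Stand-in for DMA: the whole option area lands where the data belongs,
    # so OCS+FRAG temporarily occupy the first HDR_LEN data bytes.
    buf[offset:offset + len(frag)] = frag
    # Fix-up: copy the displaced bytes from the tail back over OCS and FRAG.
    tail = offset + data_len
    buf[offset:offset + HDR_LEN] = buf[tail:tail + HDR_LEN]
    return data_len


if __name__ == "__main__":
    payload = bytes(range(40))
    buf = bytearray(len(payload) + HDR_LEN)  # slack for the last fragment's tail
    for off in (0, 20):  # in-order arrival is assumed
        place_fragment(buf, build_fragment(payload[off:off + 20], off, frag_id=7))
    assert bytes(buf[:len(payload)]) == payload
```

Note that this sketch assumes fragments arrive in order: each fragment's displaced tail spills HDR_LEN bytes into the next fragment's region, which the next placement then overwrites. Out-of-order arrival would need extra handling, and moving FRAG later in the option area would grow the fix-up copy accordingly.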
>>
>> —
>>
>> This method is the only reason we would want to allow options after non-terminal fragments: basically, to keep the fragment toward the front of the packet, using the rule that options following a non-terminal FRAG still operate on the fragment rather than waiting for reassembly. The exception is the terminal fragment, where options following the terminal FRAG operate on the reassembled packet.
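The scoping rule in the paragraph above can be expressed as a tiny Python function. This is a hypothetical sketch: options are represented as a list of kind names, the terminal flag is passed in rather than parsed from FRAG, and treating options at or before FRAG (such as OCS) as per-fragment is an assumption based on OCS always coming first.

```python
def split_option_scopes(kinds, frag_index, is_terminal):
    """Split a fragment's options into (per_fragment, after_reassembly).

    Hypothetical representation: `kinds` is the option list in wire order and
    `frag_index` is the position of FRAG in it. Options at or before FRAG
    (e.g. OCS) are treated as per-fragment, which is an assumption here.
    """
    up_to_frag = list(kinds[:frag_index + 1])
    after_frag = list(kinds[frag_index + 1:])
    if is_terminal:
        # Terminal fragment: trailing options apply to the reassembled packet.
        return up_to_frag, after_frag
    # Non-terminal fragment: trailing options still operate on this fragment.
    return up_to_frag + after_frag, []
```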
>
>
> I'm not understanding this AT ALL, and I apologize if there is well-known stuff of which I am embarrassingly ignorant. That being said:
>
> EVERY description of zero-copy receive describes something involving MTUs and highly constrained header lengths that allow the user data in a TCP segment or UDP packet to be mapped to one or more kernel pages. Here is one example:
>
> PATH to TCP 4K MTU and RX zerocopy
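The constraint described above is essentially arithmetic: for received payload to be mapped into whole pages, each full-size packet must carry a whole number of pages, which ties the MTU to a page multiple plus an exact, constant header length. A minimal Python sketch, assuming 4096-byte pages and an IPv6 + TCP-with-timestamps header stack; these figures are illustrative and not taken from the cited paper.

```python
PAGE = 4096  # assumed page size


def mtu_for_page_payload(header_bytes: int, pages: int = 1, page: int = PAGE) -> int:
    """MTU at which every full-size packet carries exactly `pages` pages of payload."""
    return pages * page + header_bytes


def payload_is_page_multiple(mtu: int, header_bytes: int, page: int = PAGE) -> bool:
    """True if a full-size packet's payload fills whole pages with nothing left over."""
    return (mtu - header_bytes) % page == 0


# Assumed header stack: IPv6 (40) + TCP (20) + timestamp option (10, padded to 12).
HDR = 40 + 20 + 12               # 72 bytes
MTU = mtu_for_page_payload(HDR)  # 4096 + 72 = 4168
```

With that 4168-byte MTU, adding even 4 bytes of header (for example, one more option) leaves 4092 bytes of payload, so packets no longer fill a page. This is why variable-length per-packet metadata in front of the data works against page-mapping zero copy.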
>
> In every case that I have found, the solutions apply only to a highly constrained environment, such as a data center, and not over the Internet writ large. Some even involve requiring the application to process the transport headers, which is surely not an outcome that we wish in general.
>
> If I am wrong -- and it would most assuredly not be the first time -- I am eager to be disabused, preferably with a complete and open description of a zero-copy technology without such shortcomings.
>
> But if my conclusions are substantially correct, I don't think that TSVWG should expend effort on zero copy for UDP fragment reassembly. Transport options for UDP need to apply across the general Internet.
>
> NOTE: the unfavorable conclusions that I make about zero-copy do NOT apply to checksum offload; the advantages and applicability of that technology (especially with OCS now defined to be an equivalent to the CCO proposal) are readily apparent, even though they are not realizable in every implementation.
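The reason OCS-as-CCO keeps checksum offload usable is that the OCS value compensates so the option area contributes nothing to a ones'-complement sum. A minimal Python sketch of that compensation idea, assuming the 4-byte KIND/LEN/OCS format proposed earlier in this thread (OCS field at bytes 2-3), an even UDP Length so the area starts on a 16-bit boundary, and a hypothetical OCS codepoint of 2; the draft's exact coverage and alignment rules may differ.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit ones'-complement sum as in RFC 1071 (odd length padded with a zero byte)."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the end-around carry
    return total


def fill_compensating_ocs(options: bytes) -> bytes:
    """Return `options` with bytes 2-3 (the OCS field) set so the area sums to 0xFFFF.

    Assumes a 4-byte KIND/LEN/OCS option at the start of the area and an
    even UDP Length, so the area begins on a 16-bit boundary.
    """
    zeroed = options[:2] + b"\x00\x00" + options[4:]
    ocs = ~ones_complement_sum(zeroed) & 0xFFFF
    return options[:2] + ocs.to_bytes(2, "big") + options[4:]
```

Because the filled option area sums to 0xFFFF (ones'-complement zero), a NIC that reports the sum over the entire IP payload yields the same value as a sum over just the UDP header and data, so the UDP checksum can still be verified from that offload result. The pseudo-header is omitted here for brevity.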
>
> Thanks
>
> Mike Heard