Re: [tsvwg] A review of draft-ietf-tsvwg-udp-options-12

Joseph Touch <touch@strayalpha.com> Mon, 14 June 2021 17:44 UTC

From: Joseph Touch <touch@strayalpha.com>
Date: Mon, 14 Jun 2021 10:43:55 -0700
Cc: Gorry Fairhurst <gorry@erg.abdn.ac.uk>, TSVWG <tsvwg@ietf.org>
To: Tom Herbert <tom@herbertland.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/r8j1M9_OTs2F-v6XJFvwDVhU_R8>

Hi, Tom,

OCS has been required since -08 (Sept. 2019). Here’s the relevant text:
   >> The OCS MUST be included when the UDP checksum is nonzero and UDP
   options are present.
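As an illustration of that rule, here is a minimal Python sketch. It assumes the OCS value is an RFC 1071-style 16-bit ones'-complement checksum over the options area; the draft's exact coverage and field layout may differ, and `internet_checksum` and `ocs_required` are hypothetical helper names.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 16-bit ones'-complement checksum (assumed OCS algorithm)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def ocs_required(udp_checksum: int, options_present: bool) -> bool:
    """The quoted rule: OCS MUST be included when the UDP checksum is
    nonzero and UDP options are present."""
    return udp_checksum != 0 and options_present
```

A receiver that recomputes the checksum over the covered bytes plus the transmitted OCS value gets zero when nothing was corrupted.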
Joe

> On Jun 14, 2021, at 10:32 AM, Tom Herbert <tom@herbertland.com> wrote:
> 
> Joe,
> 
> I suggest that the UDP options should be preceded by a four-byte
> header consisting of a one-byte type, a one-byte length, and a
> two-byte checksum. As I've mentioned previously, making the checksum
> optional is inherently problematic because it cannot protect against
> a corrupted type field for the optional checksum. For example, a
> single-bit flip in the type field of the checksum option could turn it
> into some other option type, and there would be no way to detect that.
> 
> Tom
> 
> 
> On Mon, Jun 14, 2021 at 10:20 AM Joe Touch <touch@strayalpha.com> wrote:
>> 
>> PS: we need an option length field to make fragments look like TCP. I can put that in; do we want it in OCS, or independent?
>> 
>>> On Jun 14, 2021, at 10:16 AM, Joe Touch <touch@strayalpha.com> wrote:
>>> 
>>> FYI, that’s what fragments look like. We can’t do this for non-fragments.
>>> 
>>>>> On Jun 14, 2021, at 10:03 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>> 
>>>>> On Mon, Jun 14, 2021 at 9:31 AM Gorry Fairhurst <gorry@erg.abdn.ac.uk> wrote:
>>>>> 
>>>>>> On 14/06/2021 17:17, Tom Herbert wrote:
>>>>>> On Sun, Jun 13, 2021 at 9:31 PM Joseph Touch <touch@strayalpha.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> On Jun 13, 2021, at 7:20 PM, C. M. Heard <heard@pobox.com> wrote:
>>>>>>> 
>>>>>>>> If we DO support zero-copy and thus want to allow non-terminal fragments to have post-FRAG options that operate on each fragment, then we would add THISFRAGLEN to the nonterminal format and issue different KIND numbers to nonterminal and terminal fragments.
>>>>>>> 
>>>>>>> 
>>>>>>> I for one would appreciate further discussion of these last points. I admit that I have failed to grasp Joe's message on the RDMA thread, and I would appreciate some time to think about it.
>>>>>>> 
>>>>>>> 
>>>>>>> Sure. Here’s how it all works. Note that this is relevant mostly for long transfers with persistent UDP fragmentation; if that is assumed to be ‘adjusted’ at the app layer (as QUIC does), then we don’t need zero-copy support...
>>>>>>> 
>>>>>>> - right now, UDP data can be zero-copied when received into user space, starting with the user data
>>>>>> Only if the device supports header/data split, where the headers are in
>>>>>> one buffer and the UDP data is in an aligned buffer.
>>>>>> 
>>>>>>> - if we add options, UDP data can still be zero-copied because it hasn’t moved (it still begins the payload)
>>>>>>> - however, fragments are different because (especially given the merging of FRAG and LITE) they don’t start at the beginning of the data
>>>>>>> - they always start after OCS (which I think we should make fit the uniform 4-byte KIND/LEN/OCS format)
>>>>>>> - if the FRAG comes next, then we can move the frag content around a little and still support zero-copy
>>>>>>> 
>>>>>>> Notably, we move the first 10 bytes of the fragment to the end:
>>>>>>> - 4 for OCS
>>>>>>> - 6 for FRAG (assuming FRAG includes KIND/OPTLEN/FRAGOFFSET/ID/FRAGLEN)
>>>>>>> That way we can zero-copy the fragment packet into place, then just copy those last 10 bytes over OCS and the FRAG header.
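The byte shuffle described above can be simulated in Python. This is a sketch under stated assumptions: the 10-byte figure is the 4 (OCS) + 6 (FRAG) sizes given in the message, a plain `bytearray` stands in for the reassembly buffer, and slice assignment stands in for the NIC's zero-copy DMA. `build_fragment` and `place_fragment` are hypothetical names.

```python
HDR_LEN = 4 + 6  # OCS (4 bytes) + FRAG header (6 bytes), per the message


def build_fragment(hdr: bytes, data: bytes) -> bytes:
    """Sender side: emit the OCS+FRAG header, then the fragment data with
    its first HDR_LEN bytes rotated to the end of the packet."""
    if len(hdr) != HDR_LEN or len(data) < HDR_LEN:
        raise ValueError("need a HDR_LEN-byte header and >= HDR_LEN data bytes")
    return hdr + data[HDR_LEN:] + data[:HDR_LEN]


def place_fragment(buf: bytearray, offset: int, pkt: bytes) -> bytes:
    """Receiver side: land the packet at the fragment's final offset, then
    copy the trailing HDR_LEN bytes over the header. Returns the header."""
    end = offset + len(pkt)  # data length + HDR_LEN bytes of slack
    if end > len(buf):
        raise ValueError("buffer lacks HDR_LEN bytes of slack past the fragment")
    buf[offset:end] = pkt  # stands in for zero-copy DMA into place
    data_len = len(pkt) - HDR_LEN
    hdr = bytes(buf[offset:offset + HDR_LEN])  # keep OCS+FRAG for processing
    buf[offset:offset + HDR_LEN] = buf[offset + data_len:end]  # the only copy
    return hdr


data = bytes(range(40))
hdr = b"H" * HDR_LEN
buf = bytearray(100)
got_hdr = place_fragment(buf, 20, build_fragment(hdr, data))
```

After the fix-up, `buf[20:60]` holds the original 40 data bytes, at the cost of one 10-byte copy. The sketch needs HDR_LEN bytes of slack past the fragment and does not handle that slack overlapping a neighbouring fragment that was already placed.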
>>>>>>> 
>>>>>> An obvious feature we'd want is NIC hardware to do UDP options
>>>>>> fragmentation and reassembly, analogous to existing UDP Fragmentation
>>>>>> Offload (UFO), which performs IP fragmentation of UDP packets. The
>>>>>> impediment to supporting this is that hardware devices would need to
>>>>>> perform protocol processing on trailers as opposed to headers. Nearly
>>>>>> all hardware devices, including switches and NICs, are optimized to
>>>>>> process protocol headers and in modern devices they are quite
>>>>>> programmable in that regard. However, they typically rely on a parsing
>>>>>> buffer that holds the first N bytes of the packet and assume that all
>>>>>> the protocol headers lie within it. They wouldn't process data beyond
>>>>>> those headers, at least not in the fast path, and almost certainly would
>>>>>> not be able to process protocol headers at the end of a large packet.
>>>>>> I am doubtful we'll ever see hardware support for trailer protocols,
>>>>>> and hence it's unlikely we'd see accelerations for UDP options like we
>>>>>> have for TCP.
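To make the parse-window argument concrete, the following Python sketch compares where the options land under each placement. It assumes IPv4 with no IP options and a hypothetical 128-byte parse buffer; real devices vary, and the function names are illustrative only.

```python
PARSE_WINDOW = 128  # hypothetical parse-buffer size; real devices differ
IPV4_HDR = 20       # IPv4 header without IP options
UDP_HDR = 8


def trailer_options_span(udp_length: int, ip_payload_length: int) -> tuple[int, int]:
    """[start, end) of trailer (surplus-area) options from the start of the
    IP packet: everything after the UDP length, up to the IP payload end."""
    return IPV4_HDR + udp_length, IPV4_HDR + ip_payload_length


def header_options_span(options_length: int) -> tuple[int, int]:
    """[start, end) if the same options instead followed the UDP header."""
    start = IPV4_HDR + UDP_HDR
    return start, start + options_length


def parser_sees(span: tuple[int, int], window: int = PARSE_WINDOW) -> bool:
    """True if a parser holding only the first `window` bytes sees the span."""
    return span[1] <= window


# A 1500-byte IPv4 packet carrying 20 bytes of options.
trailer_visible = parser_sees(trailer_options_span(udp_length=1460, ip_payload_length=1480))
header_visible = parser_sees(header_options_span(options_length=20))
```

For a full-size packet the trailer sits at bytes 1480 to 1500, far outside the window, while the same options placed after the UDP header occupy bytes 28 to 48.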
>>>>>> 
>>>>>> Tom
>>>>> 
>>>>> OK... Is there any way we could design the protocol to enable this?
>>>>> 
>>>>> I'm "fishing" for ideas because I know you've talked about the various
>>>>> offload methods.
>>>>> 
>>>> 
>>>> Gorry,
>>>> 
>>>> My suggestion was to place UDP options after the UDP header. Instead
>>>> of placing just the fragment header after the UDP header, place all the
>>>> UDP options there, followed by the payload. So the packet looks
>>>> like:
>>>> 
>>>> +-------------------+
>>>> |    UDP header     |
>>>> +-------------------+
>>>> |    UDP options    |
>>>> +-------------------+
>>>> |      Payload      |
>>>> +-------------------+
>>>> 
>>>> Now this looks a lot like a TCP packet and other variable-length
>>>> headers, which we know how to handle. For zero-copy, we can do
>>>> header/data split by programming emerging smart devices to put the
>>>> UDP header and options in one buffer and the payload in another,
>>>> thereby also eliminating any need to move headers or data around.
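This layout can be sketched in Python as a header/data split: everything through the end of the options goes in one buffer, the payload in another. The sketch assumes the options begin with the four-byte [type][length][checksum] prefix Tom proposes earlier in this thread, with the length byte covering the whole options area, and it borrows TCP's EOL (0) / NOP (1) / kind-length option encoding. Those layout details and the function names are hypothetical, not from the draft, and checksum verification is omitted.

```python
from collections.abc import Iterator

UDP_HDR_LEN = 8
EOL, NOP = 0, 1  # borrowed from TCP option conventions; an assumption here


def split_headers(dgram: bytes) -> tuple[bytes, bytes]:
    """Split UDP header | options | payload into a header buffer and a
    payload buffer, using the (assumed) total length in the prefix."""
    if len(dgram) < UDP_HDR_LEN + 4:
        raise ValueError("datagram too short for the options prefix")
    opt_len = dgram[UDP_HDR_LEN + 1]  # length byte of the 4-byte prefix
    split = UDP_HDR_LEN + opt_len
    if opt_len < 4 or split > len(dgram):
        raise ValueError("bad options length")
    return dgram[:split], dgram[split:]


def iter_options(headers: bytes) -> Iterator[tuple[int, bytes]]:
    """Walk kind/length options that follow the 4-byte prefix."""
    i = UDP_HDR_LEN + 4
    while i < len(headers):
        kind = headers[i]
        if kind == EOL:
            return
        if kind == NOP:
            i += 1
            continue
        if i + 1 >= len(headers):
            raise ValueError("truncated option")
        length = headers[i + 1]
        if length < 2 or i + length > len(headers):
            raise ValueError("malformed option")
        yield kind, headers[i + 2:i + length]
        i += length


body = b"\x01" + b"\x05\x04\xaa\xbb" + b"\x00"  # NOP, kind 5 (2 bytes), EOL
dgram = bytes(UDP_HDR_LEN) + bytes([0x02, 4 + len(body), 0, 0]) + body + b"payload"
headers, payload = split_headers(dgram)
```

Because the split point is known from a fixed offset near the front of the datagram, a device that only parses the first N bytes can still find it.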
>>>> 
>>>> Tom
>>>> 
>>>>> So for options in the trailer, this is clearly an impediment.
>>>>> 
>>>>> For UDP-Opt fragmentation, I understand there is no standard UDP payload,
>>>>> only an option containing a fragment, so the fragment information
>>>>> would actually be in the "first N bytes of the packet".
>>>>> 
>>>>> So, what do you think would most likely help enable fast-path
>>>>> acceleration for the fragments?
>>>>> 
>>>>> Gorry
>>>>> 
>>>>>>> This method assumes that we try to keep FRAG early in the packet, preferably right after OCS. The later it comes, the more additional bytes we need to move to “fix” the copy (beyond the 10 bytes noted above).
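The cost of placing FRAG later can be written as a one-line formula. A sketch in Python, using the 4-byte OCS and 6-byte FRAG header sizes given earlier in the message (both still under discussion in this thread); `bytes_to_relocate` is a hypothetical name.

```python
OCS_LEN = 4       # assumed uniform KIND/LEN/OCS size from the message
FRAG_HDR_LEN = 6  # FRAG header size used in the message


def bytes_to_relocate(preceding_option_lengths: list[int]) -> int:
    """Bytes the sender must rotate to the end of a fragment so the receiver
    can zero-copy it: OCS + any options placed before FRAG + FRAG header."""
    return OCS_LEN + sum(preceding_option_lengths) + FRAG_HDR_LEN
```

With FRAG right after OCS the fix-up copy is 10 bytes; every option inserted between them adds its full length to that copy.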
>>>>>>> 
>>>>>>> —
>>>>>>> 
>>>>>>> This method is the only reason we would want to allow options after non-terminal fragments: basically, to keep the fragment toward the front of the packet, using the rule that options after a non-terminal FRAG still operate on the fragment, rather than waiting for reassembly. The exception is the terminal fragment, where options after the terminal FRAG operate on the reassembled packet.
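The scoping rule in this paragraph can be written as a small classifier. This is a sketch: option names are plain illustrative strings, and treating options before FRAG (such as OCS) as acting on the received packet is an assumption the message does not state.

```python
def option_scopes(options: list[str], frag_is_terminal: bool) -> list[tuple[str, str]]:
    """Return (option, scope) pairs for one fragment-carrying packet.

    Options after a non-terminal FRAG act on that fragment; options after
    the terminal FRAG act on the reassembled datagram. Options before FRAG
    are assumed to act on the packet as received.
    """
    scopes = []
    seen_frag = False
    for name in options:
        if name == "FRAG":
            seen_frag = True
            scopes.append((name, "fragment"))
        elif not seen_frag:
            scopes.append((name, "packet"))
        elif frag_is_terminal:
            scopes.append((name, "reassembled"))
        else:
            scopes.append((name, "fragment"))
    return scopes
```

For example, `option_scopes(["OCS", "FRAG", "TIME"], False)` scopes TIME to the fragment, while passing `True` scopes it to the reassembled datagram.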
>>>>>>> 
>>>>>>> Joe