Re: [tsvwg] A review of draft-ietf-tsvwg-udp-options-12

Joseph Touch <touch@strayalpha.com> Mon, 14 June 2021 18:04 UTC

From: Joseph Touch <touch@strayalpha.com>
In-Reply-To: <CALx6S34zXstyhwe8naRozNK3=dtHU-FV6F-L4uv1CK9Yim_-7w@mail.gmail.com>
Date: Mon, 14 Jun 2021 11:04:00 -0700
Cc: Gorry Fairhurst <gorry@erg.abdn.ac.uk>, TSVWG <tsvwg@ietf.org>
Message-Id: <FC3C3A51-B1D1-4893-8184-3F9CB83F3E66@strayalpha.com>
References: <D9B2E315-5C7A-4BE9-97A9-AF627F6FD6FF@strayalpha.com> <DCF3D0D3-83E0-4F84-8C1F-57DF9EE63C59@strayalpha.com> <CALx6S37Hx1zafjjr_fnG1ZY7afGEF081QfV5yhdfPftM57Ro0g@mail.gmail.com> <5A6C1B4E-491E-4F62-82EF-F49292F433AB@strayalpha.com> <CALx6S34zXstyhwe8naRozNK3=dtHU-FV6F-L4uv1CK9Yim_-7w@mail.gmail.com>
To: Tom Herbert <tom@herbertland.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/iAEpfh-uzmC604eu-xkmaGdMOrg>
Subject: Re: [tsvwg] A review of draft-ietf-tsvwg-udp-options-12


> On Jun 14, 2021, at 10:57 AM, Tom Herbert <tom@herbertland.com> wrote:
> 
> On Mon, Jun 14, 2021 at 10:44 AM Joseph Touch <touch@strayalpha.com> wrote:
>> 
>> Hi, Tom,
>> 
>> OCS has been required since -08 (Sept. 2019). Here’s the relevant text:
>> 
>>>> The OCS MUST be included when the UDP checksum is nonzero and UDP
>>>> options are present.
> 
> Joe,
> 
> Yes, but in the case when that's not true the problem of corrupted
> option type still exists.

Not under the conditions where UDP checksum can be zero, as per RFC 8200. In that case, the entire UDP packet becomes a payload in another frame, so it would already be covered. The reason to allow this is to avoid double-protection effort for tunnels. This is no different.

> Also, this rule requires the
> implementation to find the checksum option in a list of TLVs,

If we don’t need to reorder bytes for DMA (which we don’t now), we can require it to be first if present, preceded only by NOPs for alignment (we can remove that too if needed).

Joe
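
The "first if present, preceded only by NOPs" rule can be sketched as below. This is a minimal illustration, not the draft's algorithm: the kind values, the 4-byte OCS layout (kind, length, 16-bit checksum), and the choice to checksum only the option area are all assumptions made for the example, and `build_area` and `ocs_first_ok` are hypothetical helper names.

```python
import struct

# Hypothetical kind values, chosen for illustration only.
KIND_NOP, KIND_OCS = 1, 2


def internet_checksum(data: bytes) -> int:
    """Complemented 16-bit one's-complement sum of data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_area(options: bytes, nops: int = 0) -> bytes:
    """Prefix options with NOPs and an OCS TLV that covers the whole area."""
    area = bytearray(bytes([KIND_NOP]) * nops + bytes([KIND_OCS, 4, 0, 0]) + options)
    struct.pack_into("!H", area, nops + 2, internet_checksum(bytes(area)))
    return bytes(area)


def ocs_first_ok(area: bytes) -> bool:
    """True if OCS is the first non-NOP option and its checksum verifies."""
    i = 0
    while i < len(area) and area[i] == KIND_NOP:
        i += 1
    if i + 4 > len(area) or area[i] != KIND_OCS or area[i + 1] != 4:
        return False
    (stored,) = struct.unpack_from("!H", area, i + 2)
    # Recompute with the checksum field zeroed, as the sender did.
    zeroed = bytearray(area)
    zeroed[i + 2:i + 4] = b"\x00\x00"
    return internet_checksum(bytes(zeroed)) == stored
```

With this shape a receiver can call `ocs_first_ok` once and only then walk the remaining options, so no TLV is processed before the checksum has been validated.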

> that makes implementation harder and leads to some strange scenarios--
> like a receiver may process twenty options only to get to the 21st
> which is checksum and if the checksum cannot be validated then the
> node just wasted cycles processing TLVs for no reason. In TCP, for
> instance, we validate the checksum before processing any options so
> this scenario cannot occur.
> 
> Tom
> 
>> 
>> Joe
>> 
>> On Jun 14, 2021, at 10:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>> 
>> Joe,
>> 
>> I suggest that the UDP options should be preceded by a four byte
>> header consisting of one byte type, one byte length, and two byte
>> checksum. As I've mentioned previously, making the checksum optional
>> is inherently problematic because it cannot protect against a
>> corrupted type field for the optional checksum. e.g. a single bit flip
>> in the type field for the checksum could turn the checksum option into
>> some other type and there is no way to detect that.
>> 
>> Tom
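
The bit-flip concern above can be illustrated with a small sketch. It assumes hypothetical kind values, a 4-byte checksum option (kind, length, 16-bit checksum), and a checksum computed over the option area only; none of these details are taken from the draft. The first receiver verifies a checksum only when it finds a checksum option, the second always reads the checksum from a fixed position.

```python
import struct

KIND_OCS = 2  # hypothetical kind value, for illustration only


def csum16(data: bytes) -> int:
    """Complemented 16-bit one's-complement sum of data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build(options: bytes) -> bytes:
    """Checksum TLV (or fixed header) first, then the remaining options."""
    area = bytearray(bytes([KIND_OCS, 4, 0, 0]) + options)
    struct.pack_into("!H", area, 2, csum16(bytes(area)))
    return bytes(area)


def verify(area: bytes, off: int) -> bool:
    """Compare the checksum stored at off with one recomputed over area."""
    (stored,) = struct.unpack_from("!H", area, off)
    zeroed = bytearray(area)
    zeroed[off:off + 2] = b"\x00\x00"
    return csum16(bytes(zeroed)) == stored


def tlv_receiver_accepts(area: bytes) -> bool:
    """Optional-checksum receiver: verifies only if it sees a checksum TLV."""
    i = 0
    while i + 2 <= len(area):
        kind, length = area[i], area[i + 1]
        if length < 2 or i + length > len(area):
            return False  # malformed TLV
        if kind == KIND_OCS:
            return length == 4 and verify(area, i + 2)
        i += length  # unknown option: skip it
    return True  # no checksum option seen, so nothing is verified


def fixed_header_receiver_accepts(area: bytes) -> bool:
    """Fixed-header receiver: the checksum is always at bytes 2..3."""
    return len(area) >= 4 and verify(area, 2)
```

Flipping one bit in the first byte of `build(bytes([9, 4, 0xDE, 0xAD]))` turns the checksum option into an unknown kind: `tlv_receiver_accepts` skips it and accepts the corrupted area, while `fixed_header_receiver_accepts` rejects it because the checksum covers the type byte and is found by position, not by type.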
>> 
>> 
>> On Mon, Jun 14, 2021 at 10:20 AM Joe Touch <touch@strayalpha.com> wrote:
>> 
>> 
>> PS - we need an option length field to make fragments look like TCP. I can put that in - do we want that in OCS? Or independent?
>> 
>> On Jun 14, 2021, at 10:16 AM, Joe Touch <touch@strayalpha.com> wrote:
>> 
>> FYI that’s what fragments look like. We can’t do this for non fragments.
>> 
>> On Jun 14, 2021, at 10:03 AM, Tom Herbert <tom@herbertland.com> wrote:
>> 
>> On Mon, Jun 14, 2021 at 9:31 AM Gorry Fairhurst <gorry@erg.abdn.ac.uk> wrote:
>> 
>> On 14/06/2021 17:17, Tom Herbert wrote:
>> On Sun, Jun 13, 2021 at 9:31 PM Joseph Touch <touch@strayalpha.com> wrote:
>> 
>> 
>> 
>> On Jun 13, 2021, at 7:20 PM, C. M. Heard <heard@pobox.com> wrote:
>> 
>> If we DO support zero-copy and thus want to allow non-terminal fragments to have post-fragoption options that operate on each fragment, then we would add THISFRAGLEN to the nonterminal format and issue different KIND numbers to nonterminal/terminal fragment.
>> 
>> 
>> 
>> I for one would appreciate further discussion of these last points. I admit that I have failed to grasp Joe's message on the RDMA thread, and I would appreciate some time to think about it.
>> 
>> 
>> Sure - here’s how it all works. Note that this is relevant mostly for long transfers with persistent UDP fragmentation; if that is assumed to be ‘adjusted’ at the app layer (as QUIC does), then we don’t need zero-copy support...
>> 
>> - right now, UDP data can be zero-copied when received into user space, starting with the user data
>> 
>> Only if the device supports header/data split where the headers are in
>> one buffer and UDP data is in aligned buffer.
>> 
>> - if we add options, UDP data can still be zero-copied because it hasn’t moved (it still begins the payload)
>> - however, fragments are different because (esp given the merging of frag and lite) they don’t start at the beginning of data
>> - they always start after OCS (which I think we should make fit the uniform KIND/LEN/OCS format of 4 bytes)
>> - if the FRAG comes next, then we can move the frag content around a little and still support zero-copy
>> 
>> notably, we move the first 10 bytes of the fragment to the end
>> 4 for OCS
>> 6 for FRAG (assuming FRAG includes KIND/OPTLEN/FRAGOFFSET/ID/FRAGLEN)
>> that way we can zero-copy the frag packet into place, then just copy those last 8 bytes over OCS and the FRAG header
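
The byte-moving step described above might look like the following sketch. It is an interpretation, not text from the draft: the sender is assumed to emit the OCS and FRAG header followed by the fragment data with its first header-length bytes moved to the end, and the receiver's bulk placement is simulated with a slice assignment standing in for DMA. The header length is a parameter because the description gives both 10 and 8 bytes; `sender_layout` and `receiver_place` are hypothetical names.

```python
def sender_layout(hdr: bytes, frag_data: bytes) -> bytes:
    """Emit hdr, then frag_data with its first len(hdr) bytes moved to the end."""
    n = len(hdr)
    if len(frag_data) < n:
        raise ValueError("fragment shorter than the header it must cover")
    return hdr + frag_data[n:] + frag_data[:n]


def receiver_place(buf: bytearray, offset: int, packet: bytes, hdr_len: int) -> None:
    """Place packet at offset, then patch the header bytes with the moved data."""
    # Step 1 (stands in for the zero-copy transfer): land the whole packet.
    buf[offset:offset + len(packet)] = packet
    # Step 2: copy the trailing hdr_len bytes over the OCS and FRAG header.
    buf[offset:offset + hdr_len] = packet[-hdr_len:]
```

After `receiver_place`, the fragment data sits contiguously at `offset`. The trailing `hdr_len` bytes still overhang the fragment's slot; this sketch does not address how a real receiver would handle that overhang or out-of-order fragments.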
>> 
>> An obvious feature we'd want is NIC hardware to do UDP options
>> fragmentation and reassembly, analogous to existing UDP Fragmentation
>> Offload (UFO) which performs IP fragmentation of UDP packets. The
>> impediment with supporting this is that hardware devices would need to
>> perform protocol processing on trailers as opposed to headers. Nearly
>> all hardware devices, including switches and NICs, are optimized to
>> process protocol headers and in modern devices they are quite
>> programmable in that regard. However, they typically rely on a parsing
>> buffer that holds the first N bytes of the packet and assume that all
>> the protocol headers lie within that. They wouldn't process data after
>> that header, in the fast path at least, and almost certainly would not have
>> the capability to process protocol headers at the end of a large packet.
>> I am doubtful we'll ever see hardware support for trailer protocols,
>> and hence it's unlikely we'd see accelerations for UDP options like we
>> have for TCP.
>> 
>> Tom
>> 
>> 
>> OK.... Is there any way that we could design to enable this?
>> 
>> I'm "fishing" for ideas because I know you've talked about the various
>> offload methods.
>> 
>> 
>> Gorry,
>> 
>> My suggestion was to place UDP options after the UDP header. Instead
>> of just placing fragment header after the UDP header, place all the
>> UDP options there and then follow that by the Payload. So packet looks
>> like:
>> 
>> +---------------+
>> |  UDP header   |
>> +---------------+
>> |  UDP options  |
>> +---------------+
>> |    Payload    |
>> +---------------+
>> 
>> Now this looks a lot like a TCP packet and other variable length
>> headers which we know how to handle. For zero copy we can do
>> header/data split by programming emerging smart devices to place the
>> headers through the UDP options in one buffer and the payload in another,
>> thereby also eliminating any need to move headers or data around.
>> 
>> Tom
>> 
>> So for options in the trailer, this is clearly an impediment.
>> 
>> For UDP-Opt fragmentation, I understand there is no standard UDP payload,
>> 
>> .... only an option containing a fragment, so the Fragment information
>> would actually be in the "first N bytes of the packet".
>> 
>> So, what do you think could be most likely helpful to enable fastpath
>> acceleration for the fragments?
>> 
>> Gorry
>> 
>> This method assumes that we try to keep FRAG early in the packet - preferably right after OCS. The later it comes, the more additional bytes we need to move to “fix” the copy (beyond the 8 bytes noted above).
>> 
>> —
>> 
>> This method is the only reason we would want to allow options after non-terminal fragments - basically to keep the fragment toward the front of the packet, using the rule that post-noninitial frag options still operate on the fragment, rather than waiting for reassembly. The exception is the terminal fragment, where post-terminal fragment options operate on the reassembled packet.
>> 
>> Joe