Re: [tsvwg] RDMA Support by UDP FRAG Option

Joe Touch <touch@strayalpha.com> Fri, 18 June 2021 17:04 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/yjIuJugGmDFHbqST67C24oP1L5k>

Tom, 

This thread is OBE (overtaken by events) because we are no longer proposing individual byte moves at all. I posted the new frag option earlier this week.

Joe

> On Jun 18, 2021, at 9:44 AM, Tom Herbert <tom@herbertland.com> wrote:
> 
> On Wed, Jun 16, 2021 at 5:16 PM Joseph Touch <touch@strayalpha.com> wrote:
>> 
>> 
>> 
>>>> On Jun 16, 2021, at 10:16 AM, Tom Herbert <tom@herbertland.com> wrote:
>>> 
>>>> ...
>>>> The idea is that the data can be placed directly in memory, with the last 3 bytes overwriting the first three. By moving only the space covered by the options, direct data placement can be used:
>>>> 
>>>> 123de - write up to the last 3 bytes
>>>> a23de - overwrite the last 3 bytes starting at the beginning (next 2 lines too)
>>> 
>>> That presumes that the memory holding the payload is writeable which
>>> is not always true. For instance, that would not be the case if the
>>> device DMAs the data into GPU memory.
>> 
>> This approach is already OBE and thus not relevant.
>> 
> If you don't like the receive problems, consider those on transmit.
> The most common use case of zero copy is sendfile, where an application
> sends data directly from files; specifically, this is done by DMA'ing
> buffers in the page cache directly to the NIC without the data going
> through the userspace application. This technique is extremely common in
> video servers so there is no question that it's relevant. The
> algorithm requires reading and writing three bytes of the application
> payload.
> 
> If the algorithm is performed in the userspace application then we
> need to read the first three bytes of the zero copy data into
> userspace, perform a sendmsg consisting of at least three parts: the
> three byte FRAG option, the zero copy data offset by three bytes,
> followed by the trailer containing the original value. So any time
> we want to invoke sendfile we'd need to read part of the data that we
> wanted to do zero copy on (expensive syscall). But there is another
> subtle problem, between the time we read the first three bytes of data
> and invoke the sendmsg, the page cache data may be changed by a writer
> of the file. For instance, the first 128 bytes of the page cache may
> have been modified such that the data actually sent is now corrupted.
> 
> If the algorithm is performed in the kernel, one might think that we
> can just modify the page cache directly; however, the page cache is
> shared, so modifying it, even temporarily, risks data corruption for
> another reader, so this is a non-starter. We would need to perform
> the same steps that I described above in the userspace case where
> bytes are copied and we need to offset the zero copy data. The problem
> of the page cache being modified still exists, although since we're in
> the kernel we may be able to lock the pages during the operation.
> 
> In any case, making zero copy work correctly for both TX and RX with
> an algorithm that requires manipulating application payload is going
> to be much harder to implement and get right than zero copy in plain
> UDP or TCP.
> 
> Tom
> 
>> However, the approach above remains an expectation of any DMA transfer - there are always edge aspects that need to be adjusted, whether by a separate DMA or some other mechanism. It doesn’t have to be one byte at a time, as shown above.
>> 
>> Joe
>> 
>
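
For readers following the quoted discussion, here is a minimal sketch of the three-part send that Tom describes and the matching receive-side overwrite from the earlier message. It illustrates only the superseded byte-move approach, not the new frag option. The names `OPT_LEN`, `build_send_parts`, and `restore_in_place` are hypothetical, and the 3-byte option length and the layout (option, payload offset by three bytes, trailer) are taken from the quoted text rather than from any draft.

```python
OPT_LEN = 3  # option length assumed from the quoted thread, not from a spec


def build_send_parts(payload: bytes, frag_option: bytes) -> list:
    """Return the three buffers a sender would pass to a gather write.

    Layout as described in the quoted text:
      1. the FRAG option,
      2. the payload starting at offset OPT_LEN (the zero-copy part),
      3. a trailer holding the payload's first OPT_LEN bytes.
    """
    if len(frag_option) != OPT_LEN:
        raise ValueError("option must be exactly OPT_LEN bytes")
    if len(payload) < OPT_LEN:
        raise ValueError("payload shorter than the option")
    view = memoryview(payload)
    # This explicit copy is the extra read Tom objects to: the sender must
    # touch the payload before it can hand the rest off without copying.
    displaced = bytes(view[:OPT_LEN])
    return [frag_option, view[OPT_LEN:], displaced]


def restore_in_place(buf: bytearray) -> memoryview:
    """Receiver side: overwrite the option bytes with the trailer bytes.

    This requires the destination memory to be writable, which is the
    receive-side concern raised in the quoted text.
    """
    buf[:OPT_LEN] = buf[-OPT_LEN:]
    return memoryview(buf)[:-OPT_LEN]


# Worked example using the "abcde" / "123" values from the thread.
parts = build_send_parts(b"abcde", b"123")
wire = b"".join(parts)
received = bytearray(wire)
restored = bytes(restore_in_place(received))
```

On a real socket the list returned by `build_send_parts` could be passed to a gather call such as `socket.sendmsg`. The sketch does not model the race Tom points out: if the file changes between the copy of the first three bytes and the send, the trailer no longer matches the rest of the payload.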