Re: [tsvwg] RDMA Support by UDP FRAG Option

Tom Herbert <tom@herbertland.com> Fri, 18 June 2021 16:44 UTC

From: Tom Herbert <tom@herbertland.com>
Date: Fri, 18 Jun 2021 09:44:13 -0700
Message-ID: <CALx6S35Mpkr9QhxuPBjMnEDLMD5h4z4KN93AemciDPUfaWKAKw@mail.gmail.com>
To: Joseph Touch <touch@strayalpha.com>
Cc: TSVWG <tsvwg@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/U6Uwa0D5y9xFa5l0YY8Kz2We0Do>
Subject: Re: [tsvwg] RDMA Support by UDP FRAG Option

On Wed, Jun 16, 2021 at 5:16 PM Joseph Touch <touch@strayalpha.com> wrote:
>
>
>
> > On Jun 16, 2021, at 10:16 AM, Tom Herbert <tom@herbertland.com> wrote:
> >
> >> ...
> >> The idea is that the data can be placed directly in memory, with the last 3 bytes overwriting the first three. By moving only the space covered by the options, direct data placement can be used:
> >>
> >> 123de - write up to the last 3 bytes
> >> a23de - overwrite the last 3 bytes starting at the beginning (next 2 lines too)
> >
> > That presumes that the memory holding the payload is writeable which
> > is not always true. For instance, that would not be the case if the
> > device DMAs the data into GPU memory.
>
> This approach is already OBE and thus not relevant.
>
If you don't like the receive problems, consider those on transmit.
The most common use case of zero copy is sendfile, where an application
sends data directly from files. Specifically, this is done by DMA'ing
buffers in the page cache directly to the NIC without the data going
through the userspace application. This technique is extremely common
in video servers, so there is no question that it's relevant. The
algorithm requires reading and writing three bytes of the application
payload.

If the algorithm is performed in the userspace application, then we
need to read the first three bytes of the zero copy data into
userspace and perform a sendmsg consisting of at least three parts: the
three byte FRAG option, the zero copy data offset by three bytes, and
the trailer containing the original value. So any time we want to
invoke sendfile, we'd need to read part of the data that we wanted to
do zero copy on (an expensive syscall). But there is another, more
subtle problem: between the time we read the first three bytes of data
and invoke the sendmsg, the page cache data may be changed by a writer
of the file. For instance, the first 128 bytes of the page cache may
have been modified, such that the data actually sent is now corrupted.
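
To make the race concrete, here is a minimal Python sketch, not an
implementation of the draft. The three-part layout follows the
description above; the names build_parts, receiver_view, and demo, the
all-zero option bytes, and the "between" hook that stands in for a
concurrent writer are hypothetical. Plain pread calls stand in for the
zero copy region, so this only illustrates the ordering problem.

```python
import os
import tempfile

SWAP = 3  # number of payload bytes displaced by the option, per the thread


def build_parts(fd, length, frag_opt, between=None):
    """Return the three sendmsg parts: option, payload[SWAP:], saved head."""
    head = os.pread(fd, SWAP, 0)  # extra syscall: copy the first bytes out
    if between is not None:
        between()  # hypothetical stand-in for a concurrent file writer
    # Stands in for the zero copy region, which is read at send time.
    body = os.pread(fd, length - SWAP, SWAP)
    return [frag_opt, body, head]


def receiver_view(parts):
    """Payload after the receiver moves the trailer back to the front."""
    _opt, body, head = parts
    return head + body


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"AAAAAAAA")
        f.flush()
        fd = f.fileno()
        opt = b"\x00" * SWAP  # placeholder bytes, not a real FRAG option

        def writer():
            os.pwrite(fd, b"BBBBBBBB", 0)

        clean = receiver_view(build_parts(fd, 8, opt))
        torn = receiver_view(build_parts(fd, 8, opt, between=writer))
        return clean, torn
```

With no writer in between, the receiver sees the original eight bytes.
With the writer, the saved head comes from the old file contents and
the body from the new ones, so the payload matches neither version.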

If the algorithm is performed in the kernel, one might think that we
can just modify the page cache directly. However, the page cache is
shared, so modifying it, even temporarily, risks data corruption for
another reader, so this is a non-starter. We would need to perform
the same steps that I described above in the userspace case, where
bytes are copied and we need to offset the zero copy data. The problem
of the page cache being modified still exists, although since we're in
the kernel we may be able to lock the pages during the operation.
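
As a loose userspace analogy for why in-place modification is a
non-starter, the Python sketch below uses two mmap mappings of one
file. This is an assumption-laden illustration, not kernel code: the
function name shared_page_demo and the zeroed bytes are hypothetical,
and shared mappings only approximate how readers share the page cache.

```python
import mmap
import tempfile


def shared_page_demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"abcdefgh")
        f.flush()
        # Two mappings of the same file share the same underlying pages.
        writer = mmap.mmap(f.fileno(), 0)  # shared, writable mapping
        reader = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

        saved = writer[:3]
        writer[:3] = b"\x00\x00\x00"  # "temporary" in-place overwrite
        seen = reader[:8]  # what another reader observes meanwhile
        writer[:3] = saved  # restore the original bytes
        restored = reader[:8]

        reader.close()
        writer.close()
        return seen, restored
```

The second mapping observes the overwritten bytes until they are
restored, which is the window in which another reader would be handed
corrupted data.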

In any case, making zero copy work correctly for both TX and RX with
an algorithm that requires manipulating application payload is going
to be much harder to implement and get right than zero copy in plain
UDP or TCP.

Tom

> However, the approach above remains an expectation of any DMA transfer - there are always edge aspects that need to be adjusted, whether by a separate DMA or some other mechanism. It doesn’t have to be one byte at a time, as shown above.
>
> Joe
>