Re: [tsvwg] UDP options and header-data split (zero copy)

"C. M. Heard" <heard@pobox.com> Sun, 01 August 2021 20:32 UTC

MIME-Version: 1.0
References: <CALx6S37zVVXnCH+Dv7_QXgwOoqcL4h0SThh+LnmAWn-5enprZQ@mail.gmail.com> <FA155FD9-2319-405C-B082-C023DEC2BF28@strayalpha.com> <CALx6S3435ZjAz8ECgbFbH=Hxm-cXAGRQjTbxgtGb9U-CTXMw=A@mail.gmail.com> <C8CE3912-55B2-4DC0-AB39-2D6EA6953500@strayalpha.com> <1178DE92-175A-4293-8A97-9B6FEBAF7B02@strayalpha.com> <CALx6S35tB=j5y3-xr5S22y0p+WJxKX_hqk8rm30oCruFxZp5Dw@mail.gmail.com> <87662B22-F63B-4EA4-94B3-DF4B2439A4E1@strayalpha.com>
In-Reply-To: <87662B22-F63B-4EA4-94B3-DF4B2439A4E1@strayalpha.com>
From: "C. M. Heard" <heard@pobox.com>
Date: Sun, 01 Aug 2021 13:32:11 -0700
X-Gmail-Original-Message-ID: <CACL_3VFO2=J2jYdzcX9o5bzUYtsDunpKWD4f_g2ypGNWcbqGAA@mail.gmail.com>
Message-ID: <CACL_3VFO2=J2jYdzcX9o5bzUYtsDunpKWD4f_g2ypGNWcbqGAA@mail.gmail.com>
To: Joseph Touch <touch@strayalpha.com>, Tom Herbert <tom@herbertland.com>
Cc: tsvwg <tsvwg@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/oC6VxwwPdCL71lMjUo0zQ5BnVIM>
Subject: Re: [tsvwg] UDP options and header-data split (zero copy)

On Sun, Aug 1, 2021 at 10:48 AM Joseph Touch wrote:

> Also, the trailing variant allows per-reassembled options to be
> arbitrarily long (limited by the reassembled length), rather than requiring
> them to fit inside a single fragment.
>


If that is the intent, then draft-ietf-tsvwg-udp-options-13#section-5.5
needs significant clarification. What I read there tells me that the
per-reassembled-datagram options must all appear in the terminal fragment,
following the user data. If the terminal fragment user data length is zero,
that does allow for a long options field, but it is limited by the maximum
fragment length, not the reassembled datagram length.


> We’ve been down the path of fixed headers before. It wastes space for some
> uses to conserve it for others. E.g., in tunnel cases where UDP CS==0, it
> would waste space for OCS when not needed.
>


Indeed we have, but AFAICT a definitive consensus has not been reached.

As you know, recently I have been advocating a fragment format along the
following lines:


                   +--------+--------+--------+--------+
                   |  Source Port    |   Dest. Port    |
                   +--------+--------+--------+--------+
                   |   UDP Len=8     |  UDP Checksum   |
                   +--------+--------+--------+--------+
                   | OCS=2  | LEN=4  | Option Checksum |
                   +--------+--------+--------+--------+
                   |       ... Other Options ...       |
                   +--------+--------+--------+--------+
                   | FRAG=X | LEN=8  |  Frag. Offset   |
                   +--------+--------+--------+--------+
                   |          Identification           |
                   +--------+--------+--------+--------+
                   |       ... Fragment Data ...       |
                   +--------+--------+--------+--------+


where X is one of two codepoints, depending on whether the fragment is a
terminal or non-terminal fragment.
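
For concreteness, the layout above could be packed roughly as follows. This is only a sketch: the two FRAG codepoints (0x7E and 0x7F), the function name build_fragment, and its parameters are hypothetical, and both checksum fields are left as zero placeholders rather than computed.

```python
import struct

KIND_OCS = 2               # OCS kind, as shown in the diagram
KIND_FRAG_NONTERM = 0x7E   # hypothetical codepoint: non-terminal fragment
KIND_FRAG_TERM = 0x7F      # hypothetical codepoint: terminal fragment


def build_fragment(src_port, dst_port, ident, frag_offset, data,
                   terminal=False, other_options=b""):
    """Pack one fragment: UDP header, OCS, other options, FRAG, then data."""
    # UDP Length is 8, so everything after the UDP header is option space.
    # Both checksums are zero placeholders in this sketch (not computed).
    udp_header = struct.pack("!HHHH", src_port, dst_port, 8, 0)
    ocs = struct.pack("!BBH", KIND_OCS, 4, 0)
    frag_kind = KIND_FRAG_TERM if terminal else KIND_FRAG_NONTERM
    # FRAG: kind (1), length=8 (1), fragment offset (2), identification (4).
    frag = struct.pack("!BBHI", frag_kind, 8, frag_offset, ident)
    # The fragment data is the last thing in the packet.
    return udp_header + ocs + other_options + frag + data
```

The point of the ordering is that the fragment data always sits at the tail of the packet, after every option.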

This would achieve the goal of pushing all user data in self-contained
fragments to the end of the packet and would thereby allow for checksum
offload of encapsulated packets on commonly available hardware. However,
there's one thing that Tom's header would give us that a naked stack of
TLVs would not: it would enable a NIC to perform header-data split without
parsing a long stack of TLVs. We still don't get around the need to loop
through all those TLVs at some point, but that does not have to be handled
in hardware that is not well-suited for it.
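
To illustrate the cost, here is a rough sketch of the loop that would be needed to locate the fragment data in a naked stack of TLVs. It assumes single-byte EOL (kind 0) and NOP (kind 1) options and kind/length encoding for everything else; the FRAG codepoints and the function name are hypothetical.

```python
KIND_EOL = 0                # assumed single-byte end-of-list option
KIND_NOP = 1                # assumed single-byte no-op option
FRAG_KINDS = {0x7E, 0x7F}   # hypothetical FRAG codepoints


def find_fragment_data(packet, udp_header_len=8):
    """Return the offset of the fragment data, found by walking the TLVs."""
    pos = udp_header_len
    while pos < len(packet):
        kind = packet[pos]
        if kind == KIND_EOL:
            break
        if kind == KIND_NOP:
            pos += 1            # single-byte option, no length field
            continue
        if pos + 2 > len(packet):
            break               # no room left for a length byte
        length = packet[pos + 1]
        if length < 2 or pos + length > len(packet):
            raise ValueError("malformed option")
        if kind in FRAG_KINDS:
            return pos + length  # data starts right after the FRAG option
        pos += length            # skip over any other option
    raise ValueError("no FRAG option found")
```

The number of iterations depends on how many options precede FRAG, which is exactly the data-dependent work that is awkward to do in a NIC.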

In https://mailarchive.ietf.org/arch/msg/tsvwg/XZxL29UA-95ReA72mxv5-kEytK0/
I floated the following variant of Tom's proposal:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\
   |        Source port            |      Destination port         | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -Base Hdr
   |        UDP Length = 8         |   UDP Header Checksum         | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
   |        Payload Offset         |  Option+Payload Checksum      |\
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   |                                                               | |
   ~                           UDP Options                         ~ -Ext Hdr
   |                                                               | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
   |                                                               |
   ~                         Payload Data                          ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

In this case the fixed header just replaces the currently-defined OCS,
occupying the exact same amount of space while using the same algorithm.
And it is neutral as to whether FRAG is defined as in
draft-ietf-tsvwg-udp-options-13, or as in my counterproposal, or in some
other way, as long as FRAG data is contained in the option space.
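
As a sketch of why this helps header-data split: with a Payload Offset in a fixed position, the split point can be read directly, with no TLV parsing. The code below assumes the offset is counted in bytes from the start of the UDP header, which the diagram does not specify; the function name is likewise hypothetical.

```python
import struct


def split_header_data(packet):
    """Split a packet at the Payload Offset without touching the TLVs."""
    if len(packet) < 12:
        raise ValueError("too short for base and extension header")
    # Payload Offset is the 16-bit field right after the 8-byte UDP header.
    (payload_offset,) = struct.unpack_from("!H", packet, 8)
    # Assumption: the offset is in bytes from the start of the UDP header,
    # so it can never point inside the 12 bytes of fixed fields.
    if payload_offset < 12 or payload_offset > len(packet):
        raise ValueError("payload offset out of range")
    return packet[:payload_offset], packet[payload_offset:]
```

The options between byte 12 and the payload offset still have to be walked by software later, but the split itself is a single fixed-position read.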

For the record:

1. I do not think that it is useful to allow an arbitrary UDP option to be
encoded either as a per-fragment or per-datagram option (or perhaps both).
As I have stated previously, that usage should be spelled out in the option
definition. I believe that there also needs to be a better specification of
what the application protocol running on top of UDP will be able to request
of the UDP layer regarding options. This consideration informs how the FRAG
option needs to be designed.

2. I do not agree with the contention that there is any good reason for
reassembled UDP datagrams with options to wind up in the same format as the
wire format for legacy UDP datagrams with an options trailer. That is an
implementation detail, generally invisible to the outside world. In common
existing implementations options and user data generally follow different
paths to get between the user and the protocol stack, and it is not at all
obvious that the machinations to force these two things into the same mold
provide any benefit to the implementation. As a corollary, I see no
compelling reason why the legacy format cannot remain as is, with a fixed
header replacing OCS in the case where UDP Len = 0.

3. I do not agree that consenting endpoints should be disallowed from using
just the legacy format with a trailer, or just the format with UDP Len = 0.
We would not expect this of a fully conforming implementation, but if it is
useful in specific use cases, I can't see the downside in allowing it, no
matter how hard I squint.  Unlike some other limited domain proposals that
have been floated, this is not something that is likely to cause harm to
the general internet if it escapes.

I expect strong disagreement on all these points. It would be nice to hear
from folks who have been silent in this latest round of discussions.

Thanks and regards,

Mike Heard