Re: [tsvwg] UDP-Options: UDP has two "maximums"

Paul Vixie <> Sun, 04 April 2021 01:29 UTC

Date: Sun, 4 Apr 2021 01:29:03 +0000
From: Paul Vixie <>
To: Joseph Touch <>
Cc: Gorry Fairhurst <>, "" <>

> On Apr 3, 2021, at 1:23 PM, Paul Vixie <> wrote:
> > PLPMTUD is exactly what i wanted for DNS over UDP,
> > and we refer to it here:
> > 
> > ... draft-ietf-dnsop-avoid-fragmentation-04.txt
> > 
> > note, a lot of others expect DNS to move to HTTP/3 or TCP, but even in
> > those cases i would like to use the largest discrete PDUs that will fit.

On Sat, Apr 03, 2021 at 04:07:22PM -0700, Joseph Touch wrote:
> You might take a closer look at UDP options:
> It supports UDP-layer fragmentation and reassembly with a 32-bit ID field.

you're implicitly positing a situation wherein a DNS speaker could know that
the far endpoint knew about UDP options and could reassemble, and that the
local endpoint (kernel) knows about UDP options and could fragment, and that
using UDP Fragmentation would be seen as a better choice than leaving out
optional data or signalling a need to retry with TCP.

in practice this means if a DNS UDP initiator asked its networking stack to
include a UDP option list (perhaps empty, but present), this signal could be
made available to the DNS UDP responder by its networking stack, so that the
capability of UDP fragmentation and reassembly could be considered when
deciding how to respond.
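
a rough sketch of that decision, for illustration only. every name here
(Strategy, choose_strategy, path_limit, initiator_sent_options) is
hypothetical and comes from no DNS implementation or UDP options draft, and
the preference order is just one possibility, since whether fragmentation
beats trimming or a TCP retry is exactly the open question:

```python
from enum import Enum


class Strategy(Enum):
    SEND_WHOLE = "send as a single datagram"
    UDP_FRAGMENT = "fragment at the UDP layer"
    TRIM_OPTIONAL = "leave out optional data"
    TRUNCATE = "signal a retry over TCP"


def choose_strategy(full_size, required_size, path_limit,
                    initiator_sent_options, local_stack_can_fragment):
    """Pick how to answer one DNS-over-UDP query.

    All inputs are assumptions: sizes are bytes of DNS message,
    path_limit is the largest message believed to fit unfragmented, and
    initiator_sent_options is the signal (assumed to be surfaced by the
    networking stack) that the query carried a UDP option list.
    """
    if full_size <= path_limit:
        return Strategy.SEND_WHOLE
    # Both ends must be known to understand UDP options before fragmenting.
    if initiator_sent_options and local_stack_can_fragment:
        return Strategy.UDP_FRAGMENT
    # Otherwise fall back to what a responder can already do today.
    if required_size <= path_limit:
        return Strategy.TRIM_OPTIONAL
    return Strategy.TRUNCATE
```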

i worry about microbursts. the DNS UDP responder knows how much data it has
committed to the network recently, and how much was committed to a given
initiator, and can slow down a little to avoid back-to-back transmissions.
that spacing happens on its own, with no added delay, if the initiator is
not pipelining its requests.
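
a minimal sketch of that kind of application-level spacing, assuming a
per-initiator minimum gap. ResponderPacer and min_gap are hypothetical
names, and no particular gap value is implied:

```python
class ResponderPacer:
    """Track the earliest next send time per initiator.

    min_gap is a hypothetical tuning knob: seconds to leave between
    datagrams sent to the same initiator.
    """

    def __init__(self, min_gap):
        self.min_gap = min_gap
        self.next_allowed = {}  # initiator address -> earliest next send time

    def delay_for(self, initiator, now):
        """Return how long to wait before sending to initiator at time now."""
        earliest = self.next_allowed.get(initiator, now)
        send_at = max(now, earliest)
        # Reserve the slot so a back-to-back response gets pushed out.
        self.next_allowed[initiator] = send_at + self.min_gap
        return send_at - now
```

an initiator that waits for each answer before asking again never sees a
nonzero delay here; only pipelined requests to the same initiator get
spread out.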

what we've learned from NFS, and high volume authoritative UDP DNS, is that
the network doesn't love minimum-spaced back-to-back packets, and that if
an 8KiB NFS result gets chopped into ~1500B chunks, tail drop is likely.
this is the biggest source of operator pain from IP fragmentation, fwiw.
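
to put rough numbers on that, here is a sketch that assumes the 8KiB
result is the whole UDP payload, a 1500-byte MTU, and an IPv4 header
without options. the loss model assumes independent per-packet loss, which
is a simplification and not a model of bursty tail drop; the function
names are made up for this example:

```python
import math


def ipv4_fragment_count(udp_payload, mtu=1500, ip_header=20, udp_header=8):
    """Number of IPv4 fragments needed for one UDP datagram."""
    # Fragment offsets are in 8-byte units, so each fragment's data
    # is rounded down to a multiple of 8.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((udp_payload + udp_header) / per_fragment)


def datagram_loss_probability(fragments, per_packet_loss):
    """Chance the datagram is lost: losing any one fragment loses it all."""
    return 1 - (1 - per_packet_loss) ** fragments


fragments = ipv4_fragment_count(8 * 1024)
print(fragments)                                   # 6 back-to-back packets
print(datagram_loss_probability(fragments, 0.01))  # about 0.0585
```

so under these assumptions a 1% per-packet loss rate becomes nearly 6% for
the reassembled datagram, before accounting for the burst itself making
drops more likely.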

(bizarrely, this was not an issue in 10base5 or 10baseT due to CSMA/CD, even
where repeaters were present. but once we add bridging, such as "switches",
there's no reliable interface-driver signal for non-local congestion.)

candidly, i would not want to have to teach a kernel network stack how to
pace its transmitted fragments. this observation was one of my few
contributions to RFC6013, in which a TCB could revert to embryonic state
but retain its CWND until either reused or pruned. alas, TCPM felt they
had to make a choice, and chose TCP Fast Open (TFO), which has since proved unworkable,
thus leading to QUIC.

further digression: the framer of messages (like TCP, or DNS, or NFS) ought
to know the PMTU, which is why PMTUD was originally a non-optional feature
of IPv6 until we learned that ICMPv6 was dangerous as hell and threw out
PMTUD, thus leaving us with the pessimal and never-expected-to-be-used 1280
and 1232 numbers. if we can get PLPMTUD then we can make IPv6 better than
IPv4 in terms of header amortization rather than (as it currently is) worse.
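
the arithmetic behind those numbers, as a small sketch. 1232 is the 1280
IPv6 minimum MTU less a 40-byte IPv6 header and an 8-byte UDP header. the
9000-byte path MTU below is purely a hypothetical example of what PLPMTUD
might discover, and the helper names are invented for illustration:

```python
IPV6_MIN_MTU = 1280  # minimum link MTU every IPv6 path must support
IPV6_HEADER = 40     # fixed IPv6 header, no extension headers
IPV4_HEADER = 20     # IPv4 header without options
UDP_HEADER = 8


def udp_payload(mtu, ip_header):
    """Largest UDP payload that fits in one unfragmented packet."""
    return mtu - ip_header - UDP_HEADER


def header_overhead(mtu, ip_header):
    """Fraction of a full-sized packet spent on IP and UDP headers."""
    return (ip_header + UDP_HEADER) / mtu


print(udp_payload(IPV6_MIN_MTU, IPV6_HEADER))      # 1232
print(header_overhead(IPV6_MIN_MTU, IPV6_HEADER))  # 0.0375 (IPv6 at 1280)
print(header_overhead(1500, IPV4_HEADER))          # about 0.0187 (IPv4 at 1500)
print(header_overhead(9000, IPV6_HEADER))          # about 0.0053 (hypothetical)
```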

Paul Vixie