Re: [tsvwg] UDP-Options: UDP has two "maximums"

Joseph Touch, Sun, 04 April 2021 01:50 UTC

From: Joseph Touch
Date: Sat, 3 Apr 2021 18:49:46 -0700
To: Paul Vixie
Cc: Gorry Fairhurst

Hi, Paul,

> On Apr 3, 2021, at 6:29 PM, Paul Vixie wrote:
>> On Apr 3, 2021, at 1:23 PM, Paul Vixie wrote:
>>> PLPMTUD is exactly what i wanted for DNS over UDP,
>>> and we refer to it here:
>>> ... draft-ietf-dnsop-avoid-fragmentation-04.txt
>>> note, a lot of others expect DNS to move to HTTP/3 or TCP, but even in
>>> those cases i would like to use the largest discrete PDUs that will fit.
> On Sat, Apr 03, 2021 at 04:07:22PM -0700, Joseph Touch wrote:
>> You might take a closer look at UDP options:
>> It supports UDP-layer fragmentation and reassembly with a 32-bit ID field.
> you're implicitly positing a situation wherein a DNS speaker could know that
> the far endpoint knew about UDP options and could reassemble, and that the
> local endpoint (kernel) knows about UDP options and could fragment, and that
> using UDP Fragmentation would be seen as a better choice than leaving out
> optional data or signalling a need to retry with TCP.

Yes. Note though that the fragmentation in UDP can be used safely; legacy endpoints just see (at most) packets with zero data.
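
To make the "zero data" point concrete, here is a minimal sketch of how a receiver could separate UDP user data from the bytes that follow it. It assumes, as I understand the UDP options draft, that options (including fragments) travel in the surplus area between the end of the UDP Length and the end of the IP payload. The function name `split_udp_payload` and the placeholder bytes are illustrative only, not a real option encoding.

```python
import struct

UDP_HEADER_LEN = 8  # fixed UDP header size in bytes (RFC 768)


def split_udp_payload(ip_payload: bytes):
    """Split an IP payload into (user_data, surplus_area).

    The UDP Length field covers the header plus user data. Any bytes the IP
    layer delivered beyond that length form the surplus area, which a
    legacy receiver does not hand to the application.
    """
    if len(ip_payload) < UDP_HEADER_LEN:
        raise ValueError("shorter than a UDP header")
    _src, _dst, udp_length, _cksum = struct.unpack(
        "!HHHH", ip_payload[:UDP_HEADER_LEN]
    )
    if udp_length < UDP_HEADER_LEN or udp_length > len(ip_payload):
        raise ValueError("inconsistent UDP Length field")
    user_data = ip_payload[UDP_HEADER_LEN:udp_length]
    surplus = ip_payload[udp_length:]
    return user_data, surplus


# Hypothetical fragment-carrying packet: UDP Length is 8, so the user data
# is empty and everything after the header is surplus. The 20 zero bytes
# are placeholders, not a real FRAG option.
header = struct.pack("!HHHH", 5353, 53, UDP_HEADER_LEN, 0)
packet = header + b"\x00" * 20
data, surplus = split_udp_payload(packet)
print(len(data), len(surplus))  # 0 20
```

A legacy endpoint only ever looks at `user_data`, which is why it sees at most an empty datagram.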

> in practice this means if a DNS UDP initiator asked its networking stack to
> include a UDP option list (perhaps empty, but present), this signal could be
> made available to the DNS UDP responder by its networking stack, so that the
> capability of UDP fragmentation and reassembly could be considered when
> deciding how to respond.


> i worry about microbursts. the DNS UDP responder knows how much data it has
> committed to the network recently, and how much was committed to a given
> initiator, and can slow down a little to avoid back-to-back transmissions,
> but certainly will slow down a little and reach such avoidance if the
> initiator is not pipelining its requests.
> what we've learned from NFS, and high volume authoritative UDP DNS, is that
> the network doesn't love minimum-spaced back-to-back packets, and that if
> an 8KiB NFS result gets chopped into ~1500B chunks, tail drop is likely.
> this is the biggest source of operator pain from IP fragmentation, fwiw.

Although I appreciate this concern, TCP produces the same kind of bursts, and DNS over TCP seems likely to exacerbate them, either by sending up to 10-packet bursts at the start of every new connection or by sending potentially larger bursts if persistent connections are used.

I had thought we had known about this long enough that vendors no longer used tail drop; they should be doing AQM or at least something akin to RED.
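
For a rough sense of the burst sizes being compared, here is a small sketch of the arithmetic. The 1472-byte chunk size (1500 minus 20 bytes of IPv4 header and 8 bytes of UDP header), the 100 Mbit/s pacing rate, and the function names are all assumptions for illustration, not anything taken from a real stack.

```python
import math

TCP_BURST_PACKETS = 10  # the "up to 10-packet" initial burst mentioned above


def fragment_count(payload_bytes: int, per_packet_bytes: int) -> int:
    """Number of packets needed to carry payload_bytes."""
    return math.ceil(payload_bytes / per_packet_bytes)


def paced_send_times(n_packets: int, packet_bytes: int, rate_bps: float):
    """Departure offsets (seconds) that space packets at rate_bps
    instead of sending them back-to-back."""
    gap = packet_bytes * 8 / rate_bps
    return [i * gap for i in range(n_packets)]


# An 8 KiB result cut into 1472-byte chunks (assumed chunk size).
n = fragment_count(8 * 1024, 1472)
print(n)  # 6

# Spacing those packets at an assumed 100 Mbit/s, 1500 bytes on the wire each.
times = paced_send_times(n, 1500, 100e6)
print(n <= TCP_BURST_PACKETS)  # True
```

Under these assumptions the 8 KiB result is a 6-packet burst, which is smaller than the 10-packet burst a new TCP connection may send.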

> (bizarrely, this was not an issue in 10base5 or 10baseT due to CSMA/CD, even
> where repeaters were present. but once we add bridging, such as "switches",
> there's no reliable interface-driver signal for non-local congestion.)
> candidly, i would not want to have to teach a kernel network stack how to
> pace its transmitted fragments. this observation was one of my few
> contributions to RFC6013, in which a TCB could revert to embryonic state
> but retain its CWND until either reused or pruned. alas, TCPM felt they
> had to make a choice, and chose TCPFO, which has since proved unworkable,
> thus leading to QUIC.

A few of us at ISI explored ways to adjust TCP slow start restart to avoid this issue too: <>

> further digression: the framer of messages (like TCP, or DNS, or NFS) ought
> to know the PMTU, which is why PMTUD was originally a non-optional feature
> of IPv6 until we learned that ICMPv6 was dangerous as hell and threw out
> PMTUD, thus leaving us with the pessimal and never-expected-to-be-used 1280
> and 1232 numbers. if we can get PLPMTUD then we can make IPv6 better than
> IPv4 in terms of header amortization rather than (as it currently is) worse.

Agreed; that’s aided in UDP with options as per Gorry’s draft.
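
For reference, the arithmetic behind the 1280 and 1232 numbers and the header amortization point can be sketched as follows. The header sizes are the fixed IPv6, IPv4 (no options), and UDP header lengths; the 9000-byte path MTU is purely a hypothetical value for what PLPMTUD might discover.

```python
IPV6_MIN_MTU = 1280  # minimum link MTU every IPv6 path must support
IPV6_HEADER = 40     # fixed IPv6 header, no extension headers
IPV4_HEADER = 20     # IPv4 header without options
UDP_HEADER = 8


def udp_payload(mtu: int, ip_header: int) -> int:
    """Largest UDP payload that fits in one unfragmented packet."""
    return mtu - ip_header - UDP_HEADER


def overhead_fraction(mtu: int, ip_header: int) -> float:
    """Share of each full-sized packet spent on IP and UDP headers."""
    return (ip_header + UDP_HEADER) / mtu


print(udp_payload(IPV6_MIN_MTU, IPV6_HEADER))                 # 1232
print(f"{overhead_fraction(IPV6_MIN_MTU, IPV6_HEADER):.2%}")  # 3.75%
print(f"{overhead_fraction(1500, IPV4_HEADER):.2%}")          # 1.87%

# Hypothetical 9000-byte path MTU found via PLPMTUD (assumed value).
print(f"{overhead_fraction(9000, IPV6_HEADER):.2%}")          # 0.53%
```

At the 1280-byte floor IPv6 spends about twice the share on headers that IPv4 does at 1500 bytes; a larger discovered path MTU reverses that.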