Re: [tsvwg] UDP-Options: UDP has two "maximums"

Joseph Touch <> Sun, 04 April 2021 04:25 UTC

From: Joseph Touch <>
Date: Sat, 3 Apr 2021 21:25:00 -0700
Cc: Gorry Fairhurst <>, "" <>
To: Paul Vixie <>
Subject: Re: [tsvwg] UDP-Options: UDP has two "maximums"

> On Apr 3, 2021, at 8:31 PM, Paul Vixie <> wrote:
> On Sat, Apr 03, 2021 at 06:49:46PM -0700, Joseph Touch wrote:
>>> On Apr 3, 2021, at 6:29 PM, Paul Vixie <> wrote:
>>> you're implicitly positing a situation wherein a DNS speaker could know
>>> that the far endpoint knew about UDP options and could reassemble, and
>>> that the local endpoint (kernel) knows about UDP options and could
>>> fragment, and that using UDP Fragmentation would be seen as a better
>>> choice than leaving out optional data or signalling a need to retry
>>> with TCP.
>> Yes. Note though that the fragmentation in UDP can be used safely;
>> legacy endpoints just see (at most) packets with zero data.
> i wasn't considering it unsafe. some kind of initiation signal would
> be required though, or else the initiator would see a small set of
> apparently empty UDP payloads coming back, and wonder why.

The same could happen for nearly any new variant of DNS, too.
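
To illustrate why a legacy endpoint sees zero data: UDP options live in the surplus area between the end of the UDP datagram (as given by the UDP Length field) and the end of the IP payload. The sketch below is only an illustration under stated assumptions: it assumes a fragment-carrying packet is sent with UDP Length = 8, the function name split_udp and the sample bytes are hypothetical, and it does not parse any real option encoding.

```python
import struct

UDP_HEADER_LEN = 8


def split_udp(ip_payload: bytes):
    """Split an IP payload into (user_data, surplus) using the UDP Length field.

    A legacy receiver delivers only user_data; anything after it is the
    surplus area, which an options-aware receiver would go on to parse.
    """
    _src, _dst, udp_len, _csum = struct.unpack("!HHHH", ip_payload[:UDP_HEADER_LEN])
    if udp_len < UDP_HEADER_LEN or udp_len > len(ip_payload):
        raise ValueError("invalid UDP Length")
    user_data = ip_payload[UDP_HEADER_LEN:udp_len]
    surplus = ip_payload[udp_len:]
    return user_data, surplus


# Hypothetical packet: UDP Length covers the header only, so user data is empty.
header = struct.pack("!HHHH", 53, 40000, UDP_HEADER_LEN, 0)
packet = header + b"\x00" * 20  # 20 placeholder bytes, not a real option encoding

user_data, surplus = split_udp(packet)
print(len(user_data), len(surplus))  # 0 20
```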

>>> i worry about microbursts. ...
>>> what we've learned from NFS, and high volume authoritative UDP DNS,
>>> is that the network doesn't love minimum-spaced back-to-back packets,
>>> and that if an 8KiB NFS result gets chopped into ~1500B chunks, tail
>>> drop is likely. this is the biggest source of operator pain from IP
>>> fragmentation, fwiw.
>> Although I appreciate this concern, TCP does the same kind of bursts --
> TCP has a congestion window that commonly keeps burst size within "range".

The window isn’t based on the ability to burst an entire window’s worth of messages back-to-back.

It represents an entire round trip's worth of transmissions; when the source goes idle, the next active period can burst up to that entire window.
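
As a rough comparison of the two burst sizes under discussion, the sketch below counts back-to-back packets in each case. It is arithmetic only: the 64 KiB window and the 1460-byte MSS are hypothetical values chosen for illustration, and it ignores pacing and any window reduction a sender applies after an idle period.

```python
import math


def ipv4_fragment_count(udp_payload: int, mtu: int = 1500, ip_header: int = 20) -> int:
    """Number of IPv4 fragments needed for one UDP datagram."""
    datagram = udp_payload + 8  # the UDP header travels in the first fragment
    per_fragment = (mtu - ip_header) // 8 * 8  # fragment payloads are multiples of 8 bytes
    return math.ceil(datagram / per_fragment)


def tcp_burst_segments(window_bytes: int, mss: int = 1460) -> int:
    """Segments a sender could emit back-to-back if it used a whole window at once."""
    return math.ceil(window_bytes / mss)


print(ipv4_fragment_count(8192))      # 6 fragments for an 8 KiB NFS result
print(tcp_burst_segments(64 * 1024))  # 45 segments for a hypothetical 64 KiB window
```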

>> I had thought we knew about this long enough that vendors didn't use
>> tail drop; they should have been doing AQM or at least something akin
>> to RED.
> in routers and endpoints, yes. in switches, no. to a switch (multiport
> bridge), the problem is felt too late and too near to the copper. in a fan-in
> topology there can be too many gozinta for the gozouta, and this doesn't even
> depend on link-layer flow control or whether it works, just ten gallons of
> water trying to fit into a five gallon hat.

If you’re speaking of Ethernet, there is link-layer flow control, e.g., congestion notification messages or explicit on/off flow control, intended to push this feedback to routers and endpoints.
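
The fan-in case quoted above can be sketched with a toy model. This is a deliberately simplified simulation, not a description of any real switch: it assumes equal link speeds, one packet per sender per time slot, a tail-drop output queue, and no flow control; the function name and all numbers are hypothetical.

```python
def tail_drops(senders: int, burst_packets: int, buffer_packets: int) -> int:
    """Packets dropped when simultaneous bursts converge on one output port.

    Each time slot, every sender delivers one packet and the output port
    transmits one packet. Arrivals that find the queue full are dropped.
    """
    queue = 0
    dropped = 0
    for _ in range(burst_packets):
        for _ in range(senders):
            if queue < buffer_packets:
                queue += 1
            else:
                dropped += 1  # tail drop: the queue is already full
        if queue:
            queue -= 1  # the output port drains one packet per slot
    return dropped


print(tail_drops(senders=4, burst_packets=6, buffer_packets=10))  # 9
print(tail_drops(senders=1, burst_packets=6, buffer_packets=10))  # 0
```

With a single sender the queue never builds, so the drops in this model come entirely from the fan-in, not from the burst length alone.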

> ….
>>> further digression: the framer of messages (like TCP, or DNS, or NFS)
>>> ought to know the PMTU, which is why PMTUD was originally a non-optional
>>> feature of IPv6 until we learned that ICMPv6 was dangerous as hell and
>>> threw out PMTUD, thus leaving us with the pessimal and never-expected-
>>> to-be-used 1280 and 1232 numbers. if we can get PLPMTUD then we can make
>>> IPv6 better than IPv4 in terms of header amortization rather than (as it
>>> currently is) worse.
>> Agreed; that's aided in UDP with options as per Gorry's draft.
> if we implement PLPMTUD/UDP for DNS, we're going to have a decades-long
> period during which the far end doesn't understand UDP options,

s/UDP options/{TCP, QUIC, etc.}/

Yes, all new mechanisms take time. 
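
For reference, the 1280 and 1232 figures quoted above are related by simple header arithmetic, and the same arithmetic shows the amortization gain from a larger discovered MTU. The sketch assumes a plain IPv6 header with no extension headers; the 1500-byte MTU is a hypothetical discovered value.

```python
IPV6_HEADER = 40   # fixed IPv6 header, no extension headers assumed
UDP_HEADER = 8
IPV6_MIN_MTU = 1280


def udp_payload_for_mtu(mtu: int) -> int:
    """Largest UDP payload that fits in one unfragmented IPv6 packet."""
    return mtu - IPV6_HEADER - UDP_HEADER


def header_overhead(mtu: int) -> float:
    """Fraction of a full-sized packet spent on IPv6 + UDP headers."""
    return (IPV6_HEADER + UDP_HEADER) / mtu


print(udp_payload_for_mtu(IPV6_MIN_MTU))  # 1232
print(udp_payload_for_mtu(1500))          # 1452
print(header_overhead(IPV6_MIN_MTU) > header_overhead(1500))  # True
```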

> ...
> the original IPng PMTUD model whereby the endpoint's routing table would
> remember a discovered MTU for each endpoint was a shinier city than this
> on a better hill. i hope to see this outcome in the PLPMTUD world,

PLPMTUD intends to cache MTUs per-endpoint exactly the same way that PMTUD did.

> so that
> (for example) TCP, QUIC, NFS, and UDP can all set their MSS accordingly,
> without each service having to do its own discovery work per endpoint.

Sure, but someone has to figure it out for others to use it. That’s why we’re working on a mechanism native to UDP - so it doesn’t rely on TCP or QUIC connections that precede its use.
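
A per-endpoint cache of the kind discussed here might look like the sketch below. It is a minimal illustration and not how any particular stack implements it: the class name PathMtuCache, the 1280-byte fallback, and the 10-minute lifetime are all assumptions made for the example.

```python
import time


class PathMtuCache:
    """Hypothetical per-destination MTU cache shared by several transports."""

    def __init__(self, default_mtu=1280, lifetime=600.0, clock=time.monotonic):
        self._default = default_mtu   # fallback when nothing has been discovered
        self._lifetime = lifetime     # seconds before an entry is considered stale
        self._clock = clock
        self._entries = {}            # destination -> (mtu, expiry time)

    def update(self, destination, mtu):
        """Record an MTU discovered by any transport (for example, via probing)."""
        self._entries[destination] = (mtu, self._clock() + self._lifetime)

    def lookup(self, destination):
        """Return the cached MTU, or the default if unknown or expired."""
        entry = self._entries.get(destination)
        if entry is None:
            return self._default
        mtu, expiry = entry
        if self._clock() >= expiry:
            del self._entries[destination]
            return self._default
        return mtu


cache = PathMtuCache()
cache.update("2001:db8::1", 1500)             # one transport discovers the path MTU
tcp_mss = cache.lookup("2001:db8::1") - 40 - 20  # IPv6 + TCP headers
udp_max = cache.lookup("2001:db8::1") - 40 - 8   # IPv6 + UDP headers
unknown = cache.lookup("2001:db8::2")            # never probed: falls back to 1280
print(tcp_mss, udp_max, unknown)  # 1440 1452 1280
```

The point of the design is that one discovery result serves every transport talking to the same endpoint; whichever protocol probes first populates the entry.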