Re: [DNSOP] I-D Action: draft-ietf-dnsop-avoid-fragmentation-00.txt

Marek Majkowski <> Wed, 08 July 2020 14:50 UTC

List-Id: IETF DNSOP WG mailing list <>

On Wed, Jul 8, 2020 at 10:01 AM <> wrote:
> Paul Vixie and I submitted draft-ietf-dnsop-avoid-fragmentation-00.
> Please review it.


> UDP requestors and responders SHOULD send DNS responses with
> IP_DONTFRAG / IPV6_DONTFRAG [RFC3542] options, which will yield
> either a silent timeout, or a network (ICMP) error, if the path
> MTU is exceeded.

When the MTU is exceeded, the sender might also receive a plain old
EMSGSIZE error on sendto(). I would love to see an example of the
IP_MTU_DISCOVER settings the authors expect. This option is notoriously
hard to get right.
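
To make the question concrete, here is roughly what I would guess the
intent is on Linux. This is only a sketch: it assumes IP_PMTUDISC_DO is
the setting the authors have in mind (the draft does not say), the
constants are the Linux values from <linux/in.h> written out by hand,
and open_dontfrag_socket() and send_response() are names made up for
illustration.

```python
import errno
import socket

# Linux-specific values from <linux/in.h>, spelled out by hand (assumption:
# the code runs on Linux; other systems use different options).
IP_MTU_DISCOVER = 10
IP_PMTUDISC_DO = 2      # always set DF, never fragment locally


def open_dontfrag_socket():
    """Open an IPv4 UDP socket that sets DF on every outgoing datagram."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    return sock


def send_response(sock, payload, addr):
    """Send one datagram; report a local EMSGSIZE instead of raising."""
    try:
        sock.sendto(payload, addr)
        return "sent"
    except OSError as exc:
        if exc.errno == errno.EMSGSIZE:
            # The kernel already knows the datagram exceeds the path MTU
            # for this destination, so nothing was put on the wire.
            return "too-big"
        raise
```

A server using this would have to decide what to do on "too-big", for
example send a smaller or truncated response instead.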

> The maximum buffer size offered by an EDNS0 initiator SHOULD be
> no larger than the estimated maximum DNS/UDP payload size...

This seems to indicate that EDNS0 over TCP should have a small buffer
size as well. Consider wording like "...buffer size offered by an
EDNS0 initiator over UDP...".
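
For context, the field in question is the CLASS slot of the OPT
pseudo-RR, which RFC 6891 defines as the requestor's UDP payload size.
A minimal sketch of how an initiator encodes it follows; build_opt_rr()
is a made-up helper and 1232 is just an illustrative value, not
something the draft mandates.

```python
import struct

OPT_TYPE = 41  # OPT pseudo-RR type code (RFC 6891)


def build_opt_rr(udp_payload_size, do_bit=False):
    """Encode an OPT pseudo-RR with no options (hypothetical helper)."""
    flags = 0x8000 if do_bit else 0
    # Layout: NAME (root, 1 byte) | TYPE | CLASS = UDP payload size |
    #         extended RCODE | version | flags | RDLEN
    return struct.pack("!BHHBBHH", 0, OPT_TYPE, udp_payload_size,
                       0, 0, flags, 0)


opt = build_opt_rr(1232)  # illustrative size only
```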

> Fragmented DNS/UDP messages may be dropped without IP reassembly

I'm not sure what this has to do with the draft. Are we worried about
request fragmentation and allowing the DNS server to drop fragmented
requests? Are we worried about response fragmentation?

I have two problems with this proposal. First, it doesn't mention IPv4
vs IPv6 differences at all. In the IPv4 landscape, fragmentation, while
a security issue, is generally fine. In the IPv6 world, fragmentation is
disastrous: packets with extension headers are known to be dropped.

Second, this proposal assumes that path MTU detection works correctly.
This is surprisingly optimistic. Consider IPv6, where a path MTU
smaller than 1500 bytes is very common.
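
To put numbers on it, the DNS payload that fits in one unfragmented
packet is the path MTU minus the IP and UDP headers. A quick sketch,
assuming no IPv4 options and no IPv6 extension headers
(max_dns_udp_payload() is a name made up for this example):

```python
IPV4_HEADER = 20  # bytes, without IP options
IPV6_HEADER = 40  # bytes, fixed header without extension headers
UDP_HEADER = 8    # bytes


def max_dns_udp_payload(path_mtu, ipv6):
    """Largest DNS message that fits in one unfragmented UDP packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return path_mtu - ip_header - UDP_HEADER


print(max_dns_udp_payload(1500, ipv6=True))   # 1452
print(max_dns_udp_payload(1280, ipv6=True))   # 1232 (IPv6 minimum MTU)
print(max_dns_udp_payload(1500, ipv6=False))  # 1472
```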

Let's say a DNS auth server sent an IPv6 DNS response packet exceeding
path MTU. An intermediate router will drop the offending packet and
one of three scenarios will happen:

- (A) No ICMP PTB message is sent back.

- (B) ICMP PTB message is sent back, but fails to be delivered.

- (C) ICMP PTB message is sent back and delivered correctly to the server.

All three scenarios are disastrous on the practical internet. The
proposal assumes (A) and (B) will rarely happen, and puts the
responsibility on the DNS client to retry over TCP. This will cause
unnecessary timeouts and degrade the overall quality of the service.

But perhaps most importantly, even scenario (C) will *not* result in
good service. Consider a setup with multiple DNS servers behind an ECMP
router, or another L4 load balancer. Even if the returning ICMP message
reaches the correct server, which is far from obvious, it will update
the path MTU on *one server* only. If a client retries the query, as
suggested by the proposal, it will most likely hit another server,
which is not aware of the non-standard path MTU.
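
A toy model of the problem, under stated assumptions: the balancer
hashes the UDP 4-tuple, the client retries from a new source port, and
each server keeps its own path MTU cache. The server names, addresses
and the CRC32 hash are all made up for illustration; real ECMP hashing
differs.

```python
import zlib

SERVERS = ["dns-a", "dns-b", "dns-c", "dns-d"]  # hypothetical backends
pmtu_cache = {name: {} for name in SERVERS}     # per server: client -> MTU


def ecmp_pick(src_ip, src_port, dst_ip, dst_port):
    """Pick a backend by hashing the flow (stand-in for real ECMP)."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|udp".encode()
    return SERVERS[zlib.crc32(key) % len(SERVERS)]


def deliver_ptb(server, client_ip, mtu):
    """Best case (C): the PTB reaches the server that sent the response."""
    pmtu_cache[server][client_ip] = mtu


def unaware_servers(client_ip):
    """Servers that still have no path MTU entry for this client."""
    return [s for s in SERVERS if client_ip not in pmtu_cache[s]]


client, vip = "2001:db8::1", "2001:db8:53::53"
first = ecmp_pick(client, 40000, vip, 53)   # original query
deliver_ptb(first, client, 1280)            # only this server learns
retry = ecmp_pick(client, 40001, vip, 53)   # retry from a new port
print(first, retry, unaware_servers(client))
```

Even in the best case, only one of the four servers learns the lowered
MTU; whether the retry lands on it is down to the hash.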

These days DNS Auth installations use ECMP routing for load balancing.
A single physical box serving important DNS is a rare occurrence.

In this proposal all three scenarios, (A), (B), and (C), will result in
dropped responses. The DNS client needs to wait for a timeout, retry
over UDP, wait more, and eventually retry over TCP. This is bad.

We could fix (C) by making the DNS server capture the ICMP PTB in
its own code. The ICMP payload often has enough context for the DNS
server to prepare another reply. This reply should of course be sent
with a lowered MTU.

In other words, I'm asking for capturing ICMP PTB in DNS servers,
unpacking the payload and treating it as another request to be handled.
This would solve (C). Note that this also opens an interesting DDoS
vector, but that is another story.
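
As a rough illustration of the "unpacking" step: the packet quoted in
the PTB is the server's own response, so its DNS header and question
section carry the query ID, name, type and class needed to answer
again. The sketch below assumes the quoted bytes start at the DNS
header and contain exactly one uncompressed question;
question_from_quoted_response() is a made-up name.

```python
import struct


def question_from_quoted_response(quoted):
    """Recover (id, qname, qtype, qclass) from a quoted DNS response."""
    if len(quoted) < 12:
        raise ValueError("quoted payload too short for a DNS header")
    msg_id, _flags, qdcount, _an, _ns, _ar = struct.unpack(
        "!HHHHHH", quoted[:12])
    if qdcount != 1:
        raise ValueError("expected exactly one question")
    labels, pos = [], 12
    while True:
        if pos >= len(quoted):
            raise ValueError("question name is truncated")
        length = quoted[pos]
        pos += 1
        if length == 0:          # root label ends the name
            break
        if length > 63 or pos + length > len(quoted):
            raise ValueError("invalid or truncated label")
        labels.append(quoted[pos:pos + length].decode("ascii", "replace"))
        pos += length
    if pos + 4 > len(quoted):
        raise ValueError("question type/class is truncated")
    qtype, qclass = struct.unpack("!HH", quoted[pos:pos + 4])
    return msg_id, ".".join(labels) + ".", qtype, qclass
```

The server could then re-run the lookup and reply with a response that
fits the MTU reported in the PTB, or with a truncated one.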

On Linux it is possible to capture the ICMP PTB without privileges, by
setting IP_RECVERR and inspecting MSG_ERRQUEUE. In IPv4 the PTB
messages often have 520 bytes of payload and in IPv6 1184 bytes. This
is enough context to build another response, without having to wait
for any timeout.
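
A minimal sketch of the Linux side for IPv4, assuming the constants
from <linux/in.h> and <linux/errqueue.h> written out by hand;
enable_recverr(), mtu_from_ancdata() and poll_ptb() are made-up names.
IPv6 works the same way with IPV6_RECVERR and the ICMP6 origin.

```python
import errno
import socket
import struct

# Linux values, spelled out by hand (assumption: Linux only).
SOL_IP = 0
IP_RECVERR = 11
MSG_ERRQUEUE = 0x2000
SO_EE_ORIGIN_ICMP = 2
# struct sock_extended_err: errno, origin, type, code, pad, info, data
SOCK_EE = struct.Struct("=IBBBBII")


def enable_recverr(sock):
    """Ask the kernel to queue ICMP errors on this UDP socket."""
    sock.setsockopt(SOL_IP, IP_RECVERR, 1)


def mtu_from_ancdata(ancdata):
    """Return the MTU reported by a queued PTB, or None if there is none."""
    for level, ctype, data in ancdata:
        if level != SOL_IP or ctype != IP_RECVERR:
            continue
        if len(data) < SOCK_EE.size:
            continue
        ee_errno, origin, _t, _c, _pad, info, _d = SOCK_EE.unpack_from(data)
        if ee_errno == errno.EMSGSIZE and origin == SO_EE_ORIGIN_ICMP:
            return info          # ee_info carries the discovered MTU
    return None


def poll_ptb(sock):
    """Read one entry from the error queue without blocking."""
    try:
        quoted, ancdata, _flags, dest = sock.recvmsg(2048, 512, MSG_ERRQUEUE)
    except BlockingIOError:
        return None              # error queue is empty
    mtu = mtu_from_ancdata(ancdata)
    return None if mtu is None else (dest, mtu, quoted)
```

Here dest is the original destination of the dropped response and
quoted is the payload of the packet that triggered the error.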