Re: [DNSOP] I-D Action: draft-ietf-dnsop-avoid-fragmentation-00.txt

Mark Andrews <> Thu, 09 July 2020 01:35 UTC

From: Mark Andrews <>
Date: Thu, 9 Jul 2020 11:35:05 +1000
To: Marek Majkowski <>

> On 9 Jul 2020, at 00:50, Marek Majkowski <> wrote:
> On Wed, Jul 8, 2020 at 10:01 AM <> wrote:
>> Paul Vixie and I submitted draft-ietf-dnsop-avoid-fragmentation-00.
>> Please review it.
> Hi!
>> UDP requestors and responders SHOULD send DNS responses with
>> IP_DONTFRAG / IPV6_DONTFRAG [RFC3542] options, which will yield
>> either a silent timeout, or a network (ICMP) error, if the path
>> MTU is exceeded.
> When MTU is exceeded the sender might also receive a plain old EMSGSIZE
> error on sendto(). I would love to see an example of what
> IP_MTU_DISCOVER settings the authors expect. This option is notoriously
> hard to get right.
>> The maximum buffer size offered by an EDNS0 initiator SHOULD be
>> no larger than the estimated maximum DNS/UDP payload size...
> This seems to indicate that EDNS0 over TCP should have a small buffer
> size as well. Consider wording like "...buffer size offered by an
> EDNS0 initiator over UDP...".
>> Fragmented DNS/UDP messages may be dropped without IP reassembly
> Not sure what it has to do with the draft. Are we worried about
> request fragmentation and allowing the DNS server to drop fragmented
> requests? Are we worried about response fragmentation?
> I have two problems with this proposal. First, it doesn't mention IPv4
> vs IPv6 differences at all. In IPv4 landscape fragmentation, while a
> security issue, is generally fine. In the IPv6 world, fragmentation is
> disastrous - packets with extension headers are known to be dropped.

Not really. UNKNOWN extensions tend to get dropped but the fragmentation
header is a KNOWN extension header.
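
To make "known" concrete: the Fragment header has its own Next Header
value (44) and a fixed 8-byte layout in RFC 8200, so any device can
recognise it and step over it. A minimal sketch of decoding one follows;
the function name and the returned dict are only for illustration.

```python
import struct

# Next Header value that identifies the IPv6 Fragment header (RFC 8200).
IPPROTO_FRAGMENT = 44


def parse_fragment_header(data: bytes) -> dict:
    """Decode the fixed 8-byte IPv6 Fragment header at the start of data."""
    if len(data) < 8:
        raise ValueError("Fragment header is 8 bytes long")
    # Layout: Next Header (8 bits), Reserved (8 bits),
    # Fragment Offset (13 bits) + Res (2 bits) + M flag (1 bit),
    # Identification (32 bits).
    next_header, _reserved, off_flags, ident = struct.unpack("!BBHI", data[:8])
    return {
        "next_header": next_header,
        # The offset field counts 8-byte units.
        "offset_bytes": (off_flags >> 3) * 8,
        "more_fragments": bool(off_flags & 1),
        "identification": ident,
    }
```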

> Second, this proposal assumes that path MTU detection works correctly.
> This is surprisingly optimistic. Let's consider IPv6 - in IPv6 a
> path MTU smaller than 1500 is very common.

Which is why IPV6_USE_MIN_MTU exists (RFC 3542).  USE THE SOCKET OPTION.
It was put there specifically to support DNS over UDP and other applications
like that.  I know this as I proposed the predecessor option back in 1999
which became IPV6_USE_MIN_MTU.

If the OS hosting your DNS server doesn’t support this option 17 years
after it was defined, throw it in the bin.

IPV6_USE_MIN_MTU also helps with TCP.  DNS does not need to suffer from
PMTUD issues.
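
For illustration only, a rough sketch of turning the option on for a
socket. The helper name is made up, and the getattr() lookup is there
because not every platform or Python build exposes the constant; this is
not taken from any particular DNS server.

```python
import socket


def use_min_mtu(sock: socket.socket) -> bool:
    """Try to enable IPV6_USE_MIN_MTU on an IPv6 socket.

    Returns False if the constant is missing or the OS rejects the option.
    """
    opt = getattr(socket, "IPV6_USE_MIN_MTU", None)
    if opt is None:
        return False
    try:
        # RFC 3542 section 11.1 values:
        #   1  = skip path MTU discovery, send at the IPv6 minimum MTU
        #   0  = always perform path MTU discovery
        #  -1  = system default
        sock.setsockopt(socket.IPPROTO_IPV6, opt, 1)
    except OSError:
        return False
    return True
```

Call it on the AF_INET6 UDP (or TCP listening) socket right after
creating it, and treat a False return as "this platform can't do it".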

> Let's say a DNS auth server sent an IPv6 DNS response packet exceeding
> path MTU. An intermediate router will drop the offending packet and
> one of three scenarios will happen:
> - (A) No ICMP PTB message is sent back.
> - (B) ICMP PTB message is sent back, but fails to be delivered.
> - (C) ICMP PTB message is sent back and delivered correctly to the server.
> All three scenarios are disastrous on the practical internet. The
> proposal assumes (A) and (B) will rarely happen, and puts the
> responsibility on the DNS client to retry over TCP. This will cause
> unnecessary timeouts and degrade the overall quality of the service.
> But perhaps most importantly even option (C) will *not* result in good
> service. Consider a setup with multiple DNS servers behind an ECMP
> router, or another L4 load balancer. Even if the return ICMP will hit
> back the correct server - which is far from obvious - the ICMP will
> update the Path MTU on *one server*. If a client attempts to retry the
> query, as suggested by the proposal, it will most likely hit another
> server, which is not aware of non standard Path MTU.
> These days DNS Auth installations use ECMP routing for load balancing.
> A single physical box serving important DNS is a rare occurrence.
> In this proposal all three (A), (B), and (C) scenarios will result in
> dropped responses. DNS client needs to wait for timeout, retry over
> UDP, wait more and eventually retry over TCP. This is bad.
> We could fix (C) by making the DNS server capture the ICMP PTB in
> DNS server code. The ICMP payload often has enough context for the DNS
> server to prepare another reply. This reply of course should be sent
> with lowered MTU.
> In other words, I'm asking for capturing ICMP PTB in DNS servers,
> unpacking the payload and treating it as another request to be handled.
> This would solve (C). Also, note this opens an interesting DDoS
> vector, but this is another story.
> On Linux it is possible to capture the ICMP PTB without privileges, by
> setting IP_RECVERR and inspecting MSG_ERRQUEUE. In IPv4 the PTB
> messages often have 520 bytes of payload and in IPv6 1184 bytes. This
> is enough context to build another response, without having to wait
> for any timeout.
> Cheers,
> Marek

Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742              INTERNET: