Re: [DNSOP] I-D Action: draft-ietf-dnsop-avoid-fragmentation-00.txt Thu, 09 July 2020 08:28 UTC

Date: Thu, 09 Jul 2020 17:28:35 +0900 (JST)
Subject: Re: [DNSOP] I-D Action: draft-ietf-dnsop-avoid-fragmentation-00.txt

> From: Marek Majkowski <>
>> UDP requestors and responders SHOULD send DNS responses with
>> IP_DONTFRAG / IPV6_DONTFRAG [RFC3542] options, which will yield
>> either a silent timeout, or a network (ICMP) error, if the path
>> MTU is exceeded.
> When MTU is exceeded the sender might also receive plain old EMSGSIZE
> error on sendto(). I would love to see an example on what
> IP_MTU_DISCOVER settings authors expect. This option is notoriously
> hard to get right.

Is IP_MTU_DISCOVER a Linux-only option?

The authors do not intend to discourage Path MTU discovery.
We refer to RFC 1191, RFC 8201 and draft-ietf-tsvwg-datagram-plpmtud.
If Path MTU discovery works well, we can use a large UDP datagram size.
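
As a rough illustration of the socket settings discussed here (not
text from the draft), the following Python sketch shows one way a
Linux sender might request "don't fragment" behaviour and treat
EMSGSIZE from sendto() as "response too large". The numeric fallback
values are the Linux header values and are an assumption on other
platforms; the helper names are hypothetical.

```python
import errno
import socket

# Linux values from <linux/in.h> / <linux/in6.h>, used only when the
# Python socket module does not export the name (assumption: Linux).
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)
IPV6_MTU_DISCOVER = getattr(socket, "IPV6_MTU_DISCOVER", 23)
IPV6_PMTUDISC_DO = getattr(socket, "IPV6_PMTUDISC_DO", 2)
IPV6_DONTFRAG = getattr(socket, "IPV6_DONTFRAG", 62)


def dont_fragment_options(family):
    """Return (level, optname, value) tuples that ask for no fragmentation."""
    if family == socket.AF_INET:
        # IP_PMTUDISC_DO: always set DF, fail sends larger than the path MTU.
        return [(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)]
    if family == socket.AF_INET6:
        return [
            (socket.IPPROTO_IPV6, IPV6_MTU_DISCOVER, IPV6_PMTUDISC_DO),
            (socket.IPPROTO_IPV6, IPV6_DONTFRAG, 1),
        ]
    raise ValueError("unsupported address family")


def apply_dont_fragment(sock):
    """Apply the options above to an already created UDP socket."""
    for level, optname, value in dont_fragment_options(sock.family):
        sock.setsockopt(level, optname, value)


def send_without_fragmentation(sock, payload, addr):
    """Send one datagram; return False if the kernel rejects it as too big."""
    try:
        sock.sendto(payload, addr)
        return True
    except OSError as exc:
        if exc.errno == errno.EMSGSIZE:
            return False  # caller should build a smaller response or set TC=1
        raise
```

A False return corresponds to the "plain old EMSGSIZE" case mentioned
in the quoted comment; a silent drop on the path is not visible here.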

>> Fragmented DNS/UDP messages may be dropped without IP reassembly
> Not sure what it has to do with the draft. Are we worried about
> request fragmentation and allowing the DNS server to drop fragmented
> requests? Are we worried about response fragmentation?

I would like to change the text to:

  The DNS Requestor MAY drop Fragmented DNS/UDP responses without IP
  reassembly. (before IP reassembly?)

  (I think the text related to ICMP errors may be dropped.)

> I have two problems with this proposal. First, it doesn't mention IPv4
> vs IPv6 differences at all. In IPv4 landscape fragmentation, while a
> security issue, is generally fine. In the IPv6 world, fragmentation is
> disastrous - packets with extension headers are known to be dropped.

On IPv6, "every link in the Internet have an MTU of 1280 octets or
greater" (RFC 8200).
So, if Path MTU discovery works, we can use the real path MTU value.
Otherwise, we can use 1280 as the MTU.
We can easily avoid IPv6 fragmentation. (Fragmented IPv6 packets are bogus.)
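
To make the arithmetic concrete, here is a small Python sketch of the
largest DNS/UDP payload that fits in one unfragmented packet. It
assumes no IPv4 options and no IPv6 extension headers; the function
names are hypothetical.

```python
IPV4_HEADER = 20           # octets, without IP options
IPV6_HEADER = 40           # octets, fixed header only
UDP_HEADER = 8             # octets
IPV6_MIN_LINK_MTU = 1280   # RFC 8200


def max_dns_udp_payload(mtu, ipv6):
    """Largest UDP payload that fits in a single packet of size `mtu`."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - UDP_HEADER


def ipv6_payload_limit(discovered_path_mtu=None):
    """Use the discovered path MTU if there is one, else fall back to 1280."""
    mtu = discovered_path_mtu if discovered_path_mtu else IPV6_MIN_LINK_MTU
    return max_dns_udp_payload(mtu, ipv6=True)
```

With the 1280-octet fallback this gives 1232 octets of DNS payload on
IPv6; with a 1500-octet path it gives 1452 on IPv6 and 1472 on IPv4.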

On IPv4, it's terrible because the IPv4 minimum MTU is 68, but most
links support a 1500-octet MTU.

> Second, this proposal assumes that path MTU detection works correctly.
> This is surprisingly optimistic. Let's consider IPv6 - in IPv6 the
> smaller path MTU < 1500 is very common.

Apart from IPv6-over-IPv4 tunnels, most IPv6 links support a
1500-octet MTU.

If we detect a path MTU discovery failure, we can fall back to a
1280-octet MTU on IPv6.

> Let's say a DNS auth server sent an IPv6 DNS response packet exceeding
> path MTU. An intermediate router will drop the offending packet and
> one of three scenarios will happen:

  Leaf client ----Tunnel-----[Tunnel Router]-----Routers----Auth Server
                 MTU 1280                   1500       1500
                             drop ------------------------>
                             ICMP PTB

> - (A) No ICMP PTB message is sent back.
> - (B) ICMP PTB message is sent back, but fails to be delivered.

A and B are the same.
The first response is simply dropped.

The leaf client (a full-service resolver) may retry the query to other
Auth servers over UDP, or retry the query to the same Auth server over
TCP or over UDP with a smaller EDNS size.
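
The retry options can be sketched as a simple list of attempts. This
is only an illustration: the ordering, the 1232-octet value (1280
minus IPv6 and UDP headers) and the function name are assumptions, not
something the draft specifies.

```python
def retry_plan(auth_servers, failed_server, small_edns_size=1232):
    """Yield (server, transport, edns_size) attempts after a timeout.

    edns_size None means "keep the original parameters".
    """
    # Option 1: same query over UDP, but to the other Auth servers.
    for server in auth_servers:
        if server != failed_server:
            yield (server, "udp", None)
    # Option 2: same Auth server over UDP with a smaller EDNS size.
    yield (failed_server, "udp", small_edns_size)
    # Option 3: same Auth server over TCP.
    yield (failed_server, "tcp", None)
```

For example, `list(retry_plan(["ns1", "ns2"], "ns1"))` lists ns2 over
UDP first, then ns1 over UDP with the small size, then ns1 over TCP.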

> - (C) ICMP PTB message is sent back and delivered correctly to the server.

Again, the first response is simply dropped, but the Auth server now
knows that the path MTU to the Leaf client is 1280.

The leaf client (a full-service resolver) may retry the query to other
Auth servers over UDP, or retry the query to the same Auth server over
TCP or over UDP with a smaller EDNS size. (The same as A and B.)

After some time, if the leaf node sends its next queries to the Auth
server over UDP with the same parameters, the Auth server already
knows the path MTU to the leaf.  The Auth server then needs to compose
response packets that fit within the path MTU, or set TC=1.
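
A minimal Python sketch of the "fit in the path MTU, or set TC=1" step
follows. It works on the DNS wire format directly and, as a
simplification, drops every record after the question section
(including any EDNS OPT record, which a real server would probably
keep). The function names are hypothetical.

```python
import struct

TC_BIT = 0x0200  # TC flag inside the 16-bit DNS header flags field


def skip_name(msg, offset):
    """Return the offset just past the domain name starting at `offset`."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1           # root label ends the name
        if length & 0xC0 == 0xC0:
            return offset + 2           # compression pointer ends the name
        offset += 1 + length


def fit_or_truncate(response, max_payload):
    """Return `response` if it fits, else header + question with TC=1."""
    if len(response) <= max_payload:
        return response
    ident, flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", response[:12])
    end = 12
    for _ in range(qdcount):
        end = skip_name(response, end) + 4   # QTYPE and QCLASS
    header = struct.pack("!6H", ident, flags | TC_BIT, qdcount, 0, 0, 0)
    return header + response[12:end]
```

Here `max_payload` would come from the known path MTU minus the IP and
UDP headers, capped by the requestor's EDNS buffer size.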

> All three scenarios are disastrous on the practical internet. The
> proposal assumes (A) and (B) will rarely happen, and puts the
> responsibility on the DNS client to retry over TCP. This will cause
> unnecessary timeouts and degrade the overall quality of the service.

To avoid these cases, we can make new recommendations.  Authoritative
servers and full-service resolvers SHOULD support a 1500-octet path MTU
to major parts of the Internet.
(We need to define "major parts of the Internet".)

> In this proposal all three (A), (B), and (C) scenarios will result in
> dropped responses. DNS client needs to wait for timeout, retry over
> UDP, wait more and eventually retry over TCP. This is bad.

By "DNS client", do you mean stub resolvers on end hosts?
Stub resolvers may not set the DO (DNSSEC OK) bit, so responses can be
small.

Or do you mean full-service resolvers? Full-service resolvers should be
located in the 1500-octet MTU world.

> We could fix (C) by making the DNS server to capture the ICMP PTB in
> DNS server code. The ICMP payload often has enough context for the DNS
> server to prepare another reply. This reply of course should be sent
> with lowered MTU.

I think that applications can learn the ICMP PTB result through the
IP_MTU socket option on Linux.

> On Linux it is possible to capture the ICMP PTB without privileges, by
> setting IP_RECVERR and inspecting MSG_ERRQUEUE. In IPv4 the PTB
> messages often have 520 bytes of payload and in IPv6 1184 bytes. This
> is enough context to build another response, without having to wait
> for any timeout.

In my opinion, the captured PTB data gives similar information to the
IP_MTU/IPV6_MTU socket options.
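
For comparison, a Python sketch of both approaches on Linux: reading
the kernel's cached path MTU with IP_MTU/IPV6_MTU (only valid on a
connected socket), and reading the MTU out of a queued PTB error via
MSG_ERRQUEUE (which requires IP_RECVERR/IPV6_RECVERR to have been
enabled on the socket). The numeric fallbacks are Linux header values
and the function names are hypothetical; this is a sketch, not tested
against a real PTB.

```python
import errno
import socket
import struct

# Linux values, used only when the socket module lacks the name.
IP_MTU = getattr(socket, "IP_MTU", 14)
IPV6_MTU = getattr(socket, "IPV6_MTU", 24)
IP_RECVERR = getattr(socket, "IP_RECVERR", 11)
IPV6_RECVERR = getattr(socket, "IPV6_RECVERR", 25)
MSG_ERRQUEUE = getattr(socket, "MSG_ERRQUEUE", 0x2000)

# struct sock_extended_err from <linux/errqueue.h>:
#   u32 ee_errno; u8 ee_origin, ee_type, ee_code, ee_pad; u32 ee_info, ee_data
SOCK_EXTENDED_ERR = struct.Struct("=IBBBBII")
SO_EE_ORIGIN_ICMP = 2
SO_EE_ORIGIN_ICMP6 = 3


def known_path_mtu(sock, ipv6=False):
    """Return the kernel's current path MTU for a connected socket."""
    if ipv6:
        return sock.getsockopt(socket.IPPROTO_IPV6, IPV6_MTU)
    return sock.getsockopt(socket.IPPROTO_IP, IP_MTU)


def mtu_from_extended_err(cmsg_data):
    """Return the MTU carried by a PTB error, or None for other errors."""
    ee_errno, origin, _type, _code, _pad, info, _data = (
        SOCK_EXTENDED_ERR.unpack_from(cmsg_data)
    )
    if ee_errno == errno.EMSGSIZE and origin in (
        SO_EE_ORIGIN_ICMP, SO_EE_ORIGIN_ICMP6
    ):
        return info  # ee_info holds the reported MTU for EMSGSIZE
    return None


def read_ptb_mtu(sock):
    """Read one entry from the socket error queue; None if no PTB is found."""
    try:
        _data, ancdata, _flags, _addr = sock.recvmsg(2048, 512, MSG_ERRQUEUE)
    except BlockingIOError:
        return None  # error queue is empty
    for level, ctype, cdata in ancdata:
        if (level, ctype) in (
            (socket.IPPROTO_IP, IP_RECVERR),
            (socket.IPPROTO_IPV6, IPV6_RECVERR),
        ):
            mtu = mtu_from_extended_err(cdata)
            if mtu is not None:
                return mtu
    return None
```

The error-queue read also returns the offending packet's payload,
which is what would let a server rebuild the response immediately; the
IP_MTU/IPV6_MTU option only gives the MTU value.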

Kazunori Fujiwara, JPRS <>