Re: [DNSOP] I-D Action: draft-ietf-dnsop-avoid-fragmentation-00.txt

fujiwara@jprs.co.jp Thu, 09 July 2020 08:28 UTC

Date: Thu, 09 Jul 2020 17:28:35 +0900
Message-Id: <20200709.172835.1485118047513250578.fujiwara@jprs.co.jp>
To: majek04@gmail.com
Cc: paul@redbarn.org, dnsop@ietf.org
From: fujiwara@jprs.co.jp
In-Reply-To: <CABzX+qw11H1JSWT6_EcVirT1LNd9Sxqm4zEyjSrDEqc3j2Cgbg@mail.gmail.com>
References: <159351340969.9763.13693079622434674195@ietfa.amsl.com> <20200708.170123.2054449579631699570.fujiwara@jprs.co.jp> <CABzX+qw11H1JSWT6_EcVirT1LNd9Sxqm4zEyjSrDEqc3j2Cgbg@mail.gmail.com>
Mime-Version: 1.0
Content-Type: Text/Plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/QhkDVURzW4YYnnTXKSva3JGQHEE>
Subject: Re: [DNSOP] I-D Action: draft-ietf-dnsop-avoid-fragmentation-00.txt
Precedence: list

> From: Marek Majkowski <majek04@gmail.com>
>> UDP requestors and responders SHOULD send DNS responses with
>> IP_DONTFRAG / IPV6_DONTFRAG [RFC3542] options, which will yield
>> either a silent timeout, or a network (ICMP) error, if the path
>> MTU is exceeded.
> 
> When MTU is exceeded the sender might also receive plain old EMSGSIZE
> error on sendto(). I would love to see an example on what
> IP_MTU_DISCOVER settings authors expect. This option is notoriously
> hard to get right.

Is IP_MTU_DISCOVER a Linux-only option?

The authors do not intend to discourage Path MTU discovery.
We refer to RFC 1191, RFC 8201 and ietf-tsvwg-datagram-plpmtud.
If Path MTU discovery works well, we can use a large UDP datagram size.
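
Since an example of IP_MTU_DISCOVER settings was requested, here is a
minimal sketch of one possible setting on Linux. It is not text from
the draft. It assumes the Linux ip(7) semantics of IP_PMTUDISC_DO
(always set DF, and reject datagrams larger than the known path MTU
with EMSGSIZE). The numeric fallbacks are the Linux header values, and
the helper names are hypothetical.

```python
import errno
import socket

# Linux values from <linux/in.h>, used as fallbacks in case this Python
# build does not export the names (assumption: the code runs on Linux).
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)


def open_dont_fragment_udp_socket() -> socket.socket:
    """Return an IPv4 UDP socket that sets DF and is never fragmented locally."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    return sock


def send_or_report_too_big(sock: socket.socket, payload: bytes, addr) -> bool:
    """Send one datagram; return False if the kernel refused it with EMSGSIZE."""
    try:
        sock.sendto(payload, addr)
        return True
    except OSError as exc:
        if exc.errno == errno.EMSGSIZE:
            # The "plain old EMSGSIZE on sendto()" case: the caller should
            # build a smaller response or set TC=1 instead.
            return False
        raise
```

A server using this sketch would treat a False return as a signal to
shrink or truncate the response rather than as a fatal error.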

>> Fragmented DNS/UDP messages may be dropped without IP reassembly
> 
> Not sure what it has to do with the draft. Are we worried about
> request fragmentation and allowing the DNS server to drop fragmented
> requests? Are we worried about response fragmentation?

I would like to change the text to:

  The DNS Requestor MAY drop Fragmented DNS/UDP responses without IP
  reassembly. (before IP reassembly?)

  (The text related to ICMP errors may be dropped, I think.)

> I have two problems with this proposal. First, it doesn't mention IPv4
> vs IPv6 differences at all. In IPv4 landscape fragmentation, while a
> security issue, is generally fine. In the IPv6 world, fragmentation is
> disastrous - packets with extension headers are known to be dropped.

On IPv6, "every link in the Internet have an MTU of 1280 octets or
greater" (RFC 8200).
Then, if Path MTU discovery works, we can use the real MTU value.
Otherwise, we can use 1280 as the MTU.
We can easily avoid IPv6 fragmentation. (Fragmented IPv6 packets are bogus.)

On IPv4, the situation is worse because the IPv4 minimum MTU is 68
octets, but most links support a 1500-octet MTU.
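
To make the MTU figures above concrete, the following sketch computes
the largest DNS/UDP payload that fits in one unfragmented packet. It
assumes a 20-octet IPv4 header without options and a 40-octet IPv6
header without extension headers. The function name is hypothetical.

```python
IPV4_HEADER = 20  # octets, assuming no IP options
IPV6_HEADER = 40  # octets, assuming no extension headers
UDP_HEADER = 8    # octets


def max_dns_udp_payload(path_mtu: int, ipv6: bool) -> int:
    """Largest UDP payload that fits in a single packet of size path_mtu."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    payload = path_mtu - ip_header - UDP_HEADER
    if payload < 0:
        raise ValueError("path MTU is smaller than the IP and UDP headers")
    return payload


print(max_dns_udp_payload(1280, ipv6=True))   # 1232
print(max_dns_udp_payload(1500, ipv6=False))  # 1472
```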

> Second, this proposal assumes that path MTU detection works correctly.
> This is surprisingly optimistic. Let's consider IPv6 - in IPv6 the
> smaller path MTU < 1500 is very common.

Apart from IPv6-over-IPv4 tunnels, most IPv6 links support a
1500-octet MTU.

We can detect path MTU discovery failure and then fall back to a
1280-octet MTU on IPv6.

> Let's say a DNS auth server sent an IPv6 DNS response packet exceeding
> path MTU. An intermediate router will drop the offending packet and
> one of three scenarios will happen:

  Leaf client ----Tunnel-----[Tunnel Router]-----Routers----Auth Server
                 MTU 1280                   1500       1500
                             drop ------------------------>
                             ICMP PTB

> - (A) No ICMP PTB message is sent back.
> - (B) ICMP PTB message is sent back, but fails to be delivered.

A and B are the same.
The first response is simply dropped.

The leaf client (a full-service resolver) may retry the query against
other Auth servers over UDP, or retry against the same Auth server
over TCP or over UDP with a smaller EDNS size.

> - (C) ICMP PTB message is sent back and delivered correctly to the server.

First, the first response is simply dropped.
The Auth server then learns that the path MTU to the leaf client is 1280.

The leaf client (a full-service resolver) may retry the query against
other Auth servers over UDP, or retry against the same Auth server
over TCP or over UDP with a smaller EDNS size. (The same as A and B.)

After some time, if the leaf node sends further queries to the Auth
server over UDP with the same parameters, the Auth server already
knows the path MTU to the leaf.  The Auth server then needs to compose
response packets that fit within the path MTU, or set TC=1.
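
The "fit in the path MTU, or set TC=1" step could look roughly like
the sketch below. It is an illustration only, not the draft's
algorithm: it keeps the header and question section, zeroes the other
section counts, and sets the TC bit. A real server would normally also
keep the EDNS OPT record, which this sketch drops. The function names
are hypothetical.

```python
import struct

TC_FLAG = 0x0200      # TC bit within the 16-bit DNS header flags field
DNS_HEADER_LEN = 12   # fixed DNS header length in octets


def question_section_end(message: bytes) -> int:
    """Return the offset just past the question section."""
    qdcount = struct.unpack_from("!H", message, 4)[0]
    offset = DNS_HEADER_LEN
    for _ in range(qdcount):
        while True:
            if offset >= len(message):
                raise ValueError("question section runs past end of message")
            length = message[offset]
            if length & 0xC0:
                raise ValueError("unexpected compressed label in question")
            offset += 1 + length
            if length == 0:  # root label terminates the name
                break
        offset += 4  # QTYPE and QCLASS
    if offset > len(message):
        raise ValueError("question section runs past end of message")
    return offset


def fit_or_truncate(response: bytes, max_payload: int) -> bytes:
    """Return response unchanged if it fits, else a TC=1 reply with no records."""
    if len(response) <= max_payload:
        return response
    end = question_section_end(response)
    msg_id, flags, qdcount = struct.unpack_from("!HHH", response, 0)
    header = struct.pack("!HHHHHH", msg_id, flags | TC_FLAG, qdcount, 0, 0, 0)
    return header + response[DNS_HEADER_LEN:end]
```

Here max_payload would be the UDP payload limit derived from the known
path MTU, for example 1232 octets for a 1280-octet IPv6 path.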

> All three scenarios are disastrous on the practical internet. The
> proposal assumes (A) and (B) will rarely happen, and puts the
> responsibility on the DNS client to retry over TCP. This will cause
> unnecessary timeouts and degrade the overall quality of the service.

To avoid these cases, we can make new recommendations.  Authoritative
servers and full-service resolvers SHOULD support a 1500-octet path MTU
to major parts of the Internet.
(We need to define "major parts of the Internet".)

> In this proposal all three (A), (B), and (C) scenarios will result in
> dropped responses. DNS client needs to wait for timeout, retry over
> UDP, wait more and eventually retry over TCP. This is bad.

By "DNS client", do you mean stub resolvers on end hosts?
Stub resolvers may not set the DO (DNSSEC OK) bit, so responses can be
small.

Or do you mean full-service resolvers? Full-service resolvers should
be located in the 1500-octet MTU world.

> We could fix (C) by making the DNS server to capture the ICMP PTB in
> DNS server code. The ICMP payload often has enough context for the DNS
> server to prepare another reply. This reply of course should be sent
> with lowered MTU.

I think that applications can learn the ICMP PTB result through the
IP_MTU socket option on Linux.

> On Linux it is possible to capture the ICMP PTB without privileges, by
> setting IP_RECVERR and inspecting MSG_ERRQUEUE. In IPv4 the PTB
> messages often have 520 bytes of payload and in IPv6 1184 bytes. This
> is enough context to build another response, without having to wait
> for any timeout.

In my opinion, the captured PTB data provides information similar to
the IP_MTU/IPV6_MTU socket options.
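
For comparison, here is a rough sketch of both approaches on Linux. It
assumes the ip(7) behaviour that IP_MTU is valid only on a connected
socket and that, for EMSGSIZE errors on the error queue, the ee_info
field of struct sock_extended_err carries the discovered MTU. The
numeric constants are Linux header values and the function names are
hypothetical.

```python
import errno
import socket
import struct

IP_MTU = getattr(socket, "IP_MTU", 14)  # Linux value, used as a fallback
SO_EE_ORIGIN_ICMP = 2                   # from <linux/errqueue.h>
SO_EE_ORIGIN_ICMP6 = 3

# struct sock_extended_err: ee_errno, ee_origin, ee_type, ee_code,
# ee_pad, ee_info, ee_data
_SOCK_EXTENDED_ERR = struct.Struct("=IBBBBII")


def mtu_from_extended_err(cmsg_data: bytes):
    """Return the MTU reported by an ICMP PTB error-queue entry, or None."""
    if len(cmsg_data) < _SOCK_EXTENDED_ERR.size:
        return None
    ee_errno, ee_origin, _type, _code, _pad, ee_info, _data = (
        _SOCK_EXTENDED_ERR.unpack_from(cmsg_data)
    )
    if ee_errno == errno.EMSGSIZE and ee_origin in (
        SO_EE_ORIGIN_ICMP,
        SO_EE_ORIGIN_ICMP6,
    ):
        return ee_info
    return None


def cached_path_mtu(connected_sock: socket.socket) -> int:
    """Ask the kernel for the path MTU it currently knows for this socket."""
    return connected_sock.getsockopt(socket.IPPROTO_IP, IP_MTU)
```

The cmsg_data argument is meant to be the ancillary data of an
IP_RECVERR control message obtained with recvmsg() and MSG_ERRQUEUE.
The error-queue path also returns the offending packet, which IP_MTU
alone does not.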

--
Kazunori Fujiwara, JPRS <fujiwara@jprs.co.jp>
