Re: [DNSOP] Call for Adoption: draft-fujiwara-dnsop-avoid-fragmentation

Paul Vixie <paul@redbarn.org> Wed, 15 April 2020 01:23 UTC

From: Paul Vixie <paul@redbarn.org>
To: dnsop <dnsop@ietf.org>
Date: Wed, 15 Apr 2020 01:23:14 +0000
Message-ID: <2282970.8yTDVETrLv@linux-9daj>
Organization: none
In-Reply-To: <20200414234146.GA471121@jurassic.vpn.mukund.org>
References: <CADyWQ+GECV6aaeKxp-ObgsK0Ax3KN_5hAaYgmXQhssJ1A00Ttw@mail.gmail.com> <20200414215855.GA464850@jurassic.vpn.mukund.org> <20200414234146.GA471121@jurassic.vpn.mukund.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/hmeHHOhY3aezqxRaVadGZN_0URw>

first reply:

On Tuesday, 14 April 2020 23:41:46 UTC Mukund Sivaraman wrote:
> One more question:
> > 3.  Proposal to avoid IP fragmentation in DNS
> > 
> >    o  UDP requestors and responders SHOULD send DNS responses with
> >       IP_DONTFRAG / IPV6_DONTFRAG [RFC3542] options, which will yield
> >       either a silent timeout, or a network (ICMP) error, if the path
> >       MTU is exceeded.  Upon a timeout, UDP requestors may retry using
> >       TCP or UDP, per local policy.
> 
> If the IP_DONTFRAG/IP_DF/IP_PMTUDISC_DO option is available and can be
> used to set the DF flag on DNS over UDP over IPv4 PDUs, why are any of
> the following maximum-size mitigations (the next 3 items after the above
> quoted item) necessary?

to avoid fragmentation-related loss. all that setting the DONTFRAG bit does
is increase the likelihood that fragmentation-related loss will occur, cause
that loss to occur as close to the responder as possible (thus not wasting
network capacity on things that won't be delivered), and possibly incite an
ICMP error that, if sent and if received, will help with manageability.

but while it's important to increase the likelihood of fragmentation-related 
loss and to try to make it happen nearby, it is separately virtuous to avoid 
triggering this loss. the title of the document is "fragmentation avoidance" 
after all. if we intend to avoid fragmentation, we should bet our life on it, 
or at least, bet our response packet's deliverability on it.
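
A minimal sketch of what setting the DF bit looks like from a responder's point of view, assuming a Linux host and IPv4. The function name `open_udp_socket_with_df` is hypothetical. The numeric fallbacks (10 and 2) are the Linux values of `IP_MTU_DISCOVER` and `IP_PMTUDISC_DO`, used only because some Python versions do not export those names; BSD-derived systems use `IP_DONTFRAG` instead, which this sketch does not cover.

```python
import socket
import sys

# Linux socket option values; used as a fallback when the socket module
# does not export these names (assumption: the host is Linux).
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)


def open_udp_socket_with_df():
    """Open an IPv4 UDP socket and try to force DF=1 on outgoing datagrams.

    Returns (sock, df_set). df_set is False when the platform is not Linux
    or the kernel rejects the option; the socket is still usable.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    df_set = False
    if sys.platform.startswith("linux"):
        try:
            sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
            df_set = True
        except OSError:
            pass  # option not supported here; leave the default behaviour
    return sock, df_set
```

With the option set, a datagram that is too large for some hop is dropped rather than fragmented, which is the loss described above. Setting DF does nothing to make the response fit; that is what the size limits are for.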

> A large response may be dropped on path, but such conditions are already
> expected by resolvers; they make use of the OPT UDP payload size field
> to retry to settle on a maximum size that will work with the peer.

the requestor has no more knowledge of PMTU than the responder does. both
might be on 9K MTU LANs (ethernet jumbo frames) but connected via a whole lot
of 4K MTU WANs (packet over SONET). this document seeks to create the
conditions under which per-path MTU knowledge can be used, but without
requiring it.

> It would be helpful to the reader to understand better, if the
> requirements of "SHOULD" are annotated with the specific reasons why
> that particular item is suggested. The 3 items after the above quoted
> item don't specify reasons, and so it's not clear how with DF=1 set,
> these items matter.

please propose draft text to that effect?

> 
> Off-topic, [...] A resolver
> retries with changes in how it makes its request.

not any more. as you yourself said, the (so-called) DNS-OARC Flag Day of 2019 
removed any duty to retry with changes, and just lets failures be failures.

> IP fragmentation
> issues is a subset of this topic of how to communicate with a DNS peer
> over UDP. You may want to consider documenting the techniques a client
> could use when talking to a peer over UDP.

that's not off-topic. this document is what you're asking for here: set DF, 
pick a defensible response size that takes the offered buffer size as well as 
the local network MTU and any extra knowledge of PMTU into account, and send.
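
The size selection described above can be sketched as a small pure function. This is an illustration, not text from the draft: the name `max_udp_response_size` and its parameters are hypothetical, and the header sizes assume IPv4 without options or IPv6 without extension headers. The 512-octet floor follows RFC 6891, which says offered sizes below 512 are treated as 512.

```python
IPV4_HEADER = 20  # octets, assuming no IP options
IPV6_HEADER = 40  # octets, assuming no extension headers
UDP_HEADER = 8    # octets


def max_udp_response_size(offered_bufsize, local_mtu, path_mtu=None, ipv6=False):
    """Largest DNS message the responder should send over UDP.

    offered_bufsize: EDNS UDP payload size from the request's OPT record.
    local_mtu:       MTU of the outgoing interface.
    path_mtu:        extra knowledge of the path MTU, if any (else None).
    """
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    mtu = local_mtu if path_mtu is None else min(local_mtu, path_mtu)
    # Room left for the DNS message once IP and UDP headers are accounted for.
    payload_limit = mtu - ip_header - UDP_HEADER
    # RFC 6891: offered sizes below 512 are treated as 512.
    offered = max(offered_bufsize, 512)
    return min(offered, payload_limit)


print(max_udp_response_size(4096, 1500))             # 1472
print(max_udp_response_size(4096, 1500, ipv6=True))  # 1452
print(max_udp_response_size(1232, 1500))             # 1232
```

A response that does not fit within the returned size would presumably be sent truncated (TC=1) so that the requestor can retry over TCP, rather than sent as a datagram that DF=1 will cause to be dropped.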

> More generally, a modern day resolver algorithm includes nameserver
> selection (SRTT-derived metric, bad peers, transport, etc.), limits on
> recursion and indirection, response message processing, etc. These have
> security implications and documentation is scant, of low-quality, and
> scattered. Documentation for understanding of resolver<->NS
> communication behavior is an often-requested item from users.

on this i agree twice: that it's off topic, and that it's vital information. 
every full resolver has a holddown timer for lame delegations, but only 
because it's a good idea, not because any RFC describes or recommends it.

---

second reply:

On Wednesday, 15 April 2020 00:41:24 UTC Mukund Sivaraman wrote:
> On Wed, Apr 15, 2020 at 05:11:46AM +0530, Mukund Sivaraman wrote:
> > > 3.  Proposal to avoid IP fragmentation in DNS
> > >    o  UDP requestors and responders SHOULD send DNS responses with
> > >       IP_DONTFRAG / IPV6_DONTFRAG [RFC3542] options, which will yield
> > >       either a silent timeout, or a network (ICMP) error, if the path
> > >       MTU is exceeded.  Upon a timeout, UDP requestors may retry using
> > >       TCP or UDP, per local policy.
> > 
> > If the IP_DONTFRAG/IP_DF/IP_PMTUDISC_DO option is available and can be
> > used to set the DF flag on DNS over UDP over IPv4 PDUs, why are any of
> > the following maximum-size mitigations (the next 3 items after the above
> > quoted item) necessary?
> 
> Possibly this is client-driven mitigation, right? A client which does
> not know if the server will set DF=1 can still avoid fragmentation of
> the reply by using a smaller EDNS UDP payload size.

yes, that result could be obtained. but it's not directly intended. rather, 
this document hopes to establish a "reasonableness" threshold on UDP replies, 
and once that threshold becomes the new norm, adaptation such as you describe 
here will be within the realm of experiment. DNS might end up trying a lot of 
different ways to learn what EDNS buffer size will work for any given 
responder. my personal favourite is a binary search among sizes. and since 
modern (IPv6-era) IP stacks allow a host route to be installed to carry the 
path MTU, the PMTU discovered by DNS might then become available to, for 
example, TCP. but all that is stardust at the moment. first we have to nail 
the coffin shut on fragmentation, now that we know PMTUD6 is never coming 
because ICMPv6 was just a thought-experiment that couldn't survive.
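
The binary search among sizes mentioned above could look roughly like this. It is a sketch under stated assumptions: `probe` is a hypothetical callback that sends one query advertising the given EDNS buffer size and returns True if a full-size response arrived, the bounds 512 and 4096 are illustrative defaults, and the search assumes that if one size works then every smaller size also works.

```python
def find_max_working_size(probe, low=512, high=4096):
    """Return the largest size in [low, high] for which probe(size) is True.

    Returns None if even `low` fails. Assumes success is monotonic:
    any size that works implies all smaller sizes work too.
    """
    if not probe(low):
        return None
    best = low
    lo, hi = low + 1, high
    while lo <= hi:
        mid = (lo + hi) // 2
        if probe(mid):
            best = mid       # mid works; look for something larger
            lo = mid + 1
        else:
            hi = mid - 1     # mid was lost; look for something smaller
    return best


# Simulated path on which anything above 1232 octets is lost.
print(find_max_working_size(lambda size: size <= 1232))  # 1232
```

Each probe costs a query and possibly a timeout, so a real resolver would cache the result per responder rather than repeat the search.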

-- 
Paul