Re: [DNSOP] [Ext] I-D Action: draft-ietf-dnsop-serve-stale-03.txt

Dave Lawrence <tale@dd.org> Wed, 06 March 2019 16:16 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/Wwq28RjfDp0vWmD2oNv6jI9a8lY>

Tony Finch writes:
> This sounds like it will lead to stale answers being given instead of
> re-trying other potentially working servers.

The document is explicit that you need to keep trying to get an
answer, so if an implementation is not retrying other potentially
working servers that is its own defect.
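
As a rough sketch of that ordering (not text from the draft, and not
any particular resolver's code), the stale answer is a last resort
that is reached only after every candidate server has been tried.  The
names resolve_with_stale_fallback, query and stale_answer are
hypothetical, and a real resolver would also keep refreshing in the
background:

```python
from typing import Callable, Iterable, Optional


def resolve_with_stale_fallback(
    servers: Iterable[str],
    query: Callable[[str], Optional[str]],
    stale_answer: Optional[str],
) -> Optional[str]:
    """Try every server; use the stale answer only if none gives a fresh one.

    `query` is a hypothetical callable that returns an answer, returns
    None for an unusable response, or raises TimeoutError when the
    server does not respond.
    """
    for server in servers:
        try:
            answer = query(server)
        except TimeoutError:
            continue  # unresponsive server: move on to the next one
        if answer is not None:
            return answer  # a fresh answer always wins over stale data
    # Every server was tried and none produced a usable answer.
    return stale_answer
```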

> I think serve-stale should only cover cases where servers are
> unreachable or unresponsive.

You are of course free to write your own implementation that way.
Having worked for operations where the authorities were concerned
about the possibility of accidental ServFails, I know that their
preference is that resolvers would serve-stale then too and enhance
the overall resiliency of the system.

If you think it would help, I can add some text to Implementation
Considerations about this, something like:

   Consider whether serve-stale should kick in for only the case of
   all servers being unresponsive, or whether authoritative servers
   responding with DNS RCODEs other than NoError and NXDomain also
   trigger it.  Some authoritative server operators would prefer
   stale answers to be used in the event of their server failures,
   while other implementers see any answer from the authoritative
   server as being sufficient indication that any previously available
   answer for the question is superseded.
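
To make the two choices concrete, here is a minimal sketch of the
decision as a policy switch.  It is only an illustration under my own
assumptions: the Trigger enum, the should_serve_stale function and the
representation of outcomes (one RCODE name per server tried, or None
for no response) are all hypothetical and appear nowhere in the draft.

```python
from enum import Enum
from typing import Optional, Sequence


class Trigger(Enum):
    UNRESPONSIVE_ONLY = 1  # stale only when no server responded at all
    ANY_FAILURE = 2        # stale also on RCODEs such as ServFail or Refused


# RCODEs treated as real answers that supersede cached data.
USABLE_RCODES = {"NOERROR", "NXDOMAIN"}


def should_serve_stale(outcomes: Sequence[Optional[str]], trigger: Trigger) -> bool:
    """Decide whether stale data may be used after trying the servers.

    `outcomes` holds one entry per server tried: an RCODE name, or
    None if that server did not respond.
    """
    if any(rcode in USABLE_RCODES for rcode in outcomes):
        return False  # a usable fresh answer exists, so never serve stale
    if trigger is Trigger.ANY_FAILURE:
        return True   # timeouts and failure RCODEs both qualify
    # UNRESPONSIVE_ONLY: any response at all, even a failure, rules out stale.
    return all(rcode is None for rcode in outcomes)
```

Under UNRESPONSIVE_ONLY a single ServFail among timeouts is enough to
suppress the stale answer, while ANY_FAILURE would still serve it.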

The implication of the latter view is that when a zone goes from good
answers to failure answers to being unavailable, the stale answers will
not be there when they otherwise could have been, but so be it.

> If all a zone's servers start to reply REFUSED, that's a deliberate
> decision to disable the zone, and resolvers should not try to keep it
> alive beyond its TTL.

You cannot know that it is a deliberate decision to disable the zone.
In fact, I have direct operational experience of why it's a terrible
way to disable a zone.

One of my own servers was slammed with queries for a zone I was not
authoritative for.  (It was a well-known zone, too, and one which is
not DNSSEC-signed.)  My server was dutifully returning Refused to the
queries, and yet they kept coming very frequently, maxing out the
link.  Arguably the clients should have applied the techniques of RFC
2308 for negative caching of those Refused answers, but it was not
until I added the zone in question to send back an authoritative
answer with a proper caching signal that the queries really went away.

While it is obviously the best approach for a takedown to update the
delegation, in the situation where you have a delegation pointing to
server(s) that cannot be updated but you have control over the
servers, it is far better to provide an affirmative answer to the
question than to send Refused.  Positive caching is much better
understood than negative caching by the wide variety of DNS
implementations out there.