Re: [DNSOP] [Ext] I-D Action: draft-ietf-dnsop-serve-stale-03.txt

Dave Lawrence <> Wed, 06 March 2019 01:45 UTC


Paul Wouters writes:
> I am a bit confused here. The goal of the draft is to keep data past
> the TTL in case you cannot reach the authoritative servers during a
> DDOS attack.

There are many different failure modes in operating the DNS and
the goal of this draft has been to accommodate the ones that are clear
failures.  I, for one, have never put forth that it is only about
resiliency against DDoS and don't recall hearing Warren or Puneet say
that either.  It can also cover other clear errors in the system,
even when self-inflicted by the authoritative operator.

> Misconfiguring your authoritative server by removing the zone is not
> meant to be covered by this draft if I understood it correctly. If it
> is, then introduction will need to add text to cover that use case.

I can sort of see how someone might infer from "It is predicated on
the observation that authoritative server unavailability can cause
outages ..." that it means this whole idea is constrained to DDoS, and
presumably you would include as well other network and server outages
not caused by DDoS.  It doesn't only mean that though.  The intention
is that this applies to any inability to get a proper authoritative
response, one which has AA set in a protocol-meaningful way.

This can be edited to be clearer, perhaps as simply as changing
"authoritative server unavailability" to "authoritative answer
unavailability".  We'd be happy to consider alternative text.

Realistically only rcodes NoError and NXDomain apply for being
authoritative answers, each being an explicit assertion regarding the
name/type in the query and legitimately supplanting whatever previous
data was known about that name and type.

ServFail is a clear signal that something is going wrong with the
authoritative server itself.  If you send a ServFail then AA is
completely irrelevant.

REFUSED is slightly murkier as to its exact meaning, thanks to
overloading, but in its most commonly seen usage, for lameness, it
indicates a clear problem with the delegation.  Even in its other use
cases, notably an EDNS Client Subnet error or an actual "I am
authoritative for the name but administratively denying your
resolution of it", I submit that if the resolver has a stale answer
then serving it is reasonable.  In that administrative denial case
it'd be better to issue NXDomain anyway, which is exactly what split
horizon authorities do.

Other lesser seen rcodes are largely similar in not indicating
anything at all about the legitimacy of the name and whatever data you
might have previously associated with it.  Only the dynamic update
rcodes come close to being relevant, but they are not part of the
resolution process covered by serve-stale.

Despite the unfortunate RFC 1035 nomenclature of NXDomain as "Name
Error" it is called out explicitly because it isn't really an error,
not in the database lookup sense.  There's no way of knowing whether
the NXDomain is happening because of operator fault or the far more
likely case that it just doesn't exist.  That's why it is called out
separately in the doc, with an explicit note about why it has to be
treated as replacing any stale data associated with the name.
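
To make the rcode classification above concrete, here is a rough sketch
in Python.  It is only an illustration of the reasoning in this message,
not text or an algorithm from the draft; the function name, the string
rcodes, and the use of None for "no response at all" are assumptions
made up for the example.  It also assumes referrals have already been
followed, so a non-AA NoError is simply treated as not being an
authoritative answer.

```python
# Hypothetical sketch; names and representation are not from the draft.

# Only these rcodes, with AA set, are explicit assertions about the
# name/type and so supplant previously known data.
AUTHORITATIVE_RCODES = {"NOERROR", "NXDOMAIN"}


def may_serve_stale(rcode, aa, have_stale):
    """Return True if a resolver could answer from stale cache data
    rather than relay the result of this refresh attempt.

    rcode      -- upstream rcode as a string, or None for a timeout
    aa         -- whether the AA bit was set in the upstream response
    have_stale -- whether an expired cache entry is still held
    """
    if rcode is None:
        # No response at all (unreachable server, timeout).
        return have_stale
    if rcode.upper() in AUTHORITATIVE_RCODES and aa:
        # A proper authoritative answer replaces stale data, including
        # NXDomain, whose cause cannot be known to the resolver.
        return False
    # ServFail, REFUSED, and the other rcodes say nothing about the
    # legitimacy of the name, so stale data remains usable.
    return have_stale
```

With this sketch, may_serve_stale("SERVFAIL", False, True) is True,
while may_serve_stale("NXDOMAIN", True, True) is False.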