Re: [DNSOP] What should ANAME-aware servers do when target records are verifiably missing?

Richard Gibson <richard.j.gibson@oracle.com> Fri, 12 April 2019 00:02 UTC

To: Matthew Pounsett <matt@conundrum.com>
Cc: Bob Harold <rharolde@umich.edu>, dnsop <dnsop@ietf.org>
References: <d8ccad4a-cd0c-4c97-b4d7-2099657351dc@oracle.com> <CA+nkc8BM+mfTBm3XyOaZUF5hMg23t9aSY4nq4Y4=BQ-sjcjkVg@mail.gmail.com> <25b38d21-c572-d782-6b35-a187fa0caae8@oracle.com> <CAAiTEH9Eg0oYw9HR9Ab5pYikFUvcbWXneF39_8xasp6tE9PpCA@mail.gmail.com>
From: Richard Gibson <richard.j.gibson@oracle.com>
Message-ID: <516fda75-bb6e-67c6-cd52-0a5017bc889f@oracle.com>
Date: Thu, 11 Apr 2019 20:02:26 -0400
In-Reply-To: <CAAiTEH9Eg0oYw9HR9Ab5pYikFUvcbWXneF39_8xasp6tE9PpCA@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/V0RfLg9UIKpdTkB4eJv7UqwhhEo>

Responses inline.

On 4/11/19 18:50, Matthew Pounsett wrote:
> On Wed, 10 Apr 2019 at 16:43, Richard Gibson
> <richard.j.gibson@oracle.com> wrote:
>> The first problem is for the owner of the ANAME-containing zone, for whom the upstream misconfiguration will result in downtime and be extended by caching to the MINIMUM value from their SOA, which in many cases will be one to three orders of magnitude greater than the TTL of the ANAME itself.
> I think I'm missing something here.  If, for example, the TTL of the
> ANAME is 1 hour, what mechanism results in caching holding onto a
> negative answer for a broken target name for a minimum of 10 hours,
> and to 40 days?
Demonstrative example zone:

example.com.  3600  IN    SOA  ns.example.net. hostmaster.example.net. 1 (
                                   7200   ; REFRESH
                                   3600   ; RETRY
                                  86400   ; EXPIRE
                                   3600  ); MINIMUM
example.com.    60  IN  ANAME  example.invalid.
example.com.    60  IN      A  192.0.2.1

When an ANAME-aware resolver queries an ANAME-aware authoritative server
for example.com. A, it will receive the A record in the answer section
and the ANAME in the additional section. If it then chases the ANAME
target to an NXDOMAIN and accepts that as justification for replacing
the sibling A RRSet with nothing, as currently specified in the draft,
then the appropriate response will be a Type 2 NODATA in which the
answer section is empty and the authority section contains the SOA. But
this suffers from both of the problems I have been complaining about: the
resolver does not necessarily /have/ the zone SOA, possibly
necessitating an inline lookup, and per RFC 2308 the negative response
will be cached according to values from the SOA that are unrelated to
and far exceed the TTL of the ANAME.
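
As a rough illustration of the arithmetic for the example zone above, the
following sketch assumes the RFC 2308 section 5 rule that a negative answer
is cached for the lesser of the SOA record's own TTL and its MINIMUM field.
The names Soa and negative_cache_ttl are hypothetical and not taken from
the draft or from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Soa:
    """Only the two SOA values that matter for negative caching."""
    ttl: int      # TTL of the SOA record itself, in seconds
    minimum: int  # MINIMUM field of the SOA RDATA, in seconds


def negative_cache_ttl(soa: Soa) -> int:
    """Negative-caching TTL per RFC 2308 section 5 (assumed rule):
    the lesser of the SOA record's TTL and its MINIMUM field."""
    return min(soa.ttl, soa.minimum)


# Values from the demonstrative zone: SOA TTL 3600, MINIMUM 3600, ANAME TTL 60.
example_soa = Soa(ttl=3600, minimum=3600)
aname_ttl = 60

neg_ttl = negative_cache_ttl(example_soa)
print(f"negative answer cached for {neg_ttl}s, "
      f"{neg_ttl // aname_ttl} times the ANAME TTL of {aname_ttl}s")
```

With these values the empty answer can be cached for an hour even though
the ANAME and its sibling A record each carry a 60-second TTL.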

>> Both of these problems can be addressed by allowing/recommending/requiring ANAME-aware servers to preserve ANAME siblings when resolution of ANAME targets results in NXDOMAIN or NODATA responses, rather than replacing them with an empty RRSet... which, to be honest, seems to be always-undesirable behavior anyway—if anyone can think of a scenario where it would be beneficial to dynamically remove ANAME siblings, please share it.
> I feel like this is creating an even bigger potential problem.  What
> happens when the owner of the ANAME target legitimately wants that
> name to go away, but some other zone owner is leaving an ANAME in
> place pointing to that now-nonexistent name?  Continuing to serve the
> sibling records indefinitely seems like serve-stale gone horribly
> wrong.

In such a configuration, the owner of the ANAME will be able to see that 
clients are using its static sibling records rather than its target (and 
therefore that they are getting no benefit from the ANAME), and can 
react accordingly. If your concern is excess queries for the ANAME 
target, then this seems no different from e.g. CNAME—the owner of the 
target can issue long-lived negative responses while performing whatever 
other exploration and/or mitigation they deem fit.

But this seems like it will be much rarer and frankly much less of a
problem than stretching out misconfiguration at an ANAME target into
extended downtime for an ANAME owner. It must be possible for the latter
to execute a recovery plan as quickly as possible, and if ANAME is
specified well then the first step of recovery can be literally
instant and automatic.
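
A minimal sketch of the sibling-preserving behavior argued for here,
assuming a server that already knows the outcome of resolving the ANAME
target. Outcome, Record, and choose_address_records are hypothetical names
for illustration only; the draft does not define them, and details such as
TTL adjustment are left out.

```python
from enum import Enum
from typing import List, Tuple


class Outcome(Enum):
    """Result of resolving the ANAME target (illustrative, not exhaustive)."""
    ANSWER = "answer"
    NXDOMAIN = "nxdomain"
    NODATA = "nodata"


Record = Tuple[str, int]  # (address, TTL in seconds)


def choose_address_records(
    siblings: List[Record],
    outcome: Outcome,
    target_records: List[Record],
) -> List[Record]:
    """Return the address records to serve at the ANAME owner name.

    Target records replace the siblings only when the target actually
    resolved to addresses. On NXDOMAIN or NODATA the static siblings are
    kept instead of being replaced with an empty RRSet.
    """
    if outcome is Outcome.ANSWER and target_records:
        return list(target_records)
    return list(siblings)


# The sibling A record from the demonstrative zone.
siblings = [("192.0.2.1", 60)]

# example.invalid. does not exist, so the sibling survives.
print(choose_address_records(siblings, Outcome.NXDOMAIN, []))

# A healthy target would replace it (198.51.100.7 is a made-up address).
print(choose_address_records(siblings, Outcome.ANSWER, [("198.51.100.7", 30)]))
```

Under this sketch the fallback takes effect as soon as the target goes
missing, without any negative response being cached for the owner name.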
