Re: [DNSOP] Fwd: I-D Action: draft-pappas-dnsop-long-ttl-04.txt

John Kristoff <jtk@cymru.com> Mon, 14 May 2012 22:29 UTC

Date: Mon, 14 May 2012 17:29:46 -0500
From: John Kristoff <jtk@cymru.com>
To: dnsop@ietf.org
Message-ID: <20120514172946.48f4ed01@localhost>
In-Reply-To: <ED92824E-550C-4E76-B7B7-F010613326A2@verisign.com>
References: <20120223155730.20754.45643.idtracker@ietfa.amsl.com> <ED92824E-550C-4E76-B7B7-F010613326A2@verisign.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [DNSOP] Fwd: I-D Action: draft-pappas-dnsop-long-ttl-04.txt
Precedence: list

On Fri, 2 Mar 2012 09:56:40 -0800
Eric Osterweil <eosterweil@verisign.com> wrote:

> We have resurrected our draft Improving DNS Service Availability by
> Using Long TTL Values, and added some new polish to it.  We've taken
> some feedback from various people and would love to hear any thoughts
> other people have.

Hi Eric,

I remember this draft from years ago so I'll resurrect and update my
comments on the subject as well.

I'm not convinced these long TTLs are for everyone.  Of course I'm not
opposed to them, but from an operational perspective I would need to
be convinced this is worthwhile.  I think ultimately it depends on
the operator.  There doesn't seem to be much critical analysis of why
this might not be desirable.  I'll try to offer some.

Practically speaking, while caching to avoid transient issues may be
highly desirable, you may find that many operators would prefer to
build their network to get as many queries as possible even if that
leaves that part of the hierarchy at risk when transient issues arise.
Queries, even (and often especially) unnecessary ones, have intrinsic
intelligence value.

In the Introduction there is this sentence:

   Furthermore, the use of shared unicast introduces one entry in the
   global BGP routing for every shared unicast enabled server.

This is misleading.  There may be many unique announcements from each
anycast instance, but each router's global table will use only one
as long as the prefix is consistent.
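
To illustrate the point about one table entry per prefix, here is a
minimal sketch.  It is not a BGP implementation: the function name, the
documentation prefix, the private-use AS numbers, and the
shortest-AS-path tie-break are all assumptions made for the example.

```python
# Hypothetical model of a single router's table, keyed by prefix.
# Real BGP best-path selection has many more steps; shortest AS path
# is used here only as a simplification.
def install_routes(announcements):
    """Return a table holding one best AS path per announced prefix."""
    table = {}
    for prefix, as_path in announcements:
        best = table.get(prefix)
        if best is None or len(as_path) < len(best):
            table[prefix] = as_path
    return table


# Three anycast instances announce the same prefix via different paths.
announcements = [
    ("192.0.2.0/24", [64500, 64510, 64520]),
    ("192.0.2.0/24", [64501, 64520]),
    ("192.0.2.0/24", [64502, 64511, 64512, 64520]),
]

table = install_routes(announcements)
print(len(table))             # number of table entries for the prefix
print(table["192.0.2.0/24"])  # the single path this router selected
```

However many instances announce it, the table in this model holds one
entry for the prefix; only the selected path differs from router to
router.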

Does the deployment of DNSSEC change any of the underlying assumptions?

The references used in section two are now quite old.  Updated
measurement results to ensure they support the recommendations would be
nice.

After reading section 3.3, I wrote to myself "Such as?"  What sorts of
changes are you alluding to?

This is designed specifically for the root and TLDs, yes?  It might be
helpful to make this stipulation more clear in the Recommendations
section.

It seems to me that this may be a useful document to help folks
design their own strategy for "infrastructure RRs".  I would approach
it that way, as an informational guide demonstrating how to do it and
why you might want to.  I'm not convinced everyone should.  At least
you've not convinced me yet.  :-)

For background, this is related:

  https://lists.dns-oarc.net/pipermail/dns-operations/2006-April/000503.html

Note the concern about uncooperative authoritative DNS servers.

Lixia sought comments from the DNS operations community on this topic in
2007 and I followed up offline.  I'll include that follow-up here for
posterity, both because it may never have made it to you and because it
includes the concern mentioned above.

  Date: Thu, 22 Mar 2007 17:54:05 -0500
  From: John Kristoff <jtk@ultradns.net>
  To: Lixia Zhang <lixia@CS.UCLA.EDU>
  Cc: Daniel Massey <massey@cs.colostate.edu>, Vasileios Pappas <vpappas@us.ibm.com>, Steve Crocker <steve@shinkuro.com>
  Subject: Re: [dns-operations] Seeking input regarding TTL value for infrastructure RRs

  On Wed, 21 Mar 2007 15:13:32 -0700
  Lixia Zhang <lixia@CS.UCLA.EDU> wrote:

  > Note that in this talk we separate out infrastructure RRs from all  
  > other RRs.
  > Some people  might argue that they dont want long TTL values because  
  > their hosts move around a lot, but we have not directly heard from  
  > anyone saying they moving their DNS servers around on a daily or even  
  > weekly basis.

  And in fact that might be pretty hard to fully accomplish quickly anyway
  if there are glue records in some parent zone they don't control.

  > But as Vasilis reasoned above, if child zone's response overwrites  
  > whatever one learned from the parent zone, then what I said on  
  > slide-9 still holds true, right?  i.e. a zone can choose to set long  
  > TTL for its NS+A RRs to make itself more available.

  There is that damage problem, which I think is going to be a point of
  contention long term.  If a zone owner wants to move their zone and
  the zone server administrator decides to set a TTL really high, the
  server administrator can effectively hijack the domain for a long
  time.  This doesn't seem to happen in practice much today, but it
  could easily become a "lock-in" problem we'd want to avoid.

  > What John meant here is to be able to move one's DNS servers around  
  > quickly to get around the attack traffic.
  > 
  > But wouldn't this also require that one have to update the parent  
  > quickly?  Can people actually coordinate that well in short time scale?

  For some parents it should be do-able.

  > (Vasilis also did a massive measurement on lame delegations, people  
  > don't even seem to be able to fix long term inconsistencies between  
  > parent and child zones :-)

  Oh yeah, that's a big problem too.  Here are a few other areas of
  interest I've been looking at.  If you want to collaborate on some
  of this stuff, let me know, I'd love to:

    open resolvers (fully open, caching only and auth only)
    EDNS0 max size discrepancies
    reserved, special and private name leaks
    cache poisoning vulnerabilities
    NS RRset size and diversity
    non-DNS accessible services on nameservers
    IPv6 records
    TCP support
    CNAME RRs used for NS RRs
    commonly filtered ports blocking queries/responses

  > and even in this case, long TTL seems still a good way out---one can  
  > extend the concept to host RRs, so the host addresses can be cached  
  > for extended period which mask off the DNS server's unavailability  
  > (as Paul M said 20 years ago)

  While I think in typical usage this all may be well and good, I think
  the inability to be flexible in times of stress is again the potential
  problem.

  > or if I say in slightly different word: we want to know why people  
  > want short TTL for infrastructure RRs.

  This I don't know.  I suspect laziness or something to do with defaults.

  > PS: John, I just want to thank you again for your comments on our  
  > PHAS paper last time.  We are preparing another talk for next NANOG  
  > on prefix origin checking, a complementary piece to PHAS.

  Great, will look forward to it.

  John

John
