Re: [dnsop] WGLC on draft-ietf-dnsop-bad-dns-res-03.txt

Peter Koch <pk@TechFak.Uni-Bielefeld.DE> Wed, 01 December 2004 18:45 UTC

To: dnsop@lists.uoregon.edu
Subject: Re: [dnsop] WGLC on draft-ietf-dnsop-bad-dns-res-03.txt
In-reply-to: Your message of "Fri, 19 Nov 2004 16:58:05 EST." <20041119215805.37460418A@thrintun.hactrn.net>
Date: Wed, 01 Dec 2004 17:48:24 +0100
From: Peter Koch <pk@TechFak.Uni-Bielefeld.DE>

> Please clearly state to the mailing list whether you support or oppose
> this draft going to the IESG.

While I consider this a very valuable document which could serve implementers
of resolvers and the stability of the DNS as a whole, here are some remarks
and questions, which make me think the document is not yet ready for
publication.

>    domain (TLD) name servers.  In some cases we recommend minor
>    additions to the DNS protocol specification and corresponding changes
>    in iterative resolver implementations to alleviate these unnecessary

This promise would probably not warrant the BCP but instead the standards
track. However, as far as I understood the recommendations they are all
guidelines for resolver implementors and (recursive) server operators, so BCP
is OK.
However, I'd suggest the wording be changed to reflect that the standard
itself, i.e. the on-the-wire protocol, remains unchanged. These are guidelines.

>    rate.  Some of the changes recommended affect the core DNS protocol
>    specification, described principally in RFC 1034 [2], RFC 1035 [3]
>    and RFC 2181 [4].

Since I may have missed those changes, could they please be clearly identified?

>    answer questions about certain zones authoritatively.  Often called a
>    "recursive name server" or a "caching name server", it is in fact an
>    iterative resolver combined with an authoritative name server.

Shouldn't that read 'a non-authoritative name server', since the cache fed by
the iterative resolver serves non-auth data?

> 2.1  Aggressive requerying for delegation information
> 
>    There can be times when every name server in a zone's NS RRset is
>    unreachable (e.g., during a network outage), unavailable (e.g., the
>    name server process is not running on the server host) or

... or the host does not exist (name doesn't resolve) ...

>    For this query of the parent zone to be useful, the target zone's
>    entire set of name servers would have to change AND the former set of
>    name servers would have to be deconfigured or decommissioned AND the
>    delegation information in the parent zone would have to be updated
>    with the new set of name servers, all within the TTL of the target
>    zone's NS RRset.  We believe this scenario is uncommon:
>    administrative best practices dictate that changes to a zone's set of
>    name servers happen gradually when at all possible, with servers
>    removed from the NS RRset left authoritative for the zone as long as
>    possible.  The scenarios that we can envision that would benefit from
>    the parent requery behavior do not outweigh its damaging effects.

Explicit "NS" queries should never happen (there were some implicit assumptions
in this direction during the discussion of wildcard NS RRs), but there are
situations where some requery can be justified:

1) When the NS-RRset in the parent zone is larger than that sent
   authoritatively from within the zone, additional information may be
   discovered. This is a configuration problem, of course, but is not too
   uncommon.

2) More importantly, consider a cache content like this:

	example.com.		NS	dns.example.com.
				NS	dns.example.net.
	dns.example.net.	A	192.0.2.42

   The dns.example.com A RR has expired. Now, if dns.example.net is not
   available, the zone is 'non-responsive'. To learn the address of
   dns.example.com. there's no other way than to make use of the necessary
   glue RR present in the COM zone, so there is justification for going
   one step up. Another recommendation following from this could be that the
   address records of nameservers in or below the zone served should not have
   TTLs lower than the NS RRs.
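
To illustrate the TTL recommendation just made, here is a minimal sketch of
a zone check. It assumes a hypothetical in-memory view of the zone (a dict
mapping (owner, type) to (ttl, rdata list)) and absolute names with a
trailing dot; neither the function name nor the data layout comes from the
draft.

```python
def short_lived_ns_addresses(zone_apex, rrsets):
    """Return in-zone name server addresses whose TTL is below the NS TTL.

    rrsets is a hypothetical mapping: (owner, rrtype) -> (ttl, [rdata, ...]).
    Only NS targets at or below the zone apex are examined.
    """
    ns_ttl, ns_targets = rrsets[(zone_apex, "NS")]
    findings = []
    for target in ns_targets:
        in_or_below = target == zone_apex or target.endswith("." + zone_apex)
        if not in_or_below:
            continue  # out-of-zone server names are not covered by this check
        for rrtype in ("A", "AAAA"):
            entry = rrsets.get((target, rrtype))
            if entry is not None and entry[0] < ns_ttl:
                findings.append((target, rrtype, entry[0], ns_ttl))
    return findings
```

Each finding names an address RRset that can expire from a cache while the
NS RRset is still held, which is the situation described in 2).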

> 2.1.1  Recommendation
> 
>    An iterative resolver MUST NOT send a query for the NS RRset of a
>    non-responsive zone to any of the name servers for that zone's parent
>    zone.  For the purposes of this injunction, a non-responsive zone is
>    defined as a zone for which every name server listed in the zone's NS
>    RRset:
>    1.  is not authoritative for the zone (i.e., lame), or,
>    2.  returns a server failure response (RCODE=2), or,
>    3.  is dead or unreachable according to section 7.2 of RFC 2308 [5].

This recommendation can now be adjusted accordingly. Explicit NS type queries
can be recommended against regardless of the responsiveness of the zone. They
have no place in the resolution process, do they?

> 2.2.1  Recommendation
> 
>    Iterative resolvers SHOULD cache name servers that they discover are

s/cache name servers/cache the status of name servers/;

>    not authoritative for zones delegated to them (i.e.  lame servers).
>    Lame servers MUST be cached against the specific query tuple <zone
>    name, class, server IP address>.  Zone name can be derived from the
>    owner name of the NS record that was referenced to query the name
>    server that was discovered to be lame.  Implementations that perform
>    lame server caching MUST refrain from sending queries to known lame
>    servers based on a time interval from when the server is discovered
>    to be lame.  A minimum interval of thirty minutes is RECOMMENDED.

In general I support this recommendation. Here's a corner case:

	dns.example.com is delegated (and detected as lame) for example.com
	but is authoritative for sub.example.net

While resolving www.sub.example.net, dns.example.com must be skipped, but upon
receipt of the referral containing the reference to dns.example.com it should
be eligible again. The problem is that one cannot tell in advance whether
www.sub.example.net is part of the example.net zone. The recommendation
could be given more precisely:

  old:	MUST refrain from sending queries to known lame servers

  new:	MUST refrain from sending queries whose QNAME is likely to be in the
	lame zone (i.e. equals the zone name or is below the zone name and not
	positively identified to belong to a distinct zone) to known lame
	servers
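
A minimal sketch of how the refined wording could look in a resolver's lame
cache, keyed by the draft's tuple <zone name, class, server IP address>.
The class and method names, the known_cuts argument (zone cuts the resolver
has already learned from referrals) and the addresses in any usage are
assumptions made for illustration only.

```python
import time

LAME_INTERVAL = 30 * 60  # thirty minutes, the minimum the draft recommends


def is_at_or_below(name, zone):
    """True if name equals zone or is a subdomain of it (absolute names)."""
    name, zone = name.lower(), zone.lower()
    return zone == "." or name == zone or name.endswith("." + zone)


class LameCache:
    def __init__(self):
        self._entries = {}  # (zone, class, server_ip) -> time marked lame

    def mark_lame(self, zone, rrclass, server_ip, now=None):
        stamp = time.time() if now is None else now
        self._entries[(zone.lower(), rrclass, server_ip)] = stamp

    def should_skip(self, qname, rrclass, server_ip, known_cuts=(), now=None):
        """Skip server_ip only for QNAMEs likely to be in a zone it is lame for."""
        now = time.time() if now is None else now
        for (zone, cls, ip), marked in self._entries.items():
            if cls != rrclass or ip != server_ip:
                continue
            if now - marked >= LAME_INTERVAL:
                continue  # entry has aged out
            if not is_at_or_below(qname, zone):
                continue
            # A known zone cut between the lame zone and QNAME positively
            # places QNAME in a distinct zone, so the server stays eligible.
            in_distinct_zone = any(
                cut.lower() != zone
                and is_at_or_below(cut, zone)
                and is_at_or_below(qname, cut)
                for cut in known_cuts
            )
            if not in_distinct_zone:
                return True
        return False
```

The point of the known_cuts test is that a server is only avoided while the
QNAME cannot be shown to live in a different zone; once a referral reveals
the cut, the same address becomes usable again.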

> 2.3  Inability to follow multiple levels of out-of-zone glue

In this paragraph the term glue is used where 'additional data' would be more
precise.

>    new gTLDs will use name servers in other gTLDs, increasing the amount
>    of inter-zone glue.

Again, that's not glue then. s/glue/additional data/

> 2.4  Aggressive retransmission when fetching glue

>    implementations take this address inclusion a step further with a
>    feature called "glue fetching".  A name server that implements glue

While at least one popular implementation called this 'fetch-glue', it's
actually just additional section processing. Glue is just there (or it isn't),
it can't be fetched. That should have been clarified by the AXFR draft, but
that's unfortunately in some undetermined state.
Sorry for the nitpicking here, but fuzzy or changing use of wording is
extremely counterproductive in educating DNS operators. And yes, that does
matter since they are part of our target audience here. So please let's
use 'additional data processing' in favor of 'glue'.

The overall recommendation is fine, though.

> 2.6.1  Recommendation
> 
>    An authoritative server can detect this situation.  A trailing dot
>    missing from an NS record's RDATA always results by definition in a
>    name server name that exists somewhere under the SOA of the zone the

I'd prefer the term 'zone apex' over SOA, since the latter is just an RR type.

>    NS record appears in.  Note that further levels of delegation are
>    possible, so a missing trailing dot could inadvertently create a name
>    server name that actually exists in a subzone.  But in any case, the
>    address record must still be present in this zone, either as
>    authoritative data or glue.

I disagree with this analysis. Given the zone

	example.net.	NS	dns.example.net
			->	dns.example.net.example.net.

Now, if a delegation at example.net.example.net. or net.example.net. exists,
there's no need that dns.example.net.example.net. be one of the authoritative
servers, so the absence of a glue A RR does not indicate that the name
dns.example.net.example.net. does not exist. So, not only doesn't it have to
exist in the zone, it also need not be part of the zone file (using the AXFR
draft logic).
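
The expansion above can be sketched as follows. This is only an
illustration under stated assumptions: names are plain strings, absolute
names end in a dot, and the set of delegation points is supplied by the
caller; the function names are made up for this example.

```python
def qualify(name, origin):
    """Expand a master-file name: no trailing dot means relative to origin."""
    if name.endswith("."):
        return name
    return name + "." + origin


def covering_cut(name, zone_apex, delegations):
    """Return a delegation point at or above name inside the zone, or None.

    Walks from name towards the apex and returns the first delegation found.
    """
    labels = name.rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:]) + "."
        if candidate == zone_apex:
            return None  # reached the apex without crossing a zone cut
        if candidate in delegations:
            return candidate
    return None
```

With the zone above, qualify("dns.example.net", "example.net.") yields
"dns.example.net.example.net.", and if "net.example.net." is delegated,
covering_cut() reports that cut, so the name is not this zone's business.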

>    An authoritative name server SHOULD report an error when one of a
>    zone's NS records references a name server below the zone's SOA when

s/zone's SOA/zone apex/

>    a corresponding address record does not exist in the zone.

First, I'd suggest that any slave server only issue a warning but still load
the zone. A master should report an error if the target of the NS RR
is below the zone apex and an address record does not exist (because the
name is within the zone and neither A nor AAAA RRsets exist or because the
name doesn't exist) and a warning if the domain name belongs to a child
zone (regardless of any glue RR).
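
As a sketch of the severity rules suggested above (all names here are
hypothetical, and the inputs are assumed to be sets of absolute names
prepared by the zone loader):

```python
def is_below(name, ancestor):
    """True if name is a proper subdomain of ancestor (absolute names)."""
    return name != ancestor and name.endswith("." + ancestor)


def check_ns_target(target, zone_apex, child_zones, address_owners, is_master):
    """Return 'ok', 'warning' or 'error' for one NS target.

    child_zones:    delegation points inside this zone
    address_owners: in-zone names that own an A or AAAA RRset
    """
    if not is_below(target, zone_apex):
        return "ok"  # not below the apex: nothing to verify locally
    if any(target == cut or is_below(target, cut) for cut in child_zones):
        return "warning"  # name belongs to a child zone, regardless of glue
    if target in address_owners:
        return "ok"
    # In-zone name without A/AAAA, or a name that does not exist at all.
    # A slave only warns and still loads the zone.
    return "error" if is_master else "warning"
```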

> 2.7.1  Recommendation
> 
>    Because of the additional load placed on a zone's parent's
>    authoritative servers resulting from a zero TTL on a zone's NS RRset,
>    under such circumstances authoritative name servers SHOULD issue a
>    warning when loading a zone or refuse to load the zone altogether.

A warning is OK at both a master and a slave, but a slave should not
refuse a zone on these grounds. Causing harm sometimes heals, but the
slave administrator can't do much about this problem. Also, the recommendation
should be clear about what to do (warning vs. error), and a warning seems less
invasive here. Finally, refusing to load a 0 TTL zone while silently accepting
TTLs of 1 second is probably hard to sell.
This is registry policy, though, and could already be implemented there ...

> 2.8.1  Recommendation
> 
>    Dynamic update agents SHOULD send SOA or NS queries to progressively
>    higher-level zones to find the closest enclosing zone for a given

s/zones/names/, since there's no need to send the query to higher zones.
The actual problem is that you don't know what zone the name belongs to.

>    name to update.  Only after the appropriate zone is found should the
>    client send an UPDATE message to one of the zone's authoritative

I'd restrict this recommendation to SOA queries only. NS queries are currently
not necessary and shouldn't be introduced. Also, the SOA MNAME might be
helpful anyway.
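
A minimal sketch of the SOA-only search, assuming a hypothetical callable
query_soa(qname) that returns the SOA MNAME if an SOA RRset exists at
qname and None otherwise. No real resolver API is implied; a real client
might also take the enclosing zone from the SOA record in the authority
section of a negative answer.

```python
def candidate_names(name):
    """Yield name, then each ancestor up to the root, most specific first."""
    if name == ".":
        yield "."
        return
    labels = name.rstrip(".").split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:]) + "."
    yield "."


def find_enclosing_zone(name, query_soa):
    """Return (zone apex, SOA MNAME) for the closest enclosing zone, or None.

    The queries are for progressively shorter *names*; where they are sent
    is up to ordinary resolution.
    """
    for qname in candidate_names(name):
        mname = query_soa(qname)
        if mname is not None:
            return qname, mname
    return None
```

The MNAME returned alongside the apex is what makes the SOA query the more
useful of the two for an update client.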

> 2.9  Queries for domain names resembling IP addresses
> 
>    The root name servers receive a significant number of A record
>    queries where the qname is an IP address.  The source of these

s/is an IP address/looks like an IP address/

> 2.9.1  Recommendation

>    to produce the Name Error response directly.  We suggest that
>    implementors consider the option of synthesizing Name Error responses
>    at the iterative resolver.  The server could claim authority for
>    synthesized TLD zones corresponding to the first octet of every
>    possible IP address, e.g.  1., 2., through 255.  This behavior could
>    be configurable in the (probably unlikely) event that numeric TLDs
>    are ever put into use.

That would be in conflict with DNSSEC, which isn't discussed in the security
considerations section. Instead of fiddling with these issues at the iterative
resolver, why not recommend stub resolvers either produce a name error or
silently translate the query to the appropriate answer as gethostbyname()
has been doing for quite some time?
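
A sketch of the stub-side check, using Python's standard ipaddress module
to decide whether a name looks like a dotted-quad IPv4 address. The
function names are illustrative, and the choice between answering locally
and returning a name error is left to the caller.

```python
import ipaddress


def looks_like_ipv4(qname):
    """True if qname (with or without a trailing dot) parses as dotted-quad IPv4."""
    candidate = qname[:-1] if qname.endswith(".") else qname
    try:
        ipaddress.IPv4Address(candidate)
    except ValueError:
        return False
    return True


def answer_locally(qname):
    """Return the address text for IPv4-looking names, else None.

    None means the name should go through normal DNS resolution.
    """
    if looks_like_ipv4(qname):
        return qname.rstrip(".")
    return None
```

Either way the query never leaves the host, so neither the iterative
resolver nor the root servers see it.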

>    Another option is to delegate these numeric TLDs from the root zone
>    to a separate set of servers to absorb the traffic.  The "black hole
>    servers" used by the the AS 112 Project [8], which are currently
>    delegated the in-addr.arpa zones corresponding to RFC 1918 [7]
>    private use address space, would be a possible choice to receive
>    these delegations.

While that's an interesting project, now we have two alternatives. Which
one is endorsed by the WG?

> 2.10  Misdirected recursive queries

> 2.10.1  Recommendation
> 
>    When the IP address of a name server that supposedly offers recursion
>    is configured in a stub resolver using an interactive user interface,
>    the resolver could send a test query to verify that the server indeed

That invites Murphy. The particular test query should be specified here,
i.e. '<randomstring>.invalid'. To decrease root server load further, the
test query might be sent non-recursively although I'm not confident that
all implementations will offer RA if the query did not have RD set.
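
To make the suggestion concrete, here is a sketch that builds such a test
query in wire format and checks the RA bit of a response. The header layout
and flag positions follow RFC 1035; the function names, the label length
and the choice of QTYPE A are assumptions of this example.

```python
import secrets
import struct

FLAG_QR = 0x8000  # response
FLAG_RD = 0x0100  # recursion desired
FLAG_RA = 0x0080  # recursion available


def build_test_query(recursion_desired=True):
    """Return (qname, wire message) for '<randomstring>.invalid.', type A."""
    qname = secrets.token_hex(8) + ".invalid."  # 16 random hex characters
    flags = FLAG_RD if recursion_desired else 0
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", secrets.randbits(16), flags, 1, 0, 0, 0)
    wire_name = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    ) + b"\x00"
    question = wire_name + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return qname, header + question


def offers_recursion(response):
    """True if the message is a response with the RA bit set."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & FLAG_QR) and bool(flags & FLAG_RA)
```

Passing recursion_desired=False gives the non-recursive variant mentioned
above; whether a given server still sets RA in that case is exactly the
open question.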

>    The stub resolver could also report an error, either through a user
>    interface or in a log file, if the queried server does not support
>    recursion.  Error reporting SHOULD be throttled to avoid a
>    notification or log message for every response from a non-recursive
>    server.

Now, if there's state to keep anyway (for throttling), the answers should
be delayed, too. This might work for stateless stub resolvers, too.

> 2.11  Suboptimal name server selection algorithm

> 2.11.1  Recommendation

>    This list is not conclusive, but reflects the changes that would
>    produce the most impact in terms of reducing disproportionate query
>    load among a zone's authoritative servers.  I.e., these changes would
>    help spread the query load evenly.
>    o  Do not make assumptions based on NS RRset order: all NS RRs SHOULD
>       be treated equally.  (In the case of the "com" zone, for example,
>       most of the root servers return the NS record for
>       "a.gtld-servers.net" first in the authority section of referrals.
>       Apparently as a result, this server receives disproportionately
>       more traffic than the other 12 authoritative servers for "com".)

Shouldn't the root servers also shuffle the NS RRset, as most of them do for
the '.' zone (K and the NSD instance of H don't)?
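
Shuffling on the server side is cheap. A sketch, with a hypothetical
function name and the RRset represented as a plain list of server names:

```python
import random


def shuffled_ns(ns_rrset, rng=random):
    """Return the NS targets in random order without modifying the input."""
    order = list(ns_rrset)
    rng.shuffle(order)
    return order
```

A resolver that treats all NS RRs equally could apply the same function to
the RRset it received before picking a server to query.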

> 4.  Security considerations

A discussion of the 'synthesis hack' in 2.9 is missing here.

> 6  Normative References
 
>    [8]  <http://www.as112.net>

Whether or not a reference is normative in a BCP may be irrelevant, but since
this is a moving target, it might better be under informational references.

Finally, there's one assumption made in 1.1 which is current practice but
not explicitly stated anywhere as far as I remember: the fact that 'recursive
servers' or so called 'iterative resolvers' actually send only non-recursive
queries. RFC 1034 in its section '2.3. Assumptions about usage' only addresses
the server side, allowing servers to not offer recursion. The client side
is covered later:

     Clients may request recursive service
     from any name server, though they should depend upon receiving
     it only from servers which have previously sent an RA, or
     servers which have agreed to provide service through private
     agreement or some other means outside of the DNS protocol

So, since root, TLD and many large ISP or enterprise servers have disabled
recursion over time, we have arrived where we are. However, looking at
our servers' logs, there are still a number of recursive queries which do
not result from misdirected (stub) resolvers (covered in 2.10). So, an
additional recommendation would be helpful stating that 'recursive servers'
only issue queries with RD bit cleared.

I'd like to close with thanks to the authors for their work and with apologies
for waiting until almost the end of the LC.

-Peter