Re: [DNSOP] draft-moura-dnsop-negative-cache-loop

Petr Špaček <> Wed, 10 November 2021 13:48 UTC

On 10. 11. 21 10:31, Giovane C. M. Moura wrote:
>> Regarding the draft content:
>>> 2.  Past solutions
>> This section somehow does not mention RFC 2308 section 7.1 which solves
>> most of the problem if implemented. In fact BIND has an implementation
>> of it and is not vulnerable to the TsuNAME attack (or at least I was not
>> able to reproduce it).
> Yep, but 7.1 was unfortunately (for this case) optional, and a MAY.
> But when we privately disclosed tsuname at OARC34, we tested only if
> BIND and others would loop in the presence of a single client query.
> They don't. That covers only one source of loop: resolvers looping.
> But what happens when a client sends non-stop queries to the same
> resolver? Does BIND answer from cache (RFC 2308, section 7.1), or does it
> trigger new queries again? (We did not test for that; if you did, could
> you please share the findings?)

This is an interesting question. In the case of BIND there are two (or 
three...) things which prevent it from generating queries to 
authoritative servers when it is queried repeatedly:

1] The first stage is an RFC 2308 section 7.1-style "SERVFAIL cache". By 
default it is configured with a 1-second TTL ("servfail-ttl" option in 
the BIND configuration). Identical queries which resulted in SERVFAIL are 
answered from this cache without doing anything else.
Please note that this is an "output" cache, i.e. it stores SERVFAILs 
generated by the resolver itself, which happens when a query fails for 
any of a number of reasons, including resource limits.
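A minimal sketch of such an output SERVFAIL cache, assuming queries are keyed by (qname, qtype, qclass) and using an injectable clock so the expiry can be tested. The class and function names are hypothetical and this is not BIND's implementation; only the 1-second default is taken from the text above.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (qname, qtype, qclass) of the incoming query.
QueryKey = Tuple[str, str, str]


class ServfailCache:
    """Illustrative RFC 2308 section 7.1-style cache of resolver-generated SERVFAILs."""

    def __init__(self, ttl: float = 1.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl          # 1 second by default, like BIND's servfail-ttl default
        self.clock = clock
        self._expires: Dict[QueryKey, float] = {}

    def lookup(self, key: QueryKey) -> bool:
        """Return True while a cached SERVFAIL for key is still valid."""
        expiry = self._expires.get(key)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[key]  # entry expired: treat as a miss
            return False
        return True

    def store(self, key: QueryKey) -> None:
        """Remember that resolving key ended in SERVFAIL."""
        if self.ttl > 0:
            self._expires[key] = self.clock() + self.ttl


def handle_query(cache: ServfailCache, key: QueryKey,
                 resolve: Callable[[QueryKey], str]) -> str:
    """Answer identical recently failed queries from cache, without recursing."""
    if cache.lookup(key):
        return "SERVFAIL"
    rcode = resolve(key)  # hypothetical full recursion; may send upstream queries
    if rcode == "SERVFAIL":
        cache.store(key)
    return rcode
```

With a 1-second TTL, a client hammering a looping name triggers at most about one recursion per second for that name, however fast it sends the queries.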

2] If the answer is not in the SERVFAIL cache, the resolver starts 
recursing, but naturally consults its RR cache at each step. While 
processing the second query, the resolver will find the delegations it 
previously obtained from the authoritative servers in its RR cache and 
use them instead of querying the servers again. That is, no queries will 
be generated until the TTL in the RR cache expires (or cache eviction 
removes the delegation RRs for other reasons).
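To illustrate the second mechanism, here is a simplified sketch that assumes the resolver walks a known list of zones and that a hypothetical `upstream` callback returns the NS names and TTL for a zone. Real recursion (glue, referrals, server selection) is omitted. The point is only that an outgoing query happens on a cache miss, so a repeated walk of the same (possibly cyclic) delegation chain stays in cache until the delegation TTL expires.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical upstream callback: zone -> (NS names, TTL in seconds).
Upstream = Callable[[str], Tuple[List[str], int]]


class DelegationCache:
    """Illustrative RR cache holding delegation (NS) RRsets until their TTL expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, zone: str) -> Optional[List[str]]:
        entry = self._entries.get(zone)
        if entry is None:
            return None
        expiry, ns_names = entry
        if self.clock() >= expiry:
            del self._entries[zone]  # TTL ran out: treat as a miss
            return None
        return ns_names

    def put(self, zone: str, ns_names: List[str], ttl: int) -> None:
        self._entries[zone] = (self.clock() + ttl, ns_names)


def walk_delegations(cache: DelegationCache, zones: List[str],
                     upstream: Upstream) -> List[List[str]]:
    """Consult the cache at every step; query upstream only on a miss."""
    result = []
    for zone in zones:
        ns_names = cache.get(zone)
        if ns_names is None:
            ns_names, ttl = upstream(zone)  # the only place an outgoing query happens
            cache.put(zone, ns_names, ttl)
        result.append(ns_names)
    return result
```

Nothing here breaks a delegation loop by itself; it only limits how often the same chain is re-fetched from the authoritative servers, to roughly once per delegation TTL.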

3] The third reason is a bug in older versions of BIND :-D A subtle bug 
caused mishandling of queries with cyclic dependencies in delegations, 
causing BIND to _delay_ responding with SERVFAIL by roughly 10 seconds 
(another internal timeout).

All two/three mechanisms dampen the amount of outgoing queries. Of course 
we need to look at it with an attacker's mindset and probe for holes in 
it, but with this infrastructure in place I think it will not be much 
worse than a regular TTL=0 query/answer flood, and that is only possible 
if the attacker has control over the delegation TTL (which AFAIK is not 
the case for most TLDs).
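A back-of-the-envelope model of why caching dampens the flood, assuming a single name queried at a steady client rate and a resolver that caches every resolution outcome for a fixed TTL. The function name and the 100 queries/second for 10 seconds figures are hypothetical, and the model ignores prefetching, eviction, and timeouts. Under these assumptions, upstream load grows with duration/TTL instead of with the client query rate, which is why TTL=0 removes the protection.

```python
from fractions import Fraction
from typing import Optional, Union

Number = Union[int, Fraction]


def upstream_queries(client_qps: int, duration_s: int, cache_ttl_s: Number) -> int:
    """Count upstream resolutions caused by a steady stream of identical client queries.

    Simplified model: each resolution outcome is cached for cache_ttl_s seconds,
    and a query arriving while the entry is valid is answered from cache.
    Fraction keeps timestamps exact, avoiding float rounding at TTL boundaries.
    """
    interval = Fraction(1, client_qps)
    ttl = Fraction(cache_ttl_s)
    expiry: Optional[Fraction] = None
    upstream = 0
    for i in range(client_qps * duration_s):
        t = i * interval
        if expiry is None or t >= expiry:  # cache miss: resolver must go upstream
            upstream += 1
            expiry = t + ttl
    return upstream


for ttl in (0, 1, 300):
    print(f"TTL={ttl:>3}s -> {upstream_queries(100, 10, ttl)} upstream queries")
```

For 1000 client queries this prints 1000, 10, and 1 upstream queries respectively: TTL=0 passes every client query through, while even a 1-second cache caps the load at about one resolution per second.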

> Because if it does not cache, clients' recurrent queries would force the
> resolver to send many queries to the authoritative servers, and it would
> seem they'd be looping. See fig3(b) in [0], where we show that only
> some of Google's resolvers would be aggressive -- and those were the ones
> that had these impatient clients.
> That's the second root cause: clients/forwarders looping.

Sure, that boils down to the generic problem of "clients evading the 
cache in resolvers", which is always a PITA. We should declare TTL=0 
illegal :-)

Petr Špaček