Re: [DNSOP] What should ANAME-aware servers do when target records are verifiably missing?

Richard Gibson <richard.j.gibson@oracle.com> Fri, 12 April 2019 00:02 UTC

To: Matthew Pounsett <matt@conundrum.com>
Cc: Bob Harold <rharolde@umich.edu>, dnsop <dnsop@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/V0RfLg9UIKpdTkB4eJv7UqwhhEo>

Responses inline.

On 4/11/19 18:50, Matthew Pounsett wrote:
> On Wed, 10 Apr 2019 at 16:43, Richard Gibson
> <richard.j.gibson@oracle.com> wrote:
>> The first problem is for the owner of the ANAME-containing zone, for whom the upstream misconfiguration will result in downtime and be extended by caching to the MINIMUM value from their SOA, which in many cases will be one to three orders of magnitude greater than the TTL of the ANAME itself.
> I think I'm missing something here.  If, for example, the TTL of the
> ANAME is 1 hour, what mechanism results in caching holding onto a
> negative answer for a broken target name for a minimum of 10 hours,
> and to 40 days?
Demonstrative example zone:

example.com.  3600  IN    SOA  ns.example.net. hostmaster.example.net. 1 (
                                   7200   ; REFRESH
                                   3600   ; RETRY
                                  86400   ; EXPIRE
                                   3600  ); MINIMUM
example.com.    60  IN  ANAME  example.invalid.
example.com.    60  IN      A  192.0.2.1

When an ANAME-aware resolver queries an ANAME-aware authoritative server 
for example.com. A, it will receive the A record in the answer section 
and the ANAME in the additional section. If it then chases the ANAME 
target to an NXDOMAIN and accepts that as justification for replacing 
the sibling A RRSet with nothing, as currently specified in the draft, 
then the appropriate response will be a Type 2 NODATA, in which the 
answer section is empty and the authority section contains the SOA. But 
this suffers from both of the problems I have been complaining about: the 
resolver does not necessarily /have/ the zone SOA, possibly 
necessitating an inline lookup, and per RFC 2308 the negative response 
will be cached according to values from the SOA that are unrelated to 
and far exceed the TTL of the ANAME.
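To make the caching arithmetic concrete for the example zone, here is a 
minimal Python sketch of the Type 2 NODATA the resolver would have to 
synthesize. It is a toy model, not wire format: `Soa`, `negative_ttl`, 
and `type2_nodata` are hypothetical names, not part of any DNS library. 
The one rule it encodes is RFC 2308's: a negative answer may be cached 
for the lesser of the SOA record's own TTL and its MINIMUM field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Soa:
    owner: str
    ttl: int      # TTL of the SOA record itself
    minimum: int  # MINIMUM field of the SOA RDATA


def negative_ttl(soa: Soa) -> int:
    # RFC 2308: negative answers are cacheable for the lesser of the
    # SOA record's TTL and its MINIMUM field.
    return min(soa.ttl, soa.minimum)


def type2_nodata(qname: str, qtype: str, soa: Soa) -> dict:
    # Type 2 NODATA (toy representation): NOERROR, an empty answer
    # section, and only the zone SOA in the authority section.
    return {
        "rcode": "NOERROR",
        "question": (qname, qtype),
        "answer": [],
        "authority": [(soa.owner, negative_ttl(soa), "SOA")],
    }


# Values copied from the example zone above.
EXAMPLE_SOA = Soa(owner="example.com.", ttl=3600, minimum=3600)
ANAME_TTL = 60

response = type2_nodata("example.com.", "A", EXAMPLE_SOA)
ttl = response["authority"][0][1]
print(f"empty A answer cacheable for {ttl} s, "
      f"{ttl // ANAME_TTL}x the {ANAME_TTL} s ANAME TTL")
```

With this zone the empty answer outlives the ANAME's own TTL by a factor 
of 60; a zone with a one-day MINIMUM (and SOA TTL) would push that to 1440.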

>> Both of these problems can be addressed by allowing/recommending/requiring ANAME-aware servers to preserve ANAME siblings when resolution of ANAME targets results in NXDOMAIN or NODATA responses, rather than replacing them with an empty RRSet... which, to be honest, seems to be always-undesirable behavior anyway—if anyone can think of a scenario where it would be beneficial to dynamically remove ANAME siblings, please share it.
> I feel like this is creating an even bigger potential problem.  What
> happens when the owner of the ANAME target legitimately wants that
> name to go away, but some other zone owner is leaving an ANAME in
> place pointing to that now-nonexistent name?  Continuing to serve the
> sibling records indefinitely seems like serve-stale gone horribly
> wrong.

In such a configuration, the owner of the ANAME will be able to see that 
clients are using its static sibling records rather than its target (and 
therefore that they are getting no benefit from the ANAME), and can 
react accordingly. If your concern is excess queries for the ANAME 
target, then this seems no different from, e.g., CNAME: the owner of the 
target can issue long-lived negative responses while performing whatever 
other exploration and/or mitigation they deem fit.
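The two behaviors under discussion can be stated as a small decision 
function. This is a hypothetical Python sketch: `TargetResult`, 
`rrset_to_serve`, and the `preserve_on_negative` switch are illustrative 
names, and records are plain address strings rather than real RRs. 
Setting the switch to False models replacing the siblings with an empty 
RRSet (the draft's current behavior as described above); True models 
preserving them.

```python
from collections.abc import Sequence
from enum import Enum, auto


class TargetResult(Enum):
    ANSWER = auto()    # target has records of the queried type
    NXDOMAIN = auto()  # target name verifiably does not exist
    NODATA = auto()    # target exists but has no records of that type


def rrset_to_serve(static_siblings: Sequence[str],
                   result: TargetResult,
                   target_records: Sequence[str],
                   preserve_on_negative: bool) -> list[str]:
    """Choose the sibling RRset an ANAME-aware server returns."""
    if result is TargetResult.ANSWER:
        # Normal case: the target's records replace the static siblings.
        return list(target_records)
    # Verifiably missing target (NXDOMAIN or NODATA).
    return list(static_siblings) if preserve_on_negative else []


static_a = ["192.0.2.1"]  # the A sibling from the example zone
print(rrset_to_serve(static_a, TargetResult.NXDOMAIN, [], False))  # []
print(rrset_to_serve(static_a, TargetResult.NXDOMAIN, [], True))   # ['192.0.2.1']
```

Either way, a resolving target always wins, so the ANAME owner still sees 
whether clients are getting the target's addresses or the static ones.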

But this seems likely to be much rarer, and frankly much less of a 
problem, than stretching out a misconfiguration at an ANAME target into 
extended downtime for an ANAME owner. It must be possible for the latter 
to execute a recovery plan as quickly as possible, and if ANAME is 
specified well, the first step of recovery can be literally instant and 
automatic.