Re: [DNSOP] Dnsdir last call review of draft-ietf-dnsop-caching-resolution-failures-03

Peter van Dijk <peter.van.dijk@powerdns.com> Mon, 03 July 2023 09:17 UTC

Message-ID: <10777fba692a1ccb8b86cd86c1bfdfe9f2fba1fa.camel@powerdns.com>
From: Peter van Dijk <peter.van.dijk@powerdns.com>
To: "dnsdir@ietf.org" <dnsdir@ietf.org>, "dnsop@ietf.org" <dnsop@ietf.org>, "draft-ietf-dnsop-caching-resolution-failures.all@ietf.org" <draft-ietf-dnsop-caching-resolution-failures.all@ietf.org>, "last-call@ietf.org" <last-call@ietf.org>
Date: Mon, 03 Jul 2023 11:09:29 +0200
In-Reply-To: <354927FC-9FF7-4F21-A5D9-023D944522A5@verisign.com>
References: <168779086892.55920.13910161227412972733@ietfa.amsl.com> <354927FC-9FF7-4F21-A5D9-023D944522A5@verisign.com>
Organization: PowerDNS.COM B.V.
Content-Type: text/plain; charset="UTF-8"
User-Agent: Evolution 3.38.3-1+deb11u1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/XGUInkBJ57zVXD4vfc5GN8Y2NNE>

Hello Duane & others,

Thank you for your response. Comments inline below.


On Thu, 2023-06-29 at 23:58 +0000, Wessels, Duane wrote:
> 
> 
> > 
> > ## 2.2
> > 
> > The first paragraph correctly mentions "policy reasons". The second paragraph
> > correctly says "they are not authoritative". I am not sure not being
> > authoritative can be considered a policy reason, so perhaps these two
> > paragraphs can be connected with an "or"?
> 
> I see your point.  We propose this change to the introduction sentence:
> 
> A name server returns a message with the RCODE field set to REFUSED
> when it refuses to process the query, e.g., for policy or other reasons.

Works for me.

> 
> > 
> > ## 3.1
> > 
> > "A resolver MUST NOT retry a given query over a server's transport more than
> > twice" - should this be clarified to say "in a short period of time" or
> > something like that? Clearly a retry is allowed *eventually*.
> 
> For reference, here’s the sentence in question at the start of 3.1:
> 
>    A resolver MUST NOT retry a given query over a server's transport more
>    than twice (i.e., three queries in total) before considering the
>    server's transport unresponsive for that query.
> 
> We feel that “a given query” and “for that query” in the sentence sufficiently limits the
> scope here, and there is no need to qualify it by some amount of time.
> 
> As an example, let’s say that a recursive has been asked to look up www.example.com (our “given” query).  The example.com zone has two name servers, each of which has two IP addresses, and (presumably) two transports.  The recursive can send 3 queries to 199.43.135.53 over UDP (after which that transport is considered unresponsive), 3 queries to 199.43.133.53 over UDP, the same over TCP, over IPv6, and so on.  In total the recursive can send 2x2x2x3 = 24 queries before it has to give up, if all servers and all transports are unresponsive. At that point the resolver gives up on that query and returns SERVFAIL.
> 
> Then, section 3.2 is about caching and says that the resolution failure MUST be cached for at least 5 seconds, but otherwise gives implementations a lot of freedom in how to do that.  Could be by query tuple, by server/transport, or some other way.

Right! 3.2 solves this.
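
To make the arithmetic in the quoted example concrete, here is a minimal Python sketch that enumerates the attempts a resolver may make for one query before giving up. The function name `query_budget` and the two IPv6 addresses (taken from the 2001:db8::/32 documentation prefix) are assumptions for illustration only; the two IPv4 addresses and the limit of three queries per transport come from the text above.

```python
from itertools import product

# One initial query plus at most two retries per server transport (section 3.1).
MAX_QUERIES_PER_TRANSPORT = 3


def query_budget(addresses, transports, per_transport=MAX_QUERIES_PER_TRANSPORT):
    """List every (address, transport, attempt) allowed for a single query."""
    return [
        (addr, transport, attempt)
        for addr, transport in product(addresses, transports)
        for attempt in range(1, per_transport + 1)
    ]


# Two name servers with two addresses each. The IPv6 addresses are
# placeholders, not the real addresses of any example.com name server.
addresses = ["199.43.135.53", "199.43.133.53", "2001:db8::1", "2001:db8::2"]
transports = ["udp", "tcp"]

attempts = query_budget(addresses, transports)
print(len(attempts))  # prints 24
```

Once this list is exhausted without a response, the resolver returns SERVFAIL, and the caching rules of section 3.2 decide how soon any of these attempts may be repeated.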

> > Also, "MUST NOT" is pretty strong language. Given the various process models of
> > resolver implementations, two subprocesses (threads) both retrying the same or
> > a similar thing a few times can not always be avoided. Would you settle for
> > SHOULD NOT? The "given" in "retry a given query" gives some leeway, but not
> > enough, I feel.
> 
> We feel that MUST NOT is appropriate but would like more input from working group
> members and implementors especially.

Ok

> > "may retry a given query over a different transport .. believe .. is available"
> > - this ignores that some transports have better security properties than
> > others. One currently active draft in this area is
> > draft-ietf-dprive-unilateral-probing. Perhaps add some wording, without being
> > too prescriptive, such as "available, and compatible with the resolver's
> > security policies, ..".
> 
> We think “compatible with the resolver’s security policies” goes without saying, but don’t mind making it explicit.

I am inclined to agree, and will leave this for others to judge.

> > 
> > ## 3.2
> > 
> > A previous review
> > (https://mailarchive.ietf.org/arch/msg/dnsop/sJlbyhro-4bDhfGBnXhhD5Htcew/)
> > suggested that the then-chosen tuple was not specific enough, and also said it
> > was too prescriptive. I agree with both. The current draft prescribes nothing,
> > which I'm generally a fan of!
> > 
> > However, a coworker I spoke to (the one likely responsible for implementing
> > this draft, if it turns out our implementation deviates from its final form)
> > told me "some guidance would be nice". After some discussion on
> > prescriptiveness, here is our suggestion: do not prescribe, but mention
> > (without wanting to be complete) a few tuple formats that might make sense, and
> > suggest that implementations document what they choose here.
> 
> The relevant text here currently says:
> 
>    The implementation might cache different resolution failure conditions
>    differently.  For example, DNSSEC validation failures might be cached
>    according to the queried name, class, and type, whereas unresponsive
>    servers might be cached only according to the server's IP address.
> 
> So we provide two examples, although not really phrased as “tuples”.  I guess you’re suggesting that we show more options here and talk about them more as tuples?

Yes, I think that would make sense.

> For the documentation suggestion, maybe something like this?: “Developers SHOULD document their implementation choices so that operators know what behaviors to expect when resolution failures are cached.”

Wonderful.
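
As an illustration of what "tuple formats" could look like, the following Python sketch keeps one failure cache and lets the failure condition choose the key. The class `FailureCache` and the helpers `validation_key` and `server_key` are hypothetical names, not taken from the draft or from any implementation; the two key shapes mirror the two examples in the quoted text, and the 5 second floor is the minimum mentioned for section 3.2.

```python
import time

MIN_FAILURE_TTL = 5.0  # seconds; lower bound quoted from section 3.2


class FailureCache:
    """Remembers resolution failures under implementation-chosen keys."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expiry = {}  # key tuple -> time at which the entry expires

    def put(self, key, ttl=MIN_FAILURE_TTL):
        # Never cache for less than the required minimum.
        self._expiry[key] = self._clock() + max(ttl, MIN_FAILURE_TTL)

    def is_failed(self, key):
        expiry = self._expiry.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expiry[key]  # expired: a fresh attempt is allowed
            return False
        return True


def validation_key(qname, qclass, qtype):
    """Key for DNSSEC validation failures: queried name, class, and type."""
    return ("validation", qname.lower(), qclass, qtype)


def server_key(ip):
    """Key for unresponsive servers: the server's IP address only."""
    return ("unresponsive", ip)


cache = FailureCache()
cache.put(validation_key("www.example.com.", "IN", "A"))
cache.put(server_key("199.43.135.53"))
```

Whichever key shapes an implementation picks (a server/transport pair would be another option), listing them in its documentation is what the proposed SHOULD asks for.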

> 
> 
> First, we apologize for not realizing that this and two other “for discussion” questions were not yet resolved.  We plan to remove the first (from the Introduction).
> 
> For the one that was in section 2.6, we propose this updated text and new section 3.4:
> 
> 2.6.  DNSSEC Validation Failures
> 
>    For zones that are signed with DNSSEC, a resolution failure can occur
>    when a security-aware resolver believes it should be able to
>    establish a chain-of-trust for an RRset but is unable to do so,
>    possibly after trying multiple authoritative name servers.  DNSSEC
>    validation failures may be due to signature mismatch, missing DNSKEY
>    RRs, problems with denial-of-existence records, clock skew, or other
>    reasons.
> 
>    Section 4.7 of [RFC4035] already discusses the requirements and
>    reasons for caching validation failures.  Section 3.4 of this
>    document strengthens those requirements.

Good.

> 3.4.  DNSSEC Validation Failures
> 
>    Section 4.7 of [RFC4035] states:
> 
>    To prevent such unnecessary DNS traffic, security-aware resolvers MAY
>    cache data with invalid signatures, with some restrictions.
> 
>    This document updates [RFC4035] with the following, stronger
>    requirement:
> 
>    To prevent such unnecessary DNS traffic, security-aware resolvers
>    MUST cache DNSSEC validation failures, with some restrictions.

Good :)
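
A rough Python sketch of what the stronger MUST means in practice: after one validation failure, repeated queries for the same name, class, and type are answered from the failure cache instead of triggering new upstream traffic. Everything here is an assumption for illustration: `ValidatingResolver`, `ValidationFailure`, and the `fetch_and_validate` callback are hypothetical, the 5 second lifetime is borrowed from the minimum in section 3.2, and the restrictions from RFC 4035 are not modelled.

```python
import time


class ValidationFailure(Exception):
    """Raised by the (hypothetical) validator when no chain of trust is found."""


class ValidatingResolver:
    def __init__(self, fetch_and_validate, failure_ttl=5.0, clock=time.monotonic):
        self._fetch_and_validate = fetch_and_validate
        self._failure_ttl = failure_ttl
        self._clock = clock
        self._failed = {}  # (qname, qclass, qtype) -> expiry time
        self.upstream_queries = 0  # counts calls that reached the network

    def resolve(self, qname, qclass, qtype):
        key = (qname.lower(), qclass, qtype)
        expiry = self._failed.get(key)
        if expiry is not None and self._clock() < expiry:
            return "SERVFAIL"  # cached failure: no new upstream traffic
        self._failed.pop(key, None)  # drop an expired entry, if any
        self.upstream_queries += 1
        try:
            return self._fetch_and_validate(qname, qclass, qtype)
        except ValidationFailure:
            self._failed[key] = self._clock() + self._failure_ttl
            return "SERVFAIL"
```

With a MAY, an implementation could skip the `_failed` bookkeeping entirely and hit the authoritative servers on every client retry, which is the unnecessary traffic the quoted text wants to prevent.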

> And for the one in section 3.3 we propose this:  
> 
> 3.3.  Requerying Delegation Information
> 
>    Section 2.1 of [RFC4697] identifies circumstances in which "every
>    name server in a zone's NS RRSet is unreachable (e.g., during a
>    network outage), unavailable (e.g., the name server process is not
>    running on the server host), or misconfigured (e.g., the name server
>    is not authoritative for the given zone, also known as 'lame')."  It
>    prohibits unnecessary "aggressive requerying" to the parent of a non-
>    responsive zone by sending NS queries.
> 
>    The problem of aggressive requerying to parent zones is not limited to
>    queries of type NS.  This document updates the requirement from
>    section 2.1.1 of [RFC4697] to apply more generally: Upon encountering
>    a zone whose name servers are all non-responsive, a resolver MUST
>    cache the resolution failure.  Furthermore, the resolver MUST limit
>    queries to the non-responsive zone's parent zone (and other ancestor
>    zones) just as it would limit subsequent queries to the non-
>    responsive zone.

Looks great.
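
One possible reading of the proposed 3.3 text, sketched in Python: once all name servers of a zone are found non-responsive, the failure is cached by zone, and while that entry is live the resolver neither queries the zone nor goes back to the parent (or other ancestors) for names inside it. The names `DeadZoneCache`, `mark_unresponsive`, and `may_query_parent` are hypothetical, and the 5 second lifetime is an assumption borrowed from the minimum in section 3.2.

```python
import time


def labels(name):
    """Split a domain name into lowercase labels, ignoring the trailing dot."""
    return tuple(part for part in name.lower().rstrip(".").split(".") if part)


class DeadZoneCache:
    def __init__(self, ttl=5.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._dead = {}  # zone labels -> expiry time

    def mark_unresponsive(self, zone):
        """Record that every name server of `zone` failed to respond."""
        self._dead[labels(zone)] = self._clock() + self._ttl

    def covering_dead_zone(self, qname):
        """Return the cached non-responsive zone containing qname, or None."""
        qlabels = labels(qname)
        now = self._clock()
        for zone, expiry in list(self._dead.items()):
            if now >= expiry:
                del self._dead[zone]  # entry expired, requerying is allowed
                continue
            # Compare whole labels so "notexample.com" does not match "example.com".
            if len(zone) <= len(qlabels) and qlabels[len(qlabels) - len(zone):] == zone:
                return ".".join(zone) + "."
        return None

    def may_query_parent(self, qname):
        """Parent and ancestor queries are limited just like queries to the zone."""
        return self.covering_dead_zone(qname) is None
```

A resolver using this would consult `may_query_parent` before sending any query type (not only NS) towards the parent on behalf of a name under the non-responsive zone.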

Thanks!

Kind regards,
-- 
Peter van Dijk
PowerDNS.COM BV - https://www.powerdns.com/