Re: [DNSOP] Genart last call review of draft-ietf-dnsop-caching-resolution-failures-06

"Wessels, Duane" <dwessels@verisign.com> Mon, 21 August 2023 21:07 UTC

From: "Wessels, Duane" <dwessels@verisign.com>
To: Lucas Pardue <lucaspardue.24.7@gmail.com>
CC: "gen-art@ietf.org" <gen-art@ietf.org>, "dnsop@ietf.org" <dnsop@ietf.org>, "draft-ietf-dnsop-caching-resolution-failures.all@ietf.org" <draft-ietf-dnsop-caching-resolution-failures.all@ietf.org>, "last-call@ietf.org" <last-call@ietf.org>
Date: Mon, 21 Aug 2023 21:07:19 +0000
Message-ID: <509DC3AD-AA85-443C-ACE8-8CCF1C903BB2@verisign.com>
References: <169175737767.37063.4458393955343190137@ietfa.amsl.com>
In-Reply-To: <169175737767.37063.4458393955343190137@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/BLO-gQIUYkCxUVPDL3TM_kSXIag>


> On Aug 11, 2023, at 5:36 AM, Lucas Pardue via Datatracker <noreply@ietf.org> wrote:
> 
> Reviewer: Lucas Pardue
> Review result: Ready with Issues
> 
> I am the assigned Gen-ART reviewer for this draft. The General Area
> Review Team (Gen-ART) reviews all IETF documents being processed
> by the IESG for the IETF Chair.  Please treat these comments just
> like any other last call comments.
> 
> For more information, please see the FAQ at
> 
> <https://wiki.ietf.org/en/group/gen/GenArtFAQ>.
> 
> Document: draft-ietf-dnsop-caching-resolution-failures-??
> Reviewer: Lucas Pardue
> Review Date: 2023-08-11
> IETF LC End Date: 2023-08-17
> IESG Telechat date: Not scheduled for a telechat
> 
> Summary: The document was well-written with clear motivation statements and
> normative text for addressing the indicated problems

Hi Lucas, thanks for the detailed review.


> 
> Major issues: None
> 
> Minor issues:
> 
> * Section 3.1 describes retries and places the normative requirement "A
> resolver MUST NOT retry a given query to a server address over a given
> transport protocol more than ...". However, the definition of "transport
> protocol" is not 100% clear to me, and the terms "transport" and "transport
> layer protocol" seem to be used interchangeably through the document.  Perhaps
> this is clearer to those in the DNS area, but as a transport area person, DNS
> over TCP and DNS over TLS both use the same transport protocol. Section 2.3
> would seem to imply that DNS over TCP and DNS over TLS are treated as different.
> 
> I think it would help to better define exactly what "a given transport
> protocol" in section 3.1 means. Perhaps that definition already exists
> somewhere that can be cited and imported into the terminology section.

You’re right that we have not been especially precise when using the word “transport.”
The authors did intend for DNS over UDP, over TCP, over TLS, etc. to be treated
as separate transports, that is, separate ways a client can talk to a server.

I’m not sure how best to fix this.  On one hand, as far as we know, there is
currently not a good term that collectively refers to DNS over UDP, TCP, TLS, HTTPS,
QUIC, and whatever else may come our way.  So maybe we need to define one.  I’m
hesitant, though, because I’m not sure this document is where such a term should
be introduced, and because definitions often turn out to be like cans of worms.

Nonetheless, we have taken a stab at it:

   *  DNS Transport: In this document, DNS transport means a protocol
      used to transport DNS messages between a client and a server.
      This includes "classic DNS" transports, i.e., DNS-over-UDP and
      DNS-over-TCP [RFC1034] [RFC7766], as well as newer encrypted DNS
      transports such as DNS-over-TLS [RFC7858], DNS-over-HTTPS
      [RFC8484], DNS-over-QUIC [RFC9250], and similar communication of
      DNS messages using other protocols.  NOTE: at the time of this
      writing not all DNS transports are standardized for all types of
      servers, but may become standardized in the future.

…

3.1.  Retries and Timeouts

   A resolver MUST NOT retry a given query to a server address over a
   given DNS transport more than twice (i.e., three queries in total)
   before considering the server address unresponsive over that DNS
   transport for that query.

   A resolver MAY retry a given query over a different DNS transport to
   the same server if it has reason to believe the DNS transport is
   available for that server and is compatible with the resolver's
   security policies.
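
As a rough illustration of the revised Section 3.1 text (not part of the
draft), the retry limit can be sketched as a counter keyed on the query, the
server address, and the DNS transport. The names DnsTransport, RetryTracker,
and try_send are hypothetical, and the sketch assumes a resolver that consults
the tracker before every query it sends.

```python
from collections import defaultdict
from enum import Enum


class DnsTransport(Enum):
    """DNS transports named in the proposed definition (illustrative list)."""
    UDP = "DNS-over-UDP"
    TCP = "DNS-over-TCP"
    TLS = "DNS-over-TLS"
    HTTPS = "DNS-over-HTTPS"
    QUIC = "DNS-over-QUIC"


class RetryTracker:
    """Hypothetical helper: limits queries per (query, address, transport)."""

    def __init__(self, max_retries=2):
        # Two retries means three queries in total.
        self.max_queries = max_retries + 1
        self._sent = defaultdict(int)

    def try_send(self, query, server_addr, transport):
        """Return True and count the query if the limit allows sending it."""
        key = (query, server_addr, transport)
        if self._sent[key] >= self.max_queries:
            # Limit reached: treat the address as unresponsive over this
            # transport for this query.
            return False
        self._sent[key] += 1
        return True
```

Because the transport is part of the key, using up the limit over UDP does not
prevent a further attempt over TCP or TLS to the same address, which matches
the MAY in the second paragraph.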


> 
> Nits/editorial comments:
> 
> * In section 1, there exist "section 5" and "section 7" usages that do not
> make it clear whether these are internal or external references.

We propose to just remove those section references.

> 
> * I appreciated the text in sections 1.1 and 1.2, dealing with motivation and
> related use cases respectively. However, as a generalist reviewer, the most
> useful part of Section 1.1 was the first sentence. The remainder of the text in
> 1.1 feels like case studies, that while interesting manifestations, are not
> pure motivation. As a purely editorial suggestion you can take or leave,
> consider modifying the last paragraph of Section 1 to something like
> 
> "Operators of DNS services have known for some time that recursive resolvers
> become more aggressive when they experience resolution failures; see Appendix A
> for a collection of anecdotes, experiments, and incidents that support this claim.
> This document updates [RFC2308] to require negative caching of DNS resolution
> failures, which can help to mitigate the operational problems failures might
> generate. Examples of resolution failures are provided in Section 2. Related
> work is described in Appendix B."
> 
> then move the text from sections 1.1 and 1.2 in appendix A and appendix B.

That is an interesting suggestion.  After discussing it with my coauthors, we
have a slight preference to leave it as is, but would also like to take advice
on this from the RFC Editor.


> 
> * TOC - "Conditions That Lead To DNS Resolution Failures" vs "Requirements for
> Caching Resolution Failures". Presumably the same thing, so consistency might
> help

I’m not sure I understand this comment.  Can you explain further what you mean?


> 
> * Section 3.2 - regarding the 1 second minimum requirement, the text that
> follows says "Resolvers MAY cache different types of resolution failures for
> different (i.e, longer) amounts of time." and then later "Consistent with
> [RFC2308], resolution failures MUST NOT be cached for longer than 5 minutes.".
> These statements are all logically consistent but could be made simpler with
> some editorial work. For example, something like
> 
> "Resolvers MUST cache resolution failures for at least 1 second. Resolvers MAY
> cache failures for a longer time, up to a maximum of 5 minutes (per the
> requirements of [RFC2308]). Resolvers MAY cache different types of failures
> using different time periods within this range."

I see what you’re saying.  We propose to move the maximum caching time up and split that paragraph into two, as follows:

   Resolvers MUST cache resolution failures for at least 1 second.
   Resolvers MAY cache different types of resolution failures for
   different (i.e., longer) amounts of time.  Consistent with [RFC2308],
   resolution failures MUST NOT be cached for longer than 5 minutes.

   The minimum cache duration SHOULD be configurable by the operator.  A
   longer cache duration for resolution failures will reduce the
   processing burden from repeated queries, but may also increase the
   time to recover from transitory issues.
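
For illustration only (this is not draft text), the bounds above amount to
clamping the cache duration between a configurable minimum and the 5-minute
ceiling. The function name failure_cache_ttl and its parameters are
hypothetical; the sketch assumes durations are expressed in seconds.

```python
MIN_FAILURE_TTL = 1.0    # seconds: failures MUST be cached at least this long
MAX_FAILURE_TTL = 300.0  # seconds: failures MUST NOT be cached longer (5 min)


def failure_cache_ttl(requested, configured_min=MIN_FAILURE_TTL):
    """Clamp a requested failure-cache duration to the allowed range.

    configured_min models the operator-configurable minimum; it is itself
    kept within the 1 second to 5 minute range.
    """
    floor = min(max(configured_min, MIN_FAILURE_TTL), MAX_FAILURE_TTL)
    return min(max(requested, floor), MAX_FAILURE_TTL)
```

With the defaults, a requested duration of 0.2 seconds is raised to 1 second
and a requested 900 seconds is cut to 300, while an operator minimum of 5
seconds raises a requested 2 seconds to 5.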

DW