Re: [DNSOP] Working Group Last call for draft-ietf-dnsop-dns-error-reporting

Viktor Dukhovni <ietf-dane@dukhovni.org> Mon, 26 June 2023 21:13 UTC

Date: Mon, 26 Jun 2023 17:13:23 -0400
From: Viktor Dukhovni <ietf-dane@dukhovni.org>
To: DNSOP Working Group <dnsop@ietf.org>
Message-ID: <ZJn_cwWWOKIn1wbq@straasha.imrryr.org>
Reply-To: dnsop@ietf.org
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <BN8PR15MB32811A729AA0F3D979C7B6EBB323A@BN8PR15MB3281.namprd15.prod.outlook.com> <ybl4jmzt7nt.fsf@wd.hardakers.net> <504D8E15-981A-4C18-AC47-E8E761DF7495@dnss.ec> <ybly1kswsrq.fsf@wx.hardakers.net> <D7DB2C7B-0CDD-443E-BE32-14F5D04389B9@nohats.ca> <14B39D5F-4E55-488C-AB79-0D4407FD3255@dnss.ec> <CAKW6Ri7HBG1KvDtf15vLsn8vqKbHZ6LtzUAU86RLgb4+m1YZpA@mail.gmail.com> <62DB4990-C49F-46C6-9A72-EBFACFB835B4@dnss.ec> <CAKW6Ri4+PgNKLgwPpJdGRgAoDm=qcqQAksdNjQJgvY552BzBEQ@mail.gmail.com> <CAKW6Ri6BnA1xpmQLpwepBDCZa=G0FnD-QtBqqskLc9NaOn3n8w@mail.gmail.com> <395A2004-803E-43CA-945E-F9C1EDE86F21@dnss.ec> <CAKW6Ri5da0Gnb=840U1h-E_1amt8HrJbGh9Tid-DQSsTpTqAvg@mail.gmail.com> <49112d32-e0c7-0ee0-9bdb-b1379fc8e7ce@nlnetlabs.nl> <B4F95679-2065-45E4-B214-970412665DDD@dnss.ec> <CAKW6Ri7yL7OSOuEnT4jG8UJ2acS9G1S2GR3A1KGw+_1y=XBNeQ@mail.gmail.com> <8fa8134b-97ee-beac-07f3-88362c32618f@NLnetLabs.nl> <6231D8A5-C131-4874-B9CA-104384B231CF@dnss.ec> <fa6ec641-0eab-dec6-2267-3ca818402812@NLnetLabs.nl>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/Wa49-pDagrjOs6OxVJi9FYo0Dog>
Subject: Re: [DNSOP] Working Group Last call for draft-ietf-dnsop-dns-error-reporting

On Thu, Jun 08, 2023 at 11:59:59AM +0200, Benno Overeinder wrote:

> This starts a two week Working Group Last Call process, and ends on: 
> June 22nd, 2023.

I hope my feedback is not too late.  There are a few important elements
of the draft that could use some changes.

On Tue, Jun 20, 2023 at 01:14:02PM +0200, Willem Toorop wrote:
>
> I have one nit.
> 
> In the Example in section 4.2., a request still "includes an empty ENDS0 
> report channel". The third paragraph of that same section states 
> something similar: "As support for DNS error reporting was indicated by 
> a empty EDNS0 report channel option in the request". But Section 6.1. 
> Reporting Resolver Specification states: "The EDNS0 report channel 
> option MUST NOT be included in queries."

On Tue, Jun 20, 2023 at 12:20:51PM +0100, Roy Arends wrote:
> 
> Ah, yes, I will remove that sentence completely!

So, under what conditions is the authoritative server free to include
the error reporting channel extension in its reply?

    - Does the resolver have to explicitly solicit it?

The reason this is important is that there is a non-negligible population
of authoritative servers that (EDNS0 requirements notwithstanding) are
not tolerant of unrecognised EDNS0 options.  Therefore, soliciting the
error reporting channel information is (at least initially, while this
is not widely supported) more likely to lead to errors than to help
resolve errors.  This is then not attractive to implement!

I would prefer to require resolvers to be more tolerant of unexpected 
options, and would have servers report the channel without explicit
solicitation.
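
A rough sketch of what "tolerant of unexpected options" could mean on
the receiving side: walk the {code, length, data} tuples in the OPT
RDATA and skip every code that is not recognised.  The function name is
illustrative, and the option code is passed in as a parameter because
nothing in this thread fixes a value for it.

```python
import struct


def find_edns_option(opt_rdata: bytes, wanted_code: int):
    """Return the data of the option `wanted_code`, or None if it is absent.

    Options with any other code are skipped rather than treated as errors.
    Only a malformed option list (bad length, stray bytes) raises ValueError.
    """
    offset = 0
    found = None
    while offset + 4 <= len(opt_rdata):
        # Each option is: 16-bit code, 16-bit length, then `length` bytes.
        code, length = struct.unpack_from("!HH", opt_rdata, offset)
        offset += 4
        if offset + length > len(opt_rdata):
            raise ValueError("truncated EDNS option")
        if code == wanted_code:
            found = opt_rdata[offset:offset + length]
        # Unknown codes fall through here and are simply stepped over.
        offset += length
    if offset != len(opt_rdata):
        raise ValueError("trailing bytes after last EDNS option")
    return found
```

With this shape, an unsolicited report channel option costs a resolver
that does not implement it nothing more than one skipped tuple.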

On Tue, Jun 20, 2023 at 11:35:28PM +0100, Dick Franks wrote:
> > An authoritative server includes the option if configured to do so
> > AND if it has the a non-null domain name configured as the reporting
> > channel. It will then reply to each query. This is IMHO better than
> > having a resolver include the option each and every time. Note that
> > resolvers will ignore options that are unknown to them.
> 
> 6.2.  Authoritative server specification
> Contains not a shred of normative language saying any of that.
> 
> The preliminary waffle in the overview could apply to either the
> solicited or unsolicited regime.
> 
> > > I withdraw my earlier statement that the document is almost ready.
> > > Now, clearly it is not.
> >
> > I hear you. I do not agree though, and I hope you reconsider
> Not without further work

I agree this needs to be made more explicit than just deleting the
conflicting text.

On Thu, Jun 22, 2023 at 04:10:46PM -0700, Wes Hardaker wrote:
> Roy Arends <roy@dnss.ec> writes:
> 
> > That, IMHO is already captured by the last paragraph. I did not
> > explicitly write a recipe of how to do that, and which servers could
> > be used for that :-). Could you suggest text to improve the last
> > paragraph without naming services?
> 
> Erg.  I hate it when I have to come up with text :-P
> 
> How about replacing the last sentence of security considerations with:
> 
> This method can be abused by intentionally deploying broken zones with
> agent domains that are delegated to victims.  This is particularly
> effective when DNS requests that trigger error messages are sent through
> open resolvers [RFC8499] or widely distributed network monitoring
> systems that perform distributed queries from around the globe.
> Implementations SHOULD rate-limit outgoing error messages to a
> recipient to no more than 1 a minute.

What is a "recipient"?  Is it a monitoring agent "zone", or a monitoring
agent transport endpoint?  If we're concerned about DoS, perhaps it
should be the latter, since many zones can resolve to the same set of
underlying nameservers...
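
To make the distinction concrete, here is a minimal sketch of a limiter
keyed on the transport endpoint (the nameserver IP address) rather than
on the agent zone.  The class name and the injectable clock are
illustrative assumptions; the 60-second default mirrors the "1 a minute"
in the proposed text.

```python
import time


class PerEndpointLimiter:
    """Allow at most one report per `interval` seconds to each endpoint."""

    def __init__(self, interval: float = 60.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last_sent = {}  # endpoint IP address -> time of last report

    def allow(self, endpoint_ip: str) -> bool:
        """Return True and record the send time if a report may go out now."""
        now = self.clock()
        last = self.last_sent.get(endpoint_ip)
        if last is not None and now - last < self.interval:
            return False
        self.last_sent[endpoint_ip] = now
        return True
```

Two agent zones served by the same address then share one budget, which
is the property that matters for DoS.  Eviction of idle entries is left
out of the sketch.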

On Fri, Jun 23, 2023 at 01:27:21AM +0000, Ben Schwartz wrote:

> I want this draft to move forward, but upon review I noted with
> concern the security section text:
> 
>    DNS error reporting is done without any authentication between the
>    reporting resolver and the authoritative server of the agent domain.
>    Authentication significantly increases the burden on the reporting
>    resolver without any benefit to the monitoring agent, authoritative
>    server or reporting resolver.
> 
> Strong authentication (e.g. to a zone identity with DNSSEC) is
> probably excessive, but the current draft appears to have no defense
> against even trivial IP spoofing.  Anyone in the world who can spoof
> IP addresses can impersonate a reputable resolver and pollute the
> error reports sent to authoritative servers.  As an authoritative
> server operator, I would place a lot more trust in reports from
> reputable resolvers than from unrecognized sources.
> 
> I think the draft should probably say something like: "To defend
> against spoofing of source IP addresses used for error reports,
> reporting resolvers MUST use DNS over TCP [RFC 7766], DNS COOKIE [RFC
> 7873], or another procedure that defeats IP address spoofing."

Requiring cookies would be great, but they have not yet seen broad
adoption.  Can we reasonably expect the monitoring agent zones to
support them (and ensure consistent cookie keys across the server
pool behind each server IP)?

Requiring TCP, combined with per-IP rate limits, is probably simpler.

====== New feedback:

And last, but not least, as promised, some important suggestions to
simplify the protocol and improve scalability:


--- Section 4.  Overview

>  If the authoritative server has indicated support for DNS error
>  reporting and there is an issue that can be reported via extended DNS
>  errors, the reporting resolver encodes the error report in the QNAME
>  of the report query.  The reporting resolver builds this QNAME by
>  concatenating the _er label, the QTYPE, the QNAME that resulted in
>  failure, the extended error code (as described in [RFC8914]), the
>  label "_er" again, and the agent domain.  See the example in
>  Section 4.2.  Note that a regular RCODE is not included because the
>  RCODE is not relevant to the extended error code.

The proposed qname structure is suboptimal:

    - There is insufficient justification for the "_er" labels
      at either end of the error report qname.

        o  If the monitoring agent wants to see some particular prefix,
           (perhaps even periodically rotated to quickly drop stale
           junk) the authoritative server can vend the prefix with the
           agent domain.  So the "most-significant" parent "_er" is
           IMNHSO redundant.

        o  The leading "least-significant" "_er" is likewise (see below)
           not adequately justified.

        o Making the EDE "info code" more significant than the problem
          domain makes it harder to disclaim responsibility for an
          entire DNS subtree (say, all of "xn--p1ai.monitoring.example").

          Surely the reported domain is *more* significant than the EDE
          info code.

Therefore, a much better qname would be:

        <EDE-info-code>.<qtype>.<qname>.<agent-zone>.
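
For comparison, a small sketch that builds the report qname both ways.
The function names are made up for illustration, and the code works on
presentation-format strings only (no escaping, no label or name length
checks).

```python
def draft_report_qname(qtype: int, qname: str, info_code: int, agent_domain: str) -> str:
    """Layout from the draft: _er.<qtype>.<qname>.<info-code>._er.<agent-domain>."""
    labels = [
        "_er",
        str(qtype),
        qname.rstrip("."),
        str(info_code),
        "_er",
        agent_domain.rstrip("."),
    ]
    return ".".join(labels) + "."


def proposed_report_qname(qtype: int, qname: str, info_code: int, agent_zone: str) -> str:
    """Layout suggested here: <info-code>.<qtype>.<qname>.<agent-zone>."""
    labels = [
        str(info_code),
        str(qtype),
        qname.rstrip("."),
        agent_zone.rstrip("."),
    ]
    return ".".join(labels) + "."
```

In the proposed form the failing qname sits directly under the agent
zone, and the qtype and info code are the two least significant labels.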


>  The resulting report query is sent as a standard DNS query for a TXT
>  DNS resource record type by the reporting resolver.

Also, qtypes are cheap, and I rather think that a dedicated qtype (one
that a supporting resolver might refuse to accept in queries from
clients for example) makes sense here.  There's no need to overload
TXT here.

>  This document gives no guidance on the content of the TXT resource
>  record RDATA for this record.

The dedicated qtype should have an empty payload.

>  If the monitoring agent were to respond with NXDOMAIN (name error),
>  [RFC8020] says that any name at or below that domain should be
>  considered unreachable, and negative caching would prohibit
>  subsequent queries for anything at or below that domain for a period
>  of time, depending on the negative TTL [RFC2308].

As mentioned above, making the "info-code" more significant than the
domain gets in the way here.
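
A sketch of why the ordering matters for NXDOMAIN cuts.  The report
names below are hypothetical: two names under "xn--p1ai", reported with
EDE info codes 7 and 9, against an agent zone of "monitoring.example".

```python
def is_at_or_below(name: str, ancestor: str) -> bool:
    """True if `name` equals `ancestor` or lies beneath it (case-insensitive)."""
    n = name.rstrip(".").lower().split(".")
    a = ancestor.rstrip(".").lower().split(".")
    return len(n) >= len(a) and n[-len(a):] == a


# Same two reports, written in the proposed layout and in the draft layout.
PROPOSED = [
    "7.1.a.xn--p1ai.monitoring.example.",
    "9.1.b.xn--p1ai.monitoring.example.",
]
DRAFT = [
    "_er.1.a.xn--p1ai.7._er.monitoring.example.",
    "_er.1.b.xn--p1ai.9._er.monitoring.example.",
]


def covered(reports, cut):
    """Reports that a single NXDOMAIN at `cut` would suppress (RFC 8020)."""
    return [r for r in reports if is_at_or_below(r, cut)]
```

In the proposed layout one cut at "xn--p1ai.monitoring.example." covers
both reports.  In the draft layout each cut sits beneath a specific
info-code label, so a separate NXDOMAIN is needed per info code.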

>  The reporting resolver constructs the QNAME
>  "_er.1.broken.test.7._er.a01.agent-domain.example." and resolves it.
>  This QNAME indicates extended DNS error 7 occurred while trying to
>  validate "broken.test." type 1 record.

Therefore, make that:

        7.1.broken.test.a01.agent-domain.example.


>  The QNAME for the report query is constructed by concatenating the
>  following elements, appending each successive element in the list to
>  the right-hand side of the QNAME:
>
>  *  A label containing the string "_er".
>
>  *  The QTYPE that was used in the query that resulted in the extended
>     DNS error, presented as a decimal value, in a single DNS label.
>
>  *  The QNAME that was used in the query that resulted in the extended
>     DNS error.  The QNAME may consist of multiple labels and is
>     concatenated as is, i.e. in DNS wire format.
>
>  *  The extended DNS error, presented as a decimal value, in a single
>     DNS label.
>
>  *  A label containing the string "_er".
>
>  *  The agent domain.  The agent domain as received in the EDNS0
>     report channel option set by the authoritative server.

See above, drop the pointless "_er" labels, and move the info code to
the leaf label.

>  The "_er" labels allow the monitoring agent to differentiate between
>  the agent domain and the faulty query name.  When the specified agent
>  domain is empty, or a null label (despite being not allowed in this
>  specification), the report query will have "_er" as a top-level
>  domain as a result and not the original query.  The purpose of the
>  first "_er" label is to indicate that a complete report query has
>  been received, instead of a shorter report query due to query
>  minimization.

Instead, note that qname-minimised queries will not have the same qtype
as the report query (be it TXT or dedicated); they'll typically be "A" or
"NS".  Also, the reporting resolver should avoid all qname minimisation
below the agent domain, unasking the question.
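
A sketch of how a monitoring agent might pick out complete reports
without the "_er" markers, using the qtype alone to discard minimised
probes.  The names are illustrative, the layout is the
<EDE-info-code>.<qtype>.<qname>.<agent-zone> form suggested above, and
"TXT" stands in for whichever report qtype ends up being used.

```python
from typing import NamedTuple, Optional


class Report(NamedTuple):
    info_code: int
    qtype: int
    qname: str


def parse_report(query_name: str, query_type: str, agent_zone: str,
                 report_type: str = "TXT") -> Optional[Report]:
    """Return the decoded report, or None if the query is not a full report."""
    if query_type != report_type:
        return None  # e.g. "A" or "NS" probes produced by qname minimisation
    name = query_name.rstrip(".").lower().split(".")
    zone = agent_zone.rstrip(".").lower().split(".")
    # Need the zone suffix plus info code, qtype and at least one qname label.
    if len(name) < len(zone) + 3 or name[-len(zone):] != zone:
        return None
    info_code, qtype, *rest = name[: len(name) - len(zone)]
    for field in (info_code, qtype):
        if not (field.isascii() and field.isdigit()):
            return None
    return Report(int(info_code), int(qtype), ".".join(rest) + ".")
```

Anything that arrives with a different qtype, or with too few labels
below the agent zone, is simply not a report.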

-- 
    Viktor.