Re: [DNSOP] Working Group Last call for draft-ietf-dnsop-dns-error-reporting

Benno Overeinder <benno@NLnetLabs.nl> Mon, 17 July 2023 13:21 UTC

Message-ID: <2587d696-bd1a-6c25-6837-0f6269bdb813@NLnetLabs.nl>
Date: Mon, 17 Jul 2023 15:21:01 +0200
MIME-Version: 1.0
Content-Language: en-GB
To: DNSOP Working Group <dnsop@ietf.org>
References: <ZJn_cwWWOKIn1wbq@straasha.imrryr.org> <76E9FBC8-9F6D-4050-9C6F-E92A2CBEB326@dnss.ec> <ZKw40DEHBUfBEoUI@straasha.imrryr.org> <1583409F-8F04-4172-B9A1-94D9900402AB@dnss.ec> <ZKyHyo4Mb8I34rZI@straasha.imrryr.org>
Cc: DNSOP Chairs <dnsop-chairs@ietf.org>
X-Soverin-Authenticated: true
From: Benno Overeinder <benno@NLnetLabs.nl>
In-Reply-To: <ZKyHyo4Mb8I34rZI@straasha.imrryr.org>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/xGX9lFIQQr0hjrubUMimrSfhJyg>
Subject: Re: [DNSOP] Working Group Last call for draft-ietf-dnsop-dns-error-reporting

Dear WG,

This ends the WGLC for draft-ietf-dnsop-dns-error-reporting.

The last call has been extended a bit longer than initially planned, but
valuable feedback has been received from the WG on the draft.  Thank
you very much.

The authors published a -05 revision a week ago that incorporates the 
feedback.  Some issues may need to be addressed in a subsequent revision 
before the document is sent to the IESG.  We are coordinating this 
further with the authors.

Best regards,

-- Benno

On 11/07/2023 00:35, Viktor Dukhovni wrote:
> On Mon, Jul 10, 2023 at 10:27:45PM +0100, Roy Arends wrote:
> 
>>> Right, but surely the monitoring agent can decide whether to solicit
>>> such a prefix label or not.  That is, whether an "_er" prefix label is
>>> signalled is a *local matter* between the authoritative server
>>> signalling the option and the monitoring agent.
>>
>> I agree that a monitoring agent can specify a domain that may include
>> a separator as the least significant label. However, it also requires
>> the monitoring agent to understand that it should sometimes include
>> this separator, and that it may be redundant at other times.
> 
> If all the monitoring agent's "customers" (authoritative servers that
> return its "suffix" in the new option) are informed to signal an
> "_er.agent.example" name, there's no "sometimes".  The agent, by mutual
> agreement with the nameservers it supports, can choose whatever suffix
> format meets its needs, fixed across all customers, or customer-specific.
> 
> I haven't yet seen a reason to insist on a fixed suffix pattern.  The
> resolver just stutters back the suffix it was handed by the
> authoritative server's extension payload.  What problem does mandating
> the least significant label of the suffix solve, that can't be solved by
> just signalling the desired suffix, special label and all?
> 
>> It assumes that those running the authoritative server that returns
>> the agent domain and those that run the reporting agent are in sync.
>> Those are a lot of assumptions.
> 
> If they're not in sync, surely reporting will be broken, whether or not
> an "_er" suffix label is used.
> 
>>>   Why should resolvers have to be responsible for this?
>>
>> Because this separating label is trivial to include and avoids a lot of hassle.
> 
> The hassle in question remains unclear.  I see two relevant/likely
> deployment models:
> 
>      * Self-hosted reporting, directly by the authoritative server:
> 
>          - Error reports are special by virtue of a dedicated qname
>            suffix and perhaps qtype.
> 
>          - No special coördination required, the server both publishes
>            and consumes the error reporting suffix.
> 
>      * Outsourced/centralised reporting, via server IPs dedicated to
>        error report processing.
> 
>          - Here again no need for "_er", because all queries are
>            presumptively error reports, and if the signal from the
>            "customer" auth server was wrong (whether or not an "_er"
>            label is included) the error report will not be handled
>            correctly.
> 
>          - If the signal has the correct (mutually agreed) suffix,
>            again no problem.
> 
>          - And of course the monitoring agent can specify the use
>            of "_er" (or whatever) if that's convenient.
> 
> What use-case actually benefits from the "_er" LSL (least-significant
> label) in the signal?  How is this benefit not obtained by mutual
> agreement between the monitoring agent and its customers?
> 
>>>> The sole purpose of the leading “least-significant” “_er” is to
>>>> distinguish between qname-minimized queries (for lack of a better
>>>> term) and “full” queries. I understand that you argue that a
>>>> monitoring agent can determine this without the _er labels (as
>>>> described below), but that seems suboptimal to me.
>>>
>>> The qname minimised query (whether or not a dedicated qtype is used)
>>> will be for "A" or "NS" records, not TXT or the dedicated qtype, so
>>> there's no need for "_er" in the first label, the qtype is sufficient.
>>
>> RFC9156 contains no hard requirement to use A/NS. So I’m not confident
>> that all current and future qname-minimisation implementations use
>> A/NS.
> 
> This is where this document can specify that qname minimised error
> reports MUST use a qtype other than the qtype for the final error
> report.
> 
>>> However, to avoid forwarding junk reports to the monitoring agent, a
>>> resolver may well sensibly choose to not forward such queries, and
>>> only source them internally.
>>
>> I’m not following.
> 
> If the qtype is "TXT", then an open resolver is easily subject to
> proxying forged error reports purporting errors that the resolver did
> not observe.  Some client of the open resolver sends an explicit query
> for:
> 
>      <error-reporting-qname>. IN TXT ?
> 
> which then looks like an error report from *that* resolver to the
> monitoring agent.  If instead we have a dedicated qtype for error
> reports, it becomes a simple matter of refusing to iterate queries for
> 
>      <whatever>. IN <ERTYPE> ?
> 
> Any resolver wanting to report an error must do so directly, not via a
> forwarder.  Especially because the forwarders won't be passing the
> agent extension through to their clients!
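A minimal sketch of the refusal rule described above, not taken from the draft. It assumes a hypothetical `ERTYPE` code point (65280, picked from the private-use RR type range, since no dedicated qtype has been assigned in this thread) and a hypothetical `Query` record that notes whether a query arrived from a client or was generated by the resolver itself:

```python
from dataclasses import dataclass

# Assumption: placeholder code point from the private-use range; no
# dedicated error-reporting qtype has actually been allocated here.
ERTYPE = 65280
TXT = 16


@dataclass(frozen=True)
class Query:
    qname: str
    qtype: int
    from_client: bool  # True if received from a client, False if locally generated


def should_iterate(query: Query) -> bool:
    """Decide whether the resolver resolves this query upstream.

    Client-supplied queries for the dedicated qtype are refused, so a
    client cannot make the resolver relay a forged error report.
    Reports the resolver generates itself are still sent.
    """
    if query.qtype == ERTYPE and query.from_client:
        return False
    return True
```

With TXT as the report qtype no such filter is available, because a client query for TXT at the reporting name is indistinguishable from any other TXT lookup.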
> 
>>> The specification might also recommend that "stub" resolvers that
>>> forward most queries to a "full service" resolver, should send error
>>> reports *directly* to the monitoring agent.  And, of course, "full
>>> service" resolvers MUST NOT *forward* the monitoring agent OPTION to
>>> clients, if they send such an option, it should be locally generated
>>> to signal the monitoring agent for the resolver itself.
>>
>> I’m not following.
> 
> In a forwarder chain:
> 
>      stub resolver  <->  full-service resolver  <->  auth server
> 
> When the stub resolver wants to report an error, it must contact the
> monitoring agent directly, rather than pass it to the full-service
> resolver.  Any agent suffix it receives from the full-service resolver
> will be the monitoring agent for **that** resolver, not the auth server,
> and the reports need to go directly to the endpoint specified by the
> authoritative server!
> 
> [ Admittedly, in practice stub resolvers are not likely to make
>    error reports, and forwarders are unlikely to solicit them. ]
> 
> 
>>>> Allocating a new QTYPE for this purpose just seems redundant.
>>>
>>> It is not.  This is not a normal query, it is an error report.
>>
>> However, it is a normal query though. All the intermediates
>> (forwarders, caches, authoritative servers) have no idea that this
>> query is any different than others. There is nothing special in this
>> query. I really want to avoid OPCODE subtyping by qtype.
> 
> But that's a problem, because forwarding of error reports masks the
> origin IP, with problem reports then misattributed to the edge resolver,
> that may have had no problems resolving the reported name, and may be
> misused by its clients to forge such reports.
> 
>>> I would strongly prefer a dedicated qtype (with support from Puneet
>>> Sood).  However, if the WG consensus is TXT, we'll grudgingly cope.
>>> Would it make sense to raise this narrow question by the chairs as a
>>> consensus call?
>>
>> To me, a dedicated qtype vs TXT seems like bike-shedding.
> 
> I disagree.  We're not disagreeing on cosmetic details of the name of a
> new qtype, rather we're disagreeing on whether to overload TXT, which
> is a substantive difference.
> 
>>> I did not see a response to the point about moving the info code to the
>>> least-significant label in the query (first or right after the leading
>>> "_er" if despite my exhortations that's retained).
>>
>> The purpose of keeping the info code right before the separating _er
>> label is that it helps to separate incoming reports by “severeness”,
>> as in “lame delegation” reports go here, “expired RRSIG” reports go
>> there. This can all be delegated nicely by the monitoring agent.
> 
> Though lexically last, THIS is the point I want to most strongly
> emphasise.  Putting the info code in the MSL (most significant label) of
> the error qname prefixed to the agent suffix breaks NXDOMAIN caching,
> because we now have 65536 parent info codes for each domain that the
> agent does not serve:
> 
>      *.ru.0._er.agent.example. ; signal == _er.agent.example.
>      *.ru.1._er.agent.example. ; signal == _er.agent.example.
>      *.ru.2._er.agent.example. ; signal == _er.agent.example.
>      ....
>      *.ru.65535._er.agent.example. ; signal == _er.agent.example.
> 
> Whereas, instead and with no loss of ability to group errors by severity
> (indeed the LSL is parsed first!) the agent could return NXDOMAIN for:
> 
>      *.ru._er.agent.example. ; signal == _er.agent.example.
> 
> and be rid of all "*.ru" reports.
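The caching argument above can be sketched as follows. This is an illustration under assumptions, not the draft's wire format: the position of the qtype label, the function names, and the `AGENT_SUFFIX` constant are hypothetical, and the suffix is the "_er.agent.example." signal used in the example above. The sketch counts how many distinct names the agent must answer NXDOMAIN for in order to cover every report about one TLD.

```python
# Assumption: the signalled suffix from the example above.
AGENT_SUFFIX = "_er.agent.example."


def qname_info_next_to_suffix(qtype, failing_name, info_code, suffix=AGENT_SUFFIX):
    """Report name with the info code adjacent to the agent suffix."""
    return f"{qtype}.{failing_name.rstrip('.')}.{info_code}.{suffix}"


def qname_info_leftmost(qtype, failing_name, info_code, suffix=AGENT_SUFFIX):
    """Report name with the info code moved to the least significant label."""
    return f"{info_code}.{qtype}.{failing_name.rstrip('.')}.{suffix}"


def nxdomain_cuts(tld, info_leftmost, info_codes=range(65536), suffix=AGENT_SUFFIX):
    """Names whose NXDOMAIN answer covers every report about `tld`."""
    if info_leftmost:
        # One cut: everything below <tld>.<suffix> is covered.
        return {f"{tld}.{suffix}"}
    # One cut per info code, because the code sits above the TLD label.
    return {f"{tld}.{code}.{suffix}" for code in info_codes}
```

Under these assumptions, a resolver caching negative answers needs up to 65536 entries per unserved TLD in the first layout and a single entry in the second.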
> 
>>>> Viktor, your optimisations (removing the _er labels) are premature as
>>>> it turns a deterministic process at the monitoring agent into a
>>>> heuristic process.
>>>
>>> I don't see how it becomes heuristic.  The dedicated qtype signals a
>>> complete error reporting query, other qtypes are minimised variants.
> 
> There's no heuristic.  The agent knows what suffix(es) it serves, and
> strips that suffix to recover the error report.
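A minimal sketch of that suffix-stripping step, assuming the agent holds a list of the suffixes it serves. The function names and the example suffixes in the usage note are hypothetical. Matching is done label by label, so a name such as "xagent.example." is not mistaken for a name below "agent.example.":

```python
def to_labels(name):
    """Split a domain name into lower-cased labels, ignoring the root dot."""
    return name.lower().rstrip(".").split(".")


def strip_served_suffix(qname, served_suffixes):
    """Return the report labels left of the longest matching served suffix.

    Returns None when the query name is not strictly below any suffix
    the agent serves.
    """
    labels = to_labels(qname)
    # Try suffixes with the most labels first, so customer-specific
    # suffixes win over a shared parent suffix.
    candidates = sorted((to_labels(s) for s in served_suffixes), key=len, reverse=True)
    for suffix in candidates:
        if len(labels) > len(suffix) and labels[-len(suffix):] == suffix:
            return labels[:-len(suffix)]
    return None
```

For instance, with served suffixes "agent.example." and "cust1.agent.example.", a query for "7.1.broken.test.cust1.agent.example." yields the labels 7, 1, broken, test. No guessing is involved, since the suffix set is the agent's own configuration.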
> 
>> Again, there is no guarantee that a minimised variant does not use the
>> dedicated qtype. It is simply easier to recognise a minimised variant
>> by checking if the QNAME starts with _er. This is far more reliable
>> than assuming a dedicated QTYPE is not minimised.
> 
> Though I think the leading "_er" is redundant, it is mostly harmless,
> I'd prefer to see it go, but will grudgingly accept it staying.
> 
> The main thing is to move the info code to the LSL (least significant
> label), modulo any final (redundant) "_er" prefix (the complete query
> should be distinguished by its qtype).
> 
> Also, resolvers SHOULD NOT do query minimisation below the signalled
> error reporting suffix in the first place.  Save everyone needless
> latency and potential ENT issues.  Let's specify that too.
>
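The last suggestion, no query minimisation below the signalled suffix, could look roughly like this. It is a simplified sketch: the function name is hypothetical, and it minimises from the root label by label rather than from a cached zone cut as a real resolver would. Once the signalled suffix is reached, the full report name is sent in a single step.

```python
def queries_to_send(report_qname, signalled_suffix):
    """List the query names sent, minimising only down to the suffix."""
    labels = report_qname.lower().rstrip(".").split(".")
    suffix = signalled_suffix.lower().rstrip(".").split(".")
    if len(labels) <= len(suffix) or labels[-len(suffix):] != suffix:
        raise ValueError("report name is not below the signalled suffix")
    # Minimised steps: one extra label at a time, up to the suffix itself.
    steps = [".".join(suffix[-n:]) + "." for n in range(1, len(suffix) + 1)]
    # Then the complete report name, with no intermediate steps.
    steps.append(".".join(labels) + ".")
    return steps
```

This skips the intermediate names between the suffix and the full report name, which is where the extra latency and potential empty non-terminal (ENT) issues mentioned above would otherwise arise.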