Re: [DNSOP] Working Group Last call for draft-ietf-dnsop-dns-error-reporting

Benno Overeinder <benno@NLnetLabs.nl> Mon, 17 July 2023 13:21 UTC

Message-ID: <2587d696-bd1a-6c25-6837-0f6269bdb813@NLnetLabs.nl>
Date: Mon, 17 Jul 2023 15:21:01 +0200
MIME-Version: 1.0
Content-Language: en-GB
To: DNSOP Working Group <dnsop@ietf.org>
References: <ZJn_cwWWOKIn1wbq@straasha.imrryr.org> <76E9FBC8-9F6D-4050-9C6F-E92A2CBEB326@dnss.ec> <ZKw40DEHBUfBEoUI@straasha.imrryr.org> <1583409F-8F04-4172-B9A1-94D9900402AB@dnss.ec> <ZKyHyo4Mb8I34rZI@straasha.imrryr.org>
Cc: DNSOP Chairs <dnsop-chairs@ietf.org>
From: Benno Overeinder <benno@NLnetLabs.nl>
In-Reply-To: <ZKyHyo4Mb8I34rZI@straasha.imrryr.org>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/xGX9lFIQQr0hjrubUMimrSfhJyg>
Precedence: list

Dear WG,

This ends the WGLC for draft-ietf-dnsop-dns-error-reporting.

The last call ran a bit longer than initially planned, but the WG 
has provided valuable feedback on the draft.  Thank you very much.

The authors published a -05 revision a week ago that incorporates the 
feedback.  Some issues may need to be addressed in a subsequent revision 
before the document is sent to the IESG.  We are coordinating this 
further with the authors.

Best regards,

-- Benno

On 11/07/2023 00:35, Viktor Dukhovni wrote:
> On Mon, Jul 10, 2023 at 10:27:45PM +0100, Roy Arends wrote:
> 
>>> Right, but surely the monitoring agent can decide whether to solicit
>>> such a prefix label or not.  That is, whether an "_er" prefix label
>>> is signalled is a *local matter* between the authoritative server
>>> signalling the option and the monitoring agent.
>>
>> I agree that a monitoring agent can specify a domain that may include
>> a separator as the least significant label. However, it also requires
>> the monitoring agent to understand that it should sometimes include
>> this separator, and that it may be redundant at other times.
> 
> If all the monitoring agent's "customers" (authoritative servers that
> return its "suffix" in the new option) are informed to signal an
> "_er.agent.example" name, there's no "sometimes".  The agent, by mutual
> agreement with the nameservers it supports, can choose whatever suffix
> format meets its needs, fixed across all customers, or customer-specific.
> 
> I haven't yet seen a reason to insist on a fixed suffix pattern.  The
> resolver just stutters back the suffix it was handed by the
> authoritative server's extension payload.  What problem does mandating
> the least significant label of the suffix solve, that can't be solved by
> just signalling the desired suffix, special label and all?
> 
>> It assumes that those running the authoritative server that returns
>> the agent domain and those that run the reporting agent are in sync.
>> Those are a lot of assumptions.
> 
> If they're not in sync, surely reporting will be broken, whether or not
> an "_er" suffix label is used.
> 
>>>   Why should resolvers have to be responsible for this?
>>
>> Because this separating label is trivial to include and avoids a lot of hassle.
> 
> The hassle in question remains unclear.  I see two relevant/likely
> deployment models:
> 
>      * Self-hosted reporting, directly by the authoritative server:
> 
>          - Error reports are special by virtue of a dedicated qname
>            suffix and perhaps qtype.
> 
>          - No special coördination required, the server both publishes
>            and consumes the error reporting suffix.
> 
>      * Outsourced/centralised reporting, via server IPs dedicated to
>        error report processing.
> 
>          - Here again no need for "_er", because all queries are
>            presumptively error reports, and if the signal from the
>            "customer" auth server was wrong (whether or not an "_er"
>            label is included) the error report will not be handled
>            correctly.
> 
>          - If the signal has the correct (mutually agreed) suffix,
>            again no problem.
> 
>          - And of course the monitoring agent can specify the use
>            of "_er" (or whatever) if that's convenient.
> 
> What use-case actually benefits from the "_er" LSL (least-significant
> label) in the signal?  How is this benefit not obtained by mutual
> agreement between the monitoring agent and its customers?
> 
>>>> The sole purpose of the leading “least-significant” “_er” is to
>>>> distinguish between qname-minimized queries (for lack of a better
>>>> term) and “full” queries. I understand that you argue that a
>>>> monitoring agent can determine this without the _er labels (as
>>>> described below), but that seems suboptimal to me.
>>>
>>> The qname minimised query (whether or not a dedicated qtype is used)
>>> will be for "A" or "NS" records, not TXT or the dedicated qtype, so
>>> there's no need for "_er" in the first label, the qtype is sufficient.
>>
>> RFC9156 contains no hard requirement to use A/NS. So I’m not confident
>> that all current and future qname-minimisation implementations use
>> A/NS.
> 
> This is where this document can specify that qname minimised error
> reports MUST use a qtype other than the qtype for the final error
> report.
> 
>>> However, to avoid forwarding junk reports to the monitoring agent, a
>>> resolver may well sensibly choose to not forward such queries, and
>>> only source them internally.
>>
>> I’m not following.
> 
> If the qtype is "TXT", then an open resolver can easily be made to
> proxy forged error reports purporting errors that the resolver did
> not observe.  Some client of the open resolver sends an explicit query
> for:
> 
>      <error-reporting-qname>. IN TXT ?
> 
> which then looks like an error report from *that* resolver to the
> monitoring agent.  If instead we have a dedicated qtype for error
> reports, it becomes a simple matter of refusing to iterate queries for
> 
>      <whatever>. IN <ERTYPE> ?
> 
> Any resolver wanting to report an error must do so directly, not via a
> forwarder.  Especially because the forwarders won't be passing the
> agent extension through to their clients!
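
A minimal sketch of the refusal described above, assuming a dedicated error-report qtype existed. The value of ERTYPE (taken from the private-use qtype range), the handle_client_query name, and the iterate callback are all hypothetical; nothing here is defined by the draft.

```python
# Hypothetical dedicated error-report qtype; 65280 is in the private-use range.
ERTYPE = 65280

NOERROR = 0
REFUSED = 5


def handle_client_query(qname, qtype, iterate):
    """Answer a client query, refusing to iterate error-report queries.

    `iterate` is a stand-in for the resolver's normal recursion path.
    """
    if qtype == ERTYPE:
        # Reports must be sent by whoever observed the error, never proxied.
        return REFUSED, None
    return NOERROR, iterate(qname, qtype)
```

With TXT as the report qtype no such check is possible, since the resolver cannot tell a forged report from an ordinary TXT lookup by qtype alone.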
> 
>>> The specification might also recommend that "stub" resolvers that
>>> forward most queries to a "full service" resolver, should send error
>>> reports *directly* to the monitoring agent.  And, of course, "full
>>> service" resolvers MUST NOT *forward* the monitoring agent OPTION to
>>> clients; if they send such an option, it should be locally generated
>>> to signal the monitoring agent for the resolver itself.
>>
>> I’m not following.
> 
> In a forwarder chain:
> 
>      stub resolver  <->  full-service resolver  <->  auth server
> 
> When the stub resolver wants to report an error, it must contact the
> monitoring agent directly, rather than pass it to the full-service
> resolver.  Any agent suffix it receives from the full-service resolver
> will be the monitoring agent for **that** resolver, not the auth server,
> and the reports need to go to the authoritative server for the specified
> endpoint directly!
> 
> [ Admittedly, in practice stub resolvers are not likely to make
>    error reports, and forwarders are unlikely to solicit them. ]
> 
> 
>>>> Allocating a new QTYPE for this purpose just seems redundant.
>>>
>>> It is not.  This is not a normal query, it is an error report.
>>
>> However, it is a normal query though. All the intermediates
>> (forwarders, caches, authoritative servers) have no idea that this
>> query is any different than others. There is nothing special in this
>> query. I really want to avoid OPCODE subtyping by qtype.
> 
> But that's a problem, because forwarding of error reports masks the
> origin IP, with problem reports then misattributed to the edge resolver,
> that may have had no problems resolving the reported name, and may be
> misused by its clients to forge such reports.
> 
>>> I would strongly prefer a dedicated qtype (with support from Puneet
>>> Sood).  However, if the WG consensus is TXT, we'll grudgingly cope.
>>> Would it make sense to raise this narrow question by the chairs as a
>>> consensus call?
>>
>> To me, a dedicated qtype vs TXT seems like bike-shedding.
> 
> I disagree.  We're not disagreeing on cosmetic details of the name of a
> new qtype, rather we're disagreeing on whether to overload TXT, which
> is a substantive difference.
> 
>>> I did not see a response to the point about moving the info code to the
>>> least-significant label in the query (first or right after the leading
>>> "_er" if despite my exhortations that's retained).
>>
>> The purpose of keeping the info code right before the separating _er
>> label is that it helps to separate incoming reports by “severity”,
>> as in “lame delegation” reports go here, “expired RRSIG” reports go
>> there. This can all be delegated nicely by the monitoring agent.
> 
> Though lexically last, THIS is the point I want to most strongly
> emphasise.  Putting the info code in the MSL (most significant label) of
> the error qname prefixed to the agent suffix breaks NXDOMAIN caching,
> because we now have 65536 parent info codes for each domain that the
> agent does not serve:
> 
>      *.ru.0._er.agent.example. ; signal == _er.agent.example.
>      *.ru.1._er.agent.example. ; signal == _er.agent.example.
>      *.ru.2._er.agent.example. ; signal == _er.agent.example.
>      ....
>      *.ru.65535._er.agent.example. ; signal == _er.agent.example.
> 
> Whereas, instead and with no loss of ability to group errors by severity
> (indeed the LSL is parsed first!) the agent could return NXDOMAIN for:
> 
>      *.ru._er.agent.example. ; signal == _er.agent.example.
> 
> and be rid of all "*.ru" reports.
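
The caching argument above can be sketched by counting how many distinct NXDOMAIN cut points an agent needs in order to reject every report about names under one TLD. This is an illustration only: the function names and the reported name "broken.ru" are hypothetical, the qtype label is omitted for brevity, and info codes are assumed to be 16-bit values.

```python
AGENT = "_er.agent.example."  # suffix signalled by the authoritative server


def qname_info_msl(reported, info, agent=AGENT):
    """Info code as the most significant label, just before the suffix."""
    return f"{reported.rstrip('.')}.{info}.{agent}"


def qname_info_lsl(reported, info, agent=AGENT):
    """Info code as the least significant (leftmost) label."""
    return f"{info}.{reported.rstrip('.')}.{agent}"


def tld_cut(qname, tld, agent=AGENT):
    """Shortest ancestor of qname that is specific to `tld`."""
    assert qname.endswith("." + agent)
    prefix = qname[: -len(agent) - 1].split(".")
    # Position of the TLD label closest to the agent suffix.
    idx = len(prefix) - 1 - prefix[::-1].index(tld)
    return ".".join(prefix[idx:]) + "." + agent


def cuts_needed(layout, tld="ru", info_codes=range(65536)):
    """Number of distinct NXDOMAIN names needed to cover all info codes."""
    return len({tld_cut(layout(f"broken.{tld}", i), tld) for i in info_codes})
```

Under these assumptions, cuts_needed(qname_info_msl) is 65536 while cuts_needed(qname_info_lsl) is 1, which is the difference between one cacheable NXDOMAIN and one per info code.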
> 
>>>> Viktor, your optimisations (removing the _er labels) are premature as
>>>> it turns a deterministic process at the monitoring agent into a
>>>> heuristic process.
>>>
>>> I don't see how it becomes heuristic.  The dedicated qtype signals a
>>> complete error reporting query, other qtypes are minimised variants.
> 
> There's no heuristic.  The agent knows what suffix(es) it serves, and
> strips that suffix to recover the error report.
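
A minimal sketch of that suffix stripping, assuming the agent keeps a list of the suffixes it serves. The recover_report name and the example suffixes are hypothetical; matching is done label by label and case-insensitively, with the longest suffix tried first.

```python
def _labels(name):
    """Split a domain name into lowercase labels, ignoring the root dot."""
    return name.lower().rstrip(".").split(".")


def recover_report(qname, served_suffixes):
    """Return the labels left of a served suffix, or None if none matches."""
    labels = _labels(qname)
    # Try longer suffixes first so customer-specific suffixes win.
    for suffix in sorted(served_suffixes, key=lambda s: len(_labels(s)), reverse=True):
        s = _labels(suffix)
        if len(labels) > len(s) and labels[-len(s):] == s:
            return labels[: -len(s)]
    return None
```

For example, recover_report("7.broken.ru._er.agent.example.", ["_er.agent.example."]) yields the labels 7, broken, ru, and a name outside every served suffix yields None. No guessing is involved as long as the suffix list matches what the authoritative servers signal.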
> 
>> Again, there is no guarantee that a minimised variant does not use the
>> dedicated qtype. It is simply easier to recognise a minimised variant
>> by checking if the QNAME starts with _er. This is far more reliable
>> than assuming a dedicated QTYPE is not minimised.
> 
> Though I think the leading "_er" is redundant, it is mostly harmless.
> I'd prefer to see it go, but will grudgingly accept it staying.
> 
> The main thing is to move the info code to the LSL (least significant
> label), modulo any final (redundant) "_er" prefix (the complete query
> should be distinguished by its qtype).
> 
> Also, resolvers SHOULD NOT do query minimisation below the signalled
> error reporting suffix in the first place.  Save everyone needless
> latency and potential ENT issues.  Let's specify that too.
>
