Re: [DNSOP] Status of draft-ietf-dnsop-dns-error-reporting

Manu Bretelle <chantr4@gmail.com> Fri, 12 November 2021 17:18 UTC

MIME-Version: 1.0
References: <8A09A0DF-D915-45AD-AD57-229641F19120@dnss.ec> <CAArYzrLDZrjR2b9nvaxDn8vScb5TYLd54JrFtoCoLmvqikVVOQ@mail.gmail.com> <ef4dcfc8-0208-af8d-24d5-99fe7e3edb56@isc.org>
In-Reply-To: <ef4dcfc8-0208-af8d-24d5-99fe7e3edb56@isc.org>
From: Manu Bretelle <chantr4@gmail.com>
Date: Fri, 12 Nov 2021 09:18:20 -0800
Message-ID: <CAArYzrJJHpO+Je04PGNvPM_9gf_hetShe-mn36FTrhiOP0nRew@mail.gmail.com>
To: Petr Špaček <pspacek@isc.org>
Cc: Roy Arends <roy@dnss.ec>, dnsop <dnsop@ietf.org>, dnsop-chairs <dnsop-chairs@ietf.org>, Matt Larson <matt.larson@icann.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/VGkF9MbowGRmfV7G-2aDv3_7zp4>
Subject: Re: [DNSOP] Status of draft-ietf-dnsop-dns-error-reporting
Precedence: list

Hi,

On Fri, Nov 12, 2021 at 5:24 AM Petr Špaček <pspacek@isc.org> wrote:

> On 12. 11. 21 7:42, Manu Bretelle wrote:
> > Hi Roy,
> >
> > I like the idea of an out-of-band error reporting and therefore I like
> > the proposition of this draft.
> >
> > One of the things I have a hard time visualizing though is how this
> > could be used for more than reporting DNSSEC specific errors. With the
> > option not being signed in the first place, it does not seem that DNSSEC
> > is a requirement to be able to leverage this functionality, hence it
> > would be great to think how we can make this work for more than
> > DNSSEC-only errors.
>
> E.g. it can conceivably report errors like "resolver had to fall back to
> the Nth server because the first one we tried timed out". Is that a
> sufficient example?
>

I suppose it could. Another example, one that may already be covered by an
existing EDE code, is EDE Code 3, "Stale Answer":
https://www.rfc-editor.org/rfc/rfc8914.html#name-extended-dns-error-code-3-s

For some other codes I have a harder time seeing the value, for example EDE
Code 20, "Not Authoritative":
https://www.rfc-editor.org/rfc/rfc8914.html#name-extended-dns-error-code-20-
On one hand, this is a log you already have as an auth operator. On the
other hand, through the reporting endpoint, and ignoring possible abuses of
said endpoint, you would get a peek at the resolver's view rather than just
every unsolicited request that was sent to your auth server, which makes it
easier to track broken delegations.
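
To make the distinction concrete, here is a rough Python sketch of the kind
of split I have in mind. The code numbers and names are a subset taken from
RFC 8914; the DNSSEC_SPECIFIC grouping and the describe() helper are my own
assumptions for illustration and are not defined by any RFC or by the draft.

```python
# Subset of Extended DNS Error codes and names from RFC 8914.
EDE_NAMES = {
    3: "Stale Answer",
    6: "DNSSEC Bogus",
    7: "Signature Expired",
    20: "Not Authoritative",
    22: "No Reachable Authority",
}

# Assumption: my own grouping for illustration, not defined by any RFC.
DNSSEC_SPECIFIC = {6, 7}


def describe(code):
    """Return a one-line description of an EDE code and its grouping."""
    name = EDE_NAMES.get(code, "Unknown")
    kind = "DNSSEC-specific" if code in DNSSEC_SPECIFIC else "not DNSSEC-specific"
    return f"EDE {code} ({name}): {kind}"


for code in (3, 20, 7):
    print(describe(code))
```

Codes 3 and 20 land in the "not DNSSEC-specific" group, which is the kind of
error a non-validating resolver could report just as well.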


> > As it is, the requirement for the EDNS0 option to be in the response,
> > while it does offer some properties such as controlling sampling rate…,
> > essentially will prevent any report of answers which are not properly
> > formatted in the first place, or never received, like when a resolver is
> > not able to reach any authorities for a given name, when a resolver starts
> > falling back on stale data, and possibly in the future, failing to
> > reach over an advertised encrypted channel… There is likely value for an
> > authoritative resolver operator to be able to get report for those
> > issues too.
>
> While I agree with the sentiment that reporting other issues would be
> also useful, I think that _for now_ we should keep the scope limited to
> situations which do not require any extra state in resolvers.
>
> That is, reporting "no server is reachable" requires prior information
> stored or reachable somewhere else, which is IMHO order of magnitude
> more complex task. Let's get experience with simple error reporting
> first and only then move forward to more complex tasks...
>

I am more than happy to take an iterative approach to this. My concern was
that this solution would be the end goal, essentially closing off the
possibility of reporting other types of errors such as the ones mentioned.


>
> > The title of the draft: "DNS Error Reporting" would let one believe that
> > it is a somewhat generic mechanism, but I don't think it is as is.
>
> I disagree here. It is a generic mechanism, see the first response
> paragraph in this e-mail.


That sentence was meant to be read together with the rest of the paragraph
below, which illustrates it.

>
> > Actually, while DNSSEC is not named in the title/abstract, the examples
> > in the abstract are DNSSEC specific, the wording in the rest of the
> > document refers for the most part to "validating resolvers". Should this
> > be a "DNSSEC Error Reporting" draft? or a "DNS Error Reporting" draft,
> > but then the function of "validating" itself should be less emphasized?
> > While a validating resolver can report more types of errors than a
> > non-validating resolver, validation is not a requirement to be able to
> > report.
>
> Agreed, but I really don't feel the problem as severe. Would it be
> sufficient to add more examples of non-DNSSEC errors?
>

Yes, I think a non-DNSSEC example would help, along with not using
"validating" outside the scope of DNSSEC-specific errors. As an example, in
the terminology section, the reporting resolver is defined as a validating
resolver:

   Reporting Resolver: In the context of this document, the term
   reporting resolver is used as a shorthand for a validating recursive
   resolver that supports DNS Error Reporting



>
> > On Tue, Nov 9, 2021 at 3:07 PM Roy Arends <roy@dnss.ec> wrote:
> >
> >     Dear WG,
> >
> >     Change 3) There was a lot of descriptive text what implementations
> >     should and shouldn’t do, and what configurations should and
> >     shouldn’t do. This was found to be overly descriptive and pedantic,
> >     and has now been removed.
> >
> >
> > I see that the security consideration about not reporting errors from an
> > encrypted channel (over a supposedly unencrypted channel) has been
> > removed. Wouldn’t it make sense to leave it in order to avoid leaking
> > traffic for queries that were not previously visible on the network?
> > Possibly requiring that an encrypted channel (equal or stronger, for
> > whatever definition that may be) is used to send such reports if needed?
> > This would also make sure the mechanism is going to work once the ADo*
> > mechanisms are ironed out.
>
> AFAIK it was removed because the only things we could place there were
> extremely vague and probably not implementable anyway.
>
> Reason: There is _no such thing_ as 1:1 mapping between client queries
> and outgoing answers, which makes it super hard to define anything
> sensible.
>
> A simple example:
>
> 1. Client A asks for login.secret.facebook.com over plain UDP (and is
> now waiting for resolver's answer).
>
> 2. Resolver starts recursing and eventually sends query for
> secret.facebook.com NS over UDP (client sent query over plain UDP,
> right?). At this point the query was sent but answer was not received yet
>
> 3. Client B asks for supersecretdomainnobodyshouldsee.secret.facebook.com
> over TLS
>
> 4. Resolver deduplicates the query for secret.facebook.com NS, i.e.
> queries (1) and (3) are now waiting for the same packet - delegation
> from facebook.com to secret.facebook.com.
>
> 5. If this deduplicated query for secret.facebook.com NS failed and came
> back with error reporting option, what should the resolver do now? We
> have two clients waiting for it. Is the query considered "secret" or
> not? If the client B (packet in step 3.) arrived couple ms later it
> would not be secret?
>
> In short: This way madness lies.
>
> The only sane way to implement "never leak queries to plaintext" policy
> is to operate TLS-only resolver and do not permit non-TLS
> clients/queries. Then you can disable the error reporting feature
> completely ...
>

Thanks, that makes sense. I did not remember this from either the interim
discussion or the list, so I thought it was removed under the explanation
"This was found to be overly descriptive and pedantic, and has now been
removed."
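
To check my understanding of the deduplication example, here is a minimal
Python sketch of the bookkeeping involved. The names (PendingQuery,
Resolver.ask, Resolver.on_failure) are hypothetical and not taken from any
real resolver; the sketch only models which client transports end up
waiting on one shared upstream query.

```python
from dataclasses import dataclass, field


@dataclass
class PendingQuery:
    """One in-flight upstream query shared by every client waiting on it."""
    qname: str
    qtype: str
    # Transports of the client queries that triggered or joined this query.
    client_transports: list = field(default_factory=list)


class Resolver:
    """Toy model: only tracks deduplication of upstream queries."""

    def __init__(self):
        self.in_flight = {}  # (qname, qtype) -> PendingQuery

    def ask(self, client_transport, upstream_qname, upstream_qtype):
        """Register a client query that needs the given upstream query.

        Returns True if a new upstream query had to be sent, False if the
        client was attached to one that is already in flight.
        """
        key = (upstream_qname, upstream_qtype)
        pending = self.in_flight.get(key)
        is_new = pending is None
        if is_new:
            pending = PendingQuery(upstream_qname, upstream_qtype)
            self.in_flight[key] = pending
        pending.client_transports.append(client_transport)
        return is_new

    def on_failure(self, upstream_qname, upstream_qtype):
        """Upstream query failed: return the transports that were waiting."""
        pending = self.in_flight.pop((upstream_qname, upstream_qtype))
        return pending.client_transports


resolver = Resolver()
sent_a = resolver.ask("udp", "secret.facebook.com", "NS")  # client A, step 1
sent_b = resolver.ask("tls", "secret.facebook.com", "NS")  # client B, step 3
waiting = resolver.on_failure("secret.facebook.com", "NS")  # step 5
print(sent_a, sent_b, waiting)
```

When the shared query fails, both a plaintext and a TLS client are waiting
on it, so there is no single client transport from which to decide whether
the failed query is "secret".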


>
> Having said that, we can have _some_ text in Security considerations
> section, but someone needs to write a sensible description - which I'm
> not capable of.
>

Agreed, there are going to be many shades between cases. A resolver
internal to a network may not care that the client queried over plaintext,
but may want to prevent the plaintext query from leaving the network. For
an open resolver the considerations are going to be different.
Maybe a warning note along the lines of "In a possible scenario where
recursive resolvers perform strict encryption to authoritative name
servers, error reporting over plaintext DNS could leak queries." would at
least highlight the possible problem.
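
As a rough illustration of what such a note could translate to on the
resolver side, here is a small Python sketch. It is not something the draft
specifies: the function name, its parameters, and the transport labels
("dot", "doh", "doq", "udp", "tcp") are all assumptions I made up for the
example.

```python
# Assumption: transport labels chosen for this sketch only.
ENCRYPTED_TRANSPORTS = {"dot", "doh", "doq"}


def may_send_report(strict_upstream_encryption, report_transport):
    """Return True if sending an error report would not downgrade privacy.

    Hypothetical policy: a resolver that only talks to authoritative
    servers over encrypted transports must not send reports in plaintext,
    since the report would expose a query name that was never visible on
    the wire.
    """
    encrypted = report_transport in ENCRYPTED_TRANSPORTS
    if strict_upstream_encryption and not encrypted:
        return False
    return True


print(may_send_report(True, "udp"))
print(may_send_report(True, "dot"))
print(may_send_report(False, "udp"))
```

Only the first case is refused: strict encryption towards the authoritative
servers combined with a plaintext report.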

Thanks,
Manu


> Have a great day.
> Petr Špaček
>
>
>
> >
> > Thanks,
> > Manu
> >
> >
> >
> >     There was a request to put the markdown version of the document in
> >     GitHub. This has now been placed here:
> >     https://github.com/RoyArends/draft-ietf-dnsop-dns-error-reporting
> >
> >     New version:
> >     https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-dns-error-reporting-01.txt
> >
> >     Diffs:
> >     https://www.ietf.org/rfcdiff?url2=draft-ietf-dnsop-dns-error-reporting-01
> >
> >     Warm regards,
> >
> >     Roy Arends
>
>
>
