Re: [DNSOP] I-D Action: draft-ietf-dnsop-extended-error-07.txt

Loganaden Velvindron <loganaden@gmail.com> Sat, 10 August 2019 05:38 UTC

MIME-Version: 1.0
References: <156541402569.2433.16692366614072050737@ietfa.amsl.com>
In-Reply-To: <156541402569.2433.16692366614072050737@ietfa.amsl.com>
From: Loganaden Velvindron <loganaden@gmail.com>
Date: Sat, 10 Aug 2019 09:37:47 +0400
Message-ID: <CAOp4FwTbM+aanhjkbf+FbKTibGGOQzyRCvOsmiqVaDUbDQz3Ew@mail.gmail.com>
To: dnsop <dnsop@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/E2K3VjOdbdiCzgmWEXO_nJrVvZs>
Subject: Re: [DNSOP] I-D Action: draft-ietf-dnsop-extended-error-07.txt
Precedence: list

On Sat, Aug 10, 2019 at 9:14 AM <internet-drafts@ietf.org> wrote:
>
>
> A New Internet-Draft is available from the on-line Internet-Drafts directories.
> This draft is a work item of the Domain Name System Operations WG of the IETF.
>
>         Title           : Extended DNS Errors
>         Authors         : Warren Kumari
>                           Evan Hunt
>                           Roy Arends
>                           Wes Hardaker
>                           David C Lawrence
>         Filename        : draft-ietf-dnsop-extended-error-07.txt
>         Pages           : 13
>         Date            : 2019-08-09
>
> Abstract:
>    This document defines an extensible method to return additional
>    information about the cause of DNS errors.  Though created primarily
>    to extend SERVFAIL to provide additional information about the cause
>    of DNS and DNSSEC failures, the Extended DNS Errors option defined in
>    this document allows all response types to contain extended error
>    information.
>
>
I talked to Quad9. Here is the reply they sent.

Fwd:

1) I see at least one more model that needs to be supported, which is
how to handle EDNS extended codes that are generated by a remote
server, i.e. passthrough. Layering multiple forwarding resolvers
behind each other is common, and some way to notify the end user that
the originating message was not generated by the first resolver would
be important.  I don't know if there needs to be some way to indicate
how far ("deep") from the end user the error originated; after only
brief thought, two levels (locally generated or non-locally generated)
seem sufficient.

Re: 1) This is a good point, but implementation will likely run afoul
of existing standards or else require duplicative response codes or
use of an additional flag in the INFO-CODES section.
Perhaps a new flag type, similar to AA, could be used to say that
this recursor will return this result reliably/deterministically.
Attempting to provide depth is perhaps unlikely, but flags for
stub/forwarder/recursive/intermediate recursive or a subset of those
might make sense.
Perhaps a nondescript flag such as 'DR' for Deterministic Response.
Obviously INFO-CODES can support many different flags, of which IR
(Intermediate Resolver) or such could be included
at the point of response generation, with the last server providing
actual data in the chain being the one to authoritatively set the
flag, which then must not be modified by further
downstream resolvers in the process of returning the response.
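
To make the pass-through idea in Re: 1) concrete, here is a minimal
Python sketch of one possible reading. The names ExtendedError,
originate and forward, and the modelling of DR and IR as boolean
fields, are illustrative assumptions; the draft defines none of them.
The originating server sets DR, and every forwarder marks the error as
non-local (IR) while copying everything else unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExtendedError:
    info_code: int
    extra_text: str = ""
    dr: bool = False  # hypothetical "Deterministic Response" flag
    ir: bool = False  # hypothetical "Intermediate Resolver" flag (non-local error)


def originate(info_code: int, extra_text: str = "",
              deterministic: bool = False) -> ExtendedError:
    """Build an error on the server that actually generated it (IR clear)."""
    return ExtendedError(info_code, extra_text, dr=deterministic, ir=False)


def forward(upstream: ExtendedError) -> ExtendedError:
    """Relay an upstream error: mark it non-local, change nothing else."""
    return replace(upstream, ir=True)
```

With this shape, two levels (local or non-local) fall out naturally:
forwarding twice yields the same result as forwarding once.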

2) SERVFAIL needs another error code to indicate the difference
between a network error (unexpected network response like ICMP, or TCP
error such as connection refused) versus timeout of the remote auth
server, as that is often a confusing issue.

Re: 2) See the specifics in the list below (4.2.11).

3) Really, I'd like to see a definition of some of the EXTRA TEXT
strings here, since that will be almost immediately an issue that
would need to be sorted out before this could be useful. There have
been some discussions (sorry, don't know if it's a draft or just
talking) about browsers consuming "extra" data in DNS responses that
can do a number of things.  As an example that is important to Quad9
(or any blocking-based DNS service) it might be the case that upon
receiving a request for a "blocked" qname/qtype, we would hand back a
forged answer that leads to a splash page as the default result.
However, if the request was made from a resolver stack that had the
EDNS extensions, we might include the "real" result in the EXTRA TEXT
field, as well as a URL that points the user to an explanation of why
that particular qname/qtype was blocked.  Or we might add a risk
factor, or type of risk ("risk=100, risktype=phishing")  or the like.
This allows a single query to be digestible by "dumb" stacks that we
want to have do the safest thing, but also allows "smart" resolver
stacks to present a set of options to the end user.

Re: 3) Seems reasonable.
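
The "risk=100, risktype=phishing" example in 3) suggests
comma-separated key=value pairs. The draft specifies no such format,
so the parser below is only a sketch under that assumption, and
parse_extra_text is a hypothetical helper name.

```python
def parse_extra_text(extra_text: str) -> dict[str, str]:
    """Parse 'k1=v1, k2=v2' into a dict; skip fragments without '='."""
    fields: dict[str, str] = {}
    for fragment in extra_text.split(","):
        # partition splits on the first '=' only, so values may contain '='.
        key, sep, value = fragment.partition("=")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


print(parse_extra_text("risk=100, risktype=phishing"))
# prints {'risk': '100', 'risktype': 'phishing'}
```

A value that itself contains a comma (some URLs do) would be split by
this sketch, which is one reason an agreed definition of the EXTRA
TEXT strings would be needed.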

4) I'm confused as to why a "blocked" or "censored" result would have
a retry as mandatory.  The resolver gave a canonical answer from the
point of view of policy.

Re: 4) See below notes.

Potential inclusions/Adjustments:

4.1.3.1: A use case exists where a stale answer should attempt a
retry. A declarative setting for the Retry bit should not be specified
here, but instead guidance on whether or not the R bit should be set
should be included. For example, when using a front-end load balancer,
if the recursive backends are temporarily inaccessible but are
expected to recover in time to handle a subsequent query, it would be
prudent to include the R bit. No additional load would be generated
towards the Authoritatives in this case, and the Intermediate Recursor
may choose to set the R bit or not based on whether the failure mode
appears to be temporary.

4.1.5: Another area where guidance should be provided. Some recursive
resolvers process requests out of order, asynchronously, or will retry
alternative authoritatives post-processing as part of infrastructure
table management and thus may respond to a subsequent query where
the initial one failed, likely due to timeouts. In our specific case,
due to our use of multiple recursive backend technologies, a
subsequent query failing DNSSEC validation has a significant chance of
being answered by an alternative recursor. See also 4.2.1.

4.1.6: Synthesized Answer: This response could be considered a
sub-case of forged. An example of this would be the id.server or
version.bind queries: they cannot be considered forged, but no
authority truly holds them either.

4.2.11: SERVFAIL - Network: The SERVFAIL response is being generated
due to what is clearly identifiable to the answering server as a
network issue. R bit should be set.
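
The distinction asked for in 2) and 4.2.11, network error versus
timeout of the remote server, can be sketched with the exceptions that
Python's standard library raises. UpstreamFailure and classify_failure
are hypothetical names, and how a resolver would map the result to an
extended error code is left open here.

```python
import enum
import socket


class UpstreamFailure(enum.Enum):
    TIMEOUT = "timeout"   # remote server did not answer in time
    NETWORK = "network"   # e.g. connection refused, host unreachable
    OTHER = "other"


def classify_failure(exc: BaseException) -> UpstreamFailure:
    """Classify an exception raised while querying an upstream server."""
    # Timeouts are OSError subclasses too, so they must be checked first.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return UpstreamFailure.TIMEOUT
    if isinstance(exc, OSError):
        return UpstreamFailure.NETWORK
    return UpstreamFailure.OTHER
```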

4.4.3: Abusive: The answering system considers the query in question
to be abusive for reasons other than load, indicating that the
specific requests are undesired. This could provide hints to Network
Operators or simply poorly configured client implementations that the
specific queries may be part of an amplification or other attack and
should be inspected.

4.4.4: Excessive: The answering system considers the query volume of
the client to be excessive, indicating that it is the volume and not
the content of the queries being refused and that it may be willing to
answer if volume is reduced. This could provide hints to Network
Operators or poorly configured client systems that they need to add
additional endpoints or reduce their request volume to restore
service.

4.4.5: Go Away: The answering system considers further queries from
the client/network to have exceeded thresholds by large margins or
excessive durations, and further queries are likely to be dropped.
This message is an attempt to limit the continued use of resources
terminating queries which will not be answered. This may simply be a
sub-case of Abusive/Excessive, but also is not intended to be sent for
each query, but instead only intermittently, and to bypass the need
for lengthy troubleshooting efforts when drop rules cause a recursor
to seem to have vanished.
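
As a rough illustration of how 4.4.3 through 4.4.5 could be told
apart, the sketch below picks one of the three proposed codes from a
client's query rate. The function name, the string labels, and the
go_away_factor threshold are all assumptions made for the example;
nothing here comes from the draft, and the "only intermittently"
sending of Go Away is not modelled.

```python
from typing import Optional


def refusal_kind(query_rate: float, rate_limit: float,
                 looks_abusive: bool,
                 go_away_factor: float = 10.0) -> Optional[str]:
    """Return the proposed refusal code for a client, or None to answer."""
    if query_rate > rate_limit * go_away_factor:
        return "Go Away"      # limit exceeded by a large margin
    if looks_abusive:
        return "Abusive"      # refused for content, not volume
    if query_rate > rate_limit:
        return "Excessive"    # refused for volume only
    return None
```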

4.5.1: The R flag being set here implies that there are potentially
multiple policies in use and that a retry might receive an answer -
which should not be the case with a single intermediate recursive
service. A client, knowing that it has multiple recursive services
with differing policies, might retry against a different recursive
service (e.g. 8.8.8.8 instead of 9.9.9.9), but this effectively defeats
the policies of the initial recursor, rendering it ineffective. The
use of a specific server as a delineation is also confusing - it
should instead specify that the answering entity - be it a single
server or larger entity, has blocked this response. Also, blocked
should be further defined to avoid collision with the definition of
the Censored response code. Blocked in this case would be used as a
catch-all for anything not otherwise categorized.

4.5.2: See 4.5.1. Censoring is inherently a governmental action and
this code should be reserved for that, due to the severity and legal
repercussions of attempts to bypass it. R bits should not be set.
Censored should be defined in the document to avoid confusion.

4.5.3: Filtered: Differentiated from Blocked/Censored in that this
content has been specifically redacted at the perceived behest of the
client - may include ad-blockers, dnsbl, or other specific cases -
intended to be used by those systems. Would potentially include
corporate IT policies.

4.5.4: Malicious: Differentiated from Blocked and Filtered in that the
answering server believes the response to be actively malicious and
harmful to the requesting systems or applications, and not merely
undesired or offensive. R bits should not be set.

4.5.5: Malicious Upstream - The upstream entity is considered
malicious by the answering server and thus a refusal to respond has
been returned. Details should be included within the INFO-CODE and
potentially EXTRA-TEXT. This is differentiated from Malicious in that
in this case, it is the actual upstream server that is having all
responses blocked, not the content itself - for instance a revoked or
unexpected certificate (such as due to a CAA record) - from which no
responses will be accepted. The R bit being set here depends on
whether the server believes that the specific path is compromised - if
all authoritatives are failed, then a retry will not help. If only one
is, then it will help to get to the non-compromised server. In the
absence of data, the R bit should be set.
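
The R bit guidance in 4.5.5 reduces to a small decision rule, sketched
below. The function name and the list-of-booleans input are
assumptions chosen for illustration.

```python
from typing import Optional, Sequence


def should_set_retry(compromised: Optional[Sequence[bool]]) -> bool:
    """Decide the R bit for a Malicious Upstream response.

    compromised[i] is True if authoritative i is considered
    compromised; None or an empty sequence means no data.
    """
    if not compromised:
        return True               # absence of data: set R
    return not all(compromised)   # retry helps only if one server is clean
```

For example, should_set_retry([True, False]) is True because a retry
may reach the non-compromised server, while
should_set_retry([True, True]) is False.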

Other Notes:
INFO-CODE: It would seem best to include a basic
recommendation for a standard DNS-specific RWhois/CRL-like endpoint
which could provide local (non-IANA) information about returned codes,
potentially at a well-known URI, or even within the DNS itself via TXT
records or even within the EXTRA-TEXT field itself.

It may make sense to create an extension of the R bit, via an additional
flag or other field that adds context to the retry declaration, for
example that the request should retry the same recursor,
or should instead immediately move to and try the next available.
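
A client-side sketch of such a retry hint follows. RetryHint and
next_target are hypothetical names, and the three hint values are an
assumed encoding of the "same recursor" versus "next available"
distinction, not something the draft defines.

```python
import enum
from typing import Optional, Sequence


class RetryHint(enum.Enum):
    NONE = 0            # do not retry
    SAME_RECURSOR = 1   # retry against the recursor that answered
    NEXT_RECURSOR = 2   # move on to the next configured recursor


def next_target(resolvers: Sequence[str], current: int,
                hint: RetryHint) -> Optional[str]:
    """Return the resolver to query next, or None if no retry applies."""
    if hint is RetryHint.SAME_RECURSOR:
        return resolvers[current]
    if hint is RetryHint.NEXT_RECURSOR and current + 1 < len(resolvers):
        return resolvers[current + 1]
    return None
```

With resolvers ["9.9.9.9", "8.8.8.8"] and current index 0, the hint
NEXT_RECURSOR selects "8.8.8.8" and SAME_RECURSOR selects "9.9.9.9".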
