Re: [DNSOP] I-D Action: draft-ietf-dnsop-extended-error-07.txt

Loganaden Velvindron <loganaden@gmail.com> Wed, 11 September 2019 06:09 UTC

MIME-Version: 1.0
References: <156541402569.2433.16692366614072050737@ietfa.amsl.com> <CAOp4FwTbM+aanhjkbf+FbKTibGGOQzyRCvOsmiqVaDUbDQz3Ew@mail.gmail.com> <ybllfuvebx3.fsf@w7.hardakers.net>
In-Reply-To: <ybllfuvebx3.fsf@w7.hardakers.net>
From: Loganaden Velvindron <loganaden@gmail.com>
Date: Wed, 11 Sep 2019 10:08:56 +0400
Message-ID: <CAOp4FwT-b8w=j-xJBtobzdWGKrxq3JO_RJu_1MhTefvDdqh14g@mail.gmail.com>
To: Wes Hardaker <wjhns1@hardakers.net>
Cc: dnsop <dnsop@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/0aPsmrGKR_BfNEJ-o2MYxJvyTRA>
Subject: Re: [DNSOP] I-D Action: draft-ietf-dnsop-extended-error-07.txt
Precedence: list

On Wed, Sep 11, 2019 at 7:42 AM Wes Hardaker <wjhns1@hardakers.net> wrote:
>
> Loganaden Velvindron <loganaden@gmail.com> writes:
>
> Hi Loganaden,
>
> Thanks for the comments about the EDE draft.  I've marked up your
> comments with responses and actions below.  Let us know if you have any
> questions.
Hi Wes,

One small note: the review comments were from John Todd of Quad9. I asked him
to review the draft, and he sent me his comments, which I then forwarded to the
dnsop wg mailing list.


>
> 11 Loganaden Velvindron
> ==================================================
>
> 11.1 NOCHANGE pass-through
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>   1) I see at least one more model that needs to be supported, which is
>   how to handle edns extended codes that are generated by a remote
>   server, i.e. passthrough. Layering multiple forwarding resolvers
>   behind each other is common, and some way to notify the end user that
>   the originating message was not generated by the first resolver would
>   be important.  I don't know if there needs to be some way to indicate
>   how "deep" the error was away from the end user; it seems just two
>   levels (locally generated or non-locally generated) would be
>   sufficient with only minor thought on it.
>
>   Re: 1) This is a good point, but implementation will likely run afoul
>   of existing standards or else require duplicative response codes or
>   use of an additional flag in the INFO-CODES section.  Perhaps a new
>   flag type, similar to AA, which can be used to say that this recursor
>   will return this result reliably/deterministically.  Attempting to
>   provide depth is perhaps unlikely, but flags for
>   stub/forwarder/recursive/intermediate recursive or a subset of those
>   might make sense.  Perhaps a non-descript flag such as 'DR' for
>   Deterministic Response.  Obviously INFO-CODES can support many
>   different flags, of which IR (Intermediate Resolver) or such could be
>   included at the point of response generation, with the last server
>   providing actual data in the chain being the one to authoritatively
>   set the flag, which then must not be modified by further downstream
>   resolvers in the process of returning the response.
>
>   + Response: this has been discussed a few times, and the current view
>     (that at least I hold, and likely others based on past discussions)
>     is that it would be best to get this out as is, without a
>     pass-through model while we deploy it and get operational experience
>     with its use.  Pass-through is complex for a bunch of reasons (NAT
>     alone, e.g.), and it's unclear we can come up with a solution for all
>     the corner cases likely to appear.
>
>     TL;DR: we should definitely work on it, but in the future.
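To make the two-level idea in the comment above concrete (locally generated versus non-locally generated), here is a minimal sketch. It is not part of the draft: the `ExtendedError` type, the `locally_generated` field, and the `relay` helper are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExtendedError:
    """Hypothetical in-memory view of an extended error (not the wire format)."""
    info_code: int                  # numeric code; values here are illustrative
    extra_text: str = ""
    locally_generated: bool = True  # hypothetical origin marker, not in the draft


def relay(upstream_error: ExtendedError) -> ExtendedError:
    """Return a copy of an upstream error marked as non-locally generated.

    Only two levels are tracked, so relaying an already relayed error
    changes nothing: no depth is recorded.
    """
    return replace(upstream_error, locally_generated=False)
```

A forwarder would call `relay()` on whatever it received from upstream before answering its own client, so the end user could tell that the first resolver did not generate the error.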
>
>
> 11.2 DONE network error code needed beyond timeout
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>   1) SERVFAIL needs another error code to indicate the difference
>   between a network error (unexpected network response like ICMP, or TCP
>   error such as connection refused) versus timeout of the remote auth
>   server, as that is often a confusing issue.
>
>   + Response: looks like a reasonable idea, so it has been added to the
>     latest draft.  Thank you!
>
>   Re: 2) Specifics as an item in the below list.
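As a rough illustration of the distinction requested in 11.2, a resolver could sort upstream failures into "timeout" and "network error" before choosing which code to attach to SERVFAIL. This is only a sketch: the `UpstreamFailure` names and the `classify` function are assumptions for illustration and do not correspond to code points in the draft.

```python
import socket
from enum import Enum


class UpstreamFailure(Enum):
    """Hypothetical failure categories; not code points from the draft."""
    TIMEOUT = "timeout"
    NETWORK_ERROR = "network error"
    OTHER = "other"


def classify(exc: BaseException) -> UpstreamFailure:
    """Map an exception raised while querying an authoritative server."""
    # Timeouts are checked first because they are also OSError subclasses.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return UpstreamFailure.TIMEOUT
    # Connection refused, connection reset, and host or network unreachable
    # (typically the result of an ICMP error) all surface as OSError.
    if isinstance(exc, OSError):
        return UpstreamFailure.NETWORK_ERROR
    return UpstreamFailure.OTHER
```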
>
>
> 11.3 NOCHANGE
> ~~~~~~~~~~~~~~
>
>   1) Really, I'd like to see a definition of some of the EXTRA TEXT
>   strings here, since that will be almost immediately an issue that
>   would need to be sorted out before this could be useful. There have
>   been some discussions (sorry, don't know if it's a draft or just
>   talking) about browsers consuming "extra" data in DNS responses that
>   can do a number of things.  As an example that is important to Quad9
>   (or any blocking-based DNS service) it might be the case that upon
>   receiving a request for a "blocked" qname/qtype, we would hand back a
>   forged answer that leads to a splash page as the default result.
>   However, if the request was made from a resolver stack that had the
>   EDNS extensions, we might include the "real" result in the EXTRA TEXT
>   field, as well as a URL that points the user to an explanation of why
>   that particular qname/qtype was blocked.  Or we might add a risk
>   factor, or type of risk ("risk=100, risktype=phishing") or the like.
>   This allows a single query to be digestible by "dumb" stacks that we
>   want to have do the most safe thing, but also allow "smart" resolver
>   stacks to present a set of options to the end user.
>
>   + Again, I suspect that the complexity associated with standardizing
>     on exactly a structure (including internationalization) of
>     extra-information in a machine understandable and parsable mechanism
>     is fraught with a very long discussion period.  It might be worthy
>     of future work, and I certainly think it would be valuable, but
>     (IMHO) it would be better to get this out and work on that as a
>     follow-on project *if* we could achieve consensus on it (which, I'll
>     be honest, will be either difficult or take a long time or both).
>
>   Re: 3) Seems reasonable.
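For what a "smart" stack might do with the "risk=100, risktype=phishing" example above, here is a minimal sketch of a parser. The comma-separated key=value layout is taken only from that example; no such structure is defined in the draft, and `parse_extra_text` is a hypothetical helper.

```python
def parse_extra_text(text: str) -> dict:
    """Parse a comma-separated key=value string into a dict of strings.

    Parts without an "=" or with an empty key are ignored, so free-form
    human-readable text simply yields an empty dict.
    """
    fields = {}
    for part in text.split(","):
        key, sep, value = part.partition("=")
        key = key.strip()
        if sep and key:
            fields[key] = value.strip()
    return fields


print(parse_extra_text("risk=100, risktype=phishing"))
```

Splitting on commas would break values that themselves contain commas (some URLs do), which is one small example of why agreeing on a structure is harder than it first looks.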
>
>
> 11.4 NOCHANGE blocked/censored/retry
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>   1) I'm confused as to why a "blocked" or "censored" result would have
>   a retry as mandatory.  The resolver gave a canonical answer from the
>   point of policy.
>
>   + the retry flag is now gone.
>
>   Re: 4) See below notes.
>
>   Potential inclusions/Adjustments:
>
>
> 11.5 NOCHANGE More retry case thoughts
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>   4.1.3.1: A use case exists where a stale answer should attempt a
>   retry. A declarative setting for the Retry bit should not be specified
>   here, but instead guidance on whether or not the R bit should be set
>   should be included. For example, when using a front-end load balancer,
>   if the recursive backends are temporarily inaccessible but are
>   expected to recover in time to handle a subsequent query, it would be
>   prudent to include the R bit. No additional load would be generated
>   towards the Authoritatives in this case, and the Intermediate Recursor
>   may choose to set the R bit or not based on whether the failure mode
>   appears to be temporary.
>
>   4.1.5: Another area where guidance should be provided. Some recursive
>   resolvers process requests out of order, asynchronously, or will retry
>   alternative authoritatives post-processing as part of infrastructure
>   table management and thus may respond to a subsequent query, where
>   the initial will fail, likely due to timeouts. In our specific case,
>   due to our use of multiple recursive backend technologies, a
>   subsequent query failing DNSSEC validation has a significant chance of
>   being answered by an alternative recursor. See also 4.2.1.
>
>   4.2.11: SERVFAIL - Network: The SERVFAIL response is being generated
>   due to what is clearly identifiable to the answering server as a
>   network issue. R bit should be set.
>
>   4.4.3: Abusive: The answering system considers the query in question
>   to be abusive for reasons other than load, indicating that the
>   specific requests are undesired. This could provide hints to Network
>   Operators or simply poorly configured client implementations that the
>   specific queries may be part of an amplification or other attack and
>   should be inspected.
>
>   4.4.4: Excessive: The answering system considers the query volume of
>   the client to be excessive, indicating that it is the volume and not
>   the content of the queries being refused and that it may be willing to
>   answer if volume is reduced. This could provide hints to Network
>   Operators or poorly configured client systems that they need to add
>   additional endpoints or reduce their request volume to restore
>   service.
>
>   4.4.5: Go Away: The answering system considers further queries from
>   the client/network to have exceeded thresholds by large margins or
>   excessive durations, and further queries are likely to be dropped.
>   This message is an attempt to limit the continued use of resources
>   terminating queries which will not be answered. This may simply be a
>   sub-case of Abusive/Excessive, but also is not intended to be sent for
>   each query, but instead only intermittently, and to bypass the need
>   for lengthy troubleshooting efforts when drop rules cause a recursor
>   to seem to have vanished.
>
>   4.5.1: The R flag being set here implies that there are potentially
>   multiple policies in use and that a retry might receive an answer -
>   which should not be the case with a single intermediate recursive
>   service. A client, knowing that it has multiple recursive services
>   with differing policies might retry against a different recursive
>   service (ex: 8.8.8.8 instead of 9.9.9.9), but this effectively defeats
>   the policies of the initial recursor, rendering it ineffective. The
>   use of a specific server as a delineation is also confusing - it
>   should instead specify that the answering entity - be it a single
>   server or larger entity, has blocked this response. Also, blocked
>   should be further defined to avoid collision with the definition of
>   the Censored response code. Blocked in this case would be used as a
>   catch-all for anything not otherwise categorized.
>
>   4.5.2: See 4.5.1. Censoring is inherently a governmental action and
>   this should be reserved for that due to the severity and legal
>   repercussions of attempts to bypass. R bits should not be set.
>   Censored should be defined in the document to avoid confusion.
>
>   4.5.3: Filtered: Differentiated from Blocked/Censored in that this
>   content has been specifically redacted at the perceived behest of the
>   client - may include ad-blockers, dnsbl, or other specific cases -
>   intended to be used by those systems. Would potentially include
>   corporate IT policies.
>
>   4.5.4: Malicious: Differentiated from Blocked and Filtered in that the
>   answering server believes the response to be actively malicious and
>   harmful to the requesting systems or applications, and not merely
>   undesired or offensive. R bits should not be set.
>
>   4.5.5: Malicious Upstream - The upstream entity is considered
>   malicious by the answering server and thus a refusal to respond has
>   been returned. Details should be included within the INFO-CODE and
>   potentially EXTRA-TEXT. This is differentiated from Malicious in that
>   in this case, it is the actual upstream server that is having all
>   responses blocked, not the content itself - for instance a revoked or
>   unexpected certificate (such as due to a CAA record) - from which no
>   responses will be accepted. The R bit being set here depends on
>   whether the server believes that the specific path is compromised - if
>   all authoritatives are failed, then a retry will not help. If only one
>   is, then it will help to get to the non-compromised server. In the
>   absence of data, the R bit should be set.
>
>   It may make sense to create an extension of the R bit, via additional
>   flag or other field which adds additional context to the retry
>   declaration, such as that the request should retry the same recursor,
>   or should instead immediately move to and try the next available.
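The retry guidance proposed in 11.5 can be summarized as a small decision function. This sketch only restates the reviewer's suggestions for a few of the cases; the retry flag has since been removed from the draft (see 11.4), and the `Condition` names and `should_set_retry` are hypothetical.

```python
from enum import Enum, auto
from typing import Optional


class Condition(Enum):
    """Hypothetical labels for some of the cases discussed in 11.5."""
    NETWORK_FAILURE = auto()     # 4.2.11 SERVFAIL - Network
    CENSORED = auto()            # 4.5.2
    MALICIOUS = auto()           # 4.5.4
    MALICIOUS_UPSTREAM = auto()  # 4.5.5


def should_set_retry(condition: Condition,
                     all_authoritatives_failed: Optional[bool] = None) -> bool:
    """Return True if the proposed guidance says the R bit should be set.

    all_authoritatives_failed is only used for MALICIOUS_UPSTREAM;
    None means the server has no data either way.
    """
    if condition is Condition.NETWORK_FAILURE:
        return True
    if condition in (Condition.CENSORED, Condition.MALICIOUS):
        return False
    if condition is Condition.MALICIOUS_UPSTREAM:
        # A retry helps unless every authoritative is known to have failed;
        # in the absence of data the R bit is set.
        return all_authoritatives_failed is not True
    raise ValueError(f"no guidance for {condition!r}")
```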
>
>
> 11.6 TODO synthesized == forged
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>   4.1.6: Synthesized Answer: This response could be considered a
>   sub-case of forged. An example of this would be the id.server or
>   version.bind queries, they cannot be considered forged, but also no
>   authority truly holds them.
>
>   + Response: I think this is worthy of further thought and I'd love to
>     hear opinions from others.  IMHO, I'm not sure we should get into
>     micro-error coding.  I would say forged, in your examples, still
>     fits.  But there are other cases where I think synthesized may make
>     sense.  Anyone else have thoughts?
>
>
> 11.7 NOCHANGE finish categorizing
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>   Other Notes: INFO-CODE: It would seem best to include a
>   basic recommendation for a standard DNS-specific RWhois/CRL-like
>   endpoint which could provide local (non-IANA) information about
>   returned codes, potentially at a well-known URI, or even within the
>   DNS itself via TXT records or even within the EXTRA-TEXT field itself.
>
>   + Response: per discussions with others too, which you've hopefully
>     read, there is a lot of desire for ways to potentially standardize
>     supplemental information within the EXTRA-TEXT field.  However, for
>     the time being the goal is to get this out and get experience with
>     how it is used and potentially standardize on the addition of
>     machine readable supplemental information (URLs being the other
>     common suggestion).  Publishing this first (as is) doesn't get in
>     the way of future RFCs extending this specification.
>
> --
> Wes Hardaker
> USC/ISI
