Re: [DNSOP] I-D Action: draft-ietf-dnsop-extended-error-07.txt

Wes Hardaker <wjhns1@hardakers.net> Wed, 11 September 2019 03:42 UTC

From: Wes Hardaker <wjhns1@hardakers.net>
To: Loganaden Velvindron <loganaden@gmail.com>
Cc: dnsop <dnsop@ietf.org>
References: <156541402569.2433.16692366614072050737@ietfa.amsl.com> <CAOp4FwTbM+aanhjkbf+FbKTibGGOQzyRCvOsmiqVaDUbDQz3Ew@mail.gmail.com>
Date: Tue, 10 Sep 2019 20:42:32 -0700
In-Reply-To: <CAOp4FwTbM+aanhjkbf+FbKTibGGOQzyRCvOsmiqVaDUbDQz3Ew@mail.gmail.com> (Loganaden Velvindron's message of "Sat, 10 Aug 2019 09:37:47 +0400")
Message-ID: <ybllfuvebx3.fsf@w7.hardakers.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/mddrRPX_EPbiNQqx3q7l3EVJf-0>
Subject: Re: [DNSOP] I-D Action: draft-ietf-dnsop-extended-error-07.txt
Precedence: list

Loganaden Velvindron <loganaden@gmail.com> writes:

Hi Loganaden,

Thanks for the comments about the EDE draft.  I've marked up your
comments with responses and actions below.  Let us know if you have any
questions.

11 Loganaden Velvindron
==================================================

11.1 NOCHANGE pass-through
~~~~~~~~~~~~~~~~~~~~~~~~~~

  1) I see at least one more model that needs to be supported, which is
  how to handle edns extended codes that are generated by a remote
  server, i.e. passthrough. Layering multiple forwarding resolvers
  behind each other is common, and some way to notify the end user that
  the originating message was not generated by the first resolver would
  be important.  I don't know if there needs to be some way to indicate
  how "deep" the error was away from the end user; it seems just two
  levels (locally generated or non-locally generated) would be
  sufficient with only minor thought on it.

  Re: 1) This is a good point, but implementation will likely run afoul
  of existing standards or else require duplicative response codes or
  use of an additional flag in the INFO-CODES section.  Perhaps a new
  flag type, similar to AA, which can be used to say that this recursor
  will return this result reliably/deterministically.  Attempting to
  provide depth is perhaps unlikely, but flags for
  stub/forwarder/recursive/intermediate recursive or a subset of those
  might make sense.  Perhaps a nondescript flag such as 'DR' for
  Deterministic Response.  Obviously INFO-CODES can support many
  different flags, of which IR (Intermediate Resolver) or such could be
  included at the point of response generation, with the last server
  providing actual data in the chain being the one to authoritatively
  set the flag, which then must not be modified by further downstream
  resolvers in the process of returning the response.

  + Response: this has been discussed a few times, and the current view
    (that at least I hold, and likely others based on past discussions)
    is that it would be best to get this out as is, without a
    pass-through model while we deploy it and get operational experience
    with its use.  Pass-through is complex for a number of reasons (NAT
    alone, e.g.), and it's unclear that we can come up with a solution
    for all the corner cases likely to appear.

    TL;DR: we should definitely work on it, but in the future.


11.2 DONE network error code needed beyond timeout
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  1) SERVFAIL needs another error code to indicate the difference
  between a network error (unexpected network response like ICMP, or TCP
  error such as connection refused) versus timeout of the remote auth
  server, as that is often a confusing issue.

  + Response: looks like a reasonable idea, so it has been added to the
    latest draft.  Thank you!

  Re: 2) Specifics as an item in the below list.
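
  The timeout versus network error distinction requested above can be
  sketched at the socket level.  This is only an illustration: the
  names UpstreamFailure and classify_failure are hypothetical and do
  not come from the draft, and a real resolver would need to map the
  result onto whatever INFO-CODE values the draft finally assigns.

```python
import socket
from enum import Enum


class UpstreamFailure(Enum):
    """Hypothetical failure classes; not names taken from the draft."""
    TIMEOUT = "timeout"              # no reply arrived before the deadline
    NETWORK_ERROR = "network error"  # the network actively reported a problem


def classify_failure(exc: OSError) -> UpstreamFailure:
    """Map a socket-level exception to one of the two failure classes."""
    # socket.timeout is an alias of TimeoutError on Python 3.10+ and a
    # separate OSError subclass on older 3.x releases, so check both.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return UpstreamFailure.TIMEOUT
    # Anything else raised by the socket layer (connection refused,
    # unreachable host, connection reset, ...) is treated as a network error.
    return UpstreamFailure.NETWORK_ERROR
```

  A resolver building a SERVFAIL could call classify_failure() on the
  exception from its last upstream attempt and pick the extended error
  accordingly.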


11.3 NOCHANGE 
~~~~~~~~~~~~~~

  1) Really, I'd like to see a definition of some of the EXTRA TEXT
  strings here, since that will be almost immediately an issue that
  would need to be sorted out before this could be useful. There have
  been some discussions (sorry, don't know if it's a draft or just
  talking) about browsers consuming "extra" data in DNS responses that
  can do a number of things.  As an example that is important to Quad9
  (or any blocking-based DNS service) it might be the case that upon
  receiving a request for a "blocked" qname/qtype, we would hand back a
  forged answer that leads to a splash page as the default result.
  However, if the request was made from a resolver stack that had the
  EDNS extensions, we might include the "real" result in the EXTRA TEXT
  field, as well as a URL that points the user to an explanation of why
  that particular qname/qtype was blocked.  Or we might add a risk
  factor, or type of risk ("risk=100, risktype=phishing") or the like.
  This allows a single query to be digestible by "dumb" stacks that we
  want to have do the most safe thing, but also allow "smart" resolver
  stacks to present a set of options to the end user.

  + Response: again, I suspect that standardizing on an exact structure
    (including internationalization) for extra information in a
    machine-understandable and parsable form would involve a very long
    discussion period.  It might be worthy of future work, and I
    certainly think it would be valuable, but (IMHO) it would be better
    to get this out and work on that as a follow-on project *if* we
    could achieve consensus on it (which, I'll be honest, will be
    either difficult or take a long time or both).

  Re: 3) Seems reasonable.
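
  To make the "risk=100, risktype=phishing" idea concrete, here is a
  minimal sketch of how a "smart" stack might read such a string.  The
  comma-separated key=value convention is an assumption taken from the
  example in the comment; nothing in the draft defines a structure for
  EXTRA-TEXT, and parse_extra_text is a hypothetical helper.

```python
def parse_extra_text(text: str) -> dict:
    """Parse 'k1=v1, k2=v2' into a dict; items without '=' are ignored."""
    fields = {}
    for item in text.split(","):
        key, sep, value = item.partition("=")
        key = key.strip()
        # Keep only well-formed pairs so that free-form human-readable
        # text degrades to an empty result instead of raising an error.
        if sep and key:
            fields[key] = value.strip()
    return fields
```

  With this convention, parse_extra_text("risk=100, risktype=phishing")
  yields {"risk": "100", "risktype": "phishing"}, while plain prose
  yields an empty dict, so "dumb" consumers can still show the raw text.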


11.4 NOCHANGE blocked/censored/retry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  1) I'm confused as to why a "blocked" or "censored" result would have
  a retry as mandatory.  The resolver gave a canonical answer from the
  point of policy.

  + Response: the retry flag is now gone.

  Re: 4) See below notes.

  Potential inclusions/Adjustments:


11.5 NOCHANGE More retry case thoughts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  4.1.3.1: A use case exists where a stale answer should attempt a
  retry. A declarative setting for the Retry bit should not be specified
  here, but instead guidance on whether or not the R bit should be set
  should be included. For example, when using a front-end load balancer,
  if the recursive backends are temporarily inaccessible but are
  expected to recover in time to handle a subsequent query, it would be
  prudent to include the R bit. No additional load would be generated
  towards the Authoritatives in this case, and the Intermediate Recursor
  may choose to set the R bit or not based on whether the failure mode
  appears to be temporary.

  4.1.5: Another area where guidance should be provided. Some recursive
  resolvers process requests out of order, asynchronously, or will retry
  alternative authoritatives post-processing as part of infrastructure
  table management and thus may respond to a subsequent query, where
  the initial will fail, likely due to timeouts. In our specific case,
  due to our use of multiple recursive backend technologies, a
  subsequent query failing DNSSEC validation has a significant chance of
  being answered by an alternative recursor. See also 4.2.1.

  4.2.11: SERVFAIL - Network: The SERVFAIL response is being generated
  due to what is clearly identifiable to the answering server as a
  network issue. R bit should be set.

  4.4.3: Abusive: The answering system considers the query in question
  to be abusive for reasons other than load, indicating that the
  specific requests are undesired. This could provide hints to Network
  Operators or simply poorly configured client implementations that the
  specific queries may be part of an amplification or other attack and
  should be inspected.

  4.4.4: Excessive: The answering system considers the query volume of
  the client to be excessive, indicating that it is the volume and not
  the content of the queries being refused and that it may be willing to
  answer if volume is reduced. This could provide hints to Network
  Operators or poorly configured client systems that they need to add
  additional endpoints or reduce their request volume to restore
  service.

  4.4.5: Go Away: The answering system considers further queries from
  the client/network to have exceeded thresholds by large margins or
  excessive durations, and further queries are likely to be dropped.
  This message is an attempt to limit the continued use of resources
  terminating queries which will not be answered. This may simply be a
  sub-case of Abusive/Excessive, but also is not intended to be sent for
  each query, but instead only intermittently, and to bypass the need
  for lengthy troubleshooting efforts when drop rules cause a recursor
  to seem to have vanished.

  4.5.1: The R flag being set here implies that there are potentially
  multiple policies in use and that a retry might receive an answer -
  which should not be the case with a single intermediate recursive
  service. A client, knowing that it has multiple recursive services
  with differing policies might retry against a different recursive
  service (ex: 8.8.8.8 instead of 9.9.9.9), but this effectively defeats
  the policies of the initial recursor, rendering it ineffective. The
  use of a specific server as a delineation is also confusing - it
  should instead specify that the answering entity - be it a single
  server or larger entity, has blocked this response. Also, blocked
  should be further defined to avoid collision with the definition of
  the Censored response code. Blocked in this case would be used as a
  catch-all for anything not otherwise categorized.

  4.5.2: See 4.5.1. Censoring is inherently a governmental action and
  this should be reserved for that due to the severity and legal
  repercussions of attempts to bypass. R bits should not be set.
  Censored should be defined in the document to avoid confusion.

  4.5.3: Filtered: Differentiated from Blocked/Censored in that this
  content has been specifically redacted at the perceived behest of the
  client - may include ad-blockers, dnsbl, or other specific cases -
  intended to be used by those systems. Would potentially include
  corporate IT policies.

  4.5.4: Malicious: Differentiated from Blocked and Filtered in that the
  answering server believes the response to be actively malicious and
  harmful to the requesting systems or applications, and not merely
  undesired or offensive. R bits should not be set.

  4.5.5: Malicious Upstream - The upstream entity is considered
  malicious by the answering server and thus a refusal to respond has
  been returned. Details should be included within the INFO-CODE and
  potentially EXTRA-TEXT. This is differentiated from Malicious in that
  in this case, it is the actual upstream server that is having all
  responses blocked, not the content itself - for instance a revoked or
  unexpected certificate (such as due to a CAA record) - from which no
  responses will be accepted. The R bit being set here depends on
  whether the server believes that the specific path is compromised - if
  all authoritatives are failed, then a retry will not help. If only one
  is, then it will help to get to the non-compromised server. In the
  absence of data, the R bit should be set.

  It may make sense to create an extension of the R bit, via additional
  flag or other field which adds additional context to the retry
  declaration, such as that the request should retry the same recursor,
  or should instead immediately move to and try the next available.
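
  The suggested extension of the R bit could look roughly like the
  sketch below from a stub's point of view.  Everything here is
  hypothetical: RetryHint, its values, and next_target are invented
  for illustration and are not defined in the draft (which no longer
  carries a retry flag at all).

```python
from enum import Enum
from typing import List, Optional


class RetryHint(Enum):
    """Hypothetical retry context a response might carry."""
    NO_RETRY = 0       # the answer is final; retrying will not help
    SAME_RESOLVER = 1  # failure looks temporary; ask the same recursor again
    NEXT_RESOLVER = 2  # move immediately to the next configured recursor


def next_target(resolvers: List[str], current: int,
                hint: RetryHint) -> Optional[str]:
    """Return the resolver to query next, or None if the client should stop."""
    if hint is RetryHint.SAME_RESOLVER:
        return resolvers[current]
    if hint is RetryHint.NEXT_RESOLVER and current + 1 < len(resolvers):
        return resolvers[current + 1]
    # NO_RETRY, or NEXT_RESOLVER with no resolver left in the list.
    return None
```

  For a client configured with ["9.9.9.9", "8.8.8.8"], a NEXT_RESOLVER
  hint from the first entry selects the second, while NO_RETRY (as
  argued above for policy blocks) ends the lookup.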


11.6 TODO synthesized == forged
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  4.1.6: Synthesized Answer: This response could be considered a
  sub-case of forged. An example of this would be the id.server or
  version.bind queries; they cannot be considered forged, but also no
  authority truly holds them.

  + Response: I think this is worthy of further thought and I'd love to
    hear opinions from others.  IMHO, I'm not sure we should get into
    micro-error coding.  I would say forged, in your examples, still
    fits.  But there are other cases where I think synthesized may make
    sense.  Anyone else have thoughts?


11.7 NOCHANGE finish categorizing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Other Notes: INFO-CODE: It would seem best to include a
  basic recommendation for a standard DNS-specific RWhois/CRL-like
  endpoint which could provide local (non-IANA) information about
  returned codes, potentially at a well-known URI, or even within the
  DNS itself via TXT records or even within the EXTRA-TEXT field itself.

  + Response: per discussions with others too, which you've hopefully
    read, there is a lot of desire for ways to potentially standardize
    supplemental information within the EXTRA-TEXT field.  However, for
    the time being the goal is to get this out and get experience with
    how it is used and potentially standardize on the addition of
    machine readable supplemental information (URLs being the other
    common suggestion).  Publishing this first (as is) doesn't get in
    the way of future RFCs extending this specification.
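
  For reference, a sketch of where supplemental text such as a URL
  would sit on the wire.  It assumes the option layout that was later
  published in RFC 8914 (EDNS0 option code 15, a 16-bit INFO-CODE,
  then UTF-8 EXTRA-TEXT), which may differ from the -07 draft under
  discussion; the function names and the example URL are hypothetical.

```python
import struct

EDE_OPTION_CODE = 15  # assumption: the option code assigned in RFC 8914


def encode_ede_option(info_code: int, extra_text: str = "") -> bytes:
    """Build OPTION-CODE, OPTION-LENGTH, INFO-CODE and EXTRA-TEXT."""
    payload = struct.pack("!H", info_code) + extra_text.encode("utf-8")
    return struct.pack("!HH", EDE_OPTION_CODE, len(payload)) + payload


def decode_ede_option(option: bytes):
    """Return (info_code, extra_text) from a single encoded EDE option."""
    if len(option) < 6:
        raise ValueError("too short to be an EDE option")
    code, length = struct.unpack("!HH", option[:4])
    if code != EDE_OPTION_CODE or length < 2 or len(option) < 4 + length:
        raise ValueError("not a well-formed EDE option")
    (info_code,) = struct.unpack("!H", option[4:6])
    # EXTRA-TEXT is everything after the INFO-CODE and may be empty.
    return info_code, option[6:4 + length].decode("utf-8")
```

  Because EXTRA-TEXT is an opaque string at this layer, a later
  specification could define a URL or other machine-readable content
  inside it without changing the option format itself.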

-- 
Wes Hardaker
USC/ISI
