[DNSOP] Thoughts about draft-wkumari-dnsop-extended-error

Edward Lewis <edward.lewis@icann.org> Wed, 26 July 2017 15:34 UTC

From: Edward Lewis <edward.lewis@icann.org>
To: dnsop <dnsop@ietf.org>
Thread-Topic: Thoughts about draft-wkumari-dnsop-extended-error
Thread-Index: AQHTBiSzKlkUelFtTUGzx3wDNEirBw==
Date: Wed, 26 Jul 2017 15:34:41 +0000
Message-ID: <78D23998-822B-42C7-8DEA-04E6D5B7D29B@icann.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/YyxMTaVudWo5-Det3gsU1WW9Hqw>
Subject: [DNSOP] Thoughts about draft-wkumari-dnsop-extended-error

First there's a need to divide and conquer. Or maybe conquer a different target.

Why express (return) an error condition notification? To let the requestor know what happened?

I don't think that's sufficient or even the true goal.

Another reason for returning an error notice is to tell the requestor what they ought to do next.

In DNS there are a few choices; here are some (I may miss "corner cases"):

1 - try again in 5 seconds
2 - try a different server authoritative for the zone
3 - the zone (admin) says "no"
4 - try a different recursive server
5 - don't try again (ever, for a while) for this name or anything "under" it
6 - change your query to something else (like HTTP? permanent redirect)
7 - probably other reactions

Note that the above list is meant to be independent of what went wrong.
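
To make the list above concrete, here is a minimal sketch of it as a machine-readable set of reactions. Everything in it (the names Reaction, Hint and next_step, the numeric values, the delay field) is a hypothetical illustration, not something taken from the draft or from any existing DNS implementation. The point it tries to show is that the requestor's handling can key on the reaction alone, without knowing what went wrong.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Reaction(Enum):
    """Hypothetical reaction codes mirroring the list above (values are illustrative)."""
    RETRY_LATER = 1              # try again after a delay
    TRY_OTHER_AUTHORITATIVE = 2  # try a different server authoritative for the zone
    ZONE_SAYS_NO = 3             # the zone (admin) says "no"
    TRY_OTHER_RECURSIVE = 4      # try a different recursive server
    DO_NOT_RETRY = 5             # don't retry this name or anything under it
    CHANGE_QUERY = 6             # change the query to something else


@dataclass(frozen=True)
class Hint:
    """What a responder might hand back: a reaction plus an optional delay."""
    reaction: Reaction
    delay_seconds: Optional[int] = None  # assumed to matter only for RETRY_LATER


def next_step(hint: Hint) -> str:
    """Describe what the requestor does next; note there is no 'cause' input."""
    if hint.reaction is Reaction.RETRY_LATER:
        return f"retry same server in {hint.delay_seconds or 0}s"
    if hint.reaction is Reaction.TRY_OTHER_AUTHORITATIVE:
        return "query another authoritative server"
    if hint.reaction is Reaction.TRY_OTHER_RECURSIVE:
        return "query another recursive server"
    if hint.reaction is Reaction.CHANGE_QUERY:
        return "rewrite the query"
    # ZONE_SAYS_NO and DO_NOT_RETRY both end the attempt for this name.
    return "stop"


print(next_step(Hint(Reaction.RETRY_LATER, delay_seconds=5)))
```

A requestor built this way needs one branch per reaction, which is a much smaller set than one branch per possible server-side failure.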

The other reason is to inform the requestor. Doing some navel gazing, why do this? For the most part, the other end only needs to know how to react as above. We do have a great tradition of wanting to know why a service someone else runs misbehaves in our eyes. This can be good, helping in the identification of problems, but as operations become more "professional-grade" this may be an outdated romantic notion. (Although in a speed test of public-twitter-angst vs. nagios-alerts, twitter has proven to win at least some races.)

But if we are going to get into explaining why, here are some considerations.

One, different code bases will have myriads of errors to report that are related to their own threads. (Recall old-time BIND INSISTs?) Do you want to have that be reported?

Two, there are protocol "errors" - so long as we don't add to the mythical DNS server protocol state machine, the set of errors can be enumerated.

I can see that knowing if a state is transient or permanent can indicate what a requestor ought to do, but then see the first part of the message. The difference between transient and permanent may be just a perception of time scale, with permanent ending at a reboot.

E.g., what if a resolver gets a response and finds DNSSEC telling it to reject the data for X seconds? For those X seconds, the resolver will not send a response, so it's "permanent" for X seconds in some sense, yet transient in that after X seconds the negative DNSSEC cache will expire the lesson learned.
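
The X-second case can be sketched as a time-bounded rejection cache. This is only an illustration under stated assumptions: RejectionCache and its methods are hypothetical names, and the injectable clock exists purely so the behavior can be shown without waiting. It is not how any particular resolver implements its negative DNSSEC cache.

```python
import time
from typing import Callable, Dict


class RejectionCache:
    """Hypothetical cache that remembers 'reject this name' for a bounded time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock                  # injectable so time can be simulated
        self._expiry: Dict[str, float] = {}  # name -> time at which rejection ends

    def reject(self, name: str, seconds: float) -> None:
        """Record that answers for `name` are to be rejected for `seconds`."""
        self._expiry[name] = self._clock() + seconds

    def is_rejected(self, name: str) -> bool:
        """True while the rejection is in force; forgets the entry once it lapses."""
        expires = self._expiry.get(name)
        if expires is None:
            return False
        if self._clock() >= expires:
            del self._expiry[name]  # the "lesson learned" expires
            return False
        return True


# Simulated clock: within X seconds the state looks permanent, after X it is gone.
now = [0.0]
cache = RejectionCache(clock=lambda: now[0])
cache.reject("example.com.", seconds=30)
print(cache.is_rejected("example.com."))  # still inside the window
now[0] = 30.0
print(cache.is_rejected("example.com."))  # window has lapsed
```

From the requestor's side the two states are indistinguishable during the window, which is why a "what to do next" hint carries more than a transient/permanent label.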

In short - instead of error conditions, define "exceptional reactions" a requestor ought to pursue. This will probably be much more quantifiable (in that the mythical protocol state machine for the requestor is simpler than the server's and less likely to be radically changed in the future).

And - you will get away from having to localize the error explanation into different human languages. (I hadn't forgotten this, but this issue is a red herring.) The DNS protocol need not be human friendly; it's meant for machines to talk to machines. Trying to make it talk to people at the level people understand might just be too large a task.

Mental exercise: what does a querier do when it sees NOTIMP now? Switch servers and hope? REFUSED vs. SERVFAIL for lame delegation, which is better? Protocol-wise, "what happened" isn't all that useful; "what to do next" would be.

Attachment: smime.p7s
