[dnssd] WGLC Updates to draft-ietf-dnsop-session-signal

Ted Lemon <mellon@fugue.com> Fri, 09 March 2018 05:20 UTC

From: Ted Lemon <mellon@fugue.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_EE66BBF5-AD99-490B-8875-EC6628E610CA"
Mime-Version: 1.0 (Mac OS X Mail 11.2 \(3445.5.20\))
Message-Id: <676A8EDD-2F6D-4A48-82BA-508DE275B20F@fugue.com>
Date: Fri, 09 Mar 2018 00:20:19 -0500
Cc: dnssd@ietf.org
To: dnsop <dnsop@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnssd/PGUxjzK9_-R8Hgerglr8PMyak2s>
Subject: [dnssd] WGLC Updates to draft-ietf-dnsop-session-signal
Precedence: list

We have updated draft-ietf-dnsop-session-signal to address the points raised during the working group last call, which ended on 2/23. We've already received comments on the new version from Paul Hoffman (thanks for that) and will address those as well in a later update.

I've included links to the messages we received during last call and a log of the changes we made based on the points raised in those messages below. Thanks to everyone for the thorough reviews—we got lots of helpful comments!

---

Jan Komissar: https://mailarchive.ietf.org/arch/msg/dnssd/v48k8QbOSN-HtDRFRf20csoq1ZY

1: Should we require TLS?

We entertained this idea, and it would simplify the protocol work, but it seems like an overreach for this document. If we require TLS, we are effectively making that a requirement for any DNS-over-TCP connection for which session signaling would be useful. We do not believe that we have working group consensus on this point, and therefore can't make this move in this document. If the working group wants to require TLS for DNS TCP connections, that should be done in a separate document.

Ed Lewis
https://mailarchive.ietf.org/arch/msg/dnsop/CQ_qRzpm_omXivdY7eELQq-fX5I
https://mailarchive.ietf.org/arch/msg/dnsop/Iig2nnIvBFsjG5FafKXgpdtBDTw

2: Is the term "session" inappropriate here?

The consensus appears to be that using a different term in order to accommodate the OSI layering model, which isn't actually in use, seems unnecessary. We will add text to the document to explicitly state that this isn't session in the OSI sense. Numerous other comments were made that asked for clarification on the basis of the assumption that we were talking about an OSI-model session; these are all addressed by clarifying that we are not.

3: The endpoints - is that the stub resolver and authoritative server (on in DNSSEC terms) the signer and the validator?

The text here explicitly refers to a connection between client and server; to clarify it to "stub resolver and authoritative server" would not be correct, since there are many other "clients" and "servers." For example a full-service cache is a "server" to the stub resolver, and a "client" to an authoritative server or upstream forwarder. RFC 7766 currently uses the terms "client" and "server" in the same way, and it's not clear how we could further clarify this.

4: This section (4.1) intermingles text on whether or not each DSO request elicits a response or not and the process of DSO "session" establishment. With proper editing, these should be separated to lessen confusion.

This has been clarified

5: (Section 4.1.1) Requirements (first MUST) and recommendations to operate in a certain way tend to become dated quickly. Instead of placing requirements on clients to act good, let the server refuse workload. I am thinking of the issue surrounding the iterations in NSEC3 and recent surveys of operators. Despite the documents saying a low value is better, operators use high values.

This text is quote from RFC 7766, so updating it would require an update to RFC7766.

6: This phrase is confusing and unnecessary "this is a fatal error". The logic to that point is clear that the situation doesn't happen and the prescribed behavior ("close the connection") makes sense.

The term "fatal error" is used throughout the document to mean an error that precludes further communication on the connection. The terminology section has been updated to clarify.

7: This is clumsy, when describing the RCODE: "generally set to zero on transmission, and silently ignored on reception, except".

This text has been clarified.

8: Perhaps these are not "unacknowledged requests" but "subsequent responses".

We had a debate about what the right phrase was to use and settled on "unacknowledged messages." The document has been updated to use this phrase consistently, which should lessen the confusion. "Subsequent responses" doesn't really work because an unacknowledged message isn't really a response.

9: Do not do name compression! No No No No.

The reason we don't say this is that TLVs can tunnel DNS messages, and this text would require us to uncompress the message before encoding it in a TLV.

We have changed the text to be slightly more generic: the content of a TLV is opaque and subject to definition of that TLV. We no longer mention DNS name compression at all—since TLVs are opaque blobs, no requirements can be placed on what they look like.

10: Unrecognized primary: "I would have thought this would warrant tearing down the connection given the words earlier that this ought never happen."

We agree, and the document has been updated accordingly.

11: "The namespaces of 16-bit MESSAGE IDs are disjoint in each direction. For example, it is *not* an error for both client and server to send a request message with the same ID." This will, someday, confuse a young and inexperienced DNS hosting engineer.

The reason we wrote this in the spec is so that a young engineer won't make that mistake.

We have added some text to clarify that the directionality of the message can be considered an extra bit. There is no way for the server and client to know that the other isn't using the same message ID as they are composing the message, and the tracking required to prevent clashes that theoretically could be prevented would impose a huge amount of complexity—this really isn't at all analogous to the key_id example.

12: There will be a need to fight cruft, or garbage collect. Inactive objects tend to be forgotten while still using up resources.

If you've got buggy client software that forgets to cancel some operation it's going to forget to stop sending renewals as well. This is one reason for having keepalives.

13: I thought that if a connection ends, the DSO session ends.

Correct.

We have changed "reconnect" to "make a new session" which means a new connection as well.

14: Sometimes a client can't distinguish this: "If reconnecting to the same server," as some server processes have multiple addresses and names."

We have stolen the following text from DNS Push and flipped it around to change the emphasis:

DNS Updates and DNS Push Notifications may be handled on different ports on the same target host, in which case they are not considered to be the "same server" for the purposes of this specification, and communications with these two ports are handled independently.

The issue is that it's actually a hard problem for the client to know that two servers are the same server. This definition is sufficiently constrained that we can imagine a client successfully implementing. Matching IP addresses is too hard. As a server operator, if you want servers to share connections it's your job to set up the server with the same host and ports. The opposite is also true.

15: "The RECOMMENDED value is 10 seconds." Probably a bad idea to codify this because implementations will set it to 10 and not scatter it when it should be. (Like closing out many connections in a load-shedding panic.)

Instead could say that it's recommended that servers set this value to ten seconds when they want the client to reconnect, and refer back to the session that has the more general text about this (5.6.1.1).

The text has been updated as suggested—thanks!

Paul Hoffman:
https://mailarchive.ietf.org/arch/msg/dnsop/PUlnpSfGvZo6cZge476SEWUR7tM

16: The document should explicitly list which protocols are currently acceptable, and say that the list can change in the future based on standards-track documents.

We do not agree that this should be restricted to standards-track documents. That said, we took the text that Paul kindly provided and tweaked it. Updates to DSO obviously still have to be standards-track.

Paul also sent us comments on the post-WGLC -06 update, which have not yet been addressed as of now. Thanks very much for the quick response, Paul!

Stephane Bortzmeyer:
https://mailarchive.ietf.org/arch/msg/dnssd/NFtd5RGIJhbOnb4Q_dWLwfZmkYg

17: My personal feeling is that it is complicated, with a lot of details. May be separating in two documents, one for the base DSO concept and one for the standards TLV (with their detailed behavior) would have been better.

We don't see any working group consensus on this point, and it seems unnecessary.

18: There is a discussion about a possible Privacy section <https://github.com/raybellis/draft-bellis-dnsop-session-signal/pull/36> for which I suggest the following text: [...]

We started with Stephane's text, and wound up with this:

When designing new DSO TLVs, the potential for data in the TLV to be
used as a tracking identifier should be taken into consideration, and
should be avoided when not required.

When used without TLS or similar cryptographic protection, a
malicious entity maybe able to inject a malicious Retry Delay
Unacknowledged Message into the data stream, specifying an
unreasonably large RETRY DELAY, causing a denial-of-service attack
against the client.

This is now in the Security Considerations section, rather than in a separate Privacy Considerations section. We could make it a separate section if there is a strong desire to do so, but it seemed unnecessary.

19: "There are a myriad of other potential use cases for DSO given the versatility and extensibility of this specification." I don't really like this sort of sentence. Either we have ideas about these potential use cases and we should write them down, or we don't and we should avoid this sort of very general words (after all, human imagination being what it is, we can be sure surprising use cases will be found.)

We agree and have removed this text.

20: "If the RCODE is set to any value other than NOERROR (0) or DSONOTIMP (tentatively 11), then the client should assume that the server does not support DSO." (Why "should" in lower case?) RFC 1035 being very clear that the rcode from a non-DSO server must be NOTIMP (this is also said in section 4.2.1 of the draft), I suggest to change that to: [...]

We actually think it should be MUST.

We believe the text otherwise says essentially the same thing as Stephane's text, except that Stephane's text actually places new requirements on non-conforming servers. We therefore prefer the current text.

21: It seems to me that the draft uses "response-requiring messages" as a synonym of "acknowledged request messages" (and "non-response-requiring messages" as a synonym of "unacknowledged request messages"). If I'm correct, it would be better to state it clearly in the terminology section.

There's a linquistic ambiguity here. We wrote "response-requiring" rather than "acknowledged" because "acknowledged" could mean "has been acknowledged" or "requires acknowledgment," whereas "response-requiring" is unambiguous.

So we prefer to keep the text as is.

22: Since there are only four combinations, I do not find this table useful.

The idea is that this table is a template for future modifications, so we want to have it there as a way of priming that.

23: Last, RFC 5226 (IANA considerations section) is now replaced by RFC 8126

Fixed, thanks!

Bernie Volz
https://mailarchive.ietf.org/arch/msg/dnsop/56NcwNBSY1K26tD424Z1sJtQqWU

24: Section 1: Not sure why Stateful is capitalized in the following line: "transport protocol. Each Stateful operation is communicated in its"

Stuart coincidentally had already changed this text to say "each DSO operation" rather than "each stateful operation," so problem solved. :)

25: Section 2: You probably should update this to the text in https://tools.ietf.org/html/rfc8174 and update the references accordingly.

The text now references both RFC 2119 and RFC 8174.

26: Section 3: Do you want to add anything about Section 6 (and perhaps later sections)?

We concluded that this paragraph was actually unnecessary, so rather than adding a mention of section 6, we deleted the paragraph!

27: Section 4.2.2.3 (and perhaps other places similar text exists): When a connection is aborted because of an invalid message, is there any recommendation to be added about retrying? If the client terminates the connection, it would likely not be wise for it to immediately retry and repeat the operation as that can lead to an endless loop? Should some recommended backoff technique be provided? Or some other connection rate limiting warning?

Good point, client should not reconnect immediately after a fatal error. There is no good answer here.

We have added some text in the terminology section that talks about the implications of fatal errors. A fatal error means you are not following the spec. It is still possible that a non-DSO connection would work. How the client navigates this is beyond our ability to specify. The new text says that the client shouldn't try to reconnect to the same server instance for at least an hour

28: Section 4.2.3, last paragraph: What should happen if a EDNS(0) TCP Keepalive option does appear? Should the connection be terminated? The message ignored?

This has already been specified. Document already says ARCOUNT must be zero for DSO messages. We have added text saying that if an EDNS(0) keepalive option appears within a DSO session, it's a fatal error and is treated accordingly.

29: Section 4.3: first paragraph: This text kind of conflicts with the text in 4.2.1 which says that whether a message is acknowledged or unacknowledged is determined only by the specification for the Primary TLV. I understand this may not be worth addressing, but perhaps a reference to 4.2.1 is worth considering?

We have updated and significantly clarified this section, and we believe that this point has been addressed.

30: Section 6.1: INACTIVITY TIMEOUT and KEEPALIVE INTERVAL. Is mention 0xffffffff (infinite) worth adding to this text?

We have added a separate paragraph about 0xfffffffe in section 5.4.2. Would probably be redundant to add it here as well. We also deleted sections 7.3 and 7.4, which were redundant.

31: Section 6.2.1: If a client were to receive a new RCODE but does not understand it (older version), should there be a statement as to how the client should react? Should it treat the unknown error code as if NOERROR were sent?

We have updated this to say that all nonzero errors should be taken as if NOERROR were sent and treat them as routine shutdown.

32: Section 9: Seems a bit light, but OK if it ends up being acceptable. For example, while it probably means you have bigger problems, but large timer values (such as in the Retry Delay TLV) could be a denial of service vector. Though if the server does that, it probably isn’t who you wanted to be talking to anyway and you should have used TLS. Perhaps also saying that if DNS over TLS isn’t used (just plan TCP), then it may be possible for a man-in-the-middle to inject messages (such as with a large Retry Delay TLV)?

We do not see an action here that doesn't already apply to TCP and TLS.

33: While there seems not be a solid definition of what the “Updates” means in the RFC header, it always seems to me that “Updates” means you are changing something in those documents, rather than just extending them with new capabilities within the original documents framework.

The point of [updates] here is to flag to readers of the earlier specifications that they should read this specification as well. There has been some agreement in the working group that this should be here. In a future world where DSO is pretty common, then we have changed RFC 7766, and it would be problematic for an implementer to not implement this specification as well.

[dnssd] WGLC Updates to draft-ietf-dnsop-session-… Ted Lemon