Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-session-signal-12: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Thu, 02 August 2018 13:54 UTC

Date: Thu, 02 Aug 2018 08:54:36 -0500
From: Benjamin Kaduk <kaduk@mit.edu>
To: Tom Pusateri <pusateri@bangj.com>
Cc: The IESG <iesg@ietf.org>, tjw.ietf@gmail.com, dnsop@ietf.org, dnsop-chairs@ietf.org, draft-ietf-dnsop-session-signal@ietf.org
Message-ID: <20180802135436.GF96369@kduck.kaduk.org>
References: <153270509617.32757.1191915890190419981.idtracker@ietfa.amsl.com> <EFFB4DC5-5A4B-4EAB-8B9F-56229080CDF0@bangj.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <EFFB4DC5-5A4B-4EAB-8B9F-56229080CDF0@bangj.com>
User-Agent: Mutt/1.9.1 (2017-09-22)
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/74Ao8u-l4g-sUe2c3SSMkUIKBhw>
Subject: Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-session-signal-12: (with DISCUSS and COMMENT)
Precedence: list

On Tue, Jul 31, 2018 at 03:53:57PM -0400, Tom Pusateri wrote:
> 
> 
> > On Jul 27, 2018, at 11:24 AM, Benjamin Kaduk <kaduk@mit.edu> wrote:
> > 
> > 
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> > 
> > Hopefully this first point is simple to resolve (whether by changing text or merely
> > un-confusing me).  The other ones may require some actual discussion,
> > though.
> > 
> > Section 6.6.1.2 has:
> > 
> >   After a DSO Session is ended by the server (either by sending the
> >   client a Retry Delay message, or by forcibly aborting the underlying
> >   transport connection) the client SHOULD try to reconnect, to that
> >   service instance, or to another suitable service instance, if more
> >   than one is available.
> > 
> > which seems to contradict the text from Section 3:
> > 
> > %  [...] Given that these fatal error
> > %  conditions signify defective software, and given that defective
> > %  software is likely to remain defective for some time until it is
> > %  fixed, after forcibly aborting a connection, a client SHOULD refrain
> > %  from automatically reconnecting to that same service instance for at
> > %  least one hour.
> > 
> > Given some other mentions in the document about aborting the connection, it
> > may be that I am just reading the "refrain from reconnecting for an hour"
> > more strongly than I should be.
> 
> Section 3 discusses fatal errors on the server side that aren’t likely to change until the software is modified so there’s no use having the client retry quickly (hence the one hour). Section 6.6.1.2 is about receiving a Retry Delay message and the server having the close the connection because the client didn’t follow the spec and close the connection when asked.
> 
> I think the difference can be described as Section 3 talks about how the client deals with server bugs and Section 6.6.1.2 describes how the server deals with client bugs.
> 
> Does that help?

That helps with the intent; I'll need some time to go through the document
and see whether the document text matches the intent and I was just
confused, or if I should suggest some text changes.

> > 
> > Is Section 5.1.2 expected to be considered an "application protocol
> > profile" that defines the use of TLS 1.3 0-RTT data for DSO?  (If so, I do
> > not personally feel that it provides adequate treatment to be considered as
> > such.)
> > 
> > 
> > I should probably leave this to my (transport-area?) colleagues to discuss
> > further, but I'm not sure that the interaction of this mechanism with
> > high-RTT connections is fully covered -- for example, the inactivity
> > timeout in Section 6.4(.x) could behave poorly when the timeout is set to a
> > smaller value than the RTT, as the server would potentially end up starting
> > the "forcibly abort" process (and potentially causing the client to lose
> > for an hour) because the server's timer starts when it sends the DSO
> > response that initiates its idea of the session, and would not recieve
> > graceful shutdown messages from a properly-behaving client in time.
> > 
> 
> The 0 RTT discussion is also happening in another thread so I will not address it here.
> 
> > 
> > I'm not sure that the behavior specified in Section 7.1.2 is entirely safe.
> > Consider when a client sends a DSO keepalive to try to request a DSO
> > session, but subsequently sends EDNS(0) TCP Keepalive (whether "just in
> > case" or for some other reason).  The Server will receive the DSO keepalive
> > and respond, creating a session, but the EDNS(0) TCP Keepalive is already
> > in flight.  I don't remember seeing any text that prevents this client
> > behavior explicitly, but that seems like the right sort of thing to do.
> 
> Ok, is this more clear?
> 
> The inactivity timeout value in the Keepalive TLV (DSO-TYPE=1) has similar intent to the edns-tcp-keepalive EDNS0 Option [RFC7828] <file:///Users/pusateri/doc/draft-bellis-dnsop-session-signal/draft-ietf-dnsop-session-signal.html#RFC7828>. A client/server pair that supports DSO MUST NOT use the edns-tcp-keepalive EDNS0 Option within any message after a DSO Session has been established. A client that has sent a DSO message to establish a session MUST NOT send an edns-tcp-keepalive EDNS0 Option from this point on. Once a DSO Session has been established, if either client or server receives a DNS message over the DSO Session that contains an edns-tcp-keepalive EDNS0 Option, this is a fatal error and the receiver of the edns-tcp-keepalive EDNS0 Option MUST forcibly abort the connection immediately.

I think that addresses my concern, thanks.  (I know it does seem a little
silly to worry about, but part of what I do as SEC AD is look for edge
cases...)

> > 
> > 
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> > 
> > Six authors exceeds five, so "there is likely to be discussion" about this
> > being too large a number of authors.  What is the justification for the
> > author count?
> 
> We have all contributed significantly.
> 
> > Do we need to specify some GREASE (per draft-ietf-tls-grease) for new TLV
> > types in order to ensure the proper handling of unknown types?
> 
> Happy to do this if the other authors are in agreement.

Sure -- I'm just trying to raise awareness of issues that have been faced in
other situations, not to force anything on you.

> > Section-by-section comments follow.
> > 
> > Section 1
> > 
> > A DoH reference seems timely/apt.  (But maybe then it is only "Some such
> > transports" that can offer persistent sessions.)
> 
> We specifically didn’t want to reference DoH since HTTP is unsuitable for long lived connections and DSO wouldn’t apply here. We didn’t want to imply that DoH was suitable by referencing it.

Hmm.  I think DoH is clearly a non-UDP transport for DNS, and it is hard to
argue that it is not being specified.  My parenthetical suggestion was
probably a bit unclear -- I was proposing to add DNS over HTTP to the list
in the first sentence, and change the second sentence to start as "Some
such transports can offer persistent[...]".  Does that still seem like it
runs the risk of implying that DoH is suitable, to you?

> > Maybe give some examples of advantages of server-initiated messages?  Are
> > we talking about letting the server push records with larger TTLs or
> > notifying when the response to a query is changing, or just more mundane
> > keepalive-type things?
> 
> Yes, I will add a reference to DNS Push Notifications (https://tools.ietf.org/html/draft-ietf-dnssd-push-14 <https://tools.ietf.org/html/draft-ietf-dnssd-push-14>)

Thanks!

> > 
> > Section 3
> > 
> >   The terms "initiator" and "responder" correspond respectively to the
> >   initial sender and subsequent receiver of a DSO request message or
> >   unacknowledged message, regardless of which was the "client" and
> >   "server" in the usual DNS sense.
> > 
> > We just defined "client" and "server" explictly (without reference to the
> > "usual DNS sense"), so probably it's best to have this definition refer to
> > the previous client/server definitions or clarify that the above
> > definitions match the "usual DNS sense”.
> 
> Noted. I will add "the usual DNS sense" to the client and server definitions.
> 
> > 
> >   When an anycast service is configured on a particular IP address and
> >   port, it must be the case that although there is more than one
> >   physical server responding on that IP address, each such server can
> >   be treated as equivalent.  If a change in network topology causes
> >   packets in a particular TCP connection to be sent to an anycast
> >   server instance that does not know about the connection, the normal
> >   keepalive and TCP connection timeout process will allow for recovery.
> >   If after the connection is re-established, [...]
> > 
> > Perhaps clarifying that "recovery" means "detecting the broken session and
> > starting a new one" would be useful?  (I guess Spencer's DISCUSS takes this
> > a different direction.)
> 
> I will let this discussion continue with Spencer’s DISCUSS.

Yes, let's not fork the threads.

> > 
> >   DSO unacknowledged messages are unidirectional messages and do not
> >   generate any response.
> > 
> > "Do not generate any response" at the DNS layer; any reason to mention that
> > TCP will still ACK the bytes (or rather, that the "reliable" part of the
> > data stream will need to do so)?
> 
> Noted. Will add “at the DNS layer”.
> 
> > 
> > Section 5.1
> > 
> >   DSO messages MUST be carried in only protocols and in environments
> >   where a session may be established according to the definition given
> >   above in the Terminology section (Section 3).
> > 
> > nit: is this "in only" or "only in”
> 
> Noted. This will be changed to:
> 
> DSO messages MUST be carried only in protocols and environments where a session may be established according to the definition given above in the Terminology section (Section 3).
> 
> > 
> >   If the RCODE is set to any value other than NOERROR (0) or DSOTYPENI
> >   ([TBA2] tentatively 11), then the client MUST assume that the server
> >   does not implement DSO at all.  In this case the client is permitted
> >   to continue sending DNS messages on that connection, but the client
> >   SHOULD NOT issue further DSO messages on that connection.
> > 
> > I'm confused how the server would still have proper framing for subsequent
> > DNS messages, since the DSO TLVs would be "spurious extra data" after a
> > request header and potentially subject to misinterpretation as the start of
> > another DNS message header.
> 
> Yes, this is a serious oversight. I think we are going to need to encode differently to make all the TLVs look like an RR externally so the RDLEN can be used to skip them and add a single count or switch the TLV syntax back to RR syntax. The existing DNS header format / RR format is less than ideal...
[un-forking the thread, the DNS-over-TCP framing takes care of this, and
clarifying text is being added]
> > 
> > Section 5.1.3
> > 
> > It is probably worth explicitly noting that a middlebox MUST NOT
> > make/forward a DSO request with TLVs that it does not implement.
> 
> If a middlebox understands any DSO messages then it should follow section 5.2.2.4 (Unrecognized TLVs) with regard to how to handle TLVs that it does not implement. I can make that forward reference.

This is probably good enough (but I'm under-caffeinated so could be missing
something).  My main concern is to prevent the sort of behavior we've seen
in TLS, where a middlebox tries to forward on a ClientHello assuming that
it can parse (and interpose into!) the subsequent server responses, but
negotiated extensions can drastically change the protocol behavior and
cause the middlebox to fail to parse things properly.  If the DSO middlebox
is interpreting DSO and using separate sessions for the two sides, that
should be sufficient.

> > 
> > Section 5.2
> > 
> >   If a DSO message is received where any of the count fields are not
> >   zero, then a FORMERR MUST be returned, unless a future IETF Standard
> >   specifies otherwise.
> > 
> > This seems like ... not the conventional wording for this behavior, and
> > subject to large debates about the meaning of "IETF Standard".
> > (Similar language is used elsewhere, too.)
> 
> Noted. We can remove future IETF Standards.
> 
> > 
> > Section 5.2.2
> > 
> > The start of this section seems to duplicate a lot of Section 3 -- e.g.,
> > the specification of the "Primary TLV"; request/response/unacknowledged as
> > the types of messages; etc.  It's unclear to me that this duplication of
> > content is helpful to the reader.
> > 
> >   Unacknowledged messages MUST NOT be used "speculatively" in cases
> >   where the sender doesn't know if the receiver supports the Primary
> >   TLV in the message, because there is no way to receive any response
> >   to indicate success or failure of the request message (the request
> >   message does not contain a unique MESSAGE ID with which to associate
> >   a response with its corresponding request).  Unacknowledged messages
> >   are only appropriate in cases where the sender already knows that the
> >   receiver supports, and wishes to receive, these messages.
> > 
> > Having gone to the trouble to explicitly define (twice!) "request",
> > "response" and "unacknowledged", it's pretty confusing to then use the
> > English word "request" to refer to an "unacknowledged" message.
> > 
> > Section 5.2.2.1
> > 
> >   The specification for a TLV states whether that DSO-TYPE may be used
> >   in "Primary", "Additional", "Response Primary", or "Response
> >   Additional" TLVs.
> > 
> > Perhaps this could be wordsmithed to avoid accidental misreading as
> > exclusive-or?
> 
> How about:
> 
> When a new TLV is defined, the specification MUST include whether the DSO-TYPE can be used as the Primary TLV, used as an Additional TLV, or used in either context for both requests and responses.

That's probably better (but maybe another comma before "for both requests
and responses"?  OTOH, the RFC Editor has a consistent style book to
apply...)

> 
> > 
> > Section 5.3.1
> > 
> >   When a DSO unacknowledged message is unsuccessful for some reason,
> >   the responder immediately aborts the connection.
> > 
> > Doesn't this kill the client/server pairing for an hour?  "For some reason"
> > is very vague to induce such behavior, and could include transient internal
> > errors.
> 
> An unacknowledged message can only be unsuccessful from the receivers point of view and typically would be a code error because they typically wouldn’t be sent before some protocol support has already been determined by other DSO messages. (i.e. mDNS relay or PUSH notifications are the use cases in mind).
> 
> However, your point is noted. This is too general. How about:
> 
> When an unacknowledged DSO message type is received (MESSAGE ID field is
> zero), the receiver SHOULD already be expecting this DSO message type.

I'm not sure that 2119-SHOULD is needed here.

> Section 5.2.2.4 describes the handling of unknown DSO message types.
> Parsing errors MUST also result in the receiver aborting the connection.
> If the DSO message type is not expected, the receiver should also abort
> the connection. Other internal errors processing the DSO message are
> implementation dependent as to whether the connection should be aborted
> according to the severity of the error.

That looks better, thanks.

> > 
> > Section 6.1
> > 
> > When would it be appropriate for the server to send responses out of order?
> 
> An example would be with DNS Push notifications subscriptions. The server may receive a query for which it doesn’t have the information in it’s cache followed by a query where it does have the information in it’s cache. So it may asynchronously perform operations to load it’s cache in the first case but immediately response to the second query out of it’s cache.

I'll leave it to your collective judgment as to whether it's worth
including such an example of out-of-order processing in the draft itself.

> > 
> > Section 6.6.1.1
> > 
> > nit: "RECONNECT DELAY" is used with inconsistent capitalization.
> 
> Noted.
> 
> > 
> > Section 7.1
> > 
> > The description of the two timeout fields is predicated on understanding
> > that it is only the response's incarnation of them that is authoritative;
> > as an editorial matter, it might be nice to introduce this fact earlier.
> 
> Noted. I will change the first paragraph in 7.1 to:
> 
> The Keepalive TLV (DSO-TYPE=1) performs two functions: to reset the keepalive timer for the DSO Session, and to establish the values for the Session Timeouts. The client will request the desired Session timeout values and the server will acknowledge with the final values the client MUST use.

Thanks!

-Benjamin

> > 
> > Section 7.1.1
> > 
> > It seems like we could consolidate the "equal to" and "greater than" cases
> > into "greater than or equal to”.
> 
> Noted.
> 
> > 
> > Section 7.2.1
> > 
> >   A client MUST NOT send a Retry Delay DSO request message or DSO
> >   unacknowledged message to a server. [...]
> > 
> > nit: it must not send it as a response, either, so perhaps "MUST NOT send a
> > Retry Delay DSO message to a server" is shorter and better.
> 
> Noted.
> 
> > 
> > Section 9.3
> > 
> > I thought IANA liked to see a "registration template" for what subsequent
> > registrations in a registry being created will need to specify.  (That
> > said, the IANA state is "IANA OK - Actions Needed", and one might expect
> > that they know better than I do...)
> > 
> > Section 10
> > 
> > I'm a little surprised to not see some discussion that this mechanism
> > encourages the maintenance of persistent connections on DNS servers, which
> > encourages the maintenance of persistent connections on DNS servers, which
> > has impact on resource consumption/load, but is not expected to be
> > problematic because the server can tell the clients to go away if needed.
> 
> I will add this to the end of Section 10.
> 
> Thanks for the thoughtful review. Your comments will definitely make the document better.
> 
> Tom
> 
>

Re: [DNSOP] Benjamin Kaduk's Discuss on draft-iet… Tom Pusateri
Re: [DNSOP] Benjamin Kaduk's Discuss on draft-iet… Benjamin Kaduk
Re: [DNSOP] Benjamin Kaduk's Discuss on draft-iet… Mirja Kuehlewind (IETF)
Re: [DNSOP] Benjamin Kaduk's Discuss on draft-iet… Ted Lemon
Re: [DNSOP] Benjamin Kaduk's Discuss on draft-iet… Ted Lemon
Re: [DNSOP] Benjamin Kaduk's Discuss on draft-iet… Tom Pusateri
[DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dn… Benjamin Kaduk
Re: [DNSOP] Benjamin Kaduk's Discuss on draft-iet… Benjamin Kaduk
Re: [DNSOP] Benjamin Kaduk's Discuss on draft-iet… Mirja Kuehlewind (IETF)
Re: [DNSOP] Benjamin Kaduk's Discuss on draft-iet… Benjamin Kaduk
Re: [DNSOP] Benjamin Kaduk's Discuss on draft-iet… Ted Lemon
Re: [DNSOP] Benjamin Kaduk's Discuss on draft-iet… Tom Pusateri
Re: [DNSOP] Benjamin Kaduk's Discuss on draft-iet… Benjamin Kaduk
Re: [DNSOP] Benjamin Kaduk's Discuss on draft-iet… Ted Lemon