[DNSOP] Spencer Dawkins' Discuss on draft-ietf-dnsop-session-signal-12: (with DISCUSS and COMMENT)

Spencer Dawkins <spencerdawkins.ietf@gmail.com> Fri, 27 July 2018 04:33 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/oolmf8HkkFyPJQLpvixJesnRGVc>

Spencer Dawkins has entered the following ballot position for
draft-ietf-dnsop-session-signal-12: Discuss


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-dnsop-session-signal/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

I really like this document, and think it's headed in the right direction. Of
course I have four pages of comments, because reasons, but the only part I'm
really confused about is this one ...

I would have thought that if you end up with a different endpoint because your
anycast address now resolves differently, the new endpoint would have to have
shared a lot of state with the previous endpoint, for this to work:

   When an anycast service is configured on a particular IP address and
   port, it must be the case that although there is more than one
   physical server responding on that IP address, each such server can
   be treated as equivalent.  If a change in network topology causes
   packets in a particular TCP connection to be sent to an anycast
   server instance that does not know about the connection, the normal
   keepalive and TCP connection timeout process will allow for recovery.

What I would have expected to happen is that the new endpoint sees a packet
arrive that's not on a synchronized TCP connection, and immediately responds
with a RST (reset), rather than the normal keepalive and TCP connection timeout
process happening. That's also the way I'm reading
https://tools.ietf.org/html/rfc7828#section-3.6. Is that not the way it's
working for anycast these days?
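
A minimal sketch of the behaviour expected above, based on the RFC 793
segment-arrival rules for an endpoint that is listening but has no matching
connection. The function name, the flag representation, the return strings,
and the addresses are all hypothetical and chosen only for illustration; this
is a toy model, not the code of any real TCP stack.

```python
def react_to_segment(known_connections, four_tuple, flags):
    """Toy model of a listening TCP endpoint's reaction to one segment.

    known_connections: set of (src_ip, src_port, dst_ip, dst_port) tuples
    flags: set of TCP flag names carried by the segment, e.g. {"ACK", "PSH"}
    """
    if four_tuple in known_connections:
        return "process"          # synchronized connection: normal handling
    if "RST" in flags:
        return "drop"             # never answer a reset with a reset
    if "ACK" in flags:
        return "send RST"         # mid-stream segment for an unknown connection
    if "SYN" in flags:
        return "start handshake"  # a genuinely new connection attempt
    return "drop"


# Hypothetical client connection to an anycast DNS-over-TLS address.
segment = ("192.0.2.1", 50000, "198.51.100.53", 853)
old_instance = {segment}   # the instance that accepted the connection
new_instance = set()       # the instance packets are routed to after a topology change

print(react_to_segment(old_instance, segment, {"ACK", "PSH"}))
print(react_to_segment(new_instance, segment, {"ACK", "PSH"}))
```

In this model the new instance answers the first mid-stream segment with a
RST, so the client would learn about the failure within one round trip rather
than after a keepalive or connection timeout.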


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------


Everything else is a comment, so non-blocking, and please do the right thing.

This is a nit, and your answer could be "no", and that's fine, but in some
places this document uses "DSO keepalive", and in other places, "keepalive"
with no qualifier. It's likely that less confusion would result if you could
consistently call this "DSO keepalive", so that it is clearly NOT a TCP
keepalive. Do the right thing, of course.

Is the expectation that DSO would also be used in DNS over HTTPS? I'm reading

   At the time of publication, DSO is specified only for DNS over TCP
   [RFC1035] [RFC7766], and for DNS over TLS over TCP [RFC7858].  Any
   use of DSO over some other connection technology needs to be
   specified in an appropriate future document.

and noticing that https://tools.ietf.org/html/draft-ietf-doh-dns-over-https-12
is currently in IETF Last Call.

This next one is well within the "Spencer wouldn't have done it this way, but
Spencer's not the working group, or the IETF" range, but

   However, in the typical case a server will not know in advance
   whether a client supports DSO, so in general, unless it is known in
   advance by other means that a client does support DSO, a server MUST
   NOT initiate DSO request messages or DSO unacknowledged messages
   until a DSO Session has been mutually established by at least one
   successful DSO request/response exchange initiated by the client, as
   described below.  Similarly, unless it is known in advance by other
   means that a server does support DSO, a client MUST NOT initiate DSO
   unacknowledged messages until after a DSO Session has been mutually
   established.

seems fragile, especially in environments where clients can come and go, and
servers may be addressed using anycast (so I knew in advance that the four
servers at that anycast address supported DSO, but somebody installed a fifth
server that does not). Is that unlikely to be a problem?

I'm sure

   A single server may support multiple services, including DNS Updates
   [RFC2136], DNS Push Notifications [I-D.ietf-dnssd-push], and other
   services, for one or more DNS zones.  When a client discovers that
   the target server for several different operations is the same target
   hostname and port, the client SHOULD use a single shared DSO Session
   for all those operations.  A client SHOULD NOT open multiple
   connections to the same target host and port just because the names
   being operated on are different or happen to fall within different
   zones.  This requirement is to reduce unnecessary connection load on
   the DNS server.

is correct from the server side, but perhaps it's also worth noting that using
multiple TCP connections unnecessarily increases the chances that data
transfers happen during TCP slow start. If only one or two packets are being
exchanged, that doesn't matter, but as more packets are exchanged, the
difference increases, because congestion windows will grow more rapidly if
fewer connections are used.
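
A rough way to see the slow start effect is to count round trips under an
idealized model. The sketch below assumes an initial window of 10 segments,
a congestion window that doubles every round trip, no loss, no window reset
after idle periods, and it ignores connection setup cost. All names and
numbers are assumptions for illustration only.

```python
INITIAL_WINDOW = 10  # segments; assumed value for illustration


def round_trips(segments, cwnd):
    """Return (round trips needed, cwnd afterwards) under idealized slow start."""
    rtts = 0
    while segments > 0:
        segments -= cwnd   # send one full window per round trip
        cwnd *= 2          # idealized doubling, no loss
        rtts += 1
    return rtts, cwnd


def separate_connections(transfers):
    """Each transfer uses a fresh connection that starts at the initial window."""
    return sum(round_trips(n, INITIAL_WINDOW)[0] for n in transfers)


def shared_connection(transfers):
    """All transfers reuse one connection, so later ones inherit the grown window."""
    total, cwnd = 0, INITIAL_WINDOW
    for n in transfers:
        rtts, cwnd = round_trips(n, cwnd)
        total += rtts
    return total


transfers = [30, 30]  # two hypothetical 30-segment transfers
print(separate_connections(transfers), shared_connection(transfers))
```

With these assumed numbers the two separate connections need four round
trips in total and the shared connection needs three, and the gap widens as
more data is exchanged.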

I appreciate the inclusion of Section 5.4 (DSO Response Generation).

But I've gotta ask. In the last paragraph of that section, I see

   o  Use a networking API that lets the receiver signal to the TCP
      implementation that the receiver has received and processed a
      client request for which it will not be generating any immediate
      response.  This allows the TCP implementation to operate
      efficiently in both cases; for requests that generate a response,
      the TCP ACK, window update, and DSO response are transmitted
      together in a single TCP segment, and for requests that do not
      generate a response, the application-layer software informs the
      TCP implementation that it should go ahead and send the TCP ACK
      and window update immediately, without waiting for the Delayed ACK
      timer.  Unfortunately it is not known at this time which (if any)
      of the widely-available networking APIs currently include this
      capability.

I would love to know if there are any widely-available network APIs that
include this capability, before including this text in a standards-track RFC.
Do you need help chasing this down?

The text in Section 6.1 (DSO Session Initiation) seems rough to me, for a
couple of reasons.

   The client may perform as many DNS operations as it wishes using the
   newly created DSO Session.  Operations SHOULD be pipelined (i.e., the

I don't understand why this would be a SHOULD. At least from the client's
perspective, it's not needed for interoperation.

   client doesn't need wait for a response before sending the next
   message).  The server MUST act on messages in the order they are
   transmitted, but responses to those messages SHOULD be sent out of
   order when appropriate.

Is it correct to say that "responses to those messages SHOULD be sent when they
become available, even if the responses are sent out of order"? If not, I'm
probably missing what "when appropriate" means.
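
The suggested reading ("send each response when it becomes available") can be
sketched as follows. This is a simplified model that assumes the server starts
work on every request as it arrives and that each request has a known
completion time; the function name and the timings are hypothetical.

```python
def response_order(requests):
    """Return message IDs in the order their responses would be sent.

    requests: list of (message_id, seconds_until_result_ready) tuples,
    listed in the order the server received and began acting on them.
    sorted() is stable, so equal completion times keep arrival order.
    """
    return [message_id for message_id, _ in sorted(requests, key=lambda r: r[1])]


# Request 1 is slow, request 2 is fast, request 3 is in between.
received = [(1, 0.30), (2, 0.01), (3, 0.10)]
print(response_order(received))
```

Under this model the responses go out as 2, 3, 1, and the client pairs each
response with its request by message ID rather than by position.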

I'm a bit mystified by this text in Section 6.2 (DSO Session Timeouts):

   In the usual case where the inactivity timeout is shorter than the
   keepalive interval, it is only when a client has a very long-lived,
   low-traffic, operation that the keepalive interval comes into play,
   to ensure that a sufficient residual amount of traffic is generated
   to maintain NAT and firewall state and to assure client and server
   that they still have connectivity to each other.

I think the basics are correct - the inactivity timer and (DSO) keepalive
interval are independent - but I'm struggling to think of a reason to send
(DSO) keepalives that's NOT tied to maintaining NAT/firewall state, and there's
a lot of text before the paragraph that mentions NAT/firewall, that talks about
why either interval might be longer or shorter than the other, without
considering NAT/firewall. Am I missing something here?

... and, now that I keep reading, Section 6.5.2 (Values for the Keepalive
Interval) does a much better job of explaining how a (DSO) keepalive interval
should be selected. I think you could reasonably delete most of the text about
(DSO) keepalive intervals in section 6.2, and at most provide a forward pointer
to 6.5.2.

(As an aside, I think you probably want to cite
https://tools.ietf.org/html/bcp142 as the operative recommendation for NAT
behaviour toward TCP, since https://tools.ietf.org/html/rfc5382 has been
updated.)

I found this text

   For long-lived DNS Stateful operations (such as a Push Notification
   subscription [I-D.ietf-dnssd-push] or a Discovery Relay interface
   subscription [I-D.ietf-dnssd-mdns-relay]), an operation is considered
   in progress for as long as the operation is active, until it is
   cancelled.  This means that a DSO Session can exist, with active
   operations, with no messages flowing in either direction, for far
   longer than the inactivity timeout, and this is not an error.  This
   is why there are two separate timers: the inactivity timeout, and the
   keepalive interval.  Just because a DSO Session has no traffic for an
   extended period of time does not automatically make that DSO Session
   "inactive", if it has an active operation that is awaiting events.

to be extremely helpful, but it's 28 pages into the document. Is there a place
earlier in the document that describes these timers, where you could place this
text? Maybe section 3/Terminology isn't the right place, but maybe there is a
right place toward the front of the document.
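
The distinction the quoted text draws between the two timers can be sketched
roughly as below. This is a simplified model, not the draft's state machine:
the class, the field names, and the timer values are assumptions chosen for
illustration, and inactivity is approximated as "no active operations and no
traffic for the timeout".

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    active_operations: int             # e.g. outstanding subscriptions
    seconds_since_last_traffic: float  # time since any message was seen


def is_inactive(state, inactivity_timeout):
    """A session is only inactive if nothing is in progress AND it has been quiet."""
    return (state.active_operations == 0
            and state.seconds_since_last_traffic >= inactivity_timeout)


def keepalive_due(state, keepalive_interval):
    """A DSO keepalive is due once the session has been quiet for the interval."""
    return state.seconds_since_last_traffic >= keepalive_interval


# A session with one long-lived subscription and ten minutes of silence.
subscribed = SessionState(active_operations=1, seconds_since_last_traffic=600.0)
print(is_inactive(subscribed, inactivity_timeout=15.0))
print(keepalive_due(subscribed, keepalive_interval=3600.0))
```

Both checks come out false for this session: it has been silent far longer
than the assumed inactivity timeout, yet it is not inactive because an
operation is still in progress, and no keepalive is due until the (assumed)
keepalive interval has passed.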

I'm not understanding why the SHOULDs are not MUSTs in this text:

   If, at any time during the life of the DSO Session, twice the
   inactivity timeout value (i.e., 30 seconds by default), or five
   seconds, if twice the inactivity timeout value is less than five
   seconds, elapses without there being any operation active on the DSO
   Session, the server SHOULD consider the client delinquent, and SHOULD
   forcibly abort the DSO Session.

Perhaps part of my confusion is that I'm not sure what it means to "consider
the client delinquent", but NOT to "forcibly abort the DSO session". But there
are several "will forcibly abort"s in section 6.4.2, that sound more like MUST
than SHOULD.
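
For reference, the threshold in the quoted text reduces to a one-line
calculation. The function name is hypothetical; the values come from the
quoted paragraph (twice the inactivity timeout, with a five-second floor).

```python
def delinquency_threshold(inactivity_timeout):
    """Seconds without an active operation before the client counts as delinquent."""
    return max(2 * inactivity_timeout, 5)


print(delinquency_threshold(15))  # default case from the quoted text: 30
print(delinquency_threshold(2))   # twice the timeout is under five seconds: 5
```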

I don't think the MUST NOT in

   Normally a server MUST NOT close a DSO Session with a client.  A
   server only causes a DSO Session to be ended in the exceptional
   circumstances outlined below.

is quite right. Given that you have a bulleted list of reasons why a server
would violate the MUST NOT immediately following this sentence, you might want
to say "Normally a server does not close" here.