[DNSOP] Mirja Kühlewind's Discuss on draft-ietf-dnsop-session-signal-12: (with DISCUSS and COMMENT)

Mirja Kühlewind <ietf@kuehlewind.net> Mon, 30 July 2018 20:19 UTC

From: Mirja Kühlewind <ietf@kuehlewind.net>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-dnsop-session-signal@ietf.org, Tim Wicinski <tjw.ietf@gmail.com>, dnsop-chairs@ietf.org, tjw.ietf@gmail.com, dnsop@ietf.org
Message-ID: <153298197116.8154.9156104510824888266.idtracker@ietfa.amsl.com>
Date: Mon, 30 Jul 2018 13:19:31 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/EMdh3FgaAotmY4IXL8RLLAX-r-Q>
Subject: [DNSOP] Mirja Kühlewind's Discuss on draft-ietf-dnsop-session-signal-12: (with DISCUSS and COMMENT)

Mirja Kühlewind has entered the following ballot position for
draft-ietf-dnsop-session-signal-12: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-dnsop-session-signal/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

1) In addition to the bullet point in Section 6.2 that was flagged by Spencer, I
would like to discuss the content of Section 5.4 (DSO Response Generation). I
understand the desire to optimize for the case where the application knows that
no data will be sent in reply to a certain message; however, TCP does not have
a notion of message boundaries and therefore cannot and should not act based on
the reception of a certain message. Indicating to TCP that an ACK can be
sent immediately in a specific situation is also problematic, as ACK processing
is part of TCP's internal machinery. Moreover, why is it important at all
that a TCP-level ACK is sent out faster than the delayed ACK timer? The ACK
receiver does not expose to the application when an ACK is received, and the
delayed ACK timer only expires if no further data is received/sent by the
ACK-receiver; therefore this optimization should not have any impact on
application performance. I would recommend removing this section and any
additional discussion about delayed ACKs.

Please note that the problem described in [NagleDA] only occurs for
request-response protocols where no further request can be sent before the
response is received. This is not the case in this protocol (as pipelining is
supported).
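
As an illustration of the message-boundary point, here is a minimal Python
sketch. It assumes the two-byte length prefix that DNS uses over TCP (RFC 1035,
Section 4.2.2); the names StreamFramer and frame are hypothetical and not taken
from the draft. Message boundaries exist only in the application's reassembly
code, while the bytes may arrive split at any position.

```python
import struct


class StreamFramer:
    """Reassembles length-prefixed DNS messages from an arbitrary byte stream.

    Hypothetical helper for illustration; not taken from the draft.
    """

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add bytes as delivered by TCP; return every message now complete."""
        self._buffer.extend(chunk)
        messages = []
        while len(self._buffer) >= 2:
            (length,) = struct.unpack("!H", self._buffer[:2])
            if len(self._buffer) < 2 + length:
                break  # message is still incomplete; wait for more bytes
            messages.append(bytes(self._buffer[2:2 + length]))
            del self._buffer[:2 + length]
        return messages


def frame(message: bytes) -> bytes:
    """Prefix a message with its two-byte, network-order length."""
    return struct.pack("!H", len(message)) + message
```

Feeding the framer a stream cut at arbitrary offsets yields the same messages
as feeding it the whole stream at once, which is why the transport itself has
no "message received" event to act on.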

2) Further regarding keep-alives:
in sec 6.5.2: "For example, a hypothetical keepalive interval
   value of 100ms would result in a continuous stream of at least ten
   messages per second, in both directions, to keep the DSO Session
   alive."
This does not seem correct. There should be at most one keep-alive message in
flight. Thus the keep-alive timer should only be restarted after the
keep-alive reply was received. Also: "And, in this extreme example, a single
packet loss and
   retransmission over a long path could introduce a momentary pause in
   the stream of messages, long enough to cause the server to
   overzealously abort the connection."
This doesn't really make sense to me: as I said, TCP will retransmit and the
keep-alive timer should not be running until the reply is received. If you want
to abort the connection based on keep-alives quickly, before the TCP connection
indicates a failure to you, you need to wait at minimum for an interval that is
larger than the TCP RTO (which is usually 3 RTTs), which means you basically
need to know the RTT.
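
The behaviour argued for above (at most one keep-alive in flight, with the
interval restarted only once the reply arrives) could be sketched roughly as
follows. This is an illustration under assumptions, not the draft's mechanism:
the class and method names are hypothetical, and time is passed in explicitly
as integer milliseconds so the example is deterministic.

```python
class KeepaliveTimer:
    """Allows at most one keep-alive in flight (hypothetical sketch)."""

    def __init__(self, interval_ms: int, now_ms: int):
        self.interval_ms = interval_ms
        self.next_send_at = now_ms + interval_ms  # when the next one is due
        self.in_flight = False

    def should_send(self, now_ms: int) -> bool:
        """True if a keep-alive is due and none is awaiting a reply."""
        return not self.in_flight and now_ms >= self.next_send_at

    def on_sent(self) -> None:
        """Record that a keep-alive was handed to TCP; the timer is stopped."""
        self.in_flight = True

    def on_reply(self, now_ms: int) -> None:
        """Restart the interval only once the reply has been received."""
        self.in_flight = False
        self.next_send_at = now_ms + self.interval_ms
```

With a 100 ms interval and a reply delayed to t=600 ms by a retransmission, no
second keep-alive is generated in the meantime, and the next one becomes due at
t=700 ms.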

Also sec 7.1: "If the client does not generate the
      mandated keepalive traffic, then after twice this interval the
      server will forcibly abort the connection."
Why must the server terminate the connection at all if the client refuses to
send keep-alives? Isn't that what the inactivity timer is meant for? Usually
only the endpoint that initiates the keep-alive should terminate the connection
if no response is received.

3) There is another contradiction regarding the inactivity timer:
Sec 6.2 says "A shorter inactivity timeout with a longer keepalive interval
signals
   to the client that it should not speculatively keep an inactive DSO
   Session open for very long without reason, but when it does have an
   active reason to keep a DSO Session open, it doesn't need to be
   sending an aggressive level of keepalive traffic to maintain that
   session."
which indicates that the client may leave the session open longer than
indicated by the inactivity timer of the server. However, Section 7.1.1 says
that the client MUST close the connection when the timer has expired.


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

1) sec 3: I find it a bit strange that there is normative language about
error handling (as well as in the "same service instance" definition part) in
the terminology section. Maybe move those paragraphs somewhere else...? Also,
the part about "long-lived operations" and message types provides far more
information than just terminology, and I would recommend also moving it into
its own section or maybe just making it part of the intro.

2) Maybe call section 5 "Protocol specification" instead of "Protocol
details"...?

3) Sec 5.1: "DSO messages MUST be carried in only protocols and in environments
   where a session may be established according to the definition given
   above in the Terminology section (Section 3)."
I don't get this. Which part of Section 3? Given that Section 3 is on
terminology and this is a normative statement, I would recommend spelling out
here explicitly what is meant. Do you mean the protocol must be
connection-oriented, reliable, and provide in-order delivery? Anything else?
However, given that you say two paragraphs later that this spec is only
applicable for use with TCP and TLS/TCP, do you even need to specify these
requirements normatively?

4) sec 5.1 "It is a common
   convention that protocols specified to run over TLS are given IANA
   service type names ending in "-tls"."
Not sure this is true. Isn't it usually just an "s" at the end? Or which
registry are you talking about?

5) sec 5.1: "In some environments it may be known in advance by external means
   that both client and server support DSO, ..."
I guess the client and server also need to know if TLS is supported or not.
Maybe spell this out as well.

6) sec 5.1: "... therefore either
   client or server may be the initiator of a message."
Maybe s/initiator of a message/initiator of a message exchange/

7) sec 5.1.2: "Having initiated a connection to a server, possibly using zero
round-
   trip TCP Fast Open and/or zero round-trip TLS 1.3, a client MAY send
   multiple response-requiring DSO request messages to the server in
   succession without having to wait for a response to the first request
   message to confirm successful establishment of a DSO session."
Why is the ability to send more than one request related to TCP Fast
Open/TLS1.3 0-RTT? These are two independent mechanisms to speed up processing.
Mentioning TCP Fast Open/TLS1.3 0-RTT here is rather confusing. Likewise, I
don't think that the sentence "Similarly, DSO supports zero round-trip
operation." describes quite the same thing.

8) Further please provide references to TCP Fast Open and TLS1.3 and maybe
rephrase this paragraph to use normative language: "Caution must be taken to
ensure that DSO messages sent before the
   first round-trip is completed are idempotent, or are otherwise immune
   to any problems that could be result from the inadvertent replay that
   can occur with zero round-trip operation."
Maybe just:
"DSO messages sent with TLS1.3 0-RTT before the TLS handshake is completed or
in TCP SYN data with use of TCP Fast Open MUST be idempotent." However, this is
actually already required by TLS1.3 and TFO, so there is after all no need to
just rephrase this requirement here (at least not normatively). I think it
would be more useful for every DSO message type to specify if it can be sent in
0-RTT or not and require this for specification of future TLVs.
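
One way to picture the suggestion above is a per-message-type flag stating
whether that type may be sent as early data. The following Python sketch is
only an illustration: the type names and flag values are invented and are not
taken from the draft or from any registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageType:
    name: str
    early_data_ok: bool  # may be sent in TLS 1.3 0-RTT data or TFO SYN data


# Hypothetical registry; names and flags are assumptions, not the draft's values.
REGISTRY = {
    "EXAMPLE-QUERY": MessageType("EXAMPLE-QUERY", early_data_ok=True),
    "EXAMPLE-UPDATE": MessageType("EXAMPLE-UPDATE", early_data_ok=False),
}


def may_send_early(name: str) -> bool:
    """Return True only for types explicitly marked as safe to replay.

    Unknown types are refused, since replay safety cannot be assumed.
    """
    entry = REGISTRY.get(name)
    return entry is not None and entry.early_data_ok
```

A sender would consult may_send_early() before placing a message in 0-RTT data
and hold back anything else until the handshake has completed.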

9) sec 5.1.3: "In cases where a DSO session is terminated on one side of a
   middlebox, and then some session is opened on the other side of the
   middlebox in order to satisfy requests sent over the first DSO
   session, any such session MUST be treated as a separate session."
This sentence seems a bit nonsensical, which probably isn't great for a
normative sentence. If a session is terminated on one side and another is
opened on the other side, doesn't that mean that you have two sessions?

10) sec 5.1.3: "A middlebox that is not doing a strict pass-through will have
no way to
   know on which connection to forward a DSO message, and therefore will
   not be able to behave incorrectly."
I'm not sure I understand this sentence. Can you clarify?

11) As already briefly mentioned by Ben, there is quite some redundant text in
sec 5 (with 5.2) on the handling of message IDs and TLVs. Given that this text
is normative, I would really recommend specifying it clearly only once. Please
also check the rest of the doc for further things that are specified
normatively multiple times. It is usually much clearer to specify something
only once, at least normatively, at the appropriate position in the doc.

12) sec 5.3.1: "When a DSO unacknowledged message is unsuccessful for some
reason, .." What does unsuccessful mean here? Can you clarify?

13) sec 6.5.2: "A corporate DNS server that knows it is serving only clients on
the
   internal network, with no intervening NAT gateways or firewalls, can
   impose a higher keepalive interval, because frequent keepalive
   traffic is not required."
I guess in this scenario it is probably most appropriate to not send any
keep-alives…

14) sec 6.6: "   o  The server application software terminates unexpectedly
(perhaps
      due to a bug that makes it crash)."
This bullet point does not really make sense to me because once the app has
crashed, the server can no longer perform any actions.

15) sec 7.1: "When a client is sending its second and subsequent Keepalive DSO
   requests to the server, the client SHOULD continue to request its
   preferred values each time. "
I don't understand the SHOULD here... what else should the client put in these
fields instead...?

16) sec 7.1.2: "Once a DSO Session has been established, if either
   client or server receives a DNS message over the DSO Session that
   contains an EDNS(0) TCP Keepalive option, this is a fatal error and
   the receiver of the EDNS(0) TCP Keepalive option MUST forcibly abort
   the connection immediately."
This is normatively specified multiple times (3?) in the doc. Please consider
specifying it only once, where most appropriate (probably Section 7.1.2).
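
For concreteness, the check quoted above might look roughly like the following
Python sketch. It assumes the OPT RDATA layout from RFC 6891 (a sequence of
option code, option length, option data) and the edns-tcp-keepalive option
code 11 from RFC 7828; the function names and the "abort"/"process" return
values are hypothetical.

```python
import struct

EDNS_TCP_KEEPALIVE = 11  # option code assigned by RFC 7828


def has_tcp_keepalive_option(opt_rdata: bytes) -> bool:
    """Scan the RDATA of an OPT record for the edns-tcp-keepalive option."""
    offset = 0
    while offset + 4 <= len(opt_rdata):
        code, length = struct.unpack_from("!HH", opt_rdata, offset)
        if code == EDNS_TCP_KEEPALIVE:
            return True
        offset += 4 + length  # skip this option's header and data
    return False


def on_dns_message(dso_established: bool, opt_rdata: bytes) -> str:
    """Return the action described in the quoted text (hypothetical names)."""
    if dso_established and has_tcp_keepalive_option(opt_rdata):
        return "abort"  # fatal error once a DSO Session exists
    return "process"
```

Keeping the rule in a single function like this mirrors the editorial point:
the requirement is stated once and referenced from wherever it applies.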

17) sec 7.1: "The Keepalive TLV is not used as an Additional TLV."
This is redundant with the normative sentence in the next paragraph. Maybe just
remove this sentence...?

18) +1 to Ben's discuss regarding the reconnection of clients. A TCP RST can be
sent for many reasons, and waiting for an hour does not seem appropriate. I
would rather recommend logging an error and directly trying to reconnect.