Re: [DNSOP] Spencer Dawkins' Discuss on draft-ietf-dnsop-session-signal-12: (with DISCUSS and COMMENT)

"Mirja Kuehlewind (IETF)" <ietf@kuehlewind.net> Mon, 30 July 2018 15:00 UTC

From: "Mirja Kuehlewind (IETF)" <ietf@kuehlewind.net>
In-Reply-To: <153266600019.24802.9316144897968330271.idtracker@ietfa.amsl.com>
Date: Mon, 30 Jul 2018 16:58:59 +0200
Cc: The IESG <iesg@ietf.org>, tjw.ietf@gmail.com, dnsop@ietf.org, dnsop-chairs@ietf.org
Message-Id: <539A819E-EABB-4E5B-AEC0-12C4ECFB98E8@kuehlewind.net>
References: <153266600019.24802.9316144897968330271.idtracker@ietfa.amsl.com>
To: Spencer Dawkins <spencerdawkins.ietf@gmail.com>, draft-ietf-dnsop-session-signal@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/fw_1yR-A2S2vDsf4DIN7kxdFbRA>
Subject: Re: [DNSOP] Spencer Dawkins' Discuss on draft-ietf-dnsop-session-signal-12: (with DISCUSS and COMMENT)

Hi Spencer, hi authors,

Please see a few comments below. More might be coming, as I’m currently reviewing the doc.

> On 27.07.2018 at 06:33, Spencer Dawkins <spencerdawkins.ietf@gmail.com> wrote:
> 
> Spencer Dawkins has entered the following ballot position for
> draft-ietf-dnsop-session-signal-12: Discuss
> 
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
> 
> 
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
> 
> 
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-dnsop-session-signal/
> 
> 
> 
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
> 
> I really like this document, and think it's headed the right direction. Of
> course I have four pages of comments, because reasons, but the only part I'm
> really confused about is this one ...
> 
> I would have thought that if you end up with a different endpoint because your
> anycast address now resolves differently, the new endpoint would have to have
> shared a lot of state with the previous endpoint, for this to work:
> 
>  When an anycast service is configured on a particular IP address and
>   port, it must be the case that although there is more than one
>   physical server responding on that IP address, each such server can
>   be treated as equivalent.  If a change in network topology causes
>   packets in a particular TCP connection to be sent to an anycast
>   server instance that does not know about the connection, the normal
>   keepalive and TCP connection timeout process will allow for recovery.
> 
> What I would have expected to happen, is that the new endpoint sees a packet
> arrive that's not on a synchronized TCP connection, and immediately responds
> with a RST (reset), rather than the normal keepalive and TCP connection timeout
> process happening. That's also the way I'm reading
> https://tools.ietf.org/html/rfc7828#section-3.6. Is that not the way it's
> working for anycast these days?

I think what is meant here is that if a RST is received, the client can reconnect. If the server does not send a RST, the connection will time out and will then be re-established. Especially if only keep-alives are currently being sent (no application data), a failure can be detected relatively quickly. I do agree that both cases, RST and no RST, should be mentioned here, and the text could be clarified further.
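
A minimal client-side sketch of the two recovery paths described above, in Python. It assumes a hypothetical `connect` factory that returns an already-connected socket-like object with its timeout already set. A `ConnectionResetError` stands for a new anycast instance answering with RST (fast failure); a timeout stands for the silent case that is only caught by the DSO keepalive or response timeout (slow failure). A real DSO client would additionally have to re-establish the DSO Session and any long-lived operations on the new connection, since the new instance shares no state with the old one.

```python
import socket
from typing import Callable, Optional


def send_with_recovery(connect: Callable[[], socket.socket],
                       message: bytes, attempts: int = 2) -> bytes:
    """Send `message` and return the reply, reconnecting after RST or timeout.

    `connect` is a hypothetical factory returning a connected socket (with a
    timeout already set). Each retry opens a fresh connection, which may
    reach a different anycast instance.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        conn = connect()
        try:
            conn.sendall(message)
            return conn.recv(65535)
        except ConnectionResetError as exc:
            # Fast path: the new instance has no state for this connection and sent RST.
            last_error = exc
        except (socket.timeout, TimeoutError) as exc:
            # Slow path: no RST arrived; a keepalive/response timeout detected the loss.
            last_error = exc
        finally:
            conn.close()
    assert last_error is not None  # every failed attempt records its error
    raise last_error
```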

> 
> 
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> 
> Everything else is a comment, so non-blocking, and please do the right thing.
> 
> This is a nit, and your answer could be "no", and that's fine, but in some
> places this document uses "DSO keepalive", and in other places, "keepalive"
> with no qualifier. It's likely that less confusion would result if you could
> consistently call this "DSO keepalive", so that it is clearly NOT a TCP
> keepalive. Do the right thing, of course.
> 
> Is the expectation that DSO would also be used in DNS over HTTP? I'm reading
> 
>  At the time of publication, DSO is specified only for DNS over TCP
>   [RFC1035] [RFC7766], and for DNS over TLS over TCP [RFC7858].  Any
>   use of DSO over some other connection technology needs to be
>   specified in an appropriate future document.
> 
> and noticing that https://tools.ietf.org/html/draft-ietf-doh-dns-over-https-12
> is currently in IETF Last Call.
> 
> This next one is well within the "Spencer wouldn't have done it this way, but
> Spencer's not the working group, or the IETF" range, but
> 
>  However, in the typical case a server will not know in advance
>   whether a client supports DSO, so in general, unless it is known in
>   advance by other means that a client does support DSO, a server MUST
>   NOT initiate DSO request messages or DSO unacknowledged messages
>   until a DSO Session has been mutually established by at least one
>   successful DSO request/response exchange initiated by the client, as
>   described below.  Similarly, unless it is known in advance by other
>   means that a server does support DSO, a client MUST NOT initiate DSO
>   unacknowledged messages until after a DSO Session has been mutually
>   established.
> 
> seems fragile, especially in environments where clients can come and go, and
> servers may be addressed using anycast (so I knew in advance that the four
> servers at that anycast address supported DSO, but somebody installed a fifth
> server that does not). Is that unlikely to be a problem?
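
The establishment rule quoted above can be sketched as a small per-connection gate. This is a hypothetical Python illustration, not taken from the draft: the `peer_known_to_support_dso` flag stands for the "known in advance by other means" exception, and it is exactly the piece that goes wrong in the anycast scenario above if a newly added instance does not support DSO.

```python
from enum import Enum


class Role(Enum):
    CLIENT = "client"
    SERVER = "server"


class Kind(Enum):
    REQUEST = "request"                # DSO request message (expects a response)
    UNACKNOWLEDGED = "unacknowledged"  # DSO unacknowledged message


class DsoGate:
    """Decides whether this endpoint may initiate a given kind of DSO message."""

    def __init__(self, role: Role, peer_known_to_support_dso: bool = False):
        self.role = role
        # Hypothetical stand-in for "known in advance by other means".
        self.peer_known_to_support_dso = peer_known_to_support_dso
        self.established = False

    def record_successful_exchange(self) -> None:
        """Call once a client-initiated DSO request has received a successful response."""
        self.established = True

    def may_send(self, kind: Kind) -> bool:
        if self.established or self.peer_known_to_support_dso:
            return True
        # Before establishment, only the client may initiate, and only with a
        # DSO request: that request/response exchange is what establishes the session.
        return self.role is Role.CLIENT and kind is Kind.REQUEST
```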
> 
> I'm sure
> 
>  A single server may support multiple services, including DNS Updates
>   [RFC2136], DNS Push Notifications [I-D.ietf-dnssd-push], and other
>   services, for one or more DNS zones.  When a client discovers that
>   the target server for several different operations is the same target
>   hostname and port, the client SHOULD use a single shared DSO Session
>   for all those operations.  A client SHOULD NOT open multiple
>   connections to the same target host and port just because the names
>   being operated on are different or happen to fall within different
>   zones.  This requirement is to reduce unnecessary connection load on
>   the DNS server.
> 
> is correct from the server side, but perhaps it's also worth noting that using
> multiple TCP connections unnecessarily increases the chances that data
> transfers happen during TCP slow start. If only one or two packets are being
> exchanged, that doesn't matter, but as more packets are exchanged, the
> difference increases, because congestion windows will grow more rapidly if
> fewer connections are used.
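
A back-of-the-envelope Python sketch of the slow-start point above, under idealized assumptions that are not from the draft: no loss, no receiver-window limit, one ACK per segment (so the congestion window grows by one segment per segment acknowledged), an initial window of 10 segments as in RFC 6928, no slow-start restart after idle periods, and connection handshakes not counted. It compares eight 20-segment transfers on eight fresh connections with the same transfers run one after another on a single shared connection.

```python
from typing import Tuple


def slow_start_rtts(segments: int, cwnd: int) -> Tuple[int, int]:
    """Return (round trips needed, cwnd afterwards) to send `segments` in slow start."""
    rtts = 0
    while segments > 0:
        sent = min(segments, cwnd)
        segments -= sent
        cwnd += sent  # slow start: +1 segment of cwnd per segment ACKed
        rtts += 1
    return rtts, cwnd


INITIAL_WINDOW = 10      # segments (RFC 6928)
TRANSFERS, SIZE = 8, 20  # eight transfers of 20 segments each

# Separate connections: every transfer starts again from the initial window.
separate = sum(slow_start_rtts(SIZE, INITIAL_WINDOW)[0] for _ in range(TRANSFERS))

# One shared connection: later transfers reuse the window grown by earlier ones.
shared, cwnd = 0, INITIAL_WINDOW
for _ in range(TRANSFERS):
    rtts, cwnd = slow_start_rtts(SIZE, cwnd)
    shared += rtts

print(separate, shared)  # 16 9
```

Under these assumptions the shared connection needs 9 round trips instead of 16; real stacks that restart slow start after idle would narrow the gap.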
> 
> I appreciate the inclusion of 5.4.  DSO Response Generation
> 
> But I've gotta ask. In the last paragraph of that section, I see
> 
>   o  Use a networking API that lets the receiver signal to the TCP
>      implementation that the receiver has received and processed a
>      client request for which it will not be generating any immediate
>      response.  This allows the TCP implementation to operate
>      efficiently in both cases; for requests that generate a response,
>      the TCP ACK, window update, and DSO response are transmitted
>      together in a single TCP segment, and for requests that do not
>      generate a response, the application-layer software informs the
>      TCP implementation that it should go ahead and send the TCP ACK
>      and window update immediately, without waiting for the Delayed ACK
>      timer.  Unfortunately it is not known at this time which (if any)
>      of the widely-available networking APIs currently include this
>      capability.
> 
> I would love to know if there are any widely-available network APIs that
> include this capability, before including this text in a standards-track RFC.
> Do you need help chasing this down?

There is the TCP_QUICKACK socket option, though, which can be set per connection (not per message) at any time during the connection’s lifetime. However, the text above is really confusing, as it seems to assume that TCP is aware of message semantics. TCP is a stream-based protocol and does not know which parts of the data belong to the same message. Therefore, providing a message-based interface for TCP is impossible without some additional intermediate machinery (see the TAPS working group).

Usually the delayed ACK timer is 100ms or less. So if an application response is generated within this time frame, it will be sent together with the ACK. However, TCP will usually also ACK at least every second received segment, so if more than one segment is received, some ACKs will be sent anyway. If no data is ready to send and no additional segment is received within the specified timeout, a delayed ACK will be sent.

However, I guess you don’t know in advance whether an incoming message will generate a response or not. Therefore you cannot adjust the TCP_QUICKACK socket option appropriately during the connection. I guess you could decide which case is more common (response or no response) and optimize for that case. The trade-off is between a potentially delayed ACK and slightly more data being sent. Is it problematic for DSO if the ACK is slightly delayed? If not, I would recommend simply not trying to optimize further here (and not setting the TCP_QUICKACK socket option).
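
For reference, a Python sketch of the per-connection TCP_QUICKACK option mentioned above. It is Linux-specific, and per the Linux tcp(7) manual page the flag is not permanent (the kernel may leave quick-ACK mode again on its own), so an application would have to set it again each time it wants an immediate ACK. The helper name `request_quick_ack` is hypothetical. This illustrates the mechanism only; it does not change the recommendation above to leave the option unset unless a delayed ACK is actually a problem for DSO.

```python
import socket


def request_quick_ack(sock: socket.socket) -> bool:
    """Switch a connected TCP socket into Linux quick-ACK mode.

    Intended to be called after reading a request for which no response
    will be generated, so the pending ACK is not held for the delayed-ACK
    timer. Returns False where TCP_QUICKACK is unavailable or rejected.
    """
    opt = getattr(socket, "TCP_QUICKACK", None)  # absent on non-Linux platforms
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, 1)
    except OSError:
        return False
    return True
```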

Mirja


> 
> The text in 6.1.  DSO Session Initiation seems rough to me, for a couple of
> reasons.
> 
>   The client may perform as many DNS operations as it wishes using the
>   newly created DSO Session.  Operations SHOULD be pipelined (i.e., the
> 
> I don't understand why this would be a SHOULD. At least from the client's
> perspective, it's not needed for interoperation.
> 
>   client doesn't need wait for a response before sending the next
>   message).  The server MUST act on messages in the order they are
>   transmitted, but responses to those messages SHOULD be sent out of
>   order when appropriate.
> 
> Is it correct to say that "responses to those messages SHOULD be sent when they
> become available, even if the responses are sent out of order"? If not, I'm
> probably missing what "when appropriate" means.
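
A minimal asyncio sketch of one reading of the quoted rule: the server starts handling requests in the order they arrive, but writes each response as soon as it is ready, so a slow request does not hold up faster ones behind it; the client then matches responses to requests by DNS message ID. The function and parameter names are hypothetical, and processing time is simulated with `asyncio.sleep`.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


async def serve_pipelined(
    requests: List[Tuple[int, float]],
    handle: Callable[[float], Awaitable[str]],
) -> List[Tuple[int, str]]:
    """Handle (message_id, payload) requests; return responses in the order they were sent."""
    sent: List[Tuple[int, str]] = []

    async def run(message_id: int, payload: float) -> None:
        result = await handle(payload)
        sent.append((message_id, result))  # "send" as soon as this response is ready

    # Tasks are created, and therefore start, in arrival order.
    tasks = [asyncio.create_task(run(mid, p)) for mid, p in requests]
    await asyncio.gather(*tasks)
    return sent


async def simulated_lookup(delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for real processing time
    return f"answer after {delay}s"


responses = asyncio.run(serve_pipelined([(1, 0.10), (2, 0.0), (3, 0.05)], simulated_lookup))
print([message_id for message_id, _ in responses])  # [2, 3, 1]
```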
> 
> I'm a bit mystified by this text in 6.2.  DSO Session Timeouts
> 
>  In the usual case where the inactivity timeout is shorter than the
>   keepalive interval, it is only when a client has a very long-lived,
>   low-traffic, operation that the keepalive interval comes into play,
>   to ensure that a sufficient residual amount of traffic is generated
>   to maintain NAT and firewall state and to assure client and server
>   that they still have connectivity to each other.
> 
> I think the basics are correct - the inactivity timer and (DSO) keepalive
> interval are independent - but I'm struggling to think of a reason to send
> (DSO) keepalives that's NOT tied to maintaining NAT/firewall state, and there's
> a lot of text before the paragraph that mentions NAT/firewall, that talks about
> why either interval might be longer or shorter than the other, without
> considering NAT/firewall. Am I missing something here?
> 
> .... and, now that I keep reading, 6.5.2.  Values for the Keepalive Interval
> does a much better job of explaining how a (DSO) keepalive interval should be
> selected - I think you could reasonably delete most of the text about (DSO)
> keepalive intervals in section 6.2, and at most provide a forward pointer to
> 6.5.2.
> 
> (As an aside, I think you probably want to cite
> https://tools.ietf.org/html/bcp142 as the operative recommendation for NAT
> behaviour toward TCP, since https://tools.ietf.org/html/rfc5382 has been
> updated)
> 
> I found this text
> 
>  For long-lived DNS Stateful operations (such as a Push Notification
>   subscription [I-D.ietf-dnssd-push] or a Discovery Relay interface
>   subscription [I-D.ietf-dnssd-mdns-relay]), an operation is considered
>   in progress for as long as the operation is active, until it is
>   cancelled.  This means that a DSO Session can exist, with active
>   operations, with no messages flowing in either direction, for far
>   longer than the inactivity timeout, and this is not an error.  This
>   is why there are two separate timers: the inactivity timeout, and the
>   keepalive interval.  Just because a DSO Session has no traffic for an
>   extended period of time does not automatically make that DSO Session
>   "inactive", if it has an active operation that is awaiting events.
> 
> to be extremely helpful, but it's 28 pages into the document. Is there a place
> earlier in the document that describes these timers, where you could place this
> text? Maybe section 3/Terminology isn't the right place, but maybe there is a
> right place toward the front of the document.
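
To make the two-timer distinction concrete, a hypothetical Python model (names and structure are illustrative, not the draft's): the inactivity timeout only runs while no operation is active, whereas the keepalive interval is measured from the last message in either direction, whether or not an operation is active. A long-lived Push subscription with no traffic is therefore never "inactive", but it still triggers keepalives.

```python
from dataclasses import dataclass


@dataclass
class DsoSessionTimers:
    inactivity_timeout: float   # seconds
    keepalive_interval: float   # seconds
    last_traffic: float = 0.0   # time of the last message in either direction
    idle_since: float = 0.0     # time the last active operation ended
    active_operations: int = 0

    def on_message(self, now: float) -> None:
        self.last_traffic = now

    def begin_operation(self, now: float) -> None:
        self.active_operations += 1
        self.on_message(now)

    def end_operation(self, now: float) -> None:
        if self.active_operations == 0:
            raise ValueError("no operation in progress")
        self.active_operations -= 1
        self.on_message(now)
        if self.active_operations == 0:
            self.idle_since = now

    def inactivity_expired(self, now: float) -> bool:
        # Only counts while nothing is in progress.
        return self.active_operations == 0 and now - self.idle_since >= self.inactivity_timeout

    def keepalive_due(self, now: float) -> bool:
        # Independent of operations: silence on the wire is what matters.
        return now - self.last_traffic >= self.keepalive_interval
```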
> 
> I'm not understanding why the SHOULDs are not MUSTs in this text:
> 
>  If, at any time during the life of the DSO Session, twice the
>   inactivity timeout value (i.e., 30 seconds by default), or five
>   seconds, if twice the inactivity timeout value is less than five
>   seconds, elapses without there being any operation active on the DSO
>   Session, the server SHOULD consider the client delinquent, and SHOULD
>   forcibly abort the DSO Session.
> 
> Perhaps part of my confusion is that I'm not sure what it means to "consider
> the client delinquent", but NOT to "forcibly abort the DSO session". But there
> are several "will forcibly abort"s in section 6.4.2, that sound more like MUST
> than SHOULD.
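
The quoted timing rule reduces to a one-line threshold. A Python sketch, assuming the draft's default inactivity timeout of 15 seconds (implied by "30 seconds by default" above); whether exceeding it is a SHOULD or a MUST is the open question here, so the hypothetical functions only report the condition.

```python
def delinquency_threshold(inactivity_timeout: float) -> float:
    """Seconds without an active operation after which the client is considered delinquent."""
    return max(2 * inactivity_timeout, 5.0)


def client_is_delinquent(seconds_without_active_operation: float,
                         inactivity_timeout: float) -> bool:
    return seconds_without_active_operation >= delinquency_threshold(inactivity_timeout)


print(delinquency_threshold(15))  # 30
print(delinquency_threshold(1))   # 5.0
```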
> 
> I don't think the MUST NOT in
> 
>  Normally a server MUST NOT close a DSO Session with a client.  A
>   server only causes a DSO Session to be ended in the exceptional
>   circumstances outlined below.
> 
> is quite right. Given that you have a bulleted list of reasons why a server
> would violate the MUST NOT immediately following this sentence, you might want
> to say "Normally a server does not close" here.
> 
>