Re: [DNSOP] Mirja Kühlewind's Discuss on draft-ietf-dnsop-session-signal-12: (with DISCUSS and COMMENT)

"Mirja Kuehlewind (IETF)" <ietf@kuehlewind.net> Mon, 15 October 2018 14:02 UTC

From: "Mirja Kuehlewind (IETF)" <ietf@kuehlewind.net>
In-Reply-To: <42DDC0E9-1088-450D-8AC4-2B137858697E@fugue.com>
Date: Mon, 15 Oct 2018 16:02:26 +0200
Cc: The IESG <iesg@ietf.org>, draft-ietf-dnsop-session-signal@ietf.org, Tim Wicinski <tjw.ietf@gmail.com>, dnsop-chairs <dnsop-chairs@ietf.org>, dnsop WG <dnsop@ietf.org>
Message-Id: <68A00EEF-9826-47B9-9F9E-6871D7E2113F@kuehlewind.net>
References: <153298197116.8154.9156104510824888266.idtracker@ietfa.amsl.com> <CAPt1N1kxsd5Y_=r_NwJ3E8fexycOw_wth8BdxT0U3VtqxDPYZg@mail.gmail.com> <42DDC0E9-1088-450D-8AC4-2B137858697E@fugue.com>
To: Ted Lemon <mellon@fugue.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/vy8GXaY7ZV99b7vcNeD1GvV_c2U>

Hi Ted,

Sorry for the delay; since you made a number of changes, it took me a while to re-review. Unfortunately, I’m not quite ready to release my discuss yet, but I’m close.

Regarding my first discuss point (delayed ACKs etc.), I think the text has improved, and I would like to see my minor wording question (comment 2 below) addressed before I finally release the discuss here. However, I still think the extensive discussion now provided in section 9.5 does not necessarily belong in this document. I would have preferred to move this text into a real appendix, or to remove it completely and perhaps document it in a separate informational RFC (in tcpm).

Regarding my second discuss point (keep-alives), the text still does not seem quite right, or I’m really confused. Please see further below (comment 3).

Anyway, here are my comments on the edited/new text, in the order they appear in the draft:

1) I think the following text in section 3 is not fully correct:

"Fast Open message: A TCP SYN packet that begins a DSO connection and
   contains early data ([RFC8446] section 2.3).  Fast Open is only
   permitted when using TLS encapsulation: a TCP SYN message that does
   not use TLS encapsulation but contains early data is not permitted."

If TLS 0-RTT is used, this data will not be carried in the TCP SYN; it will "just" be sent at the same time as the TLS handshake is performed (but after the TCP handshake). Only if TCP Fast Open (TFO, see RFC7413) is used can data also be sent in the TCP SYN. I guess you mainly need to fix the reference here, or maybe name both mechanisms separately.


2) In section 5.5.1:
   "With a DSO request message, the TCP implementation waits for the
   application-layer client software to generate the corresponding DSO
   response message, which enables the TCP implementation to send a
   single combined IP packet containing the TCP acknowledgement, the TCP
   window update, and the application-generated DSO response message.
   This is more efficient than sending three separate IP packets."

The phrasing here is a bit confusing, at least to me. It sounds as if there were a special TCP for DSO. Maybe the following is better:
   "With a DSO request message, the TCP delayed acknowledgement timer
   will usually make the implementation wait for the application-layer
   client software to generate the corresponding DSO response message
   before it sends out a TCP acknowledgement.  This will generate a
   single combined IP packet containing the TCP acknowledgement, the TCP
   window update, and the application-generated DSO response message,
   and is more efficient than sending three separate IP packets."

(Note that the delayed ACK timer can also be configured to a very small value, so whether a TCP implementation waits depends on both the processing time and the value of the timer.)
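
To make the dependency in the note above concrete, here is a minimal sketch (not from the draft). It assumes a simplified model in which the ACK is combined with the response only if the application produces the response before the delayed ACK timer fires. The function name and the example values are hypothetical.

```python
def ack_is_piggybacked(processing_time_ms: float, delayed_ack_timer_ms: float) -> bool:
    """Simplified model: the ACK rides on the DSO response only if the
    response is ready before the delayed ACK timer expires."""
    return processing_time_ms < delayed_ack_timer_ms


# Illustrative values only: a fast application with a generous timer ...
print(ack_is_piggybacked(processing_time_ms=5, delayed_ack_timer_ms=40))
# ... versus a slow application with a very small timer.
print(ack_is_piggybacked(processing_time_ms=50, delayed_ack_timer_ms=1))
```

In the second case the stack would send a standalone ACK first and the response later, so the single combined packet described in the draft text is not guaranteed.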

3) Section 6.5.2
"For example, a (hypothetical and unrealistic)
   keepalive interval value of 100 ms would result in a continuous
   stream of ten messages per second or more, in both directions, to
   keep the DSO Session alive.  And, in this extreme example, a single
   packet loss and retransmission over a long path could introduce a
   momentary pause in the stream of messages of over 200 ms, long enough
   to cause the server to overzealously abort the connection."

I think this example is still not correct (and the changes might have made it worse: how can there be more than 10 messages?).

So the point here is that there is a dependency on the RTT. This can only happen if the RTT is smaller than 200 ms; otherwise the connection is closed anyway after two keep-alives. However, if the RTT is much smaller than 100 ms and, e.g., TLP is used, it would still work even if one packet is lost.

In any case, I don’t think this example is actually very helpful. The point is that the keep-alive interval should always be much larger than the RTT for this to work appropriately. However, the point about keeping the network load low is rather independent of the question of when the mechanism actually breaks. I would recommend simply removing this example and just saying that the interval MUST NOT be smaller than 10 sec, to keep the network load reasonably low.
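
As a rough illustration of the two separate concerns (network load versus the RTT dependency), here is a small sketch. The 10-second floor is the value suggested above; the factor used for "much larger than the RTT" is purely an assumption, as are the function names.

```python
def keepalive_messages_per_second(interval_s: float) -> float:
    """Keepalive messages per second, per direction, on an otherwise idle session."""
    return 1.0 / interval_s


def interval_is_acceptable(interval_s: float, rtt_s: float,
                           min_interval_s: float = 10.0,
                           rtt_factor: float = 10.0) -> bool:
    """Check both conditions independently:
    - load: the interval is not below the suggested 10 s floor;
    - robustness: the interval is much larger than the RTT
      (rtt_factor is an assumed stand-in for "much larger")."""
    load_ok = interval_s >= min_interval_s
    rtt_ok = interval_s >= rtt_factor * rtt_s
    return load_ok and rtt_ok


# The draft's hypothetical 100 ms interval: ten messages per second per direction.
print(keepalive_messages_per_second(0.1))
print(interval_is_acceptable(interval_s=0.1, rtt_s=0.05))
print(interval_is_acceptable(interval_s=10.0, rtt_s=0.2))
```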

However, having read this and the previous section again, I think your keep-alive mechanism could also be improved. Usually, there should be two intervals: one that defines how long the connection can be idle before a keep-alive is sent, and one that defines when a keep-alive should be retransmitted if it is deemed lost, where the first should usually be larger than the second (and both timers should always be larger than the RTT). That would enable faster failure detection if the connection is actually lost.
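
A minimal sketch of what such a two-interval scheme could look like, under stated assumptions: the class name, the probe limit of three, and the strict ordering check are all hypothetical and are not taken from the draft.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeepaliveTimers:
    idle_interval_s: float        # idle time before the first keep-alive is sent
    retransmit_interval_s: float  # wait before resending an unanswered keep-alive
    rtt_s: float                  # estimated round-trip time
    max_probes: int = 3           # assumption: give up after this many unanswered probes

    def __post_init__(self) -> None:
        # Assumed ordering: idle interval > retransmit interval > RTT.
        if not (self.idle_interval_s > self.retransmit_interval_s > self.rtt_s):
            raise ValueError("expected idle_interval > retransmit_interval > RTT")

    def probe_times(self, last_activity_s: float) -> List[float]:
        """Times at which keep-alives go out if nothing is ever answered."""
        first = last_activity_s + self.idle_interval_s
        return [first + i * self.retransmit_interval_s for i in range(self.max_probes)]

    def failure_time(self, last_activity_s: float) -> float:
        """Time at which the connection is declared dead."""
        return (last_activity_s + self.idle_interval_s
                + self.max_probes * self.retransmit_interval_s)


timers = KeepaliveTimers(idle_interval_s=30.0, retransmit_interval_s=5.0, rtt_s=0.2)
print(timers.probe_times(0.0))
print(timers.failure_time(0.0))
```

With a single interval of 30 s and the same three unanswered probes, failure would only be declared after 120 s; the shorter retransmit interval is what allows the faster failure detection.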

4) Section 6.6.2.2. (Reconnecting After an Unexplained Connection Drop)
  "It is also possible for a server to forcibly terminate the
   connection; in this case the client doesn't know whether the
   termination was the result of a protocol error or a network outage.
   The client could determine which of the two is occurring by noticing
   if a connection is repeatedly dropped by the server; if so, the
   client can mark the server as not supporting DSO."

How often should the client try, and at what interval?

5) Section 9.2:
   "In principle, anycast servers could maintain sufficient state that	
    they can both handle packets in the same TCP connection."

Really? I mean, in theory yes, but has this ever been done in practice? I would think that sharing TCP state is even harder than sharing DSO state.


Thanks!
Mirja





> On 27.09.2018 at 06:57, Ted Lemon <mellon@fugue.com> wrote:
> 
> Mirja, I notice that you are still holding a discuss on this document.   I believe that we addressed the concerns you raised in your discuss.   Could you please let us know if there is still work to do on this, and if not, clear the discuss?
> 
> Thanks!
> 
>