[tcpm] Review of draft-ietf-tcpm-generalized-ecn-04

"Scharf, Michael" <Michael.Scharf@hs-esslingen.de> Tue, 10 September 2019 11:00 UTC

Return-Path: <Michael.Scharf@hs-esslingen.de>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 7B7E31200F8; Tue, 10 Sep 2019 04:00:04 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.998
X-Spam-Level:
X-Spam-Status: No, score=-1.998 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_NONE=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=hs-esslingen.de
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id oC9UtYzdyv50; Tue, 10 Sep 2019 04:00:00 -0700 (PDT)
Received: from mail.hs-esslingen.de (mail.hs-esslingen.de [134.108.32.78]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id DB14D120043; Tue, 10 Sep 2019 03:59:59 -0700 (PDT)
Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.hs-esslingen.de (Postfix) with ESMTP id 6C31125A1B; Tue, 10 Sep 2019 12:53:16 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=hs-esslingen.de; s=mail; t=1568112796; bh=5+VW8J/EUAMxUUiPBOQhgNLKaMbvp3+at62OhTlSG+M=; h=From:To:CC:Subject:Date:From; b=aZ+W0eHkDWQmfvthy1Ls8g8y63vtBrJkrjN4iRHfJ/1FHVdO8cr4ixz51LsiQCzNX fazhwfUYosiBvlIq/Y+aVANubQiJ8buhsX3Q7IiiYXxZ5AutiT7pCakGwkURKsptj5 KWUfazOLtjVsWBkvKyUsCmI7t3RrFHGTBM/lZJbw=
X-Virus-Scanned: by amavisd-new-2.7.1 (20120429) (Debian) at hs-esslingen.de
Received: from mail.hs-esslingen.de ([127.0.0.1]) by localhost (hs-esslingen.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id pgDZIiOgg5A7; Tue, 10 Sep 2019 12:53:15 +0200 (CEST)
Received: from rznt8101.rznt.rzdir.fht-esslingen.de (rznt8101.rznt.rzdir.fht-esslingen.de [134.108.29.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mail.hs-esslingen.de (Postfix) with ESMTPS; Tue, 10 Sep 2019 12:53:14 +0200 (CEST)
Received: from RZNT8114.rznt.rzdir.fht-esslingen.de ([169.254.3.19]) by rznt8101.rznt.rzdir.fht-esslingen.de ([fe80::bd73:d6a9:24d7:95f1%10]) with mapi id 14.03.0468.000; Tue, 10 Sep 2019 12:53:14 +0200
From: "Scharf, Michael" <Michael.Scharf@hs-esslingen.de>
To: "tcpm@ietf.org" <tcpm@ietf.org>
CC: "draft-ietf-tcpm-generalized-ecn@ietf.org" <draft-ietf-tcpm-generalized-ecn@ietf.org>
Thread-Topic: Review of draft-ietf-tcpm-generalized-ecn-04
Thread-Index: AdVnHvNUzPQ6SlYtQJy+BLbT1UpXAw==
Date: Tue, 10 Sep 2019 10:53:14 +0000
Message-ID: <6EC6417807D9754DA64F3087E2E2E03E2D4379D8@rznt8114.rznt.rzdir.fht-esslingen.de>
Accept-Language: de-DE, en-US
Content-Language: de-DE
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [134.108.29.249]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/me27pFsN-R-A8LOjIcZeNz8iNz0>
Subject: [tcpm] Review of draft-ietf-tcpm-generalized-ecn-04
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 10 Sep 2019 11:00:04 -0000

Marcelo, Bob,

Please find below review of draft-ietf-tcpm-generalized-ecn-04, without any hat. The context is my review of draft-ietf-tcpm-accurate-ecn that is posted separately.

Macroscopic feedback: While the document provides a lot of excellent reasoning for why it is appropriate to set ECT on most control segments, the two cases in which this document mandates use of draft-ietf-tcpm-accurate-ecn, as well as several algorithmic design choices, are IMHO not well enough motivated. My last TCP implementation experience is many years ago, and maybe my knowledge is simply too outdated to be relevant today. But when I read the document, I repeatedly run into the question "why should I indeed implement that this way if I had to do so?". I will provide specific details below. I guess my main high-level message is that the text should introduce early in the document that all mechanisms have been defined conservatively and with safety in mind, possibly at the cost of performance and efficiency. This is reasonable for an IETF document. But in a real-world Internet, for instance with initial congestion windows bigger than 10 being accepted by (some) implementers, I believe very careful and conservative design choices must be well explained. Otherwise there is a significant risk of just being ignored. Personally, I believe that also less conservative design choices would also work in an experiment, but I cannot proof that by data, i.e., it is speculation only.

Major issues (not ordered by priority): 

* Section 1.  Introduction

  "Nonetheless, if the client does not
   implement AccECN, it cannot use ECN++ on the one packet that offers
   most benefit from it - the initial SYN." 

  I don't fully agree to this (see below). But, putting this aside, I think it should be explained at least with some words early in the document why the SYN is challenging, in particular since SYN segments would benefit most. A reference to the later details would work, but actually I would try to provide the high-level rationale early to those who don't read the document to the every end...

* Section 1.1

   "Finally, there are ongoing experimental efforts to promote the
   adoption of a slightly modified variant of DCTCP (and similar
   congestion controls) over the Internet to achieve low latency, low
   loss and scalable throughput (L4S) for all communications
   [I-D.ietf-tsvwg-l4s-arch].  In such an approach, L4S packets identify
   themselves using an ECN codepoint [I-D.ietf-tsvwg-ecn-l4s-id].  With
   L4S, preventing TCP control packets from obtaining the benefits of
   ECN would not only expose them to the prevailing level of congestion
   loss, but it would also classify them into a different queue.  Then
   only L4S data packets would enjoy ultra-low latency forwarding, while
   the packets controlling and retransmitting these data packets would
   still get stuck behind the queue induced by legacy ('Classic') TCP
   traffic."

  I fail to understand why this document must present L4S in such detail. As L4S has been mentioned and references already earlier in the introduction, this text could just be removed, IMHO. If the authors prefer to keep this section, I ask to reconsider the wording:
    1/ The wording "only L4S data packets would enjoy ultra-low latency forwarding" should be phrased using neutral statements , such as "only L4S packets would be classified into the L4S queue that may have lower latency"
    2/ The term "legacy ('Classic') TCP" should be replaced by a term without negative connotation, such as "standard TCP", "non-L4S-enabled TCP", or the like. As I have mentioned on the TSVWG list, I disagree with using terms such as "Legacy TCP", "Classic TCP", etc. when referring to a modern TCP implementation compliant to the standards published by TCPM. Note that I am fine with the terms "Classic ECN" or "Classic ECN feedback" if these terms have consensus in TSVWG.

* Section 3.2

   "It can be seen that the sender cannot set ECT on the SYN if it is not
   requesting AccECN feedback."

  As mentioned before, I think this is one of the cases in which reasoning is needed as it is not trivial to understand the root cause. A reference to a later section could work.

  Having said this, I'd like to raise a question related to the type of IoT scenarios discussed in draft-ietf-lwig-tcp-constrained-node-networks: Why would a TCP sender that sends at most one MSS of data per RTT (e.g. CWND=1 MSS fixed at least during the first two RTTs) have to implement AccECN to be allowed to set ECT on SYN? Note that the document explains in Section 3.2.1.3 why a back-off of the RTO is not needed. Also, note that RFC 8085 basically explains (for UDP) that one packet per RTT may be OK in today's Internet. Of course, such a sender would not make much use of any ECN congestion feedback in any circumstances. But if there is an environment in which packets market by ECT get better service than non-ECT packet, why should this narrow-band server with small code footprint not leverage the benefits of setting ECT in SYNs without the complexity of AccECN?

* Section 3.2

  "Therefore it is RECOMMENDED that the
   experimental AccECN specification [I-D.ietf-tcpm-accurate-ecn] is
   implemented (as well as the present specification)"

  I have repeatedly mentioned the downsides of creating dependencies between documents. In this case, the risk is that the AccECN experiment fails (e.g., the IETF has to obsolete it in some years because another alternative has been found). The same applies if a standards-track RFC of AccECN would not be backward-compatible for whatever reason. His would be a pity since significant parts of the document are entirely independent of AccECN. This risk could be reduced by not mandating a particular specification, but mandating functional requirements.

  For instance, I believe that ECT on SYNs can always be set if there is a mechanism that reliably echos CE marks on SYNs back to the sender. AccECN is a solution to this, and the only solution we currently have, but we are dealing with experiments, not standards, so who knows what will happen in some years from now? To me, a better wording would be along the lines of (neglecting the previously mentioned exception): "ECT on SYNs can only be set if there is a corresponding feedback scheme. The experimental AccECN specification [I-D.ietf-tcpm-accurate-ecn] provides this feedback. It is RECOMMENDED to implement an accurate feedback scheme as well as the present specification." I merely show this as an example that would not be impossible to better decouple documents. As long as I am the only person who brings up this point, I don't ask for this specific change to be implemented. I just like to record my concern including potential solutions (e.g., for a document shepherd).

* Section 3.2.1.1.2

   "A TCP initiator MUST NOT set ECT on a SYN if it does not also attempt to negotiate Accurate ECN feedback in the same SYN."

  See above. If CWND is kept at one MSS during the beginning of the connection, I don't understand the rationale for the MUST.
 
* Section 3.2.1.2.  Caching where to use ECT on SYNs

  I think it could be useful to reference RFC 2140 / 2140bis (non-normatively) and possibly adapt to the terminology therein. The same may apply to later sections discussing caching.

* Section 3.2.1.3.  SYN Congestion Response

  "If the SYN-ACK returned to the TCP initiator confirms that the server
   supports AccECN, it will also indicate whether or not the SYN was CE-
   marked.  If the SYN was CE-marked, the initiator MUST reduce its
   Initial Window (IW) and SHOULD reduce it to 1 SMSS (sender maximum
   segment size)."

  Regarding the MUST: How can a sender meet the MUST requirement if IW was 1 MSS?

  Regarding the SHOULD: This is one example for which the draft picks a very conservative approach. Being conservative is not wrong but should be well explained (see below).

* Section 3.2.1.3.  SYN Congestion Response

   "If the initiator has set ECT on the SYN and if the SYN-ACK shows that
   the server does not support AccECN, the TCP initiator MUST
   conservatively reduce its Initial Window and SHOULD reduce it to 1
   SMSS.  A reduction to greater than 1 SMSS MAY be appropriate (see
   Section 4.2.1).  Conservatism is necessary because a non-AccECN SYN-
   ACK cannot show whether the SYN was CE-marked."

 Regarding the MUST, I have the same question like in the previous comment, i.e., the guidance when IW is already 1 MSS .

 The SHOULD is one of the cases that really need backing. Unfortunately, as explained below, I have doubts that the current wording in Section 4.2.1 sufficiently explains this SHOULD. As a result, I either believe that the explanation in Section 4.2.1 needs to be improved, or, if that is not possible, this guidance could have to be reviewed (e.g., "MAY reduce it to 1 SMSS"). BTW, if I had to implement the protocol, this is one of the cases where I would consider violating the spec from day one, given the potential harm on TCP performance. Luckily, I am not an implementer... Anyway, this brings me to another point: Are there no "MEASURMENTS NEEDED" in Section 3.2.1.3, e.g., regarding potential impact on application-layer performance, performance/robustness comparison of different strategies, and the like?

* Section 3.3.2 SYN (Receive)

  "Some implementations of TCP servers (e.g. current Linux) assume that,
   if a host receives a SYN with a non-zero IP/ECN field, it must be due
   to network mangling, and they disable ECN for the rest of the
   connection.  Section 4.2.2.2 finds that this type of network mangling
   seems to be virtually non-existent so it would be preferable to
   report any such mangling so it can be fixed."

  The term "current Linux" is not specific. To a reader of the RFC in some years from now, it will not be clear to what this refers to. So, at minimum this draft would have to be specific about kernel versions etc. And operating systems change. But, more importantly, I do not believe that discussing implementation choices belongs into the normative part of the specification at all. Thus, I think that at minimum "(e.g., current Linux)" should be removed in the normative part of the document.

* Section 4.2.1 Argument 1a: Unregognized CE on the SYN

  "Happily, the appropriate conservative congestion response is to
   reduce the initial window, and it is extremely rare for a TCP client
   to send more than one packet as its initial request anyway.  Any
   clients that do frequently use a larger initial window for their
   first message to the server can cache which servers will not
   understand ECT on a SYN (see Section 4.2.3 below)."

  I disagree with the entire first sentence. First, I fail to understand why "it is extremely rare for a TCP client to send more than one packet as its initial request". For instance, I believe that FTP is a counter-example. There is a huge amount of application protocols running on top of TCP. And which protocols are "rare" depend _a_ lot on what traffic mix one looks at. What data backs this claim for _all_ protocols using TCP in all IP networks?

  I agree that this statement may be true for Web browsing. HTTP is an extremely relevant protocol for TCP (well, still), but by far not the only TCP protocol, there are probably even HTTP use cases with huge POST/PUT requests. So, this statement must either be backed for all TCP-based application or be removed.

  Second, I disagree with the term "appropriate" unless sufficient data is provided regarding the first point. And unless this is fixed, this document IMHO lacks an explanation of why the particular design choice is indeed "appropriate". I don't know how to easily fix this; potential solutions could e.g. include relaxing the requirements to less conservative guidance, openly state that the algorithm is just a conservative design choice, ... or provide data from running code backing the claim.

  I also believe that the follow-up sentence "Any clients that do frequently use a larger initial window for their first message to the server can cache which servers will not understand ECT on a SYN" needs backing that explains why this indeed works for "any" client. For instance, what may work for me would be a worst-case analysis of the number of peers (tens of thousands, or even more?) that a given client may have, experience from running code, or some reference to a scientific study. But I do believe that some form of backing is needed that this works for "any" client. If that is not possible, both the tone of the text and possible the normative guidance would have to be adapted to make clear that there is uncertainty whether this indeed works well in "any" case.

4.2.3.  Caching Strategies for ECT on SYNs

   "For instance, if a user's Internet access bottleneck supported L4S ECN
   but not Classic ECN, strategy (S2A) would make most sense and there
   would be no point trying to avoid the 'over-strict' test and
   negotiate Classic ECN. [and several following paragraphs]"

  I believe this discussion could take place in an L4S document, e.g., submitted to TSVWG, since it contains L4S-specific assumptions. A reader may not understand this without reading the L4S specifications. To me, this is a non-starter for a document in TCPM. 

4.3.2.  Response to Congestion on a SYN-ACK

   [After a pretty detailed discussion...]
   "Therefore, TCP's response to a CE-marked SYN-ACK can be similar to
   its response to the loss of _any_ packet, rather than backing off as
   if the special _initial_ packet of a flow has been lost."

  This section contains some good and reasonable discussion. It concludes with a very conservative design (Slow-Start from IW1). I agree that the design choice sounds safe. I was probably save in the Internet 20 years ago, and is therefore still safe today. But, well, the response to "loss of _any_ packet" in TCP congestion control is more or less having the rate (and many stack may do less without running into issues), and TCPM has started the ABE experiment that even allows a smaller reduction for CE. Are we really convinced that we must experiment with one of the most conservative possible protocol designs? I really wonder if implementers would implement ECN++ with those parameters in a production deployment and see a performance benefit in doing so.

* 4.4.2.  Summary: Enabling ECN on Pure ACKs

   "During review of this specification, it was decided that allowing
   hosts to set ECT on Pure ACKs without a feasible response mechanism
   would set an undesirable precedent.  It would certainly improve the
   flow's own performance, but it would slightly increase potential harm
   to others.  Therefore, Section 3.2.3 allows ECT on Pure ACKs if
   AccECN feedback has been negotiated, but not with classic RFC 3168
   ECN feedback."

  To me, "slightly increase potential harm" is a weak justification. Do we have any data on how relevant this problem may be, e.g., from running code? Personally, I am not convinced by what is written here (but, like elsewhere, I am perfectly fine with being in the rough part of the consensus). Unfortunately, I can't offer data to proof my opinion. At minimum, what I would suggest is to rephrase this paragraph to something like:

   "This specification uses a safe approach. Allowing hosts to set ECT on Pure ACKs without a feasible response mechanism could result in risk.  It would certainly improve the flow's own performance, but it would slightly increase potential harm to others. Therefore, Section 3.2.3 allows ECT on Pure ACKs if AccECN feedback has been negotiated, but not with classic RFC 3168 ECN feedback."

* Section 5.1.  IW10

  "As specified in Section 3.2.1.1, a TCP initiator can only set ECT on
   the SYN if it requests AccECN support.  If, however, the SYN-ACK
   tells the initiator that the responder does not support AccECN,
   Section 3.2.1.1 advises the initiator to conservatively reduce its
   initial window to 1 SMSS because, if the SYN was CE-marked, the SYN-
   ACK has no way to feed that back.

   If the initiator implements IW10, it seems rather over-conservative
   to reduce IW from 10 to 1 just in case a congestion marking was
   missed.  Nonetheless, the reduction to 1 SMSS will rarely harm
   performance, because:

   o  as long as the initiator is caching failures to negotiate AccECN,
      subsequent attempts to access the same server will not use ECT on
      the SYN anyway, so there will no longer be any need to
      conservatively reduce IW;

   o  currently, at least for web sessions, it is extremely rare for a
      TCP initiator (client) to have more than one data segment to send
      at the start of a TCP connection [28; Fig 3] - IW10 is primarily
      exploited by TCP servers."

  See my previous questions. I do not understand why there is no deployed TCP-based application with some reasonable share of traffic that sends more than one segment at the beginning of the TCP connection. This document deals with all TCP usage; it is not written only for the communication between Web browsers and Web servers. For instance, why would the TCP client in an FTP session not use IW10? I disagree that one can generalize from HTTP to all TCP-based application protocols. And there is also no evidence that caching will work for any TCP protocol. For that, the worst-case scenario would be end-systems that maintain TCP connections to _many_ peers (at least tens of thousands of peers, maybe even more). How well will caching work in such cases? If those statements are not backed better, the text already explains what a reader could conclude: "it seems rather over-conservative to reduce IW from 10 to 1 just in case a congestion marking was missed." Yes, indeed, unless proven otherwise.

* Section 5.1.  IW10

   If a responder receives feedback that the SYN-ACK was CE-marked,
   Section 3.2.2.2 mandates that it reduces its initial window to 1
   SMSS.  When the responder also implements IW10, it is particularly
   important to adhere to this requirement in order to avoid overflowing
   a queue that is clearly already congested.

  Here the document is internally consistent. In Section 3.2.2.2. it argues that "the ECN feedback proves that the network is delivering packets successfully and is not severely overloaded." And in this section it argues that the queue is clearly congested. Please ensure that the document is consistent.

  For what it is worth, I personally believe a "clearly congested" queue would drop packets instead of marking them. Of course, a reduction to IW 1 is safe according to all what we have learnt in the last 20 years in the Internet. But to me it is very conservative compared to, well, some of the congestion control algorithms running in today's Internet. I personally would speculate that we have some more head-room before running into Internet congestion collapse. Yet, of course, the usual trade-offs for congestion control apply. To me, his sounds like a good area for which further data is needed from experimentation. I'd like to record that I am personally not convinced by the reasoning in the document. But, I don't have data. And, I am always willing to learn from running code.

* References

  I am somewhat surprised to see RFC 793 as an informative reference only. Some other references could also be normative. 


Minor comments:

* Section 1.1

   "Non-ECN control packets particularly harm performance in environments
   where the ECN marking level is high."

  Is my understanding correct that this means "Not using ECN on control packets can be particularly detrimental to performance in environments where the ECN marking level is high"? I am not sure if the control packets indeed cause "harm".

* Section 2

  Better would be the wording in RFC 8174

* Section 2

  "Optionally, it could mean ECT(1), which is in the process of being redefined for use by L4S experiments [RFC8311] [I-D.ietf-tsvwg-ecn-l4s-id]."

  I don't understand why this matters here given that section 1 already explains that this document is compatible with L4S.

* Section 3.3.1.  Receiver Behaviour for Any TCP Control Packet or Retransmission
 
   "o  Any TCP implementation SHOULD accept receipt of any valid TCP
      control packet or retransmission irrespective of its IP/ECN field.
      If any existing implementation does not, it SHOULD be updated to
      do so."

   The second sentence is IMHO just redundant. And I am not convinced that the second sentence requires capital letter keywords.

* 3.3.3.  Pure ACK (Receive)

   "Currently AccECN feedback is required to count CE marking of any control packet including pure ACKs."

   Why "currently"?

* Section 4.9.  General Fall-back for any Control Packet
   
   "And particularly given that
   unnecessarily removing ECT from other control packets could lead to
   performance problems, e.g. by directing them into an inferior queue
   [I-D.ietf-tsvwg-ecn-l4s-id] or over a different path, because some
   broken multipath equipment (erroneously) routes based on all 8 bits
   of the Diffserv field."

  I think "inferior" has to be replaced by a neutral term, e.g., "another" queue. I don't think that the queue would always be "inferior", e.g., if DiffServ was used. And, yes, this quite directly runs into L4S-specific questions and that are entirely out-of-scope of this document... All that could be avoided by reducing the references to L4S.

* 5.3.  TCP Derivatives

   "Experience from experiments on adding ECN support to all TCP packets
   ought to be directly transferable between TCP and derivatives of TCP,
   like SCTP or QUIC."

  I would suggest to use at least another term instead of "TCP Derivates", such as "other Internet transport protocols". And I think this section could actually either be removed or moved to an appendix. 


Thanks

Michael