Re: [tcpm] Review of draft-ietf-tcpm-generalized-ecn-04

MARCELO GABRIEL BAGNULO BRAUN <marcelo@it.uc3m.es> Mon, 04 November 2019 10:23 UTC

Return-Path: <marcelo@it.uc3m.es>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id C09D3120CEE for <tcpm@ietfa.amsl.com>; Mon, 4 Nov 2019 02:23:24 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.999
X-Spam-Level:
X-Spam-Status: No, score=-1.999 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=it.uc3m.es
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id nljqJ0EzUyUu for <tcpm@ietfa.amsl.com>; Mon, 4 Nov 2019 02:23:20 -0800 (PST)
Received: from mail-lj1-x234.google.com (mail-lj1-x234.google.com [IPv6:2a00:1450:4864:20::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id D077612096E for <tcpm@ietf.org>; Mon, 4 Nov 2019 02:23:19 -0800 (PST)
Received: by mail-lj1-x234.google.com with SMTP id n21so3306955ljg.12 for <tcpm@ietf.org>; Mon, 04 Nov 2019 02:23:19 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=it.uc3m.es; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :content-transfer-encoding; bh=i7uUqWBLG3hMsW1LV1SRrHqUl/czWWPukTh7r/nP2q4=; b=EKP8k/wugDRkNrXjMbUeMgWMsTqrEo+9PwZpOX8GybS9OoYV3S1bSLo5P270X0Pmie mIAWmT1bmh8BUthRq3i0j7NJovL/ZrhdjpQMyknFOthPYCDR1AjjvLSxJX6tt9lv6eGk VLmt/rhSudo0RTmak2l/Cm+2ccRTrAmWooFcSs9bUR/s3ntqRwtWG3sQ0W+WS4Ops020 XQJS9p/XnYXVC7JiMmB65fdpp1vo29b24dlodCyxLGMREcoHlO8IKxjkRB9DUQk+L7uP IE+04VkOfYW7Rm9rHY/dzNSaLVzljO22MYSPVJsCVcA9dFyYqRYsaApgVqzeRPAyTlbz RrNA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:content-transfer-encoding; bh=i7uUqWBLG3hMsW1LV1SRrHqUl/czWWPukTh7r/nP2q4=; b=aSryOeEYMqZPDGlUICfXACE+prdhrQOiRwbUW5/lrBkG9mOSVR6Y3OfwxLRLaBsHGu PW2Vi0g39X0fh/NrWFN9tGOTqAkwglJq0+y57lZ97NcX2myFAXIU8S6XghJ1/fPm/AjJ bqu/ltp4LSbZXvfVOCXG8Eb71vlxNzPwwRehoRptLl2vcn4j1IfScHmw0YvPopbYuCYw xg+sL+pW2nTI+pXmVn6uimi3t5j83iRC0JivSx+uTMtX8GdIz4jWZh7rE9q+fYrJIEOx 8pl0aXc78ZO5s+l8dsddaBE7lG8r94ZuRHlRUGBhzewXwxZb36/EYUBgpEpdddpHv7nf MX1g==
X-Gm-Message-State: APjAAAX4T0GJTMYjs8G9ge7VaYqmyuZlNBfa6P4CekBgSCAkBnBor4xg n0xgENUxx9hXLL5bG7d+4r/pPixjB26hLaopzWdXY0KPbgE=
X-Google-Smtp-Source: APXvYqyywYzcNLzUO2mFpwwv+F0bYwbSWOxDlE8cQfcMvJ/oPW5ZrvMfNmRMVHKhTyLzrcbZoi5xvj3pBLgyTtdBszo=
X-Received: by 2002:a2e:8118:: with SMTP id d24mr17173397ljg.111.1572862996605; Mon, 04 Nov 2019 02:23:16 -0800 (PST)
MIME-Version: 1.0
References: <6EC6417807D9754DA64F3087E2E2E03E2D4379D8@rznt8114.rznt.rzdir.fht-esslingen.de> <CACn7K3wXmyiJNm=w=TJ_aFKQRYBe1zq6jopAY_QgBCVx6Wa=cA@mail.gmail.com> <CACn7K3zvJFhf0gwL-YNP3RN5+K+7MD-1Vtp6cTCF-R6URZ_GrQ@mail.gmail.com>
In-Reply-To: <CACn7K3zvJFhf0gwL-YNP3RN5+K+7MD-1Vtp6cTCF-R6URZ_GrQ@mail.gmail.com>
From: MARCELO GABRIEL BAGNULO BRAUN <marcelo@it.uc3m.es>
Date: Mon, 04 Nov 2019 11:23:04 +0100
Message-ID: <CACn7K3xuthVFDq3=zgmFPcwWeZM-Yhcr-zRo_=2AfHMqS3rAAw@mail.gmail.com>
To: tcpm IETF list <tcpm@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/VaLCuap1zxnrPwCPQPl1gjqzY8k>
Subject: Re: [tcpm] Review of draft-ietf-tcpm-generalized-ecn-04
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Nov 2019 10:23:25 -0000

Hi Michael,

Thank you very much for the detailed review. This is indeed very
helpful to converge. I also apologize for the delay in replying.
We have quite a bit of back and forth with Bob crafting this reply, so
please refer to the 05 version submitted for the final text. I have
tried to capture the reply to each comment below.
Replies below.

El mar., 10 sept. 2019 a las 13:00, Scharf, Michael
(<Michael.Scharf@hs-esslingen.de>) escribió:
>
> Marcelo, Bob,
>
> Please find below review of draft-ietf-tcpm-generalized-ecn-04, without any hat. The context is my review of draft-ietf-tcpm-accurate-ecn that is posted separately.
>
> Macroscopic feedback: While the document provides a lot of excellent reasoning for why it is appropriate to set ECT on most control segments, the two cases in which this document mandates use of draft-ietf-tcpm-accurate-ecn, as well as several algorithmic design choices, are IMHO not well enough motivated. My last TCP implementation experience is many years ago, and maybe my knowledge is simply too outdated to be relevant today. But when I read the document, I repeatedly run into the question "why should I indeed implement that this way if I had to do so?". I will provide specific details below. I guess my main high-level message is that the text should introduce early in the document that all mechanisms have been defined conservatively and with safety in mind, possibly at the cost of performance and efficiency.


I have added at the end of the second paragraph of the introduction,
the following sentence:
      The mechanisms proposed in
      this document have been defined conservatively and with safety in mind
      possibly in some cases incurring additional cost in terms of performance
      efficiency.


>
> This is reasonable for an IETF document. But in a real-world Internet, for instance with initial congestion windows bigger than 10 being accepted by (some) implementers, I believe very careful and conservative design choices must be well explained. Otherwise there is a significant risk of just being ignored. Personally, I believe that also less conservative design choices would also work in an experiment, but I cannot proof that by data, i.e., it is speculation only.
>

I will address each of the cases you mention below.

>
> Major issues (not ordered by priority):
>
> * Section 1.  Introduction
>
>   "Nonetheless, if the client does not
>    implement AccECN, it cannot use ECN++ on the one packet that offers
>    most benefit from it - the initial SYN."
>
>   I don't fully agree to this (see below). But, putting this aside, I think it should be explained at least with some words early in the document why the SYN is challenging, in particular since SYN segments would benefit most. A reference to the later details would work, but actually I would try to provide the high-level rationale early to those who don't read the document to the every end...



The explanation why ECN marking in the SYN is critical is described in
section 1.1, literally, the paragraph after the next one, so I pointed
out in the text that the reason why this is critical is explained in
the following subsection.

I have modified the text as follows:

   Using ECN on initial SYN packets provides significant benefits, as we
   describe in the next subsection.  However, only AccECN provides a way
   to feed back whether the SYN was CE marked, and RFC 3168 does not.
   Therefore, implementers of ECN++ are RECOMMENDED to also implement
   AccECN.  Conversely, if AccECN (or an equivalent safety mechanism) is
   not implemented with ECN++, this specification rules out ECN on the
   SYN.

(the reference to further safety mechanisms is related to the concern
you expressed below, please see below)

>
> * Section 1.1
>
>    "Finally, there are ongoing experimental efforts to promote the
>    adoption of a slightly modified variant of DCTCP (and similar
>    congestion controls) over the Internet to achieve low latency, low
>    loss and scalable throughput (L4S) for all communications
>    [I-D.ietf-tsvwg-l4s-arch].  In such an approach, L4S packets identify
>    themselves using an ECN codepoint [I-D.ietf-tsvwg-ecn-l4s-id].  With
>    L4S, preventing TCP control packets from obtaining the benefits of
>    ECN would not only expose them to the prevailing level of congestion
>    loss, but it would also classify them into a different queue.  Then
>    only L4S data packets would enjoy ultra-low latency forwarding, while
>    the packets controlling and retransmitting these data packets would
>    still get stuck behind the queue induced by legacy ('Classic') TCP
>    traffic."
>
>   I fail to understand why this document must present L4S in such detail. As L4S has been mentioned and references already earlier in the introduction, this text could just be removed, IMHO. If the authors prefer to keep this section, I ask to reconsider the wording:
>     1/ The wording "only L4S data packets would enjoy ultra-low latency forwarding" should be phrased using neutral statements , such as "only L4S packets would be classified into the L4S queue that may have lower latency"


I have adapted your text as follows: only L4S packets would be
classified into the L4S queue that is expected to have lower latency
I guess we would agree that this is the case.

>
>     2/ The term "legacy ('Classic') TCP" should be replaced by a term without negative connotation, such as "standard TCP", "non-L4S-enabled TCP", or the like. As I have mentioned on the TSVWG list, I disagree with using terms such as "Legacy TCP", "Classic TCP", etc. when referring to a modern TCP implementation compliant to the standards published by TCPM. Note that I am fine with the terms "Classic ECN" or "Classic ECN feedback" if these terms have consensus in TSVWG.
>

I have changed it to non-L4S-enabled TCP. Actually, this appeared only
once in whole document.


> * Section 3.2
>
>    "It can be seen that the sender cannot set ECT on the SYN if it is not
>    requesting AccECN feedback."
>
>   As mentioned before, I think this is one of the cases in which reasoning is needed as it is not trivial to understand the root cause. A reference to a later section could work.


So, the reason why we (as WG) decided against supporting ECN marking
of the SYN packets when AccECN is not supported is to prevent the
consumption of reserved bits in the TCP header for  this purpose.

I mean, as currently defined, we need to be able to identify 3 types
of servers, namely, non-ECN servers, classic ECN servers and AccECN
servers (which already defined the encoding of the CE mark in the SYN
packet).

If we were to also support classic ECN servers that support ECN++,
this would be a new type of server that would then require an
additional codepoint in the TCP header. We deemed this was wasteful
(considering that there are very few bits reserved available) and
especially considering that when we asked the WG (in several
occasions) no-one expressed any interest in the support of a classic
ECN server that supported ECN++.

In any case, we changed the text as follows:

s/   It can be seen that the sender cannot set ECT on the SYN if it is not
   requesting AccECN feedback.
 /  t can be seen that we recommend against the sender setting ECT on
        the SYN if it is not requesting AccECN feedback.
 /


>
>   Having said this, I'd like to raise a question related to the type of IoT scenarios discussed in draft-ietf-lwig-tcp-constrained-node-networks: Why would a TCP sender that sends at most one MSS of data per RTT (e.g. CWND=1 MSS fixed at least during the first two RTTs) have to implement AccECN to be allowed to set ECT on SYN? Note that the document explains in Section 3.2.1.3 why a back-off of the RTO is not needed. Also, note that RFC 8085 basically explains (for UDP) that one packet per RTT may be OK in today's Internet. Of course, such a sender would not make much use of any ECN congestion feedback in any circumstances. But if there is an environment in which packets market by ECT get better service than non-ECT packet, why should this narrow-band server with small code footprint not leverage the benefits of setting ECT in SYNs without the complexity of AccECN?


>
>
> * Section 3.2
>
>   "Therefore it is RECOMMENDED that the
>    experimental AccECN specification [I-D.ietf-tcpm-accurate-ecn] is
>    implemented (as well as the present specification)"
>
>   I have repeatedly mentioned the downsides of creating dependencies between documents. In this case, the risk is that the AccECN experiment fails (e.g., the IETF has to obsolete it in some years because another alternative has been found). The same applies if a standards-track RFC of AccECN would not be backward-compatible for whatever reason. His would be a pity since significant parts of the document are entirely independent of AccECN. This risk could be reduced by not mandating a particular specification, but mandating functional requirements.
>
>   For instance, I believe that ECT on SYNs can always be set if there is a mechanism that reliably echos CE marks on SYNs back to the sender. AccECN is a solution to this, and the only solution we currently have, but we are dealing with experiments, not standards, so who knows what will happen in some years from now? To me, a better wording would be along the lines of (neglecting the previously mentioned exception): "ECT on SYNs can only be set if there is a corresponding feedback scheme. The experimental AccECN specification [I-D.ietf-tcpm-accurate-ecn] provides this feedback. It is RECOMMENDED to implement an accurate feedback scheme as well as the present specification." I merely show this as an example that would not be impossible to better decouple documents. As long as I am the only person who brings up this point, I don't ask for this specific change to be implemented. I just like to record my concern including potential solutions (e.g., for a document shepherd).
>

I agree with this approach.

I have modified the document in several parts to provide guidance
about how to enable ECN++ in future ECN feedback mechanisms to be
defined. However, as of today, there are only two ECN feedback
mechanisms for TCP defined, standard ECN and AccECN. We have also been
asked to ensure this draft is concrete and clear, so we want to
describe in detail what ECN++ should do in both cases as the document
does. So, my proposal is to 1) keep all the guidance and
reccomendations regarding legacy ECN and AccECN feedback and 2)
provide guidance on the ECN++ support for future ECN feedback
mechanisms, based on the funcitonality required, as you suggest.

To achieve that I have modified the following subsections:

In the Introduction, I have included:

      <t>ECN++ uses a sender-only deployment model. It works whether the two
      ends of the TCP connection use classic ECN feedback <xref
      target="RFC3168"/> or experimental Accurate ECN feedback (AccECN <xref
      target="I-D.ietf-tcpm-accurate-ecn"/>), the only two ECN feedback
      mechanisms for TCP standardised at the time of this writing.
      Nonetheless, if the client does
      not implement AccECN, it cannot use ECN++ the initial SYN packet.
      Using ECN in the SYN packets provides significant benefits, as we describe
      in the next subsection. However, unlike RFC 3168 feedback, AccECN provides
      a way for a TCP server to feed back whether a SYN arrived marked CE. So,
      if the client does not implement AccECN, it could be unsafe to use ECN++
      on the initial SYN packet. Therefore, implementers of ECN++
      are RECOMMENDED to also implement AccECN. In this document, we also
      provide guidance regarding the ECN++ support for future
      ECN feedback mechanisms for TCP based on the the functionality required
      to enable ECN marking of the different packets.</t>

Then in section 3.2 I modified the following paragraph as follows:

        <t>Nonetheless, this specification also caters for the case where an
        ECN++ TCP sender is not using AccECN. This could be because it does
        not support AccECN or because the other end of the TCP connection does
        not (AccECN can only be used for a connection if both ends support
        it). </t>
Also, section 3.2.1.1. was modified as follows:

            <t>With classic <xref target="RFC3168"/> ECN feedback, the SYN was
            not expected to be ECN-capable, so the flag provided to feed back
            congestion was put to another use (it is used in combination with
            other flags to indicate that the responder supports ECN). In
            contrast, Accurate ECN (AccECN) feedback <xref
            target="I-D.ietf-tcpm-accurate-ecn"/> provides a codepoint in the
            SYN-ACK for the responder to feed back whether the SYN arrived
            marked CE. Therefore the setting of the IP/ECN field on the SYN is
            specified separately for each case in the following two
            subsections. If in the future, additional ECN feedback
mechanisms are
            defined for TCP, it would only be possible to enable the ECT marking
            of the SYN packets as long as the new ECN feedback
mechanism provides
            the means to feed back the CE marking of the SYN back to the TCP
            initiator. If this is the case, the new ECN feedback
mechanism should
            follow the same recommendation described below for AccECN regarding
            the processing of the SYN packets.</t>


>
> * Section 3.2.1.1.2
>
>    "A TCP initiator MUST NOT set ECT on a SYN if it does not also attempt to negotiate Accurate ECN feedback in the same SYN."
>
>   See above.


This should have been addressed by the replies to the previous comments.

>
> If CWND is kept at one MSS during the beginning of the connection, I don't understand the rationale for the MUST.


I understand that most TCP senders today use an IW larger than one. I
understand you mean that it would be possible to ECT mark the SYN as
long as the IW is kept to 1MSS. But this is a significant performance
penalty for most TCP senders.

In order to enable this possibility I have added the following text:

3.2.1.1.2.  ECN++ TCP Client does not Support AccECN

   If the SYN sent by a TCP initiator does not attempt to negotiate
   Accurate ECN feedback, or does not use an equivalent safety
   mechanism, it MUST still comply with RFC 3168, which says that a TCP
   initiator "MUST NOT set ECT on a SYN".

   The only envisaged examples of "equivalent safety mechanisms" are: a)
   some future TCP ECN feedback protocol, perhaps evolved from AccECN,
   that feeds back CE marking on a SYN; b) setting the initial window to
   1 SMSS.  IW=1 is NOT RECOMMENDED because it could degrade
   performance, but might be appropriate for certain lightweight TCP
   implementations.

   See Section 4.2 for discussion and rationale.

   If the TCP initiator does not set ECT on the SYN, the rest of
   Section 3.2.1 does not apply.






>
>
> * Section 3.2.1.2.  Caching where to use ECT on SYNs
>
>   I think it could be useful to reference RFC 2140 / 2140bis (non-normatively) and possibly adapt to the terminology therein. The same may apply to later sections discussing caching.



I added a reference to 2140 (i didnt used 2140 bis as it is still a
draft and i is expired)


>
>
> * Section 3.2.1.3.  SYN Congestion Response
>
>   "If the SYN-ACK returned to the TCP initiator confirms that the server
>    supports AccECN, it will also indicate whether or not the SYN was CE-
>    marked.  If the SYN was CE-marked, the initiator MUST reduce its
>    Initial Window (IW) and SHOULD reduce it to 1 SMSS (sender maximum
>    segment size)."
>
>   Regarding the MUST: How can a sender meet the MUST requirement if IW was 1 MSS?
>


Rephrased as

If the SYN is CE-marked, if the initial
            window was greater than 1 MSS, then, the initiator MUST
            reduce its Initial Window (IW) and SHOULD reduce it to 1 SMSS
            (sender maximum segment size).

>
>   Regarding the SHOULD: This is one example for which the draft picks a very conservative approach. Being conservative is not wrong but should be well explained (see below).


The motivation for this is described later in the draft, I have
included a forward reference to section 4 where this is justified.


>
>
> * Section 3.2.1.3.  SYN Congestion Response
>
>    "If the initiator has set ECT on the SYN and if the SYN-ACK shows that
>    the server does not support AccECN, the TCP initiator MUST
>    conservatively reduce its Initial Window and SHOULD reduce it to 1
>    SMSS.  A reduction to greater than 1 SMSS MAY be appropriate (see
>    Section 4.2.1).  Conservatism is necessary because a non-AccECN SYN-
>    ACK cannot show whether the SYN was CE-marked."
>
>  Regarding the MUST, I have the same question like in the previous comment, i.e., the guidance when IW is already 1 MSS .
>

rephrased as

If the initiator has set ECT on the SYN and if the SYN-ACK
            shows that the server does not support AccECN and if the initial
            congestion window of the initiator is greater than 1 MSS, then
            the TCP initiator
            MUST conservatively reduce its Initial Window and SHOULD reduce it
            to 1 SMSS.

>
>  The SHOULD is one of the cases that really need backing. Unfortunately, as explained below, I have doubts that the current wording in Section 4.2.1 sufficiently explains this SHOULD. As a result, I either believe that the explanation in Section 4.2.1 needs to be improved, or, if that is not possible, this guidance could have to be reviewed (e.g., "MAY reduce it to 1 SMSS").



The justification is what is currently written in the draft. I mean,
the sender is unable to know if the SYN was marked, so it should
assume the worst case, i.e. it was CE marked. Besides, the sentence
right after the one you copied above states: "A reduction to greater
than 1 SMSS MAY be appropriate", so the draft already allows the
implementer to reduce to a larger value than 1 MSS!


>
> BTW, if I had to implement the protocol, this is one of the cases where I would consider violating the spec from day one, given the potential harm on TCP performance.


By reducing the IW to a value larger than 1 MSS, you wouldnt be
violating the spec, as it is already supported in the current text.

>
> Luckily, I am not an implementer... Anyway, this brings me to another point: Are there no "MEASURMENTS NEEDED" in Section 3.2.1.3, e.g., regarding potential impact on application-layer performance, performance/robustness comparison of different strategies, and the like?



I think there are needed many more measurements than the ones
explicitly spelled out in the document. I guess for each subsection,
there are several relevant measurements that would allow to understand
the experiment. I am unconvinced it would be useful to spell them all
out. It may be better to eliminate the current ones, as they will
always be an incomplete list of measurements. what do you think?
In the spirit of keeping the current ones as representative of some
relevant measurements needed, we've added EXPERIMENTATION NEEDED (not
just MEASUREMENTS NEEDED) bullets in the 2 places within the rationale
section where response to congestion on a SYN or SYN-ACK is discussed.
And referred to these from 1.2.

>
>
> * Section 3.3.2 SYN (Receive)
>
>   "Some implementations of TCP servers (e.g. current Linux) assume that,
>    if a host receives a SYN with a non-zero IP/ECN field, it must be due
>    to network mangling, and they disable ECN for the rest of the
>    connection.  Section 4.2.2.2 finds that this type of network mangling
>    seems to be virtually non-existent so it would be preferable to
>    report any such mangling so it can be fixed."
>
>   The term "current Linux" is not specific. To a reader of the RFC in some years from now, it will not be clear to what this refers to. So, at minimum this draft would have to be specific about kernel versions etc. And operating systems change. But, more importantly, I do not believe that discussing implementation choices belongs into the normative part of the specification at all. Thus, I think that at minimum "(e.g., current Linux)" should be removed in the normative part of the document.


Ok, I removed the current linux part.

>
>
> * Section 4.2.1 Argument 1a: Unregognized CE on the SYN
>
>   "Happily, the appropriate conservative congestion response is to
>    reduce the initial window, and it is extremely rare for a TCP client
>    to send more than one packet as its initial request anyway.  Any
>    clients that do frequently use a larger initial window for their
>    first message to the server can cache which servers will not
>    understand ECT on a SYN (see Section 4.2.3 below)."
>
>   I disagree with the entire first sentence. First, I fail to understand why "it is extremely rare for a TCP client to send more than one packet as its initial request". For instance, I believe that FTP is a counter-example. There is a huge amount of application protocols running on top of TCP. And which protocols are "rare" depend _a_ lot on what traffic mix one looks at. What data backs this claim for _all_ protocols using TCP in all IP networks?
>
>   I agree that this statement may be true for Web browsing. HTTP is an extremely relevant protocol for TCP (well, still), but by far not the only TCP protocol, there are probably even HTTP use cases with huge POST/PUT requests. So, this statement must either be backed for all TCP-based application or be removed.
>
>   Second, I disagree with the term "appropriate" unless sufficient data is provided regarding the first point. And unless this is fixed, this document IMHO lacks an explanation of why the particular design choice is indeed "appropriate". I don't know how to easily fix this; potential solutions could e.g. include relaxing the requirements to less conservative guidance, openly state that the algorithm is just a conservative design choice, ... or provide data from running code backing the claim.
>
>   I also believe that the follow-up sentence "Any clients that do frequently use a larger initial window for their first message to the server can cache which servers will not understand ECT on a SYN" needs backing that explains why this indeed works for "any" client. For instance, what may work for me would be a worst-case analysis of the number of peers (tens of thousands, or even more?) that a given client may have, experience from running code, or some reference to a scientific study. But I do believe that some form of backing is needed that this works for "any" client. If that is not possible, both the tone of the text and possible the normative guidance would have to be adapted to make clear that there is uncertainty whether this indeed works well in "any" case.
>

I have rephrased the paragraph scoping it to http.
the wording is as follows
          <t>The recommeded conservative congestion response is to
          reduce the initial window, which does not affect the performance of
          very popular protocols such as HTTP, since it is extremely rare
          for an HTTP client to send more than one packet as its initial request
          anyway (for data on HTTP/1 &amp; HTTP/2 request sizes see <xref
          target="Manzoor17"/>).


>
> 4.2.3.  Caching Strategies for ECT on SYNs
>
>    "For instance, if a user's Internet access bottleneck supported L4S ECN
>    but not Classic ECN, strategy (S2A) would make most sense and there
>    would be no point trying to avoid the 'over-strict' test and
>    negotiate Classic ECN. [and several following paragraphs]"
>
>   I believe this discussion could take place in an L4S document, e.g., submitted to TSVWG, since it contains L4S-specific assumptions. A reader may not understand this without reading the L4S specifications. To me, this is a non-starter for a document in TCPM.
>


Ok, we have made the following changes:
Created an L4S subsection under section 5, which discusses
interactions with other experimental protocols.
And move this example to there, referring forward from this caching section.





>
> 4.3.2.  Response to Congestion on a SYN-ACK
>
>    [After a pretty detailed discussion...]
>    "Therefore, TCP's response to a CE-marked SYN-ACK can be similar to
>    its response to the loss of _any_ packet, rather than backing off as
>    if the special _initial_ packet of a flow has been lost."
>
>   This section contains some good and reasonable discussion. It concludes with a very conservative design (Slow-Start from IW1). I agree that the design choice sounds safe. I was probably save in the Internet 20 years ago, and is therefore still safe today. But, well, the response to "loss of _any_ packet" in TCP congestion control is more or less having the rate (and many stack may do less without running into issues), and TCPM has started the ABE experiment that even allows a smaller reduction for CE. Are we really convinced that we must experiment with one of the most conservative possible protocol designs? I really wonder if implementers would implement ECN++ with those parameters in a production deployment and see a performance benefit in doing so.
>

In this case, the reduction to 1 MSS is a SHOULD, not a must, so
implementers and other parties forming part of the experiment can use
different values if they understand the implications. This is why we
included all these rationale and motivation sections.


>
> * 4.4.2.  Summary: Enabling ECN on Pure ACKs
>
>    "During review of this specification, it was decided that allowing
>    hosts to set ECT on Pure ACKs without a feasible response mechanism
>    would set an undesirable precedent.  It would certainly improve the
>    flow's own performance, but it would slightly increase potential harm
>    to others.  Therefore, Section 3.2.3 allows ECT on Pure ACKs if
>    AccECN feedback has been negotiated, but not with classic RFC 3168
>    ECN feedback."
>
>   To me, "slightly increase potential harm" is a weak justification. Do we have any data on how relevant this problem may be, e.g., from running code? Personally, I am not convinced by what is written here (but, like elsewhere, I am perfectly fine with being in the rough part of the consensus). Unfortunately, I can't offer data to proof my opinion. At minimum, what I would suggest is to rephrase this paragraph to something like:
>
>    "This specification uses a safe approach. Allowing hosts to set ECT on Pure ACKs without a feasible response mechanism could result in risk.  It would certainly improve the flow's own performance, but it would slightly increase potential harm to others. Therefore, Section 3.2.3 allows ECT on Pure ACKs if AccECN feedback has been negotiated, but not with classic RFC 3168 ECN feedback."
>

ok, i replaced the current text by your proposed text, slightly
modified to include the precendet argument, that I think it should be
preserved.

          <t>This specification uses a safe approach. Allowing hosts to set
          ECT on Pure ACKs without a feasible response mechanism could result
          in risk. It would certainly improve the flow's own performance, but
          it would slightly increase potential harm to others. Morevoer, if
          would set an undesirable precedent regading the
          lack of response to congestion signals. Therefore,
          <xref target="acks"/> allows ECT on Pure ACKs if AccECN feedback has
          been negotiated, but not with classic RFC 3168 ECN feedback.</t>

>
> * Section 5.1.  IW10
>
>   "As specified in Section 3.2.1.1, a TCP initiator can only set ECT on
>    the SYN if it requests AccECN support.  If, however, the SYN-ACK
>    tells the initiator that the responder does not support AccECN,
>    Section 3.2.1.1 advises the initiator to conservatively reduce its
>    initial window to 1 SMSS because, if the SYN was CE-marked, the SYN-
>    ACK has no way to feed that back.
>
>    If the initiator implements IW10, it seems rather over-conservative
>    to reduce IW from 10 to 1 just in case a congestion marking was
>    missed.  Nonetheless, the reduction to 1 SMSS will rarely harm
>    performance, because:
>
>    o  as long as the initiator is caching failures to negotiate AccECN,
>       subsequent attempts to access the same server will not use ECT on
>       the SYN anyway, so there will no longer be any need to
>       conservatively reduce IW;
>
>    o  currently, at least for web sessions, it is extremely rare for a
>       TCP initiator (client) to have more than one data segment to send
>       at the start of a TCP connection [28; Fig 3] - IW10 is primarily
>       exploited by TCP servers."
>
>   See my previous questions. I do not understand why there is no deployed TCP-based application with some reasonable share of traffic that sends more than one segment at the beginning of the TCP connection. This document deals with all TCP usage; it is not written only for the communication between Web browsers and Web servers. For instance, why would the TCP client in an FTP session not use IW10? I disagree that one can generalize from HTTP to all TCP-based application protocols. And there is also no evidence that caching will work for any TCP protocol. For that, the worst-case scenario would be end-systems that maintain TCP connections to _many_ peers (at least tens of thousands of peers, maybe even more). How well will caching work in such cases? If those statements are not backed better, the text already explains what a reader could conclude: "it seems rather over-conservative to reduce IW from 10 to 1 just in case a congestion marking was missed." Yes, indeed, unless proven otherwise.



There are several aspects to this comment:
First, section 3.3.1.3 recommends to reduce the IW to 1 MSS, but it
allows for the initiator to use a greater window. So, an implementer
can decide whether to follow the recommendation or not.

This section serves the purpose of informing the implementors when
deciding whether they will follow the recommendation or not. I believe
that it is useful for them to provide information about the impact in
web traffic, as it does account for a large set of the traffic. If the
implementaros have a different applications use case in mind, she
shouldnt take this into account. Similarly for caching. Keep in mind
that caching is relevant in clients. I would assume that a very large
majority of clients would work well with a smaller cache than your
worst case. If an implementor expects that several tens of thousands
of outgoing connections will be the usual case, she should not
consider this argument neither. Again, this is a non normative part of
the specification. Keeping these arguments here, allow the
implementors to not follow the recommendation if the arguments here
are not convincing.



>
>
> * Section 5.1.  IW10
>
>    If a responder receives feedback that the SYN-ACK was CE-marked,
>    Section 3.2.2.2 mandates that it reduces its initial window to 1
>    SMSS.  When the responder also implements IW10, it is particularly
>    important to adhere to this requirement in order to avoid overflowing
>    a queue that is clearly already congested.
>
>   Here the document is internally consistent. In Section 3.2.2.2. it argues that "the ECN feedback proves that the network is delivering packets successfully and is not severely overloaded." And in this section it argues that the queue is clearly congested. Please ensure that the document is consistent.


I reworded as:

When the responder also implements IW10,
        it is particularly important to adhere to this requirement in order to
        avoid overflowing a queue that has clearly already been building up.

>
>
>   For what it is worth, I personally believe a "clearly congested" queue would drop packets instead of marking them. Of course, a reduction to IW 1 is safe according to all what we have learnt in the last 20 years in the Internet. But to me it is very conservative compared to, well, some of the congestion control algorithms running in today's Internet. I personally would speculate that we have some more head-room before running into Internet congestion collapse. Yet, of course, the usual trade-offs for congestion control apply. To me, his sounds like a good area for which further data is needed from experimentation. I'd like to record that I am personally not convinced by the reasoning in the document. But, I don't have data. And, I am always willing to learn from running code.
>
> * References
>
>   I am somewhat surprised to see RFC 793 as an informative reference only. Some other references could also be normative.
>

I moved 793 to normative

>
>
> Minor comments:
>
> * Section 1.1
>
>    "Non-ECN control packets particularly harm performance in environments
>    where the ECN marking level is high."
>
>   Is my understanding correct that this means "Not using ECN on control packets can be particularly detrimental to performance in environments where the ECN marking level is high"? I am not sure if the control packets indeed cause "harm".


ok, used your wording


>
>
> * Section 2
>
>   Better would be the wording in RFC 8174


added

>
>
> * Section 2
>
>   "Optionally, it could mean ECT(1), which is in the process of being redefined for use by L4S experiments [RFC8311] [I-D.ietf-tsvwg-ecn-l4s-id]."
>
>   I don't understand why this matters here given that section 1 already explains that this document is compatible with L4S.



but it makes sense to mention it here, since we provide some
justification why by default is ECt(0) and what about ECT(1).

>
>




>
>
> * 3.3.3.  Pure ACK (Receive)
>
>    "Currently AccECN feedback is required to count CE marking of any control packet including pure ACKs."
>
>    Why "currently"?



Removed the currently

>
>
> * Section 4.9.  General Fall-back for any Control Packet
>
>    "And particularly given that
>    unnecessarily removing ECT from other control packets could lead to
>    performance problems, e.g. by directing them into an inferior queue
>    [I-D.ietf-tsvwg-ecn-l4s-id] or over a different path, because some
>    broken multipath equipment (erroneously) routes based on all 8 bits
>    of the Diffserv field."
>
>   I think "inferior" has to be replaced by a neutral term, e.g., "another" queue. I don't think that the queue would always be "inferior", e.g., if DiffServ was used. And, yes, this quite directly runs into L4S-specific questions and that are entirely out-of-scope of this document... All that could be avoided by reducing the references to L4S.
>


changed inferior with another queue.


>
> * 5.3.  TCP Derivatives
>
>    "Experience from experiments on adding ECN support to all TCP packets
>    ought to be directly transferable between TCP and derivatives of TCP,
>    like SCTP or QUIC."
>
>   I would suggest to use at least another term instead of "TCP Derivates", such as "other Internet transport protocols". And I think this section could actually either be removed or moved to an appendix.
>

I changed it to other transport protocols.

Regards, marcelo


>
>
> Thanks
>
> Michael



--
MARCELO GABRIEL BAGNULO BRAUN
Universidad Carlos III de Madrid


--
MARCELO GABRIEL BAGNULO BRAUN
Universidad Carlos III de Madrid


-- 
MARCELO GABRIEL BAGNULO BRAUN
Universidad Carlos III de Madrid