Re: [tcpm] Concluding WGLC for draft-ietf-tcpm-rfc8312bis-03

Markku Kojo <kojo@cs.helsinki.fi> Mon, 30 August 2021 16:33 UTC

Date: Mon, 30 Aug 2021 19:33:16 +0300
From: Markku Kojo <kojo@cs.helsinki.fi>
To: Yoshifumi Nishida <nsd.ietf@gmail.com>
cc: "tcpm@ietf.org Extensions" <tcpm@ietf.org>, tcpm-chairs <tcpm-chairs@ietf.org>
In-Reply-To: <CAAK044SjMmBnO8xdn2ogWMZTcecXoET1dmZqd6Dt3WzOUi359A@mail.gmail.com>
Message-ID: <alpine.DEB.2.21.2108300740560.5845@hp8x-60.cs.helsinki.fi>
References: <CAAK044SjMmBnO8xdn2ogWMZTcecXoET1dmZqd6Dt3WzOUi359A@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/nd4Af8QRQMDkp_R_9ZQMkI4yGAc>
Subject: Re: [tcpm] Concluding WGLC for draft-ietf-tcpm-rfc8312bis-03
Precedence: list

Hi Yoshi, all,

On Wed, 18 Aug 2021, Yoshifumi Nishida wrote:

> Hi,The chairs think 8312bis draft is in good shape and it's ready for submission.
> We know we had some discussions on how to handle ABC draft in the draft during the
> last WG meeting and there are on-going related discussions on the ML.
> However, we still think we can proceed with the draft as the relevance between the
> draft and the current discussions is not very high.
> 
> If anyone has different thoughts on this, please let us know.
> Otherwise, we will prepare a writeup soon and submit it to IESG.

My sincere apologies as this comes very late in the process but I was not 
able to follow IETF mailing list at the time of WGLC nor during 
the recent weeks, and I couldn't attend IETF 111.

I took a look at the draft and some of its statements seem to be in 
conflict with other Standards Track RFCs. I believe the tcpm 
wg should look more closely at these to avoid publishing conflicting 
guidance in different Standards Track RFCs. In addition, there are some 
issues I am concerned about.

1. ECN

a) The draft modifies the RFC 3168 response for the case where an
    arriving ECE would result in cwnd < 2 MSS, by setting a lower bound
    of 2 MSS for cwnd (only ssthresh is supposed to have a lower bound
    of 2 MSS).
    This is in conflict with RFC 3168, RFC 5033, and RFC 2914 which
    require "full backoff", that is, a sender must continue decreasing
    sending rate as long as congestion persists. This is a fundamental
    property for any congestion control mechanism. For ECN, RFC 3168
    (sec 6.1.2) requires that cwnd is halved until the minimum cwnd
    of one MSS is reached, and then the sender continues reducing
    sending rate by using a timer with exponential backoff, if more
    ECE-echo packets keep on arriving.

    This implementation bug has long been present in Linux and is
    present in other stacks as well. It should get corrected ASAP with
    appropriate advice in all published RFCs, instead of replicating
    the bug in the RFC series.
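
    As a rough illustration of the difference, here is a minimal
    sketch of the two responses under persistent ECE. The function
    names, the MSS value and the starting cwnd/RTO are assumptions
    made for the example, the "once per RTT" reaction limit is
    reduced to one call per round, and the timer handling is
    simplified to a plain doubling.

```python
MSS = 1460  # bytes; illustrative value, not taken from the draft


def on_ece_full_backoff(cwnd, rto):
    """Simplified RFC 3168-style response: halve cwnd down to 1 MSS,
    then keep reducing the rate by backing off the timer."""
    if cwnd > MSS:
        return max(cwnd // 2, MSS), rto
    # cwnd is already 1 MSS: the sender becomes timer-driven and
    # doubles the timer for each further ECE (exponential backoff).
    return MSS, rto * 2


def on_ece_floor_2mss(cwnd, rto):
    """Variant with a 2 MSS lower bound on cwnd: no further reduction
    is possible once the floor is reached."""
    return max(cwnd // 2, 2 * MSS), rto


def simulate(respond, rounds, cwnd=8 * MSS, rto=1.0):
    """Apply one congestion response per round of persistent ECE."""
    for _ in range(rounds):
        cwnd, rto = respond(cwnd, rto)
    return cwnd, rto


if __name__ == "__main__":
    print(simulate(on_ece_full_backoff, 6))  # (cwnd in bytes, RTO in s)
    print(simulate(on_ece_floor_2mss, 6))
```

    With the floor, the sending rate stops decreasing after two
    rounds no matter how long the congestion persists, whereas the
    full-backoff variant keeps slowing down.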

b) RFC 8311 (sec 4.1) allows modifying the TCP-sender response to
    ECE for experimental purposes only. Has there been any discussion
    with tsvwg about whether modifying the TCP response to ECE in CUBIC
    is in conflict with RFC 8311, given that CUBIC is currently intended
    to become a Standards Track RFC?

c) ABE (RFC 8511) is currently the only experimental RFC to modify
    the TCP-sender response to ECE. ABE allows modifying multiplicative
    decrease factor only for AIMD TCP and only when ECE arrives in
    congestion avoidance, that is, not when the sender is in slow-start.

    Applying a decrease factor of 0.7 (or higher) when a congestion
    signal arrives and ends the initial slow start would be
    inconsiderate because it extends the convergence time from
    the slow-start overshoot. ABE has found that using a larger decrease
    factor yields performance improvement when applied in congestion
    avoidance, but not otherwise. Do we have data that would support
    different findings with CUBIC?

2. Slow-Start Overshoot w/ loss-based congestion control

    The larger decrease factor of 0.7 seems inadvisable also if
    used in the initial slow start with loss based congestion
    control (w/ Not-ECT traffic); packets start getting dropped
    when a TCP sender has increased cwnd in slow start such that
    the available network bandwidth and buffering capacity at the
    bottleneck is filled, but the TCP sender continues sending
    more packets for one RTT doubling cwnd and hence also the number
    of packets inflight before the congestion signal reaches the sender.
    Now, even if the sender uses the standard decrease factor of 0.5,
    the cwnd gets reduced only to a value that equals to the cwnd just
    before (or around) the congestion point. That is, the network is
    still full when the sender enters fast recovery but we do not
    expect more drops during fast recovery in a deterministic model.
    Only in congestion avoidance after the recovery, the sender
    increases cwnd again and gets a packet drop that takes the
    sender to a normal sawtooth cycle in an ideal case. So, the
    convergence time from slow-start is expected to be fast though
    in reality loss recovery does not always work ideally with
    so many drops in a window of data.

    However, if the sender applies decrease factor of 0.7, it
    continues in fast recovery with a 40% higher cwnd than what is
    the available network capacity. This is very likely to result in
    significant number of packet losses during fast recovery, and
    very likely to result in loss of retransmissions. So, it is no
    wonder that so many people have been very concerned about the
    slow-start overshoot and the problems it creates.
    It is very obvious that applying decrease factor of 0.7 in
    the initial slow start is likely to extend the convergence
    time from the slow-start overshoot significantly. Or, do we
    have data that shows that such concern is unnecessary?
    Also, a number of new loss-recovery mechanisms have been
    introduced maybe mainly because of this?
    I would hesitate recommending decrease factor of 0.7 when
    a congestion event occurs during the initial slow start.
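
    The arithmetic behind the 40% figure can be sketched as follows,
    under the same deterministic assumption as above: cwnd has doubled
    to twice the capacity (BDP plus bottleneck buffer) by the time the
    congestion signal reaches the sender. The function name and the
    capacity value are illustrative only.

```python
def cwnd_after_overshoot(capacity, beta):
    """cwnd right after the first congestion response in slow start.

    capacity: packets the path can hold (BDP + bottleneck buffer).
    beta: multiplicative decrease factor (0.5 for AIMD TCP, 0.7 in the draft).
    Assumes cwnd doubled once more during the RTT it took the
    congestion signal to reach the sender.
    """
    overshoot_cwnd = 2 * capacity
    return beta * overshoot_cwnd


def excess_over_capacity(capacity, beta):
    """Fraction by which the reduced cwnd still exceeds the capacity."""
    return cwnd_after_overshoot(capacity, beta) / capacity - 1


if __name__ == "__main__":
    for beta in (0.5, 0.7):
        excess = excess_over_capacity(100, beta)
        print(f"beta={beta}: cwnd exceeds capacity by {excess:.0%}")
```

    In this model a factor of 0.5 lands exactly on the capacity,
    while 0.7 leaves the sender well above it during fast recovery.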

3. RACK (and QUIC)

    The draft states that RACK (and QUIC loss detection) can be used
    with CUBIC to detect losses. However, it seems to have gone
    unnoticed that RACK may also detect loss of a retransmission in
    which case the congestion control response is required to be taken
    twice, i.e., ssthresh and cwnd must be lowered again (MUST in
    RFC 5681 Sec. 4.3). Once RACK got published all new congestion
    controls and updates to existing RFCs must include this essential
    congestion control response, if the congestion control mechanism
    intends to use RACK for loss detection.

    This draft does not have any such requirement nor does it specify
    how this is done?
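
    What such a requirement amounts to can be sketched as below. This
    is only an illustration: the state fields, the function name and
    the way the loss detector reports a lost retransmission are
    hypothetical, and window sizes are in segments.

```python
from dataclasses import dataclass


@dataclass
class CongestionState:
    cwnd: float          # congestion window, in segments
    ssthresh: float      # slow start threshold, in segments
    in_recovery: bool = False


def on_loss_detected(state, lost_was_retransmission,
                     beta=0.7, min_ssthresh=2.0):
    """React to a loss reported by the loss detector (e.g. RACK).

    A first loss in a window of data triggers one multiplicative
    decrease. Further losses of original transmissions in the same
    recovery episode are ignored, but the loss of a retransmission is
    treated as a new congestion indication and lowers ssthresh and
    cwnd again.
    """
    if state.in_recovery and not lost_was_retransmission:
        return state  # already responded for this window of data
    state.ssthresh = max(state.cwnd * beta, min_ssthresh)
    state.cwnd = state.ssthresh
    state.in_recovery = True
    return state


if __name__ == "__main__":
    s = CongestionState(cwnd=100.0, ssthresh=1000.0)
    on_loss_detected(s, lost_was_retransmission=False)  # first response
    on_loss_detected(s, lost_was_retransmission=False)  # no change
    on_loss_detected(s, lost_was_retransmission=True)   # second response
    print(round(s.cwnd, 2), round(s.ssthresh, 2))
```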

4. Fairness to AIMD congestion control

    The equation on page 12 to derive increase factor α_cubic that
    intends to achieve the same average window as AIMD TCP seems to
    have its origins in a preliminary paper that states that the
    authors do not have an explanation to the discrepancy between
    their AIMD model and experimental results, which clearly deviate.
    It seems to have gone unnoticed that the equation assumes equal
    drop probability for the different values of the increase factor
    and multiplicative decrease factor but the drop probability
    changes when these factors change. The equations for the drop
    probability / the # of packets in one congestion epoch
    are available in the original paper and one can easily verify
    this. Therefore, the equations used in CUBIC are not correct
    and seem to underestimate _W_est_ for AIMD TCP, resulting in
    moving away from AIMD-Friendly region too early. This gives
    CUBIC unjustified advantage over AIMD TCP particularly in
    environments with low level of statistical multiplexing. With
    high level of multiplexing, drop probability goes higher and
    differences in the drop probabilities tend to get small. On the
    other hand, with such high level of competition, the theoretical
    equations may not be that valid anymore.
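
    The dependency can be made concrete with a small sketch. It
    assumes the AIMD average-window formula from RFC 8312,
    sqrt(α(1+β) / (2(1-β)p)), the relation α_cubic = 3(1-β)/(1+β),
    and a simple deterministic sawtooth with one drop per epoch. The
    sawtooth and the "same peak window" comparison are simplifying
    assumptions made here, not the model of the original paper.

```python
import math


def aimd_avg_window(alpha, beta, p):
    """Average window of AIMD(alpha, beta) at drop probability p."""
    return math.sqrt(alpha * (1 + beta) / (2 * (1 - beta) * p))


def alpha_for_equal_avg_window(beta):
    """Increase factor that matches AIMD(1, 0.5) at the SAME p."""
    return 3 * (1 - beta) / (1 + beta)


def drop_rate_for_peak(alpha, beta, w_max):
    """Drop probability implied by a deterministic sawtooth.

    The window climbs from beta * w_max to w_max by alpha per RTT and
    exactly one packet is dropped per epoch.
    """
    rtts_per_epoch = (1 - beta) * w_max / alpha
    packets_per_epoch = rtts_per_epoch * (1 + beta) * w_max / 2
    return 1 / packets_per_epoch


if __name__ == "__main__":
    alpha_c = alpha_for_equal_avg_window(0.7)
    # Same p for both flows: the average windows match by construction.
    print(aimd_avg_window(1.0, 0.5, 1e-3),
          aimd_avg_window(alpha_c, 0.7, 1e-3))
    # Same peak window for both flows: the drop probabilities differ.
    print(drop_rate_for_peak(1.0, 0.5, 100),
          drop_rate_for_peak(alpha_c, 0.7, 100))
```

    The first pair of values is equal only because p is held fixed;
    once p is derived from the sawtooth itself it depends on both
    factors, which is the assumption questioned above.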

5. Contribution to buffer bloat and slower convergence due to
    larger decrease factor

    This draft uses a larger cwnd decrease factor, resulting in larger
    average cwnd and buffer occupation. This means that it is
    likely to contribute significantly to buffer bloat, particularly
    when considering also the use of concave increase function in the
    beginning of the congestion avoidance that keeps the cwnd close
    to maximum most of the time as carefully explained in the draft.
    This means that CUBIC keeps also buffer bloated router queues
    very efficiently full at all times.

    Currently the draft does mention the slower convergence speed
    as the only side effect for the larger decrease factor and does
    not discuss the contribution to buffer bloat. It would be
    important to assess this together with measurement data to
    back up any observations.

    Do we have data in different environments, including buffer-bloated
    environments that show how much effect CUBIC has compared to
    AIMD TCP?
    And, how does the larger decrease factor impact convergence speed,
    particularly in buffer-bloated environments?
    Many people have complained that window-based (TCP) congestion
    control drives buffer bloat. Of course, also the current standard
    AIMD TCP tends to fill in the buffer-bloated queues but it
    is unlikely to do so as effectively as CUBIC. This would be good to
    understand better.
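
    A back-of-the-envelope sketch of the effect, assuming a single
    flow, a deterministic sawtooth that peaks when the bottleneck
    buffer is full, and windows counted in packets. The function name
    and the BDP and buffer sizes are made up for illustration and
    are no substitute for measurement data.

```python
def queue_after_backoff(bdp, buffer_size, beta):
    """Packets left in the bottleneck queue right after a decrease.

    The window peaks at bdp + buffer_size (pipe and buffer both full)
    and is then multiplied by beta. Whatever exceeds the BDP stays
    queued, so a positive result means the queue never drains.
    """
    w_max = bdp + buffer_size
    return max(0.0, beta * w_max - bdp)


if __name__ == "__main__":
    bdp = 100
    for buffer_size in (100, 300):  # 1 x BDP and a bloated 3 x BDP buffer
        for beta in (0.5, 0.7):
            q = queue_after_backoff(bdp, buffer_size, beta)
            print(f"buffer={buffer_size} beta={beta}: min queue {q:.0f}")
```

    In this model a buffer of one BDP drains completely with a factor
    of 0.5 but not with 0.7, and with a bloated buffer the queue that
    remains after each decrease is considerably longer with 0.7.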

6. Citing Experimental RFCs as if being a part of CUBIC

    The draft says that CUBIC MAY implement DSACK [RFC3708], limited slow
    start [RFC3742], [RFC7661] and hybrid slow start [cites a paper].
    Aren't the first three down references? Not sure if it is appropriate
    for a Stds Track document to cite experimental work or a paper like
    this even though it's a MAY.

7. Discussion

I regret to say that the discussion in Sec 5. brings up surprisingly 
little data to back up the claims that are made. Given the long 
deployment experience that is emphasised in the draft, there, however, is 
little evidence (measurement data) summarised and cited to back up the 
claims. "There is a long deployment experience" does not provide 
any evidence as such. There should be a lot of studies with measurement 
data accumulated over the years that would support the assertions in the 
doc. Or, is there?

Sec 5.1

In this subsection, one should show the impact of CUBIC when 
competing with AIMD TCP. The numbers in tables are derived from
analytical models that give average window size with fixed random 
loss probabilities and unlimited bandwidth. That is not the same as when 
flows are competing in the same congested bottleneck that builds a queue.
Loss probabilities for different flows are likely to be different 
especially at lower levels of statistical multiplexing.

The first para of sec 5.1 does not sound true. Simply looking at the 
original CUBIC paper [HRX08] reveals that CUBIC dominates AIMD TCP (SACK 
TCP) in the regions where SACK TCP alone is able to fully utilize the 
available bandwidth (Figure 10 c up until 200 Mbps, and to some extent in 
Fig 10 a with 40 ms delay). And in all cases where SACK TCP alone is not 
able to utilize all available b/w, CUBIC steals multiple times more b/w 
from SACK TCP than what SACK TCP is not able to utilize. Figures 5 and 6 
tell the same story. Has something changed and/or is there possibly data 
that provides alternative evidence?

In addition, the recommended value for constant C and the two alternative 
values presented in the draft are the same as in the original paper. It 
would be interesting to see if there has been any experimentation with 
different values and what might be the outcome?

Sec 5.3

Any experimental data to summarize and cite?

Sec 5.4

The text correctly states that CUBIC fills queues faster than AIMD TCP 
and increases the risk of standing queues. Then it proposes queue sizing 
and AQM as a solution, which is odd. Applying AQM to keep the queues 
shorter of course decreases the RTT (delay) seen but it does not help 
with standing queues (they remain standing but are just shorter).

Sec 5.5

Setting lower bound of 2 MSS for cwnd with ECN may result in symptoms of 
congestion collapse with certain specific conditions, e.g., if the actual 
(physical) queue size is very large and there is a mix of ECN-capable 
(ECT) and not ECN-capable flows. When the number of ECN-capable flows 
increases, they start starving the not ECN-capable flows as ECT flows stop 
responding to congestion and start increasing the queue such that AQM has 
to drop almost all not ECT packets.

Sec 5.6

Competing CUBIC flows will converge but it happens very slowly and 
requires a large amount of data to send, i.e., short flows are more 
unlikely to live long enough to converge. This seems to be the case at least 
according to the results in the original paper [HRX08, Fig 4 b].
Summarizing and citing some performance data would be very useful and much 
more convincing.

Sec 5.8

The MUST NOT requirement would be much better placed with other 
specifications in Sec 4 and would benefit from more accurate description.

Sec 5.9

The statement made here is not convincing and is likely to be incorrect. 
E.g., CUBIC with larger decrease factor would most likely release 
capacity notably slower than AIMD TCP if there is sudden congestion.

8. The draft says in the intro that CUBIC is to be regarded as *current 
standard* for TCP congestion control. It sounds a bit like it would 
obsolete RFC 5681 which is not the intent. RFC 5681 still has its 
specific role as the document that gives the baseline and generic 
guidelines for TCP (and other) congestion control.
Instead, I think this document should articulate very carefully its role 
among the congestion control algorithms. How, I am not sure. Maybe 
simply as an alternative for RFC5681 congestion avoidance and 
multiplicative decrease.

Please note also that when specifying these algorithms this document is
in direct conflict with a MUST in RFC5681 which says: "however, a TCP 
MUST NOT be more aggressive than the following algorithms allow (that is, 
MUST NOT send data when the value of cwnd computed by the following 
algorithms would not allow the data to be sent)."
Therefore, the draft should make this differentiation very clear maybe 
already in the abstract and justify the deviations much better than it 
currently does (accompanied with evidence = data). This is very important 
in order to make a convincing case why it is ok for this doc to deviate 
from the current Standards Track TCP normative statements.

Misc comments:

_epoch_start_: needs a more accurate and consistent definition of when 
exactly the epoch starts. Is it when a congestion event occurs or when the 
TCP sender enters congestion avoidance for the first time after a congestion 
event? If it is different in different scenarios, that would be good to 
present systematically.

On many occasions:

  "(upon receiving) an ACK" -> "(upon receiving) a new ACK"

On page 13:

   " the sender MAY employ a Fast
     Recovery algorithm to gradually adjust the congestion window to its
     new reduced _ssthresh_ value."

I assume this is aiming at saying that something similar to PRR MAY be 
used to reduce cwnd. This, however, is somewhat vaguely said and using 
the term Fast Recovery is misleading. We need to remember also that it might 
not be trivial to get it right. So, dunno whether it would be useful to drop 
this.

Best regards,

/Markku
