Re: [tcpm] 2nd WGLC for draft-ietf-tcpm-rfc8312bis

Hi Vidhi, all,

catching up ...

On Thu, 17 Feb 2022, Vidhi Goel wrote:

> I have a general question for the WG, are the "current practices" defined
> in https://datatracker.ietf.org/doc/html/rfc2914 still current? We are 22 years ahead of
> the published date of this draft and there have been magnitude of advancements in
> congestion control algorithms at the hosts and what is the right way to measure
> conformance for a congestion control. A very interesting paper about how to quantify a new
> congestion control, which most might have read, "Beyond Jain's Fairness Index: Setting the
> Bar For The Deployment of Congestion Control Algorithms"

When I refer to fairness, I am not referring to equal fairness, which
is quite appropriately criticised in the paper you cite. Neither RFC 2914
nor RFC 5033 requires equal fairness. See also my other comments below
that relate to this issue.

> I skimmed through RFC 2914 and it references obsoleted drafts for what it calls conformant
> TCP. Like below:
> 
> For TCP, this
>    means congestion control algorithms conformant with the current TCP
>    specification [RFC793, RFC1122, RFC2581].

None of these are HISTORIC, see more below.

> I don’t think we can continue to use old RFCs as a gold standard although we can use some
> of the text as is if it still holds true.
>
>               Some of these may fail to implement
>               the TCP congestion avoidance mechanisms correctly because of poor
>               implementation [RFC2525].  Others may deliberately be implemented
>               with congestion avoidance algorithms that are more aggressive in
>               their use of bandwidth than other TCP implementations; this would
>               allow a vendor to claim to have a "faster TCP".  The logical
>               consequence of such implementations would be a spiral of increasingly
>               aggressive TCP implementations, or increasingly aggressive transport
>               protocols, leading back to the point where there is effectively no
>               congestion avoidance and the Internet is chronically congested.
> 
> For example, in this text quoted by Markku, I agree that we shouldn’t end up with
> algorithms that eventually lead to chronically congested networks but at the same time,
> if a more aggressive approach provides better efficiency by ramping up a bit faster
> without leading to more congestion, then that shouldn’t be discouraged.

I fully agree, and I have not (intentionally) criticised the feature of
CUBIC that ramps up faster when in the "genuine CUBIC" mode. That is all
OK once CUBIC is in the region where std CC (Reno CC) is not able to
utilize the available capacity (i.e., when not within the Reno-Friendly
region). CUBIC may well ramp up faster when it does not steal capacity
from other (TCP-compatible) flows. This is one point the paper you
pointed to addresses, and it is also allowed by RFC 2914 and RFC 5033,
and quite explicitly mentioned in RFC 5033 when it points to RFC 3649.

The uncertainty we have is whether CUBIC is fair to Reno CC when it is in
the Reno-Friendly region (i.e., whether the model and formula are correct)
and whether CUBIC leaves the Reno-Friendly region at a correct point (the
value for C (=0.4) has not been validated in a *wide range* of
environments).
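
To make the boundary concrete, here is a minimal Python sketch of where
the RFC 8312 model itself places the exit from the Reno-Friendly region.
It uses W_cubic(t) and W_est(t) as given in RFC 8312 with C=0.4 and
beta=0.7. The function names, the time-stepping search and the example
RTT and W_max values are assumptions made for illustration only. It shows
what the formulas predict, not whether that prediction is fair to Reno CC
in practice, which is exactly the open question.

```python
def cubic_window(t, w_max, c=0.4, beta=0.7):
    """W_cubic(t) = C*(t-K)^3 + W_max (RFC 8312); t in seconds, windows in segments."""
    k = (w_max * (1.0 - beta) / c) ** (1.0 / 3.0)
    return c * (t - k) ** 3 + w_max


def reno_estimate(t, w_max, rtt, beta=0.7):
    """W_est(t) = W_max*beta + [3*(1-beta)/(1+beta)] * (t/RTT) (RFC 8312)."""
    alpha = 3.0 * (1.0 - beta) / (1.0 + beta)
    return w_max * beta + alpha * (t / rtt)


def reno_friendly_exit(w_max, rtt, c=0.4, beta=0.7, step=0.001, horizon=120.0):
    """Return the first t > 0 (seconds) at which W_cubic(t) > W_est(t), else None.

    Illustrative brute-force search; step and horizon are arbitrary choices.
    """
    steps = int(horizon / step)
    for n in range(1, steps + 1):
        t = n * step
        if cubic_window(t, w_max, c, beta) > reno_estimate(t, w_max, rtt, beta):
            return t
    return None


if __name__ == "__main__":
    for rtt in (0.1, 0.01):
        print(rtt, reno_friendly_exit(w_max=100.0, rtt=rtt))
```

With W_max = 100 segments, the model leaves the Reno-Friendly region
almost immediately at RTT = 100 ms, but only after roughly 17 seconds at
RTT = 10 ms, which illustrates how strongly the exit point depends on the
environment for a fixed C.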

And, very importantly, when CUBIC finds it needs to slow down (e.g., in
the event of sudden heavy congestion), it does so very slowly, as the
paper pointed to by Bob shows. This makes it steal capacity from
competing flows, as it continues for a very long time at a much higher
rate than TCP-compatible flows do. There is the "fast convergence"
feature with CUBIC, but it really is not effective in releasing capacity
to others when sudden congestion occurs, as the measurements show and as
I have tried to explain. I am happy to be convinced otherwise with
experimental data.

Also, the use of MD=0.7 when in slow start is way too aggressive and
dangerous, against the theory and design by Van Jacobson as I have
pointed out, and not justified at all in the draft. Applying MD=0.7 when
a pkt loss is detected during slow start causes a CUBIC sender to inject
~40% more packets than the available capacity during Fast Recovery. This
means that all these 40% excess packets (either those of the CUBIC sender
or those of competing flows) are guaranteed to be dropped at a tail-drop
bottleneck. Very importantly, note that these packets are so-called
"undelivered packets" (see RFC 2914) that create a danger of congestion
collapse (of some degree) "by delivering packets through the network that
are dropped before reaching their ultimate destination." [RFC 2914]

RFC 2914 emphasises:

  "This is probably the largest unresolved danger with respect to
   congestion collapse in the Internet today."

I urge the tcpm wg to very carefully consider the above issue.
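
The ~40% figure above can be checked with a few lines of Python. This is
only a back-of-the-envelope sketch under stated assumptions: a single
flow, a tail-drop bottleneck, cwnd doubling every RTT in slow start, and
loss detected about one RTT after cwnd first exceeded what the path can
hold, so that cwnd is roughly twice the available capacity at that point.
The function name and the example capacity are hypothetical. Delayed ACKs
or an early slow-start exit would reduce the overshoot.

```python
def slow_start_excess(capacity_pkts, beta):
    """Fraction of packets in excess of path capacity right after the
    multiplicative decrease that follows a slow-start loss.

    Assumes cwnd ~= 2 * capacity when the loss is detected (worst-case
    doubling per RTT); this is a simplification, not a measurement.
    """
    cwnd_at_loss_detection = 2.0 * capacity_pkts
    cwnd_after_md = beta * cwnd_at_loss_detection
    return (cwnd_after_md - capacity_pkts) / capacity_pkts


if __name__ == "__main__":
    for beta in (0.5, 0.7):
        print(beta, slow_start_excess(100.0, beta))
```

Under these assumptions MD=0.5 brings cwnd back to the available capacity
(no excess), while MD=0.7 leaves cwnd at 1.4 times the capacity.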

In addition, I noticed a little while ago when reading through the I-D,
but have not commented on yet, that the rule for changing alpha to 1
when Wmax is reached in the Reno-friendly region is the correct thing to
do during normal steady state, but it is actually the incorrect action
to take when in the fast convergence mode and within the Reno-friendly
region. There it acts just opposite to what CUBIC does when in "genuine"
CUBIC mode, where it lowers the "plateau". That is, in the Reno-friendly
region, instead of slowing down the increase rate during CA, it actually
accelerates, as alpha is increased to 1 earlier than when not in the
fast convergence mode. This has been modified in the bis version of the
draft. I wonder if there is any experimental data available for
evaluating the behavior?
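
A rough sketch of the effect described above, in Python. It assumes the
rule as stated in this message (alpha is set to 1 once W_est reaches
Wmax), the RFC 8312 values alpha = 3*(1-beta)/(1+beta) and beta = 0.7,
and the RFC 8312 fast convergence reduction Wmax = cwnd*(1+beta)/2. The
function name and the example window are hypothetical, and W_est is
assumed to grow by alpha segments per RTT starting from beta*cwnd.

```python
def rtts_until_alpha_is_one(cwnd_at_loss, beta=0.7, fast_convergence=False):
    """Count RTTs of Reno-friendly growth until W_est reaches W_max,
    i.e. until the rule discussed here would switch alpha to 1."""
    alpha = 3.0 * (1.0 - beta) / (1.0 + beta)
    w_max = cwnd_at_loss
    if fast_convergence:
        # RFC 8312 fast convergence: remember a lower W_max.
        w_max = cwnd_at_loss * (1.0 + beta) / 2.0
    w_est = cwnd_at_loss * beta  # window right after the decrease
    rtts = 0
    while w_est < w_max:
        w_est += alpha
        rtts += 1
    return rtts


if __name__ == "__main__":
    print(rtts_until_alpha_is_one(100.0, fast_convergence=False))
    print(rtts_until_alpha_is_one(100.0, fast_convergence=True))
```

For a loss at cwnd = 100 segments, the switch to alpha = 1 happens after
57 RTTs without fast convergence and after 29 RTTs with it, so under
these assumptions the flow in fast convergence mode starts the steeper
increase about twice as early.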

> Faster TCP doesn’t
> necessarily mean that it will cause congestion, it could just mean, instead of waiting 100s
> of RTTs to get to full link utilization, it does so in far fewer RTTs.
> In using Reno as a gold standard, we tend to forget that Reno is probably the cause of
> buffer bloat in the network. It basically requires network queues to be configured double
> the capacity in order to continue to have full link utilization after 50% reduction.
> And again, this might not have been evident at the time Reno was developed. We have
> realized these problems with time and we should move away from continuing to use Reno as
> the only acceptable Proposed Stds track congestion control.

Yes, having a shallower sawtooth is beneficial as it allows shallower
queues, and it is not criticised.
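
The relation between the decrease factor and the queue depth can be
sketched as follows. This assumes a single long-lived flow at a tail-drop
bottleneck whose cwnd peaks at BDP plus buffer; the link stays fully
utilized after a decrease only if beta*(BDP + B) >= BDP, which gives
B >= BDP*(1-beta)/beta. The function name and the example BDP are
hypothetical.

```python
def min_buffer_for_full_utilization(bdp_pkts, beta):
    """Smallest buffer (in packets) that keeps the link busy right after
    a multiplicative decrease by factor beta, for one flow whose cwnd
    peaks at BDP + buffer."""
    return bdp_pkts * (1.0 - beta) / beta


if __name__ == "__main__":
    for beta in (0.5, 0.7):
        print(beta, min_buffer_for_full_utilization(100.0, beta))
```

Under this simple model a decrease factor of 0.5 needs a buffer of one
full BDP, while 0.7 needs about 43% of a BDP.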

>               It is convenient to divide flows into three classes: (1) TCP-
>               compatible flows, (2) unresponsive flows, i.e., flows that do not
>               slow down when congestion occurs, and (3) flows that are responsive
>               but are not TCP-compatible.  The last two classes contain more
>               aggressive flows that pose significant threats to Internet
>               performance,
> 
> 
> This second paragraph is also outdated. IIUC, TCP-compatible flows is equivalent to the
> RFC 2581 which is obsolete.

RFC 2581 was obsoleted by RFC 5681 when the doc was advanced to Draft
Standard status, that is, the document and content are not obsolete. The
way to read such references is to replace the cited RFC with the RFC
that obsoleted it (with some exceptions).

> I think we need to replace TCP-compatible with something else
> for two reasons:
> 
> 1. TCP is not the only transport protocol out there. What about QUIC, SCTP?

The term TCP-compatible has been selected to indicate std TCP congestion
control behavior that the congestion controls implemented in protocols
other than TCP are also required to follow. See the list below for
many such protocols.

> 2. Congestion control algorithms should be evaluated in the same way for different
> transport protocols. So, a better word could be, Conformant congestion control, one which
> once accepted as Stds Track should just work for any transport protocol.
> 
> Perhaps, we need to show more experimental / deployment data for Cubic to become Stds
> Track but we shouldn’t rely on RFC2914 from the year 2000 to decide the acceptance
> criteria. If we continue to do so, then even 100 years from now, RFC 5681 will be the only
> TCP-conformant / TCP-compatible congestion control as defined by RFC 2914 and the one and
> only Stds Track congestion control to ever exist. (Please let me know if I missed any
> other Stds Track congestion control)

We have been constantly introducing TCP-conformant / TCP-compatible 
congestion controls and/or extending/modifying the existing Stds:

  RFC 3042, RFC 3448, RFC 4015, RFC 4340, RFC 4341, RFC 4342, RFC 4960,
  RFC 5348, RFC 5682, RFC 6582, RFC 6675, RFC 6298, RFC 8961, and many
  more.

Thanks,

/Markku

> Thanks,
> Vidhi
> 
>
>       On Feb 17, 2022, at 8:13 AM, Neal Cardwell
>       <ncardwell=40google.com@dmarc.ietf.org> wrote:
> 
> 
> 
> On Thu, Feb 17, 2022 at 9:42 AM Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org>
> wrote:
>       Hi Yoshi,
>
>       On Tue, 15 Feb 2022, Yoshifumi Nishida wrote:
>
>       > Hi Markku,
>       > 
>       > Thanks for the comments. I think these are very valid points. 
>       > However, I would like to check several things as a co-chair and a doc
>       > shepherd before we discuss the points you've raised.
>       > 
>       > In my understanding (please correct me if I'm wrong), when this draft
>       > was adopted as a WG item, I think the goal of the doc was some minor
>       > updates from RFC8312 which include more clarifications, minor changes
>       > and bug fixes. 
>       > However, if we try to address your concerns, I think we'll need to
>       > invent a new version of CUBIC, something like CUBIC++ or NewCUBIC, in
>       > the end. 
>       > I won't deny the value of such a doc, but this seems not to be what we
>       > agreed on beforehand.  
>       > If we proceed in this direction, I think we will need to check the WG
>       > consensus on whether this should be a new goal for the doc.
>       > 
>       > So, I would like to check if this is what you intend for the doc or
>       > you think we can address your points while aligning with the original
>       > goal.
>       > Also, if someone has opinions on this, please share.
>
>       I think it is important that we remember the status of RFC 8312 and the 
>       decades-long process that has been followed in the tsv area for new 
>       TCP congestion control algorithms that have been proposed and submitted 
>       to the IETF. In order to ensure that new cc algos are safe and fair, the 
>       process that has been followed for all current stds track TCP cc algos 
>       has required that the cc algo is first accepted and published as an 
>       experimental RFC, and only once enough supportive experimental evidence 
>       has been gathered has the doc become a candidate to be forwarded to stds 
>       track. We have even agreed on a relatively strict evaluation process to 
>       follow when cc algos are brought to the IETF to be published as 
>       experimental:
>
>       https://www.ietf.org/about/groups/iesg/statements/experimental-congestion-control/
>
>       RFC 8312 was published as "Informational" and if I recall correctly the 
>       idea was "just to publish what's out there" for the benefit of the 
>       community. RFC 8312 was never really evaluated, particularly not in the 
>       way new cc algos are supposed to be as per the agreed process.
>
>       I do not recall what exactly was agreed, or how, when rfc8312bis was 
>       launched, but I would be very interested to hear the justification for 
>       why this doc does not need to follow the process mentioned above, and 
>       why we would instead propose that the IETF publish a non-evaluated 
>       Informational doc "with minor updates", i.e., without actual 
>       evaluation, as a stds track RFC. If the target really remains PS, then 
>       the bar should be even higher than what is described for experimental 
>       docs in the above process document, i.e., what we have followed for 
>       experimental docs to be moved to stds track.
>
>       The only justification that I have heard has been "because CUBIC has 
>       long and wide deployment experience" and "the Internet has not melted" 
>       or that "we should have noticed if there were problems". We must, 
>       however, understand that in order to have a noticeable bad impact CUBIC 
>       would have to cause some sort of congestion collapse. Congestion 
>       collapse, however, is not an issue with CUBIC nor with any other CC 
>       algo that applies an RTO mechanism together with a correctly 
>       implemented Karn's algo that retains the backed-off RTO until an Ack is 
>       received for a new (not rexmitted) data packet. The issue is fairness 
>       to competing traffic. This cannot be observed by deploying and 
>       measuring the performance and behaviour of CUBIC alone. CUBIC being 
>       more aggressive than current stds track TCP CC would just give good 
>       performance results that one running CUBIC would be happy with. One 
>       must evaluate CUBIC's impact on the competing (Reno CC) traffic in a 
>       range of environments, which requires carefully designed active 
>       measurements with thoroughly-analyzed results (as required by the above 
>       process document, RFC 5033 and RFC 2914). What we seem to be missing is 
>       this evidence on CUBIC's impact, and that is something the IETF must 
>       focus on, not just whether CUBIC can achieve better performance than 
>       other existing CCs. The latter has been shown in many publications and 
>       is the major focus in many scientific papers proposing new algos.
>       I appreciate a lot that CUBIC has been implemented/developed and 
>       deployed for long and I wonder whether those deploying CUBIC have 
>       unpublished results the wg could review before taking the decision?
>
>       I suggest everyone to read carefully RFC 2914 Sec 3.2 and particularly 
>       what it says about more aggressive (than RFC 5681) congestion control 
>       algorithms:
>
>         Some of these may fail to implement
>         the TCP congestion avoidance mechanisms correctly because of poor
>         implementation [RFC2525].  Others may deliberately be implemented
>         with congestion avoidance algorithms that are more aggressive in
>         their use of bandwidth than other TCP implementations; this would
>         allow a vendor to claim to have a "faster TCP".  The logical
>         consequence of such implementations would be a spiral of increasingly
>         aggressive TCP implementations, or increasingly aggressive transport
>         protocols, leading back to the point where there is effectively no
>         congestion avoidance and the Internet is chronically congested.
>
>       And:
>
>         It is convenient to divide flows into three classes: (1) TCP-
>         compatible flows, (2) unresponsive flows, i.e., flows that do not
>         slow down when congestion occurs, and (3) flows that are responsive
>         but are not TCP-compatible.  The last two classes contain more
>         aggressive flows that pose significant threats to Internet
>         performance,
>
>       As I have tried to point out, there are several features with CUBIC 
>       where it is likely to be (or to me it seems it obviously is) more 
>       aggressive than what is required to be TCP-compatible. I'm not aware of 
>       evidence presented to tcpm (or IETF/IRTF) which shows the opposite (and 
>       I am happy to be educated on what I have missed).
>
>       You may take my comments to be a part of the expert review phase 
>       performed by the IRTF/ICCRG for CUBIC. I'm not requesting to modify 
>       this doc to CUBIC++ (or something) but it seems to me that this would 
>       be necessary if this doc intends to become published as PS. For 
>       experimental, I think it would need some additional updates and to 
>       record the areas of uncertainty and where more experimentation 
>       (clearly) is required.
> 
> 
> Hi Markku,
> 
> Thanks for your careful argument above. Given your argument, in order to continue to
> make constructive progress on rfc8312bis without blocking for new research and
> experimentation, I would propose that we change rfc8312bis so that it remains
> "Informational" (like RFC8312) rather than the "Standards Track" status currently
> listed at:
> 
>   https://ntap.github.io/rfc8312bis/draft-ietf-tcpm-rfc8312bis.html
> 
> As far as I can tell, that would allow all of the valuable contributions in
> rfc8312bis to carry forward and be published, while still respecting all of the
> considerations you enumerated above, which center around concerns with calling CUBIC
> "standards track".
> 
> How does that sound?
> 
> best,
> neal