Re: [tcpm] CUBIC rfc8312bis / WGLC Issue 1 (Was: Re: ProceedingCUBICdraft- thoughts and late follow-up)

Markku Kojo <> Mon, 04 July 2022 00:10 UTC

Date: Mon, 04 Jul 2022 03:09:57 +0300
From: Markku Kojo <>
To: Yoshifumi Nishida <>
cc: " Extensions" <>, Gorry Fairhurst <>, Lars Eggert <>, tcpm-chairs <>, Bob Briscoe <>

Hi Yoshi,

CC'ing also Bob, as we haven't heard his view on my explanation of why 
the model for determining alpha is incorrect. Bob, my explanation is 
available at

On Tue, 21 Jun 2022, Yoshifumi Nishida wrote:

> Hi Markku,
> I think the important point is "any potentially incorrect behaviour that later gets observed must of
> course be reconsidered and corrected."  I believe all people in this community will agree with it. 
> So, I think the gap in the discussion is whether we publish the doc and fix it when we see some issues or 
> we don't publish the doc until we can think there's no potential issue.
> If we choose the former and we find some issues right after we publish the doc, then I will
> admit the decision was a mistake.
> But, I think the possibility for it is very low as we probably will need to monitor extensively as you
> mentioned. 

When I said

  "any potentially incorrect behaviour that later gets observed must of
  course be reconsidered and corrected."

I referred to RFC 9002. I believe we all agree on that as you said.

However, what I meant was that if a mistake was made in publishing some 
earlier document, we should not use that mistake as an excuse for 
repeating it; instead, we should fix or document the possible issues 
with a draft before publishing it.

> In the end, we all know evaluating CC is not an easy task at all and CUBIC is not an exception. Hence, I
> think
> the publishing and fixing later strategy isn't a bad idea here. OTOH, If we choose the latter strategy, I am
> concerned 
> it may take years for publishing and it won't be good for the community. 

What is important is that we look into the issues that have been raised 
and separately decide how to handle each of them. So far all the 
discussion has been at a very general level, not tackling the actual 
issues at all. Each of the issues (potentially) requires a different 
resolution.

> So, if we want to hold this process, I would like to see some solid evidence for the negative impacts in the
> current CUBIC logics.
> Some might say the people who propose CUBIC should prove it's safe, which I can agree with when many people
> are skeptical about it.
> But, from my point of view, the current situation is the opposite. Hence, I think the people who oppose
> publishing should prove its risk.  

Having data to prove something is always important. Regarding issue 1, 
which we are discussing in this thread:

1) The original paper that tried to validate the model for determining
   the AI factor alpha failed its validation attempt.

2) Bob and I have carefully described why the model is incorrect (my
   explanation corrected Bob's analysis for the tail-drop case and
   complemented Bob's analysis for the AQM case).

3) In the tail-drop case, which is the classical case and quite likely
   still the default for most bottlenecks in the Internet, when a Reno
   CC flow and a CUBIC flow are competing, the CUBIC sender will opt
   out of roughly every second cwnd reduction and thereby have a
   significant negative impact on the competing Reno CC flow.
   I thought you agreed that my explanation of the problem was correct
   for the tail-drop case?
   Do you think that opting out of every second cwnd reduction is not
   worth addressing in any way, when the correct operation would be to
   reduce cwnd every time, as a competing TCP-compatible flow does? Do
   you think that it does not have a significant enough impact? For
   some evidence, see the next item.

4) For some evidence, simply look at the original CUBIC paper [HRX08].
   It clearly reveals that CUBIC dominates Reno TCP (SACK TCP) in the
   regions where SACK TCP alone is able to fully utilize the available
   bandwidth. In Figure 10 (c), in all cases up to 200 Mbps, SACK TCP
   is able to fully utilize the bottleneck link. However, when SACK TCP
   competes with CUBIC, CUBIC steals bandwidth from SACK TCP. Even in
   the 400 Mbps case, where SACK TCP alone leaves less than 10% of the
   link capacity underutilized, CUBIC steals bandwidth from SACK TCP,
   leaving SACK TCP less than 10% of the capacity. Also, in Fig 10 (a)
   with 40-160 ms RTTs, CUBIC clearly steals more capacity from SACK
   TCP than the capacity that SACK TCP alone leaves unutilized in
   these cases.

    Also, the draft cites paper [HLRX07] in Section 5.2 and says

     "Our test results in [HLRX07] indicate that CUBIC uses the spare
      bandwidth left unused by existing Reno TCP flows in the same
      bottleneck link without taking away much bandwidth from the
      existing flows"
    If you look at Fig 3 a in [HLRX07], it clearly indicates that Reno
    TCP (SACK TCP) is able to fully utilize the available link capacity,
    like all the other variants, with 20 and 40 ms RTTs, and almost
    fully with 80 ms RTT. However, if we look at Fig 12 a, where SACK
    TCP is clearly shown to be friendly to itself with the above RTTs,
    we see that CUBIC steals a notable amount of link capacity from
    SACK TCP with the same RTTs.

    Do you think that the above text, claiming "that CUBIC uses the
    spare bandwidth left unused by existing Reno TCP flows in the same
    bottleneck link without taking away much bandwidth from the
    existing flows", is a correct statement and can be published as an
    objective statement?
Given all the above, do you think we can ignore issue 1 and just hide 
it?


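To make the disputed formulas concrete for anyone following along, here 
is a minimal sketch (in Python) of the Reno-friendly model as written 
in rfc8312bis, with beta_cubic = 0.7. This only restates the formulas 
under dispute; it says nothing about whether the model is correct.

```python
# Sketch of the Reno-friendly (AIMD) window estimate from rfc8312bis.
# This is exactly the model questioned in this thread; the code only
# restates the formulas, it does not validate them.

BETA_CUBIC = 0.7  # CUBIC multiplicative decrease factor

# AI factor alpha derived from the steady-state model in question:
ALPHA = 3 * (1 - BETA_CUBIC) / (1 + BETA_CUBIC)  # ~0.53

def w_est(w_max, t, rtt):
    """Reno-friendly window estimate (in segments) at time t seconds
    after the last congestion event, for a flow with the given RTT."""
    return w_max * BETA_CUBIC + ALPHA * (t / rtt)
```

For comparison, genuine Reno uses alpha = 1 with beta = 0.5; the whole 
question in issue 1 a) is whether this derived alpha actually 
compensates for the shallower beta = 0.7 reduction when competing with 
real traffic.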

> --
> Yoshi
> On Mon, Jun 20, 2022 at 4:42 PM Markku Kojo <> wrote:
>       Hi Yoshi,
>       On Wed, 15 Jun 2022, Yoshifumi Nishida wrote:
>       > Hi Markku,
>       >
>       > Thanks for the response. Yes, you got valid points. But, I still have some comments.
>       >
>       > First thing I would like to clarify is that we acknowledge the model used for CUBIC has not
>       been validated as
>       > you pointed out.
>       Note that the model is not only unvalidated but it is also *incorrect*,
>       that is, it does not perform as intended. And, the reason for the
>       incorrect behaviour is different with a bottleneck at a tail-drop router
>       and with a bottleneck at an AQM router.
>       > However, at the same time, I believe it doesn't mean the model has significant threats to the
>       Internet. We've
>       > never seen such evidence even though CUBIC has been widely deployed for a long time.
>       This seems to be something that many don't quite understand: we cannot
>       see any such evidence unless somebody measures the impact on the other
>       competing traffic. It is not visible by observing the behaviour of a
>       CUBIC sender, which is what people deploying CUBIC are interested in
>       and are likely to monitor quite extensively. The early measurements
>       published along with the original CUBIC paper already show this
>       problem, as I have pointed out, but the draft claims the opposite.
>       AFAIK, no such measurements have been published since then.
>       > I am personally thinking
>       > that we will need to see tangible evidence for the threats to leave out the fact that it has
>       been widely
>       > used.
>       No, I don't think so. The responsibility of showing that there are no
>       threats (or that CUBIC behaves as intended) lies with those proposing
>       an alternative CC algorithm.
>       > The second thing I would like to mention is that I am not sure how many drafts have been passed
>       through the
>       > RFC5033 process. 
>       > For example, RFC8985, RFC9002 are congestion control related standard docs, but in my
>       understanding, the
>       > process had not been applied to them.
>       Sorry for being unclear. I meant all stds track TCP congestion control
>       RFCs.
>       RFC8985 is a loss-detection algorithm, it does not specify any new
>       congestion control algos nor does it modify any existing CC algos as the
>       RFC clearly states. Indeed, it has a potential to make TCP quite
>       aggressive. Therefore, IMO tcpm wg should be very careful in reviewing
>       any congestion control algo proposal that possibly employs RACK for loss
>       detection.
>       I did not have cycles to closely follow the wg discussions on RFC 9002, so
>       I cannot tell how thoroughly it was evaluated before publishing.
>       Quite likely not quite to the extent that RFC 5033 requires. While it
>       follows quite closely and for most part the current stds track TCP
>       behaviour (NewReno / Reno CC), it has elements that potentially are more
>       aggressive than the current stds track TCP CC algos. If it was not evaluated
>       for all parts as RFC 5033 requires, it is a mistake that has happened and
>       any potentially incorrect behaviour that later gets observed must of
>       course be reconsidered and corrected.
>       > Some may say that because these proposals are not big threats, but from my point of view, they
>       are more
>       > aggressive than NewReno in some ways.
>       > I am not sure what's the clear differences between CUBIC draft and them. I personally haven't
>       seen very solid
>       > evidence that they are not unfair to the current standards. 
>       > We may need to redefine or enhance the process in the future, but at this point, I personally
>       don't have a
>       > strong reason to set a high bar only for this draft. Because I believe all docs should be
>       treated equally. 
>       IMO, if the IETF made a mistake with one (or some) earlier published
>       RFCs, that must never be used as an excuse to repeat the same mistake.
>       > Hence, describing the fact that the CUBIC draft hasn't passed the RFC5033 process in the doc
>       looks
>       > sufficient to me. 
>       What would be the rational reason to hide the fact that the model CUBIC
>       uses to determine alpha is incorrect?
>       Thanks,
>       /Markku
>       > Thanks,
>       > --
>       > Yoshi
>       >  
>       >
>       > On Tue, Jun 14, 2022 at 8:02 AM Markku Kojo <> wrote:
>       >       Hi Yoshi,
>       >
>       >       I moved your comment and the discussion on your reply under this thread on
>       >       the Issue 1 (see below)
>       >
>       >       On Tue, 14 Jun 2022, Markku Kojo wrote:
>       >
>       >       > Hi all,
>       >       >
>       >       > this thread starts the discussion on the issue 1: the incorrect model for
>       >       > determining CUBIC alpha for the congestion avoidance (CA) phase (Issue 1 a)
>       >       > and the inadequate validation of a proper constant C for the CUBIC window
>       >       > increase function (Issue 1 b).
>       >       >
>       >       >
>       >       > Issue 1 a)
>       >       > ----------
>       >       >
>       >       > The model that CUBIC uses to be fair to Reno CC (in the Reno-friendly
>       >       > region) is unvalidated and actually incorrect.
>       >       >
>       >       > A more detailed description of the issue:
>       >       >
>       >       > The original paper manuscript on which CUBIC bases its behaviour in the
>       >       > Reno-friendly region made a preliminary attempt to validate the model but
>       >       > failed (and the paper never got published). This is the only known attempt to
>       >       > validate the model and even this failed validation attempt was quite light,
>       >       > consisting of only a couple of network settings and obviously did not use any
>       >       > replications for the results shown in the paper. Hence, even the statistical
>       >       > validity of the results remains questionable. Results were shown only for a
>       >       > setting with AQM enabled at the bottleneck router. The results for a
>       >       > tail-drop case are missing in the paper manuscript.
>       >       >
>       >       > The report (creno.pdf, see a pointer to the doc in the email pointed to
>       >       > below) that Bob wrote provides some explanation why the model does not give
>       >       > correct results and thereby the resulting behaviour presented in the original
>       >       > paper notably deviates from that of Reno CC. The email that I wrote to the wg
>       >       > list
>       >       >
>       >       >
>       >       >
>       >       > complements Bob's explanation for the AQM case and corrects Bob's analysis
>       >       > for the tail-drop case, explaining why the model is incorrect for the
>       >       > traditional and still today prevailing tail-drop router case.
>       >       >
>       >       > Consequently, the use of the incorrect model results in unknown behaviour of
>       >       > CUBIC when in the Reno-friendly region. Moreover, it is quite likely that the
>       >       > behaviour is different with different AQM implementations at the bottleneck,
>       >       > resulting in even more random behavior. This alone is very problematic and
>       >       > becomes more problematic when considering how moving out from the
>       >       > Reno-friendly region is specified: when the genuine CUBIC formula gives a
>       >       > larger cwnd than the cwnd that the Reno-friendly model gives, CUBIC moves to
>       >       > the genuine CUBIC mode that is significantly more aggressive than Reno CC.
>       >       >
>       >       > Therefore, if the incorrect model gives too low cwnd for mimicked Reno CC,
>       >       > CUBIC moves too early to the genuine CUBIC mode and becomes too aggressive
>       >       > too early even though it should behave equally aggressive as Reno CC. On the
>       >       > other hand, if the incorrect model gives too large cwnd, CUBIC is too
>       >       > aggressive throughout the Reno-friendly region.
>       >       > In summary, if the model is not correct, it results in more aggressive
>       >       > behaviour than Reno CC no matter which direction the model fails.
>       >       >
>       >       > And very importantly: some people have suggested that CUBIC should replace
>       >       > the current stds track CC algos and become the default. The behaviour of Reno
>       >       > CC is very thoroughly studied and very well understood. If we replace it with
>       >       > *unknown* behaviour, how can we anymore specify what is the correct and
>       >       > allowed aggressiveness for any upcoming CC when the behaviour of the new
>       >       > default itself is unknown, making comparative analysis of other CCs against
>       >       > CUBIC in the Reno-friendly region very difficult? The behaviour is assumed to
>       >       > be the same as Reno CC but the actual behaviour is random, it may be 2 times
>       >       > or 8 times more aggressive than Reno, for example.
>       >
>       >
>       >       On Tue, 7 Jun 2022, Yoshifumi Nishida wrote:
>       >
>       >       > Hi Markku,
>       >       >
>       >       > Thanks for the detailed feedback. This is very useful.
>       >       > One thing I would like to clarify is that we’ve already acknowledged the
>       >       > TCP friendly  model in the draft has some unsolved discussions. But, I
>       >       > believe our current consensus is to not change the logic for it in
>       >       > the current draft as it will require long term evaluations.
>       >       >
>       >       > So, I would like to check if you’re suggesting we should update the
>       >       > draft against it or you have some ideas to address these issues in
>       >       > some ways (e.g adding more clarification in the draft, mentioning it in
>       >       > the write-up, etc)
>       >       >
>       >       > Thanks,
>       >       > --
>       >       > Yoshi
>       >
>       >       I think the problem is even trickier because it is hard to see how it
>       >       would be possible to correct the model that is based on wrong
>       >       assumptions. This said, it is important for the wg to consider whether it
>       >       is ready to suggest publishing a congestion control algorithm that is not
>       >       correct and has not been validated. And, if the answer is yes, how to
>       >       justify it and what would be the appropriate status for the RFC as well
>       >       as the way forward after publishing the draft.
>       >
>       >       I fully sympathize with those who have deployed CUBIC and understand that
>       >       there is a pressure to publish the draft with no modifications to what has
>       >       been implemented. However, RFC 5033 was written specifically to avoid this
>       >       kind of situation where a CC algo has been (widely) deployed and only
>       >       then brought to IETF standardization. It is understandable that those who
>       >       have deployed the CC algo would be very reluctant to modify the algo. On
>       >       the other hand, AFAIK all current stds track CC algos have had various
>       >       issues that have been brought up during the standardization process but
>       >       these issues have been resolved before publishing the draft. So, why
>       >       should we make an exception? IMO, wide deployment cannot be the answer
>       >       because it does not automatically reveal the negative impact to other
>       >       traffic but specific comparative measurements must be carried out.
>       >       Also, why should the IETF set a precedent for any future congestion
>       >       control drafts, implying that it is ok to first deploy a CC algo and then
>       >       bring it to IETF and use the (wide) deployment as an argument against
>       >       modifying it regardless of whatever issues it might have?
>       >
>       >       So, I don't have a good answer. IMO, if the draft is published with
>       >       unresolved issues, the draft itself must clearly identify and document
>       >       the issues and give some kind of justification and a clear way forward.
>       >       That is, we must ensure there is an initiative set and path to follow in
>       >       order to correct any shortcomings in a published RFC. Otherwise, the
>       >       issues are very likely ignored and forgotten forever.
>       >
>       >       Thanks,
>       >
>       >       /Markku
>       >
>       >
>       >
>       >       > Issue 1 b)
>       >       > ----------
>       >       >
>       >       > Another issue related to operating in the Reno-friendly region is the
>       >       > question of when CUBIC should operate in the Reno-friendly region and when it
>       >       > may move out of it. Obviously CUBIC should stay in the Reno-friendly region
>       >       > when Reno CC would be able to fully utilize the available network capacity.
>       >       > In practice, this is specified by selecting the value for constant C in the
>       >       > formula that is used to determine cwnd in the "genuine" CUBIC mode. However,
>       >       > selecting a proper value for C has not been properly validated in a wide
>       >       > range of environments as required in RFC 5033.
>       >       >
>       >       > Preliminary validation of constant C has been done for the original CUBIC
>       >       > paper. That is good enough for a scientific paper but not adequate for an
>       >       > IETF stds track algo. There seems to be no additional evaluation since the
>       >       > timeframe of the CUBIC paper publication around 15 years ago. Particularly,
>       >       > there seems to be no evaluation with AQM at the bottleneck router or with a
>       >       > buffer-bloated bottleneck router, not to mention many other network
>       >       > environments. Nor is there any data available for a non-SACK TCP sender.
>       >       >
>       >       > The evaluation of 1 a) and 1 b) must be done separately. Otherwise, it is
>       >       > very hard to tell whether any deviations are due to the incorrect model or
>       >       > incorrect value of C. The original CUBIC paper and some other papers show
>       >       > that CUBIC is not fair to Reno CC in certain network conditions where Reno CC
>       >       > has no problems in utilizing the available network capacity; instead, CUBIC
>       >       > steals capacity from Reno CC.
>       >       >
>       >       > Thanks,
>       >       >
>       >       > /Markku
>       >       >
>       >
>       >
>       >
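Since Issue 1 b) keeps coming back to the constant C, here is the same 
kind of minimal sketch for the genuine CUBIC window-increase function, 
again only restating the rfc8312bis formulas with C = 0.4; it makes no 
claim about whether that value is adequately validated.

```python
# Sketch of the "genuine" CUBIC window-increase function from
# rfc8312bis.  The constant C below is the value whose validation
# Issue 1 b) questions; the code only restates the formulas.

C = 0.4           # CUBIC constant (segments/second^3) per rfc8312bis
BETA_CUBIC = 0.7  # CUBIC multiplicative decrease factor

def k(w_max):
    """Time in seconds for the cubic curve to climb back to w_max."""
    return ((w_max * (1 - BETA_CUBIC)) / C) ** (1 / 3)

def w_cubic(w_max, t):
    """Genuine-CUBIC window (in segments) t seconds after a congestion
    event that occurred at window size w_max."""
    return C * (t - k(w_max)) ** 3 + w_max
```

CUBIC is supposed to stay in the Reno-friendly region while w_cubic 
stays below the Reno-friendly estimate, which is why an error in either 
the model (Issue 1 a) or in C (Issue 1 b) shifts the point at which 
CUBIC switches to its more aggressive mode.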