Re: [tcpm] CUBIC rfc8312bis / WGLC Issue 1 (Was: Re: ProceedingCUBICdraft - thoughts and late follow-up)

Markku Kojo <kojo@cs.helsinki.fi> Mon, 20 June 2022 23:43 UTC

Date: Tue, 21 Jun 2022 02:42:31 +0300
From: Markku Kojo <kojo@cs.helsinki.fi>
To: Yoshifumi Nishida <nsd.ietf@gmail.com>
cc: "tcpm@ietf.org Extensions" <tcpm@ietf.org>, Gorry Fairhurst <gorry@erg.abdn.ac.uk>, Lars Eggert <lars@eggert.org>, tcpm-chairs <tcpm-chairs@ietf.org>
In-Reply-To: <CAAK044QqfB1_gnDLNKNd15XskrC1FWhxfmytw8xvSu9uCHFRWQ@mail.gmail.com>
Message-ID: <alpine.DEB.2.21.2206200339080.7292@hp8x-60.cs.helsinki.fi>
References: <alpine.DEB.2.21.2206061517230.7292@hp8x-60.cs.helsinki.fi> <alpine.DEB.2.21.2206141739100.7292@hp8x-60.cs.helsinki.fi> <CAAK044QqfB1_gnDLNKNd15XskrC1FWhxfmytw8xvSu9uCHFRWQ@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/_rsHptyn8eFxdlVE9uDXtucQOg0>

Hi Yoshi,

On Wed, 15 Jun 2022, Yoshifumi Nishida wrote:

> Hi Markku,
> 
> Thanks for the response. Yes, you have valid points, but I still have some comments.
> 
> First thing I would like to clarify is that we acknowledge the model used for CUBIC has not been validated as
> you pointed out.

Note that the model is not only unvalidated but it is also *incorrect*, 
that is, it does not perform as intended. And, the reason for the 
incorrect behaviour is different with a bottleneck at a tail-drop router 
and with a bottleneck at an AQM router.

> However, at the same time, I believe it doesn't mean the model has significant threats to the Internet. We've
> never seen such evidence even though CUBIC has been widely deployed for a long time.

This seems to be something that many don't quite understand: we cannot see 
any such evidence unless somebody measures the impact on the other 
competing traffic. It is not visible by observing the behaviour of the CUBIC 
sender, which is what people deploying CUBIC are interested in and are likely 
to monitor quite extensively. The early measurements published along with 
the original CUBIC paper already show this problem, as I have pointed out, 
but the draft claims the opposite. AFAIK no measurements have been 
published since then.

> Personally, I think
> that we would need to see tangible evidence of the threats before setting aside the fact that it has been widely
> used.

No, I don't think so. The responsibility for showing that there are no 
threats (or that CUBIC behaves as intended) lies with those proposing an 
alternative CC algorithm.

> The second thing I would like to mention is that I am not sure how many drafts have been passed through the
> RFC5033 process. 
> For example, RFC8985, RFC9002 are congestion control related standard docs, but in my understanding, the
> process had not been applied to them.

Sorry for being unclear. I meant all stds track TCP congestion control 
RFCs.

RFC8985 specifies a loss-detection algorithm; it does not specify any new 
congestion control algos, nor does it modify any existing CC algos, as the 
RFC clearly states. Even so, it has the potential to make TCP quite 
aggressive. Therefore, IMO the tcpm wg should be very careful in reviewing 
any congestion control algo proposal that possibly employs RACK for loss 
detection.

I did not have cycles to closely follow the wg discussions on RFC 9002, so 
I cannot tell how thoroughly it was evaluated before publishing. 
Quite likely not to the extent that RFC 5033 requires. While it 
follows the current stds track TCP behaviour (NewReno / Reno CC) quite 
closely and for the most part, it has elements that are potentially more 
aggressive than current stds track TCP CC algos. If it was not evaluated 
in all parts as RFC 5033 requires, that is a mistake that has happened, and 
any potentially incorrect behaviour that later gets observed must of 
course be reconsidered and corrected.

> Some may say that is because these proposals are not big threats, but from my point of view, they are more
> aggressive than NewReno in some ways.
> I am not sure what the clear differences are between the CUBIC draft and them. I personally haven't seen very solid
> evidence that they are not unfair to the current standards. 
> We may need to redefine or enhance the process in the future, but at this point, I personally don't have a
> strong reason to set a high bar only for this draft. Because I believe all docs should be treated equally. 

IMO, if the IETF made a mistake with one (or some) earlier published RFCs, 
that must never be used as an excuse to repeat the same mistake.

> Hence, describing the fact that the CUBIC draft hasn't passed the RFC5033 process in the doc looks
> sufficient to me. 

What would be the rational reason to hide the fact that the model CUBIC 
uses to determine alpha is incorrect?

Thanks,

/Markku

> Thanks,
> --
> Yoshi
>  
> 
> On Tue, Jun 14, 2022 at 8:02 AM Markku Kojo <kojo@cs.helsinki.fi> wrote:
>       Hi Yoshi,
>
>       I moved your comment and the discussion on your reply under this thread on
>       Issue 1 (see below)
>
>       On Tue, 14 Jun 2022, Markku Kojo wrote:
>
>       > Hi all,
>       >
>       > this thread starts the discussion on the issue 1: the incorrect model for
>       > determining CUBIC alpha for the congestion avoidance (CA) phase (Issue 1 a)
>       > and the inadequate validation of a proper constant C for the CUBIC window
>       > increase function (Issue 1 b).
>       >
>       >
>       > Issue 1 a)
>       > ----------
>       >
>       > The model that CUBIC uses to be fair to Reno CC (in Reno-friendly region) is
>       > unvalidated and actually incorrect.
>       >
>       > A more detailed description of the issue:
>       >
>       > The original paper manuscript on which CUBIC bases its behaviour in the
>       > Reno-friendly region made a preliminary attempt to validate the model but
>       > failed (and the paper never got published). This is the only known attempt to
>       > validate the model, and even this failed validation attempt was quite light,
>       > consisting of only a couple of network settings, and obviously did not use any
>       > replications for the results shown in the paper. Hence, even the statistical
>       > validity of the results remains questionable. Results were shown only for a
>       > setting with AQM enabled at the bottleneck router. The results for a
>       > tail-drop case are missing in the paper manuscript.
>       >
>       > The report (creno.pdf, see a pointer to the doc in the email pointed to
>       > below) that Bob wrote provides some explanation why the model does not give
>       > correct results and thereby the resulting behaviour presented in the original
>       > paper notably deviates from that of Reno CC. The email that I wrote to the wg
>       > list
>       >
>       > https://mailarchive.ietf.org/arch/msg/tcpm/bds-h_a6-NliTjx-ZqUSaFpSSnA/
>       >
>       > complements Bob's explanation for the AQM case and corrects Bob's analysis
>       > for the tail-drop case, explaining why the model is incorrect for the
>       > traditional and still today prevailing tail-drop router case.
>       >
>       > Consequently, the use of the incorrect model results in unknown behaviour of
>       > CUBIC when in the Reno-friendly region. Moreover, it is quite likely that the
>       > behaviour is different with different AQM implementations at the bottleneck,
>       > resulting in even more random behavior. This alone is very problematic and
>       > becomes more problematic when considering how moving out from the
>       > Reno-friendly region is specified: when the genuine CUBIC formula gives a
>       > larger cwnd than the cwnd that the Reno-friendly model gives, CUBIC moves to
>       > the genuine CUBIC mode that is significantly more aggressive than Reno CC.
>       >
>       > Therefore, if the incorrect model gives too low cwnd for mimicked Reno CC,
>       > CUBIC moves too early to the genuine CUBIC mode and becomes too aggressive
>       > too early even though it should behave as aggressively as Reno CC. On the
>       > other hand, if the incorrect model gives too large cwnd, CUBIC is too
>       > aggressive throughout the Reno-friendly region.
>       > In summary, if the model is not correct, it results in more aggressive
>       > behaviour than Reno CC no matter in which direction the model fails.
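
To make the switching rule described above concrete, here is a minimal Python sketch. It is not taken from the draft or from any implementation. It assumes the closed-form window estimate of RFC 8312, W_est(t) = W_max*beta + alpha*t/RTT, with alpha = 3*(1-beta)/(1+beta), and the constants beta = 0.7 and C = 0.4. The function names and the alpha_scale knob are hypothetical; alpha_scale only stands in for a model that under- or over-estimates the growth of Reno CC.

```python
# Sketch only. Windows are in segments, times in seconds.
BETA = 0.7   # multiplicative decrease factor (beta_cubic)
C = 0.4      # CUBIC scaling constant

def alpha_cubic(beta=BETA):
    # Additive increase factor the Reno-friendly model uses with beta.
    return 3.0 * (1.0 - beta) / (1.0 + beta)

def w_cubic(t, w_max, c=C, beta=BETA):
    # "Genuine" CUBIC window t seconds after the last window reduction.
    k = (w_max * (1.0 - beta) / c) ** (1.0 / 3.0)
    return c * (t - k) ** 3 + w_max

def w_est(t, w_max, rtt, alpha, beta=BETA):
    # Window the Reno-friendly model attributes to Reno CC (RFC 8312 form).
    return w_max * beta + alpha * (t / rtt)

def exit_time(w_max, rtt, alpha_scale=1.0, step=0.001, horizon=60.0):
    # First sampled time at which the cubic curve exceeds the estimate,
    # i.e. when the sender would leave the Reno-friendly region.
    # alpha_scale is a hypothetical knob that distorts the model.
    alpha = alpha_scale * alpha_cubic()
    for i in range(1, int(horizon / step) + 1):
        t = i * step
        if w_cubic(t, w_max) > w_est(t, w_max, rtt, alpha):
            return t
    return None  # no crossing within the horizon

if __name__ == "__main__":
    for scale in (0.5, 1.0, 2.0):
        print(scale, exit_time(w_max=10, rtt=0.01, alpha_scale=scale))
```

With w_max = 10 segments and rtt = 10 ms, a smaller alpha_scale makes the cubic curve overtake the estimate earlier, which is the "too early" direction of failure described above. The sketch says nothing about the direction in which the real model errs.
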
>       >
>       > And very importantly: some people have suggested that CUBIC should replace
>       > the current stds track CC algos and become the default. The behaviour of Reno
>       > CC is very thoroughly studied and very well understood. If we replace it with
>       > *unknown* behaviour, how can we any longer specify what is the correct and
>       > allowed aggressiveness for any upcoming CC when the behaviour of the new
>       > default itself is unknown, making comparative analysis of other CCs against
>       > CUBIC in the Reno-friendly region very difficult? The behaviour is assumed to
>       > be the same as Reno CC, but the actual behaviour is random; it may be 2 times
>       > or 8 times more aggressive than Reno, for example.
> 
>
>       On Tue, 7 Jun 2022, Yoshifumi Nishida wrote:
>
>       > Hi Markku,
>       >
>       > Thanks for the detailed feedback. This is very useful.
>       > One thing I would like to clarify is that we’ve already acknowledged the
>       > TCP friendly model in the draft has some unresolved discussions. But, I
>       > believe our current consensus is to not change the logic for it in
>       > the current draft as it will require long term evaluations.
>       >
>       > So, I would like to check if you’re suggesting we should update the
>       > draft against it or you have some ideas to address these issues in
>       > some ways (e.g., adding more clarification in the draft, mentioning it in
>       > the write-up, etc)
>       >
>       > Thanks,
>       > --
>       > Yoshi
>
>       I think the problem is even trickier because it is hard to see how it
>       would be possible to correct the model that is based on wrong
>       assumptions. This said, it is important for the wg to consider whether it
>       is ready to suggest publishing a congestion control algorithm that is not
>       correct and has not been validated. And, if the answer is yes, how to
>       justify it and what would be the appropriate status for the RFC as well
>       as the way forward after publishing the draft.
>
>       I fully sympathize with those who have deployed CUBIC and understand that
>       there is pressure to publish the draft with no modifications to what has
>       been implemented. However, RFC 5033 was written specifically to avoid this
>       kind of situation where a CC algo has been (widely) deployed and only
>       then brought to IETF standardization. It is understandable that those who
>       have deployed the CC algo would be very reluctant to modify the algo. On
>       the other hand, AFAIK all current stds track CC algos have had various
>       issues that have been brought up during the standardization process but
>       these issues have been resolved before publishing the draft. So, why
>       should we make an exception? IMO, wide deployment cannot be the answer
>       because it does not automatically reveal the negative impact on other
>       traffic; specific comparative measurements must be carried out.
>       Also, why should the IETF set a precedent for any future congestion
>       control drafts, implying that it is ok to first deploy a CC algo and then
>       bring it to IETF and use the (wide) deployment as an argument against
>       modifying it regardless of whatever issues it might have?
>
>       So, I don't have a good answer. IMO, if the draft is published with
>       unresolved issues, the draft itself must clearly identify and document
>       the issues and give some kind of justification and a clear way forward.
>       That is, we must ensure there is an initiative set and path to follow in
>       order to correct any shortcomings in a published RFC. Otherwise, the
>       issues are very likely ignored and forgotten forever.
>
>       Thanks,
>
>       /Markku
> 
> 
>
>       > Issue 1 b)
>       > ----------
>       >
>       > Another issue related to operating in the Reno-friendly region is the
>       > question of when CUBIC should operate in the Reno-friendly region and when it
>       > may move out of it. Obviously CUBIC should stay in the Reno-friendly region
>       > when Reno CC would be able to fully utilize the available network capacity.
>       > In practice, this is specified by selecting the value for constant C in the
>       > formula that is used to determine cwnd in the "genuine" CUBIC mode. However,
>       > selecting a proper value for C has not been properly validated in a wide
>       > range of environments as required in RFC 5033.
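
As a rough illustration of how the choice of C decides where the Reno-friendly region ends, the following Python sketch compares the two window formulas at one point in time for several values of C. It assumes the RFC 8312 closed forms with beta = 0.7; only C = 0.4 comes from the draft, the other C values and the function names are hypothetical, chosen just to show the trend.

```python
BETA = 0.7  # beta_cubic

def k_seconds(w_max, c, beta=BETA):
    # Time K for the cubic curve to climb back to w_max after a reduction.
    return (w_max * (1.0 - beta) / c) ** (1.0 / 3.0)

def in_reno_friendly_region(t, w_max, rtt, c, beta=BETA):
    # True while the Reno-friendly estimate is at least the cubic window.
    alpha = 3.0 * (1.0 - beta) / (1.0 + beta)
    w_est = w_max * beta + alpha * (t / rtt)
    w_cubic = c * (t - k_seconds(w_max, c, beta)) ** 3 + w_max
    return w_est >= w_cubic

if __name__ == "__main__":
    # 0.4 is the draft's value; 0.04 and 4.0 are hypothetical comparisons.
    for c in (0.04, 0.4, 4.0):
        print(c, round(k_seconds(100, c), 2),
              in_reno_friendly_region(t=1.0, w_max=100, rtt=0.05, c=c))
```

A larger C shortens K and so pulls the cubic curve above the Reno-friendly estimate sooner. Whether a given C keeps CUBIC in the Reno-friendly region wherever Reno CC could fill the path is the question that, as noted above, needs validation across real environments; a sketch like this cannot answer it.
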
>       >
>       > Preliminary validation of constant C has been done for the original CUBIC
>       > paper. That is good enough for a scientific paper but not adequate for an
>       > IETF stds track algo. There seems to be no additional evaluation since the
>       > timeframe of the CUBIC paper publication around 15 years ago. Particularly,
>       > there seems to be no evaluation with AQM at the bottleneck router or with a
>       > buffer-bloated bottleneck router, not to mention many other network
>       > environments. Nor is there any data available for a non-SACK TCP sender.
>       >
>       > The evaluation of 1 a) and 1 b) must be done separately. Otherwise, it is
>       > very hard to tell whether any deviations are due to the incorrect model or
>       > incorrect value of C. The original CUBIC paper and some other papers show
>       > that CUBIC is not fair to Reno CC in certain network conditions where Reno CC
>       > has no problems in utilizing the available network capacity; instead, CUBIC
>       > steals capacity from Reno CC.
>       >
>       > Thanks,
>       >
>       > /Markku
>       >
> 
> 
>