Re: [tcpm] CUBIC rfc8312bis / WGLC Issue 1 (Was: Re: ProceedingCUBICdraft - thoughts and late follow-up)

Bob Briscoe <ietf@bobbriscoe.net> Wed, 03 August 2022 09:34 UTC

Message-ID: <21ed7cfd-d88b-4efb-0034-fc5e74ff2a43@bobbriscoe.net>
Date: Wed, 03 Aug 2022 10:33:56 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0
Content-Language: en-GB
To: Yoshifumi Nishida <nsd.ietf@gmail.com>, Markku Kojo <kojo@cs.helsinki.fi>
Cc: "tcpm@ietf.org Extensions" <tcpm@ietf.org>, tcpm-chairs <tcpm-chairs@ietf.org>
References: <alpine.DEB.2.21.2206061517230.7292@hp8x-60.cs.helsinki.fi> <alpine.DEB.2.21.2206141739100.7292@hp8x-60.cs.helsinki.fi> <CAAK044QqfB1_gnDLNKNd15XskrC1FWhxfmytw8xvSu9uCHFRWQ@mail.gmail.com> <alpine.DEB.2.21.2206200339080.7292@hp8x-60.cs.helsinki.fi> <CAAK044QiP-EvLR0MAbpfH274+M0KyhO9v_qop1tBKcUW6EVBZw@mail.gmail.com>
From: Bob Briscoe <ietf@bobbriscoe.net>
In-Reply-To: <CAAK044QiP-EvLR0MAbpfH274+M0KyhO9v_qop1tBKcUW6EVBZw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/lA4YDRblRBWLHKFjM2m3VcCDc5Q>
Subject: Re: [tcpm] CUBIC rfc8312bis / WGLC Issue 1 (Was: Re: ProceedingCUBICdraft - thoughts and late follow-up)

Yoshi, Markku,

This note is just to tie two threads together.
I just sent a long response to Markku on the alpha_cubic thread that has 
since been numbered as Issue 1.

Summary: so far we have shown that the model used to calculate the
alpha_cubic value of 0.53 is not absolutely precise, but it gives
equal-rate flows to a good approximation (within about 10% from analysis
and even closer in experiments over an AQM). So it is extremely unlikely
that there is any danger to the Internet here. Even if you believe flow
equality is critical, this is in the noise.
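
For anyone who wants to check the 0.53 figure, here is a minimal sketch of the arithmetic. It assumes the AIMD-equivalence formula given in RFC 8312, alpha_cubic = 3 * (1 - beta_cubic) / (1 + beta_cubic), with beta_cubic = 0.7. Neither the formula nor the value of beta_cubic is spelled out in this thread, and the function name is purely illustrative.

```python
# Sketch only: assumes the formula and beta_cubic = 0.7 as stated in RFC 8312.
def alpha_cubic(beta_cubic: float) -> float:
    """Additive-increase factor that the model pairs with a given decrease factor."""
    return 3.0 * (1.0 - beta_cubic) / (1.0 + beta_cubic)


alpha = alpha_cubic(0.7)
print(round(alpha, 2))   # 0.53

# Sanity check: Reno's decrease factor of 0.5 gives Reno's increase of 1 segment per RTT.
print(alpha_cubic(0.5))  # 1.0
```

The second call is only a consistency check on the formula; it says nothing about whether the underlying model holds at a real bottleneck, which is the point under dispute.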

I wrote up a derivation of the formula with more carefully worded 
assumptions, and I've now added:
* estimation of the likely error due to the known approximations of reality.
* some empirical results
https://raw.githubusercontent.com/bbriscoe/cubic-reno/main/creno_tr.pdf

I can't believe that no-one has done this sort of testing already, but I 
couldn't find another paper. So, in the next few days I intend to submit 
this tech report to arXiv, which is an archival service with a very 
light review process and therefore fast turn-round. Then, if the 
rfc8312-bis authors want, they can refer to that report for the 
justification of the alpha_cubic value, rather than the Floyd paper that 
is outdated and controversial. The RFC Editor accepts arXiv references.


I would add that I didn't really have the time to do this, but I just 
did it, because I could see a way to resolve it even though I don't need 
this RFC to proceed myself - I was just fed up with this conversation 
going round and round with no evidence or basis for either position.

Regards


Bob

On 21/06/2022 09:37, Yoshifumi Nishida wrote:
> Hi Markku,
>
> I think the important point is "any potentially incorrect behaviour 
> that later gets observed must of
> course be reconsidered and corrected."  I believe all people in this 
> community will agree with it.
>
> So, I think the gap in the discussion is whether we publish the doc
> and fix it when we see some issues, or we don't publish the doc until
> we think there's no potential issue.
> If we choose the former and we find some issues right after we
> publish the doc, then I will admit the decision was a mistake.
> But, I think the possibility of that is very low, as we probably will
> need to monitor extensively as you mentioned.
>
> In the end, we all know evaluating CC is not an easy task at all and 
> CUBIC is not an exception. Hence, I think
> the publishing and fixing later strategy isn't a bad idea here. OTOH, 
> If we choose the latter strategy, I am concerned
> it may take years for publishing and it won't be good for the community.
>
> So, if we want to hold this process, I would like to see some solid
> evidence of negative impacts from the current CUBIC logic.
> Some might say the people who propose CUBIC should prove it's safe, 
> which I can agree with when many people are skeptical about it.
> But, from my point of view, the current situation is the opposite. 
> Hence, I think the people who oppose publishing should prove its risk.
>
> --
> Yoshi
>
> On Mon, Jun 20, 2022 at 4:42 PM Markku Kojo <kojo@cs.helsinki.fi> wrote:
>
>     Hi Yoshi,
>
>     On Wed, 15 Jun 2022, Yoshifumi Nishida wrote:
>
>     > Hi Markku,
>     >
>     > Thanks for the response. Yes, you got valid points. But, I still
>     have some comments.
>     >
>     > First thing I would like to clarify is that we acknowledge the
>     model used for CUBIC has not been validated as
>     > you pointed out.
>
>     Note that the model is not only unvalidated but it is also
>     *incorrect*,
>     that is, it does not perform as intended. And, the reason for the
>     incorrect behaviour is different with a bottleneck at a tail-drop
>     router
>     and with a bottleneck at an AQM router.
>
>     > However, at the same time, I believe it doesn't mean the model
>     has significant threats to the Internet. We've
>     > never seen such evidence even though CUBIC has been widely
>     deployed for a long time.
>
>     This seems to be something that many don't quite understand: we
>     cannot see any such evidence unless somebody measures the impact on
>     the other competing traffic. It is not visible by observing the
>     behaviour of a CUBIC sender, which is what people deploying CUBIC are
>     interested in and are likely to monitor quite extensively. The early
>     measurements published along with the original CUBIC paper already
>     show this problem, as I have pointed out, but the draft claims the
>     opposite. AFAIK no published measurements have been carried out
>     since then.
>
>     > I am personally thinking
>     > that we will need to see tangible evidence for the threats to
>     leave out the fact that it has been widely
>     > used.
>
>     No, I don't think so. The responsibility for showing that there are
>     no threats (or that CUBIC behaves as intended) lies with those
>     proposing an alternative CC algorithm.
>
>     > The second thing I would like to mention is that I am not sure
>     > how many drafts have been passed through the RFC5033 process.
>     > For example, RFC8985 and RFC9002 are congestion control related
>     > standard docs, but in my understanding, the process had not been
>     > applied to them.
>
>     Sorry for being unclear. I meant all stds track TCP congestion
>     control
>     RFCs.
>
>     RFC8985 is a loss-detection algorithm, it does not specify any new
>     congestion control algos nor does it modify any existing CC algos
>     as the
>     RFC clearly states. Indeed, it has a potential to make TCP quite
>     aggressive. Therefore, IMO tcpm wg should be very careful in
>     reviewing
>     any congestion control algo proposal that possibly employs RACK
>     for loss
>     detection.
>
>     I did not have cycles to closely follow the wg discussions on RFC
>     9002, so I cannot tell how thoroughly it was evaluated before
>     publishing. Quite likely not quite to the extent that RFC 5033
>     requires. While it follows the current stds track TCP behaviour
>     (NewReno / Reno CC) quite closely and for the most part, it has
>     elements that are potentially more aggressive than current stds
>     track TCP CC algos. If it was not evaluated for all parts as RFC
>     5033 requires, that is a mistake that has happened, and any
>     potentially incorrect behaviour that later gets observed must of
>     course be reconsidered and corrected.
>
>     > Some may say that is because these proposals are not big
>     > threats, but from my point of view, they are more aggressive than
>     > NewReno in some ways.
>     > I am not sure what the clear differences are between the CUBIC
>     > draft and them. I personally haven't seen very solid evidence that
>     > they are not unfair to the current standards.
>     > We may need to redefine or enhance the process in the future, but
>     > at this point, I personally don't have a strong reason to set a
>     > high bar only for this draft, because I believe all docs should be
>     > treated equally.
>
>     IMO, if the IETF made a mistake with one (or some) earlier published
>     RFCs, that must never be used as an excuse to repeat the same
>     mistake.
>
>     > Hence, describing the fact that the CUBIC draft hasn't passed
>     the RFC5033 process in the doc looks
>     > sufficient to me.
>
>     What would be the rational reason to hide the fact that the model
>     CUBIC
>     uses to determine alpha is incorrect?
>
>     Thanks,
>
>     /Markku
>
>     > Thanks,
>     > --
>     > Yoshi
>     >
>     >
>     > On Tue, Jun 14, 2022 at 8:02 AM Markku Kojo
>     <kojo@cs.helsinki.fi> wrote:
>     >       Hi Yoshi,
>     >
>     >       I moved your comment and the discussion on your reply
>     under this thread on
>     >       the Issue 1 (see below)
>     >
>     >       On Tue, 14 Jun 2022, Markku Kojo wrote:
>     >
>     >       > Hi all,
>     >       >
>     >       > this thread starts the discussion on the issue 1: the
>     incorrect model for
>     >       > determining CUBIC alpha for the congestion avoidance
>     (CA) phase (Issue 1 a)
>     >       > and the inadequate validation of a proper constant C for
>     the CUBIC window
>     >       > increase function (Issue 1 b).
>     >       >
>     >       >
>     >       > Issue 1 a)
>     >       > ----------
>     >       >
>     >       > The model that CUBIC uses to be fair to Reno CC (in
>     Reno-friendly region) is
>     >       > unvalidated and actually incorrect.
>     >       >
>     >       > A more detailed description of the issue:
>     >       >
>     >       > The original paper manuscript on which CUBIC bases its
>     >       > behaviour in the Reno-friendly region made a preliminary
>     >       > attempt to validate the model but failed (and the paper
>     >       > never got published). This is the only known attempt to
>     >       > validate the model, and even this failed validation
>     >       > attempt was quite light, consisting of only a couple of
>     >       > network settings, and obviously did not use any
>     >       > replications for the results shown in the paper. Hence,
>     >       > even the statistical validity of the results remains
>     >       > questionable. Results were shown only for a setting with
>     >       > AQM enabled at the bottleneck router. The results for a
>     >       > tail-drop case are missing from the paper manuscript.
>     >       >
>     >       > The report (creno.pdf, see a pointer to the doc in the
>     email pointed to
>     >       > below) that Bob wrote provides some explanation why the
>     model does not give
>     >       > correct results and thereby the resulting behaviour
>     presented in the original
>     >       > paper notably deviates from that of Reno CC. The email
>     that I wrote to the wg
>     >       > list
>     >       >
>     >       >
>     https://mailarchive.ietf.org/arch/msg/tcpm/bds-h_a6-NliTjx-ZqUSaFpSSnA/
>     >       >
>     >       > complements Bob's explanation for the AQM case and
>     corrects Bob's analysis
>     >       > for the tail-drop case, explaining why the model is
>     incorrect for the
>     >       > traditional and still today prevailing tail-drop router
>     case.
>     >       >
>     >       > Consequently, the use of the incorrect model results in
>     >       > unknown behaviour of CUBIC when in the Reno-friendly
>     >       > region. Moreover, it is quite likely that the behaviour is
>     >       > different with different AQM implementations at the
>     >       > bottleneck, resulting in even more random behaviour. This
>     >       > alone is very problematic and becomes more problematic
>     >       > when considering how moving out from the Reno-friendly
>     >       > region is specified: when the genuine CUBIC formula gives
>     >       > a larger cwnd than the cwnd that the Reno-friendly model
>     >       > gives, CUBIC moves to the genuine CUBIC mode that is
>     >       > significantly more aggressive than Reno CC.
>     >       >
>     >       > Therefore, if the incorrect model gives too low a cwnd for
>     >       > mimicked Reno CC, CUBIC moves too early to the genuine
>     >       > CUBIC mode and becomes too aggressive too early even
>     >       > though it should behave as aggressively as Reno CC. On the
>     >       > other hand, if the incorrect model gives too large a cwnd,
>     >       > CUBIC is too aggressive throughout the Reno-friendly
>     >       > region.
>     >       > In summary, if the model is not correct, it results in
>     >       > more aggressive behaviour than Reno CC no matter in which
>     >       > direction the model fails.
>     >       >
>     >       > And very importantly: some people have suggested that
>     >       > CUBIC should replace the current stds track CC algos and
>     >       > become the default. The behaviour of Reno CC is very
>     >       > thoroughly studied and very well understood. If we replace
>     >       > it with *unknown* behaviour, how can we any longer specify
>     >       > what is the correct and allowed aggressiveness for any
>     >       > upcoming CC when the behaviour of the new default itself
>     >       > is unknown, making comparative analysis of other CCs
>     >       > against CUBIC in the Reno-friendly region very difficult?
>     >       > The behaviour is assumed to be the same as Reno CC but the
>     >       > actual behaviour is random; it may be 2 times or 8 times
>     >       > more aggressive than Reno, for example.
>     >
>     >
>     >       On Tue, 7 Jun 2022, Yoshifumi Nishida wrote:
>     >
>     >       > Hi Markku,
>     >       >
>     >       > Thanks for the detailed feedback. This is very useful.
>     >       > One thing I would like to clarify is that we’ve already
>     >       > acknowledged the TCP friendly model in the draft has some
>     >       > unresolved discussions. But, I believe our current
>     >       > consensus is to not change the logic for it in the current
>     >       > draft as it will require long term evaluations.
>     >       >
>     >       > So, I would like to check if you’re suggesting we should
>     update the
>     >       > draft against it or you have some ideas to address these
>     issues in
>     >       > some ways (e.g adding more clarification in the draft,
>     mentioning it in
>     >       > the write-up, etc)
>     >       >
>     >       > Thanks,
>     >       > --
>     >       > Yoshi
>     >
>     >       I think the problem is even trickier because it is hard to
>     see how it
>     >       would be possible to correct the model that is based on wrong
>     >       assumptions. This said, it is important for the wg to
>     consider whether it
>     >       is ready to suggest publishing a congestion control
>     algorithm that is not
>     >       correct and has not been validated. And, if the answer is
>     yes, how to
>     >       justify it and what would be the appropriate status for
>     the RFC as well
>     >       as the way forward after publishing the draft.
>     >
>     >       I fully sympathize with those who have deployed CUBIC and
>     >       understand that there is pressure to publish the draft with
>     >       no modifications to what has been implemented. However, RFC
>     >       5033 was written specifically to avoid this kind of situation
>     >       where a CC algo has been (widely) deployed and only then
>     >       brought to IETF standardization. It is understandable that
>     >       those who have deployed the CC algo would be very reluctant
>     >       to modify the algo. On the other hand, AFAIK all current stds
>     >       track CC algos have had various issues that have been brought
>     >       up during the standardization process but these issues have
>     >       been resolved before publishing the draft. So, why should we
>     >       make an exception? IMO, wide deployment cannot be the answer
>     >       because it does not automatically reveal the negative impact
>     >       on other traffic; specific comparative measurements must be
>     >       carried out.
>     >       Also, why should the IETF set a precedent for any future
>     >       congestion control drafts, implying that it is ok to first
>     >       deploy a CC algo and then bring it to IETF and use the (wide)
>     >       deployment as an argument against modifying it regardless of
>     >       whatever issues it might have?
>     >
>     >       So, I don't have a good answer. IMO, if the draft is
>     published with
>     >       unresolved issues, the draft itself must clearly identify
>     and document
>     >       the issues and give some kind of justification and a clear
>     way forward.
>     >       That is, we must ensure there is an initiative set and
>     path to follow in
>     >       order to correct any shortcomings in a published RFC.
>     Otherwise, the
>     >       issues are very likely ignored and forgotten forever.
>     >
>     >       Thanks,
>     >
>     >       /Markku
>     >
>     >
>     >
>     >       > Issue 1 b)
>     >       > ----------
>     >       >
>     >       > Another issue related to operating in the Reno-friendly
>     >       > region is the question of when CUBIC should operate in the
>     >       > Reno-friendly region and when it may move out of it.
>     >       > Obviously CUBIC should stay in the Reno-friendly region
>     >       > when Reno CC would be able to fully utilize the available
>     >       > network capacity. In practice, this is specified by
>     >       > selecting the value for constant C in the formula that is
>     >       > used to determine cwnd in the "genuine" CUBIC mode.
>     >       > However, the selection of a proper value for C has not
>     >       > been properly validated in a wide range of environments as
>     >       > required in RFC 5033.
>     >       >
>     >       > Preliminary validation of constant C has been done for
>     the original CUBIC
>     >       > paper. That is good enough for a scientific paper but
>     not adequate for an
>     >       > IETF stds track algo. There seems to be no additional
>     evaluation since the
>     >       > timeframe of the CUBIC paper publication around 15 years
>     ago. Particularly,
>     >       > there seems to be no evaluation with AQM at the
>     bottleneck router or with a
>     >       > buffer-bloated bottleneck router, not to mention many
>     other network
>     >       > environments. Nor is there any data available for a
>     non-SACK TCP sender.
>     >       >
>     >       > The evaluation of 1 a) and 1 b) must be done separately.
>     >       > Otherwise, it is very hard to tell whether any deviations
>     >       > are due to the incorrect model or incorrect value of C.
>     >       > The original CUBIC paper and some other papers show that
>     >       > CUBIC is not fair to Reno CC in certain network conditions
>     >       > where Reno CC has no problems in utilizing the available
>     >       > network capacity; instead, CUBIC steals capacity from Reno
>     >       > CC.
>     >       >
>     >       > Thanks,
>     >       >
>     >       > /Markku
>     >       >
>     >
>     >
>     >
>
>
> _______________________________________________
> tcpm mailing list
> tcpm@ietf.org
> https://www.ietf.org/mailman/listinfo/tcpm
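
As a point of reference for the Issue 1 a) discussion quoted above, the rule for leaving the Reno-friendly region can be sketched as follows. This is a minimal sketch, not the draft's exact per-ACK procedure: it assumes the closed-form expressions from RFC 8312, W_cubic(t) = C*(t-K)^3 + W_max and W_est(t) = W_max*beta_cubic + alpha_cubic*(t/RTT), with C = 0.4 and beta_cubic = 0.7, windows in segments and times in seconds. The function names are hypothetical.

```python
# Sketch only: constants and closed forms are taken from RFC 8312, not from this thread.
C = 0.4                                # CUBIC scaling constant
BETA = 0.7                             # multiplicative decrease factor (beta_cubic)
ALPHA = 3 * (1 - BETA) / (1 + BETA)    # alpha_cubic, about 0.53


def w_cubic(t: float, w_max: float) -> float:
    """Window given by the genuine CUBIC function t seconds after a reduction."""
    k = (w_max * (1 - BETA) / C) ** (1 / 3)   # time at which the curve returns to w_max
    return C * (t - k) ** 3 + w_max


def w_est(t: float, w_max: float, rtt: float) -> float:
    """Window the model estimates a Reno flow would have reached by time t."""
    return w_max * BETA + ALPHA * (t / rtt)


def in_reno_friendly_region(t: float, w_max: float, rtt: float) -> bool:
    """True while the Reno estimate exceeds the CUBIC curve, so cwnd follows w_est."""
    return w_cubic(t, w_max) < w_est(t, w_max, rtt)


print(in_reno_friendly_region(1.0, w_max=100, rtt=0.01))   # True
print(in_reno_friendly_region(1.0, w_max=1000, rtt=0.1))   # False
```

With W_max = 100 segments and a 10 ms RTT the sender is still in the Reno-friendly region one second into the epoch; with W_max = 1000 segments and a 100 ms RTT it is not. If alpha_cubic makes W_est grow too slowly, the crossover to the CUBIC curve comes earlier, which is the first failure direction Markku describes.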

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/