Re: [tcpm] CUBIC rfc8312bis / WGLC Issue 1 (Was: Re: Proceeding CUBIC draft - thoughts and late follow-up)

Yoshifumi Nishida <nsd.ietf@gmail.com> Wed, 15 June 2022 09:57 UTC

From: Yoshifumi Nishida <nsd.ietf@gmail.com>
Date: Wed, 15 Jun 2022 02:57:18 -0700
Message-ID: <CAAK044QqfB1_gnDLNKNd15XskrC1FWhxfmytw8xvSu9uCHFRWQ@mail.gmail.com>
To: Markku Kojo <kojo@cs.helsinki.fi>
Cc: "tcpm@ietf.org Extensions" <tcpm@ietf.org>, Gorry Fairhurst <gorry@erg.abdn.ac.uk>, Lars Eggert <lars@eggert.org>, tcpm-chairs <tcpm-chairs@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/evOeooARwmPE93gvU_HePuVkGbE>
Subject: Re: [tcpm] CUBIC rfc8312bis / WGLC Issue 1 (Was: Re: Proceeding CUBIC draft - thoughts and late follow-up)

Hi Markku,

Thanks for the response. Yes, you have valid points, but I still have some
comments.

The first thing I would like to clarify is that we acknowledge that the
model used for CUBIC has not been validated, as you pointed out.
However, I believe that does not mean the model poses a significant threat
to the Internet. We have never seen such evidence even though CUBIC has
been widely deployed for a long time. Personally, I think we would need to
see tangible evidence of such a threat before discounting the fact that it
has been widely used.

The second thing I would like to mention is that I am not sure how many
drafts have gone through the RFC5033 process.
For example, RFC8995 and RFC9002 are congestion control related standards
docs, but in my understanding, the process was not applied to them.
Some may say that is because these proposals are not big threats, but from
my point of view, they are more aggressive than NewReno in some ways.
I am not sure what the clear differences are between the CUBIC draft and
them. I personally haven't seen very solid evidence that they are not
unfair to the current standards.
We may need to redefine or enhance the process in the future, but at this
point, I personally don't have a strong reason to set a high bar only for
this draft, because I believe all docs should be treated equally.
Hence, stating in the doc that the CUBIC draft hasn't passed the RFC5033
process looks sufficient to me.

Thanks,
--
Yoshi


On Tue, Jun 14, 2022 at 8:02 AM Markku Kojo <kojo@cs.helsinki.fi> wrote:

> Hi Yoshi,
>
> I moved your comment and the discussion on your reply under this thread on
> the Issue 1 (see below)
>
> On Tue, 14 Jun 2022, Markku Kojo wrote:
>
> > Hi all,
> >
> > this thread starts the discussion on the issue 1: the incorrect model
> > for determining CUBIC alpha for the congestion avoidance (CA) phase
> > (Issue 1 a) and the inadequate validation of a proper constant C for
> > the CUBIC window increase function (Issue 1 b).
> >
> >
> > Issue 1 a)
> > ----------
> >
> > The model that CUBIC uses to be fair to Reno CC (in Reno-friendly
> > region) is unvalidated and actually incorrect.
> >
> > A more detailed description of the issue:
> >
> > The original paper manuscript on which CUBIC bases its behaviour in the
> > Reno-friendly region made a preliminary attempt to validate the model
> > but failed (and the paper never got published). This is the only known
> > attempt to validate the model, and even this failed validation attempt
> > was quite light, consisting of only a couple of network settings and
> > obviously not using any replications for the results shown in the
> > paper. Hence, even the statistical validity of the results remains
> > questionable. Results were shown only for a setting with AQM enabled at
> > the bottleneck router. The results for a tail-drop case are missing in
> > the paper manuscript.
> >
> > The report (creno.pdf, see a pointer to the doc in the email pointed to
> > below) that Bob wrote provides some explanation of why the model does
> > not give correct results and why the resulting behaviour presented in
> > the original paper therefore deviates notably from that of Reno CC. The
> > email that I wrote to the wg list
> >
> > https://mailarchive.ietf.org/arch/msg/tcpm/bds-h_a6-NliTjx-ZqUSaFpSSnA/
> >
> > complements Bob's explanation for the AQM case and corrects Bob's
> > analysis for the tail-drop case, explaining why the model is incorrect
> > for the traditional and still today prevailing tail-drop router case.
> >
> > Consequently, the use of the incorrect model results in unknown
> > behaviour of CUBIC when in the Reno-friendly region. Moreover, it is
> > quite likely that the behaviour is different with different AQM
> > implementations at the bottleneck, resulting in even more random
> > behaviour. This alone is very problematic and becomes more problematic
> > when considering how moving out from the Reno-friendly region is
> > specified: when the genuine CUBIC formula gives a larger cwnd than the
> > cwnd that the Reno-friendly model gives, CUBIC moves to the genuine
> > CUBIC mode that is significantly more aggressive than Reno CC.
> >
> > Therefore, if the incorrect model gives too low a cwnd for mimicked
> > Reno CC, CUBIC moves too early to the genuine CUBIC mode and becomes
> > too aggressive too early even though it should behave as aggressively
> > as Reno CC. On the other hand, if the incorrect model gives too large a
> > cwnd, CUBIC is too aggressive throughout the Reno-friendly region.
> > In summary, if the model is not correct, it results in more aggressive
> > behaviour than Reno CC no matter which direction the model fails.
> >
> > And very importantly: some people have suggested that CUBIC should
> > replace the current stds track CC algos and become the default. The
> > behaviour of Reno CC is very thoroughly studied and very well
> > understood. If we replace it with *unknown* behaviour, how can we any
> > longer specify what is the correct and allowed aggressiveness for any
> > upcoming CC when the behaviour of the new default itself is unknown,
> > making comparative analysis of other CCs against CUBIC in the
> > Reno-friendly region very difficult? The behaviour is assumed to be the
> > same as Reno CC but the actual behaviour is random; it may be 2 times
> > or 8 times more aggressive than Reno, for example.
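
As a rough illustration of the switching rule described in Issue 1 a)
above, the following Python sketch uses the window formulas as given in
RFC 8312 (W_cubic, W_est, C = 0.4, beta_cubic = 0.7); the rfc8312bis text
may define them differently. The function names and the example values
(W_max of 100 segments, RTTs of 100 ms and 10 ms) are assumptions made
only for this illustration, not anything taken from the draft or from an
implementation.

```python
# Sketch of the RFC 8312 window formulas; NOT the rfc8312bis text.
# All function names and example values are illustrative assumptions.
C = 0.4            # CUBIC scaling constant (RFC 8312 default)
BETA_CUBIC = 0.7   # multiplicative decrease factor (RFC 8312 default)


def w_cubic(t, w_max, c=C, beta=BETA_CUBIC):
    """Genuine CUBIC window (segments) t seconds after a congestion event."""
    k = (w_max * (1.0 - beta) / c) ** (1.0 / 3.0)  # time to return to w_max
    return c * (t - k) ** 3 + w_max


def w_est(t, w_max, rtt, beta=BETA_CUBIC):
    """Window the model assumes Reno CC would have reached after t seconds."""
    alpha = 3.0 * (1.0 - beta) / (1.0 + beta)
    return w_max * beta + alpha * (t / rtt)


def cubic_cwnd(t, w_max, rtt):
    """Return (cwnd, region): W_est is used while W_cubic(t) < W_est(t)."""
    wc = w_cubic(t, w_max)
    we = w_est(t, w_max, rtt)
    if wc < we:
        return we, "reno-friendly"
    return wc, "cubic"
```

With W_max = 100 segments, one second after a congestion event this
sketch selects the "cubic" region for a 100 ms RTT and the
"reno-friendly" region for a 10 ms RTT. Any error in W_est shifts that
boundary, which is the concern raised in the quoted text.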
>
>
> On Tue, 7 Jun 2022, Yoshifumi Nishida wrote:
>
> > Hi Markku,
> >
> > Thanks for the detailed feedback. This is very useful.
> > One thing I would like to clarify is that we’ve already acknowledged the
> > TCP friendly model in the draft has some unsolved discussions. But, I
> > believe our current consensus is to not change the logic for it in
> > the current draft as it will require long term evaluations.
> >
> > So, I would like to check if you’re suggesting we should update the
> > draft against it or you have some ideas to address these issues in
> > some ways (e.g adding more clarification in the draft, mentioning it in
> > the write-up, etc)
> >
> > Thanks,
> > --
> > Yoshi
>
> I think the problem is even trickier because it is hard to see how it
> would be possible to correct the model that is based on wrong
> assumptions. This said, it is important for the wg to consider whether it
> is ready to suggest publishing a congestion control algorithm that is not
> correct and has not been validated. And, if the answer is yes, how to
> justify it and what would be the appropriate status for the RFC as well
> as the way forward after publishing the draft.
>
> I fully sympathize with those who have deployed CUBIC and understand that
> there is a pressure to publish the draft with no modifications to what has
> been implemented. However, RFC 5033 was written specifically to avoid this
> kind of situation where a CC algo has been (widely) deployed and only
> then brought to IETF standardization. It is understandable that those who
> have deployed the CC algo would be very reluctant to modify the algo. On
> the other hand, AFAIK all current stds track CC algos have had various
> issues that have been brought up during the standardization process but
> these issues have been resolved before publishing the draft. So, why
> should we make an exception? IMO, wide deployment cannot be the answer
> because it does not automatically reveal the negative impact to other
> traffic but specific comparative measurements must be carried out.
> Also, why should the IETF set a precedent for any future congestion
> control drafts, implying that it is ok to first deploy a CC algo and then
> bring it to IETF and use the (wide) deployment as an argument against
> modifying it regardless of whatever issues it might have?
>
> So, I don't have a good answer. IMO, if the draft is published with
> unresolved issues, the draft itself must clearly identify and document
> the issues and give some kind of justification and a clear way forward.
> That is, we must ensure there is an initiative set and path to follow in
> order to correct any shortcomings in a published RFC. Otherwise, the
> issues are very likely ignored and forgotten forever.
>
> Thanks,
>
> /Markku
>
>
>
> > Issue 1 b)
> > ----------
> >
> > Another issue related to operating in the Reno-friendly region is the
> > question of when CUBIC should operate in the Reno-friendly region and
> > when it may move out of it. Obviously CUBIC should stay in the
> > Reno-friendly region when Reno CC would be able to fully utilize the
> > available network capacity. In practice, this is specified by selecting
> > the value for constant C in the formula that is used to determine cwnd
> > in the "genuine" CUBIC mode. However, selecting a proper value for C
> > has not been properly validated in a wide range of environments as
> > required in RFC 5033.
> >
> > Preliminary validation of constant C has been done for the original
> > CUBIC paper. That is good enough for a scientific paper but not
> > adequate for an IETF stds track algo. There seems to be no additional
> > evaluation since the timeframe of the CUBIC paper publication around 15
> > years ago. Particularly, there seems to be no evaluation with AQM at
> > the bottleneck router or with a buffer-bloated bottleneck router, not
> > to mention many other network environments. Nor is there any data
> > available for a non-SACK TCP sender.
> >
> > The evaluation of 1 a) and 1 b) must be done separately. Otherwise, it
> > is very hard to tell whether any deviations are due to the incorrect
> > model or an incorrect value of C. The original CUBIC paper and some
> > other papers show that CUBIC is not fair to Reno CC in certain network
> > conditions where Reno CC has no problems in utilizing the available
> > network capacity; instead, CUBIC steals capacity from Reno CC.
> >
> > Thanks,
> >
> > /Markku
> >
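
To make the role of constant C in Issue 1 b) a bit more concrete, here is
a short derivation from the window formulas as given in RFC 8312 (the
rfc8312bis text may define them differently). The symbol D is shorthand
introduced here only for this derivation.

```latex
\begin{align*}
W_{\mathrm{cubic}}(t) &= C\,(t-K)^3 + W_{\max}, \qquad
K = \sqrt[3]{\frac{W_{\max}\,(1-\beta_{\mathrm{cubic}})}{C}} \\
\text{Let } D &= W_{\max}\,(1-\beta_{\mathrm{cubic}}),
  \text{ so that } C\,K^3 = D \text{ and } C^{1/3}K = D^{1/3}. \\
W_{\mathrm{cubic}}(t) &= \bigl(C^{1/3}\,t - C^{1/3}K\bigr)^3 + W_{\max}
  = \bigl(C^{1/3}\,t - D^{1/3}\bigr)^3 + W_{\max}
\end{align*}
```

Because the cube is increasing in its argument, for any fixed t > 0 a
larger C gives a larger W_cubic(t), while W_est(t) in RFC 8312 does not
depend on C. Under these formulas, raising C can therefore only shrink the
set of times at which W_cubic(t) < W_est(t), that is, the Reno-friendly
region. The derivation says nothing about which value of C is appropriate;
that remains the validation question raised in the quoted text.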