Re: [tcpm] Proceeding CUBIC draft - thoughts and late follow-up

"Rodney W. Grimes" <ietf@gndrsh.dnsmgr.net> Tue, 21 June 2022 22:54 UTC

From: "Rodney W. Grimes" <ietf@gndrsh.dnsmgr.net>
Message-Id: <202206212254.25LMs4h5055045@gndrsh.dnsmgr.net>
In-Reply-To: <CAM4esxThEBcYZPdZeFbAHzweH3aFDTGnpr9BEOGdmvR3F3MSrg@mail.gmail.com>
To: Martin Duke <martin.h.duke@gmail.com>
Date: Tue, 21 Jun 2022 15:54:04 -0700
CC: Vidhi Goel <vidhi_goel=40apple.com@dmarc.ietf.org>, Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org>, "tcpm@ietf.org Extensions" <tcpm@ietf.org>, tcpm-chairs <tcpm-chairs@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/0ubS6wuCt6DFhmqDq2Rm1AjzQw4>
Subject: Re: [tcpm] Proceeding CUBIC draft - thoughts and late follow-up

> (with no hats)

 [rwg] I was going to stay quiet on this, but one inline comment below.


> 
> Markku,
> 
> I think it's important to distinguish between "aggressive" algorithms that
> are aggressive and reach a superior equilibrium for everyone using that
> algorithm, and aggressive algorithms that don't scale if everyone is using
> them.
> 
> There's one scenario (A) that I think everyone would agree is acceptable:
> 1) Early adopters deploy a new algorithm
> 2) The old algorithm is not affected at all
> 3) As users migrate from old to new, the network converges on a
> higher-utilization equilibrium
> 
> Similarly, we would all agree that Scenario (B) is unacceptable:
> 1) Deploy new algorithm
> 2) The old algorithm is starved and unusable
> 3) As users migrate from old to new, the network converges on a
> higher-utilization equilibrium
> 
> There's a middle ground (C) where the old algorithm suffers degraded
> performance, but not fatally. Reasonable people can disagree on where the
> exact threshold lies, and the argument has several dimensions. It's the
> eternal human argument about how much damage is acceptable in the name of
> technical progress, and we won't settle it here.
> 
> In the case of Cubic, it is *extremely widely* deployed. Whether or not
> doing damage to Reno connections was justified, we have already sped
> through (2) and have landed on (3). Cubic is the default and users

 [rwg]
Default where?  As far as I know FreeBSD, and I believe the other BSDs,
use newreno as the default:

	net.inet.tcp.cc.algorithm: newreno

And from the mod_cc(4) manual page of FreeBSD 12.x:
     The default algorithm is NewReno, and all connections use the default
     unless explicitly overridden using the TCP_CONGESTION socket option (see
     tcp(4) for details).  The default can be changed using a sysctl(3) MIB
     variable detailed in the MIB Variables section below.
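
For example, on a box with cc_cubic(4) loaded, checking and changing the
default would look something like this (from memory, so take the exact
output with a grain of salt):

	# sysctl net.inet.tcp.cc.available
	net.inet.tcp.cc.available: newreno, cubic
	# sysctl net.inet.tcp.cc.algorithm=cubic
	net.inet.tcp.cc.algorithm: newreno -> cubic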

I doubt there is much userland code setting the TCP_CONGESTION
socket option.
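
For reference, a per-connection override would be roughly the following
(an untested sketch, error handling trimmed; the same option name also
works on Linux):

	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <netinet/tcp.h>
	#include <stdio.h>
	#include <string.h>

	/*
	 * Ask for CUBIC on this one socket; everything else on the
	 * host keeps using the net.inet.tcp.cc.algorithm default.
	 */
	int
	set_cc(int sock, const char *algo)
	{
		if (setsockopt(sock, IPPROTO_TCP, TCP_CONGESTION,
		    algo, strlen(algo)) == -1) {
			perror("setsockopt(TCP_CONGESTION)");
			return (-1);
		}
		return (0);
	}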

And... I do not know whether Netflix, IIRC the source of approximately 1/3
of USA downstream network traffic, has tweaked things to use cc_cubic,
but it might be worth an ask.  Most of their interesting stuff is in the
use of RACK, and IIRC that is a separate TCP stack selection, orthogonal
to whether newreno or cubic is the cc module underneath.
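
(If memory serves, the RACK stack is picked per socket with the
TCP_FUNCTION_BLK option, independent of the cc module, roughly:

	/* same includes as above; FreeBSD-only */
	struct tcp_function_set tfs;

	memset(&tfs, 0, sizeof(tfs));
	strlcpy(tfs.function_set_name, "rack",
	    sizeof(tfs.function_set_name));
	if (setsockopt(sock, IPPROTO_TCP, TCP_FUNCTION_BLK,
	    &tfs, sizeof(tfs)) == -1)
		perror("setsockopt(TCP_FUNCTION_BLK)");

or host-wide via the net.inet.tcp.functions_default sysctl.)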

Regards,
Rod Grimes

> generally have to seek out Reno to use it. So what is to be gained by
> continuing to defend an inferior equilibrium against a superior one that
> has already won in the market?
> 
> As for RFC 9002: this was an expedient choice; the QUIC WG needed a
> standard congestion control, was not chartered to create a new one, and
> there was only one on the shelf to choose from. If Cubic had been
> standards-track, the WG might very well have chosen it. In the real world,
> the most important production QUIC implementations are not using Reno.
> 
> On Mon, Jun 20, 2022 at 6:08 PM Vidhi Goel <vidhi_goel=
> 40apple.com@dmarc.ietf.org> wrote:
> 
> > If we are talking about RFC 9002 New Reno implementations, then that
> > already modifies RFC 5681 and doesn't comply with RFC 5033. Since it has a
> > major change from 5681 for any congestion event, I wouldn't call it closely
> > following New Reno. Also, in another email, you said that you didn't follow
> > the discussions in the QUIC WG for RFC 9002, so how do you know whether
> > QUIC implementations are using New Reno or CUBIC congestion control?
> > It would be good to stay consistent in our replies: if you agree RFC 9002
> > is already non-compliant with RFC 5033, then why use it as a reference to
> > cite Reno implementations?
> >
> > Vidhi
> >
> > > On Jun 20, 2022, at 5:06 PM, Markku Kojo <kojo=
> > 40cs.helsinki.fi@dmarc.ietf.org> wrote:
> > > Hi Lars,
> > >
> > > On Sun, 19 Jun 2022, Lars Eggert wrote:
> > >
> > >> Hi,
> > >>
> > >> sorry for misunderstanding/misrepresenting your issues.
> > >>
> > >>> On Jun 6, 2022, at 13:29, Markku Kojo <kojo@cs.helsinki.fi> wrote:
> > >>> These issues are significant, and a number of people have also said
> > >>> they should not be left unaddressed. Almost all of them are related to
> > >>> the behaviour of CUBIC in the TCP-friendly region, where it is intended
> > >>> and required to compete fairly with the current stds-track congestion
> > >>> control mechanisms. Evaluating whether CUBIC competes fairly *cannot*
> > >>> be done without measuring the impact of CUBIC on the other traffic
> > >>> competing with it over a shared bottleneck link. This does not happen
> > >>> simply by deploying; it requires specifically planned measurements.
> > >>
> > >> So whether CUBIC competes fairly with Reno in certain regions is a
> > >> completely academic question in 2022. There is almost no Reno traffic
> > >> anymore on the Internet or in data centers.
> > >
> > > To my understanding we have quite a bit of QUIC traffic, for which RFC
> > > 9002 has just been published, and it follows Reno CC quite closely with
> > > some exceptions. We also have some SCTP traffic that follows Reno CC
> > > very closely, and numerous proprietary UDP-based protocols that RFC 8085
> > > requires to follow the congestion control algorithms described in RFC
> > > 2914 and RFC 5681. So, are you saying RFC 2914, RFC 8085 and RFC 9002
> > > are just academic exercises?
> > >
> > > Moreover, my answer to why we see so little Reno CC traffic is very
> > > simple: people deployed CUBIC, which is more aggressive than Reno CC,
> > > so it is an inherent outcome that hardly anyone is willing to run Reno
> > > CC when others are running a more aggressive CC algo that leaves little
> > > room for competing Reno CC traffic.
> > >
> > >> I agree that in an ideal world, the ubiquitous deployment of CUBIC
> > >> should have been accompanied by A/B testing, including an investigation
> > >> into the impact on competing non-CUBIC traffic.
> > >>
> > >> But that didn't happen, and we find ourselves in the situation we're
> > >> in. What is gained by not recognizing CUBIC as a standard?
> > >
> > > First, if the CUBIC draft is published as it currently is, that would
> > > give an IETF stamp and 'official' start to "a spiral of increasingly
> > > aggressive TCP implementations" that RFC 2914 appropriately warns about.
> > > In the little time I had to follow the L4S discussions in tsvwg, people
> > > already insisted on comparing L4S performance to CUBIC instead of Reno
> > > CC. The fact is that we don't know how much more aggressive CUBIC is
> > > than Reno CC in its TCP-friendly region. However, if I recall correctly,
> > > it was considered OK that L4S is somewhat more aggressive than CUBIC.
> > > So, the spiral has already started within the IETF as well as in the
> > > wild (the Internet).
> > >
> > > Second, recognizing CUBIC as a standard as it is currently written
> > > would ensure that all the issues that have been raised get ignored and
> > > forgotten forever.
> > >
> > > Third, you did not indicate which issues you are referring to. Some of
> > > the issues have nothing to do with fair competition against Reno CC in
> > > certain regions. E.g., issue 2 also causes self-inflicted problems for
> > > the flow itself, as Neal indicated based on some traces he had seen. And
> > > there is a simple, effective and safe fix for it, as I have proposed.
> > >
> > > As I have tried to say, I do not care too much what the status of CUBIC
> > > will be when it gets published, as long as we do not hide the obvious
> > > issues it has and we have a clear plan to ensure that all issues not
> > > resolved by the time of publication will have a clear path and incentive
> > > to get fixed. IMO that can best be achieved by publishing it as
> > > Experimental and documenting all unresolved issues in the draft. That
> > > approach would give all proponents the incentive to do whatever is
> > > needed (measurements, algo fixes/tuning) to solve the remaining issues
> > > and get it to stds track.
> > >
> > > But let me ask a different question: what is gained, and how does the
> > > community benefit, from a std that is based on a flawed design and does
> > > not behave as intended?
> > >
> > > Congestion control specifications are considered to have significant
> > > operational impact on the Internet, similar to security mechanisms.
> > > Would you, in the IESG, support publication of a security mechanism
> > > that is shown not to operate as intended?
> > >
> > > Could we now finally focus on solving each of the remaining issues and
> > > discuss the way forward for each of them separately? Issue 3 a) has
> > > pretty much been solved already (thanks, Neal); some text tweaking may
> > > still be needed.
> > >
> > > Thanks,
> > >
> > > /Markku
> > >
> > >> Thanks,
> > >> Lars
> > >>
> > >> --
> > >> Sent from a mobile device; please excuse typos.

-- 
Rod Grimes                                                 rgrimes@freebsd.org