Re: [tcpm] Proceeding CUBIC draft - thoughts and late follow-up

Randall Stewart <rrs@netflix.com> Thu, 23 June 2022 12:52 UTC

From: Randall Stewart <rrs@netflix.com>
Message-Id: <10E105F3-9DD7-431C-BEE7-4E5193498FE3@netflix.com>
Date: Thu, 23 Jun 2022 08:52:13 -0400
In-Reply-To: <202206212254.25LMs4h5055045@gndrsh.dnsmgr.net>
Cc: Martin Duke <martin.h.duke@gmail.com>, Vidhi Goel <vidhi_goel=40apple.com@dmarc.ietf.org>, Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org>, "tcpm@ietf.org Extensions" <tcpm@ietf.org>, tcpm-chairs <tcpm-chairs@ietf.org>
To: "Rodney W. Grimes" <ietf@gndrsh.dnsmgr.net>
References: <202206212254.25LMs4h5055045@gndrsh.dnsmgr.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/4hB13IT-XKt4VlPcx1aIr6_vR8k>
Subject: Re: [tcpm] Proceeding CUBIC draft - thoughts and late follow-up

Rodney:

I wanted to weigh in just a small bit on this since you mentioned Netflix :)

Netflix currently utilizes NewReno's linear congestion window increase function for our content delivery TCP connections, 
rather than the RFC8312bis "cubic" congestion window increase function. Most of our performance-oriented efforts have focused 
on improved loss detection and recovery (RACK) and judicious application of TCP pacing. CUBIC has indeed been widely deployed 
for a long time. However, given that our TCP connections are frequently over short or modest paths (rather than the long paths 
that motivated CUBIC's early development), we're unaware of CUBIC-based competing connections being a problem for us.
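The contrast between NewReno's linear increase and CUBIC's cubic increase can be sketched numerically. The Python below is an illustration under stated assumptions, not Netflix's or any stack's code. It uses the window function and constants from the CUBIC specification (W_cubic(t) = C*(t - K)^3 + W_max with C = 0.4 and beta_cubic = 0.7, plus the Reno-friendly estimate W_est growing by alpha = 3*(1 - beta_cubic)/(1 + beta_cubic) segments per RTT). It also assumes a hypothetical 100 ms RTT and a hypothetical pre-loss window of 100 segments, and it ignores slow start, ACK clocking, pacing and loss recovery.

```python
"""Congestion-avoidance window growth after a single loss: NewReno vs. CUBIC.

Illustrative sketch only. Windows are in segments, time in seconds.
C and BETA_CUBIC follow the CUBIC specification (RFC 8312 / rfc8312bis);
RTT and the pre-loss window used below are hypothetical values.
"""

C = 0.4           # CUBIC scaling constant
BETA_CUBIC = 0.7  # CUBIC multiplicative decrease factor
BETA_RENO = 0.5   # NewReno halves the window on loss
RTT = 0.1         # hypothetical round-trip time in seconds


def reno_cwnd(w_max: float, rtts: int) -> float:
    """NewReno: halve on loss, then add one segment per RTT (linear)."""
    return w_max * BETA_RENO + rtts


def cubic_cwnd(w_max: float, t: float) -> float:
    """CUBIC window W_cubic(t) = C*(t - K)^3 + W_max, t seconds after the loss.

    K is the time at which the curve climbs back to W_max, starting from
    BETA_CUBIC * W_max at t = 0.
    """
    k = (w_max * (1 - BETA_CUBIC) / C) ** (1 / 3)
    return C * (t - k) ** 3 + w_max


def reno_friendly_cwnd(w_max: float, rtts: int) -> float:
    """Simplified W_est: an AIMD estimate growing by alpha segments per RTT,
    with alpha chosen so that, in a simple model, AIMD(alpha, BETA_CUBIC)
    has the same average window as Reno's AIMD(1, 0.5)."""
    alpha = 3 * (1 - BETA_CUBIC) / (1 + BETA_CUBIC)
    return w_max * BETA_CUBIC + alpha * rtts


def cubic_effective_cwnd(w_max: float, rtts: int) -> float:
    """CUBIC uses the larger of W_cubic and W_est; when W_est is larger,
    CUBIC is in its TCP-friendly region. Simplified to per-RTT steps."""
    return max(cubic_cwnd(w_max, rtts * RTT), reno_friendly_cwnd(w_max, rtts))


if __name__ == "__main__":
    w_max = 100.0  # hypothetical window just before the loss, in segments
    for n in (0, 10, 20, 40, 60, 80):
        print(f"RTT {n:2d}: NewReno {reno_cwnd(w_max, n):6.1f}  "
              f"CUBIC {cubic_effective_cwnd(w_max, n):6.1f}")
```

Run as a script, it prints both windows every few RTTs. CUBIC restarts at 70 segments and approaches the 100-segment W_max after K of roughly 4.2 s (about 42 RTTs at the assumed 100 ms), while NewReno restarts at 50 and gains one segment per RTT. With a shorter RTT, NewReno and W_est take more per-RTT steps per second while the cubic term depends only on elapsed time, so CUBIC spends more of its time in the TCP-friendly region; this relates to the short-path observation above.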

Best wishes

R

> On Jun 21, 2022, at 6:54 PM, Rodney W. Grimes <ietf@gndrsh.dnsmgr.net> wrote:
> 
>> (with no hats)
> 
> [rwg] I was going to stay quiet on this, but one inline comment below.
> 
> 
>> 
>> Markku,
>> 
>> I think it's important to distinguish between "aggressive" algorithms that
>> are aggressive and reach a superior equilibrium for everyone using that
>> algorithm, and aggressive algorithms that don't scale if everyone is using
>> them.
>> 
>> There's one scenario (A) that I think everyone would agree was acceptable:
>> 1) Early adopters deploy a new algorithm
>> 2) The old algorithm is not affected at all
>> 3) As users migrate from old to new, the network converges on a
>> higher-utilization equilibrium
>> 
>> Similarly, we would all agree that Scenario (B) is unacceptable:
>> 1) Deploy new algorithm
>> 2) The old algorithm is starved and unusable
>> 3) As users migrate from old to new, the network converges on a
>> higher-utilization equilibrium
>> 
>> There's a middle ground (C) where the old algorithm suffers degraded
>> performance, but not fatally. Reasonable people can disagree on where the
>> exact threshold lies, and the argument has several dimensions. It's an
>> eternal human argument about how much damage is acceptable in making
>> technical progress that we won't settle here.
>> 
>> In the case of Cubic, it is *extremely widely* deployed. Whether or not
>> doing damage to Reno connections was justified, we have already sped
>> through (2) and have landed on (3). Cubic is the default and users
> 
> [rwg]
> Default where?  As far as I know, FreeBSD, and I believe the other BSDs,
> use newreno as the default:
> 
> 	net.inet.tcp.cc.algorithm: newreno
> 
> And from the mod_cc(4) manual page of FreeBSD 12.x:
>     The default algorithm is NewReno, and all connections use the default
>     unless explicitly overridden using the TCP_CONGESTION socket option (see
>     tcp(4) for details).  The default can be changed using a sysctl(3) MIB
>     variable detailed in the MIB Variables section below.
> 
> I doubt much userland code sets the TCP_CONGESTION socket option.
> 
> And... I do not know whether Netflix, IIRC the source of approximately 1/3
> of USA downstream network traffic, has tweaked things to use cc_cubic,
> but it might be worth asking.  Most of their interesting work is in the
> use of RACK, and IIRC again that is neither newreno- nor cubic-based.
> 
> Regards,
> Rod Grimes
> 
>> generally have to seek out Reno to use it. So what is to be gained by
>> continuing to defend an inferior equilibrium against a superior one that
>> has already won in the market?
>> 
>> As for RFC 9002: this was an expedient choice; QUICWG needed a standard
>> congestion control, was not chartered to create a new one, and there was
>> only one on the shelf to choose from. If Cubic had been standards-track,
>> the WG may very well have chosen that one. In the real world the most
>> important production QUIC implementations are not using Reno.
>> 
>> On Mon, Jun 20, 2022 at 6:08 PM Vidhi Goel <vidhi_goel=40apple.com@dmarc.ietf.org> wrote:
>> 
>>> If we are talking about RFC 9002 NewReno implementations, then that
>>> already modifies RFC 5681 and doesn't comply with RFC 5033. Since it makes a
>>> major change from 5681 for any congestion event, I wouldn't call it closely
>>> following NewReno. Also, in another email, you said that you didn't follow
>>> the discussions in the QUIC WG on RFC 9002, so how do you know whether QUIC
>>> implementations are using NewReno or CUBIC congestion control?
>>> It would be good to stay consistent in our replies: if you agree RFC 9002
>>> is already non-compliant with RFC 5033, then why use it as a reference to
>>> cite Reno implementations?
>>> 
>>> Vidhi
>>> 
>>>> On Jun 20, 2022, at 5:06 PM, Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org> wrote:
>>>> Hi Lars,
>>>> 
>>>> On Sun, 19 Jun 2022, Lars Eggert wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Sorry for misunderstanding/misrepresenting your issues.
>>>>> 
>>>>>> On Jun 6, 2022, at 13:29, Markku Kojo <kojo@cs.helsinki.fi> wrote:
>>>>>> These issues are significant and some number of people have also said
>>>>>> they should not be left unaddressed. Almost all of them are related to
>>>>>> the behaviour of CUBIC in the TCP-friendly region where it is intended
>>>>>> and required to fairly compete with the current stds track congestion
>>>>>> control mechanisms. Evaluating whether CUBIC competes fairly
>>>>>> *cannot* be done without measuring the impact of CUBIC on the
>>>>>> other traffic competing with it over a shared bottleneck link. That
>>>>>> does not happen through deployment alone; it requires specifically
>>>>>> planned measurements.
>>>>> 
>>>>> So whether CUBIC competes fairly with Reno in certain regions is a
>>>>> completely academic question in 2022. There is almost no Reno traffic
>>>>> anymore on the Internet or in data centers.
>>>> 
>>>> To my understanding, we have quite a bit of QUIC traffic, for which
>>>> RFC 9002 has just been published, and it follows Reno CC quite closely
>>>> with some exceptions. We also have some SCTP traffic that follows Reno
>>>> CC very closely, and numerous proprietary UDP-based protocols that
>>>> RFC 8085 requires to follow the congestion control algos as described
>>>> in RFC 2914 and RFC 5681. So, are you saying RFC 2914, RFC 8085 and
>>>> RFC 9002 are just academic exercises?
>>>> 
>>>> Moreover, my answer to why we see so little Reno CC traffic is very
>>>> simple: people deployed CUBIC, which is more aggressive than Reno CC, so
>>>> it is an inherent outcome that hardly anyone is willing to run Reno CC
>>>> when others are running a more aggressive CC algo that leaves little
>>>> room for competing Reno CC.
>>>> 
>>>>> I agree that in an ideal world, the ubiquitous deployment of CUBIC
>>>>> should have been accompanied by A/B testing, including an investigation
>>>>> into impact on competing non-CUBIC traffic.
>>>>> 
>>>>> But that didn't happen, and we find ourselves in the situation we're
>>>>> in. What is gained by not recognizing CUBIC as a standard?
>>>> 
>>>> First, publishing the CUBIC draft as it currently is would give an IETF
>>>> stamp and an 'official' start to "a spiral of increasingly aggressive
>>>> TCP implementations" that RFC 2914 appropriately warns about. From the
>>>> little of the L4S discussions in tsvwg that I had time to follow, people
>>>> were already insisting on comparing L4S performance to CUBIC instead of
>>>> Reno CC. The fact is that we don't know how much more aggressive CUBIC
>>>> is than Reno CC in its TCP-friendly region. However, if I recall
>>>> correctly, it was considered OK that L4S is somewhat more aggressive
>>>> than CUBIC. So, the spiral has already started within the IETF as well
>>>> as in the wild (Internet).
>>>> 
>>>> Second, recognizing CUBIC as a standard as it is currently written
>>>> would ensure that all the issues that have been raised get ignored and
>>>> forgotten forever.
>>>> 
>>>> Third, you did not indicate which issue you are referring to. Some of
>>>> the issues have nothing to do with fair competition against Reno CC in
>>>> certain regions. E.g., issue 2 also causes self-inflicted problems for
>>>> the flow itself, as Neal indicated based on some traces he had seen. And
>>>> there is a simple, effective and safe fix for it, as I have proposed.
>>>> 
>>>> As I have tried to say, I do not care too much what the status of CUBIC
>>>> will be when it gets published, as long as we do not hide the obvious
>>>> issues it has and we have a clear plan to ensure that all issues that
>>>> have not been resolved by the time of publication have a clear path and
>>>> incentive to get fixed. IMO that can best be achieved by publishing it
>>>> as Experimental and documenting all unresolved issues in the draft.
>>>> That approach would give all proponents the incentive to do whatever is
>>>> needed (measurements, algo fixes/tuning) to solve the remaining issues
>>>> and get it to stds track.
>>>> 
>>>> But let me ask a different question: what is gained, and how does the
>>>> community benefit from a std that is based on a flawed design that does
>>>> not behave as intended?
>>>> 
>>>> Congestion control specifications are considered to have significant
>>>> operational impact on the Internet, similar to security mechanisms.
>>>> Would you in the IESG support publication of a security mechanism that
>>>> is shown not to operate as intended?
>>>> 
>>>> Could we now finally focus on solving each of the remaining issues and
>>>> discuss the way forward separately for each of them? Issue 3 a) has
>>>> pretty much been solved already (thanks Neal); some text tweaking may
>>>> still be needed.
>>>> 
>>>> Thanks,
>>>> 
>>>> /Markku
>>>> 
>>>>> Thanks,
>>>>> Lars
>>>>> 
>>>>> --
>>>>> Sent from a mobile device; please excuse typos.
>>>> _______________________________________________
>>>> tcpm mailing list
>>>> tcpm@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/tcpm
>>> 
> 
> -- 
> Rod Grimes                                                 rgrimes@freebsd.org
> 

------
Randall Stewart
rrs@netflix.com