Re: [tcpm] Proceeding CUBIC draft - thoughts and late follow-up

Randall Stewart <rrs@netflix.com> Thu, 23 June 2022 16:00 UTC

References: <10E105F3-9DD7-431C-BEE7-4E5193498FE3@netflix.com> <202206231510.25NFAUXD061516@gndrsh.dnsmgr.net>
In-Reply-To: <202206231510.25NFAUXD061516@gndrsh.dnsmgr.net>
From: Randall Stewart <rrs@netflix.com>
Date: Thu, 23 Jun 2022 12:00:26 -0400
Message-ID: <CALV9me2KWy9U3U5nUbkUb2Z508C08d+zemRrcvGxMv8qGwGf=g@mail.gmail.com>
To: "Rodney W. Grimes" <ietf@gndrsh.dnsmgr.net>
Cc: Martin Duke <martin.h.duke@gmail.com>, Vidhi Goel <vidhi_goel=40apple.com@dmarc.ietf.org>, Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org>, "tcpm@ietf.org Extensions" <tcpm@ietf.org>, tcpm-chairs <tcpm-chairs@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/w_S9n2HtBT2FQDJ0UpbcJ_uAf6c>
Subject: Re: [tcpm] Proceeding CUBIC draft - thoughts and late follow-up

Rodney:

We have always wanted to do coexistence testing but have so far not come
up with a workable plan for it. We can "know" that other traffic is out
there. For example, I was recently examining one of our BlackBox logs and
observed a connection that had a steady RTT of about 30ms while pacing out
packets; then, for about 1-2 RTTs, the RTT jumped to 200ms. This was not
caused by our paced flow, which was not putting any more data into the
network (it actually slowed its sending), but by competition. After about
2 RTTs the competing flow got all it wanted and disappeared, and we went
back to pacing out packets against a roughly 30ms RTT :)
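
Just to illustrate (a hypothetical sketch, not our BlackBox format or
tooling), episodes like that can be flagged mechanically from
per-connection RTT samples:

    #include <stdio.h>

    /* Hypothetical per-ACK RTT sample pulled from a connection log. */
    struct rtt_sample {
        double t_sec;   /* capture time, seconds */
        double rtt_ms;  /* measured RTT, milliseconds */
    };

    /*
     * Print the spans where the RTT sits well above the flow's baseline,
     * i.e. the episodes where a competing flow has filled the bottleneck
     * queue, and when it drains back down again.
     */
    static void flag_competition(const struct rtt_sample *s, size_t n,
                                 double baseline_ms, double factor)
    {
        int in_spike = 0;

        for (size_t i = 0; i < n; i++) {
            int spike = s[i].rtt_ms > baseline_ms * factor;

            if (spike && !in_spike)
                printf("competing traffic suspected at t=%.2fs (rtt=%.0f ms)\n",
                       s[i].t_sec, s[i].rtt_ms);
            else if (!spike && in_spike)
                printf("back to baseline at t=%.2fs (rtt=%.0f ms)\n",
                       s[i].t_sec, s[i].rtt_ms);
            in_spike = spike;
        }
    }

    int main(void)
    {
        /* Steady ~30 ms RTT with a brief excursion to ~200 ms. */
        struct rtt_sample log[] = {
            { 0.00, 31 }, { 0.03, 29 }, { 0.06, 204 },
            { 0.09, 198 }, { 0.12, 32 },
        };

        flag_competition(log, sizeof(log) / sizeof(log[0]), 30.0, 3.0);
        return 0;
    }

Spotting such episodes after the fact is the easy part; the hard part is
provoking them reproducibly so the competing flows can actually be compared.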

So we have observed competing traffic, but we have not yet found a method
for testing and evaluating behaviour against competitive traffic, though
we would love to :)

R

On Thu, Jun 23, 2022 at 11:10 AM Rodney W. Grimes <ietf@gndrsh.dnsmgr.net>
wrote:

> Randall,
> Thank you for weighing in; I was pretty sure you would be present
> on this list.
>
> > Rodney:
> >
> > I wanted to weigh in just a small bit on this since you mentioned
> > Netflix :)
> >
> > Netflix currently utilizes NewReno's linear congestion window increase
> > function for our content delivery TCP connections, rather than the
> > RFC8312bis "cubic" congestion window increase function. Most of our
> > performance-oriented efforts have focused on improved loss detection and
> > recovery (RACK) and judicious application of TCP pacing. CUBIC has indeed
> > been widely deployed for a long time. However, given that our TCP
> > connections are frequently over short or modest paths (rather than the
> > long paths that motivated CUBIC's early development), we're unaware of
> > CUBIC-based competing connections being a problem for us.
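
(For anyone who wants the two increase functions side by side, a rough
sketch follows; the constants are the ones given in RFC 8312bis, everything
is in segments, and this is a simplification for illustration, not any
stack's actual code.)

    #include <math.h>

    /* NewReno congestion avoidance (RFC 5681): roughly one extra
     * segment of cwnd per round-trip time. */
    static double newreno_cwnd(double cwnd_start, double rtts_elapsed)
    {
        return cwnd_start + rtts_elapsed;             /* segments */
    }

    /* CUBIC window curve (RFC 8312bis): W_cubic(t) = C*(t - K)^3 + W_max,
     * with K = cbrt(W_max * (1 - beta_cubic) / C) and t in seconds since
     * the start of the current congestion avoidance epoch. */
    static double cubic_cwnd(double w_max, double t_sec)
    {
        const double C = 0.4;           /* CUBIC scaling constant */
        const double beta_cubic = 0.7;  /* multiplicative decrease factor */
        double K = cbrt(w_max * (1.0 - beta_cubic) / C);

        return C * pow(t_sec - K, 3.0) + w_max;       /* segments */
    }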
>
> Might I ask about "unaware" being a rather important part of that statement?
> I.e., has any coexistence testing been done?  What happens when a large
> amount of CUBIC and Netflix/RACK/NewReno traffic competes on that "short"
> path; has anyone got numbers?  I believe these are the types of questions
> that Markku is asking, and I further believe these types of questions
> should be asked, evaluated and answered before the IETF rubber-stamps
> CUBIC.
>
> For me, Markku has raised some very valid concerns and clearly documented
> them as "issues", and while one has been addressed (thank you, Neal), it
> seems as if the WG is willing to rubber-stamp over all the others as
> "not an issue because CUBIC is everywhere", which I assert to be a false
> claim.
>
> Please do note I am fully in SUPPORT of CUBIC moving forward, but if
> the IETF is to continue to use the word Engineering in its title, it
> really should spend the effort to do some Engineering and actually
> follow the Engineering steps outlined in its own RFC process. Again,
> I do believe that is all that Markku is asking for.
>
> Regards,
> Rod Grimes
> >
> > Best wishes
> >
> > R
> >
> > > On Jun 21, 2022, at 6:54 PM, Rodney W. Grimes <ietf@gndrsh.dnsmgr.net>
> wrote:
> > >
> > >> (with no hats)
> > >
> > > [rwg] I was going to stay quiet on this, but one inline comment below.
> > >
> > >
> > >>
> > >> Markku,
> > >>
> > >> I think it's important to distinguish between "aggressive" algorithms
> that
> > >> are aggressive and reach a superior equilibrium for everyone using
> that
> > >> algorithm, and aggressive algorithms that don't scale if everyone is
> using
> > >> them.
> > >>
> > >> There's one scenario (A) that I think everyone would agree was
> acceptable:
> > >> 1) Early adopters deploy a new algorithm
> > >> 2) The old algorithm is not affected at all
> > >> 3) As users migrate from old to new, the network converges on a
> > >> higher-utilization equilibrium
> > >>
> > >> Similarly, we would all agree that Scenario (B) is unacceptable:
> > >> 1) Deploy new algorithm
> > >> 2) The old algorithm is starved and unusable
> > >> 3) As users migrate from old to new, the network converges on a
> > >> higher-utilization equilibrium
> > >>
> > >> There's a middle ground (C) where the old algorithm suffers degraded
> > >> performance, but not fatally. Reasonable people can disagree on where
> the
> > >> exact threshold lies, and the argument has several dimensions. It's an
> > >> eternal human argument about how much damage is acceptable in making
> > >> technical progress that we won't settle here.
> > >>
> > >> In the case of Cubic, it is *extremely widely* deployed. Whether or
> not
> > >> doing damage to Reno connections was justified, we have already sped
> > >> through (2) and have landed on (3). Cubic is the default and users
> > >
> > > [rwg]
> > > Default where?  As far as I know FreeBSD, and I believe other BSDs,
> > > use newreno as the default:
> > >
> > >     net.inet.tcp.cc.algorithm: newreno
> > >
> > > And from the mod_cc(4) manual page of FreeBSD 12.x:
> > >     The default algorithm is NewReno, and all connections use the
> default
> > >     unless explicitly overridden using the TCP_CONGESTION socket
> option (see
> > >     tcp(4) for details).  The default can be changed using a sysctl(3)
> MIB
> > >     variable detailed in the MIB Variables section below.
> > >
> > > I doubt there is a bunch of userland code calling with TCP_CONGESTION
> > > socket options.
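
(For reference, per-connection selection looks roughly like the following;
a minimal sketch with no real error handling. The named module, e.g.
cc_cubic(4), has to be loaded, and the system-wide default remains the
net.inet.tcp.cc.algorithm sysctl.)

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <string.h>

    /*
     * Ask the kernel to use a specific congestion control module for one
     * TCP socket instead of the system default.  Returns -1 with errno set
     * on failure (e.g. the requested module is not available).
     */
    static int set_cc(int fd, const char *algo)
    {
        return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                          algo, strlen(algo));
    }

    /* e.g. set_cc(fd, "cubic");  or  set_cc(fd, "newreno"); */

The same call is also one way to run the kind of side-by-side coexistence
comparison discussed above: identical flows, one set to "cubic" and one to
"newreno", over the same bottleneck.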
> > >
> > > And... I do not know if Netflix, IIRC the source of approximately 1/3
> > > of USA network downstream traffic, has tweaked things to use cc_cubic,
> > > but it might be worth an ask.  Most of their interesting stuff is in the
> > > use of RACK, and IIRC again that is neither newreno nor cubic based.
> > >
> > > Regards,
> > > Rod Grimes
> > >
> > >> generally have to seek out Reno to use it. So what is to be gained by
> > >> continuing to defend an inferior equilibrium against a superior one
> that
> > >> has already won in the market?
> > >>
> > >> As for RFC 9002: this was an expedient choice; QUICWG needed a
> standard
> > >> congestion control, was not chartered to create a new one, and there
> was
> > >> only one on the shelf to choose from. If Cubic had been
> standards-track,
> > >> the WG may very well have chosen that one. In the real world the most
> > >> important production QUIC implementations are not using Reno.
> > >>
> > >> On Mon, Jun 20, 2022 at 6:08 PM Vidhi Goel <vidhi_goel=
> > >> 40apple.com@dmarc.ietf.org> wrote:
> > >>
> > >>> If we are talking about RFC 9002 New Reno implementations, then that
> > >>> already modifies RFC 5681 and doesn't comply with RFC 5033. Since it
> > >>> has a major change from 5681 for any congestion event, I wouldn't call
> > >>> it closely following New Reno. Also, in another email, you said that
> > >>> you didn't follow the discussions in the QUIC WG for RFC 9002, so how
> > >>> do you know whether QUIC implementations are using New Reno or CUBIC
> > >>> congestion control?
> > >>> It would be good to stay consistent in our replies: if you agree RFC
> > >>> 9002 is already non-compliant with RFC 5033, then why use it as a
> > >>> reference to cite Reno implementations!
> > >>>
> > >>> Vidhi
> > >>>
> > >>>> On Jun 20, 2022, at 5:06 PM, Markku Kojo <kojo=
> > >>> 40cs.helsinki.fi@dmarc.ietf.org> wrote:
> > >>>> Hi Lars,
> > >>>>
> > >>>> On Sun, 19 Jun 2022, Lars Eggert wrote:
> > >>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> sorry for misunderstanding/misrepresenting  your issues.
> > >>>>>
> > >>>>>> On Jun 6, 2022, at 13:29, Markku Kojo <kojo@cs.helsinki.fi>
> wrote:
> > >>>>>> These issues are significant, and a number of people have also said
> > >>>>>> they should not be left unaddressed. Almost all of them are related
> > >>>>>> to the behaviour of CUBIC in the TCP-friendly region, where it is
> > >>>>>> intended and required to compete fairly with the current stds-track
> > >>>>>> congestion control mechanisms. The evaluation of whether CUBIC
> > >>>>>> competes fairly *cannot* be done without measuring the impact of
> > >>>>>> CUBIC on the other traffic competing with it over a shared
> > >>>>>> bottleneck link. This does not happen merely by deploying it; it
> > >>>>>> requires specifically planned measurements.
> > >>>>>
> > >>>>> So whether CUBIC competes fairly with Reno in certain regions is a
> > >>>>> completely academic question in 2022. There is almost no Reno
> traffic
> > >>>>> anymore on the Internet or in data centers.
> > >>>>
> > >>>> To my understanding we have quite a bit of QUIC traffic, for which
> > >>>> RFC 9002 has just been published and which follows Reno CC quite
> > >>>> closely with some exceptions. We also have some SCTP traffic that
> > >>>> follows Reno CC very closely, and numerous proprietary UDP-based
> > >>>> protocols that RFC 8085 requires to follow the congestion control
> > >>>> algos as described in RFC 2914 and RFC 5681. So, are you saying RFC
> > >>>> 2914, RFC 8085 and RFC 9002 are just academic exercises?
> > >>>>
> > >>>> Moreover, my answer to why we see so little Reno CC traffic is very
> > >>>> simple: people deployed CUBIC, which is more aggressive than Reno CC,
> > >>>> so it is an inherent outcome that hardly anyone is willing to run Reno
> > >>>> CC when others are running a more aggressive CC algo that leaves
> > >>>> little room for competing Reno CC.
> > >>>>
> > >>>>> I agree that in an ideal world, the ubiquitous deployment of CUBIC
> > >>>>> should have been accompanied by A/B testing, including an
> > >>>>> investigation into its impact on competing non-CUBIC traffic.
> > >>>>>
> > >>>>> But that didn't happen, and we find ourselves in the situation we're
> > >>>>> in. What is gained by not recognizing CUBIC as a standard?
> > >>>>
> > >>>> First, if the CUBIC draft is published as it currently is, that would
> > >>>> give an IETF stamp and an 'official' start to "a spiral of
> > >>>> increasingly aggressive TCP implementations" that RFC 2914
> > >>>> appropriately warns about. In the little time I had to follow the L4S
> > >>>> discussions in tsvwg, people already insisted on comparing L4S
> > >>>> performance to CUBIC instead of Reno CC. The fact is that we don't
> > >>>> know how much more aggressive CUBIC is than Reno CC in its
> > >>>> TCP-friendly region. However, if I recall correctly it was considered
> > >>>> OK that L4S is somewhat more aggressive than CUBIC. So, the spiral has
> > >>>> already started within the IETF as well as in the wild (the Internet).
> > >>>>
> > >>>> Second, recognizing CUBIC as a standard as it is currently written
> > >>>> would ensure that all the issues that have been raised get ignored
> > >>>> and forgotten forever.
> > >>>>
> > >>>> Third, you did not indicate which issue you are referring to. Some of
> > >>>> the issues have nothing to do with fair competition against Reno CC in
> > >>>> certain regions. E.g., issue 2 also causes self-inflicted problems for
> > >>>> the flow itself, as Neal indicated based on some traces he had seen.
> > >>>> And there is a simple, effective and safe fix for it, as I have
> > >>>> proposed.
> > >>>>
> > >>>> As I have tried to say, I do not care too much what the status of
> > >>>> CUBIC will be when it gets published, as long as we do not hide the
> > >>>> obvious issues it has and we have a clear plan to ensure that all
> > >>>> issues not resolved by the time of publishing will have a clear path
> > >>>> and incentive to get fixed. IMO that can best be achieved by
> > >>>> publishing it as Experimental and documenting all unresolved issues
> > >>>> in the draft. That approach would give all proponents the incentive
> > >>>> to do whatever is needed (measurements, algo fixes/tuning) to solve
> > >>>> the remaining issues and get it to stds track.
> > >>>>
> > >>>> But let me ask a different question: what is gained, and how does the
> > >>>> community benefit from a std that is based on a flawed design and does
> > >>>> not behave as intended?
> > >>>>
> > >>>> Congestion control specifications are considered to have significant
> > >>>> operational impact on the Internet, similar to security mechanisms.
> > >>>> Would you in the IESG support publication of a security mechanism that
> > >>>> is shown not to operate as intended?
> > >>>>
> > >>>> Could we now finally focus on solving each of the remaining issues and
> > >>>> discuss the way forward separately for each of them? Issue 3 a) has
> > >>>> pretty much been solved already (thanks Neal); some text tweaking may
> > >>>> still be needed.
> > >>>>
> > >>>> Thanks,
> > >>>>
> > >>>> /Markku
> > >>>>
> > >>>>> Thanks,
> > >>>>> Lars
> > >>>>>
> > >>>>> --
> > >>>>> Sent from a mobile device; please excuse typos.
> > >>>> _______________________________________________
> > >>>> tcpm mailing list
> > >>>> tcpm@ietf.org
> > >>>> https://www.ietf.org/mailman/listinfo/tcpm
> > >>>
> > >>> _______________________________________________
> > >>> tcpm mailing list
> > >>> tcpm@ietf.org
> > >>> https://www.ietf.org/mailman/listinfo/tcpm
> > >>>
> > >
> > >> _______________________________________________
> > >> tcpm mailing list
> > >> tcpm@ietf.org
> > >> https://www.ietf.org/mailman/listinfo/tcpm
> > >
> > > --
> > > Rod Grimes
> rgrimes@freebsd.org
> > >
> > > _______________________________________________
> > > tcpm mailing list
> > > tcpm@ietf.org
> > > https://www.ietf.org/mailman/listinfo/tcpm
> >
> > ------
> > Randall Stewart
> > rrs@netflix.com
> >
> >
> >
>
> --
> Rod Grimes
> rgrimes@freebsd.org
>


-- 
---
Randall Stewart
rrs@netflix.com