Re: [tsvwg] David Black (individual) on safety of L4S for the Internet

Yuchung Cheng <ycheng@google.com> Fri, 08 May 2020 23:00 UTC

References: <MN2PR19MB4045DBC270D70DECE5F2B4AC83A20@MN2PR19MB4045.namprd19.prod.outlook.com> <CADVnQymXFe7o4M_GgwqxDP0x5UsAKE+1oQcavyDTF04gP-S34Q@mail.gmail.com> <6E67B51E-6F0A-4BC9-9A77-478F6C5B1222@gmail.com>
In-Reply-To: <6E67B51E-6F0A-4BC9-9A77-478F6C5B1222@gmail.com>
From: Yuchung Cheng <ycheng@google.com>
Date: Fri, 08 May 2020 15:59:59 -0700
Message-ID: <CAK6E8=d_g54X1js3vozHKUUzzPPFS=CQuLiizm2gjzZfFZkkmA@mail.gmail.com>
To: Jonathan Morton <chromatix99@gmail.com>
Cc: Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org>, "tsvwg@ietf.org" <tsvwg@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/BrHRk4DiToxnR168noBMbEAoNgs>

I can't follow the logic of these two claims:

1) L4S / TCP Prague can have implementation bugs or violations that cause
ECT(1) to be disabled or ignored forever, hence fq + bleaching would make
it safe.
2) QUIC uses Reno, so it's safe.

What prevents these "safety" mechanisms from having bugs too? I have
seen vendor fq implementations that are really not fq at all. QUIC
senders can, in reality, run any congestion control they like.


On Fri, May 8, 2020 at 3:38 PM Jonathan Morton <chromatix99@gmail.com> wrote:
>
> > On 9 May, 2020, at 1:04 am, Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org> wrote:
> >
> >> Current practice is not to mix DCTCP-like traffic (1/p-class congestion
> >> control) with TCP-like traffic (1/sqrt(p)-class congestion control, e.g.,
> >> NewReno) in the same queue because the DCTCP-like traffic outcompetes the
> >> TCP-like traffic to the point of starvation of the latter.
> >
> > For my education, does someone have a pointer to the experiments
> > underlying that description of DCTCP outcompeting Reno/CUBIC? I'm not
> > aware of that being a fundamental property of DCTCP. The DCTCP
> > algorithm specifies a Reno-style response to loss -
> > https://tools.ietf.org/html/rfc8257#section-3.5 - and has Reno-style
> > increase functions as well. So the algorithm should be Reno friendly,
> > AFAIK.
>
> The problem with DCTCP is not its response to loss, though there was a serious bug that apparently went unnoticed for a long time.  Rather, it has to do with the response to CE marks applied by an AQM.  We can analyse this qualitatively and show a clear difference which will always result in severe unfairness.  This is also easy to confirm by experiment.
>
> Assume that a single queue exists at the bottleneck, with a single ECN AQM which does not discriminate between flows and classes of traffic.  This queue and AQM are shared by diverse traffic flows, some of which adhere to RFC 3168 and RFC 8511, some of which are Not-ECT, and some of which are DCTCP.  The Not-ECT packets will, in accordance with RFC 3168, be dropped if they would have been marked CE.
>
> The steady state for both Not-ECT and RFC 3168/8511 transports (which include NewReno and CUBIC) is multiple RTTs between CE marks.  Each round-trip containing a CE mark (or a packet loss) causes a Multiplicative Decrease to between 50% and 85% of the previous congestion window, as defined by RFC 8511 and earlier well-known specifications.  It then takes several RTTs to grow the congestion window back to its previous value, where a further congestion signal can be expected.
>
> The steady state for DCTCP is two CE marks per round-trip.  Each CE mark causes half a segment to be subtracted from the congestion window, on average.  This is qualitatively different from the response specified by RFC 8511.  In particular, to obtain the 50% reduction that NewReno applies in response to a single CE mark, an entire congestion window's worth of segments must be marked CE by the AQM.
>
> Two results should be obvious from the above.  First, an AQM response that causes RFC 8511-compliant transports to back off substantially may cause very little response from DCTCP.  Second, an AQM response that produces the steady state in DCTCP will also apply CE marks (or packet loss) to almost every round-trip of other competing flows, which will cause conventional TCPs' congestion windows to collapse to very small values.
>
> That is why DCTCP is specified for deployment only in controlled environments where its interactions with conventional TCP can be strictly managed.
>
>  - Jonathan Morton
>
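
The difference in CE response described in the quoted text can be put
into numbers with a small sketch. This is not code from the thread and
not any stack's implementation. It assumes the DCTCP window update from
RFC 8257 (cwnd * (1 - alpha / 2), with alpha an EWMA of the marked
fraction and gain g = 1/16) and a plain multiplicative decrease for the
conventional sender. The function names, the 100-segment window, and the
further assumption that alpha has already converged to the marked
fraction are all illustrative.

```python
def conventional_decrease(cwnd, beta=0.5):
    """Window after a round trip with at least one CE mark or loss.

    beta=0.5 is the classic NewReno halving; RFC 8511 allows a larger
    factor (a gentler decrease) for CE marks. Value chosen for illustration.
    """
    return cwnd * beta


def dctcp_alpha_update(alpha, marked, acked, g=1 / 16):
    """One EWMA step of DCTCP's estimate of the marked fraction."""
    return (1 - g) * alpha + g * (marked / acked)


def dctcp_decrease(cwnd, alpha):
    """DCTCP window reduction for a given alpha."""
    return cwnd * (1 - alpha / 2)


def dctcp_steady_state_decrease(cwnd, marked):
    """Assumes alpha has converged to marked / cwnd (a simplification)."""
    return dctcp_decrease(cwnd, marked / cwnd)


if __name__ == "__main__":
    cwnd = 100.0  # segments, hypothetical
    for marked in (1, 2, 50, 100):
        print(
            f"{marked:3d} CE marks per RTT: "
            f"conventional -> {conventional_decrease(cwnd):.1f}, "
            f"DCTCP -> {dctcp_steady_state_decrease(cwnd, marked):.1f}"
        )
```

Under these assumptions each mark removes about half a segment from the
DCTCP window, and only a fully marked window gives the 50% reduction
that the conventional sender applies after a single mark.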
