Re: [tsvwg] What is "Scalable Congestion Control" in L4S?

Vasilenko Eduard <vasilenko.eduard@huawei.com> Tue, 16 April 2024 13:24 UTC

From: Vasilenko Eduard <vasilenko.eduard@huawei.com>
To: Sebastian Moeller <moeller0=40gmx.de@dmarc.ietf.org>
CC: "tsvwg@ietf.org" <tsvwg@ietf.org>
Date: Tue, 16 Apr 2024 13:23:39 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/laRILHdfe4vPcgSfGrsr57e_qgk>
Subject: Re: [tsvwg] What is "Scalable Congestion Control" in L4S?

Hi Sebastian,
Thanks for your comments.
I thought about it some more (and read the original 1997 article where the inverse proportionality to the square root of p was first derived) and concluded that the concept is fundamentally wrong.

The root cause of the "Scalable Congestion Control" definition is the reaction to drops (claimed to be proportional to something different), as discussed in RFC 9332 section 2.1.
A first-order restatement of "Scalable Congestion Control" would be: "it should not build a queue at the bottleneck link, because if the queue grows, the RTT loop becomes longer, which decreases the rate of congestion signals over that loop".
The current "Scalable Congestion Control" definition is a second-order consequence of the same root cause.

The problem with all these square-root and non-square-root academic approximations is the assumption that the congestion signal travels the full RTT (including the case when the RTT is bloated by the queue at the bottleneck link).
Look at "The macroscopic behavior of the TCP congestion avoidance algorithm" - they integrate over the whole Reno sawtooth.
Then look at "PI2: A Linearized AQM for both Classic and Scalable TCP" - they reuse that full integral to estimate the "congestion signal" frequency.
Effectively, they assume that the congestion signal delay spans the whole BDP (all information in transit). The BDP is called the "window" in these documents.
But in practice, an AQM marks (or drops) packets at the head of the queue (at transmission time, or instead of transmission), not at the tail. The feedback speed therefore does not depend on how many packets are waiting in the queue.
This includes the situation when the queue is huge compared to the minimal RTT - the famous "bufferbloat" problem.
Hence, the time needed to deliver the congestion signal does not change: it is 1) the remaining path from the bottleneck to the destination and 2) the path back from the destination to the source (assuming no bottleneck in the opposite direction).
This actual AQM behavior breaks the macroscopic assumptions and makes these analytics irrelevant.
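
A minimal sketch of this point in Python (hypothetical numbers; single bottleneck; marking applied at the head of the queue):

# Hypothetical delays, in seconds.
min_rtt = 0.020          # base path RTT without any queuing
pre_bottleneck = 0.004   # one-way delay from the sender to the bottleneck
queue_delay = 0.200      # bufferbloat-sized standing queue at the bottleneck

# A CE mark (or drop) applied at the HEAD of the queue reaches the sender after
# the rest of the forward path plus the return path; it does not wait behind
# the packets already sitting in the queue.
head_marking_feedback = min_rtt - pre_bottleneck

# The "macroscopic" models instead assume the signal spans the whole bloated RTT.
macroscopic_assumption = min_rtt + queue_delay

print(f"head-of-queue marking feedback: {head_marking_feedback*1000:.0f} ms")
print(f"macroscopic-model assumption:   {macroscopic_assumption*1000:.0f} ms")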

Funnily enough, all CCAs are "Scalable" under the current definition of "Scalable", because of the typical AQM behavior (marking or dropping from the head of the queue, with no dependency on the "window size" that is mostly accumulated in the bottleneck queue).

A CCA's aggressiveness and unfair link sharing are probably related to something else.
Eduard
-----Original Message-----
From: Sebastian Moeller <moeller0=40gmx.de@dmarc.ietf.org> 
Sent: Tuesday, April 16, 2024 13:30
To: Vasilenko Eduard <vasilenko.eduard@huawei.com>
Cc: tsvwg@ietf.org
Subject: Re: [tsvwg] What is "Scalable Congestion Control" in L4S?

Hi Ed,

I stumbled over the same thing previously, but the subtle issue is that the formal definition is about the marking rate in marks/second, while the second description (RFC 9332 section 2.1) looks at the marking probability in marks/packet over a time window; while the marking rate stays constant, the resulting marking probability will decrease with increasing packet rate. This is also true if the marking probability is measured in marks/byte. However, I fail to see a clear method to deduce the relevant time window over which to calculate the marking probability.
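
A simple illustration with made-up numbers:

  p = marking rate / packet rate
  2 marks/s against   100 packets/s  =>  p = 0.02   (2%)
  2 marks/s against 10000 packets/s  =>  p = 0.0002 (0.02%)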


> On 16. Apr 2024, at 10:25, Vasilenko Eduard <vasilenko.eduard=40huawei.com@dmarc.ietf.org> wrote:
> 
> Hi all,
> 
> Both RFCs (9332, 9330) give a formal definition:
> "Scalable Congestion Control: A congestion control where the average time from one congestion signal to the next (the recovery time) remains invariant as flow rate scales, all other factors being equal."
> It is just the rate of the congestion signal, a simple matter.

[SM] Yes, this is marking rate in Hz.

> 
> RFC 9332 section 2.1 gives the impression that Scalable Congestion Control has more fundamental differences:
> "the steady-state cwnd of Reno is inversely proportional to the square 
> root of p" (drop probability) But "A supporting paper 
> [https://dl.acm.org/doi/10.1145/2999572.2999578] includes the 
> derivation of the equivalent rate equation for DCTCP, for which cwnd 
> is inversely proportional to p

[SM] But here this is the marking probability, which will depend on the actual data rate of the flow...

> (not the square root), where in this case p is the ECN-marking probability.

[SM] And that got me confused previously, as the marking rate and the marking probability are proportional for a given data rate, so I read p as just a different way of saying marking rate.

> DCTCP is not the only congestion control that behaves like this, so the term 'Scalable' will be used for all similar congestion control behaviours". Then in section 1.2 we see BBR in the list of "Scalable CCs".
> 
> 1. The formal definition of "Scalable CC" looks wrong. At least it contradicts section 2.1.

[SM] Let's say that either description would be well served by explicitly describing the rate-versus-probability issue.

> 2. It is difficult to believe that BBR and CUBIC/Reno have such different reactions to overload signals, because they share one queue fairly (starting from BBRv2), as demonstrated in many tests.

[SM] But they do differ... Traditional Reno will halve its congestion window in response to a dropped packet (or, if RFC 3168 is in use, also in response to a CE-marked packet), while BBR will not do this... (older versions of BBR completely ignore marks and also try to ignore drops; newer versions use a scalable response to marks but still ignore drops up to a certain threshold). But these differences are not that relevant to BBR's sharing behaviour, as BBR determines its equitable capacity share via its probing mechanism and hence arrives at a decent response under conditions similar to Reno's, just based on different principles.
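
A rough sketch of the per-signal responses (Reno halving versus a DCTCP-style scalable reduction per RFC 8257; BBR's probing-based model does not fit this per-mark form and is omitted):

def reno_on_congestion(cwnd):
    # Classic multiplicative decrease: halve the window on a loss
    # (or on an RFC 3168 CE mark).
    return cwnd / 2.0

def scalable_on_ce(cwnd, alpha):
    # DCTCP-style scalable response: reduce in proportion to alpha,
    # the moving average of the CE-marked fraction over the last RTT.
    return cwnd * (1.0 - alpha / 2.0)

print(reno_on_congestion(100.0))   # 50.0 - large cut regardless of marking intensity
print(scalable_on_ce(100.0, 0.1))  # 95.0 - gentle cut when only 10% of packets are marked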

> It is probably impossible for such different sessions to share the load fairly if one session is reacting to p, but the other is reacting to the square root of p (p is the probability of a congestion signal).

[SM] That is a true point, and that is why L4S requires a strict separation between the different response types and specific AQMs for each traffic type that take this into account.

Regards
	Sebastian

> 
> Best Regards
> Eduard Vasilenko
> Senior Architect
> Network Algorithm Laboratory
> Tel: +7(985) 910-1105
>