[tsvwg] Sebastian

Sebastian Moeller <moeller0@gmx.de> Tue, 16 April 2024 14:27 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <8c65584ede9a435bac30ee9c0e2ef1fc@huawei.com>
Date: Tue, 16 Apr 2024 16:27:25 +0200
Cc: "tsvwg@ietf.org" <tsvwg@ietf.org>
Message-Id: <E24E9308-13EB-42E7-A51F-3AC6E16C13AA@gmx.de>
References: <30f6c4b411034046814d6a90956f9949@huawei.com> <BD28D463-9D61-4E91-88B3-78875F6CA45E@gmx.de> <8e66998698044919b0b5abfaa47ae2fc@huawei.com> <0997E246-CEA5-4FFB-9025-F94B48B4B489@gmx.de> <8c65584ede9a435bac30ee9c0e2ef1fc@huawei.com>
To: Vasilenko Eduard <vasilenko.eduard=40huawei.com@dmarc.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/vljGt_GNxIFxKyH0aBjnMS3uDUE>
Subject: [tsvwg] Sebastian
List-Id: Transport Area Working Group <tsvwg.ietf.org>

Hi Ed,


> On 16. Apr 2024, at 16:10, Vasilenko Eduard <vasilenko.eduard=40huawei.com@dmarc.ietf.org> wrote:
> 
> The congestion signal is started NOT from the flow source. It is by definition started from the scheduler on the bottleneck link. Hence, bloated buffer latency (the biggest part of "the information in transit") is not included.

[SM] Let me be explicit:

relevant Path
[Sender] --> [Bottleneck] --> [Receiver] 

with 
[Bottleneck]: [ingress, Queue --> head-drop AQM --> egress]

so 
[Sender] --> [ingress Queue --> head-drop AQM --> egress] --> [Receiver] 


Now, if the AQM marks/drops {S}, the following happens:

[Sender] --> [ingress Queue --> head-drop AQM -{S}-> egress] --> [Receiver] 

[Sender] --> [ingress Queue --> head-drop AQM --> egress] - {S}-> [Receiver] 

[Sender] --> [ingress Queue --> head-drop AQM --> egress] --> [Receiver {S}: reflection of signal] 

[Receiver] -{S}-> [Sender]

[Receiver] --> [Sender {S}: sender reacts]; from here on, {R} denotes the reduced rate of the sender

[Sender] -{R}-> [ingress Queue --> head-drop AQM --> egress] --> [Receiver] 

[Sender] --> [ingress Queue -{R}-> head-drop AQM --> egress] --> [Receiver: reflection of signal] 

[Sender] --> [ingress Queue -{R}-> head-drop AQM {R}: AQM notices reduced load --> egress] --> [Receiver: reflection of signal] 

until the AQM sees the reduced load, it must assume that ingress still exceeds egress and hence will continuously need to send more {S}.
How {S} is generated depends a bit on what kind of AQM we have, but conceptually the AQM will only back off once the queue length has reduced.
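
The loop above can be condensed into a toy discrete-time simulation (all numbers hypothetical, chosen only to make the delays concrete): a sender overloads a bottleneck whose head-marking AQM signals {S} while the queue exceeds a threshold; the mark needs FBK_DELAY ticks to travel AQM -> receiver -> sender, and the resulting reduced rate {R} needs another FWD_DELAY ticks to become visible at the AQM.

```python
# Toy model of the loop above; all parameters are made up for illustration.
from collections import deque

CAPACITY = 10    # bottleneck egress, packets/tick
THRESHOLD = 20   # head AQM marks while the queue exceeds this many packets
FWD_DELAY = 3    # ticks: sender -> bottleneck ingress
FBK_DELAY = 6    # ticks: AQM mark -> receiver -> back to sender

rate, queue, reacted = 15, 0, False   # sender overloads the 10/tick link
in_flight = deque([0] * FWD_DELAY, maxlen=FWD_DELAY)      # packets en route to the bottleneck
feedback = deque([False] * FBK_DELAY, maxlen=FBK_DELAY)   # marks en route back to the sender
log = []

for tick in range(60):
    if feedback[0] and not reacted:   # {S} finally reaches the sender ...
        rate //= 2                    # ... which reacts once, Reno-style -> {R}
        reacted = True
    arriving = in_flight[0]           # packets the sender emitted FWD_DELAY ticks ago
    in_flight.append(rate)
    queue = max(0, queue + arriving - CAPACITY)
    marking = queue > THRESHOLD       # head AQM keeps sending {S} while the queue is long
    feedback.append(marking)
    log.append((tick, rate, queue, marking))

first_mark = next(t for t, r, q, m in log if m)
reaction = next(t for t, r, q, m in log if r < 15)
print(f"first mark at tick {first_mark}, sender reacts at tick {reaction}")
# -> first mark at tick 7, sender reacts at tick 13
```

With these made-up numbers the sender only reacts FBK_DELAY = 6 ticks after the first mark, the AQM only sees the lower arrival rate another FWD_DELAY = 3 ticks after that, and it keeps emitting {S} the whole time (and longer still, until the queue has drained below the threshold).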

I might have misunderstood your point though, and will leave it to the experts to chime in and correct my inaccuracies.

Regards
        Sebastian



> 
> -----Original Message-----
> From: Sebastian Moeller <moeller0=40gmx.de@dmarc.ietf.org> 
> Sent: Tuesday, April 16, 2024 17:02
> To: Vasilenko Eduard <vasilenko.eduard@huawei.com>
> Cc: Sebastian Moeller <moeller0@gmx.de>; tsvwg@ietf.org
> Subject: Re: [tsvwg] What is "Scalable Congestion Control" in L4S?
> 
> Hi Ed,
> 
> 
>> On 16. Apr 2024, at 15:23, Vasilenko Eduard <vasilenko.eduard=40huawei.com@dmarc.ietf.org> wrote:
>> 
>> Hi Sebastian,
>> Thanks for your comments.
>> I thought about it more (and read the original 1997 article where the square-root proportionality was stated) and concluded that the concept is fundamentally wrong.
>> 
>> The root cause for the "Scalable Congestion Control" definition is the reaction to the drop (claimed to be proportional to something different) - discussed in RFC 9332 section 2.1.
>> The 1st derivative for "Scalable Congestion Control" would be to say that "it should not build the queue in the bottleneck link, because if the queue is growing then the RTT loop would be longer which would decrease the rate of congestion signals over such a loop".
>> The current "Scalable Congestion Control" definition is the 2nd order derivative about the same root cause.
>> 
>> The problem with all these square-root and non-square-root academic approximations is that they assume the signal travels the full RTT (including the case where the RTT is bloated on the bottleneck link).
>> Look at "The macroscopic behavior of the TCP Congestion Avoidance algorithm" - they integrate everything under the Reno curve.
>> Then look at "PI2: A Linearized AQM for both Classic and Scalable TCP" - they use the full integral from the previous paper for the "congestion signal" frequency estimation.
>> Effectively, they assume that the congestion-signal delay spans the whole BDP (all information in transit). BDP is called the "window" in these documents.
>> But actually, an AQM marks (or drops) packets at the head of the queue (with transmission or instead of transmission), not at the tail. How many packets are waiting in the queue does not matter for the feedback speed.
>> This includes the situation when the queue is huge compared to the minimal RTT - the famous "bufferbloat" problem.
>> Hence, the time needed to deliver this congestion signal would not change - it would be 1) the path left after the bottleneck to the destination and 2) back from the destination to the source (assuming no bottleneck in the opposite direction).
>> This actual AQM behavior breaks the macroscopic assumptions and makes the analysis irrelevant.
>> 
>> Funnily enough, all CCAs are "Scalable" under the current definition of "Scalable", because of typical AQM behavior (drop from the head, no dependency on the "window size" that is primarily accumulated in the bottleneck queue).
> 
> [SM] My take is that it still takes a full RTT (including the filled queue) before the effect of the previous signal (drop/mark) becomes visible at the AQM decision point (be that the traditional, unfortunate tail or the more recent head)... And it is at that point where we need to notice a reduction/improvement in sojourn time, or we will keep signalling.
> This assumes that the length of the queue did not change significantly in the interim, but for a loaded link that seems a decent approximation; after all, a tail-drop queue will stay at around a full state as long as the ingress exceeds the egress...
> That said, I will not joust for L4S or for 'scalable' as I do not consider L4S to be a good solution... ;)
> 
> Regards
> Sebastian
> 
>> 
>> CCAs' aggressiveness and non-fair link sharing are probably related to something else.
>> Eduard
>> -----Original Message-----
>> From: Sebastian Moeller <moeller0=40gmx.de@dmarc.ietf.org>
>> Sent: Tuesday, April 16, 2024 13:30
>> To: Vasilenko Eduard <vasilenko.eduard@huawei.com>
>> Cc: tsvwg@ietf.org
>> Subject: Re: [tsvwg] What is "Scalable Congestion Control" in L4S?
>> 
>> Hi Ed,
>> 
>> I stumbled over the same thing previously, but the subtle issue is that the formal definition is about marking rate in marks/second, while the second passage looks at marking probability as marks/packet over a time window; while the marking rate stays constant, the resulting marking probability will decrease with increasing packet rate. This is also true if marking probability is measured as marks/byte. However, I fail to see a clear method to deduce the relevant time window over which to calculate the marking probability.
>> 
>> 
>>> On 16. Apr 2024, at 10:25, Vasilenko Eduard <vasilenko.eduard=40huawei.com@dmarc.ietf.org> wrote:
>>> 
>>> Hi all,
>>> 
>>> Both RFCs (9332 and 9330) give a formal definition:
>>> "Scalable Congestion Control: A congestion control where the average time from one congestion signal to the next (the recovery time) remains invariant as flow rate scales, all other factors being equal."
>>> It is just the rate of the congestion signal, a simple matter.
>> 
>> [SM] Yes, this is marking rate in Hz.
>> 
>>> 
>>> RFC 9332 section 2.1 gives the impression that Scalable Congestion Control has more fundamental differences:
>>> "the steady-state cwnd of Reno is inversely proportional to the square root of p" (drop probability).
>>> But "A supporting paper [https://dl.acm.org/doi/10.1145/2999572.2999578] includes the derivation of the equivalent rate equation for DCTCP, for which cwnd is inversely proportional to p
>> 
>> [SM] But here this is marking probability, which will depend on the actual data rate of the flow...
>> 
>>> (not the square root), where in this case p is the ECN-marking probability.
>> 
>> [SM] And that got me confused previously, as marking rate and marking probability are proportional for a given data rate, so I read p as just a different way to say marking rate.
>> 
>>> DCTCP is not the only congestion control that behaves like this, so the term 'Scalable' will be used for all similar congestion control behaviours". Then in section 1.2 we see the BBR in the list of "Scalable CCs".
>>> 
>>> 1. The formal definition of "Scalable CC" looks wrong. At least it contradicts section 2.1.
>> 
>> [SM] Let's say that either description would be well served by explicitly describing the rate-versus-probability issue.
>> 
>>> 2. It is difficult to believe that BBR and CUBIC/RENO have such different reactions to overload signals, because they both play fairly (starting from BBRv2) in one queue, as demonstrated in many tests.
>> 
>> [SM] But they do differ... Traditional Reno will halve its congestion window in response to a dropped packet (or, if RFC 3168 is in use, also in response to a CE-marked packet), while BBR will not... (older versions of BBR completely ignore marks and also try to ignore drops; newer versions use a scalable response but still ignore drops up to a certain threshold). But these differences are not that relevant to BBR's sharing behaviour, as BBR determines its equitable capacity share via its probing mechanism and hence arrives at a decent response under conditions similar to Reno's, just based on different principles.
>> 
>>> It is probably impossible for such different sessions to share the load fairly if one session is reacting to p, but the other is reacting to the square root from p (p is the probability for congestion signal).
>> 
>> [SM] That is a true point, and that is why L4S requires a strict separation between the different response types and specific AQMs for each traffic type that take this into account.
>> 
>> Regards
>> Sebastian
>> 
>>> 
>>> Best Regards
>>> Eduard Vasilenko
>>> Senior Architect
>>> Network Algorithm Laboratory
>>> Tel: +7(985) 910-1105
>>> 
>> 
>