Re: [tsvwg] "Pacing" requirement is lost in L4S

Vasilenko Eduard <vasilenko.eduard@huawei.com> Tue, 23 April 2024 11:52 UTC

From: Vasilenko Eduard <vasilenko.eduard@huawei.com>
To: "Bless, Roland (TM)" <roland.bless@kit.edu>, "tsvwg@ietf.org" <tsvwg@ietf.org>
Date: Tue, 23 Apr 2024 11:52:03 +0000
Message-ID: <554dc629154245489981e22463d3d1dc@huawei.com>
References: <a19c38376c7541b89a3d52841141fa0c@huawei.com> <cff2147d-e203-4106-b7d6-65a8573e2c22@kit.edu> <12c7c1300b004691a59ac950e66c3e2b@huawei.com> <e45b1d63-62e5-446c-abcf-3a22e911de1b@kit.edu> <f2df9ab26346406ea55a69bf02bc8388@huawei.com> <e079c8f8-f44b-4587-8c27-0dd3a2d6c66d@kit.edu>
In-Reply-To: <e079c8f8-f44b-4587-8c27-0dd3a2d6c66d@kit.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/pbCFnpBnd2YYMIf5tQm2xma9KrA>

Hi Roland,
Thanks for the discussion.
I suspect that we do not understand each other because I am a stranger here and do not use proper terminology.

There is a legacy modeling theory that every session should send BDP = RTT_min * Bottleneck_Bandwidth to fill the pipe.
The BDP can be treated as "a window", measured in MBytes.
Your claim that the window is always needed is probably rooted in this: everybody is so accustomed to that legacy concept.
Free your mind and forget about BDP: it is just one way of modeling the session.
It was good in 1988, when SMTP and FTP were tolerant of latency.
Indeed, if you model the session this way, you may get a theoretical proof that in some situations the "buffer is growing to infinity".

A good CCA can predict/measure RTT_min and Bottleneck_Bandwidth separately (the latter is not even needed in some situations, because it is reflected in the regulated parameter Inter_Packet_Interval).
The power of the new approach is that these parameters are treated separately (do not multiply them to get BDP/Window).
Keep the source parameters as milliseconds and bits_per_second.
For an RTT increase, Bottleneck_Bandwidth is not even needed: it is possible to just increase Inter_Packet_Interval in the same proportion as the RTT has grown relative to RTT_min.
For ECN/drop, the packet size needs to be converted into the additional latency of the additional bottleneck queue (Bottleneck_Bandwidth would be needed); this additional latency can then be treated as an RTT increase that increases Inter_Packet_Interval in the same proportion as in the previous case.
Bottleneck_Bandwidth may cancel out completely, because it is easy to calculate from Inter_Packet_Interval.
Window/BDP would not appear anywhere in these calculations.
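
To make the two cases above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not an existing CCA: the function names are hypothetical, one ECN mark or drop is assumed to stand for one extra packet in the bottleneck queue, and the flow is assumed to be the only one filling the bottleneck, so that Bottleneck_Bandwidth can be inferred from the current Inter_Packet_Interval.

```python
def interval_after_rtt_increase(interval_s, rtt_s, rtt_min_s):
    """Scale Inter_Packet_Interval by the ratio RTT / RTT_min."""
    if rtt_s <= rtt_min_s:
        return interval_s  # no queue detected, leave the interval alone
    return interval_s * (rtt_s / rtt_min_s)


def interval_after_mark(interval_s, packet_bytes, rtt_min_s):
    """Treat one ECN mark or drop as one extra packet queued at the bottleneck."""
    # Assumption: this flow alone fills the bottleneck, so the bottleneck
    # rate equals the current pacing rate.
    bottleneck_bps = packet_bytes * 8 / interval_s
    # Latency added by one extra queued packet. With the assumption above it
    # is simply interval_s again, i.e. the bandwidth cancels out.
    extra_delay_s = packet_bytes * 8 / bottleneck_bps
    return interval_after_rtt_increase(
        interval_s, rtt_min_s + extra_delay_s, rtt_min_s
    )


# RTT grew from 20 ms to 30 ms: a 1 ms interval is stretched by a factor of 1.5.
new_interval_rtt = interval_after_rtt_increase(0.001, 0.030, 0.020)

# One mark for a 1500-byte packet paced every 1.2 ms with RTT_min = 20 ms.
new_interval_mark = interval_after_mark(0.0012, 1500, 0.020)
```

No window or BDP value is computed anywhere; the only state is the interval and the two RTT values.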

I do not understand why you are talking about bursts, because if the CCA regulates Inter_Packet_Interval then a burst is not possible by design.
The opposite of a burst is possible: Bottleneck_Bandwidth may change abruptly (especially on wireless), which would have an effect similar to a burst.
If Bottleneck_Bandwidth has decreased, the CCA would see ECN marks or an RTT increase.
If Bottleneck_Bandwidth has increased, the CCA should have a strategy to probe a smaller Inter_Packet_Interval periodically.
In automatic control theory, instability is possible only if the control loop is slower than the regulated system.
Then a tradeoff is possible: accuracy of regulation versus stability (do not change Inter_Packet_Interval too fast; use interpolation or averaging).
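
One possible way to express the averaging and the periodic probing is sketched below in Python. The function names, the gain of 1/8, and the 5% probe step are all hypothetical values chosen for illustration; the message does not prescribe any of them.

```python
def smooth_interval(current_s, target_s, gain=0.125):
    """Move only a fraction `gain` of the way toward the target interval."""
    return current_s + gain * (target_s - current_s)


def probe_interval(current_s, probe_factor=0.95):
    """Try a slightly smaller interval to discover newly available bandwidth."""
    return current_s * probe_factor


# The controller wants to go from 1 ms to 2 ms, but does so gradually.
interval = 0.001
history = []
for _ in range(40):
    interval = smooth_interval(interval, 0.002)
    history.append(interval)
```

A smaller gain gives a smoother but slower reaction, which is the accuracy versus stability tradeoff described above.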

I need to investigate BBR in more detail, but I had the impression that BBR does something like the above.
Eduard
-----Original Message-----
From: Bless, Roland (TM) <roland.bless@kit.edu> 
Sent: Friday, April 19, 2024 19:07
To: Vasilenko Eduard <vasilenko.eduard@huawei.com>; tsvwg@ietf.org
Subject: Re: [tsvwg] "Pacing" requirement is lost in L4S

Hi Eduard,

[sorry messed up your first name], see inline.

On 19.04.24 at 15:16 Vasilenko Eduard wrote:
> Thanks. Looks like I have understood now.
> Pacing or rate-limiting is about what is the averaging interval.
> 
> The example that I have presented initially (bump to the low-speed link) needs pacing (on the small interval).

> I still do not understand why it is so mandatory to have the "window" concept.

It is not mandatory, but useful IMHO, because window-based congestion control
  - automatically achieves stability even if window size is not perfect
  - implicitly adjusts the sending rate
  - limits the amount of inflight data -> and thus the queue length
  - if the window is too large: you get a standing queue at the bottleneck, but no instability (i.e., steadily growing queue)
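
The contrast in the last bullet can be sketched with a simple fluid model in Python. This is only an illustration assuming a single flow and a fixed bottleneck rate; the function names are hypothetical.

```python
def standing_queue_bytes(cwnd_bytes, bdp_bytes):
    """Window-based: the excess over the BDP sits in the queue and stays constant."""
    return max(0, cwnd_bytes - bdp_bytes)


def rate_based_queue_bytes(send_bps, bottleneck_bps, elapsed_s):
    """Rate-based without an inflight limit: the excess rate accumulates over time."""
    return max(0.0, (send_bps - bottleneck_bps) * elapsed_s / 8)


# Window 20% above a 125 kB BDP: a fixed 25 kB standing queue.
window_queue = standing_queue_bytes(150_000, 125_000)

# Rate 10% above a 10 Mbit/s bottleneck: the queue keeps growing.
queue_after_1s = rate_based_queue_bytes(11e6, 10e6, 1.0)
queue_after_10s = rate_based_queue_bytes(11e6, 10e6, 10.0)
```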

> Because if CC is capable of limiting the bottleneck queue in any way then "unlimited queue growth" is not possible.

That is correct, but the question is _how_ the bottleneck queue can be limited. Simply picking a "suitable" sending rate is not sufficient since burstiness may cause unlimited queue growth. As I said:
at a 100Mbit/s bottleneck, two senders with an average data rate of _exactly_ 50Mbit/s but exponentially distributed sending behaviors show enough burstiness to cause unlimited queue growth (this is also confirmed by queuing theory).
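
The queuing-theory point can be checked with a small simulation. The Python sketch below is an illustration only: it assumes 1500-byte packets, a FIFO bottleneck with an unbounded buffer, and a fixed random seed, and all names are hypothetical. It compares two perfectly paced 50 Mbit/s senders with two senders whose inter-packet gaps are exponentially distributed around the same mean.

```python
import random


def max_backlog(arrival_times, service_time):
    """Largest backlog (seconds of unfinished work) seen by a FIFO bottleneck."""
    backlog = 0.0
    last = 0.0
    worst = 0.0
    for t in sorted(arrival_times):
        backlog = max(0.0, backlog - (t - last))  # drain since previous arrival
        backlog += service_time                   # enqueue this packet
        worst = max(worst, backlog)
        last = t
    return worst


def paced_sender(n, interval, offset=0.0):
    """Packets sent at exactly regular intervals."""
    return [offset + i * interval for i in range(n)]


def exponential_sender(n, interval, rng):
    """Packets with exponentially distributed gaps and the same mean interval."""
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0 / interval)
        times.append(t)
    return times


PACKET_BITS = 1500 * 8
SERVICE = PACKET_BITS / 100e6   # time to serialize one packet at 100 Mbit/s
INTERVAL = PACKET_BITS / 50e6   # mean gap between packets at 50 Mbit/s
N = 50_000                      # packets per sender

rng = random.Random(1)
paced = paced_sender(N, INTERVAL) + paced_sender(N, INTERVAL, offset=SERVICE)
bursty = exponential_sender(N, INTERVAL, rng) + exponential_sender(N, INTERVAL, rng)

paced_worst = max_backlog(paced, SERVICE)
bursty_worst = max_backlog(bursty, SERVICE)
print(f"paced:  {paced_worst * 1e3:.3f} ms, bursty: {bursty_worst * 1e3:.3f} ms")
```

With perfectly interleaved pacing the backlog never exceeds about one packet, while with exponential gaps the worst backlog is far larger and tends to grow with the length of the run, since at 100% utilization the queue has no steady state.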

> I believe that the clear ECN signal is very good for this purpose, but observing the RTT looks not bad too (as BBR has proved).

Yes, using multiple congestion signals is always good. BTW, BBR does not use the RTT as a congestion signal, which seems to be a common misunderstanding. BBR uses an estimate of RTT_min to calculate the BDP for a flow. BBRv2/3 use ECN and packet loss (usually with a threshold of 2%) as congestion signals, but the designers deliberately do not use the measured RTT as it may include too much noise.

> If one has any of these then one does not need a "window".
> 
> "Congestion window" is a very good mechanism to get a bigger latency.

Nope, generally speaking, if you use the perfectly fitting window size there will be no queuing delay (if combined with pacing to avoid micro-bursts).
We have designed TCP-LoLa [*] that is delay- and window-based and able to limit the queuing delay to a configured bound (e.g., 5ms for all the flows at the same bottleneck).
[*] https://ieeexplore.ieee.org/document/8109356

Regards,
  Roland

> Anyway, the fact that L4S is not talking about how to mitigate burstiness is a big hole in the "Next Generation Congestion Control".
> Eduard
> -----Original Message-----
> From: Bless, Roland (TM) <roland.bless@kit.edu>
> Sent: Friday, April 19, 2024 13:57
> To: Vasilenko Eduard <vasilenko.eduard@huawei.com>; tsvwg@ietf.org
> Subject: Re: [tsvwg] "Pacing" requirement is lost in L4S
> 
> Hi Vasilenko,
> 
> On 19.04.24 at 12:01 Vasilenko Eduard wrote:
>> 1.
>> CC may measure RTT, understand buffer growth, and increase the interval between packets.
> 
> Delay-based CC is a different aspect and orthogonal to pacing, but typically pacing is also beneficial for them.
> 
>> Reacting via W or via T looks equally effective against a loaded buffer.
>> Actually, W is easy to recalculate into T. I do not see a problem.
> 
> As I said, the difference is that the sheer existence of a congestion window has a self-stabilizing effect on buffer occupancy, whereas two senders that simply send with perfect average rate shares towards a bottleneck (but with a slightly bursty sending rate, e.g., packet sending times drawn from an exponential distribution) may cause unlimited queue growth.
> 
>> 2.
>> It looks like I use the wrong terminology because "pacing" for me is exactly the situation when information is sent on some timer (that may change slowly).
>> But you use "pacing" in combination with "BBR rate-based sending" which looks to me like "pacing for pacing".
>> It looks like "pacing" is something different for you.
>> Probably it is my fault. But I have not understood.
> 
> Personally, I actually like to distinguish between rate-based sending and pacing:
> BBR calculates a target sending rate and uses paced sending to avoid burstiness even on smaller time scales. You could effectively use the same target sending rate over a larger interval, e.g., RTT_min and send it in a much more bursty manner.
> That would still be rate-based, but without pacing.
> 
> Regards,
>    Roland
> 
>>
>> Eduard
>> -----Original Message-----
>> From: Bless, Roland (TM) <roland.bless@kit.edu>
>> Sent: Friday, April 19, 2024 12:17
>> To: Vasilenko Eduard <vasilenko.eduard@huawei.com>; tsvwg@ietf.org
>> Subject: Re: [tsvwg] "Pacing" requirement is lost in L4S
>>
>> Hi Vasilenko,
>>
>> [not answering w.r.t. the Scalable requirement, but like to discuss a 
>> different point]
>>
>> I strongly disagree with your conclusion that we should forget about Window-based CC. Window-based CC has the big advantage of being self-stabilizing and thus limiting the amount of queuing delay.
>> Rate-based CCs are harder to control in this respect: the amount of queued data at the bottleneck buffer may grow over time in case you overestimated the available bandwidth.
>>
>> However, you are right that window-based CC may cause micro-bursts 
>> due to various causes (application rate limits or distorted ACK clocking).
>> While I agree that the ACK clock isn't that reliable anymore these 
>> days (see also
>> https://dl.acm.org/doi/10.1145/3371934.3371956 "Deprecating the TCP macroscopic model"), I disagree with abandoning window-based CCs due to that.
>> The counter-measure against micro-bursts and underutilization that may be caused by distorted ACK clocks is *pacing*!
>> This avoids micro-bursts and lets the sender keep sending for a limited time period even if ACKs are not on time.
>>
>> So my conclusion is that you should actually combine both:
>> *window-based CC + pacing* or, similar to BBR, rate-based sending + pacing + control of the amount of inflight data (which is similar to window-based CC).
>>
>> Regards,
>>     Roland
>>
>> On 19.04.24 at 10:39 Vasilenko Eduard wrote:
>>> I see L4S as the "Congestion Control Next Generation from IETF" (that is actually in competition with "Congestion Control Next Generation from Google").
>>> Then I see the important requirement that is missing in L4S.
>>>
>>> The primary requirement for CC is that it "should not grow the buffer on the bottleneck link".
>>> It is very disputable: is "the Scalable" requirement about it or not? Let's pretend that it is about it.
>>>
>>> Then the next critical requirement is "pacing", which is missing from L4S completely.
>>> Kudos to Google because I understood its importance after one local (but big) company tested and investigated BBRv1 (then implemented it).
>>> After tests, they concluded that pacing is even more important than low latency. (I doubt it; latency is probably more important.)
>>> Imagine that the server increases the window sharply. The server may have a 100GE interface. It could generate 10us of traffic as a burst (or even more).
>>> Intermediate links could be 100GE or even faster (highly probable), and the burst would travel as it is (without any spreading).
>>> Then this burst could arrive at a 10Mbps subscriber (good performance for shared public WiFi). The 0.01ms burst would become a 100ms burst.
>>> It would create many negative consequences for the bottleneck link:
>>> - tail drop if buffers are not enough
>>> - guaranteed huge latency
>>> Hence, we should completely forget about W (window) while discussing CC; we have to use T (time between packets).
>>> CC next generation "should avoid bursts regulating inter-packet time, not the information permitted in transit".
>>
>
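
The burst arithmetic from the first message in the thread (a 10us burst at 100GE arriving at a 10Mbps subscriber link) can be checked in a few lines of Python. This is only a worked calculation; the function names are hypothetical.

```python
def burst_bits(link_bps, burst_duration_s):
    """Amount of data sent back to back at line rate for the given duration."""
    return link_bps * burst_duration_s


def drain_time_s(bits, bottleneck_bps):
    """Time the bottleneck link needs to serialize that amount of data."""
    return bits / bottleneck_bps


bits = burst_bits(100e9, 10e-6)        # about 1e6 bits, i.e. 125 kB
delay_s = drain_time_s(bits, 10e6)     # about 0.1 s, i.e. 100 ms
stretch = delay_s / 10e-6              # the burst is stretched about 10,000 times
```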