Re: [tsvwg] "Pacing" requirement is lost in L4S

Vasilenko Eduard <vasilenko.eduard@huawei.com> Tue, 23 April 2024 14:11 UTC

Hi Neal,

The logical conclusion from your message is that BBR would be better with ECN too. No doubt ECN makes a difference (though not a big one; see the next paragraph).
Let’s compare apples to apples: “Something with ECN” against “BBR with ECN”.


  *   Ultimately, that queuing occurs because BBR employs substantial filtering of the bandwidth and delay signals it measures. And that's because public Internet paths usually have a wifi, DOCSIS, or cellular hop, and all of those technologies have very noisy delay characteristics.
Sorry, but filtering/smoothing would be needed for any CCA, even a CCA with ECN (a clear congestion signal).
The problem is that the control loop of a CCA cannot be faster than the session itself (assuming a good CCA that does not accumulate a big queue).
As you said, the radio link can change faster (locally) than the smallest control-loop latency achievable with ECN.
But good control, in control theory, requires the controller to react much faster than the controlled system. That is not possible for a CCA.

Hence, I believe that mandatory filtering/smoothing would leave no big difference between:

  1.  BBR without ECN,
  2.  BBR with ECN,
  3.  Something else with ECN.
Smoothing over a few RTTs would be needed in all cases. It would destroy the hypothetical advantage of ECN.
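To make the smoothing argument concrete, here is a minimal sketch assuming the filter is an exponentially weighted moving average (EWMA) updated once per RTT. The gains are illustrative; 1/16 is the default gain DCTCP uses (RFC 8257). The function counts how many RTTs the smoothed signal needs to cross 50% after the raw congestion signal jumps from 0 to 1. It illustrates the lag a filter adds and is not a model of any specific CCA.

```python
def rounds_to_react(gain: float, target: float = 0.5) -> int:
    """Count per-RTT EWMA updates until the smoothed signal reaches
    `target` after the raw congestion signal steps from 0 to 1."""
    smoothed = 0.0
    rounds = 0
    while smoothed < target:
        # One EWMA update per RTT; the raw signal is now constantly 1.0.
        smoothed = (1.0 - gain) * smoothed + gain * 1.0
        rounds += 1
    return rounds


for g in (1.0, 0.25, 1 / 16):
    print(f"gain={g:.4f}: {rounds_to_react(g)} RTT(s) to reach 50%")
```

With no smoothing (gain 1.0) the reaction takes one RTT; with gain 1/16 it takes about eleven, which is the kind of multi-RTT delay the argument above refers to.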

But a small advantage would remain. Hence, it is better to implement ECT(0). I just doubt that ECT(1) (for the separate queue) would be needed, especially in a fair apples-to-apples comparison of different CCAs with ECN.
Unfortunately, smoothing over a few RTTs could kill even the desire to have BBR with ECN. The world could stop at BBR without ECN.
As a reminder, many routers out there do not support ECN in their AQM. And traffic growth (a logistic curve) is almost at a plateau (+20% left), so there are fewer and fewer incentives to upgrade.

PS: By the way, I completely agree with the L4S requirement that ECN should be filtered/smoothed at the source, while the AQM should send ECN marks as soon as possible. That keeps things flexible: software on the host is easy to change.
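The split described in the PS (the network marks immediately, the sender smooths) is what DCTCP-style senders do. A minimal sketch, modeled loosely on the DCTCP estimator in RFC 8257: alpha is an EWMA of the fraction of CE-marked bytes per RTT, and the window is cut by alpha/2 when marks arrive. The class and method names are hypothetical, the starting alpha of 1.0 is an assumed conservative default, and reducing once at the end of every marked RTT is a simplification of real implementations.

```python
from dataclasses import dataclass


@dataclass
class EcnSmoother:
    """Hypothetical DCTCP-style sender-side estimator: the AQM marks right
    away, and the host smooths the marked fraction over RTTs."""
    gain: float = 1 / 16  # EWMA gain g (DCTCP default per RFC 8257)
    alpha: float = 1.0    # smoothed fraction of CE-marked bytes (assumed start)

    def on_rtt_end(self, acked_bytes: int, marked_bytes: int, cwnd: float) -> float:
        """Update alpha once per RTT and return the (possibly reduced) cwnd."""
        fraction = marked_bytes / acked_bytes if acked_bytes else 0.0
        self.alpha = (1 - self.gain) * self.alpha + self.gain * fraction
        if marked_bytes:
            # Cut the window in proportion to the smoothed extent of marking.
            cwnd *= 1 - self.alpha / 2
        return cwnd


s = EcnSmoother(alpha=0.0)
print(s.on_rtt_end(acked_bytes=100_000, marked_bytes=50_000, cwnd=100.0))
```

Because the smoothing lives in host software, the gain can be tuned or replaced without touching the AQM in the routers.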
Ed/
From: Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org>
Sent: Tuesday, April 23, 2024 16:33
To: Vasilenko Eduard <vasilenko.eduard@huawei.com>
Cc: tsvwg@ietf.org
Subject: Re: [tsvwg] "Pacing" requirement is lost in L4S

On Tue, Apr 23, 2024 at 4:13 AM Vasilenko Eduard <vasilenko.eduard=40huawei.com@dmarc.ietf.org> wrote:
Hi Neal,
Big thanks for your answer.

1.
For sure, smart people prefer cooperation to competition. But in reality CCAs compete, whether their authors like it or not.
However, I do not see the value of L4S once BBR finishes its migration (BBR would need to become the default on Windows, Linux, iOS, and Android before the transition could be called finished; carrying half of the real traffic is not enough).
Theoretically, L4S has a separate queue in which latency can be guaranteed. It should be better.
But in practice, if the Classic queue were occupied almost exclusively by BBR, what would be the additional value of a separate queue?
I strongly suspect it would be meager.
This can only be proven by tests or, better, by a pilot deployment.
If my suspicion is true, there would be no motivation for the additional hardware requirements of L4S.

The additional value of L4S is substantial, even when compared to BBR.

BBR attempts to bound the level of queuing in the bottleneck buffer. However, for multiple BBR flows that bound can cause up to around 1.5*BDP of queuing in the steady-state, with a bottleneck that doesn't support L4S. Ultimately, that queuing occurs because BBR employs substantial filtering of the bandwidth and delay signals it measures. And that's because public Internet paths usually have a wifi, DOCSIS, or cellular hop, and all of those technologies have very noisy delay characteristics.

L4S provides a precise ECN signal that allows senders to know when there is excessive queuing at the bottleneck. It takes the guesswork out of the equation and allows the queuing caused by flows (multiple flows or a single flow) to be *much* shorter: typically at worst O(milliseconds) of queuing, instead of the 1.5*BDP of queuing you would get from multiple BBR flows in a non-L4S bottleneck.
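A rough worked example of what a 1.5*BDP standing queue means in time. The 100 Mbit/s bottleneck and 40 ms base RTT are illustrative numbers, not from the thread. Since BDP = rate x RTT and the queue drains at that same rate, a standing queue of k*BDP adds k*RTT of delay regardless of link speed.

```python
def standing_queue_delay_ms(bottleneck_mbps: float, min_rtt_ms: float,
                            bdp_multiple: float) -> float:
    """Extra queueing delay (ms) from a standing queue of bdp_multiple x BDP."""
    rate_bps = bottleneck_mbps * 1e6
    bdp_bits = rate_bps * (min_rtt_ms / 1e3)   # bandwidth-delay product
    queue_bits = bdp_multiple * bdp_bits       # bytes parked in the buffer
    return queue_bits / rate_bps * 1e3         # time to drain at bottleneck rate


# Illustrative path: 100 Mbit/s bottleneck, 40 ms base RTT, 1.5*BDP queue.
print(standing_queue_delay_ms(100, 40, 1.5), "ms")
```

On this path that is about 60 ms of added delay, compared with the O(milliseconds) of queuing described for an L4S bottleneck.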

best regards,
neal


L4S looks good when compared to Reno, which in practice no longer exists.
L4S should also look good compared to CUBIC, but for unknown reasons it does not try to compare itself to CUBIC.

2.
Section 8 as a whole is about “Security”. People typically do not read that section.
The requirements are in Section 4, and it says nothing about burstiness.
The quote from Section 8.2 carries little meaning on its own and may be interpreted very loosely.
Not many people would arrive at the interpretation that you did.

I have not read the Prague CCA draft yet because it has the status of “personal opinion with zero deployments”.
It may be that it addresses burstiness properly.
I am trying to understand and predict what will happen next, and Prague does not yet look like the probable future.

Eduard
From: Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org>
Sent: Friday, April 19, 2024 17:24
To: Vasilenko Eduard <vasilenko.eduard@huawei.com>
Cc: tsvwg@ietf.org
Subject: Re: [tsvwg] "Pacing" requirement is lost in L4S

On Fri, Apr 19, 2024 at 4:39 AM Vasilenko Eduard <vasilenko.eduard=40huawei.com@dmarc.ietf.org> wrote:
Hi Experts,
I see L4S as the "Congestion Control Next Generation from IETF" (which is actually in competition with the "Congestion Control Next Generation from Google").

IMHO BBR is not in competition with L4S.

BBR is, at its core, about maintaining an explicit model of the network path using whatever signals are available, and using that model to try to achieve low/bounded delay, low loss, and high throughput.
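A simplified sketch of the kind of explicit path model described here: a windowed maximum over delivery-rate samples estimates bottleneck bandwidth, a windowed minimum over RTT samples estimates the base RTT, and the two give a BDP and a pacing rate. This is an illustration, not the Linux BBR code; the class name and the sample-count windows are assumptions (BBR's own filters are defined over round trips and seconds). These filters are also the source of the signal filtering mentioned later in the thread.

```python
from collections import deque


class PathModel:
    """Hypothetical, simplified BBR-style model of a network path."""

    def __init__(self, bw_window: int = 10, rtt_window: int = 100):
        self.bw_samples = deque(maxlen=bw_window)    # delivery rates, bytes/s
        self.rtt_samples = deque(maxlen=rtt_window)  # RTT samples, seconds

    def on_ack(self, delivery_rate: float, rtt: float) -> None:
        # Oldest samples fall out automatically once the window is full.
        self.bw_samples.append(delivery_rate)
        self.rtt_samples.append(rtt)

    # The estimators below require at least one sample (max/min of empty fail).
    def bottleneck_bw(self) -> float:
        return max(self.bw_samples)   # windowed max filter

    def min_rtt(self) -> float:
        return min(self.rtt_samples)  # windowed min filter

    def bdp(self) -> float:
        return self.bottleneck_bw() * self.min_rtt()  # bytes

    def pacing_rate(self, pacing_gain: float = 1.0) -> float:
        return pacing_gain * self.bottleneck_bw()     # bytes/s


m = PathModel()
m.on_ack(delivery_rate=1.25e6, rtt=0.05)
m.on_ack(delivery_rate=1.0e6, rtt=0.04)
print(m.bdp(), m.pacing_rate(1.25))
```

A pacing gain above 1 probes for more bandwidth and a gain below 1 drains any queue that probing created; BBRv1, for example, cycles through gains such as 1.25 and 0.75.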

L4S, IMHO, is largely about creating a low-latency, low-loss, scalable throughput service model (metaphorically a "lane") in Internet bottlenecks, and using ECN to provide a signal to achieve that.
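As an illustration of how a bottleneck might steer packets into the L4S "lane", here is a minimal sketch assuming the identifier from RFC 9331: packets carrying ECT(1) go to the L4S queue, and CE-marked packets are also treated as L4S by default. The function names are hypothetical, and the sketch ignores the additional classifiers and exceptions a real dual-queue AQM may apply.

```python
from enum import IntEnum


class Ecn(IntEnum):
    """ECN codepoints (RFC 3168) in the low two bits of the TOS/Traffic Class byte."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def ecn_field(tos_byte: int) -> Ecn:
    # The ECN field is the two least significant bits of the byte.
    return Ecn(tos_byte & 0b11)


def select_queue(tos_byte: int) -> str:
    """Hypothetical dual-queue classifier: ECT(1) and CE -> L4S, else Classic."""
    if ecn_field(tos_byte) in (Ecn.ECT_1, Ecn.CE):
        return "L4S"
    return "Classic"


print(select_queue(0x01), select_queue(0x02))
```

Under this rule ECT(0) traffic, such as classic ECN-capable TCP, stays in the Classic queue.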

There is nothing fundamentally at odds about those two models. And once the details of the Prague congestion control algorithm are finalized, one goal (as our team has mentioned for a number of years) is to have a version of BBR that can use L4S signals and coexist with Prague congestion control in the L4S lane of Internet bottlenecks.

Then I see an important requirement that is missing in L4S.

The primary requirement for a CC is that it "should not grow the buffer on the bottleneck link".
It is very debatable whether the "Scalable" requirement is about this or not. Let's pretend that it is.

Then the next critical requirement is "pacing", which is completely missing from L4S.

IMHO it is not at all fair or accurate to say that L4S misses the pacing requirement. :-)
The pacing requirement is implicit in RFC 9330, at the very least in Section 8.2, 'Latency Friendliness':
   https://www.rfc-editor.org/rfc/rfc9330.html#section-8.2
That section says: "Like the Classic service, the L4S service relies on self-restraint to limit the rate in response to congestion. In addition, the L4S service requires self-restraint in terms of limiting latency (burstiness)." The only approach I'm aware of to limit the "rate" and "burstiness" of a flow, and the "latency" that it imposes, is to use pacing.

And this is explicit in Section 2.5.1, "Packet Pacing", of the Prague congestion control draft, which is part of the L4S suite of documents:
   https://www.ietf.org/archive/id/draft-briscoe-iccrg-prague-congestion-control-03.html#section-2.5.1
That section says: "A Prague CCA MUST pace the packets it sends to avoid the queuing delay and under-utilization that would otherwise be caused by bursts of packets..."

best regards,
neal


Kudos to Google: I understood the importance of pacing after one local (but big) company tested and investigated BBRv1 (and then implemented it).
After the tests, they concluded that pacing is even more important than low latency. (I doubt it; latency is probably more important.)
Imagine that the server increases its window sharply. The server may have a 100GE interface. It could generate 10us of traffic as a burst (or even more).
Intermediate links could be 100GE or even faster (highly probable), and the burst would travel as is (without any spreading).
Then this burst could arrive at a 10 Mbps subscriber (good performance for shared public WiFi). The 0.01 ms burst would become a 100 ms burst, because 10us at 100 Gbps is 10^6 bits, which takes 100 ms to serialize at 10 Mbps.
It would have many negative consequences for the bottleneck link:
- tail drop if buffers are not big enough
- guaranteed huge latency
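The burst arithmetic above, written out as a small sketch (function names are hypothetical). A 10 us burst at 100 Gbit/s is 10^6 bits (125 kB); at 10 Mbit/s it needs 100 ms to drain, a 10,000-fold stretch, and the whole burst must sit in the bottleneck buffer or be tail-dropped while it drains.

```python
def burst_bits(line_rate_bps: float, burst_seconds: float) -> float:
    """Bits emitted by a sender transmitting back-to-back at line rate."""
    return line_rate_bps * burst_seconds


def drain_seconds(bits: float, bottleneck_bps: float) -> float:
    """Time for a slower bottleneck link to serialize the same bits."""
    return bits / bottleneck_bps


bits = burst_bits(100e9, 10e-6)                  # 10 us at 100GE
print(bits / 8 / 1e3, "kB in the burst")
print(drain_seconds(bits, 10e6) * 1e3, "ms to drain at 10 Mbps")
```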
Hence, we should completely forget about W (window) when discussing CC; we have to use T (time between packets).
Next-generation CC "should avoid bursts by regulating the inter-packet time, not the amount of data permitted in transit".
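A minimal sketch of sending driven by T rather than W: each packet's departure time is spaced by the gap T = packet size / pacing rate, instead of releasing a whole window back-to-back. The function name is hypothetical, and so is the choice of 10 Mbit/s as the pacing rate (borrowed from the subscriber example above); a real sender would derive the rate from its own estimate of the path.

```python
def paced_departures(num_packets: int, packet_bytes: int,
                     pacing_rate_bps: float, start: float = 0.0) -> list[float]:
    """Departure times (seconds) with a fixed inter-packet gap
    T = packet size / pacing rate."""
    gap = packet_bytes * 8 / pacing_rate_bps
    return [start + i * gap for i in range(num_packets)]


# Ten 1500-byte packets paced at 10 Mbit/s: one packet every 1.2 ms.
times = paced_departures(10, 1500, 10e6)
print([round(t * 1e3, 1) for t in times])
```

A window-only sender would hand all ten packets to the NIC at time 0; the paced sender spreads them over 10.8 ms, so they arrive at the bottleneck no faster than it can forward them.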

Best Regards
Eduard Vasilenko
Senior Architect
Network Algorithm Laboratory
Tel: +7(985) 910-1105