Re: [tsvwg] QUIC with L4S

Sebastian Moeller <moeller0@gmx.de> Thu, 07 July 2022 09:24 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <9fa10cc8-5634-d170-0431-610b809e4968@huitema.net>
Date: Thu, 07 Jul 2022 11:23:55 +0200
Cc: "tsvwg@ietf.org" <tsvwg@ietf.org>
Message-Id: <A301153A-6128-4396-84A7-3F8B80F55F88@gmx.de>
References: <AM8PR07MB8137710DD707BF2DB4FDEA9DC2B89@AM8PR07MB8137.eurprd07.prod.outlook.com> <AM8PR07MB81378A432301907ABFDFC2B8C2B89@AM8PR07MB8137.eurprd07.prod.outlook.com> <77332295-c7b7-21aa-7661-af5770b4c249@huitema.net> <FD39D53A-0A47-4609-930A-DFD6526CA49B@gmx.de> <9fa10cc8-5634-d170-0431-610b809e4968@huitema.net>
To: Christian Huitema <huitema@huitema.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/QgE0TonncY5rhsVEyCSDq0iiVjs>

Hi Christian,


> On Jul 6, 2022, at 23:19, Christian Huitema <huitema@huitema.net> wrote:
> 
> 
> 
> On 7/6/2022 12:09 AM, Sebastian Moeller wrote:
>>> We may also need to do some work on "slow start". My implementation exits slow start on the first ECN-CE mark, but at that point there is a full flight of packets in transit, which is going to cause either packet losses or spikes in latency.
>>> 
>> 	[SM] Is this actually fixable? This seems to require an oracle, or alternatively a quick side-effect-free method to empirically estimate a given flow's immediate path capacity. Then again, current Prague essentially only works reasonably well on short-RTT paths without plain-FIFO bottlenecks, so the scenarios where L4S signaling has been demonstrated to work (to some degree) will not suffer from too much data in flight, no?
> My preferred fix would use a two-epoch cycle: send data for one epoch at a tentative high rate, then use a slower rate for the next epoch while the feedback from sending packets at the high rate is collected.

	[SM] For that to beat current slow start, the high-rate epochs need to be considerably faster than today's (because we want to increase the average rate over the slow-start period, or rather shorten the "race to the capacity limit"). That will cause problems: on nodes where such flows meet and duke it out, traffic will become more volatile (and potentially more prone to oscillation/"resonance" phenomena). I am not saying that faster-than-current slow start is not desirable or not possible, just that it will likely come with its own side effects.
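The two-epoch cycle Christian describes might be sketched roughly as below. Everything here (function names, the probe gain, the CE-mark exit threshold) is my own illustrative assumption, not an actual Prague or QUIC implementation:

```python
def two_epoch_slow_start(base_rate, sample_feedback, probe_gain=2.0,
                         ce_limit=0.05, max_cycles=10):
    """Ratchet the rate upward in two-epoch cycles until feedback objects.

    sample_feedback(rate) models one cycle: send at `rate` for one epoch,
    fall back to the previous safe rate for the next epoch while feedback
    arrives, and return the fraction of packets that came back CE-marked.
    """
    rate = base_rate
    for _ in range(max_cycles):
        probe_rate = rate * probe_gain
        ce_fraction = sample_feedback(probe_rate)
        if ce_fraction > ce_limit:   # the probe overshot: exit "slow start"
            return rate              # last rate known to be plausible
        rate = probe_rate            # plausible rate found; probe higher
    return rate
```

The point of the structure is "make before you break": each rate increase is only committed to after a feedback epoch at the previous safe rate.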


> Based on feedback, either find a plausible rate and get out of slow start, or repeat the two-epoch cycle with a higher tentative rate. This might be combined with creative use of pacing to bunch lots of packets in a burst, then use the transmission time of the burst to assess the high data rate.

	[SM] Well, packet pairs or packets of variable size can both be used to deduce throughput capacity, but neither is super robust or reliable; e.g., any parallel path (a bonded link that does not hash full flows persistently to the same member) can introduce reordering and remove the inter-packet delay as a reliable throughput correlate (similar problems are introduced by all processes/nodes that introduce packet bunching/bursts*). All of which I believe is old news... In the context of L4S there was an attempt at having another go at this issue, I think called "paced chirping", but all that has led to so far is a thesis and an IEEE conference paper (and maybe a few more). So IMHO it is still unclear how robustly this approach works in general over the existing internet (though I do hope it achieves its goals).


*) Like WiFi, or G.INP retransmissions in DSL, I am sure other physical layers with retransmit will have similar properties.
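For what it is worth, the basic packet-pair estimate itself is simple; the fragility discussed above is exactly that the inter-arrival gap gets distorted. A hedged sketch (my own illustration, not any deployed algorithm) that takes the median over many pairs to dampen, though not remove, that distortion:

```python
def packet_pair_estimate(arrival_times_s, packet_size_bytes):
    """Estimate path capacity in bits/s as packet size / inter-arrival gap.

    Uses the median gap over all pairs; reordering (negative gaps) is
    discarded, but bunching from WiFi aggregation or link-layer retransmit
    still skews the remaining gaps.
    """
    gaps = [b - a for a, b in zip(arrival_times_s, arrival_times_s[1:])
            if b > a]
    if not gaps:
        return None
    gaps.sort()
    median_gap = gaps[len(gaps) // 2]
    return packet_size_bytes * 8 / median_gap
```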

> That would result in a "make before you break" approach, avoiding too much queuing and too many losses during initial ramp up, which would be nicer for the competing users of the bottleneck.

	[SM] Yes, that would be fine, but the problem is that the measure it wants to use is already known to be unreliable on short time scales (as so often, averaging will help, but costs time).

> BBR uses a similar approach when "searching for bottleneck bandwidth", but it only seeks a maximum 25% improvement in two epochs. That would probably be too slow, and implementations would demand something faster for slow start. And of course we should remember that most connections manage to send all their data before exiting slow start...

	[SM] I am probably wrong, but I thought both BBR and traditional slow start double the rate/cwnd every round/RTT, so it seems that BBR has not "fixed" the slow-start issue (caveat: doubling the sending rate is unlikely to be exactly equivalent to doubling the cwnd, but I naively? assume they will be somewhat similar in effect).
	There will always be flows that finish before a reliable estimate of the flow's share of capacity is available. So the question IMHO becomes: are the side effects of making start-up more aggressive (it is not that exponential growth is not plenty aggressive already*) worth the benefit of speeding up short flows?

*) and it can easily be scaled up by using another factor, say quadrupling each round.
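To put numbers on that footnote: the round count to reach a given capacity shrinks only logarithmically, so scaling the growth factor buys relatively little per unit of added aggressiveness. A toy calculation (illustrative only):

```python
import math

def rounds_to_capacity(init_cwnd, capacity_cwnd, growth_factor):
    """Rounds of multiplicative growth needed for cwnd to reach capacity."""
    return math.ceil(math.log(capacity_cwnd / init_cwnd, growth_factor))
```

For example, growing a cwnd of 10 packets to 10,000 takes 10 rounds at the standard doubling, and still 5 rounds when quadrupling each round.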



> 
>>> I also had to try a couple of different settings for the threshold value used in the simulation. Too low, and the throughput drops. Too high, and the amount of losses increases too much. In the tests, the router has a queue size of 1 BDP, and setting the threshold to BDP/4 appears to work well.
>>> 
>> 	[SM] I thought both legs of an L4S-AQM engage based on sojourn time (either measured directly or estimated from queue size and known egress rate), why does the maximal length of the queue matter here? What AQM are you using?
> I was simulating a straight L4S AQM, with the binary marking: keep ECT0

	[SM] Confused, should that not be ECT(1) for L4S? For ECT(0) the expected behavior would be an RFC 3168 compliant response, IIUC.


> if no queue, switch to CE if queue longer than threshold.

	[SM] Okay, so that uses queue length (in bytes, I assume) as a proxy for sojourn time (which, for a fixed egress rate, it will be).
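As I understand the described marker, it is just a step function over queue length. A minimal sketch, with the BDP/4 threshold from Christian's simulation plugged in purely for illustration:

```python
def l4s_step_mark(queue_bytes, threshold_bytes):
    """CE-mark an arriving L4S packet iff the instantaneous queue exceeds
    the step threshold (queue length standing in for sojourn time)."""
    return queue_bytes > threshold_bytes
```

With a 1 BDP buffer and a threshold of BDP/4, packets are marked whenever more than a quarter of the buffer is occupied.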

> The overall buffer size is only used when not using Prague, or during slow start. The issue was "where to set the threshold". Most of the complexity comes from the interaction between the L4S AQM and the typical "leaky bucket" pacing used by the transport: we can get lots of CE marks if the leaky bucket size is larger than L4S threshold at the bottleneck. And of course the network admin setting the threshold does not know what parameters the end-to-end transport used for pacing. And vice versa.

	[SM] I interpret that as an example of L4S' intolerance to bursty traffic; it is interesting that the small burstiness inherent in a token-bucket pacer is apparently enough to cause noticeable performance degradation. Though not surprising given Pete's data (https://github.com/heistp/l4s-tests#underutilization-with-bursty-links).
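The interaction Christian describes can be shown with a toy queue: a leaky-bucket pacer releasing a back-to-back burst into an initially empty step-marking queue draws CE marks on every packet whose arrival pushes the backlog past the threshold, regardless of the average rate being fine. All parameters below are invented for illustration:

```python
def ce_marks_from_burst(burst_packets, pkt_bytes, drain_bytes_per_pkt_time,
                        threshold_bytes):
    """Count CE marks when a paced burst hits an empty step-marking queue.

    Each step: one packet arrives, the step marker checks the backlog
    against the threshold, then the bottleneck drains what it can in one
    packet-transmission time.
    """
    queue = 0
    marks = 0
    for _ in range(burst_packets):
        queue += pkt_bytes
        if queue > threshold_bytes:      # step marker fires on backlog
            marks += 1
        queue = max(0, queue - drain_bytes_per_pkt_time)
    return marks
```

With, say, a 10-packet burst of 1500 B packets arriving at twice the drain rate and a 4500 B threshold, the tail half of every burst gets marked even though the queue fully drains between bursts.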

Regards
	Sebastian


> 
> -- Christian Huitema
> 
>