Re: [tcpm] CUBIC rfc8312bis / WGLC Issue 2

Michael Welzl <michawe@ifi.uio.no> Fri, 29 July 2022 06:51 UTC

From: Michael Welzl <michawe@ifi.uio.no>
Message-Id: <AA49A70E-C66C-4715-BE45-86C74D603FEF@ifi.uio.no>
Date: Fri, 29 Jul 2022 08:50:49 +0200
In-Reply-To: <alpine.DEB.2.21.2207232057050.7292@hp8x-60.cs.helsinki.fi>
Cc: Yoshifumi Nishida <nsd.ietf@gmail.com>, "tcpm@ietf.org Extensions" <tcpm@ietf.org>
To: Markku Kojo <kojo@cs.helsinki.fi>
References: <CAAK044QZxWR6EMi6x+KFWrzkx885BnoQAAbPLqf-EqRHOc_htw@mail.gmail.com> <DAED20B6-5EC1-41E2-94A5-DD1A8D671962@ifi.uio.no> <alpine.DEB.2.21.2207232057050.7292@hp8x-60.cs.helsinki.fi>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/zbzTtwvD8E3Jfm27zbzzPwobtkM>
Subject: Re: [tcpm] CUBIC rfc8312bis / WGLC Issue 2

Hi !


>> Now, Markku’s calculations (about cwnd overshoot) are all correct if everything is paced (which is why the
>> bbrv2 strategy mentioned by neal sounds just right!), but without pacing, the point of first loss depends on
>> the burst size (of all flows) vs the bottleneck queue length, and as a result, cubic (or reno, for that
>> matter) may exit SS much earlier than what Markku describes, I think. Hence somehow being bursty “helps” a
>> little here.
> 
> I believe you mean when the network paces the packets, i.e., when the packet delivery rate is more or less fixed such that Acks arrive nicely paced?

Actually, no, I mean “real” pacing, where packets are artificially delayed, e.g. spread across the RTT.

Without this, under idealized conditions, this is what happens (note that “idealized” also means I ignore DelACK / ABC effects - they will distort what I describe here to some degree, but in essence it’s similar; I’ll also note that I did see this behavior in a testbed - idealized and controlled, sure, but with actual Linux hosts - and this assumes a FIFO queue, of course):

Say, we begin with IW 10, and these 10 packets are sent out as a burst. Then, 10 ACKs arrive “nicely paced” (by the network - i.e. reflecting the bottleneck's capacity). Then, ACK clocking would ensure that the next packets would be sent out exactly at line rate **if the number of packets didn't grow**.  However, SS injects an extra packet after every packet that ACK clocking “allows” - we all know that cwnd becomes 20 in this round. So, there is a burst of 20 packets, and exactly 10 (the new 10) of these 20 packets exceed the line rate when the burst hits the bottleneck. There, they enter the queue.

So, slow start is indeed quite likely to be terminated early, because this queue exceeds the limit long before the cwnd really reflects 2 * the network’s capacity. It depends on the queue length.
To bring this back to reality, I’ll add that our experience with pacing is indeed that slow start can go on much longer than it normally does.
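
To make the “it depends on the queue length” point concrete, here is a tiny back-of-the-envelope sketch of the idealized model above (Python; all names and numbers in it are mine): ACKs arrive spaced at the bottleneck rate, every ACK releases two back-to-back packets, so the FIFO queue grows by roughly one packet per ACK during each round's burst phase, on top of whatever standing queue the previous round left behind:

    # Idealized, un-paced slow start over a FIFO tail-drop bottleneck.
    # Peak queue in a round with congestion window "cwnd" (in packets):
    # the previous round leaves a standing queue of max(0, cwnd/2 - BDP),
    # and this round adds one extra packet per ACK, i.e. cwnd/2 more, so
    # peak ~ max(cwnd/2, cwnd - BDP).
    def cwnd_at_first_loss(bdp_pkts, buffer_pkts, iw=10):
        """cwnd (packets) of the first slow-start round whose peak queue
        occupancy exceeds the bottleneck buffer."""
        cwnd = iw
        while True:
            peak_queue = max(cwnd // 2, cwnd - bdp_pkts)
            if peak_queue > buffer_pkts:
                return cwnd
            cwnd *= 2

    bdp = 1000                      # e.g. ~100 Mbit/s x 120 ms at 1500 B
    for buf in (50, 250, 1000):     # shallow buffer ... 1 BDP of buffering
        print(buf, cwnd_at_first_loss(bdp, buf))   # -> 160, 640, 2560

So with a shallow buffer the first drop comes several doublings before cwnd gets anywhere near 2 * BDP; only with about a BDP of buffering does the first drop wait until cwnd is in the region of 2 * BDP (or a bit beyond, since cwnd only takes the values 10 * 2^k here).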


> If the sender does the pacing, that should help only during the initial RTT, as after the bottleneck is fully utilized, packets are automatically paced (if the network delivers packets nicely at a fixed rate).
> 
> Sure, I think the exit may be either earlier or later! A large enough back-to-back burst with a shallow bottleneck queue may possibly have a notable effect and make the SS exit occur somewhat earlier.

It depends on the queue length, of course...


> However, if in deterministic network conditions a sender injects 100% more packets during the last RTT of SS than what the available bandwidth allows, that results in 40% overload (undelivered packets) during the RTT following the SS (with Beta=0.7).

Just to be clear: I think that particular statement only works out as you describe when packets truly are paced (not by the network, but by the sender injecting extra delay). Again, this matches our practical experience with pacing in our testbed. I’ll also add that the above description plays out like you describe even without sender-side pacing when there’s exactly 1 BDP of queuing - see below. In practice, this may be a corner case, because network administrators hardly know the actual end-to-end BDP.


> Now, having zero undelivered pkts with Beta=0.7 in slow start would require that the SS exit in the last RTT of SS occurs before the sender has injected more than 43% beyond the available bandwidth. Otherwise, it will inject at least some undelivered packets. Sending any undelivered packets is unadvisable.

I agree with that…
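
For concreteness, a quick sanity check of those two numbers, under the fully-paced/deterministic assumption (just a sketch, in Python only because it is handy):

    beta = 0.7
    capacity = 1.0                 # normalize the per-RTT capacity to 1
    cwnd_exit = 2.0 * capacity     # 100% overshoot in the last RTT of SS
    print(beta * cwnd_exit / capacity - 1.0)   # ~0.4  -> 40% overload after the backoff
    # Zero overload after the backoff needs beta * cwnd_exit <= capacity,
    # i.e. an SS exit no later than 1/beta times the available bandwidth:
    print(1.0 / beta - 1.0)                    # ~0.43 -> at most 43% beyond capacity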


> It would be nice to see measurement data from various networks showing how much the SS exit point varies in reality.

…. and with that!!!   It would be particularly nice to see the cwnd at SS exit vs. the later cwnd at the top, in CA - because the latter is likely to reach the capacity limit, whereas I claim that the cwnd at SS exit is not.


> Also, we should remember that there certainly are network paths that represent the deterministic/paced behaviour and CC algos must work appropriately in a wide range of environments.
> 
> On the other hand, most (if not all) AQMs are very sluggish in reacting to the rapid ramp-up in slow start. They often react only after the sender has increased its cwnd well beyond the average dropping (marking) point, i.e., the actual cwnd at the time the congestion is signalled is often (much) more than double the (average) saturation point. And most of these excess packets are queued because there are only a few losses (or just marks) until the physical queue is exhausted. CUBIC specifies Beta = 0.7 also when in slow start and ECN is enabled, while ABE (RFC 8511) does not specify larger Beta in slow start, only in CA.

Yes, I’m aware of that   :-)


> I couldn't access the papers ABE cites, but I believe larger Beta in slow start resulted in a prolonged delay peak after the slow start, indicating exactly the same overload that results in undelivered packets in a tail-drop queue?

The main paper is this one, which is openly available:  https://folk.universitetetioslo.no/michawe/research/publications/Networking2017ABE.pdf
Now, here, the overshoot depicted in Fig. 3 is as you say - but we also had exactly a BDP of queuing, which indeed yields the behavior that you describe, and that’s really only the case for this particular queue size.
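
To illustrate why the queue size matters so much here, a sketch under the usual assumption that, for a single flow into a drop-tail queue, cwnd at the overflow point is roughly BDP + buffer (the numbers below are made up):

    # Standing queue left after a multiplicative decrease, relative to the BDP.
    # The queue drains completely only if beta * (BDP + buffer) <= BDP,
    # i.e. buffer <= BDP * (1 - beta) / beta  (~0.43 * BDP for beta = 0.7).
    bdp = 1.0
    for beta in (0.5, 0.7):
        for buf in (0.25, 0.43, 1.0):         # buffer sizes in units of BDP
            leftover = beta * (bdp + buf) - bdp   # > 0: a standing queue remains
            print(beta, buf, round(leftover, 3))

With beta = 0.7, any buffer beyond ~0.43 * BDP leaves a standing queue (and hence a prolonged delay peak) after the backoff, and with a full BDP of buffering the leftover is 0.4 * BDP - the same magnitude as the 40% overload discussed above. With a smaller buffer the picture looks quite different, which is why I wouldn't generalize from that one queue size.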


> With Beta 0.7 during SS and an AQM at the bottleneck, we are likely to see longer delay spikes due to slow-start overshoot?
> 
> Maybe Michael has some insights into the results/reasons behind the ABE decision? Nevertheless, what is the justification for CUBIC to use Beta=0.7 also in slow start with ECN enabled, while ABE does not?

I don’t remember what the reasons were, but it may well have been you (rightfully, in this case!) complaining about it!   :-)


>> Not sure what recommendation to take from this….. but I think it’s why the current 0.7 choice often works
>> reasonably well in practice.
> 
> Do we possibly have any measurement data to back up "0.7 choice often works reasonably well in practice"?

I wrote this under the assumption that this is how everyone’s Cubic implementation already does it. Is that wrong? Indeed, does anyone have data?  (I don’t, sorry)

Cheers,
Michael