Re: [tcpm] alpha_cubic (Issue 1)

Bob Briscoe <ietf@bobbriscoe.net> Wed, 03 August 2022 10:51 UTC

Date: Wed, 03 Aug 2022 11:51:41 +0100
From: Bob Briscoe <ietf@bobbriscoe.net>
To: Markku Kojo <kojo@cs.helsinki.fi>
Cc: Yoshifumi Nishida <nsd.ietf@gmail.com>, Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org>, "tcpm@ietf.org Extensions" <tcpm@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/FXRFh22gZ2tuo2ZiZ5X7JCAxTAo>

Markku, [Re-sending with message history trimmed following "too large" 
bounce message]

See [BB] inline...
Sorry - I marked your email from March '22 below as ToDo, then it got lost
in my ToDo list.

On 22/03/2022 16:47, Markku Kojo wrote:
> Hi Yoshi, Bob, all,
>
> below please find my understanding of and feedback on Bob's report
> (https://raw.githubusercontent.com/bbriscoe/cubic-reno/main/creno_tr.pdf). 
> This hopefully also clarifies what I meant when saying that the 
> equation [in the draft] assumes equal drop probability for the 
> different values of alpha and beta but the drop probability actually 
> changes when these values are changed from alpha=1, beta=0.5. I have
> also thought about this a bit more after reading Bob's report and include
> my understanding of why the performance of C-Reno and Reno differs in the
> case of an AQM.
>
> In brief, the model for steady state behavior for the Reno-friendly 
> region in a tail-drop queue indeed is incorrect when the additive 
> increase factor alpha < 1 is applied for an alternate AIMD(a, b) TCP. 
> This is because of the packet dynamics and behavior of TCP, which injects
> more packets only when cwnd has increased by at least one MSS. This
> differs from the assumptions of the model and what Bob has used in his
> analysis (see more below). Also, the AQM case remains unvalidated; my
> reasoning differs from, or somewhat extends, the reasoning that Bob gave
> in his report.
>
> The math itself in the original paper [FHP00], on which the CUBIC
> TCP-friendly region is based, seems correct, as does the math in
> Bob's report and in Yoshi's explanation below:
>
>>  Yoshifumi Nishida <nsd.ietf@gmail.com> wrote:
>>
>> Also, in one congestion epoch, (α=1.0,  β =0.5) increases 0.5 W1 while
>> (α=X,  β =0.7) increases 0.3 W2
>> Now, if we define the congestion epoch as e RTTs, e = 0.5 W1 / 1.0  
>> and also e = 0.3 W2 / X
>> This means, X should satisfy
>>   0.5 W1 /1.0  = 0.3 * 1.5/1.7 W1/ X
>> then we get X = 0.529.
>
>> This will mean if we launch (α=1.0,  β =0.5) and (α=0.529,  β =0.7)
>> at the same time and reduce them when the sum of their windows
>> becomes Wmax, the throughput of them will mostly be the same.
>
> I'll try to explain the problems with the model separately for a 
> tail-drop and AQM queue.
>
> 1. Tail-drop queue (*)
>
> The problem with the use of the formula in the original paper, as well
> as in Bob's and Yoshi's analyses, is, as often happens, that a
> theoretical model that has been derived in one context under certain
> assumptions is reused in another context where the assumptions
> do not hold, i.e., the model does not match reality.
>
> In this case, the correct use of the formula presumes that the drop
> probability is the same (on average) for a standard TCP =
> AIMD(1,1/2) and AIMD(a,b). That is implicitly assumed also in Bob's
> report for the synchronized (tail-drop) case, as Bob's math is derived
> using the same number of packets over a cycle (= in between drops,
> which is the reciprocal of the drop rate).

[BB] That's incorrect, I'm afraid (*#1*). I was careful not to assume the
same drop probability. Quoting from the assumptions that I stated:

        All the flows are synchronized so that, whenever
        one flow experiences loss the others do too.
        No assumption is made about how much loss
        occurs at each congestion event, except that
        all flows experience it and they only respond
        to the presence of loss, not its extent.

The two parts of loss probability are:
     (no. of losses per event) / (number of packets between events).

Both the top and the bottom can be different for each flow under the above
assumptions, so there was no assumption of the same loss probability here.

Nonetheless, below I relax the only assumption ("all flows experience 
it") that is occasionally not true (i.e. an approximation).

> (Btw, involving RTT in the calculations is unnecessary as all 
> competing flows see the same queue on every RTT, so RTT is the same 
> for all flows sharing the same queue, assuming the base RTT for the 
> flows is the same). Also, Yoshi's calculations above effectively make 
> the same assumption that both flows reduce cwnd at the same time 
> ("when the sum of their windows becomes Wmax").

[BB] This is also incorrect (*#2*). In the time of one cycle, a flow 
with a shorter RTT would increase more than a longer RTT flow of the 
same type. So the balance would be different.

>
> To understand the difference, we need to look at the packet dynamics 
> of flows competing in a tail drop queue, not just the math. The 
> assumption of equal drop probability holds for any two (or n) AIMD(1, 
> 1/2) TCPs competing on a shared bottleneck: when the bottleneck tail 
> drop buffer becomes full towards the end of a cycle (on RTT n), both 
> TCPs increase their cwnd by 1 for the RTT n+1 and inject two 
> back-to-back pkts resulting in a drop of 1 pkt each (the latter pkt of 
> the two back-to-back pkts = the excess pkt according to packet 
> conservation rule becomes dropped). When an AIMD(1, 1/2) TCP competes 
> with an AIMD(a=0.53, b=0.7) TCP (=CUBIC) and the tail drop buffer 
> becomes full on RTT n, the AIMD(1,1/2) TCP increases its cwnd by 1 on 
> RTT n+1 and gets 1 pkt dropped. However, AIMD(0.53,0.7) TCP increases 
> its cwnd by 0.53 and only injects an excess packet on RTT n+1 roughly 
> every second cycle because a TCP sender does not inject an additional 
> packet until cwnd has increased by at least by 1 MSS.
>
> Hence, in a synchronized case the AIMD(0.53, 0.7) TCP avoids a packet 
> drop and cwnd reduction roughly every second cycle and is able to 
> extend its cycle because the competing AIMD(1, 1/2) TCP reduced its 
> cwnd on RTT n+1 and the bottleneck buffer is no longer full on RTT
> n+2 when the AIMD(0.53, 0.7) TCP injects its additional packet (and would
> encounter a pkt drop only if the queue is full). Because the cycle is
> systematically extended in this way for AIMD(0.53, 0.7) (=CUBIC in
> Reno-friendly mode), it increases the average number of pkts between
> drops and thereby decreases the drop probability of CUBIC to be lower
> than that of the competing AIMD(1, 1/2) TCP. These packet dynamics
> break the model.

[BB] The likelihood of a flow experiencing a loss is not dependent on 
how much the flow /itself/ increases per round. Once the buffer is full, 
the probability that any one flow catches the next loss depends only on 
its packet arrival rate relative to the total packet arrival rate. The 
buffer doesn't know which flow is increasing the queue more than
another. So the whole of the above paragraph is also incorrect (*#3*).

Nonetheless, you are right that a packet-level analysis is necessary.
Let's consider two flows: 1 Reno and 1 CUBIC in Reno-Friendly mode 
(CReno), and we're still assuming equal RTTs. The queue grows by (1 + 
0.53) seg/RTT between responses to losses. Once the buffer is full, one 
packet has to be dropped, but the queue continues to grow by 1.53 
segments during the next round trip (until the resulting response 
reaches the queue). So it is likely that another packet will have to be 
discarded within the same RTT as the first. The ACK clock is rarely 
completely smooth and even, so this should be thought of as a 
probabilistic process with mean 1.53, not a completely smooth growth 
with either 1 or 2 losses.

As I just said, the likelihood that one flow catches one of the losses 
depends on its packet rate relative to the other.
If the flows both have the same average window
     (which is the goal we are aiming for, as drawn in Fig 1 of the 
paper at 
https://raw.githubusercontent.com/bbriscoe/cubic-reno/main/creno_tr.pdf )
then, by triangle geometry, the ratio between the packet rates of the 
two flows when they both reach their max is
     r_r / r_c = (1 + b_c) / (1 + b_r)
               = 1.7 / 1.5
              ~= 1.13.

Incidentally, you said the ratio of Reno to Cubic drops is 1 / 0.53 = 
1.9, which is also incorrect (*#4*) and greatly exaggerated compared to 
1.13.

So, when a loss occurs:
     p_r + p_c = 1          (1)
     p_r / p_c = 17/15      (2)
Therefore
     p_r = 17/32 ~= 53%
     p_c = 15/32 ~= 47%
So, if there are 2 losses for instance, the probabilities of each
combination of the two losses within the round of the congestion event are:
     p_rr          = 53% * 53%            = 28%
     p_rc or p_cr  = 53%*47% + 47%*53%    = 50%
     p_cc          = 47% * 47%            = 22%
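
For concreteness, here's a quick numeric check of the figures above (a
minimal Python sketch using the round parameter values from this thread;
it's only my arithmetic, not part of the report):

    from fractions import Fraction

    # AIMD decrease factors: Reno beta = 1/2, CReno beta = 0.7.
    b_r, b_c = Fraction(1, 2), Fraction(7, 10)

    # Yoshi's derivation quoted earlier: equal average windows and equal
    # epoch lengths give alpha_cubic = (1-b_c)/(1-b_r) * (1+b_r)/(1+b_c).
    alpha_cubic = (1 - b_c) / (1 - b_r) * (1 + b_r) / (1 + b_c)
    print(float(alpha_cubic))          # ~0.529

    # Ratio of the peak packet rates when both flows have the same
    # average window (the triangle geometry above).
    ratio = (1 + b_c) / (1 + b_r)      # r_r / r_c
    print(float(ratio))                # ~1.13 (not 1/0.53 ~= 1.9)

    # Probability that each flow catches a given loss, and the ways two
    # losses in one round can be shared between the flows.
    p_r = ratio / (1 + ratio)          # = 17/32 ~= 53%
    p_c = 1 - p_r                      # = 15/32 ~= 47%
    print(float(p_r * p_r))            # both hit Reno   ~28%
    print(float(2 * p_r * p_c))        # one hit each    ~50%
    print(float(p_c * p_c))            # both hit CReno  ~22%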

When the same flow is hit twice in the same round, it doesn't reduce any 
more than if it's hit once, but the other flow doesn't reduce at all (if 
there are no more than two losses in the round). So CReno is somewhat 
more likely than Reno to not get hit in some rounds. In such cases, only 
the Reno flow would reduce, then the queue would continue to grow by 
1.53 pkt/RTT, so the next cycle would be shorter and the CReno flow 
would be much more likely to be hit when it next filled the buffer --- 
and more likely to be hit twice.

It would be possible to calculate the average rate of each type of flow 
by calculating the probabilities of each chain of events 
programmatically. However, such precision is unnecessary. For the case 
of tail-drop buffers, it will be sufficient to say:
* either that the AI factor of CReno should be slightly lower than 0.53 
to make CReno and Reno flow rates more precisely equal;
* or that the average rate of CReno flows will be slightly higher in 
comparison with Reno flows, if 0.53 is used.
Here, 'slightly' means roughly within a 10% margin of error.
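
(In case anyone wants to see what such a chain-of-events calculation might
look like, here is a rough Monte-Carlo sketch. It is only an illustration
under the assumptions in this email - equal RTTs, 1 or 2 losses per event
with mean ~1.53, each loss assigned in proportion to the flows'
instantaneous windows, and the decrease applied in the same round. The pipe
size and starting windows are arbitrary numbers of my own, not anything
taken from the report.)

    import random

    random.seed(1)
    PIPE = 200.0                          # assumed BDP + buffer, in segments
    a = {'reno': 1.0, 'creno': 0.53}      # additive-increase factors
    b = {'reno': 0.5, 'creno': 0.7}       # multiplicative-decrease factors
    w = {'reno': 60.0, 'creno': 60.0}     # arbitrary starting windows
    sent = {'reno': 0.0, 'creno': 0.0}    # cumulative segments sent

    ROUNDS = 500_000
    for _ in range(ROUNDS):
        for f in w:
            sent[f] += w[f]
            w[f] += a[f]                  # additive increase, once per RTT
        if w['reno'] + w['creno'] > PIPE: # buffer full: congestion event
            losses = 1 + (random.random() < 0.53)  # 1 or 2 losses, mean ~1.53
            hit = set()                   # a flow hit twice reduces only once
            for _ in range(losses):
                r = random.random() * (w['reno'] + w['creno'])
                hit.add('reno' if r < w['reno'] else 'creno')
            for f in hit:
                w[f] *= b[f]              # one multiplicative decrease
    for f in w:
        print(f, sent[f] / ROUNDS)        # average window in segments per RTT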

>
> So, the basic problem is that the model used for the Reno-friendly 
> region obviously is not valid. 

[BB] Why are you so keen to exaggerate? We've found that the model in 
the original paper is not exact, but it's a reasonable approximation. 
Why the need to make out that the sky is falling?

> The results in the original paper [FHP00] do not support the validity 
> of the model either and I am not aware that the model would have been 
> validated by anyone.
>
> A good, unresolved question is why the simulation results in the 
> original paper [FHP00] gave lower throughput for AIMD(1/5, 7/8) 
> (denoted as AIMD(1/5, 1/8) in the original paper) for the AQM case.
> Note that the paper did not give lower throughput for AIMD(1/5, 1/8) 
> with a tail drop bottleneck as the results in the paper were shown 
> only for an AQM bottleneck and the paper just said the results for 
> tail drop are similar, which is a vague expression; it may also mean
> that the results were reversed with a similar difference in throughput.
> Anyway, the results obviously did not validate the model.

[BB] Yes, the simulations in that paper are hard to explain. They used 
RED, which we no longer use, so they're not interesting anyway. I wrote 
a short alternative report that we can use instead. We don't need to 
hold up this draft because of a suspect 22-year-old paper that we don't
need to refer to any more. Let's move on please.

>
> Bob's explanation for the reason why C-Reno's packet rate decreases is 
> correct, but what would happen if the buffer was not deep enough and
> drained out does not seem entirely correct. The packet rate is the relative
> share on a shared bottleneck link with fixed capacity. The cwnd of the 
> C-Reno flow is larger than that of Reno in the beginning of the cycle, 
> becomes equal in the middle, and falls increasingly behind for the
> rest of the cycle. Therefore, C-Reno's *proportional* share decreases
> during the cycle; the overall throughput is the same all the time and
> bounded by the link capacity.
>
> If the buffer was not deep enough and drained out, the relative packet 
> rate between the two flows still behaves the same even if there is no 
> queue and also when the link is underutilized for the beginning of the 
> cycle. When the link is underutilized, the pkt rate increases for both 
> flows until the link becomes fully utilized (note that RTT does not 
> change when the link is underutilized). The pkt rate of C-Reno would
> increase more slowly, though, but that won't change its proportional share,
> i.e., C-Reno will not suffer more, unlike what Bob suggests. The overall
> average rate is lower in such a case because the link is underutilized 
> for the beginning of the cycle.

[BB] I don't quite know what you mean by "the relative packet rate 
between the two flows still behaves the same." The packet rates do not 
behave the same, but I think you mean that the ratio between them stays 
the same. I've now modelled this aspect as well (added to the paper), 
and I now agree that, although the Reno flow rate reduces by a greater 
absolute amount during the first part of the cycle, the ratio between 
the flow rates at each instant in the cycle does not change, so the 
ratio between the averages isn't altered if the buffer isn't deep enough 
to hold both sawteeth stacked on top of each other.

>
> 2. AQM queue:
>
> Bob's explanation for the AQM case seems to be pretty much on the
> right track.
>
> In the AQM case the desynchronization of the flows is merely
> guaranteed for a low level of multiplexing. I agree with Bob that
> modelling the AQM case is hard and it also depends on the AQM in use. I
> also agree with Bob's analysis that towards the end of a cycle a
> probabilistic AQM is more likely to hit Reno somewhat more often than 
> C-Reno. On the other hand, with desynchronized flows the AQM may hit 
> any flow irrespective of its phase. Therefore, the AQM is more likely 
> to hit C-Reno towards the beginning of its cycle somewhat more often 
> than Reno because C-Reno's pkt rate is then higher than that of Reno, 
> and a hit in the beginning of the cycle hurts the performance much 
> more than in the end of the cycle. A hit in the beginning of the cycle 
> results in a cwnd value that is significantly lower than in the 
> beginning of the previous cycle while a hit towards the end of cycle 
> results in a cwnd value much closer to that in the beginning of the 
> previous cycle (the behavious the resembles steady state regular 
> cycle). This might explain, at least in part, the lower throughput for 
> C-Reno in AQM case.

[BB]
[Aside: In other work, we have found that, with the smaller BDPs of the 
past, the AQM's response time was longer than the sawtooth cycles, so it 
could hit at any part of a sawtooth cycle. Whereas, with today's higher 
BDPs, the durations of Reno and CReno sawtooth cycles are longer, so AQMs
tend to have time to reduce loss probability during the valley of a 
sawtooth, and only introduce loss (or ECN) when the sawteeth have caused 
the queue to slowly approach the operating point of the AQM. See Fig 3 
in this tech report: https://arxiv.org/pdf/2107.01003 ]

As I said during the tcpm meeting, we've run some experiments recently 
that happen to compare CUBIC-Reno with Reno. They happened to be on one 
of my slides for ICCRG the day before. They show that C-Reno and Reno
have nearly identical sharing behaviour over a PIE AQM, for a large 
selection of different numbers of flows.

I've updated the report I wrote to try to resolve the issue with the Floyd
paper:
https://raw.githubusercontent.com/bbriscoe/cubic-reno/main/creno_tr.pdf
You'll see it now includes:
* a model of the shallow buffer case (Fig 2)
* discussion of the limitations of the synchronized loss assumption, 
using the text I've just written earlier in this email.
* plus these empirical results with a PIE AQM (Fig 4) from our ICCRG 
presentation.

>
> I think that an additional possible explanation for the discrepancy in
> the original paper is that something may have been wrong with the
> simulation implementation, or maybe with the simulation parameters. Not
> all parameters were exposed in the paper. The results also seem not to
> involve any replications, so we cannot expect statistical validity
> from the results; part of the story may be explained just by random
> effects.
>
> In addition, the reasons for the discrepancy speculated in the 
> original paper seem likely to be correct:
>
>  "One factor could be the assumption of regular, deterministic
>   drops in the deterministic model; another factor could be the
>   role played by retransmit timeouts for TCP in regimes with
>   high packet drop rates."
>
> The first one above means the incorrect assumption that drop 
> probabilities are the same. Moreover, the role of RTOs is definitely 
> one potential reason with a high number of competing flows, as the
> model considers only the regular steady state.
>
> So, in summary, the AQM case clearly remains unvalidated as well, or 
> the validation in the original paper actually shows that the model is 
> not correct.
>
> An additional problem with the incorrect model is that it tends to
> result in overaggressive CUBIC behaviour regardless of whether the
> model yields too low or too high throughput (cwnd) for the CUBIC sender in
> the Reno-friendly region: if the model yields too low throughput, the
> CUBIC sender leaves the Reno-friendly region too early and becomes
> (much) more aggressive than an actual Reno sender would; otherwise, if the
> model yields too high throughput, the CUBIC sender is too aggressive
> throughout the Reno-friendly region.

[BB] Let's move on from that paper. We cannot know what was wrong 22
years ago; the way forward is to run current experiments.

That's what we've done for the AQM case.
If someone does some experiments with a tail-drop buffer, I can add the 
results to that paper. Then I'll submit it to arXiv so that it is a
sufficiently stable reference for the RFC Editor to use.


Bob

>
> (*) Note:
>
> The tail drop case is not always synchronized for competing Reno CC
> flows either. Flows get more or less randomly desynchronized from time 
> to time because the total sum of the cwnds for the competing flows at 
> the end of the cycle does not necessarily match the buffering capacity 
> of the path.
> This, however, is not a big problem for the model because competing 
> flows alternate such that each flow encounters a longer cycle more or 
> less randomly. A flow having a longer cycle is then later pushed 
> towards the correct avg cwnd. In the long run the cycle lengths (=
> drop probability) average out to be the same, and very importantly, there
> is no systematic bias.
>
> Thanks,
>
> /Markku

-- 
________________________________________________________________
Bob Briscoe                                  http://bobbriscoe.net/