Re: [tcpm] alpha_cubic

Yoshifumi Nishida <nsd.ietf@gmail.com> Mon, 18 April 2022 05:02 UTC

Return-Path: <nsd.ietf@gmail.com>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id DE3333A1EC9 for <tcpm@ietfa.amsl.com>; Sun, 17 Apr 2022 22:02:09 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 2.894
X-Spam-Level: **
X-Spam-Status: No, score=2.894 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, GB_SUMOF=5, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, URIBL_BLOCKED=0.001] autolearn=no autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id vt5iLV4iEteu for <tcpm@ietfa.amsl.com>; Sun, 17 Apr 2022 22:02:02 -0700 (PDT)
Received: from mail-oa1-x36.google.com (mail-oa1-x36.google.com [IPv6:2001:4860:4864:20::36]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 71D4C3A1EC5 for <tcpm@ietf.org>; Sun, 17 Apr 2022 22:02:02 -0700 (PDT)
Received: by mail-oa1-x36.google.com with SMTP id 586e51a60fabf-de3ca1efbaso13242361fac.9 for <tcpm@ietf.org>; Sun, 17 Apr 2022 22:02:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=P4mZP8zHWd03OZyLiEEhYypqZhzprpAIoX5x9BnxcJ0=; b=PPz7JBwub1NUDQQizedXnmUICLB7proN0ipORt6f1vFNXLDoiKh2wK2A0xYl0ftRXy jjDRSyqP4qc4rkIEU/WzYEvUxnptx7eH953Cjc6noUsuopGBBiwLa+F74vrkW2jv9z/G 8uLtE38htjprNtzzBoqrWJ4brejaWyHK0UDZ9FEm0k/5tlurSgE6PsilVsiaalfYhZ3o 6ggKngUZhpagh2gEwaCu/XirkVf/L/vz1tdPr3Wby9rHV7JxUU5bxPxF2KNGZI0CmdIa k/yHdOMX8MeyAD1esIqCY4trCz5WCyITx8M2msxZDjXs+dYr6ttkcJanaNoGiEOQMb5U +elQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=P4mZP8zHWd03OZyLiEEhYypqZhzprpAIoX5x9BnxcJ0=; b=iA/O+h7GCfworsSyA9EWvyteAmbzmYHaihyRf07CAugPto9Ahl1kQIRDS02hmm79Mx /jOcKAziHTJj/I4P1pZFlHe1DHTT+sqY7kvDy9lsH4MOGte2t1lSB8Tuo9cvSDu/Jr0i 30lJh63elYJjoN2mhBAyFbD7VQoU8uqWwffIr49QQ9/awACe2EVy+gsKBHCg5yb0yaXd LN/WtKYNCejZ6QEKxIfQfIxb+9GwsZxEE/xWoY0fNdfKRY2J2uTMurAuPbk0nfp2aWlS /5Cc6fpmj0IzxsBAlUYz8oOTQc2JPxRejT46n40uRonjtD/wLayCmLTfRUZZuvlvibVk cx/g==
X-Gm-Message-State: AOAM53229KV005wtjOAJbpYpoJfQCr7X5/76brIWkkXSgfouIuBJ9Bts MatqTmx5QH05q4Lx/O82TIOKgtHWHQvnbzQ4ynsQKE9t
X-Google-Smtp-Source: ABdhPJx1LR1NUL7RbxMdNv2OsnNEiFHnl9x3jNedKXKHagfe3du1ePEVpNFst/r5Pia7GwPOTojW415mX6skIefMUoU=
X-Received: by 2002:a05:6870:2314:b0:e2:a480:eb4d with SMTP id w20-20020a056870231400b000e2a480eb4dmr3609402oao.133.1650258120767; Sun, 17 Apr 2022 22:02:00 -0700 (PDT)
MIME-Version: 1.0
References: <CAAK044SjMmBnO8xdn2ogWMZTcecXoET1dmZqd6Dt3WzOUi359A@mail.gmail.com> <alpine.DEB.2.21.2108300740560.5845@hp8x-60.cs.helsinki.fi> <ccfc4dc2-0570-1ba2-66a5-b5e199f11359@bobbriscoe.net> <CAAK044T-ZtZUuq4xBSuB1E9aqHOn96orXe=8ZMJHao_j4xpK3Q@mail.gmail.com> <2a3f7032-6548-061f-c6b1-a39442699228@bobbriscoe.net> <CAAK044Rcw2iryxQ7g-1nFGjnwxNkTX8NhnuGaOeDD=sQFH0sKw@mail.gmail.com> <3fa617e4-82ac-a359-e087-a00ff7d4c633@bobbriscoe.net> <alpine.DEB.2.21.2202142333550.4019@hp8x-60.cs.helsinki.fi>
In-Reply-To: <alpine.DEB.2.21.2202142333550.4019@hp8x-60.cs.helsinki.fi>
From: Yoshifumi Nishida <nsd.ietf@gmail.com>
Date: Sun, 17 Apr 2022 22:01:49 -0700
Message-ID: <CAAK044Rmjmw-uPFWUk7XfvqQzyLMEL8zzMpTv298HrF6oiy4wg@mail.gmail.com>
To: Markku Kojo <kojo@cs.helsinki.fi>
Cc: Bob Briscoe <ietf@bobbriscoe.net>, Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org>, "tcpm@ietf.org Extensions" <tcpm@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000137b9805dce6a93f"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/yR1mpIyKeCpoJgfCaUENsFtSEc8>
Subject: Re: [tcpm] alpha_cubic
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Apr 2022 05:02:11 -0000

Hi Markku, Bob,

Sorry for my slow response.
Also, thank you so much for the very detailed explanations.
Based on them, I agree that we can say the model used for Reno-Friendly
region is unvalidated for both tail drop and AQM cases.
This could be a problem for CUBIC, however, I have two opinions on this.

1: I think the model is unvalidated, but at the same time, I haven't seen a
solid analysis on how bad it is.
    It appears to be more aggressive than NewReno, but we would like to
understand whether this is a serious threat or something negligible.
    I think we will need detailed packet dynamics analysis for this, which
seems to be a long-term research topic to me.

    As CUBIC has already been deployed for years while we haven't seen
significant troubles, I think this is not an issue that requires immediate
actions.
    So, I think we can just continue discussions and analysis for now and
create a solution or a remedy once we identify the issue.

2: I am a bit wondering about the impact of the model when Reno and CUBIC
compete.
    I think Reno-Friendly region in CUBIC will only be used when w_max >
cwnd.
    However, I am not sure how much CUBIC's cwnd can grow more than w_max
in competitive situations.
    This might also require detailed packet dynamics analysis, but I think
we'll need to think about this point while analyzing the validity of
CUBIC's logics.

Thanks,
--
Yoshi

On Tue, Mar 22, 2022 at 9:47 AM Markku Kojo <kojo@cs.helsinki.fi> wrote:

> Hi Yoshi, Bob, all,
>
> below please find my understanding of and feedback on Bob's report
> (https://raw.githubusercontent.com/bbriscoe/cubic-reno/main/creno_tr.pdf).
>
> This hopefully also clarifies what I meant when saying that the equation
> [in the draft] assumes equal drop probability for the different values of
> alpha and beta but the drop probability actually changes when these
> values are changed from alpha=1, beta = 0.5. I have also thought this a
> bit more after reading Bob's report and include my understanding of why
> the performance with C-Reno and Reno differ in case of AQM.
>
> In brief, the model for steady state behavior for the Reno-friendly
> region in a tail-drop queue indeed is incorrect when the additive
> increase factor alpha < 1 is applied for an alternate AIMD(a, b) TCP.
> This is because the packet dynamics and behavior of TCP that injects more
> packets only when cwnd has increased by at least one MSS. This differs
> from the assumptions of the model and what Bob has used in his analysis
> (see more below). Also the AQM case remains unvalidated, my reasoning
> differs, or extends, somewhat the reasoning that Bob gave in his report.
>
> The math itself in the original paper [FHP00] which the CUBIC
> TCP-friendly region is based on seems correct, likewise the math in Bob's
> report as well as in Yoshi's explanation below:
>
> >  Yoshifumi Nishida <nsd.ietf@gmail.com> wrote:
> >
> > Also, in one congestion epoch, (α=1.0,  β =0.5) increases 0.5 W1 while
> > (α=X,  β =0.7) increases 0.3 W2
> > Now, if we define the congestion epoch as e RTTs, e = 0.5 W1 / 1.0  and
> > also e = 0.3 W2 / X
> > This means, X should satisfy
> >  0.5 W1 /1.0  = 0.3 * 1.5/1.7 W1/ X
> > then we get X = 0.529.
>
> > This will mean if we launch (α=1.0,  β =0.5) and (α=0.529,  β =0.7)
> > at the same time and reduce them when the sum of their windows
> > becomes Wmax, the throughput of them will mostly be the same.
>
> I'll try to explain the problems with the model separately for a tail-drop
> and AQM queue.
>
> 1. Tail-drop queue (*)
>
> The problem with the use of the formula in the original paper as well as
> in Bob's and Yoshi's analyses is, like it often easily happens, that the
> theoretical model that has been derived in one context using certain
> assumptions is reused in another context where the assumptions do not
> hold, i.e, the model does not match the reality.
>
> In this case, the correct use of formula presumes that the drop
> probability is the same (on the average) for a standard TCP = AIMD(1,1/2)
> and AIMD(a,b). That is implicitly assumed also in Bob's report for
> synchronized (tail drop) case as Bob's math is derived using the same
> number of packets over a cycle (= in between drops, which is reciprocal
> of drop rate). (Btw, involving RTT in the calculations is unnecessary as
> all competing flows see the same queue on every RTT, so RTT is the same
> for all flows sharing the same queue, assuming the base RTT for the flows
> is the same). Also, Yoshi's calculations above effectively make the same
> assumption that both flows reduce cwnd at the same time ("when the sum of
> their windows becomes Wmax").
>
> To understand the difference, we need to look at the packet dynamics of
> flows competing in a tail drop queue, not just the math. The assumption
> of equal drop probability holds for any two (or n) AIMD(1, 1/2) TCPs
> competing on a shared bottleneck: when the bottleneck tail drop buffer
> becomes full towards the end of a cycle (on RTT n), both TCPs increase
> their cwnd by 1 for the RTT n+1 and inject two back-to-back pkts
> resulting in a drop of 1 pkt each (the latter pkt of the two back-to-back
> pkts = the excess pkt according to packet conservation rule becomes
> dropped). When an AIMD(1, 1/2) TCP competes with an AIMD(a=0.53, b=0.7)
> TCP (=CUBIC) and the tail drop buffer becomes full on RTT n, the
> AIMD(1,1/2) TCP increases its cwnd by 1 on RTT n+1 and gets 1 pkt
> dropped. However, AIMD(0.53,0.7) TCP increases its cwnd by 0.53 and
> only injects an excess packet on RTT n+1 roughly every second cycle
> because a TCP sender does not inject an additional packet until cwnd has
> increased by at least by 1 MSS.
>
> Hence, in a synchronized case the AIMD(0.53, 0.7) TCP avoids a packet
> drop and cwnd reduction roughly every second cycle and is able to extend
> its cycle because the competing AIMD(1, 1/2) TCP reduced its cwnd on RTT
> n+1 and the bottleneck buffer is not anymore full on RTT n+2 when
> AIMD(0.53,0.7) TCP injects its additional packet (and would encounter a
> pkt drop only if the queue is full).
> Because the cycle is systematically extended in this way for AIMD(0,53,
> 0.7) (=CUBIC in Reno-friendly mode), it increases the average number of
> pkts between drops and thereby decreases the drop probility of CUBIC to
> be lower than that of the competing AIMD (1, 1/2) TCP. These packet
> dynamics breaks the model.
>
> So, the basic problem is that the model used for the Reno-friendly
> region obviously is not valid. The results in the original paper [FHP00]
> do not support the validity of the model either and I am not aware that
> the model would have been validated by anyone.
>
> A good, unresolved question is why the simulation results in the
> original paper [FHP00] gave lower throughput for AIMD(1/5, 7/8)
> (denoted as AIMD(1/5, 1/8) in the original paper) for the AQM case.
> Note that the paper did not give lower throughput for AIMD(1/5, 1/8) with
> a tail drop bottleneck as the results in the paper were shown only for an
> AQM bottleneck and the paper just said the results for tail drop are
> similar which is a vague expression; it may also mean that he results
> were reversed with similar difference in throughput? Anyway, the results
> obviously did not validate the model.
>
> Bob's explanation for the reason why C-Reno's packet rate decreases is
> correct but what would happen if the buffer was not deep enough and
> drained out seems not all correct. The packet rate is the relative share
> on a shared bottleneck link with fixed capacity. The cwnd of the C-Reno
> flow is larger than that of Reno in the beginning of the cycle, becomes
> equal in the middle, and stays increasingly behind for the rest of the
> cycle. Therefore, C-Reno's *proportional* share decreases during the
> cycle; the overall througput is the same all the time and bounded by the
> link capacity.
>
> If the buffer was not deep enough and drained out, the relative packet
> rate between the two flows still behaves the same even if there is no
> queue and also when the link is underutilized for the beginning of the
> cycle. When the link is underutilized, the pkt rate increases for both
> flows until the link becomes fully utilized (note that RTT does not
> change when the link is underutilized). The pkt rate of C-Reno
> would increase slower though but it won't change its proportinal share,
> i.e., C-Reno will not suffer more unlike Bob suggests. The overall average
> rate is lower in such a case because the link is underutilized for the
> beginning of the cycle.
>
> AQM queue:
>
> Bob's explanation for AQM case seems pretty much being on the correct
> track.
>
> In the AQM case the desynchronization of the flows is merely guaranteed
> for low level of multiplexing. I agree with Bob that modelling AQM
> case is hard and it also depends on the AQM in use. I also agree Bob's
> analysis that towards the end of a cycle a probabilistic AQM is more
> likely to hit Reno somewhat more often than C-Reno. On the other hand,
> with desynchronized flows the AQM may hit any flow irrespective of its
> phase. Therefore, the AQM is more likely to hit C-Reno towards the
> beginning of its cycle somewhat more often than Reno because C-Reno's
> pkt rate is then higher than that of Reno, and a hit in the beginning of
> the cycle hurts the performance much more than in the end of
> the cycle. A hit in the beginning of the cycle results in a cwnd value
> that is significantly lower than in the beginning of the previous cycle
> while a hit towards the end of cycle results in a cwnd value much closer
> to that in the beginning of the previous cycle (the behavious the
> resembles steady state regular cycle). This might explain, at least in
> part, the lower throughput for C-Reno in AQM case.
>
> I think that an additional possible explanation for the discrepancy in
> the original paper is that possibly something was wrong with the
> simulation implementation, or maybe in the simulation parameters. Not all
> parameters were exposed in the paper. The results also seem not to
> involve any replications, so we cannot expect statistical validity from
> the results, a part of the story may be explained just by random effects.
>
> In addition, the reasons for the discrepancy speculated in the original
> paper seem likely to be correct:
>
>   "One factor could be the assumption of regular, deterministic
>    drops in the deterministic model; another factor could be the
>    role played by retransmit timeouts for TCP in regimes with
>    high packet drop rates."
>
> The first one above means the incorrect assumption that drop
> probabilities are the same. Moreover, the role of RTOs is definitely one
> potential reason with the high number of competing flows as the model
> considers the regular steady-state only.
>
> So, in summary, the AQM case clearly remains unvalidated as well, or the
> validation in the original paper actually shows that the model is not
> correct.
>
> An additional problem with the incorrect model is that it tends to result
> is overaggressive CUBIC behaviour regardless of whether the model yields
> too low or too high throughput (cwnd) for CUBIC sender in the
> Reno-frindly region: if the model yields too low throughput, the CUBIC
> sender leaves the Reno-friendly region too early and becomes (much) more
> aggressive actual Reno sender would; otherwise, if the model yields too
> high throughput, the CUBIC sender is too aggressive throughout the
> Reno-friendly region.
>
> (*) Note:
>
> The tail drop case in not always synchronized for competing Reno CC
> flows either. Flows get more or less randomly desynchronized from time to
> time because the total sum of the cwnds for the competing flows at the end
> of the cycle does not necessarily match the buffering capacity of the path.
> This, however, is not a big problem for the model because competing flows
> alternate such that each flow encounters a longer cycle more or less
> randomly. A flow having a longer cycle is then later pushed towards the
> correct awg cwnd. In the long run the cycle lengths (= drop probability)
> averages to be the same, and very importantly, there is no systematic
> bias.
>
> Thanks,
>
> /Markku
>
> Bob Briscoe wrote:
>
> > Markku, Yoshi,
> >
> > As promised, I've tried to develop the maths to derive the additive
> increase factor for the
> > Reno-Friendly mode of Cubic. I did it for two cases:
> > * synchronized (tail-drop)
> > * desynchronized (AQM)
> >
> > I've written a short LaTeX report and uploaded here:
> >
> https://raw.githubusercontent.com/bbriscoe/cubic-reno/main/creno_tr.pdf
> > If Markku and others review it, I'll update it with any corrections, and
> once ready, I can publish
> > it on ArXiv, so that we have an archival reference.
> > Here's a brief summary, but pls read the report before criticizing
> anything in the summary.
> >
> > Tail Drop case
> >  *  This turned out to be straightforward, and ended up with the same
> formula as in RFC8312 (which
> >     came from [FHP00] which assumed /non/-tail-drop!).
> >  *  I believe my derivation is straightforward and rigorous and it
> doesn't rely on loss probability,
> >     unlike [FHP00], which was Markku's main concern.
> >  *  I've also included a possible explanation for the result with
> tail-drop in [FHP00] where the
> >     throughput of CUBIC in Reno-friendly mode was lower, not the same as
> Reno:
> >      +  I've explained why C-Reno would suffer more than Reno if the
> buffer was not deep enough to
> >         take the combined synchronized decreases without draining out
> completely.
> >      +  but I haven't proved this empirically by trying with a deeper
> buffer (and I don't intend to
> >         - someone else can do that if they want).
> > Tail-drop summary: I believe we now have some solid theory for the
> formula in RFC8312, at least for
> > tail drop scenarios, which are (sadly) still likely to be prevalent.
> >
> >
> > AQM case
> >  *  Too hard for me to produce even an approximate derivation of the AI
> factor.
> >  *  I've included a discussion of why the formula in [FHP00] is
> incorrect for the AQM case, which is
> >     more deeply thought-through than my earlier emails:
> >      +  my critique seems similar to Markku's and it is possible it's
> the same, but there's not
> >         enough explanation in Markku's emails to know for sure.
> >      +  @Markku, perhaps you can see if what I've written describes what
> you meant?
> >  *  However, in the AQM case, the simulations in the Floyd report give
> the opposite results to what
> >     I would expect.
> > AQM summary: A formula for the AI factor in the AQM case is still beyond
> me, and the results in
> > [FHP00] are still the opposite of what I'd expect.
> >
> >
> > Bob
> >
> > On 17/09/2021 10:06, Yoshifumi Nishida wrote:
> >       Hi Bob,
> >
> > Thanks for the comments.
> > I at least think we both can agree that the draft uses [FHP00] as the
> basis of their AIMD or
> > NewReno model.
> > If you have any suggestions to update the texts to address your
> concerns, please share. (but,
> > in this case, using github might be straightforward.)
> >
> > Unfortunately, I'm not sure about the empirical evidence to support the
> formula.
> > But, given that cubic has been widely deployed for long time and we
> don't see strong voices to
> > point out problems for it, I am guessing the issues might not be too
> obvious at least.
> > If you have any specific concerns in the formula, could you share?
> >
> > Thanks,
> > --
> > Yoshi
> >
> >
> > On Mon, Sep 13, 2021 at 4:17 PM Bob Briscoe <ietf@bobbriscoe.net> wrote:
> >       Yoshi,
> >
> >       On 11/09/2021 22:02, Yoshifumi Nishida wrote:
> >       Hi Bob,
> >
> > Thanks for the very detailed explanation.
> > I agree with some of your points. But, I have two comments on this.
> >
> > The first point is about RTT variation.
> > I think you have a point there, but I personally think the paper
> presumes very
> > shallow queue so that the model can be very simple.
> > In this case, we don't have to think about min or max of RTT as they
> will be
> > mostly the same.
> >
> >
> > [BB] The paper gets the same answer as I got under assumption #3 (all
> sawtooth vary
> > around a set point half-way between max and min).
> > I didn't assume shallow queues, so it can't be assuming shallow queues.
> >
> >       I think we can probably argue the accuracy of the model, but I
> think it is
> >       too broad as a scope of this document. It looks a very deep
> research topic
> >       to me.
> > I think the model is basically derived from Equation-based Congestion
> Control
> > paper in sigcomm 2000. If we want to update or renew it, we might need
> another
> > RFC5348.
> >
> >
> > [BB] I didn't use Equation-based CC at all and, at least under one of my
> three
> > assumptions, I got the same result as the [FHP00] paper (with one
> further assumptions
> > about how many drops there are at the top of different sawteeth).
> >
> > I just wanted to check the outcome wouldn't be different if I used a
> model of varying
> > RTT, 'cos I'd already recently done this maths for another reason - to
> derive the
> > recovery time after a loss. This proved that, given [FHP00] got the same
> result as one
> > of my cases just using basic AIMD sawtooth geometry, at least in this
> one case, the
> > varying RTT doesn't make any difference.
> >
> > Nonetheless, I think you're right that deriving the correct formula
> would involve deeper
> > research. I don't think publication of the RFC should block on that
> research.
> >
> >       So, I think it would be better for this doc to use the model as it
> is and
> >       leave its enhancements for future discussions.
> >
> >
> > [BB] That's up to the WG. But I would not refer to that paper, at least
> not alone. And I
> > don't think the draft should claim that the formula for Reno-friendly
> Cubic that it uses
> > from that paper is correct, when it clearly doesn't produce the desired
> result, even in
> > the paper that proposed it.
> >
> >
> > The second point is α in the draft might be too big.
> >
> >
> > [BB] How so? The paper's own simulations (exploring a very limited
> space) conclude that
> > α from the model is more than 2x too small.
> >
> >       During my WGLC review, I read [FHP00] once and I had similar
> thoughts.
> > But, after I re-read the paper several times, I start having a different
> > interpretation.
> >
> > First off, it is very clear that β=0.7 is more aggressive than β =0.5
> when there's
> > no other traffic.
> > As cwnd growth is linear, if there's enough long time for the data
> transfer, the
> > average cwnd for β =0.5 will be (1.0 + 0.5)/2 = 0.75 Wmax and it will be
> (1.0 +
> > 0.7)/2 = 0.85 Wmax for β=0.7.
> >
> >
> > [BB] Correct
> >
> >       So, it is obvious that this model does not aim for the cases where
> there's
> >       no other traffic.
> >
> >
> > [BB] It is certainly obvious that comparing flow rates is not about
> single flows.
> >
> > But, by the same token, does it mean anything to say a flow is 'more
> aggressive when
> > there's no other traffic'?
> > You can have different cwnd for the same throughput, if the avg queue
> delay is higher
> > and therefore RTT is higher.  But I don't think that says anything about
> aggression. If
> > anything, aggression is about shorter recovery time.
> >
> > Yes, the analysis I used was for each flow on its own. When they are not
> on their own,
> > but competing in the same queue, (obviously) there cannot be two
> different average queue
> > delays. But you can think of them as each progressing in parallel, each
> taking losses
> > and each contributing to the queue. I'm not saying my analysis is the
> final word by any
> > stretch of the imagination. However, if on their own they have different
> recovery times,
> > then when together A will be pulling B away from its natural recovery
> time and towards
> > A's recovery time, and vice versa.
> >
> >
> > I believe the choice of (α,  β) in [FHP00] is designed to be more or
> less fair
> > only when it competes with (α=1.0,  β =0.5)
> > Hence, I think we should not apply the loss rate for no-other-traffic
> situation to
> > the formula, which can lead to α3 = 0.20
> > Instead, I am thinking we can think in this way..
> >
> > (α=1.0,  β =0.5) model oscillates between 0.5 W1 and 1.0 W1 in a
> congestion epoch.
> > (α=X,  β =0.7) model oscillates between 0.7 W2 and 1.0 W2 in the same
> congestion
> > epoch.
> >   where W1 is max window for (α=1.0,  β =0.5), W2 is max window for
> (α=X,  β =0.7)
> > When these two models have the same loss ratio, it should satisfy (1.0 +
> 0.5) W1 =
> > (1.0 + 0.7) W2
> >
> > Also, in one congestion epoch, (α=1.0,  β =0.5) increases 0.5 W1 while
> (α=X,  β
> > =0.7) increases 0.3 W2
> > Now, if we define the congestion epoch as e RTTs, e = 0.5 W1 / 1.0  and
> also e =
> > 0.3 W2 / X
> > This means, X should satisfy
> >   0.5 W1 /1.0  = 0.3 * 1.5/1.7 W1/ X
> > then we get X = 0.529.
> >
> >
> > [BB] This is the same equation as in [FHP00], but with numbers not
> variables. it's also
> > the same as the formula in the current Cubic RFC. And it gives the same
> answer you've
> > got when you plug in the numbers. Because you've used the same technique
> to compare the
> > geometry of the sawteeth.
> >
> > But is there empirical evidence to support this formula, given the
> simulations in the
> > paper itself don't support it? That's the question.
> >
> > Also, I'd like to hear Markku's response to your original question to
> him. I was only
> > trying to second-guess what he was trying to say.
> >
> >
> > Bob
> >
> >
> >
> > This will mean if we launch (α=1.0,  β =0.5) and (α=0.529,  β =0.7) at
> the same
> > time and reduce them when the sum of their windows becomes Wmax, the
> throughput of
> > them will mostly be the same.
> > --
> > Yoshi
> >
> > On Tue, Sep 7, 2021 at 5:17 AM Bob Briscoe <ietf@bobbriscoe.net> wrote:
> >       Yoshi,
> >
> >       You recently asked Markku for an explanation of his point #4.
> >
> >       Coincidentally, a few weeks ago, I was checking the formula in
> RFC8312
> >       for Cubic in TCP-Friendly mode (called "C-Reno" in this email for
> >       brevity), and I also checked back on that preliminary paper that
> >       Markku refers to. I think Markku has a different way of explaining
> the
> >       flaw in the equation, but below I'll try to explain what I think
> the
> >       problems are. See [BB] inline...
> >
> >       If you don't read HTML email, I'm afraid the maths is probably
> going
> >       to look like gobbledygook.
> >
> >       On 30/08/2021 17:33, Markku Kojo wrote:
> >       Hi Yoshi, all,
> >
> >       On Wed, 18 Aug 2021, Yoshifumi Nishida wrote:
> >
> >       4. Fairness to AIMD congestion control
> >
> >          The equation on page 12 to derive increase factor α_cubic
> >       that
> >          intends to achieve the same average window as AIMD TCP seems
> >       to
> >          have its origins in a preliminary paper that states that the
> >          authors do not have an explanation to the discrepancy between
> >          their AIMD model and experimental results, which clearly
> >       deviate.
> >          It seems to have gone unnoticed that the equation assumes
> >       equal
> >          drop probability for the different values of the increase
> >       factor
> >          and multiplicative decrease factor but the drop probability
> >          changes when these factors change.
> >
> >
> > [BB] The source of eqn (4) in RFC8312 is eqn (4) in [FHP00]. This paper
> has
> > been cited 250 times but it is not peer reviewed.
> >
> > Equations (2) & (3) of [FHP00] that lead up to (4) use an RTT (R) as if
> it
> > doesn't vary. But it does, because the sawteeth vary the queue. But the
> real
> > problem is that the model (silently) assumes that R is an average RTT
> > sitting part-way up the sawteeth, whatever the link rate or base RTT.
> This
> > overlooks the fact that, if the tips of the sawteeth are always at the
> same
> > queue depth (as in a tail-drop buffer) the average R will be higher if
> the
> > sawteeth have smaller amplitude. This sounds picky, but the assumption on
> > whether sawteeth vary about their bottom, top or middle greatly alters
> the
> > outcome:
> >
> > I've taken three possible alternative assumptions, with their resulting
> > additive increase factors for theoretical equality with Reno (I've given
> all
> > the derivations at the end of this email):
> > #1) All sawteeth have the same Rmin: α ~= (1/β2 - 1) / 3;
> Example: β
> > = 0.7; α ~= 0.35
> > #2) All sawteeth have the same Rmax: α ~= 4(1-β2) / 3;
> Example: β
> > = 0.7; α ~= 0.68
> > #3) All sawteeth have the same Ravg: α ~= 3(1-β) / (1+β);
> Example: β
> > = 0.7; α ~= 0.53    <== used in [FHP00] and therefore in RFC8312
> >
> >
> > Which assumption is most applicable?
> > #2 models a tail-drop queue.
> > #3 (or somewhere between #3 and #2) models a probabilistic AQM, such as
> PIE
> > or RED.
> > #1 is not applicable to the current Internet, but it would be if a magic
> AQM
> > was invented that minimized the standing queue.
> > I don't think any of the assumptions are good models of C-Reno's
> interaction
> > with CoDel.
> >
> > So which one should draft-ietf-tcpm-rfc8312bis recommend?
> > You might think #2, given one suspects the vast majority of queues are
> still
> > tail drop...
> > However, it may be 'none of the above', 'cos the paper also says it
> assumes
> > deterministic marking; i.e. that that each flow experiences just one
> drop or
> > mark at the top of each sawtooth. If that applies at all, it only
> applies to
> > AQMs, not to tail-drop buffers that tend to drop more than one packet at
> a
> > time. Nonetheless, for the purpose of rate comparison, all the flows will
> > share the same bottleneck. So if flow rates are equal under a certain
> AQM,
> > they might be equal under tail-drop too (see {Note 1} at the very end).
> >
> > I would still like to say 'none of the above' because Linux C-Reno has
> been
> > widely used with α=1, β=0.7 for many years now without the sky falling.
> So
> > it's questionable whether friendliness should be defined relative to
> Reno.
> > Otherwise, from day 1, RFC8312bis will be stating the forlorn hope that
> > Linux C-Reno should regress to become less aggressive than its former
> self,
> > just to be friendly to Reno, which apparently barely exists any more
> > (according to the Great TCP Congestion Control Census of 2019 [MSJ+19]).
> >
> > Nonetheless, we need to bear in mind that BSD variants of Cubic (e.g.
> > Apple's) are also widely used and I believe they comply with the α for
> > C-Reno defined in RFC8312.
> >
> > (FAQ: Why does C-Reno add <1 segment per RTT to be friendly to Reno,
> which
> > adds 1 segment per RTT?
> > C-Reno in the RFC multiplicatively decreases cwnd to 0.7*cwnd on a loss
> > (multiplicative decrease factor β=0.7), whereas Reno uses β=0.5. So,
> because
> > C-Reno reduces less, it's also meant to increase less per round so that
> the
> > sawteeth don't reach the max window again more quickly than Reno would.
> That
> > is, it's meant to use a smaller AI factor (α) than 1 segment. Using α<1
> > doesn't necessarily slow down Cubic's acceleration into unused capacity,
> > because it can switch to a convex (hockey stick) Cubic curve once its
> slow
> > linear increase has made the queue large enough to come out of its
> > 'TCP-Friendly' mode.)
> >
> > Whatever, rfc8312bis certainly shouldn't refer solely to [FHP00] without
> > caveats.
> >
> > Then we have the problem that none of the above formulae are verified
> > against reality.
> > Even in the [FHP00] paper, which uses assumption #3, the simulations are
> out
> > by about 2x. The paper used β=7/8. So, from the theory, for each of the
> > above three assumptions, the additive increase factor would have had to
> be:
> > α1 = 0.10
> > α2 = 0.31
> > α3 = 0.20
> > They used a RED AQM (which should be somewhere between assumption #2 &
> #3).
> > But they had to configure C-Reno with α=0.4 to get close to equal flow
> rates
> > with Reno
> > Using my 'finger in the air' modification to the formulae in {Note 1},
> α3 =
> > sqrt(0.20) = 0.44, which would have been about right (perhaps only
> > coincidentally).
> >
> > The question over what α to put in the RFC could run and run. So here's a
> > suggested path through this quagmire:
> > These alternatives throw doubt on the wisdom of using assumption#3. So we
> > could at least pick a new assumption for a theoretical α. Assumption#2 is
> > probably the 'most correct', and also happens to give the highest value
> of α
> > for the 'TCP-Friendly' region (α=0.68 when β=0.7).
> > (By my unsubstantiated formula in {Note 1}, α2 = sqrt(0.68) = 0.82.)
> >
> > If we look for an empirical value of α that gives reasonably equal flow
> > rates against Reno, I suspect we will get all sorts of different answers
> > depending on the conditions, e.g. AQM or not; high or low multiplexing;
> high
> > or low RTT (shared by all flows); pacing, burstiness, etc,etc. If you
> want
> > to take this path, feel free, but if you're still on it in a year's time,
> > don't say I didn't warn you.
> >
> >
> > Derivations of the above three formulae follow at the end.
> >
> >       The equations for the drop
> >          probability / the # of packets in one congestion epoch
> >          are available in the original paper and one can easily verify
> >          this. Therefore, the equations used in CUBIC are not correct
> >          and seem to underestimate _W_est_ for AIMD TCP, resulting in
> >          moving away from AIMD-Friendly region too early. This gives
> >          CUBIC unjustified advantage over AIMD TCP particularly in
> >          environments with low level of statistical multiplexing. With
> >          high level of multiplexing, drop probability goes higher and
> >          differences in the drop probablilities tend to get small. On
> >       the
> >          other hand, with such high level of competition, the
> >       theoretical
> >          equations may not be that valid anymore.
> >
> >       /Markku
> >
> >
> > [BB] Derivations of the formulae for α under the three different
> assumptions
> > follow. It's possible/likely that this is all already in a paper
> somewhere,
> > but I haven't been able to find one.
> >
> >   0) Preparatory material
> >
> > Terminology:
> > The subscript for C-Reno is omitted given it would be on every term.
> > The recovery time of an AIMD congestion control is the average time for
> > additive increase to recover the window reduction after a multiplicative
> > decrease.
> >
> > The overall approach is:
> > A) Find a formula in terms of the AI and MD factors (α & β), for the
> > recovery time of C-Reno.
> > B) Equate this to the recovery time of Reno, which will produce a formula
> > for the additive increase factor, α.
> >
> > The recovery time formula for C-Reno is derived by:
> > 1) summing up all the round trips in additive increase, which become
> > increasingly large as the queue grows. This sum is stated in terms of the
> > number of rounds per cycle, J.
> > 2) Then J is found by writing two formulae;
> >   2a) one for the average sawtooth decrease
> >   2b) and the other for the increase.
> > Then by assuming a steady state the two can be equated.
> > Here goes...
> >
> > The following formula gives the recovery time, Tr, from one max window to
> > the next, in C-Reno with:
> > * additive increase (AI) factor α [segment / RTT]
> > * and packet-rate r [packet / s]
> > assuming that the minimum window of each sawtooth still fully utilizes
> the
> > link.
> >     Tr = ΣJ-1j=0    (Rmin + jα/r)
> > where j is the index of each round, and J is the number of round trips of
> > additive increase per sawtooth. The rationale for the second term is that
> > the queue grows by α fractional packets in each round, and the queue
> delay
> > per packet is 1/r (seconds per packet is reciprocal of packets per
> second).
> > The sum of all the second terms forms an arithmetic progression, the sum
> of
> > which is given by the well-known formula, as follows:
> >          = J.Rmin + J(J-1)α/2r
> >        ~= J.Rmin + J2α/2r.                                (1)
> > The approximation holds as long as J>>1. In some implementations (e.g.
> > Linux), cwnd increases continuously, so strictly the limits should be
> j=1/2
> > to J-1/2 (the average half way through the first and last rounds), but
> > details like this are insignificant compared to the approximation.
> >
> > The difference in RTT due to the additional queue delay between the top
> and
> > bottom of the sawtooth can be stated in two ways.
> > Either as the sum of all the queue delays of each additive increase:
> >     (Rmax - Rmin) = Jα/r,                                (2a)
> > Or by the multiplicative relationship between the max and min RTTs, by
> the
> > definition of β: Rmin = β.Rmax.
> >     (Rmax - Rmin) = (Rmin / β) - Rmin
> >                                = Rmin(1-β)/β                 (2b)
> > Equating (2a) and (2b):
> >     J = r.Rmin(1-β) / (αβ)                                  (2)
> >
> > Substituting for J from (2) in (1):
> >     Tr = r.Rmin2 (1-β) / (βα) + r.Rmin2 (1-β)2/(2β2α)
> >          = r.Rmin2 (1-β) / (βα) * (1 + (1-β)/2β )
> >          = r.Rmin2 (1-β)(1+β) / (2β2α)
> >          = r.Rmin2 (1-β2) / (2β2α)
> (3)
> > And, because Rmin = β.Rmax.
> >     Tr = r.Rmax2 (1-β2) / (2α)
> (4)
> >
> > The well-known steady-state Reno formula from [FF99] is:
> >     rReno ~= (1/Ravg) sqrt(3 / 2p)
> >             (5)
> > assuming deterministic marking (which is also assumed for the above
> model of
> > C-Reno).
> >
> > Assuming a Reno sawtooth is approximately linear, the average RTT of a
> Reno
> > flow,
> >     Ravg ~= (Rmax + Rmin)/2
> > and
> >     Rmin = Rmax/2
> > Therefore
> >     rReno ~= (1/Rmin) sqrt(2 / 3p)
> >             (6)
> >     rReno ~= (1/Rmax) sqrt(8 / 3p)
> >             (7)
> >
> > With the preparatory material done, next we take each of the three
> > assumptions, one at a time.
> >
> >   #1) All sawteeth have the same Rmin
> >
> > From Eqn (3):
> >     Tr = r.Rmin2 (1-β2) / (2β2α)
> > Total packets sent over period Tr is r.Tr during which time assume 1
> packet
> > is lost (or marked) at the sawtooth peak {Note 1}. So loss probability:
> >     p = 1 / r.Tr
> > Substituting for Tr and rearranging :
> >     r2 = 2β2α / Rmin2 (1-β2)p
> >     r = (β/Rmin) sqrt(2α / (1-β2)p)
> > Equating the Reno packet rate, rReno, from eqn (6) to the C-Reno packet
> > rate, r, for all α and β:
> >     r = rReno
> >        = (1/Rmin) sqrt(2 / 3p).
> > Therefore,
> >     (1/Rmin) sqrt(2 / 3p) ~= (β/Rmin) sqrt(2α / (1-β2)p)
> > Rearranging
> >     2/3 ~= 2αβ2 / (1-β2)
> >     α ~= (1-β2)/3β2
> >        ~= (1/β2 - 1) / 3
>
> >                             (8)
> >
> >   #2) All sawteeth have the same Rmax = Rmin / β
> >
> > From Eqn (4):
> >     Tr = r.Rmax2 (1-β2) / (2α)
> > As before, after substituting for Tr and rearranging:
> >     p = 1 / r.Tr
> >     r = (1/Rmax) sqrt(2α / (1-β2)p)
> > Equating to rReno from eqn (7) for all α and β,
> >       ~= (1/Rmax) sqrt(8 / 3p).
> > Rearranging
> >     8/3 ~= 2α / (1-β2)
> >     α ~= 4(1-β2) / 3
>
> >                         (9)
> >
> >   #3) All sawteeth have the same Ravg
> >
> > Ravg = (Rmax + βRmax )/2
> >          = Rmax (1+β)/2
> > From Eqn (4):
> >     Tr = r.Rmax2 (1-β2) / (2α)
> >          = 2r.Ravg2 (1-β) / α(1+β)
> > As before, after substituting for Tr and rearranging:
> >     p = 1 / r.Tr
> >     r = (1/Ravg) sqrt(α (1+β) / 2(1-β)p)
> > Equating to rReno from eqn (5) for all α and β,
> >       ~= (1/Ravg) sqrt(3 / 2p).
> > Rearranging
> >     3/2 ~= α(1+β) / 2(1-β)
> >     α ~= 3(1-β) / (1+β)
>
> >                     (10)
> >
> > This last formula is same as the equation used for C-Reno in RFC8312 and
> in
> > the RFC8312bis draft.
> >
> >
> > {Note 1}: The assumption in [FHP00] of deterministic marking is suspect
> for
> > the Internet (meaning that the spacing between drops or marks is even and
> > there is always 1 packet dropped or marked at the top of each sawtooth).
> The
> > actual average dropped (or marked) per sawtooth will depend on whether
> the
> > buffer uses tail-drop or an AQM, and if so which AQM. It is perhaps
> better
> > not to assume the actual average number dropped (or marked) is known, but
> > instead assume the ratio of the average drops or marks between C-Reno and
> > Reno will be roughly proportional to their AI factors. This would modify
> the
> > formula for C-Reno's drop probability to:
> >     p = αReno / αC-Reno.r.Tr
> > Substituting αReno = 1:
> >     p = 1 / αC-Reno.r.Tr
> > This would modify all the results to:
> >
> > #1) All sawteeth have the same Rmin: α ~= sqrt( (1/β2 - 1) / 3 );
> > Example: β = 0.7; α ~= 0.59
> > #2) All sawteeth have the same Rmax: α ~= sqrt( 4(1-β2) / 3 );
> > Example: β = 0.7; α ~= 0.82
> > #3) All sawteeth have the same Ravg: α ~= sqrt( 3(1-β) / (1+β) );
> > Example: β = 0.7; α ~= 0.73
> >
> > However, these would then be wrong where there really is deterministic
> > marking.
> >
> > References
> > =========
> > [FHP00]    Floyd, S., Handley, M., and J. Padhye, "A Comparison of
> > Equation-Based and AIMD Congestion Control", May 2000,
> > <https://www.icir.org/tfrc/aimd.pdf>
> >
> > [MSJ+19] Ayush Mishra, Xiangpeng Sun, Atishya Jain, Sameer Pande, Raj
> Joshi,
> > and Ben Leong. The Great Internet TCP Congestion Control Census. Proc.
> ACM
> > on Measurement and Analysis of Computing Systems, 3(3), December 2019
> >
> > [FF99] Sally Floyd and Kevin Fall, "Promoting the Use of End-to-End
> > Congestion Control in the Internet", IEEE/ACM ToN (1999)
> >
> >
> > Bob
> >
> > --
> > ________________________________________________________________
> > Bob Briscoe                               http://bobbriscoe.net/
> >
> >
> > --
> > ________________________________________________________________
> > Bob Briscoe                               http://bobbriscoe.net/
> >
> >
> > --
> > ________________________________________________________________
> > Bob Briscoe                               http://bobbriscoe.net/
> >
> >