Re: [tcpm] Proceeding CUBIC draft - thoughts and late follow-up

Markku Kojo <kojo@cs.helsinki.fi> Mon, 06 June 2022 10:29 UTC

Date: Mon, 06 Jun 2022 13:29:37 +0300
From: Markku Kojo <kojo@cs.helsinki.fi>
To: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
cc: "tcpm@ietf.org Extensions" <tcpm@ietf.org>, Lars Eggert <lars@eggert.org>, tcpm-chairs <tcpm-chairs@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/H7GwAXdJlvXiNOKs4fr2XPmqM2k>

Hi Gorry, all,

Catching up...

Thanks for the proposed text. I had a bit of a hard time understanding
why this text was proposed. Then I listened to the meeting audio
recording and heard what Lars thought the issue was. Lars has
misunderstood my concern as being that the process was not followed.
It is not.

The problem is that the CUBIC algorithm has a number of unresolved
issues (as a result of not following the process). The issues were
raised for the WGLC, and for the 2nd WGLC I listed the unresolved issues
together with some new points that emerged during the discussions and
after somewhat closer consideration.

These issues are significant, and a number of people have also said
they should not be left unaddressed. Almost all of them relate to
the behaviour of CUBIC in the TCP-friendly region, where it is intended
and required to compete fairly with the current standards-track
congestion control mechanisms. Whether CUBIC competes fairly *cannot*
be evaluated without measuring the impact of CUBIC on the other
traffic competing with it over a shared bottleneck link. Deployment
alone does not provide this; it requires specifically planned
measurements.

That said, I agree with Gorry that the text he proposed should be
incorporated before the doc proceeds. However, I cannot see how
widespread deployment experience can be considered sufficient
when it does not involve any measurements of the impact of CUBIC on the
other traffic. That is, IMO this part of the text is not quite rational.
Quite the contrary: the papers cited contain evidence that the intended
behaviour is not realized, and so do some other papers that are not
cited.

One major problem is that the issues have not been discussed on the list.
IMO that is a must before the doc proceeds. I regret that the process
even got mentioned, because that took the discussion onto a side track,
away from the actual issues.

For the 2nd WGLC I linked the issues with the draft to the GitHub issue
numbers, which was probably not helpful at all for initiating discussion
on the list unless one had closely followed the issue discussions on
GitHub. I try to briefly summarize the issues below and suggest starting
a discussion on each of them in a separate thread, which I can initiate.

IMO the wg should understand and decide separately how to proceed with
each of the issues. At a minimum, each issue must be clearly documented
in the draft, and the wg should come up with a justification for each
issue explaining why the doc can be published despite the issue (unless
the issue is resolved).

In addition, the text in the draft about updating RFC 5681 is very vague.
First, it does not make clear what updating even means: whether it
requires anyone implementing the basic CC algos in RFC 5681 to follow
the algos in this draft where they differ, or only allows the CUBIC algo
to diverge from the current std CC algos. Second, each difference and
the justification for it must be documented separately.


Summary of the unresolved issues:

1 a) The model/formula used for the Reno-friendly region has not
      been validated and has actually been shown to be incorrect.
      There is one validation attempt in the original paper
      manuscript and that failed (and the manuscript never got
      published).

   b) The validation of the heuristic in CUBIC that is used to
      decide when to shift from the Reno-friendly region to
      the "genuine" CUBIC mode is insufficient.
      That is, the constant C has not been validated properly
      in a wide range of environments (there appears to be nothing
      after the publication of the original research paper).
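
For readers who want the quantities in issue 1 spelled out, here is a
minimal sketch. It assumes the window functions and constants as given
in RFC 8312 (C = 0.4, beta = 0.7); the function names are hypothetical.
It only shows which formula (1 a) and which constant (1 b) are under
discussion and does not validate either of them.

```python
# Constants as given in RFC 8312 (assumed here, not taken from this email).
C = 0.4      # scaling constant of the cubic window function
BETA = 0.7   # CUBIC multiplicative-decrease factor


def w_cubic(t, w_max):
    """Cubic window (segments) t seconds after the last congestion event."""
    k = (w_max * (1 - BETA) / C) ** (1 / 3)
    return C * (t - k) ** 3 + w_max


def w_est(t, w_max, rtt):
    """Estimated Reno window (segments) used for the Reno-friendly region."""
    alpha = 3 * (1 - BETA) / (1 + BETA)
    return w_max * BETA + alpha * (t / rtt)


def in_reno_friendly_region(t, w_max, rtt):
    """True while the Reno estimate exceeds the cubic window."""
    return w_cubic(t, w_max) < w_est(t, w_max, rtt)
```

Both functions start from w_max * beta at t = 0. Which one is larger
afterwards depends on C, w_max and the RTT, which is why the choice of C
decides when a flow leaves the Reno-friendly region.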


2) CUBIC is specified to use an incorrect multiplicative-decrease factor
    for a congestion event that occurs when operating in slow start.
    This is badly in conflict with the original theory and design logic
    by Van Jacobson and may easily result in some degree of congestion
    collapse due to injecting excess "undelivered packets".
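
The concern in issue 2 can be illustrated with a deliberately simplified
model. Assumption (not from the draft): because cwnd roughly doubles
every RTT in slow start, cwnd at the time the loss is detected may be up
to about twice the window the path can sustain. The function name and
the factor of two are illustrative only.

```python
def window_after_slow_start_loss(sustainable_window, decrease_factor):
    """Window left after one multiplicative decrease at the end of slow start.

    Assumption: cwnd has overshot to twice the sustainable window by the
    time the loss is detected (simplified worst case).
    """
    cwnd_at_loss = 2 * sustainable_window
    return cwnd_at_loss * decrease_factor


# Factor 0.5 (Reno) lands near the sustainable window,
# factor 0.7 (CUBIC) stays well above it.
reno_window = window_after_slow_start_loss(100, 0.5)
cubic_window = window_after_slow_start_loss(100, 0.7)
```

Under this assumption a factor of 0.7 leaves the sender at roughly 1.4
times the sustainable window, so it keeps sending more than the path can
deliver.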


3 a) The rule for changing alpha to 1 when Wmax is reached in the
      Reno-friendly region is the correct thing to do during the normal
      steady state. However, it is an incorrect action to take when in
      the fast convergence mode within the Reno-friendly region, because
      it acts just *opposite* to what CUBIC should do when in the fast
      convergence mode: instead of slowing down the increase rate during
      congestion avoidance, it actually accelerates it, because alpha is
      raised to 1 earlier than when not in the fast convergence mode.
      This seems an obvious mistake introduced by the quite recent
      modifications to rfc8312bis.

   b) Wmax needs to be set differently for a congestion event arriving
      when in slow start and when in congestion avoidance (the co-authors
      who are the original developers of CUBIC have agreed on this).
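
The effect described in 3 a) can be sketched numerically. Assumptions:
beta = 0.7 and the fast convergence rule Wmax = cwnd * (1 + beta) / 2 as
in RFC 8312, the Reno-friendly estimate growing by alpha segments per
RTT (an approximation of the per-ACK update), and alpha switching to 1
once the estimate reaches Wmax. The function name is hypothetical.

```python
BETA = 0.7
ALPHA = 3 * (1 - BETA) / (1 + BETA)  # increase per RTT before Wmax is reached


def rtts_until_alpha_is_one(cwnd_at_loss, fast_convergence):
    """RTTs after a congestion event until the estimate reaches Wmax."""
    if fast_convergence:
        # Fast convergence remembers a reduced Wmax.
        w_max = cwnd_at_loss * (1 + BETA) / 2
    else:
        w_max = cwnd_at_loss
    w_est = cwnd_at_loss * BETA  # window right after the decrease
    rtts = 0
    while w_est < w_max:
        w_est += ALPHA
        rtts += 1
    return rtts
```

For a cwnd of 100 segments at the time of the loss, this gives 57 RTTs
without fast convergence and 29 RTTs with it: the switch to the faster
increase (alpha = 1) happens earlier in exactly the mode that is meant
to be less aggressive.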


4) CUBIC decreases its sending rate (much) more slowly than Reno CC
    when sudden congestion is encountered (or network capacity is
    reduced). The draft states this explicitly (slow convergence) but
    does not identify it as unfair, as shown in the paper [PFLDNeT'07]
    that Bob pointed out. Instead, the draft just mentions it as if it
    were OK.

[PFLDNeT'07] Leith, D. J.; Shorten, R. N. & McCullagh, G. "Experimental
evaluation of Cubic-TCP" Proc. Int'l Wkshp on Protocols for Future,
Large-scale & Diverse Network Transports (PFLDNeT'07), 2007
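
A back-of-the-envelope sketch of the slower decrease in issue 4. It
assumes decrease factors of 0.5 (Reno) and 0.7 (CUBIC) and ignores any
window growth between congestion events; the function name is
hypothetical.

```python
def congestion_events_to_reach(cwnd, target, decrease_factor):
    """Count multiplicative decreases needed to bring cwnd down to target.

    Assumption: no window growth between consecutive congestion events.
    """
    events = 0
    while cwnd > target:
        cwnd *= decrease_factor
        events += 1
    return events


# Capacity drops so that only one tenth of the old window is sustainable.
reno_events = congestion_events_to_reach(1000, 100, 0.5)
cubic_events = congestion_events_to_reach(1000, 100, 0.7)
```

With these numbers Reno needs 4 congestion events and CUBIC needs 7 to
get below one tenth of the old window, so a CUBIC flow holds on to its
share of the reduced capacity for longer.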


5) If someone implements detection of false fast rexmits and undo of
    cwnd as currently described in the draft, it results in exactly
    the incorrect behaviour that Neil pointed out Linux had had for a
    decade or so before it was patched (see github issue #90). That is,
    the draft must not provide incorrect advice but must explain any
    known problems with the current RFCs it cites. In addition,
    implementing undo of cwnd enables the security attacks that the
    cited RFCs discuss. Note that applying just the cited detection
    algorithms does not enable the security threats; they become
    enabled only if undo of cwnd is applied.


6) Flightsize: The current text is fine except that it does not quite
    correctly reflect what stacks that use cwnd instead of flightsize
    actually do. AFAIK, and as was discussed on GitHub, all stacks
    apply some sort of restriction so that cwnd is not allowed to grow
    beyond rwnd, and they do not use an arbitrarily high (old) cwnd
    value to calculate the new cwnd when a congestion event occurs.


Best regards,

/Markku

On Tue, 24 May 2022, Gorry Fairhurst wrote:

> I'll start by saying again that I think it is important to see this 
> published as a PS (as others have noted), but I still think it needs 
> additional text to say the process differs from the recommended IETF 
> process and evaluation. I don't see how this will proceed without that 
> text in some form as discussed at the IETF-113 meeting. Alas, I also do
> not think highlighting this only in the Shepherd write-up is enough; it
> just postpones this as a discussion item to the IETF-LC, which can't be
> a useful thing to do.
>
> I promised suggested text - but was unable to work on this after the 
> meeting - sorry - so here goes, this is what I suggest:
>
>
> "RFC 5033 provides the current BCP guidelines for the community,
> describing what type of evaluation is expected by the IETF to understand
> the suitability of an alternate congestion control, and the process to
> enable a specification to be approved for widespread deployment in the
> global Internet. The present document does not update that IETF BCP.
>
> However, in the case of Cubic, there has been widespread deployment 
> experience over a considerable period (4 years since publication of RFC 
> 8312). This experience was thought to be sufficient to allow this 
> publication as an IETF standards-track specification.
>
> There are areas in which the specified method differs from the
> methods previously specified in published RFCs, some of which have been
> highlighted in this document. As a part of maintaining the congestion
> control document, future IETF work is expected to evaluate these
> differences and will if necessary update the relevant specifications."
>
>
> This is not aimed at changing anything in the cubic algorithm, but it is 
> aimed at explaining why this spec did not conform to the process, and 
> seeking to avoid setting a precedent for future methods - which I really 
> think do continue to need to be evaluated in public both by researchers 
> and developers at the IETF.
>
> Whether in future the IETF updates the BCP represented by RFC 5033 is 
> another topic;-).
>
> Best wishes,
>
> Gorry