Re: [tcpm] Proceeding CUBIC draft - thoughts and late follow-up

"Gorry (erg)" <gorry@erg.abdn.ac.uk> Mon, 06 June 2022 13:35 UTC

From: "Gorry (erg)" <gorry@erg.abdn.ac.uk>
Date: Mon, 06 Jun 2022 14:35:10 +0100
Message-Id: <B8BED289-D170-448C-B27B-6E86640F8B84@erg.abdn.ac.uk>
References: <alpine.DEB.2.21.2206061135361.7292@hp8x-60.cs.helsinki.fi>
Cc: "tcpm@ietf.org Extensions" <tcpm@ietf.org>, Lars Eggert <lars@eggert.org>, tcpm-chairs <tcpm-chairs@ietf.org>
In-Reply-To: <alpine.DEB.2.21.2206061135361.7292@hp8x-60.cs.helsinki.fi>
To: Markku Kojo <kojo@cs.helsinki.fi>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/bxSLnLME-yIqDJg1RwsvgWj2mwc>
Subject: Re: [tcpm] Proceeding CUBIC draft - thoughts and late follow-up

I will look for the reply on the list.

I thought the sense of the working group was to consider the current deployment experience sufficient. I was not trying to question that, but I wanted to state explicitly that this is not a new precedent: the process has not changed, and this is just an exception.
I will follow up on the list.

Gorry

> On 6 Jun 2022, at 11:29, Markku Kojo <kojo@cs.helsinki.fi> wrote:
> 
> Hi Gorry, all,
> 
> Catching up...
> 
> Thanks for the proposed text. I had a bit of a hard time understanding why this text was proposed. Then I listened to the meeting audio recording and heard what Lars thought the issue was. Lars has completely misunderstood my issue as being that the process was not followed. It is not.
> 
> The problem is that the CUBIC algorithm has a number of unresolved issues (as a result of not following the process). The issues were raised during the WGLC, and for the 2nd WGLC I listed the unresolved issues, together with some new points that emerged during the discussions and on somewhat closer consideration.
> 
> These issues are significant, and a number of people have also said they should not be left unaddressed. Almost all of them relate to the behaviour of CUBIC in the TCP-friendly region, where it is intended and required to compete fairly with the current stds track congestion control mechanisms. Whether CUBIC competes fairly *cannot* be evaluated without measuring the impact of CUBIC on the other traffic competing with it over a shared bottleneck link. Deployment alone does not provide this; it requires specifically planned measurements.
> 
> This said, I agree with Gorry that the text he proposed should be
> incorporated before proceeding with the doc. However, I cannot see how
> widespread deployment experience can be considered sufficient
> when it does not involve any measurements of the impact of CUBIC on
> other traffic. That is, IMO this piece of the text is not quite rational.
> On the contrary, the cited papers do not provide evidence that the intended behaviour is realized but show the opposite, as do some other papers that are not cited.
> 
> One major problem is that the issues have not been discussed on the list. IMO that is a must before proceeding with the doc. I regret that the process was even mentioned, because that sidetracked the discussion away from the actual issues.
> 
> For the 2nd WGLC I linked the issues with the draft to the GitHub issue
> numbers, which was probably not at all helpful for initiating discussion on
> the list unless one had closely followed the issue discussions on
> GitHub. I briefly summarize the issues below and suggest starting a
> discussion on each of them in a separate thread, which I can initiate.
> 
> IMO the wg should understand each of the issues and decide separately
> how to proceed with each one. At a minimum, each issue must be clearly
> documented in the draft, and the wg should come up with a justification
> for each issue explaining why the doc can be published despite the issue
> (unless the issue is resolved).
> 
> In addition, the text in the draft about updating RFC 5681 is very vague. First, it does not make clear what updating even means: whether it requires anyone implementing the basic CC algos in RFC 5681 to follow the algos in this draft where they differ, or only allows the CUBIC algo to diverge from the current std CC algos. Second, all differences, and the justification for each, must be documented separately.
> 
> 
> Summary of the unresolved issues:
> 
> 1 a) The model/formula used for the Reno-friendly region has not
>     been validated and has actually been shown to be incorrect.
>     There was one validation attempt in the original paper
>     manuscript, and it failed (the manuscript was never
>     published).
> 
>  b) The validation of the heuristic in CUBIC that is used to
>     decide when to shift from the Reno-friendly region to
>     the "genuine" CUBIC mode is insufficient.
>     That is, the constant C has not been properly validated
>     in a wide range of environments (there appears to be nothing
>     since the original research paper was published).
> 
> 
> 2) CUBIC is specified to use an incorrect multiplicative-decrease factor
>   for a congestion event that occurs when operating in slow start.
>   This badly conflicts with the original theory and design logic
>   by Van Jacobson and may easily result in some degree of congestion
>   collapse due to injecting excess "undelivered packets".
> 
> 
> 3 a) The rule for changing alpha to 1 when Wmax is reached in the
>     Reno-friendly region is the correct thing to do during the normal
>     steady state. However, it is the incorrect action to take when in the
>     fast convergence mode within the Reno-friendly region, because it
>     acts just *opposite* to what CUBIC should do in the fast
>     convergence mode: instead of slowing down the increase rate during
>     congestion avoidance, it actually accelerates it, because alpha is
>     increased to 1 earlier than when not in the fast convergence mode.
>     This seems an obvious mistake in the quite recent modifications
>     to rfc8312bis.
> 
>  b) Wmax needs to be set differently for a congestion event arriving
>     when in slow start and when in congestion avoidance (the co-authors
>     who are the original developers of CUBIC have agreed on this).
> 
> 
> 4) CUBIC decreases its sending rate (much) more slowly than Reno CC when
>   sudden congestion is encountered (or network capacity is reduced).
>   The draft states this explicitly (slow convergence) but does not
>   identify it as unfair, as shown in the paper [PFLDNeT'07] that
>   Bob pointed out. Instead, the draft just mentions it as if it
>   were OK.
> 
> [PFLDNeT'07] Leith, D. J.; Shorten, R. N. & McCullagh, G. "Experimental
> evaluation of Cubic-TCP" Proc. Int'l Wkshp on Protocols for Future,
> Large-scale & Diverse Network Transports (PFLDNeT'07), 2007
> 
> 
> 5) If someone implements detection of false fast rexmits and applies undo
>   of cwnd as currently described in the draft, it results in exactly
>   the incorrect behaviour that Neil pointed out Linux had for a
>   decade or so before it was patched (see GitHub issue #90). That is,
>   the draft must not provide incorrect advice but should explain any
>   known problems with the current RFCs it cites. In addition,
>   implementing undo of cwnd enables the security attacks that the
>   cited RFCs discuss. Note that applying just the cited detection
>   algorithms does not enable the security threats; they become
>   enabled only if undo of cwnd is applied.
> 
> 
> 6) Flightsize: The current text is fine except that it does not quite
>   correctly reflect what stacks that use cwnd instead of flightsize
>   actually do. AFAIK, and as was discussed on GitHub, all stacks
>   apply some sort of restriction so as not to allow cwnd to grow beyond
>   rwnd and not to use an arbitrarily high (old) cwnd value to
>   calculate the new cwnd when a congestion event occurs.
> 
> 
> Best regards,
> 
> /Markku
> 
>> On Tue, 24 May 2022, Gorry Fairhurst wrote:
>> 
>> I'll start by saying again that I think it is important to see this published as a PS (as others have noted), but I still think it needs additional text to say the process differs from the recommended IETF process and evaluation. I don't see how this will proceed without that text in some form, as discussed at the IETF-113 meeting. Alas, I also do not think highlighting this only in the Shepherd write-up is enough; that just postpones this as a discussion item to the IETF-LC, which can't be a useful thing to do.
>> 
>> I promised suggested text but was unable to work on this after the meeting (sorry), so here goes; this is what I suggest:
>> 
>> 
>> "RFC 5033 provides the current BCP guidelines for the community, describing what type of evaluation is expected by the IETF to understand the suitability of an alternate congestion control, and the process to enable a specification to be approved for widespread deployment in the global Internet. The present document does not update that IETF BCP.
>> 
>> However, in the case of CUBIC, there has been widespread deployment experience over a considerable period (4 years since the publication of RFC 8312). This experience was thought to be sufficient to allow this publication as an IETF standards-track specification.
>> 
>> There are areas in which the specified method differs from the methods previously specified in published RFCs, some of which have been highlighted in this document. As a part of maintaining the congestion control documents, future IETF work is expected to evaluate these differences and will, if necessary, update the relevant specifications."
>> 
>> 
>> This is not aimed at changing anything in the CUBIC algorithm; it is aimed at explaining why this spec did not conform to the process, and at avoiding setting a precedent for future methods, which I really think still need to be evaluated in public by both researchers and developers at the IETF.
>> 
>> Whether in future the IETF updates the BCP represented by RFC 5033 is another topic;-).
>> 
>> Best wishes,
>> 
>> Gorry
>> 
>> 
>> 