Re: [tcpm] finalizing CUBIC draft (chairs' view)

Yoshifumi Nishida <nsd.ietf@gmail.com> Thu, 13 October 2022 08:05 UTC

From: Yoshifumi Nishida <nsd.ietf@gmail.com>
Date: Thu, 13 Oct 2022 01:03:50 -0700
Message-ID: <CAAK044S77ZfiAydLDiCLJHr4OFPz22rEz3fU9bvwt2vRGXaS7w@mail.gmail.com>
To: Markku Kojo <kojo@cs.helsinki.fi>
Cc: "tcpm@ietf.org Extensions" <tcpm@ietf.org>, tcpm-chairs <tcpm-chairs@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/K9QQ1BrsmmhVaoe1REsqCetaF2Y>
Subject: Re: [tcpm] finalizing CUBIC draft (chairs' view)

Hi Markku,

On Tue, Oct 11, 2022 at 3:13 AM Markku Kojo <kojo@cs.helsinki.fi> wrote:

> Hi Yoshi,
>
> apologies for the delay ...
>
> See inline.
>
> On Tue, 27 Sep 2022, Yoshifumi Nishida wrote:
>
> > Hi Markku,
> >
> > Thanks for the comments.
> > On Sun, Sep 25, 2022 at 6:17 PM Markku Kojo <kojo@cs.helsinki.fi> wrote:
> >       Hi Yoshi,
> >
> >       catching up ...
> >
> >       I wonder why Issues 4 and 5, which were not mentioned on the mtng
> >       slides, have not been discussed on the wg list at all nor concluded?
> >       I sent the detailed description of Issue 4 to the wg list on July 29th.
> >
> >
> > From my point of view, this point is also trying to point out some
> > characteristics in the model, as issue 2 does.
>
> I believe you mean issue 1 which is about the model (issue 2 is about beta
> in slow start)?
>
> > We've already mentioned in the current draft that the model is not very
> > precise and will need more analysis.
> > I am not yet sure how this can be a serious risk for the Internet,
> > hence I think what the current draft describes is fine.
> > I think we will need to see some evidence to validate whether this is a
> > real issue.
>
> The issue 4 is not about the correctness or accuracy of the model (issue
> 1).
>
> Issue 4 is about slow convergence down when CUBIC is required to (quickly)
> release network capacity to competing flows, in particular when new flows
> start and create sudden congestion and CUBIC should reduce its sending
> rate to a small fraction from what is its current sending rate. Using
> beta=0.7 requires roughly two times more multiplicative decreases than
> beta=0.5. Not reacting to sudden congestion appropriately forces the
> competing flows to slow down instead and results in prolonged unfairness.
>
> Using a lower alpha value per the model compensates for the use of a larger
> beta in the normal CA cycle where AI reaches (roughly/on the average) the
> previous saturation point. But when sudden congestion occurs a flow needs
> to apply MD again almost immediately after the fast recovery ends (in one
> or in a few RTTs) and potentially repeat this several times. Hence,
> the lower alpha value per the model does not compensate for the use of a
> higher beta because effectively only MD (beta) is in use several times in
> a row.
>
> This has been studied and shown, e.g., in the paper Bob pointed out. It
> has results and explains why slower convergence results in notable
> unfairness. The current text in the draft w.r.t slower convergence
> explains mainly the impact to the CUBIC sender only and does not include
> any appropriate citations that would back up the claims that the draft
> makes about the impact to the other traffic. And the draft does not cite
> the results that show and explain the negative impact to the other
> traffic.
>
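
For reference, the quoted claim that beta=0.7 requires roughly two times
more multiplicative decreases than beta=0.5 can be checked with a short
calculation. This is only a sketch: it assumes back-to-back decreases with
no additive increase in between (the "several times in a row" case above),
and the function name md_rounds and the target fractions are made up for
illustration.

```python
import math


def md_rounds(beta: float, target_fraction: float) -> int:
    """Count the back-to-back multiplicative decreases needed until
    cwnd has dropped to at most target_fraction of its starting value.

    Assumption: no additive increase happens between the decreases.
    """
    if not (0.0 < beta < 1.0 and 0.0 < target_fraction < 1.0):
        raise ValueError("beta and target_fraction must be in (0, 1)")
    relative_cwnd = 1.0
    rounds = 0
    while relative_cwnd > target_fraction:
        relative_cwnd *= beta  # one multiplicative decrease
        rounds += 1
    return rounds


if __name__ == "__main__":
    for fraction in (0.5, 0.25, 0.1):
        print(fraction, md_rounds(0.5, fraction), md_rounds(0.7, fraction))
    # Ratio of decreases needed in the limit: ln(0.5) / ln(0.7)
    print(round(math.log(0.5) / math.log(0.7), 2))
```

Under these assumptions, reaching a quarter of the original window takes
2 decreases with beta=0.5 and 4 with beta=0.7, and the limiting ratio
ln(0.5)/ln(0.7) is about 1.94.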

So, this is a fairness topic.
I think we will need a more detailed analysis of how it is unfair. When
sudden congestion happens, I think the flow may tend to hit timeouts rather
than repeat multiple decreases.
I'm not sure at this point, though. This may take time.
We can continue the discussion, but I don't think this leads to congestion
collapse or any other serious problems.


> >       Issue 5 has been discussed in github. The major problem there is
> >       that following the current advice in the draft with a TCP sender
> >       results in exactly the incorrect behaviour that Neal pointed out in
> >       github and that has been (partially) corrected in the Linux TCP
> >       stack (*). In addition, applying UNDO cwnd opens the security threats
> >       described in RFC 3522 and RFC 3708. There is not a single word on
> >       this in Sec 6, Security Considerations? Note that the security
> >       problems described in RFC 3522 and RFC 3708 become enabled only if
> >       one applies UNDO cwnd.
> >
> >       (*) The problem with RFC 3522 is that it did not consider SACK at all
> >       (silently assumes SACK is not enabled) and therefore works incorrectly
> >       if applied as specified when SACK is enabled. The fix for Linux that
> >       Neal pointed out seems to behave similarly for both SACK and non-SACK
> >       senders. That would make a non-SACK sender behave against the basic
> >       idea of RFC 3522, which avoids further unnecessary retransmissions
> >       after the first retransmitted segment, the most important advantage
> >       of RFC 3522.
> >
> >
> > Hmm. sorry. I am not sure which point you're referring to. Can you
> > provide a link for github?
>
> This is discussed as part of issue #90. There were 3 separate problems
> discussed under #90, Issue 5 concerns Sec. 4.9 of the draft w.r.t.
> spurious congestion events.
>
> Applying the CC response in Sec. 4.9.2 with either [RFC3522] or [RFC3708]
> opens the security threats discussed in those RFCs. The paper "TCP
> Congestion Control with a Misbehaving Receiver" that Neal pointed out in
> his last message and claimed that such mechanisms could be used "defeating
> just about any existing TCP congestion control or loss recovery mechanism"
> is IMHO exaggerated. The two first attacks discussed in the paper (ACK
> division and DupACK spoofing) have already been solved in RFC 5681. The
> third attack (Optimistic ACKing) is quite unattractive to use because it
> easily results in unreliable TCP data delivery. Even if that is acceptable
> for a TCP receiver, a TCP sender has other ways to mitigate/solve the
> problem by penalizing the sender: RSTing the connection as proposed in
> the paper or simply by reducing cwnd if it (continuously) receives ACKs
> that ack data not yet sent. Instead, we do not have any known,
> well-working ways to reliably detect the security threats discussed in
> [RFC3522] and [RFC3708]. So, what is the justification to silently open
> these security threats?
>
> In addition, RFC 3522 does not work correctly with SACK enabled TCP. The
> Linux algo for handling false fast rexmits that Neal pointed out in his
> last message corrects it for a SACK-enabled TCP sender but it seems to
> operate with the same logic for a non-SACK sender (NewReno) which is
> impractical and breaks the original idea of Eifel to make the decision
> and exit fast recovery on the arrival of the first new Ack (after one RTT)
> and thereby prevent any further unnecessary retransmissions.
>
> In addition, the actions the draft proposes ignore the crucial additional
> action that a TCP sender that detects false fast rexmit should take
> (adjust dupack threshold, avoid blasting a huge burst of data, etc.).
>
> Issue 5 in summary:
> a) following the advice in the current text results in incorrect
>     TCP implementation with Eifel false rexmit detection [RFC3522]
> b) the draft enables the security threats w.r.t [RFC3522] or [RFC3708]
>     with a TCP implementation and ignores the problem.
>
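
As a rough illustration of the mitigation mentioned above for Optimistic
ACKing (reducing cwnd when ACKs cover data not yet sent), the check could
look like the sketch below. The names seq_after and cwnd_after_ack, and the
choice to halve cwnd with a floor of two segments, are assumptions made
for this example only; they are not taken from the draft, the cited paper,
or any TCP stack.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32 bits wide and wrap around


def seq_after(a: int, b: int) -> bool:
    """Return True if sequence number a is strictly after b, modulo 2**32."""
    diff = (a - b) % SEQ_MOD
    return 0 < diff < (SEQ_MOD >> 1)


def cwnd_after_ack(ack: int, snd_nxt: int, cwnd: int, mss: int) -> int:
    """Hypothetical penalty policy for ACKs that cover unsent data.

    ack     -- cumulative ACK number carried by the incoming segment
    snd_nxt -- next sequence number the sender has not yet transmitted
    cwnd    -- current congestion window in bytes
    mss     -- maximum segment size in bytes

    If the ACK acknowledges data beyond snd_nxt, halve cwnd (but keep at
    least two segments); otherwise return cwnd unchanged.
    """
    if seq_after(ack, snd_nxt):
        return max(cwnd // 2, 2 * mss)
    return cwnd


if __name__ == "__main__":
    # ACK for byte 2000 while only bytes below 1000 have been sent
    print(cwnd_after_ack(ack=2000, snd_nxt=1000, cwnd=100_000, mss=1000))
    # Legitimate ACK: cwnd is left alone
    print(cwnd_after_ack(ack=900, snd_nxt=1000, cwnd=100_000, mss=1000))
```

The modular comparison is there so that the check still works when the
sequence space wraps; the penalty itself is a placeholder for whatever
response an implementation would actually choose.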

I personally think this issue is outside the scope of the CUBIC draft. It
does not seem very related, as CUBIC doesn't depend on it.
If there are any threats with RFC3522 or RFC3708, I think they would be
better mentioned in those docs rather than in a doc that just cites them.
In any case, this looks like an editorial topic to me. I think we can sort
it out before the doc is published.

> When replying w.r.t issue 5 maybe you (or someone else who
> replies first) could start a new thread for it in order to keep some
> structure with the discussions, thank you.
>
> >       I'm also afraid that Bob's analysis for Issue 1 with a tail-drop
> >       router is incorrect. I'll send a separate reply for that.
> >
> >
> > OK. just in case, I would like to clarify that Bob's analysis is one
> > data point for further discussions.
> > We can continue discussions on the analysis, but the discussions won't
> > affect the draft.
> >
> >       For Issue 2 Michael and Neal have agreed that the problematic
> >       behaviour occurs. I have not seen any other replies with technical
> >       arguments. Michael explains that the problem does not always occur
> >       like I described. That is true but it does not mean the problem is
> >       non-existing. Instead, it occurs when the bottleneck router has a
> >       queue of size 1 BDP or larger, i.e., relatively often (I'll clarify
> >       this a bit more with a reply to Michael).
> >       Neal's major argument for not using beta = 0.5 in slow start was that
> >       it has been deployed as beta=0.7 and changing it to beta=0.5 would
> >       require research. Some others have also thought it requires research.
> >       This is quite odd given that TCP has worked with beta=0.5 since the
> >       late 80's and there are already tons of research and experience over
> >       several decades with beta=0.5 when in slow start. What is the further
> >       analysis and evaluation that would be needed?
> >       In addition, beta=0.5 in slow start is also the current draft
> >       standard, so the correct question for the wg is why to change beta=0.5
> >       to 0.7 and what is the justification for the change (and why is such
> >       justification not written down in the draft)?
> >
> >       I'm also confused about the statement that Issue 2 may just cause
> >       more losses and an additional round of recovery. That's simply the
> >       point of view of a CUBIC flow and fully ignores the impact on the
> >       other traffic. It is a severe violation of congestion control
> >       principles to inject unnecessary packets (undelivered packets) into
> >       the network. Please see the description of congestion collapse due to
> >       undelivered packets in Sec 5 of RFC 2914 and explain why the overload
> >       of up to 40% undelivered packets when applying beta=0.7 in slow start
> >       is not creating congestion collapse (of some degree)? So far nobody
> >       has explained or argued otherwise. I think this is the most serious
> >       issue with the draft.
> >
> >       In order to make progress with the draft, these issues must not be
> >       ignored IMHO. If the wg decides to publish the draft without
> >       addressing the issues, at minimum the facts about the issues must be
> >       appropriately documented in the draft.
> >
> >
> > I personally don't think MD=0.7 will cause congestion collapse as I
> > haven't seen such evidence so far.
>
> We cannot have evidence if nobody has done any measurements. The impact to
> other coexisting traffic cannot be observed without *measuring* it.
>
> Everyone also needs to understand that congestion collapse comes in
> different degrees, it does not need to be full collapse (see RFC 2914)
> and I am not claiming full collapse would occur (that would have been
> noticed without measurements). But any degree must be avoided by
> an acceptable congestion control algo.
>
> The evidence is present in the measurements done in RFC 2914. In the case
> of CUBIC, the sender is not totally unresponsive to congestion but
> its response is too mild, not following the "MUST" in RFC 5681, and
> results in up to 40 % of injected packets to be undelivered packets.
>
> > It can delay the convergence time and it's already explained in the
> > draft.
>
> It is exactly the wrong way of looking at the problem from a CUBIC
> sender point of view. We are not discussing on the impact to the CUBIC
> sender itself but the impact to the other traffic. There are no words
> about it in the draft.
>
> The impact to the CUBIC sender itself is negligible as we all probably
> very well know (the sender may continue with transmitting data with
> higher rate or at least with the rate that is its share even if it needs
> to recover from another loss).
>
> Instead, it should be obvious that if a flow reduces cwnd so little that
> it continues sending at higher rate than what it itself just determined
> via slow start (remember that we are discussing the case *where the
> sender is in slow start when congestion is signalled*) to be the
> available network capacity then the flow quite likely steals this
> capacity from the other competing traffic. What is the justification for
> a design that deliberately transmits at higher rate than what is the
> available network capacity? The draft does not have text for such
> justification even though it modifies RFC 5681 for this part.
>
> > If you think MD=0.7 can cause serious problems such as congestion
> > collapse, could you elaborate?
>
> I think I have explained it a number of times but please see my response
> that I'll send as a response to Michael's comments.
>
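
One way to read the "up to 40 %" figure quoted above is the following
back-of-the-envelope calculation. It is a sketch under a stated assumption,
not a measurement: it assumes that slow start's per-RTT doubling lets cwnd
reach up to twice the available capacity by the time loss is detected, and
it expresses the remaining excess relative to that capacity. The function
name and the overshoot_factor parameter are hypothetical.

```python
def overshoot_after_md(capacity: float, beta: float,
                       overshoot_factor: float = 2.0) -> float:
    """Return the excess of cwnd over capacity, as a fraction of capacity,
    right after one multiplicative decrease.

    Assumption: at loss detection cwnd equals overshoot_factor * capacity
    (2.0 models the worst case of slow start doubling past the limit).
    """
    cwnd_at_loss = overshoot_factor * capacity
    cwnd_after_md = beta * cwnd_at_loss
    return max(0.0, cwnd_after_md - capacity) / capacity


if __name__ == "__main__":
    for beta in (0.5, 0.7):
        excess = overshoot_after_md(capacity=100.0, beta=beta)
        print(f"beta={beta}: cwnd exceeds capacity by {excess:.0%}")
```

With these assumptions beta=0.5 brings cwnd back to the capacity found by
slow start, while beta=0.7 leaves it at 1.4 times that capacity, i.e. 40%
above it. A smaller overshoot at loss detection gives a smaller excess,
which matches the "up to" wording.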

Ok. But this also looks like another fairness topic to me, which is a bit
hard to evaluate at this point.
--
Yoshi


> Thanks,
> > --
> > Yoshi
> >
> >
> >         On Fri, 26 Aug 2022, Yoshifumi Nishida wrote:
> >
> >       > Hello everyone,
> >       > Based on the feedback from the last meeting, the chairs have been
> >       > discussing how to finalize the cubic draft.
> >       > The below is our current view on the draft.
> >       >
> >       > The slide for the CUBIC draft from the last WG meeting listed 4
> >       > discussion points in the draft.
> >       >
> >       > https://datatracker.ietf.org/meeting/114/materials/slides-114-tcpm-revised-cubic-as-ps
> >       >
> >       > In these items, we think that the last two points are already
> >       > addressed now.
> >       > With regard to the remaining two points, our views are the following.
> >       >
> >       > Point 1: TCP friendly model in the cubic draft
> >       >      We can admit that the model is not valid as the paper describing
> >       >      the model uses some simplified presumptions.
> >       >      But, it doesn't mean the model will pose serious issues on the
> >       >      Internet as we haven't seen any evidence yet.
> >       >
> >       > Point 2: Multiplicative decrease factor during slow-start phase
> >       >      We think using the current value: 0.7 may cause more packet
> >       >      losses in certain cases, but it can work efficiently in other
> >       >      cases.
> >       >      We think this is a part of design choices in CUBIC as we haven't
> >       >      seen any tangible evidence that it can cause serious problems.
> >       > We concluded this will require more detailed analysis and evaluations
> >       > which can take a longer time.
> >       > Based on this, we think these points are NOT needed to be addressed
> >       > in the draft while it will be good to add some more explanations for
> >       > them.
> >       > We saw there were several opinions about documenting these points
> >       > in the draft during the last meeting.
> >       > If you have some suggestions here, please share your opinions.
> >       >
> >       > Please note that this doesn't mean we'll ignore them. We will try to
> >       > publish a new version of the CUBIC draft if we find some things on
> >       > them.
> >       >
> >       > If you have any opinions or comments on the views, please share them
> >       > with us.
> >       >
> >       > Thanks,
> >       > --
> >       > Yoshi on behalf of tcpm co-chair
> >       >
> >       >
> >
> >
> >