Re: [tcpm] CUBIC rfc8312bis / WGLC Issue 2

Yoshifumi Nishida <nsd.ietf@gmail.com> Tue, 19 July 2022 09:37 UTC

References: <alpine.DEB.2.21.2206141500480.7292@hp8x-60.cs.helsinki.fi> <alpine.DEB.2.21.2207112144430.7292@hp8x-60.cs.helsinki.fi> <7CF26B3A-D6C3-48F6-AA82-424231DD95D4@apple.com> <CADVnQykd9z=vgkQ-FkQ8-sj_E0BrQnpwhsj8AoF9QgQiQNQEhg@mail.gmail.com>
In-Reply-To: <CADVnQykd9z=vgkQ-FkQ8-sj_E0BrQnpwhsj8AoF9QgQiQNQEhg@mail.gmail.com>
From: Yoshifumi Nishida <nsd.ietf@gmail.com>
Date: Tue, 19 Jul 2022 02:37:10 -0700
Message-ID: <CAAK044TTg1p8ebJ9yd7uEES+KQskVFYw=wHimj9qrSJXDTASUA@mail.gmail.com>
To: Neal Cardwell <ncardwell@google.com>
Cc: Vidhi Goel <vidhi_goel=40apple.com@dmarc.ietf.org>, Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org>, "tcpm@ietf.org Extensions" <tcpm@ietf.org>
Content-Type: multipart/alternative; boundary="00000000000038617f05e4253b0c"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/FXQXoVOPiTQh13ORQbgghBTNk7Y>
Subject: Re: [tcpm] CUBIC rfc8312bis / WGLC Issue 2

Hi folks,
I think I understand this issue, but I'm personally not sure how bad it
is.
This looks like a rather pathological case to me, and I don't think
it can cause congestion collapse, as this is still a multiplicative decrease.
It seems to me that this is a case of shooting oneself in the foot, a
suboptimal case. However, there are some advantages to the current logic.
I'm not sure we should sacrifice better results to address some
rare cases. I think we need more analysis of the pros and cons
here.

Thanks,
--
Yoshi

On Wed, Jul 13, 2022 at 7:17 AM Neal Cardwell <ncardwell@google.com> wrote:

> Hi Markku and TCPMers,
>
> My understanding of Markku's concern here is that in slow start the cwnd
> can continue to grow in response to ACKs after the lost packet was sent, so
> that the cwnd is often twice the level of in-flight data at which the loss
> happened, by the time the loss is detected. So the cwnd ends up at 2 * 0.7
> = 1.4x the level at which losses happened, which causes an unnecessary
> follow-on round with losses, in order to again cut the cwnd, this time
> to 1.4 * 0.7 = 0.98x of the level that causes losses, which is likely to
> finally fit in the network path.
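
A small sketch to check the arithmetic in the paragraph above. It tracks cwnd as a multiple of the in-flight level at which the loss occurred, assuming that cwnd has grown by a fixed factor by the time the loss is detected and that each further round with loss simply multiplies cwnd by beta. The function name and the "fits once the ratio is at or below 1" rule are illustrative assumptions, not text from the draft.

```python
def decreases_until_fit(growth, beta, max_rounds=32):
    """Return the cwnd/loss-level ratio after each multiplicative decrease.

    growth: factor by which cwnd exceeds the in-flight level that caused
            the loss by the time the loss is detected (2.0 in the
            slow-start case described above).
    beta:   multiplicative decrease factor (0.7 for CUBIC).
    Stops at the first ratio <= 1.0, i.e. when cwnd is back at or below
    the level that caused loss (a simplifying assumption).
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must be in (0, 1)")
    ratios = []
    ratio = growth
    for _ in range(max_rounds):
        ratio *= beta
        ratios.append(ratio)
        if ratio <= 1.0:
            break
    return ratios


if __name__ == "__main__":
    for beta in (0.7, 0.5):
        steps = decreases_until_fit(2.0, beta)
        print(beta, [round(r, 2) for r in steps])
```

Under these assumptions, beta = 0.7 needs two decreases (1.4x, then 0.98x), while beta = 0.5 reaches 1.0x after one.
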
>
> However, there are two technical issues with this concern, as expressed in
> the proposed draft text in this thread:
>
> (1) The analysis for slow-start is not correct for the very common case
> where the flow is application-limited in slow-start, in which case the cwnd
> would not grow at all between the packet loss and the time the loss is
> detected. So the text is needlessly strict in this case.
>
> (2) For CUBIC the problematic dynamic (of cwnd growth between loss and
> loss detection exceeding the multiplicative decrease) can also occur
> outside of slow-start, in congestion avoidance. The CUBIC cwnd growth in
> congestion avoidance can be up to 1.5x per round trip. So after a packet
> loss the cwnd could grow by 1.5x before loss detection and then be cut in
> response to loss by 0.7, causing the ultimate cwnd to be 1.5 * 0.7 = 1.05x
> the volume of in-flight data at the time of the packet loss. This would
> likely cause an unnecessary follow-on round of packet loss due to failing
> to cut cwnd below the level that caused loss. So the problem is actually
> wider than slow-start.
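
Points (1) and (2) both reduce to one comparison: does the cwnd growth between the loss and its detection exceed what the multiplicative decrease takes back? A minimal sketch of that check follows; the function names are hypothetical and the growth factors are the ones quoted above.

```python
def overshoots_after_decrease(growth, beta):
    """True if cwnd after the decrease is still above the in-flight level
    at which the loss occurred, i.e. growth * beta > 1."""
    return growth * beta > 1.0


def break_even_growth(beta):
    """Largest growth factor that the decrease can fully take back."""
    return 1.0 / beta


if __name__ == "__main__":
    beta = 0.7
    print("break-even growth:", round(break_even_growth(beta), 3))
    cases = [
        ("app-limited slow start", 1.0),
        ("CUBIC congestion avoidance (max)", 1.5),
        ("slow start", 2.0),
    ]
    for label, growth in cases:
        print(label, overshoots_after_decrease(growth, beta))
```

With beta = 0.7 the break-even growth is about 1.43x, so both the 1.5x and 2x cases still overshoot, while the application-limited case (no growth) does not.
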
>
> AFAICT a complete/general fix for this issue is best solved by recording
> the volume of inflight data at the point of each packet transmission, and
> then using that metric as the baseline for the multiplicative decrease when
> packet loss is detected, rather than using the current cwnd as the
> baseline. This is the approach that BBRv2 uses. Perhaps there are other,
> simpler approaches as well.
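
To make the idea in the paragraph above concrete, here is a toy sketch of a sender that records the in-flight volume at each transmission and uses that value, not the current cwnd, as the baseline for the decrease. This is not BBRv2 code and not text from the draft; the class, its methods, the packet-counting units, and the choice to never raise cwnd on loss are all assumptions made for illustration.

```python
class InflightBaselineSender:
    """Toy sender that remembers the in-flight volume at each transmission."""

    def __init__(self, cwnd, beta=0.7):
        self.cwnd = float(cwnd)
        self.beta = beta
        self.inflight = 0            # packets sent but not yet acked or lost
        self._inflight_at_send = {}  # seq -> in-flight volume when it was sent

    def on_send(self, seq):
        self.inflight += 1
        self._inflight_at_send[seq] = self.inflight

    def on_ack(self, seq, cwnd_increase=0.0):
        # cwnd_increase stands in for whatever growth rule is in use.
        if self._inflight_at_send.pop(seq, None) is not None:
            self.inflight -= 1
            self.cwnd += cwnd_increase

    def on_loss_detected(self, seq):
        baseline = self._inflight_at_send.pop(seq, None)
        if baseline is None:
            return self.cwnd  # unknown packet: leave cwnd untouched
        self.inflight -= 1
        # Decrease relative to the in-flight volume at send time,
        # and never let a loss event raise cwnd (an assumption).
        self.cwnd = min(self.cwnd, max(1.0, self.beta * baseline))
        return self.cwnd


if __name__ == "__main__":
    s = InflightBaselineSender(cwnd=100)
    for seq in range(100):
        s.on_send(seq)
    for seq in range(99):            # slow-start-like growth: +1 per ACK
        s.on_ack(seq, cwnd_increase=1.0)
    print("cwnd before loss detection:", s.cwnd)
    print("cwnd after loss of seq 99:", s.on_loss_detected(99))
```

In this toy run the lost packet was sent with 100 packets in flight, so the new cwnd is based on 100 and not on the cwnd of 199 that had built up by the time the loss was detected.
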
>
> I also agree with Vidhi's concern, that a change to the multiplicative
> decrease changes the algorithm substantially. To ensure that the draft/RFC
> is not recommending something that has unforeseen significant negative
> consequences, we shouldn't make such a significant change to the text until
> we get experience w/ the new variation.
>
> best regards,
> neal
>
>
> On Tue, Jul 12, 2022 at 6:08 PM Vidhi Goel <vidhi_goel=40apple.com@dmarc.ietf.org> wrote:
>
>> Hi Markku,
>>
>> I emailed the other co-authors about this, and we think that this change is
>> completely untested for Cubic and should be considered for
>> a future version of Cubic, not the current rfc8312bis.
>> To change Beta from 0.7 to 0.5 during slow start, we would at least need
>> some experience either from lab testing or deployment, since all current
>> deployments of Cubic for both TCP and QUIC use 0.7 as Beta during slow
>> start. Since a lot of implementations currently use HyStart(++) along with
>> Cubic, we don’t see any high risk of an overaggressive sending rate, and that
>> is what the current rfc8312bis suggests as well. In fact, changing Beta
>> from 0.7 to 0.5 can still be aggressive without using HyStart.
>>
>> Thanks,
>> Vidhi
>>
>> > On Jul 11, 2022, at 5:55 PM, Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org> wrote:
>> >
>> > Hi all,
>> >
>> > below please find proposed text to solve Issue 2 a). I will propose
>> > text to solve 2 b) once we have come to a conclusion on 2 a). For the
>> > description and arguments for issues 2 a) and 2 b), please see the original
>> > issue descriptions below.
>> >
>> > Sec 4.6. Multiplicative Decrease
>> >
>> > Old:
>> >   The parameter Beta__cubic_ SHOULD be set to 0.7, which is different
>> >   from the multiplicative decrease factor used in [RFC5681] (and
>> >   [RFC6675]) during fast recovery.
>> >
>> >
>> > New:
>> >   If the sender is not in slow start when the congestion event is
>> >   detected, the parameter Beta__cubic_ SHOULD be set to 0.7, which
>> >   is different from the multiplicative decrease factor used in
>> >   [RFC5681] (and [RFC6675]).
>> >   This change is justified in the Reno-friendly region during
>> >   congestion avoidance because a CUBIC sender compensates for its higher
>> >   multiplicative decrease factor, relative to Reno, by applying
>> >   a lower additive increase factor during congestion avoidance.
>> >
>> >   However, if the sender is in slow start when the congestion event is
>> >   detected, the parameter Beta__cubic_ MUST be set to 0.5 [Jacob88].
>> >   This results in the sender continuing to transmit data at the maximum
>> >   rate that the slow start determined to be available for the flow.
>> >   Using Beta__cubic_ with a value larger than 0.5 when the congestion
>> >   event is detected in slow start would result in an overaggressive send
>> >   rate where the sender injects excess packets into the network and
>> >   each such packet is guaranteed to be dropped or force a packet from
>> >   a competing flow to be dropped at a tail-drop bottleneck router.
>> >   Furthermore, injecting such undelivered packets creates a danger of
>> >   congestion collapse (of some degree) "by delivering packets through
>> >   the network that are dropped before reaching their ultimate
>> >   destination." [RFC 2914]
>> >
>> >
>> >   [Jacob88] V. Jacobson, Congestion avoidance and control, SIGCOMM '88.
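
For readers who prefer code to spec language, the rule in the proposed "New" text could be sketched roughly as below. This only illustrates the proposal as worded in this message, not what the draft specifies; the constant and function names are hypothetical, and "in slow start" is taken to mean cwnd < ssthresh as in RFC 5681.

```python
BETA_CUBIC = 0.7        # value the draft says SHOULD be used
BETA_SLOW_START = 0.5   # value the proposed text would require in slow start


def in_slow_start(cwnd, ssthresh):
    """RFC 5681 convention: slow start is used while cwnd < ssthresh."""
    return cwnd < ssthresh


def beta_for_congestion_event(slow_start):
    """Pick the decrease factor according to the proposed text."""
    return BETA_SLOW_START if slow_start else BETA_CUBIC


def cwnd_after_congestion_event(cwnd, ssthresh):
    """cwnd after applying the decrease factor chosen for the current phase."""
    return cwnd * beta_for_congestion_event(in_slow_start(cwnd, ssthresh))


if __name__ == "__main__":
    # Initial slow start: ssthresh is effectively unbounded.
    print(cwnd_after_congestion_event(200, float("inf")))
    # Congestion avoidance: cwnd is at or above ssthresh.
    print(cwnd_after_congestion_event(200, 150))
```
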
>> >
>> > Thanks,
>> >
>> > /Markku
>> >
>> > On Tue, 14 Jun 2022, Markku Kojo wrote:
>> >
>> >> Hi all,
>> >>
>> >> this thread starts the discussion on issue 2: CUBIC is specified
>> >> to use an incorrect multiplicative-decrease factor for a congestion event
>> >> that occurs when operating in slow start. Applying HyStart++ does not
>> >> remove the problem; it only mitigates it in some percentage of cases.
>> >>
>> >> I think it is useful to discuss this in two phases: 2 a) and 2 b) below.
>> >> For anyone commenting/arguing on part 2 b), it is important to first
>> >> acknowledge whether (s)he thinks the original design and logic by Van
>> >> Jacobson is correct. If not, one should explain why Van's design logic is
>> >> incorrect.
>> >>
>> >> Issue 2 a)
>> >> ----------
>> >>
>> >> To begin with, let's put aside the potential use of HyStart++ (and also
>> >> assume a tail-drop router unless otherwise mentioned).
>> >>
>> >> The use of an MD factor larger than 0.5 is against the theory and
>> >> original design by Van Jacobson as explained in the congavoid paper
>> >> [Jacob88]. Any MD factor value larger than 0.5 will result in sending extra
>> >> packets during Fast Recovery following the congestion event (drop). All
>> >> extra packets will be dropped at a tail-drop bottleneck (if the flow is
>> >> alone).
>> >>
>> >> Note that at the time when the drop is signalled to the TCP
>> >> sender, the size of the cwnd is double the available network capacity that
>> >> slow start determined for the flow. That is, using MD=0.5 is already as
>> >> aggressive as possible, leaving no slack. Therefore, if MD=0.7 is used, the
>> >> TCP sender enters fast recovery with a cwnd that is 40% larger than the
>> >> determined network capacity and all excess packets are guaranteed to be
>> >> dropped, or even worse, the excess packets are likely to force packets of
>> >> any competing flows to be dropped unfairly.
>> >>
>> >> Moreover, if NewReno loss recovery is in use, a CUBIC sender will
>> >> operate overaggressively for a very long time. For example, if the
>> >> available network capacity for the flow is 100 packets, cwnd will have
>> >> the value 200 when the congestion is signalled, and the CUBIC sender enters
>> >> fast recovery with cwnd=140 and injects 40 excess packets for each of
>> >> the subsequent 100 RTTs it stays in fast recovery, forcing 4000
>> >> packets to be inevitably and totally unnecessarily dropped.
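
The numbers in this example can be reproduced with a few lines of arithmetic. The sketch below takes the figures from the paragraph as inputs (capacity of 100 packets, cwnd doubled at detection, 100 RTTs in fast recovery); the function name is hypothetical, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction


def excess_packets_in_recovery(capacity, beta, recovery_rtts, overshoot=2):
    """Packets sent beyond the path capacity during fast recovery.

    capacity:      packets per RTT that the path can carry for this flow
    beta:          multiplicative decrease factor applied at detection
    recovery_rtts: number of RTTs spent in fast recovery
    overshoot:     cwnd/capacity ratio when the loss is detected
                   (2 for the slow-start case described above)
    """
    cwnd_at_detection = overshoot * capacity
    cwnd_in_recovery = beta * cwnd_at_detection
    excess_per_rtt = max(0, cwnd_in_recovery - capacity)
    return excess_per_rtt * recovery_rtts


if __name__ == "__main__":
    print(excess_packets_in_recovery(100, Fraction(7, 10), 100))  # MD = 0.7
    print(excess_packets_in_recovery(100, Fraction(1, 2), 100))   # MD = 0.5
```
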
>> >>
>> >> Even worse, this behaviour of sending 'undelivered packets' is against
>> >> the congestion control principles as it creates a danger of congestion
>> >> collapse (of some degree) "by delivering packets through the network
>> >> that are dropped before reaching their ultimate destination." [RFC 2914]
>> >>
>> >> Such undelivered packets unnecessarily eat capacity from other flows
>> >> sharing the path before the bottleneck.
>> >>
>> >> RFC 2914 emphasises:
>> >>
>> >> "This is probably the largest unresolved danger with respect to
>> >> congestion collapse in the Internet today."
>> >>
>> >> It is very easy to envision a realistic network setup where this
>> >> creates a degree of congestion collapse in which a notable portion of
>> >> useful network capacity is wasted due to the undelivered packets.
>> >>
>> >>
>> >> [Jacob88] V. Jacobson, Congestion avoidance and control, SIGCOMM '88.
>> >>
>> >>
>> >> Issue 2 b)
>> >> ----------
>> >>
>> >> The CUBIC draft suggests that HyStart++ should be used *everywhere*
>> >> instead of the traditional Slow Start (see section 4.10).
>> >>
>> >> Although the draft does not say it, the authors seemingly suggest
>> >> using HyStart++ instead of traditional Slow Start in order to avoid the
>> >> problem of overaggressive behaviour discussed above. This, however, has
>> >> several issues.
>> >>
>> >> First, it is directly in conflict with the HyStart++ specification, which
>> >> says that HyStart++ should be used only for the initial Slow Start.
>> >> However, the overaggressive behaviour after slow start is also a potential
>> >> problem with slow start during an RTO recovery; in case of sudden
>> >> congestion that reduces the available capacity for a flow down to a
>> >> fraction of the currently available capacity, it is very likely that an
>> >> RTO occurs. In such a case the RTO recovery in slow start inevitably
>> >> overshoots and it is crucial for all flows not to be overaggressive.
>> >>
>> >> Second, the experimental results for initial slow start in the HyStart++
>> >> draft suggest that while HyStart++ achieves good results, it is
>> >> unable to exit slow start early and avoid overshoot in a significant
>> >> percentage of cases.
>> >>
>> >> Given the above issues, the CUBIC draft must require that an MD of 0.5 is
>> >> used when the congestion event occurs while the sender is (still) in slow
>> >> start. The use of an MD other than 0.5 is an obvious stumble in the original
>> >> CUBIC and the original CUBIC authors have already acknowledged this. It also
>> >> seems obvious that instead of correcting the actual problem (use of an MD
>> >> other than 0.5), HyStart and HyStart++ have been proposed to address the
>> >> design mistake. While HyStart++ is a useful method also when used with
>> >> MD=0.5, when used alone it only mitigates the impact of the actual problem
>> >> rather than solving it.
>> >>
>> >> What should be done for the cases where HyStart++ exits slow start but
>> >> is not able to avoid (some level of) overshoot and dropped packets is
>> >> IMO an open issue. Resolving it requires additional experiments and it
>> >> should be resolved separately when we have more data. For now, when we do
>> >> not have enough data and understanding of the behaviour, we should IMO
>> >> follow the general IETF guideline "be conservative in what you send" and
>> >> specify that MD = 0.5 should be used for a congestion event that occurs for
>> >> a packet sent in slow start.
>> >>
>> >> Thanks,
>> >>
>> >> /Markku
>> >>
>> >
>> > _______________________________________________
>> > tcpm mailing list
>> > tcpm@ietf.org
>> > https://www.ietf.org/mailman/listinfo/tcpm
>>
>>
>