[tcpm] Re: I-D Action: draft-ietf-tcpm-prr-rfc6937bis-06.txt

Neal Cardwell <ncardwell@google.com> Tue, 16 July 2024 19:50 UTC

From: Neal Cardwell <ncardwell@google.com>
Date: Tue, 16 Jul 2024 15:49:45 -0400
To: Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org>
CC: tcpm@ietf.org, Matt Mathis <mattmathis@measurementlab.net>, Matt Mathis <ietf@mattmathis.net>
Subject: [tcpm] Re: I-D Action: draft-ietf-tcpm-prr-rfc6937bis-06.txt
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/_qnryB0LkWPWawULIn9iqgkW40g>

On Wed, Mar 27, 2024 at 9:51 AM Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org> wrote:

> Hi Neal,
>
> Pls see inline below. Note that I'm sending another reply to address the
> issues with cumulatively ACKed bytes with the ACK that triggers loss
> recovery and a number of other related issues that should be addressed.
>
> I'll also extract the issue with decreasing sending rate again on a loss
> of retransmitted segment to a thread of its own because it is a crucial
> issue and different from the other issues related to the correct
> calculation of bytes delivered during the recovery process.
>
> On Mon, 18 Mar 2024, Neal Cardwell wrote:
>
> > Hi Markku,
> > Thank you for your thoughtful, detailed, and useful feedback! Comments
> in-line below:
> >
> > On Mon, Mar 18, 2024 at 1:31 AM Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org> wrote:
> >       Hi, Neal, all,
> >
> >       I have not been able to follow the progress of this draft for a long
> >       while, so apologies for chiming in this late.
> >
> >       I took a quick look at the latest discussions on setting RecoverFS.
> >
> >       The idea of setting RecoverFS = pipe seems like a neat idea to get
> >       cwnd to descend smoothly to the target in the given example. However,
> >       isn't it more important to ensure that in all cases the sender sends
> >       at most ssthresh pkts per RTT during the recovery and that at the end
> >       of the recovery cwnd is at most ssthresh?
> >
> >       If I am not mistaken, reordering (and Ack losses) may result in an
> >       undesired outcome. Let's modify Neal's example a bit such that
> >       reordering occurs but the reordering window (with RACK-TLP) is
> >       slightly too small:
> >
> >       CC = Reno
> >       cwnd = 100 packets
> >       The application writes 100*MSS.
> >       TCP sends 100 packets.
> >
> >       In this example the TCP sender has detected reordering with RACK-TLP
> >       or some other technique, so does not enter fast recovery on the third
> >       SACKed packet, but rather waits a while to accumulate more SACKs.
> >
> >       From the flight of 100 packets, 1 packet is lost (P1), and 24
> >       packets are delayed (packets P2..P25) and 3 packets (P26..P28) are
> >       SACKed (assume P2..P25 arrive after P28, for example).
> >
> >       We enter fast recovery with PRR.
> >
> >       RecoverFS = snd.nxt - snd.una = 100
> >
> >       ssthresh = cwnd / 2 = 50  (Reno)
> >
> >       pipe = snd.nxt - snd.una - (lost + SACKed) = 100 - (25 + 3) = 72 packets
> >
> >       The expression (pipe > ssthresh) is true for a number of consecutive
> >       SACKs, so we use the PRR code path repeatedly for a while as SACKs
> >       stream in for P2..25 and P29..P100.
> >
> >       When the SACK for P100 has been processed we have sent
> >
> >       Sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
> >                    = CEIL(96 * 50 / 72)
> >                    = 67
> >
> >       So, PRR does not exit with cwnd = 50 but with a much higher cwnd
> >       than expected.
> >
> >       If CC = CUBIC
> >
> >       Sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
> >                    = CEIL(96 * 70 / 72)
> >                    = 94
> >
> >
> >       The same behavior seems to occur also if there is significant Ack
> >       loss among P2..25 and at least one pkt gets reordered out of the
> >       reordering window. But maybe I am missing something?
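
For concreteness, here is a small stand-alone sketch of the arithmetic in the
example above (a hedged illustration only: packet counts stand in for bytes,
and the ssthresh values of 50 for Reno and 70 for CUBIC are the ones assumed
in the example):

  #include <stdio.h>

  /* Ceiling of a/b for positive integers, as in CEIL() above. */
  static unsigned int div_ceil(unsigned int a, unsigned int b)
  {
      return (a + b - 1) / b;
  }

  int main(void)
  {
      unsigned int prr_delivered = 96;  /* packets delivered during recovery  */
      unsigned int recover_fs    = 72;  /* RecoverFS = pipe at recovery start */

      /* Sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS) */
      printf("Reno  (ssthresh 50): %u\n", div_ceil(prr_delivered * 50, recover_fs));
      printf("CUBIC (ssthresh 70): %u\n", div_ceil(prr_delivered * 70, recover_fs));
      return 0;
  }

This prints 67 and 94, i.e. both results exceed the respective ssthresh (50
for Reno, 70 for CUBIC), which is the problem described above.
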
> >
> >
> > Thanks!
> >
> > BTW, I gather the line above that reads:
> >   RecoverFS = snd.nxt - snd.una = 100
> > is referring to a different version of the algorithm, and for this
> discussion probably it makes
> > more sense as something like:
> >   RecoverFS = pipe = 72 packets
>
> My bad, sorry!
> (I just copied your earlier example and modified it but forgot to modify
> the line that sets RecoverFS. However, as you could see, I used RecoverFS
> exactly as you corrected above.)
>
> > But your point still stands, and you raise a great point: simply
> initializing RecoverFS to
> > "pipe" is not safe, because packets that were marked lost and removed
> from pipe may actually
> > have been merely reordered. So if those packets are delivered later,
> they will increase the
> > numerator of prr_delivered / RecoverFS without increasing the
> denominator, thus leading to a
> > result above 1.0, and thus potentially leading to a target for
> Sent_so_far that is above
> > ssthresh, causing the algorithm to erroneously exceed ssthresh.
> >
> > What do folks think about addressing this issue by changing the
> initialization of RecoverFS to
> > be as follows:
> >
> >   existing text:
> >      pipe = (RFC 6675 pipe algorithm)
> >      RecoverFS = pipe              // RFC 6675 pipe before recovery
> >
> >   proposed new text:
> >      RecoverFS = snd.nxt - snd.una + (bytes newly cumulatively acknowledged)
> > That should fix the issue you raise, and keep prr_delivered from being
> able to grow larger than
> > RecoverFS.
> >
> > I'm proposing here to add in  (bytes newly cumulatively acknowledged)
> because in a (probably
> > rare) corner case where the ACK that initiates fast recovery advances
> snd.una, then the   (bytes
> > newly cumulatively acknowledged)  on that first ACK will be non-zero,
> and will be added into
> > prr_delivered. So if RecoverFS does not include (bytes newly
> cumulatively acknowledged) in such
> > cases, then we could again trigger the kind of scenario you raise where
> prr_delivered >
> > RecoverFS.
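
To make the initialization proposed above concrete, here is a minimal sketch
(all names are illustrative; this is not the draft's pseudocode or any
particular implementation):

  /* PRR per-connection state (illustrative). */
  struct prr_state {
      unsigned long prr_delivered;  /* bytes delivered since recovery start */
      unsigned long prr_out;        /* bytes sent since recovery start      */
      unsigned long recover_fs;     /* denominator of the PRR proportion    */
  };

  /* Called on the ACK that starts fast recovery. */
  static void prr_enter_recovery(struct prr_state *prr,
                                 unsigned long snd_nxt, unsigned long snd_una,
                                 unsigned long newly_cum_acked)
  {
      prr->prr_delivered = 0;
      prr->prr_out = 0;
      /* Include bytes newly cumulatively acknowledged by this ACK, so that
       * prr_delivered (which also counts them) cannot exceed RecoverFS. */
      prr->recover_fs = snd_nxt - snd_una + newly_cum_acked;
  }

With this form, the bytes counted into prr_delivered on the first ACK are also
part of RecoverFS, which is what keeps the prr_delivered / RecoverFS ratio
from exceeding 1.0 in the scenario described above.
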
>
> First, I was not quite sure what kind of scenario you were thinking where
> the ACK that initiates loss recovery would advance snd.una but then I got
> it. It's covered in my other reply.
>
> It seems to me that the original way to initialize RecoverFS to snd.nxt -
> snd.una, i.e., to effective flight size, is the way that ensures correct
> operation in all cases. However, I think computing DeliveredData should be
> clarified and modified to take into account all different cases such that
> all SACKed (and cumulatively Acked) bytes during the recovery
> process are counted in correctly.
>
> > We could try to make RecoverFS more precise by subtracting out SACKed
> data. But given the
> > proposed RecoverFS initialization has already changed three times in the
> last few months, IMHO
> > it seems safer to keep it as simple as possible. :-)
>
> Agreed. In my view the real problem in the examples that you have
> described (i.e., in cases where entering loss recovery gets
> postponed) is not in how RecoverFS is initialized but in how
> DeliveredData is specified and thereby what is counted in prr_delivered.
> Please see my explanation in another (later) reply to Yoshi and you.
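
For readers following along: RFC 6937 computes DeliveredData per ACK as the
change in snd.una plus the signed change in the amount of SACKed data. A
minimal sketch of that computation (illustrative names; this is not the
clarified definition being proposed in this thread):

  /* Bytes newly reported delivered by one ACK, in the RFC 6937 style.  The
   * SACK delta is signed so that bytes are not counted twice when previously
   * SACKed data is later cumulatively acknowledged. */
  static long delivered_data(unsigned long snd_una_before,
                             unsigned long snd_una_after,
                             long sacked_bytes_before,
                             long sacked_bytes_after)
  {
      long delta_una    = (long)(snd_una_after - snd_una_before);
      long delta_sacked = sacked_bytes_after - sacked_bytes_before;
      return delta_una + delta_sacked;
  }

The open question in this thread is how this quantity should be counted on the
ACK that triggers loss recovery and on the ACK that ends it.
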
>
> [SNIP: I am moving the discussion on reacting to loss of a rexmit to
> a new thread of its own and will reply to it separately]
>
>
> >       In addition, there are a few other things that might be useful to
> >       correct/clarify:
> >
> >       - When exactly does the PRR algorithm exit? That is, are the algo
> >          steps executed also for the final cumulative ACK that covers
> >          RecoveryPoint (or recover)?
> >
> >
> > The PRR algorithm exits implicitly when fast recovery exits. PRR takes
> no particular special
> > steps at the end of fast recovery, so the draft thus far hasn't bothered
> to spell that out. :-)
> > This is sort of treated implicitly by the fact that the pseudocode
> specifies when the blocks of
> > logic are triggered, which are the events: "At the beginning of
> recovery", "On every ACK
> > starting or during fast recovery", "On any data transmission or
> retransmission:" (I would
> > propose, however, to update that first phrase to be: "At the beginning
> of fast recovery", since
> > an unqualified "recovery" could conceivably apply to RTO recovery.)
>
> Understood, but I think it would be useful to confirm/clarify that the
> algo steps for "On every ACK starting or during fast recovery" are
> executed also for the final cumulative ACK that ends the loss recovery.
> Why? Because "during fast recovery" seems a bit ambiguous at the time the
> cumulative ACK that covers RecoveryPoint arrives. RFC 6675 does nothing
> but immediately terminate loss recovery upon arrival of the final
> cumulative ACK: "An incoming cumulative ACK for a sequence number greater
> than RecoveryPoint signals the end of loss recovery, and the loss
> recovery phase MUST be terminated." This could be interpreted such that
> upon receipt of the final ACK fast recovery terminates immediately
> and there is no reason to execute the PRR steps for "On every ACK
> starting or during fast recovery" either.
>
> > Do you think it would be useful to discuss when the PRR algorithm
> exits, given that nothing
> > happens at that point? If so, is there particular text that you would
> find useful?
>
> Would it clarify enough if the text is simply  modified as follows:
>
> Existing text:
>
>   On every ACK starting or during fast recovery:
>
> Proposed new text:
>
>   On every ACK starting or during fast recovery, including the final
>   cumulative ACK that signals the end of loss recovery:   (*)
>
> (*) I'd actually suggest modifying this further (something like what
> follows):
>
>   On every arriving ACK during fast recovery, including the final
>   cumulative ACK that signals the end of loss recovery:
>
> Reasoning: literally speaking, fast recovery does not include fast
> retransmit as specified in RFC 5681, Sec 3.2; steps 2 & 3 specify fast
> retransmit that is followed by fast recovery starting from step 4.
> Similarly, RFC 6675, sec 5, step 4 specifies fast retransmit and steps
> (A)-(C) fast recovery.
> Please see my proposal in another reply to Yoshi and you explaining why it
> might be useful for fast retransmit to be a step of its own like it is in
> all other RFCs. That would solve a number of issues in counting
> delivered bytes at the arrival of the ACK that triggers loss recovery.
>

Regarding the question of how to replace the text:

On every ACK starting or during fast recovery:

Regarding your proposal:

  On every arriving ACK during fast recovery, including the final
  cumulative ACK that signals the end of loss recovery:

I would suggest instead replacing it with the new text:

    On every ACK starting or during fast recovery,
    excluding the ACK that concludes a PRR episode:

My rationale:

(1) I believe we have consensus that the PRR document should specify that
we set cwnd = ssthresh at the end of a PRR episode. If we define the main
action for PRR (using PRR state plus DeliveredData to calculate sndcnt and
cwnd) to occur even on the final ACK of PRR, then we have further
complexity where we have to define which happens first: (a) the main action
for PRR, or (b) cwnd = ssthresh. If an implementer applies them in the
wrong order, they will likely get an incorrect result.
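
A minimal sketch of the ordering I have in mind (the helper names are
illustrative only, not the draft's pseudocode and not Linux code):

  /* Stand-in for the per-ACK PRR action (DeliveredData -> sndcnt -> cwnd);
   * the real computation is omitted here. */
  static void prr_main_action(unsigned long *cwnd, unsigned long ssthresh)
  {
      (void)cwnd;
      (void)ssthresh;
  }

  static void on_ack_in_fast_recovery(unsigned long *cwnd,
                                      unsigned long ssthresh,
                                      int ack_concludes_prr_episode)
  {
      if (ack_concludes_prr_episode) {
          *cwnd = ssthresh;  /* end of the PRR episode */
          return;            /* the per-ACK PRR action is deliberately skipped,
                              * so the two updates cannot run in the wrong order */
      }
      prr_main_action(cwnd, ssthresh);
  }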

(2) Our team does not have implementation experience with the approach you
propose of running the main action for PRR on the final ACK of a PRR
episode. That's because, in Linux TCP, on the final ACK of a PRR episode
the code executes cwnd = ssthresh in tcp_end_cwnd_reduction() but then does
not execute the main action for PRR because the connection exits
CA_Recovery and enters CA_Open before the code reaches tcp_cong_control()
and decides whether to call tcp_cwnd_reduction() to execute the main action
for PRR. I would very much like to avoid making changes to the algorithm
that are not backed by implementation experience.

Thank you, Markku, for your thoughtful and very useful feedback!

best regards,
neal



>
> >       - The draft reads:
> >
> >          "Although increasing the window
> >           during recovery seems to be ill advised, it is important to
> >           remember that this is actually less aggressive than permitted by
> >           RFC 5681, which sends the same quantity of additional data as a
> >           single burst in response to the ACK that triggered Fast
> >           Retransmit."
> >
> >         I think it should cite RFC 6675, as TCP Reno loss recovery
> >         specified in RFC 5681 does not send such a burst.
> >
> >
> > Sounds like a great point to me. I've updated this text in our internal
> > copy to reference RFC 6675. So the next draft revision will reflect that,
> > unless there are objections or other ideas.
>
> Great, thanks.
>
> Thanks,
>
> /Markku
>
> > Thanks!
> > neal
> >
> >
> >
> >       On Mon, 26 Feb 2024, Neal Cardwell wrote:
> >
> >       > As noted in the draft, revision 06 primarily has a single change
> >       > relative to 05: it updates RecoverFS to be initialized as
> >       > "RecoverFS = pipe" in both the prose and pseudocode.
> >       >
> >       > Thanks to Richard Scheffenegger and the TCPM community for
> >       > reviewing the 05 revision.
> >       > Comments/suggestions welcome!
> >       >
> >       > Thanks!
> >       > neal
> >       >
> >       >
> >       > On Mon, Feb 26, 2024 at 10:23 AM <internet-drafts@ietf.org> wrote:
> >       >       Internet-Draft draft-ietf-tcpm-prr-rfc6937bis-06.txt is now
> >       >       available. It is a work item of the TCP Maintenance and Minor
> >       >       Extensions (TCPM) WG of the IETF.
> >       >
> >       >          Title:   Proportional Rate Reduction for TCP
> >       >          Authors: Matt Mathis
> >       >                   Nandita Dukkipati
> >       >                   Yuchung Cheng
> >       >                   Neal Cardwell
> >       >          Name:    draft-ietf-tcpm-prr-rfc6937bis-06.txt
> >       >          Pages:   17
> >       >          Dates:   2024-02-26
> >       >
> >       >       Abstract:
> >       >
> >       >          This document updates the experimental Proportional Rate
> >       >          Reduction (PRR) algorithm, described in RFC 6937, to
> >       >          standards track.  PRR provides logic to regulate the amount
> >       >          of data sent by TCP or other transport protocols during
> >       >          fast recovery.  PRR accurately regulates the actual flight
> >       >          size through recovery such that at the end of recovery it
> >       >          will be as close as possible to the slow start threshold
> >       >          (ssthresh), as determined by the congestion control
> >       >          algorithm.
> >       >
> >       >       The IETF datatracker status page for this Internet-Draft is:
> >       >       https://datatracker.ietf.org/doc/draft-ietf-tcpm-prr-rfc6937bis/
> >       >
> >       >       There is also an HTML version available at:
> >       >       https://www.ietf.org/archive/id/draft-ietf-tcpm-prr-rfc6937bis-06.html
> >       >
> >       >       A diff from the previous version is available at:
> >       >       https://author-tools.ietf.org/iddiff?url2=draft-ietf-tcpm-prr-rfc6937bis-06
> >       >
> >       >       Internet-Drafts are also available by rsync at:
> >       >       rsync.ietf.org::internet-drafts
> >       >
> >       >
> >       >       _______________________________________________
> >       >       tcpm mailing list
> >       >       tcpm@ietf.org
> >       >       https://www.ietf.org/mailman/listinfo/tcpm
> >       >
> >       >
> >       >
> >
> >
> >