Re: [tcpm] draft-ietf-tcpm-prr-rfc6937bis-03 and RecoverFS initialization

Neal Cardwell <ncardwell@google.com> Thu, 20 April 2023 02:19 UTC

From: Neal Cardwell <ncardwell@google.com>
Date: Wed, 19 Apr 2023 22:19:37 -0400
Message-ID: <CADVnQy=Q5cvN_+Fa0rbNc2a_Aqe=haROOd4SNpk9TbvE1MXVvQ@mail.gmail.com>
To: Yuchung Cheng <ycheng@google.com>
Cc: tcpm <tcpm@ietf.org>, Matt Mathis <mattmathis@measurementlab.net>, Nandita Dukkipati <nanditad@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/B7kNlRvZYYcuRkpzBD4rwRmIGmM>

On Tue, Apr 18, 2023 at 7:35 PM Yuchung Cheng <ycheng@google.com> wrote:

>
>
> On Mon, Apr 17, 2023 at 2:00 PM Neal Cardwell <ncardwell@google.com>
> wrote:
>
>>
>>
>> On Mon, Apr 17, 2023 at 4:13 PM Yuchung Cheng <ycheng@google.com> wrote:
>>
>>> Hi Neal,
>>>
>>> That's a good point, and it was considered in the early stages of PRR.
>>> We picked FlightSize (= snd.nxt - snd.una) to ensure ssthresh/RecoverFS
>>> faithfully reflects the proportion of the congestion window reduction:
>>> RFC 5681 still uses FlightSize to compute ssthresh. But some TCP stacks
>>> or specific congestion controls may use either cwnd (e.g. Linux
>>> cubic/reno) or pipe. How about a small paragraph:
>>>
>>> "If a TCP or congestion control implementation uses cwnd or pipe instead
>>> of FlightSize to compute ssthresh, then RecoverFS should use the
>>> corresponding metric, i.e. cwnd right before recovery."
>>>
>>
>> AFAICT that analysis is conflating two different issues:
>>
>> (1) How does the congestion control compute ssthresh (based on cwnd or
>> pipe or FlightSize?) You rightly point out that approaches vary for this
>> part.
>>
>> (2) How does PRR determine what fraction of outstanding packets have been
>> delivered (aka prr_delivered / RecoverFS)?
>>
>> AFAICT to get the right answer for the (2) question, RecoverFS should be
>> initialized to "pipe", no matter what approach the CC takes for answering
>> (1).
>>
>> My understanding of PRR is that if (pipe > ssthresh) is true, then the
>> algorithm is doing Proportional Rate Reduction, and is essentially
>> computing:
>>
>>       sndcnt ~= (target data sent in recovery)           - (actual data sent in recovery)
>>       sndcnt ~= (fraction of data delivered) * ssthresh  - prr_out
>>       sndcnt ~= (prr_delivered / RecoverFS) * ssthresh   - prr_out
>>
>> For the (target data sent in recovery) to equal ssthresh at the end of
>> the first round in recovery, the algorithm must reach the point where
>> prr_delivered == RecoverFS, so that (prr_delivered / RecoverFS) is 1.
>> Since prr_delivered can only reach as high as "pipe" at the start of
>> recovery, we need RecoverFS == "pipe" to be able to reach that
>> condition. If RecoverFS is (snd.nxt - snd.una), then RecoverFS is too
>> big: prr_delivered won't be able to match RecoverFS, the (target data
>> sent in recovery) won't reach ssthresh, and the algorithm will
>> undershoot (the cwnd won't reach the ssthresh specified by congestion
>> control, however it calculated that).
>>
> I still can't parse your analysis after reading it multiple times.
>
> "prr_delivered can only reach as high as "pipe" at the start of recovery"
> --> prr_delivered is initialized to 0 at the start of recovery?
> "If RecoverFS is (snd.nxt - snd.una) then RecoverFS is too big, and
> prr_delivered won't be able to match RecoverFS" --> why is RecoverFS too
> big, and why won't prr_delivered reach it?
>
> I am not saying RecoverFS initialized to "pipe" is wrong. I just don't
> see a substantial difference between FlightSize vs pipe, unless the
> FlightSize/cwnd is small and/or limited transmits were not used.
>
> maybe you can walk through an example with FlightSize vs pipe...
>

Discussing a concrete example is a good idea!

Here's an example, sketching the behavior with
draft-ietf-tcpm-prr-rfc6937bis-03, AFAICT from trying to execute the
example by hand:

CC = Reno

cwnd = 100 packets

The application writes 100*MSS.

TCP sends 100 packets.

In this example, to make the effects more clear, the TCP sender has
detected reordering with RACK-TLP or some other technique, so does not
enter fast recovery on the third SACKed packet, but rather waits a while to
accumulate more SACKs.

From the flight of 100 packets, 1 packet is lost (P1), and 24 packets are
SACKed (packets P2..P25).

We enter fast recovery with PRR.

RecoverFS = snd.nxt - snd.una = 100

ssthresh = cwnd / 2 = 50  (Reno)

pipe = snd.nxt - snd.una - (lost + SACKed) = 100 - (1 + 24) = 75 packets

The expression (pipe > ssthresh) is true for a number of consecutive SACKs,
so we use the PRR code path repeatedly for a while as SACKs stream in for
P26..P100.

Given the PRR code path math, in general, the target number of packets sent
so far in recovery will be:

   target_sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
                      = CEIL(prr_delivered * 50 / 100)
                      = CEIL(prr_delivered * .5)

What happens: the sender sends 1 packet for every 2 packets delivered
(SACKed) for the first 50 packets SACKed of the round trip. This causes
pipe to fall from 75 to 75 - 50*0.5 = 75 - 25 = 50 packets during that
period, at which point (pipe > ssthresh) becomes false and the connection
follows the PRR-CRB path, matching the sending process to the delivery
process (packet conservation) to keep pipe matching ssthresh. So the
sender's rate is inconsistent: for 50 SACKs it sends 1 packet for every 2
packets delivered (SACKed); then for the remaining 25 SACKs it sends 1
packet for every 1 packet delivered (SACKed). So we don't meet the goal of
making "pipe" transition smoothly and consistently from its initial value
to ssthresh.
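To sanity-check that two-phase behavior, here is a rough per-SACK simulation of the example in Python. This is my own sketch, not text from the draft: it works in packet units, and it simplifies the PRR-CRB branch to pure packet conservation (no SSRB term):

```python
import math

# Rough per-SACK simulation of the example above, in packet units.
# Assumes: ssthresh = 50, pipe = 75 at the start of recovery, and 75
# subsequent SACKs (P26..P100), each delivering 1 packet.
def simulate_prr(recover_fs, ssthresh=50, pipe=75, sacks=75):
    prr_delivered = prr_out = 0
    sent_pattern = []                  # packets sent in response to each SACK
    for _ in range(sacks):
        prr_delivered += 1
        pipe -= 1                      # the newly SACKed packet leaves the pipe
        if pipe > ssthresh:            # proportional rate reduction path
            sndcnt = math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
        else:                          # simplified PRR-CRB: packet conservation
            sndcnt = min(ssthresh - pipe, prr_delivered - prr_out)
        sndcnt = max(sndcnt, 0)
        prr_out += sndcnt
        pipe += sndcnt
        sent_pattern.append(sndcnt)
    return sent_pattern, prr_out, pipe

pattern, total_sent, final_pipe = simulate_prr(recover_fs=100)
print(pattern[:6])    # [1, 0, 1, 0, 1, 0] -- 1 packet per 2 SACKs
print(pattern[-6:])   # [1, 1, 1, 1, 1, 1] -- 1 packet per SACK after the CRB switch
print(total_sent, final_pipe)   # 50 50
```

In this sketch the switch to the 1:1 phase lands around the 49th/50th SACK (the exact SACK depends on the CEIL rounding), matching the inconsistent two-rate behavior described above.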

What we want instead: the in-flight data (pipe) progressing smoothly from
75 to 50 over the course of the full round trip, with the 75 packets SACKed
mapping smoothly into 50 packets sent: a sent/delivered ratio of 50/75, or
0.666.

So what we want is: initializing with RecoverFS = pipe, so we have:
   target_sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
                      = CEIL(prr_delivered * 50 / 75)
                      = CEIL(prr_delivered * 0.666)

That should achieve the goal of a sent/delivered ratio of 50/75, or 0.666,
i.e. sending 2 packets for every 3 packets SACKed. In particular, at the
end of the round trip time we'll have:

   target_sent_so_far = CEIL(prr_delivered * 50 / 75)
                      = CEIL(75 * 50 / 75)
                      = 50
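As a quick check of that arithmetic (again my own sketch, in packet units): the per-SACK increments of the target follow the smooth 2-sent-per-3-SACKed pattern throughout, and the target reaches exactly 50 on the 75th SACK:

```python
import math

# Per-SACK targets with RecoverFS initialized to pipe = 75 (packet units),
# assuming ssthresh = 50 and 75 SACKs arriving during the recovery round.
ssthresh, recover_fs = 50, 75
targets = [math.ceil(d * ssthresh / recover_fs) for d in range(1, 76)]
increments = [targets[0]] + [b - a for a, b in zip(targets, targets[1:])]
print(increments[:6])  # [1, 1, 0, 1, 1, 0] -- 2 packets per 3 SACKs
print(targets[-1])     # 50
```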

Hopefully that illustrates why, for the target_sent_so_far to smoothly rise
to ssthresh at the end of the first round in recovery, RecoverFS should be
initialized to pipe.

The difference between the current initialization (RecoverFS = snd.nxt -
snd.una) and the proposed initialization (RecoverFS = pipe) would probably
be small in the typical case. But in cases like this where the sender has
detected reordering and is therefore allowing many SACKed packets before
entering recovery, AFAICT the difference could be significant.

Best regards,
neal




>
>
>>
>> What am I missing? :-)
>>
>> best regards,
>> neal
>>
>>
>>
>>
>>> On Mon, Apr 17, 2023 at 11:23 AM Neal Cardwell <ncardwell@google.com>
>>> wrote:
>>>
>>>> Regarding this line in draft-ietf-tcpm-prr-rfc6937bis-03:
>>>>
>>>>    RecoverFS = snd.nxt - snd.una // FlightSize right before recovery
>>>>
>>>> AFAICT this should be:
>>>>
>>>>   RecoverFS = pipe  // RFC 6675 pipe algorithm
>>>>
>>>> Rationale: when recovery starts, often snd.nxt - snd.una includes 1 or
>>>> more lost packets above snd.una and 3 or more SACKed packets above that;
>>>> those packets are not really in the pipe, and not really in the FlightSize.
>>>>
>>>> With the draft as-is, packets that were SACKed on ACKs that happened
>>>> before entering fast recovery are incorporated in RecoverFS (snd.nxt -
>>>> snd.una) but never in prr_delivered (since that is set to 0 upon entering
>>>> fast recovery), so at the end of fast recovery the expression:
>>>>
>>>>   CEIL(prr_delivered * ssthresh / RecoverFS)
>>>>
>>>> can be quite far below ssthresh, for very large numbers of packets
>>>> SACKed before entering fast recovery (e.g., if the reordering degree is
>>>> large).
>>>>
>>>> AFAICT that means that at the end of recovery the cwnd could be quite
>>>> far below ssthresh, to the same degree, resulting in the cwnd being less
>>>> than what congestion control specified when the connection entered fast
>>>> recovery.
>>>>
>>>> AFAICT switching to RecoverFS = pipe fixes this, since it means that
>>>> RecoverFS only includes packets in the pipe when the connection enters
>>>> fast recovery, and thus prr_delivered can eventually reach RecoverFS,
>>>> so that the target number of packets sent, CEIL(prr_delivered *
>>>> ssthresh / RecoverFS), can fully reach ssthresh.
>>>>
>>>> Apologies if I'm missing something or this has already been discussed.
>>>>
>>>> best regards,
>>>> neal
>>>>
>>>>