Re: [tcpm] draft-ietf-tcpm-prr-rfc6937bis-03 and RecoverFS initialization

Yoshifumi Nishida <nsd.ietf@gmail.com> Mon, 22 May 2023 18:38 UTC

From: Yoshifumi Nishida <nsd.ietf@gmail.com>
Date: Mon, 22 May 2023 11:37:57 -0700
To: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org>, Matt Mathis <mattmathis@measurementlab.net>, tcpm <tcpm@ietf.org>, Nandita Dukkipati <nanditad@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/ptv75gUt9mE-kSk9uLcVz6HPzaU>
Subject: Re: [tcpm] draft-ietf-tcpm-prr-rfc6937bis-03 and RecoverFS initialization

Hi Yuchung,

Thank you so much! OK, we will wait for the updates.
--
Yoshi

On Mon, May 22, 2023 at 10:09 AM Yuchung Cheng <ycheng@google.com> wrote:

> Hi Yoshifumi,
>
> Sorry for the radio silence. Neal will help co-author and update the draft,
> as he has many insights. We'll provide an update soon; hopefully we can
> move forward before the next meeting in SF.
>
> On Sun, May 21, 2023 at 11:34 PM Yoshifumi Nishida <nsd.ietf@gmail.com>
> wrote:
>
>> Hello,
>>
>> Just in case, as this discussion has been quiet for a while:
>> I personally think what Neal mentions makes sense, although I'm not very
>> sure which approach is better.
>> I hope this part will be addressed in the updated version of the draft.
>> --
>> Yoshi
>>
>> On Wed, Apr 19, 2023 at 7:20 PM Neal Cardwell <ncardwell=
>> 40google.com@dmarc.ietf.org> wrote:
>>
>>>
>>>
>>> On Tue, Apr 18, 2023 at 7:35 PM Yuchung Cheng <ycheng@google.com> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Apr 17, 2023 at 2:00 PM Neal Cardwell <ncardwell@google.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Apr 17, 2023 at 4:13 PM Yuchung Cheng <ycheng@google.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Neal,
>>>>>>
>>>>>> That's a good point, and it was considered in the early stages of PRR.
>>>>>> We picked FlightSize (= snd.nxt - snd.una) to ensure that
>>>>>> ssthresh/RecoverFS faithfully reflects the proportion of the congestion
>>>>>> control's window reduction: RFC 5681 still uses FlightSize to compute
>>>>>> ssthresh. But some TCPs or specific congestion controls may use either
>>>>>> cwnd (e.g. Linux CUBIC/Reno) or pipe instead. How about a small paragraph:
>>>>>>
>>>>>> "If a TCP or congestion control implementation uses cwnd or pipe
>>>>>> instead of FlightSize to compute ssthresh, then RecoverFS should use the
>>>>>> same metric accordingly, i.e., cwnd right before recovery."
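>>>>>>
>>>>>> As a rough sketch of that idea (the helper and its arguments below are
>>>>>> hypothetical names, not pseudocode from the draft or RFC 5681):
>>>>>>
>>>>>>    # Pick RecoverFS to match whatever quantity the congestion control
>>>>>>    # fed into its ssthresh computation.
>>>>>>    def init_recover_fs(snd_nxt, snd_una, cwnd, pipe, ssthresh_metric):
>>>>>>        if ssthresh_metric == "flightsize":   # RFC 5681 style
>>>>>>            return snd_nxt - snd_una
>>>>>>        elif ssthresh_metric == "cwnd":       # e.g. Linux CUBIC/Reno
>>>>>>            return cwnd
>>>>>>        else:                                 # "pipe"
>>>>>>            return pipe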
>>>>>>
>>>>>
>>>>> AFAICT that analysis is conflating two different issues:
>>>>>
>>>>> (1) How does the congestion control compute ssthresh (based on cwnd,
>>>>> pipe, or FlightSize)? You rightly point out that approaches vary for
>>>>> this part.
>>>>>
>>>>> (2) How does PRR determine what fraction of outstanding packets have
>>>>> been delivered (aka prr_delivered / RecoverFS)?
>>>>>
>>>>> AFAICT, to get the right answer for question (2), RecoverFS should be
>>>>> initialized to "pipe", no matter what approach the CC takes for
>>>>> answering (1).
>>>>>
>>>>> My understanding of PRR is that if (pipe > ssthresh) is true, then the
>>>>> algorithm is doing Proportional Rate Reduction, and is essentially
>>>>> computing:
>>>>>
>>>>>       sndcnt ~= (target data sent in recovery) - (actual data sent in recovery)
>>>>>       sndcnt ~= (fraction of data delivered) * ssthresh - prr_out
>>>>>       sndcnt ~= (prr_delivered / RecoverFS) * ssthresh - prr_out
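>>>>>
>>>>> As a rough Python sketch of just that proportional branch (not the
>>>>> draft's full pseudocode, which also covers the case where pipe has
>>>>> fallen to or below ssthresh):
>>>>>
>>>>>    import math
>>>>>
>>>>>    # Proportional Rate Reduction branch only, per the derivation above.
>>>>>    def prr_sndcnt(prr_delivered, prr_out, ssthresh, recover_fs):
>>>>>        target_sent = math.ceil(prr_delivered * ssthresh / recover_fs)
>>>>>        return max(target_sent - prr_out, 0)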
>>>>>
>>>>> For the (target data sent in recovery) to equal ssthresh at the end of
>>>>> the first round of recovery, the algorithm must reach the point where
>>>>> prr_delivered == RecoverFS, so that (prr_delivered / RecoverFS) is 1.
>>>>> Since prr_delivered can only grow as high as the value of "pipe" at the
>>>>> start of recovery, to be able to reach that condition we need RecoverFS ==
>>>>> "pipe". If RecoverFS is (snd.nxt - snd.una), then RecoverFS is too big:
>>>>> prr_delivered won't be able to match RecoverFS, the (target data sent in
>>>>> recovery) won't reach ssthresh, and the algorithm will undershoot (the
>>>>> cwnd won't reach the ssthresh specified by congestion control, however
>>>>> that was calculated).
>>>>>
>>>> I still can't parse your analysis after reading it multiple times.
>>>>
>>>> "prr_delivered can only reach as high as "pipe" at the start of
>>>> recovery" --> prr_delivered is initiated to 0 at the start of the recovery?
>>>> "If RecoverFS is (snd.nxt - snd.una) then RecoverFS is too big, and
>>>> prr_delivered won't be able to match RecoverFS" --> why is RecoverFS too
>>>> big and prr_delivered won't reach it.
>>>>
>>>> I am not saying RecoverFS initialized to "pipe" is wrong. I just don't
>>>> see a substantial difference between FlightSize and pipe, unless the
>>>> FlightSize/cwnd is small and/or limited transmits were not used.
>>>>
>>>> Maybe you can walk through an example with FlightSize vs. pipe...
>>>>
>>>
>>> Discussing a concrete example is a good idea!
>>>
>>> Here's an example, sketching the behavior with
>>> draft-ietf-tcpm-prr-rfc6937bis-03, AFAICT from trying to execute the
>>> example by hand:
>>>
>>> CC = Reno
>>>
>>> cwnd = 100 packets
>>>
>>> The application writes 100*MSS.
>>>
>>> TCP sends 100 packets.
>>>
>>> In this example, to make the effects clearer, the TCP sender has
>>> detected reordering with RACK-TLP or some other technique, so it does not
>>> enter fast recovery on the third SACKed packet, but rather waits a while
>>> to accumulate more SACKs.
>>>
>>> From the flight of 100 packets, 1 packet is lost (P1), and 24 packets
>>> are SACKed (packets P2..P25).
>>>
>>> We enter fast recovery with PRR.
>>>
>>> RecoverFS = snd.nxt - snd.una = 100
>>>
>>> ssthresh = cwnd / 2 = 50  (Reno)
>>>
>>> pipe = snd.nxt - snd.una - (lost + SACKed) = 100 - (1 + 24) = 75 packets
>>>
>>> The expression (pipe > ssthresh) is true for a number of consecutive
>>> SACKs, so we use the PRR code path repeatedly for a while as SACKs stream
>>> in for P26..P100.
>>>
>>> Given the PRR code path math, in general, the target number of packets
>>> sent so far in recovery will be:
>>>
>>>    target_sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
>>>                       = CEIL(prr_delivered * 50 / 100)
>>>                       = CEIL(prr_delivered * .5)
>>>
>>> What happens: this causes the sender to send 1 packet for every 2
>>> packets delivered (SACKed). Specifically, the connection sends 1 packet
>>> for every 2 packets SACKed for the first 50 SACKed packets of the round
>>> trip. This causes pipe to fall from 75 to 75 - 50*0.5 = 75 - 25 = 50
>>> packets during that period, at which point (pipe > ssthresh) becomes false
>>> and the connection follows the PRR-CRB path, matching the sending
>>> process to the delivery process (packet conservation) to keep pipe at
>>> ssthresh. So the sender's rate is inconsistent: for 50 SACKs it sends 1
>>> packet for every 2 packets delivered (SACKed); then for the remaining 25
>>> SACKs it sends 1 packet for every 1 packet delivered (SACKed). So we don't
>>> meet the goal of making "pipe" transition smoothly and consistently from
>>> its initial value to ssthresh.
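>>>
>>> A toy per-SACK trace of that behavior (a sketch only: the conservation
>>> phase is simplified to topping pipe back up to ssthresh, and this is not
>>> the draft's actual pseudocode):
>>>
>>>    import math
>>>
>>>    ssthresh, recover_fs = 50, 100      # RecoverFS = snd.nxt - snd.una
>>>    pipe, prr_delivered, prr_out = 75, 0, 0
>>>
>>>    for _ in range(75):                 # SACKs for P26..P100 arrive
>>>        prr_delivered += 1
>>>        pipe -= 1                       # the SACKed packet leaves the pipe
>>>        if pipe > ssthresh:             # proportional rate reduction
>>>            sndcnt = math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
>>>        else:                           # simplified conservation phase
>>>            sndcnt = ssthresh - pipe
>>>        sndcnt = max(sndcnt, 0)
>>>        prr_out += sndcnt
>>>        pipe += sndcnt                  # newly sent packets enter the pipe
>>>
>>>    # Roughly 1 packet sent per 2 SACKs for the first ~50 SACKs, then
>>>    # 1 packet per SACK afterwards; pipe ends the round pinned at 50.
>>>    print(prr_out, pipe)                # 50 50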
>>>
>>> What we want instead: the in-flight data (pipe) progressing smoothly
>>> from 75 to 50 over the course of the full round trip, with the 75 packets
>>> SACKed mapping smoothly into 50 packets transmitted: a ratio of 50 packets
>>> sent per 75 packets delivered, i.e. a sent/delivered ratio of 50/75, or 0.666.
>>>
>>> So what we want is: initializing with RecoverFS = pipe, so we have:
>>>    target_sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
>>>                       = CEIL(prr_delivered * 50 / 75)
>>>                       = CEIL(prr_delivered * 0.666)
>>>
>>> That should achieve the goal of sending 50 packets for 75 packets
>>> delivered, or a sent/delivered ratio of 50/75, or 0.666, aka sending 2
>>> packets for every 3 packets SACKed. In particular, at the end of the round
>>> trip time we'll have:
>>>
>>>    target_sent_so_far = CEIL(prr_delivered * 50 / 75)
>>>                       = CEIL(75 * 50 / 75)
>>>                       = 50
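>>>
>>> As a quick numeric check of the two initializations (looking only at the
>>> CEIL() target at the end of the round, and ignoring the CRB phase
>>> discussed above):
>>>
>>>    import math
>>>
>>>    ssthresh, delivered = 50, 75        # 75 packets SACKed during the round
>>>    for recover_fs in (100, 75):        # snd.nxt - snd.una  vs.  pipe
>>>        print(recover_fs, math.ceil(delivered * ssthresh / recover_fs))
>>>    # RecoverFS = 100 -> target 38, well short of ssthresh (50)
>>>    # RecoverFS = 75  -> target 50, exactly ssthresh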
>>>
>>> Hopefully that illustrates why, for the target_sent_so_far to smoothly
>>> rise to ssthresh at the end of the first round in recovery, RecoverFS
>>> should be initialized to pipe.
>>>
>>> The difference between the current initialization (RecoverFS = snd.nxt -
>>> snd.una) and the proposed initialization (RecoverFS = pipe) would probably
>>> be small in the typical case. But in cases like this where the sender has
>>> detected reordering and is therefore allowing many SACKed packets before
>>> entering recovery, AFAICT the difference could be significant.
>>>
>>> Best regards,
>>> neal
>>>
>>>
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> What am I missing? :-)
>>>>>
>>>>> best regards,
>>>>> neal
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Mon, Apr 17, 2023 at 11:23 AM Neal Cardwell <ncardwell@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Regarding this line in draft-ietf-tcpm-prr-rfc6937bis-03:
>>>>>>>
>>>>>>>    RecoverFS = snd.nxt - snd.una // FlightSize right before recovery
>>>>>>>
>>>>>>> AFAICT this should be:
>>>>>>>
>>>>>>>   RecoverFS = pipe  // RFC 6675 pipe algorithm
>>>>>>>
>>>>>>> Rationale: when recovery starts, often snd.nxt - snd.una includes 1
>>>>>>> or more lost packets above snd.una and 3 or more SACKed packets above that;
>>>>>>> those packets are not really in the pipe, and not really in the FlightSize.
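>>>>>>>
>>>>>>> A minimal sketch of the proposed initialization (lost_out and sacked_out
>>>>>>> are placeholder names for SACK scoreboard counters; retransmitted
>>>>>>> segments are ignored for simplicity):
>>>>>>>
>>>>>>>    def recover_fs_as_pipe(snd_nxt, snd_una, lost_out, sacked_out):
>>>>>>>        flight_size = snd_nxt - snd_una
>>>>>>>        # Packets already lost or SACKed are no longer in flight.
>>>>>>>        pipe = flight_size - (lost_out + sacked_out)
>>>>>>>        return pipe        # proposed: RecoverFS = pipe, not FlightSize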
>>>>>>>
>>>>>>> With the draft as-is, packets that were SACKed on ACKs that happened
>>>>>>> before entering fast recovery are incorporated in RecoverFS (snd.nxt -
>>>>>>> snd.una) but never in prr_delivered (since that is set to 0 upon entering
>>>>>>> fast recovery), so at the end of fast recovery the expression:
>>>>>>>
>>>>>>>   CEIL(prr_delivered * ssthresh / RecoverFS)
>>>>>>>
>>>>>>> can be quite far below ssthresh, for very large numbers of packets
>>>>>>> SACKed before entering fast recovery (e.g., if the reordering degree is
>>>>>>> large).
>>>>>>>
>>>>>>> AFAICT that means that at the end of recovery the cwnd could be
>>>>>>> quite far below ssthresh, to the same degree, resulting in the cwnd being
>>>>>>> less than what congestion control specified when the connection entered
>>>>>>> fast recovery.
>>>>>>>
>>>>>>> AFAICT switching to RecoverFS = pipe fixes this, since it means that
>>>>>>> RecoverFS only includes packets in the pipe when the connection enters
>>>>>>> fast recovery, and thus prr_delivered can eventually reach RecoverFS, so
>>>>>>> that the target number of packets sent, CEIL(prr_delivered * ssthresh /
>>>>>>> RecoverFS), can fully reach ssthresh.
>>>>>>>
>>>>>>> Apologies if I'm missing something or this has already been
>>>>>>> discussed.
>>>>>>>
>>>>>>> best regards,
>>>>>>> neal
>>>>>>>