[tcpm] Re: I-D Action: draft-ietf-tcpm-prr-rfc6937bis-06.txt

Markku Kojo <kojo@cs.helsinki.fi> Tue, 25 June 2024 14:21 UTC

Date: Tue, 25 Jun 2024 17:21:33 +0300
From: Markku Kojo <kojo@cs.helsinki.fi>
To: Yoshifumi Nishida <nsd.ietf@gmail.com>
CC: Matt Mathis <ietf@mattmathis.net>, Matt Mathis <mattmathis@measurementlab.net>, tcpm@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/k5wq9LZj2Obxe2K983k2mh-qJFM>

Hi Yoshi,

please see inline tagged [MK2].

On Mon, 1 Apr 2024, Yoshifumi Nishida wrote:

> Hi Markku,
> 
> Thanks for the detailed comments. I put my comments inline on the two points.
> 
> On Wed, Mar 27, 2024 at 7:54 AM Markku Kojo <kojo@cs.helsinki.fi> wrote:
>       Hi Yoshi, Neal, all,
>
>       Please see below inline (tagged [MK]). And my apologies for a very long
>       explanation. Hopefully I did not include too many mistakes this time ;)
>
>       In summary, it seems that we do not need to reset cwnd at the end of the
>       recovery nor necessarily adjust RecoverFS, because all issues raised could
>       be resolved by simply correcting the definition of DeliveredData
>       (reverting to the original definition plus a small additional change) and
>       moving the actions to take with the ACK that triggers loss recovery to
>       the unconditional steps to be taken together with the initialization of
>       the algo in the beginning (this would also be in line with how the other
>       fast recovery algorithms are described in the RFC series).
>
>       Hopefully I did not misunderstand any parts of the algo (either in RFC
>       6937 or in the current -08 version of the draft).
>
>       On Thu, 21 Mar 2024, Yoshifumi Nishida wrote:
>
>       > Hi Neal,
>       > On Wed, Mar 20, 2024 at 1:32 PM Neal Cardwell <ncardwell@google.com> wrote:
>       >
>       >
>       > On Wed, Mar 20, 2024 at 3:29 PM Yoshifumi Nishida <nsd.ietf@gmail.com> wrote:
>       >       Hi Neal,
>       >
>       > On Wed, Mar 20, 2024 at 6:55 AM Neal Cardwell <ncardwell@google.com> wrote:
>       >
>       >       On Wed, Mar 20, 2024 at 3:07 AM Yoshifumi Nishida <nsd.ietf@gmail.com> wrote:
>       >
>       > On Mon, Mar 18, 2024 at 8:13 PM Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org>
>       > wrote:
>       >
>       > But your point still stands, and you raise a great point: simply initializing
>       > RecoverFS to "pipe" is not safe, because packets that were marked lost and removed
>       > from pipe may actually have been merely reordered. So if those packets are
>       > delivered later, they will increase the numerator of prr_delivered / RecoverFS
>       > without increasing the denominator, thus leading to a result above 1.0, and thus
>       > potentially leading to a target for Sent_so_far that is above ssthresh, causing the
>       > algorithm to erroneously exceed ssthresh.
>       >
>       >
>       > Hmm. I have a naive question here. If packets were merely reordered, isn't it ok for
>       > cwnd to be bigger than ssthresh?
>       >
>       >
>       > Yes, if packets were merely reordered and none were lost, then I agree it would be OK
>       > for cwnd to be bigger than ssthresh. And in fact I would argue that cwnd should be
>       > reverted back to its value before fast recovery. And this is in fact what Linux TCP
>       > would do, using loss recovery "undo" mechanisms based on TCP timestamps or DSACKs.
>       >
>       > However, in the kind of scenario Markku described, there was not merely reordering, but
>       > also real packet loss: "1 packet is lost (P1), and 24 packets are delayed (packets
>       > P2..P25)". In the traditional loss-based Reno/CUBIC paradigm, any non-zero amount of
>       > real packet loss in fast recovery should result in the same multiplicative decrease in
>       > cwnd, regardless of the combination of reordering and loss. We could argue about whether
>       > that approach is the best approach (BBR, for example, takes a different approach), but
>       > that is a very different discussion. :-) For now AFAICT we are focused on PRR's faithful
>       > enactment of the congestion control algorithm's decision to reduce cwnd toward ssthresh
>       > when there is any non-zero amount of real packet loss in fast recovery.
>       >
>       >
>       > Got it. But, I just would like to clarify whether we are discussing the inflation of
>       > sndcnt during the recovery process or cwnd after the exit of recovery.
>       >
>       >
>       > Good point. We are talking about inflation of sndcnt during the recovery process.
>
>       [MK] I think we are talking about both in practice, because inflation of
>       sndcnt during the recovery process would also result in exiting recovery
>       with too big a cwnd. In the examples that I gave, the sent_so_far value
>       was calculated when the SACK for P100 had arrived (actually, the numbers
>       were off by one):
>
>       For Reno:
>
>       Sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
>                    = CEIL(97 * 50 / 72)
>                    = 68
>
>       For CUBIC:
>       Sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
>                    = CEIL(97 * 70 / 72)
>                    = 95
>
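>       (As a rough, purely illustrative Python transcription of this
>       arithmetic; all quantities are in MSS-sized segments:)
>
>          import math
>
>          def sent_so_far(prr_delivered, ssthresh, recover_fs):
>              # proportional bound on data sent so far during recovery
>              return math.ceil(prr_delivered * ssthresh / recover_fs)
>
>          print(sent_so_far(97, 50, 72))   # Reno:  68
>          print(sent_so_far(97, 70, 72))   # CUBIC: 95
>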
>       Now, when the cumulative ACK triggered by rexmit of P1 arrives and
>       terminates fast recovery, the following is executed as per the *current
>       version* of the draft:
>
>       DeliveredData =  (bytes newly cumulatively acknowledged) = 100
>       DeliveredData += (bytes newly selectively acknowledged) = 100 + 0
>
>       prr_delivered += DeliveredData = 95 + 100 = 195
>       pipe = 94
>       if (pipe > ssthresh) => (94 > 70) => (true)
>         sndcnt = CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out
>                = CEIL(195*70/72) = 190 - 95 = 95
>       cwnd = pipe + sndcnt = 94 + 95 = 189
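>
>       (Again, a hedged Python sketch of this final-ACK step with the -06
>       definition of DeliveredData, purely illustrative; units are MSS:)
>
>          import math
>
>          # state just before the recovery-ending cumulative ACK (CUBIC case)
>          prr_delivered, prr_out = 95, 95
>          ssthresh, RecoverFS, pipe = 70, 72, 94
>
>          # -06 definition: the 100 cumulatively acked segments are counted
>          # even though most of them were already SACKed and counted earlier
>          DeliveredData = 100 + 0
>          prr_delivered += DeliveredData                   # 195
>          sndcnt = math.ceil(prr_delivered * ssthresh / RecoverFS) - prr_out
>          print(pipe + sndcnt)                             # cwnd = 189
>
>          # the RFC 6937 definition (change_in(snd.una) + change_in(SACKd))
>          # would instead cancel out the bytes already counted via SACKs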
>
>       So oops, when exiting fast recovery cwnd would be nearly doubled from
>       what it was before entering loss recovery. It seems that there
>       is an additional problem because the definition of DeliveredData in the
>       current version of the draft is incorrect; the cumulatively acked bytes
>       that have already been SACKed are counted twice in DeliveredData. It
>       seems that RFC 6937 and rfc6937bis-01 both define DeliveredData (nearly)
>       correctly by including the change in snd.una in DeliveredData and
>       subtracting data that has already been SACKed. The definition of
>       DeliveredData obviously needs to be corrected. See also below the issue
>       with bytes SACKed that are above snd.una but get SACKed before the start
>       of recovery.
>
>       With the original definition of DeliveredData cwnd would not be
>       inflated further, but fast recovery would still exit with too big a cwnd
>       (for CUBIC cwnd=95 instead of 70, and for Reno cwnd=68 instead of 50)
>       if we use a too-small RecoverFS (= pipe).
>
>       So, it seems that we agree that the problem of sending too many bytes
>       during the recovery process gets corrected if RecoverFS is initialized to
>       snd.nxt - snd.una. The next question is, should RecoverFS be initialized
>       to an even higher value in some scenarios? See below.
>
>       > Because I'm personally not very keen to address miscalculation of lost packets due to
>       > reordering during the recovery process as it seems to be tricky.
>       >
>       >
>       > It is tricky, but I think it is feasible to address. What do folks think about my
>       > suggestion from above in this thread:
>       >
>       >   existing text:
>       >      pipe = (RFC 6675 pipe algorithm)
>       >      RecoverFS = pipe              // RFC 6675 pipe before recovery
>       >
>       >   proposed new text:
>       >      RecoverFS = snd.nxt - snd.una + (bytes newly cumulatively acknowledged)
>       >
>       >
>       > Hmm. Sorry. I'm not very sure about the difference between snd.nxt - snd.una and
>       > snd.nxt - snd.una + (bytes newly cumulatively acknowledged)
>       > Could you elaborate a bit? I thought we don't have data which are cumulatively acked
>       > in case of reordering.
>
>       [MK] It seems there might be another case that Neal has in mind where the
>       sender may end up sending too many segments during the first RTT in fast
>       recovery. If I understood it correctly, this may occur in a scenario
>       with ACK loss for pkts preceding the first dropped data pkt, for
>       example. Consider the following scenario where there are 100 pkts
>       outstanding:
>
>         P1..P24, P25, P26, P27, P28..P100
>
>       Packets P1..P24 and P26..P100 are delivered successfully to the
>       receiver. P25 is lost. ACKs (and SACKs) for pkts P1..P24, P26 and P27 get
>       dropped. SACKs for P28..P100 are delivered successfully. When the SACK
>       for pkt P28 arrives, an RFC 6675 sender would declare P25 lost and
>       enter fast retransmit. Similarly, a RACK-TLP sender may declare P25 lost,
>       but this may happen upon the arrival of any of the SACKs for P28..P100.
>
>       Let's assume we were fully utilizing the congestion window, i.e., cwnd=100
>       and we enter loss recovery when the SACK of P28 arrives (cumulative
>       ACK#=25):
>
>       ssthresh = cwnd / 2 = 50  (Reno)
>       prr_delivered = prr_out = 0
>       Pipe = snd.nxt - snd.una - (lost + SACKed) = 76 - (1 + 3) = 72
>       RecoverFS = snd.nxt - snd.una = 101 - 25 = 76
>
>       DeliveredData = (bytes newly cumulatively acknowledged) = 24
>       DeliveredData += change_in(SACKd) = 24+3 = 27
>       prr_delivered += DeliveredData = 0+27 = 27
>
>       if (pipe > ssthresh) => if (72 > 50) => true
>              // Proportional Rate Reduction
>              sndcnt = CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out
>                     = CEIL(27 * 50 / 76) = 18 - 0 = 18
>
>       cwnd = 72 + 18 = 90
>
>       so, we will send a burst of 18 pkts on entry to recovery and during the
>       rest of the recovery around 49 pkts, giving a total of 18+49=67 pkts
>       while only 50 was allowed. If we add the 24 cumulatively acked pkts into
>       RecoverFS like Neal suggests, we are about to send around 14+37=51 pkts,
>       which is almost fine. However, the major shortcoming of this approach is
>       that we'll still send a burst of 14 pkts in the beginning of the recovery,
>       while avoiding such a burst was one of the major goals of PRR.
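>
>       (To make the comparison concrete, a rough and purely illustrative Python
>       sketch of the sndcnt allowed on the entry ACK under the two RecoverFS
>       choices discussed above; all quantities are in MSS-sized segments:)
>
>          import math
>
>          def entry_sndcnt(prr_delivered, ssthresh, recover_fs, prr_out=0):
>              # data PRR allows to be sent on the ACK that starts recovery
>              return math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
>
>          prr_delivered = 24 + 3    # 24 newly cum. acked + 3 newly SACKed
>          print(entry_sndcnt(prr_delivered, 50, 76))    # RecoverFS = snd.nxt - snd.una -> 18
>          print(entry_sndcnt(prr_delivered, 50, 100))   # + cumulatively acked bytes    -> 14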
>
>       Alternatively, we could modify the algo such that the cumulatively acked
>       bytes carried by the ACK that triggers loss recovery are added to neither
>       DeliveredData nor RecoverFS.
> 
> 
> I have thought about similar things. I'm thinking that this might be a minor point for now.
> My personal thoughts on this are:
> * The case you presented presumes huge ACK losses before entering recovery and no ACK loss after
> that. This doesn't look like a very common case.

[MK2]: I am not sure this is a minor point.
First, it does not require a huge number of ACK losses before entering
recovery; the severity of the miscalculation depends on the number of lost
ACKs.
Second, there is no requirement of no ACK loss after that; there
might or might not be further ACK losses. The authors (and others) have
reported that ACK loss is a very common phenomenon, so I think we cannot
consider it uncommon.

> * At any point of recovery, the inflation of DeliveredData can happen due to ACK losses or other reasons.
> I'm not very sure that creating special handling only for the first ACK is effective.

[MK2]: Sure, sending a burst at any point of recovery is common to all
Fast Recovery algos, not only PRR. But PRR justifies itself (for a good
reason) by avoiding bursts at the beginning of loss recovery. Such
a burst is very bad because it is likely to result in the loss of
retransmitted data, which is always present among the segment(s)
transmitted at the beginning of the recovery. Particularly with PRR,
sending a burst at the beginning is bad because PRR does not have a pause
in sending segments at the beginning like NewReno or RFC 6675 would have
(i.e., with PRR there is no similar opportunity for the bottleneck queue
to drain before retransmitted segments start to flow into the congested
queue).

> * As long as there's no sudden increase of DeliveredData, I guess both logics behave mostly the same. So, I
> think a question would be how much we care about this kind of situation. It seems to me that this looks
> like a minor case.

[MK2]: I don't think they are mostly the same. My proposal would avoid a
burst at the beginning and also solve, in a simple way, the other
miscalculation problems that ignore a number of ACKs, as I have pointed
out. It would also be consistent with the other Fast Retransmit & Fast
Recovery algos in the RFC series, which by definition always handle Fast
Retransmit as a separate and unconditional step at the entry to loss
recovery.
The PRR algo as currently described is also inefficient, as it includes an
unnecessary step to check whether to send out the Fast Retransmit, which
can only happen at the beginning of the recovery. I doubt any
implementor would like to include an unnecessary condition check that is
executed repeatedly on the arrival of each ACK during the recovery while
the condition can be true only for the first ACK that triggers the loss
recovery.
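
As a purely illustrative sketch (Python; the names are mine, not from the
draft), the structural difference is roughly:

   from dataclasses import dataclass

   @dataclass
   class PrrState:
       mss: int = 1        # work in units of segments
       prr_out: int = 0

   # Draft as written: this check runs on every ACK of the recovery,
   # although it can be true only on the very first one.
   def on_ack_draft(state: PrrState, sndcnt: int) -> int:
       if state.prr_out == 0 and sndcnt == 0:
           sndcnt = state.mss          # force the fast retransmit
       return sndcnt

   # Proposed shape: fast retransmit is an unconditional entry step
   # (together with the initialization), so the per-ACK path never
   # needs the special case.
   def on_enter_recovery(state: PrrState) -> None:
       # (stands in for "Fast Retransmit the first missing segment")
       state.prr_out += state.mss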

>       Then we would send just one pkt (rexmit of
>       P1) on entering the recovery and during the rest of the recovery around 49
>       pkts, i.e., 1+49=50 pkts during the recovery, which would be exactly equal
>       to the ssthresh we set. With this approach we could avoid the burst in the
>       beginning. In addition, we could have a consistent solution also for the
>       additional problem of including extra SACKed data with the ACK that
>       triggers the recovery. Let's look at the above scenario again: cwnd=100
>       and pkts P1..P100 in flight:
>
>         P1..P24, P25, P26, P27, P28..P100
>
>       Packets P1..P24 are delivered to the receiver but the ACKs get dropped (whether
>       ACKs are dropped or not is not relevant for this issue). P25 gets
>       dropped. If the DupAcks of pkt P26 and pkt P27 are delivered, from the
>       DupAck of P28 only the SACK info for P28 is counted in DeliveredData, but
>       the SACK info for P26 and P27 is never counted in DeliveredData because
>       P26 and P27 are already SACKed when the DupAck of P28 arrives. However,
>       if the DupAcks of pkt P26 and pkt P27 get dropped as in the previous
>       example, the ACK of P28 includes new SACK info for pkts P26, P27, and
>       P28, and the bytes of P26 and P27 are also counted in DeliveredData. (If
>       also the DupAck of P28 gets dropped, the DupAck of P29 may include up to 3
>       MSS of additional SACK info to be counted (P26, P27, and P28).) This alone
>       will result in a miniburst in the beginning of the recovery or add to the
>       burst size, as in the previous example where the two additional SACKs (for
>       P26 and P27) inflated prr_delivered by 2, resulting in a slightly too large
>       number of segments sent during the recovery (51).
>
>       As suggested above, this problem with additional SACKs would be solved
>       such that the DupAck that triggers the loss recovery is allowed to add
>       only "itself" to DeliveredData, letting PRR include the missing bytes
>       for pkts that were SACKed before the start of the recovery only at the
>       end of the recovery, when the cumulative ACK for the first pkt (P1)
>       arrives and inherently covers those bytes.
>
>       In other words, the algo can be modified such that fast retransmit is
>       always handled separately in the beginning of the recovery together with
>       the initialization of the PRR variables:
>
>          ssthresh = CongCtrlAlg()      // Target flight size in recovery
>       //[MK]: the next three lines can be deleted as unnecessary
>          prr_delivered = 0             // Total bytes delivered in recovery
>          prr_out = 0                   // Total bytes sent in recovery
>          pipe = (RFC 6675 pipe algorithm)
>
>          RecoverFS = snd.nxt - snd.una // FlightSize right before recovery
>                                        // [MK]:maybe add cumulatively ACKed
>                                        //      bytes?
>
>          Fast Retransmit the first missing segment
>          prr_delivered  = (With SACK: bytes selectively acknowledged by the first
>                            SACK block of the ACK triggering the loss recovery, OR
>                            Without SACK: 1 MSS)
>          prr_out  = (data fast retransmitted)
>
>       On each arriving ACK during the rest of the fast recovery, including the
>       final cumulative ACK that signals the end of loss recovery:
>
>          DeliveredData = change_in(snd.una)
>          if (SACK is used) {
>             DeliveredData += change_in(SACKd) //[MK]:(*) modify change_in(SACKd)
>          ...
> 
>
>       The above changes would imply deleting
>
>         if (prr_out is 0 AND sndcnt is 0) {
>              // Force a fast retransmit upon entering recovery
>              sndcnt = MSS
>
>       from the algo and would make it consistent with the description of the
>       other Fast Retransmit & Fast Recovery algorithms (RFC 5681, RFC 6582, RFC
>       6675), which include fast retransmit together with the initialization of
>       the algo in the unconditional first steps of the algorithm.
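>
>       (As a compact, non-normative Python sketch of this reorganized entry
>       step and the per-ACK step; the variable names are mine, quantities in
>       bytes:)
>
>          import math
>
>          def prr_enter(ssthresh, flight_size, first_sack_block_bytes, mss):
>              # Unconditional entry actions: the fast retransmit happens here,
>              # not as a special case on a later ACK.
>              recover_fs = flight_size                # snd.nxt - snd.una at entry
>              prr_delivered = first_sack_block_bytes  # only the triggering SACK "itself"
>              prr_out = mss                           # the fast-retransmitted segment
>              return recover_fs, prr_delivered, prr_out
>
>          def prr_on_ack(delta_snd_una, delta_sacked, prr_delivered, prr_out,
>                         ssthresh, recover_fs):
>              # Per-ACK step for the rest of recovery (only the proportional
>              # reduction branch is shown).
>              prr_delivered += delta_snd_una + delta_sacked   # DeliveredData
>              sndcnt = math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
>              return prr_delivered, max(0, sndcnt)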
>
>       (*)
>       In addition, one more smallish but important correction is needed. The
>       bytes that are SACKed before the recovery starts (i.e., typically the
>       famous first two DupAcks or more bytes if the start of recovery is
>       postponed due to reordering) should be taken into account in the
>       DeliveredData during the recovery but with the current algo they
>       are never counted in DeliveredData (and prr_delivered).
>       Why? Because when the first cumulative ACK arrives, it advances snd.una
>       such that those bytes are covered, but change_in(SACKd) is negative and
>       it incorrectly subtracts these bytes as well from DeliveredData (and
>       prr_delivered) even though they were never counted in the first place.
>       Usually this is only 2 MSS, but in scenarios similar to the one that Neal
>       introduced earlier there might be many more bytes that are not counted.
>       This change would also solve the problem of exiting PRR with too low a
>       cwnd. Let's look at Neal's earlier example again (see comments tagged
>       [MK] for the suggested change to solve the issue):
>
>       CC = CUBIC
>       cwnd = 10
>       The reordering degree was estimated to be large, so the connection will
>       wait for more than 3 packets to be SACKed before entering fast recovery.
>
>       --- Application writes 10*MSS.
>
>       TCP sends packets P1 .. P10.
>       pipe = 10 packets in flight (P1 .. P10)
>
>       --- P2..P9 SACKed  -> do nothing //
>
>       (Because the reordering degree was previously estimated to be large.)
>
>       --- P10 SACKed -> mark P1 as lost and enter fast recovery
>
>       PRR:
>       ssthresh = CongCtrlAlg() = 7 packets // CUBIC
>       prr_delivered = 0
>       prr_out = 0
>       RecoverFS = snd.nxt - snd.una = 10 packets (P1..P10)
>
>       DeliveredData = 1  (P10 was SACKed)
>
>       prr_delivered += DeliveredData   ==> prr_delivered = 1
>
>       pipe =  0  (all packets are SACKed or lost; P1 is lost, rest are SACKed)
>
>       safeACK = false (snd.una did not advance)
>
>       if (pipe > ssthresh) => if (0 > 7) => false
>       else
>         // PRR-CRB by default
>         sndcnt = MAX(prr_delivered - prr_out, DeliveredData)
>                = MAX(1 - 0, 1)
>                = 1
>
>         sndcnt = MIN(ssthresh - pipe, sndcnt)
>                = MIN(7 - 0, 1)
>                = 1
>
>       cwnd = pipe + sndcnt
>            = 0    + 1
>            = 1
>
>       retransmit P1
>
>       prr_out += 1   ==> prr_out = 1
>
>       --- P1 retransmit plugs hole; receive cumulative ACK for P1..P10
>
>       DeliveredData = 1  (P1 was newly ACKed) //[MK]: should be = 10 - 1 = 9
>
>       //[MK]: Note that the SACKed bytes of P2..P9 were also newly acked
>       //      because those bytes had not been delivered *during* the
>       //      recovery so far and were thereby not yet counted in
>       //      prr_delivered.
>       //      So, they should not be subtracted from DeliveredData
>       //      but included, as those bytes got delivered only when
>       //      snd.una advanced. Only P10 should be subtracted.
>
>       prr_delivered += DeliveredData   ==> prr_delivered = 2
>       //[MK]: should be = 1 + 9 = 10
>
>       pipe =  0  (all packets are cumulatively ACKed)
>
>       safeACK = (snd.una advances and no further loss indicated)
>       safeACK = true
>
>       if (pipe > ssthresh) => if (0 > 7) => false
>       else
>         // PRR-CRB by default
>         sndcnt = MAX(prr_delivered - prr_out, DeliveredData)
>                = MAX(2 - 1, 1)  //[MK]  = MAX(10-1, 1)
>                = 1              //[MK]  = 9
>         if (safeACK) => true
>           // PRR-SSRB when recovery is in good progress
>           sndcnt += 1   ==> sndcnt = 2 //[MK] ==> sndcnt = 10
>
>         sndcnt = MIN(ssthresh - pipe, sndcnt)
>                = MIN(7 - 0, 2) //[MK] = MIN(7 - 0, 10)
>                = 2             //[MK] = 7
>
>       cwnd = pipe + sndcnt
>            = 0    + 2  //[MK] = 0 + 7
>            = 2         //[MK] = 7
>
>       So we exit fast recovery with cwnd=2 even though ssthresh is 7.
>
>       [MK]: Or, we exit with cwnd=7 if we correctly count in DeliveredData
>       during the recovery process all data that is in flight when the recovery
>       starts. All bytes in flight at the start of the recovery are supposed to
>       become acknowledged by the end of the recovery, so they should be counted
>       in prr_delivered during the recovery.
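>
>       (The same final-ACK step as a small, illustrative Python sketch with the
>       corrected accounting; units are packets:)
>
>          # state just before the recovery-ending cumulative ACK (example above)
>          ssthresh, pipe = 7, 0
>          prr_delivered, prr_out = 1, 1
>
>          delta_snd_una = 10          # P1..P10 newly cumulatively acked
>          sacked_in_recovery = 1      # only P10 was SACKed after recovery started
>
>          # subtract only the bytes that were already counted in prr_delivered
>          DeliveredData = delta_snd_una - sacked_in_recovery   # 9
>          prr_delivered += DeliveredData                       # 10
>
>          sndcnt = max(prr_delivered - prr_out, DeliveredData) # 9 (PRR-CRB)
>          sndcnt += 1                                          # safeACK (PRR-SSRB)
>          sndcnt = min(ssthresh - pipe, sndcnt)                # 7
>          print(pipe + sndcnt)                                 # cwnd = 7, not 2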
>
>       >             However, I think it won't be good if it's propagated to after the recovery.
>       >             But, don't we reset cwnd to ssthresh at the end of recovery?
>
>       [MK]: It seems that just by correcting the definition of DeliveredData
>       there is no need to reset cwnd to ssthresh at the end of recovery because
>       the algo would do it for us. But I am not opposed to resetting cwnd to
>       ssthresh at the end. In that case it might be better to specify it by
>       giving two alternatives, similar to what RFC 6582 does. Maybe:
>
>          Set cwnd to either (1) min (ssthresh, cwnd) or (2) ssthresh.
> 
> 
> I think we have discussed this in past discussions.
> In case of (1), cwnd can become very low when there were big losses before the recovery.
> As many implementations don't take that approach and they have been in use for a long time, (2) became
> our consensus.

[MK2]: Yes, setting cwnd to ssthresh has been discussed, and as I recall
there were two differing opinions among the co-authors with no final
resolution. My apologies if there was one and I missed it.
But I am not disagreeing. First, with my proposed solution this simply
becomes a non-issue and is not needed at all. Second, if we decide to
propose setting cwnd at the exit of the recovery, I just think it would
not be a good idea to require all implementations to do (2); rather, let
the implementor have another safe alternative as well. In case cwnd is
small, (1) would result in slow start, which avoids a burst and would
quickly determine the available capacity. With (2), the draft should say
(suggest/require) something about applying some sort of pacing to avoid a
burst, I think (such a burst is not necessarily self-evident for
everyone).
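
For illustration only (my wording, not draft text), the two exit-time
choices would be roughly:

   def cwnd_at_recovery_exit(cwnd, ssthresh, option):
       # (1) conservative: never raise cwnd at exit; if cwnd ended up well
       #     below ssthresh, slow start will probe the capacity back up
       if option == 1:
           return min(ssthresh, cwnd)
       # (2) always reset to ssthresh; if cwnd was well below ssthresh,
       #     some form of pacing should smooth out the resulting burst
       return ssthresh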

> We probably can come up with the scenarios where this approach will be suboptimal, but I think this is a
> trade-off.

[MK2]: Right, therefore it might be good to allow the implementor to make 
the decision on this trade-off.

Thanks,

/Markku

> Thanks,
> --
> Yoshi
> 
>