[tcpm] Re: I-D Action: draft-ietf-tcpm-prr-rfc6937bis-06.txt
Yoshifumi Nishida <nsd.ietf@gmail.com> Fri, 05 July 2024 08:39 UTC
To: Markku Kojo <kojo@cs.helsinki.fi>
Cc: Matt Mathis <ietf@mattmathis.net>, Matt Mathis <mattmathis@measurementlab.net>, tcpm@ietf.org
Hi Markku,
Thanks for the comments. Please see my comments inline.
On Tue, Jun 25, 2024 at 7:21 AM Markku Kojo <kojo@cs.helsinki.fi> wrote:
> Hi Yoshi,
>
> please see inline tagged [MK2].
>
> On Mon, 1 Apr 2024, Yoshifumi Nishida wrote:
>
> > Hi Markku,
> >
> > Thanks for the detailed comments. I put my comments inline on the two points.
> >
> > On Wed, Mar 27, 2024 at 7:54 AM Markku Kojo <kojo@cs.helsinki.fi> wrote:
> > Hi Yoshi, Neal, all,
> >
> > Please see below inline (tagged [MK]). And my apologies for a very long
> > explanation. Hopefully I did not include too many mistakes this time ;)
> >
> > In summary, it seems that we do not need to reset cwnd at the end of the
> > recovery nor adjust RecoverFS necessarily, because all issues raised could
> > be resolved by simply correcting the definition of DeliveredData
> > (reverting back to the original definition + a small additional change)
> > and moving the actions to take with the ACK that triggers loss recovery to
> > the unconditional steps to be taken together with the initialization of
> > the algo in the beginning (this would also be in line with how the other
> > fast recovery algorithms are described in the RFC series).
> >
> > Hopefully I did not misunderstand any parts of the algo (either in RFC
> > 6937 or in the current -08 version of the draft).
> >
> > On Thu, 21 Mar 2024, Yoshifumi Nishida wrote:
> >
> > > Hi Neal,
> > > On Wed, Mar 20, 2024 at 1:32 PM Neal Cardwell <ncardwell@google.com> wrote:
> > >
> > >
> > > On Wed, Mar 20, 2024 at 3:29 PM Yoshifumi Nishida <nsd.ietf@gmail.com> wrote:
> > > Hi Neal,
> > >
> > > On Wed, Mar 20, 2024 at 6:55 AM Neal Cardwell <ncardwell@google.com> wrote:
> > >
> > > On Wed, Mar 20, 2024 at 3:07 AM Yoshifumi Nishida <nsd.ietf@gmail.com> wrote:
> > >
> > > On Mon, Mar 18, 2024 at 8:13 PM Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org> wrote:
> > >
> > > But your point still stands, and you raise a great point: simply
> > > initializing RecoverFS to "pipe" is not safe, because packets that were
> > > marked lost and removed from pipe may actually have been merely
> > > reordered. So if those packets are delivered later, they will increase
> > > the numerator of prr_delivered / RecoverFS without increasing the
> > > denominator, thus leading to a result above 1.0, and thus potentially
> > > leading to a target for Sent_so_far that is above ssthresh, causing the
> > > algorithm to erroneously exceed ssthresh.
> > >
> > >
> > > Hmm. I have a naive question here. If packets were merely reordered,
> > > isn't it ok for cwnd to be bigger than ssthresh?
> > >
> > >
> > > Yes, if packets were merely reordered and none were lost, then I agree
> > > it would be OK for cwnd to be bigger than ssthresh. And in fact I would
> > > argue that cwnd should be reverted back to its value before fast
> > > recovery. And this is in fact what Linux TCP would do, using loss
> > > recovery "undo" mechanisms based on TCP timestamps or DSACKs.
> > >
> > > However, in the kind of scenario Markku described, there was not merely
> > > reordering, but also real packet loss: "1 packet is lost (P1), and 24
> > > packets are delayed (packets P2..P25)". In the traditional loss-based
> > > Reno/CUBIC paradigm, any non-zero amount of real packet loss in fast
> > > recovery should result in the same multiplicative decrease in cwnd,
> > > regardless of the combination of reordering and loss. We could argue
> > > about whether that approach is the best approach (BBR, for example,
> > > takes a different approach), but that is a very different
> > > discussion. :-) For now AFAICT we are focused on PRR's faithful
> > > enactment of the congestion control algorithm's decision to reduce cwnd
> > > toward ssthresh when there is any non-zero amount of real packet loss
> > > in fast recovery.
> > >
> > >
> > > Got it. But, I just would like to clarify whether we are discussing the
> > > inflation of sndcnt during the recovery process or cwnd after the exit
> > > of recovery.
> > >
> > >
> > > Good point. We are talking about inflation of sndcnt during the
> > > recovery process.
> >
> > [MK] I think we are talking about both in practice, because inflation of
> > sndcnt during the recovery process would also result in exiting recovery
> > with too big a cwnd. In the examples that I gave, the segments sent_so_far
> > were calculated when the SACK for P100 had arrived (actually the numbers
> > were off by one):
> >
> > For Reno:
> >
> > Sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
> > = CEIL(97 * 50 / 72)
> > = 68
> >
> > For CUBIC:
> > Sent_so_far = CEIL(prr_delivered * ssthresh / RecoverFS)
> > = CEIL(97 * 70 / 72)
> > = 95
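(Just as a quick sanity check on these numbers, packet units, my own throwaway Python rather than draft text:)

    import math

    def sent_so_far(prr_delivered, ssthresh, recover_fs):
        # Total the PRR formula allows to have been sent so far in recovery.
        return math.ceil(prr_delivered * ssthresh / recover_fs)

    print(sent_so_far(97, 50, 72))   # Reno:  68, above ssthresh = 50
    print(sent_so_far(97, 70, 72))   # CUBIC: 95, above ssthresh = 70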
> >
> > Now, when the cumulative ACK triggered by rexmit of P1 arrives and
> > terminates fast recovery, the following is executed as per the *current
> > version* of the draft:
> >
> > DeliveredData = (bytes newly cumulatively acknowledged) = 100
> > DeliveredData += (bytes newly selectively acknowledged) = 100 + 0
> >
> > prr_delivered += DeliveredData = 95 + 100 = 195
> > pipe = 94
> > if (pipe > ssthresh) => (94 > 70) => (true)
> > sndcnt = CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out
> > = CEIL(195*70/72) = 190 - 95 = 95
> > cwnd = pipe + sndcnt = 94 + 95 = 189
> >
> > So oops, when exiting fast recovery cwnd would be nearly doubled from
> > what it was before entering loss recovery. It seems that there is an
> > additional problem because the definition of DeliveredData in the
> > current version of the draft is incorrect; the cumulatively acked bytes
> > that have already been SACKed are counted twice in DeliveredData. It
> > seems that RFC 6937 and rfc6937bis-01 both define DeliveredData (nearly)
> > correctly by including the change in snd.una in DeliveredData and
> > subtracting data that has already been SACKed. The definition of
> > DeliveredData obviously needs to be corrected. See also below the issue
> > with bytes SACKed that are above snd.una but get SACKed before the start
> > of recovery.
> >
> > With the original definition of DeliveredData cwnd would not be inflated
> > further, but fast recovery would still exit with too big a cwnd (for
> > CUBIC cwnd=95 instead of 70, and for Reno cwnd=68 instead of 50) if we
> > use too small a RecoverFS (=Pipe).
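To make the double-counting concrete for myself, here is a tiny illustrative comparison of the two DeliveredData definitions on that final cumulative ACK (packet units; the function names are mine, not text from the draft or RFC 6937):

    def delivered_draft06(newly_cum_acked, newly_sacked):
        # -06 text: bytes newly cumulatively acked plus bytes newly SACKed.
        # Bytes that were already SACKed and are now cumulatively acked get
        # counted a second time.
        return newly_cum_acked + newly_sacked

    def delivered_rfc6937(delta_snd_una, delta_sacked):
        # RFC 6937 / bis-01 text: change in snd.una plus change in SACKed
        # data; SACKed data leaving the scoreboard makes delta_sacked
        # negative, which cancels the double count.
        return delta_snd_una + delta_sacked

    # Final ACK of the example above: the rexmit of P1 cumulatively acks
    # 100 pkts, 99 of which (P2..P100) had already been SACKed.
    print(delivered_draft06(newly_cum_acked=100, newly_sacked=0))   # 100
    print(delivered_rfc6937(delta_snd_una=100, delta_sacked=-99))   # 1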
> >
> > So, it seems that we agree that the problem of sending too many bytes
> > during the recovery process gets corrected if RecoverFS is initialized to
> > snd.nxt - snd.una. The next question is, should RecoverFS be initialized
> > to an even higher value in some scenarios? See below.
> >
> > > Because I'm personally not very keen to address miscalculation of lost
> > > packets due to reordering during the recovery process as it seems to be
> > > tricky.
> > >
> > >
> > > It is tricky, but I think it is feasible to address. What do folks
> > > think about my suggestion from above in this thread:
> > >
> > > existing text:
> > >   pipe = (RFC 6675 pipe algorithm)
> > >   RecoverFS = pipe    // RFC 6675 pipe before recovery
> > >
> > > proposed new text:
> > >   RecoverFS = snd.nxt - snd.una + (bytes newly cumulatively acknowledged)
> > >
> > >
> > > Hmm. Sorry. I'm not very sure about the difference between
> > > snd.nxt - snd.una and snd.nxt - snd.una + (bytes newly cumulatively
> > > acknowledged).
> > > Could you elaborate a bit? I thought we don't have data which is
> > > cumulatively acked in the case of reordering.
> >
> > [MK] It seems there might be another case that Neal is thinking of, where
> > the sender may end up sending too many segments during the first RTT in
> > fast recovery. If I understood it correctly this may occur in a scenario
> > with ACK loss for pkts preceding the first dropped data pkt, for
> > example. Consider the following scenario where there are 100 pkts
> > outstanding:
> >
> > P1..P24, P25, P26, P27, P28..P100
> >
> > Packets P1..P24 and P26..P100 are delivered successfully to the
> > receiver. P25 is lost. ACKs (and SACKs) for pkts P1..P24, P26 and P27 get
> > dropped. SACKs for P28..P100 are delivered successfully. When the SACK
> > for pkt P28 arrives, an RFC 6675 sender would declare P25 lost and
> > enter fast retransmit. Similarly, a RACK-TLP sender may declare P25 lost,
> > but this may happen with any of the SACKs for P28..P100 arriving.
> >
> > Let's assume we were fully utilizing the congestion window, i.e.,
> > cwnd=100, and we enter loss recovery when the SACK of P28 arrives
> > (cumulative ACK#=25):
> >
> > ssthresh = cwnd / 2 = 50 (Reno)
> > prr_delivered = prr_out = 0
> > Pipe = snd.nxt - snd.una - (lost + SACKed) = 76 - (1 + 3) = 72
> > RecoverFS = snd.nxt - snd.una = 101 - 25 = 76
> >
> > DeliveredData = (bytes newly cumulatively acknowledged) = 24
> > DeliveredData += change_in(SACKd) = 24+3 = 27
> > prr_delivered += DeliveredData = 0+27 = 27
> >
> > if (pipe > ssthresh) => if (72 > 50) => true
> > // Proportional Rate Reduction
> > sndcnt = CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out
> >        = CEIL(27 * 50 / 76) = 18 - 0 = 18
> >
> > cwnd = 72 + 18 = 90
> >
> > So, we will send a burst of 18 pkts on entry to recovery and during the
> > rest of the recovery around 49 pkts, giving a total of 18+49=67 pkts
> > while only 50 were allowed. If we add the 24 cumulatively acked pkts into
> > RecoverFS as Neal suggests, we are about to send around 14+37=51 pkts,
> > which is almost fine. However, the major shortcoming of this approach is
> > that we'll still send a burst of 14 pkts in the beginning of the recovery,
> > while avoiding such a burst was one of the major goals of PRR.
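For what it's worth, here is the entry-ACK arithmetic of this scenario under the three candidate RecoverFS initializations (pipe, snd.nxt - snd.una, and snd.nxt - snd.una plus the cumulatively acked bytes), as a small Python check of my own (packet units; the names are mine, not draft text):

    import math

    ssthresh = 50         # Reno: cwnd / 2
    prr_delivered = 27    # 24 newly cum-acked + 3 newly SACKed on the entry ACK
    prr_out = 0

    for label, recover_fs in [("pipe", 72),
                              ("snd.nxt - snd.una", 76),
                              ("snd.nxt - snd.una + cum-acked", 100)]:
        sndcnt = math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
        print(label, "-> initial sndcnt (burst size) =", sndcnt)
    # pipe                          -> 19
    # snd.nxt - snd.una             -> 18
    # snd.nxt - snd.una + cum-acked -> 14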
> >
> > Alternatively, we could modify the algo such that the cumulatively acked
> > bytes with the ACK that triggers loss recovery are added neither to
> > DeliveredData nor to RecoverFS.
> >
> >
> > I have thought about similar things. I'm thinking that this might be a
> > minor point for now.
> > My personal thoughts on this are:
> > * The case you presented presumes huge ack losses before entering
> >   recovery and no ack loss after that. This doesn't look like a very
> >   common case.
>
> [MK2]: I am not sure if this is a minor point.
> First, it does not need a huge number of ack losses before entering
> recovery. The severity of miscalculation depends on the number of lost
> acks.
> Second, there is no requirement for no ack loss after that. There
> might or might not be further ack losses. The authors (and others) have
> reported that ack loss is a very common phenomenon, so I think we cannot
> consider it uncommon.
>
> > * At any point of recovery, inflation of DeliveredData can happen due to
> >   ack losses or other reasons. I'm not very sure that creating special
> >   handling only for the first ACK is effective.
>
> [MK2]: Sure, sending a burst at any point of recovery is common to all
> Fast Recovery algos, not only PRR. But PRR justifies itself (for a good
> reason) by avoiding bursts at the beginning of loss recovery. Such
> a burst is very bad because it is likely to result in loss of
> retransmitted data, which would always be present among the segment(s)
> transmitted at the beginning of the recovery. Particularly with PRR,
> sending a burst in the beginning is bad because PRR does not have a pause
> in sending segments in the beginning like NewReno or RFC 6675 would have
> (i.e., with PRR, there is no similar opportunity for the bottleneck queue
> to drain before retransmitted segments start to flow into the congested
> queue).
>
> > * As long as there's no sudden increase of DeliveredData, I guess both
> >   logics behave mostly the same. So, I think a question would be how much
> >   we care about this kind of situation. It seems to me that this looks
> >   like a minor case.
>
> [MK2]: I don't think they are mostly the same. My proposal would avoid a
> burst in the beginning and would also solve, in a simple way, the other
> miscalculation problems that ignore a number of Acks, as I have pointed
> out. It would also be consistent with the other Fast Retransmit & Fast
> Recovery algos in the RFC series, which by definition always handle Fast
> Retransmit as a separate and unconditional step at the entry to loss
> recovery.
> The PRR algo as currently described is also inefficient, as it includes an
> unnecessary step to check whether to send out the Fast Retransmit, which
> can only happen at the beginning of the recovery. I doubt any
> implementor would like to include an unnecessary condition check to be
> executed repeatedly on arrival of each Ack during the recovery, while the
> condition can be true only with the first Ack that triggers the loss
> recovery.
>
Hmm, it seems to me that you're saying we might not be able to avoid
bursts in the middle of recovery, but there's a way to avoid a burst at the
beginning.
Could you elaborate on your proposal? I think I need to understand it a bit
more.
> > Then we would send just one pkt (rexmit of P1) on entering the recovery
> > and during the rest of recovery around 49 pkts, i.e., 1+49=50 pkts during
> > the recovery, which would be exactly equal to the ssthresh we set. With
> > this approach we could avoid the burst in the beginning. In addition we
> > could have a consistent solution also for the additional problem of
> > including extra SACKed data with the ACK that triggers the recovery.
> > Let's look at the above scenario again, with cwnd=100 and pkts P1..P100
> > in flight:
> >
> > P1..P24, P25, P26, P27, P28..P100
> >
> > Packets P1..P24 are delivered to the receiver but the ACKs get dropped
> > (whether these ACKs are dropped or not is not relevant for this issue).
> > P25 gets dropped. If the DupAcks of pkt P26 and pkt P27 are delivered,
> > from the DupAck of P28 only the SACK info for P28 is counted in
> > DeliveredData, but the SACK info for P26 and P27 is never counted in
> > DeliveredData because P26 and P27 are already SACKed when the DupAck of
> > P28 arrives. However, if the DupAcks of pkt P26 and pkt P27 get dropped,
> > as in the previous example, the ACK of P28 includes new SACK info for
> > pkts P26, P27, and P28, and the bytes of P26 and P27 are also counted in
> > DeliveredData. (If the DupAck of P28 also gets dropped, the DupAck of P29
> > may include up to 3 MSS of additional SACK info to be counted (P26, P27,
> > and P28).) This alone will result in a miniburst in the beginning of the
> > recovery or add to the burst size as in the previous example, where the
> > two additional SACKs (for P26 and P27) inflated prr_delivered by 2,
> > resulting in a slightly too large number of segments sent during the
> > recovery (51).
> >
> > As suggested above, this problem with additional SACKs would be solved
> > such that the DupAck that triggers the loss recovery is allowed to add
> > only "itself" into DeliveredData, letting PRR include the missing bytes
> > for pkts that were SACKed before the start of the recovery only at the
> > end of the recovery, when the cumulative ACK for the first pkt (P1)
> > arrives and inherently covers those bytes.
> >
> > In other words, the algo can be modified such that fast retransmit is
> > always handled separately in the beginning of the recovery, together with
> > the initialization of the PRR variables:
> >
> >   ssthresh = CongCtrlAlg()      // Target flight size in recovery
> >   //[MK]: the next three lines can be deleted as unnecessary
> >   prr_delivered = 0             // Total bytes delivered in recovery
> >   prr_out = 0                   // Total bytes sent in recovery
> >   pipe = (RFC 6675 pipe algorithm)
> >
> >   RecoverFS = snd.nxt - snd.una // FlightSize right before recovery
> >                                 // [MK]: maybe add cumulatively ACKed
> >                                 // bytes?
> >
> >   Fast Retransmit the first missing segment
> >   prr_delivered = (With SACK: bytes selectively acknowledged by the first
> >                    SACK block of the ACK triggering the loss recovery, OR
> >                    Without SACK: 1 MSS)
> >   prr_out = (data fast retransmitted)
> >
> > On each arriving ACK during the rest of the fast recovery, including the
> > final cumulative ACK that signals the end of loss recovery:
> >
> >   DeliveredData = change_in(snd.una)
> >   if (SACK is used) {
> >     DeliveredData += change_in(SACKd) //[MK]: (*) modify change_in(SACKd)
> >   ...
> >
> >
> > The above changes would imply deleting
> >
> >   if (prr_out is 0 AND sndcnt is 0) {
> >     // Force a fast retransmit upon entering recovery
> >     sndcnt = MSS
> >
> > from the algo and would make it consistent with the description of the
> > other Fast Retransmit & Fast Recovery algorithms (RFC 5681, RFC 6582,
> > RFC 6675), which include fast retransmit together with the initialization
> > of the algo in the unconditional first steps of the algorithm.
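To check my own understanding, here is a rough, non-normative Python rendering of this restructuring (packet units, MSS = 1; the class and parameter names are mine, not draft text): the entry step does the fast retransmit and seeds the counters once, and the per-ACK rule then runs unchanged for the rest of the recovery.

    import math

    MSS = 1  # work in packet units for readability

    class Prr:
        def __init__(self, ssthresh, flight_size, triggering_sacked):
            # Unconditional entry step: initialize PRR and fast retransmit
            # the first missing segment.
            self.ssthresh = ssthresh
            self.recover_fs = flight_size        # snd.nxt - snd.una at entry
            self.prr_delivered = triggering_sacked  # SACKed by the triggering
                                                    # ACK (1 MSS without SACK)
            self.prr_out = MSS                   # the fast retransmit itself

        def on_ack(self, delivered_data, pipe, snd_una_advanced, further_loss):
            # delivered_data = change_in(snd.una) + change_in(SACKd) on this ACK
            self.prr_delivered += delivered_data
            if pipe > self.ssthresh:
                # Proportional Rate Reduction
                sndcnt = math.ceil(self.prr_delivered * self.ssthresh /
                                   self.recover_fs) - self.prr_out
            else:
                # PRR-CRB, plus PRR-SSRB when recovery is in good progress
                sndcnt = max(self.prr_delivered - self.prr_out, delivered_data)
                if snd_una_advanced and not further_loss:
                    sndcnt += MSS
                sndcnt = min(self.ssthresh - pipe, sndcnt)
            sndcnt = max(sndcnt, 0)
            self.prr_out += sndcnt   # assuming the allowed segments are sent now
            return sndcnt

If I read the proposal correctly, in the P25 scenario above the entry step would then transmit only the one fast-retransmitted segment instead of a burst.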
> >
> > (*)
> > In addition, one more smallish but important correction is needed. The
> > bytes that are SACKed before the recovery starts (i.e., typically the
> > famous first two DupAcks, or more bytes if the start of recovery is
> > postponed due to reordering) should be taken into account in
> > DeliveredData during the recovery, but with the current algo they
> > are never counted in DeliveredData (and prr_delivered).
> > Why? Because when the first cumulative ACK arrives, it advances snd.una
> > such that those bytes are covered, but change_in(SACKd) is negative and
> > it incorrectly subtracts these bytes too from DeliveredData (and
> > prr_delivered) even though they were never counted in. Usually this is
> > only 2 MSS, but in scenarios similar to the one that Neal introduced
> > earlier there might be many more data bytes that are not counted. This
> > change would also solve the problem of exiting PRR with too low a cwnd.
> > Let's look at Neal's earlier example again (see comments with [MK] for
> > the suggested change to solve the issue):
> >
> > CC = CUBIC
> > cwnd = 10
> > The reordering degree was estimated to be large, so the connection will
> > wait for more than 3 packets to be SACKed before entering fast recovery.
> >
> > --- Application writes 10*MSS.
> >
> > TCP sends packets P1 .. P10.
> > pipe = 10 packets in flight (P1 .. P10)
> >
> > --- P2..P9 SACKed -> do nothing //
> >
> > (Because the reordering degree was previously estimated to be large.)
> >
> > --- P10 SACKed -> mark P1 as lost and enter fast recovery
> >
> > PRR:
> > ssthresh = CongCtrlAlg() = 7 packets // CUBIC
> > prr_delivered = 0
> > prr_out = 0
> > RecoverFS = snd.nxt - snd.una = 10 packets (P1..P10)
> >
> > DeliveredData = 1 (P10 was SACKed)
> >
> > prr_delivered += DeliveredData ==> prr_delivered = 1
> >
> > pipe = 0 (all packets are SACKed or lost; P1 is lost, rest are SACKed)
> >
> > safeACK = false (snd.una did not advance)
> >
> > if (pipe > ssthresh) => if (0 > 7) => false
> > else
> > // PRR-CRB by default
> > sndcnt = MAX(prr_delivered - prr_out, DeliveredData)
> > = MAX(1 - 0, 1)
> > = 1
> >
> > sndcnt = MIN(ssthresh - pipe, sndcnt)
> > = MIN(7 - 0, 1)
> > = 1
> >
> > cwnd = pipe + sndcnt
> > = 0 + 1
> > = 1
> >
> > retransmit P1
> >
> > prr_out += 1 ==> prr_out = 1
> >
> > --- P1 retransmit plugs hole; receive cumulative ACK for P1..P10
> >
> > DeliveredData = 1 (P1 was newly ACKed) //[MK]: should be = 10 - 1 = 9
> >
> > //[MK]: Note that the SACKed bytes of P2..P9 were also newly acked
> > // because those bytes had not been delivered *during* the
> > // recovery thus far and thereby were not yet counted in
> > // prr_delivered.
> > // So, they should not be subtracted from DeliveredData
> > // but included, as those bytes got delivered only when
> > // snd.una advanced. Only P10 should be subtracted.
> >
> > prr_delivered += DeliveredData ==> prr_delivered = 2
> > //[MK]: should be = 1 + 9 = 10
> >
> > pipe = 0 (all packets are cumulatively ACKed)
> >
> > safeACK = (snd.una advances and no further loss indicated)
> > safeACK = true
> >
> > if (pipe > ssthresh) => if (0 > 7) => false
> > else
> > // PRR-CRB by default
> > sndcnt = MAX(prr_delivered - prr_out, DeliveredData)
> > = MAX(2 - 1, 1) //[MK] = MAX(10-1, 9)
> > = 1 //[MK] = 9
> > if (safeACK) => true
> > // PRR-SSRB when recovery is in good progress
> > sndcnt += 1 ==> sndcnt = 2 //[MK] ==> sndcnt = 10
> >
> > sndcnt = MIN(ssthresh - pipe, sndcnt)
> > = MIN(7 - 0, 2) //[MK] = MIN(7 - 0, 10)
> > = 2 //[MK] = 7
> >
> > cwnd = pipe + sndcnt
> > = 0 + 2 //[MK] = 0 + 7
> > = 2 //[MK] = 7
> >
> > So we exit fast recovery with cwnd=2 even though ssthresh is 7.
> >
> > [MK]: Or, we exit with cwnd=7 if we correctly count in DeliveredData,
> > during the recovery process, all data that is in flight when the recovery
> > starts. All bytes in flight at the start of the recovery are supposed to
> > become acknowledged by the end of the recovery, so they should be counted
> > in prr_delivered during the recovery.
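To convince myself of the numbers, here is a tiny Python check of this final ACK under the two readings of change_in(SACKd) (packet units; names are mine, not draft text):

    # Final cumulative ACK in Neal's example: it covers P1..P10;
    # P2..P9 were SACKed before recovery, P10 during recovery.
    ssthresh, pipe, prr_out = 7, 0, 1
    prr_delivered_so_far = 1            # P10, SACKed after recovery started

    def exit_cwnd(delivered_data):
        prr_delivered = prr_delivered_so_far + delivered_data
        # pipe (0) <= ssthresh (7); snd.una advanced with no further loss,
        # so PRR-CRB with the PRR-SSRB increment applies.
        sndcnt = max(prr_delivered - prr_out, delivered_data) + 1
        sndcnt = min(ssthresh - pipe, sndcnt)
        return pipe + sndcnt

    # Current text: change_in(snd.una) + change_in(SACKd) = 10 - 9 = 1;
    # the 8 pkts SACKed before recovery are subtracted although they were
    # never added to prr_delivered.
    print(exit_cwnd(1))    # -> 2

    # Markku's correction: subtract only data SACKed *during* recovery (P10).
    print(exit_cwnd(9))    # -> 7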
> >
> > > However, I think it won't be good if it's propagated to after the
> > > recovery. But don't we reset cwnd to ssthresh at the end of recovery?
> >
> > [MK]: It seems that just by correcting the definition of DeliveredData
> > there is no need to reset cwnd to ssthresh at the end of recovery, because
> > the algo would do it for us. But I am not opposed to resetting cwnd to
> > ssthresh at the end. In that case it might be better to specify it by
> > giving two alternatives, similar to what RFC 6582 does. Maybe:
> >
> > Set cwnd to either (1) min(ssthresh, cwnd) or (2) ssthresh.
> >
> >
> > I think we have discussed this in past discussions.
> > In case of (1), cwnd can become very low when there were big losses
> > before the recovery.
> > As many implementations don't take that approach and they have been used
> > for a long time, (2) became our consensus.
>
> [MK2]: Yes, setting cwnd to ssthresh has been discussed, and as I recall
> there were two differing opinions between the co-authors with no
> final resolution. My apologies if there was one and I missed it.
> But I am not disagreeing. First, with my proposed solution this simply
> becomes a non-issue and is not needed at all. Second, if we decide to
> propose setting cwnd at the exit of the recovery, I just think it would
> not be a good idea to require all implementations to do (2); rather, let
> the implementor have another safe alternative as well. In case of cwnd
> being small, (1) would result in a slow start that avoids a burst and would
> quickly determine the available capacity. With (2), the draft should say
> (suggest/require) something about applying some sort of pacing to avoid a
> burst, I think (such a burst is not necessarily self-evident for
> everyone).
>
I believe our consensus has been (2), while we are aware there can be a
burst.
I agree that the burst here might not be obvious to everyone.
So, my personal recommendation would be to add something about bursts here.
But if some folks have other opinions, please share.
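Just to spell the two exit options out in code form (an illustrative sketch of mine, not draft text):

    def cwnd_on_recovery_exit(cwnd, ssthresh, option):
        if option == 1:
            # (1) min(ssthresh, cwnd): never bursts, but may force a slow
            #     start if cwnd ended up small.
            return min(ssthresh, cwnd)
        else:
            # (2) ssthresh: the consensus choice; can allow a burst when
            #     pipe is well below ssthresh, hence the suggestion to say
            #     something about pacing the first window after exit.
            return ssthresh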
--
Yoshi