Re: [tcpm] Updating Proportional Rate Reduction RFC6937 to PS

Michael Welzl <michawe@ifi.uio.no> Tue, 20 October 2020 06:11 UTC

From: Michael Welzl <michawe@ifi.uio.no>
In-Reply-To: <CAH56bmDfattW-kLd=PHu7684-rYKNKtqjLdY5waq9ZBU2KJePQ@mail.gmail.com>
Date: Tue, 20 Oct 2020 08:11:32 +0200
Cc: tcpm IETF list <tcpm@ietf.org>
Message-Id: <365A684B-6AC8-4DF4-9312-A934423DE7BA@ifi.uio.no>
References: <CAH56bmDXUrJRdnCRq1mug95B16yUQFp4mN4Hur7q9aau-DAk0Q@mail.gmail.com> <2CE9D0F2-88B6-4736-99C8-1533F625ACAA@ifi.uio.no> <CAH56bmDgkVKZvXWr=dL=LptZob+N_nFUP4AkO52J8EiHKvQgvQ@mail.gmail.com> <44DB6DE9-B150-4C2D-B516-A052D325370A@ifi.uio.no> <CAH56bmDfattW-kLd=PHu7684-rYKNKtqjLdY5waq9ZBU2KJePQ@mail.gmail.com>
To: Matt Mathis <mattmathis@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/0WVnhxiSQ2anth5muM_PN1FGH0A>
Subject: Re: [tcpm] Updating Proportional Rate Reduction RFC6937 to PS

Hi again,

You don't understand it because it doesn't make sense; I get it now, and I was just wrong, I'm sorry.
I've recently been thinking too much about a paced, non-ack-clocked behavior, where things might indeed play out as I described, even with an overshoot of only one segment.

With ACK clocking, as you said, that can't happen. In fact, I can see it becoming a problem in only one case: when the overshoot is so high that setting back the cwnd won't save you, even with ACK clocking. But I think this translates exactly into "pipe < ssthresh", and this is the case that the heuristics cover, which, as you said, were designed to handle these double drops (*). So it all makes sense to me now; sorry for this excursion into the land of nonsense. Case closed, go home everyone, nothing to see here! :-)
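To make the "pipe < ssthresh" distinction concrete, here is a minimal sketch of the per-ACK send computation as I read it from RFC 6937, in whole segments. The function and parameter names are mine, the clamp to zero is an assumption I added, and the newer heuristic discussed in this thread is not modelled; the `conservative` flag simply selects between PRR-CRB and PRR-SSRB.

```python
def prr_sndcnt(prr_delivered, prr_out, recover_fs, ssthresh, pipe,
               delivered_data, mss=1, conservative=True):
    """Segments that may be sent in response to one ACK during recovery.

    prr_delivered  -- total data delivered to the receiver since recovery began
    prr_out        -- total data sent since recovery began
    recover_fs     -- FlightSize at the start of recovery
    delivered_data -- data newly delivered by this ACK
    """
    if pipe > ssthresh:
        # Proportional part: send ssthresh/recover_fs segments per segment
        # delivered, rounded up (integer ceiling division).
        target = -(-(prr_delivered * ssthresh) // recover_fs)
        sndcnt = target - prr_out
    else:
        # Reduction bound: either strict packet conservation (CRB) or the
        # slow-start variant (SSRB).
        if conservative:
            limit = prr_delivered - prr_out
        else:
            limit = max(prr_delivered - prr_out, delivered_data) + mss
        sndcnt = min(ssthresh - pipe, limit)
    # Assumption: never report a negative count.
    return max(0, sndcnt)


if __name__ == "__main__":
    # pipe above ssthresh: roughly one segment sent per two delivered.
    print(prr_sndcnt(1, 0, 20, 10, 19, 1))
    print(prr_sndcnt(2, 1, 20, 10, 18, 1))
    # pipe below ssthresh after a burst loss: the two bounds differ.
    print(prr_sndcnt(5, 0, 20, 10, 4, 1, conservative=True))
    print(prr_sndcnt(5, 0, 20, 10, 4, 1, conservative=False))
```

In the first branch the sender can never emit more than has been delivered, which is the ACK-clocking argument above. Only the second branch, where pipe has already fallen below ssthresh, leaves a choice about how aggressively to catch up.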

Cheers,
Michael
--
(*) In practice, I therefore think that these heuristics can play an important role after the very first slow start.
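
For the single-loss case raised further down in the quoted thread, a toy comparison may help. This is a rough sketch under simplifying assumptions of mine, not a reproduction of the diagram in RFC 6937: every ACK signals that exactly one segment has left the network, recovery starts with pipe = 19 out of a FlightSize of 20, ssthresh is 10, and the RFC 6675-style sender is reduced to "send whenever pipe < cwnd" plus one forced fast retransmit. All function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def simulate_rfc6675(pipe, ssthresh, n_acks):
    """Segments sent per ACK by a simplified pipe-based sender (cwnd = ssthresh)."""
    cwnd = ssthresh
    sent = []
    for ack in range(n_acks):
        pipe -= 1                    # one segment left the network
        n = max(0, cwnd - pipe)      # send only while pipe is below cwnd
        if ack == 0:
            n = max(n, 1)            # fast retransmit is sent regardless
        pipe += n
        sent.append(n)
    return sent


def simulate_prr(pipe, ssthresh, recover_fs, n_acks):
    """Segments sent per ACK by PRR with the conservative bound (PRR-CRB)."""
    prr_delivered = 0
    prr_out = 0
    sent = []
    for _ in range(n_acks):
        pipe -= 1
        prr_delivered += 1
        if pipe > ssthresh:
            n = ceil_div(prr_delivered * ssthresh, recover_fs) - prr_out
        else:
            n = min(ssthresh - pipe, prr_delivered - prr_out)
        n = max(0, n)                # assumption: clamp at zero
        prr_out += n
        pipe += n
        sent.append(n)
    return sent


if __name__ == "__main__":
    print("RFC 6675-style:", simulate_rfc6675(19, 10, 18))
    print("PRR-CRB:       ", simulate_prr(19, 10, 20, 18))
```

In this toy model both senders transmit the same number of segments over the 18 ACKs. The RFC 6675-style sender stays silent until pipe drops below ssthresh, whereas PRR sends on roughly every other ACK and never sends more than has been delivered.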



> On 20 Oct 2020, at 07:25, Matt Mathis <mattmathis@google.com> wrote:
> 
> I don't understand the situation you are thinking about.  If the bottleneck BW changes and it causes any loss, you would expect to see a whole RTT of periodic losses between the first detected loss and the ACK from the first retransmission.  If inflight (aka pipe) is larger than ssthresh, PRR will always send less data than has left the network.  This is guaranteed by the conservative self clock and the "proportional" part of the algorithm.  Yes, it drains the queue slowly, and yes, cross traffic can cause additional losses, but that is because the cross traffic is being too aggressive (e.g. the cross traffic has to still be increasing its instantaneous rate for some reason).
> 
> Thanks,
> --MM--
> 
> 
> On Mon, Oct 19, 2020 at 2:47 PM Michael Welzl <michawe@ifi.uio.no> wrote:
> Hi,
> 
> In line:
> 
> > On Oct 19, 2020, at 7:20 PM, Matt Mathis <mattmathis@google.com> wrote:
> > 
> > The heuristic was designed to address the "double drops" that you noted.  Both RFC 6675 and PRR-SSRB can be too aggressive in situations where there are burst losses that cause the flight size to fall below ssthresh.  Sometimes these are ok, but they often cause lost retransmissions.  The new heuristic is to prevent these situations from persisting.
> 
> I get that, but what I’m concerned about is a case where pipe is *not* below ssthresh, so the heuristics don’t even apply. It’s enough to consider the “diagram” on page 7, where it’s clear that RFC 6675 takes a break from segment 4 to 11 but PRR (or rate-halving) doesn’t. This is fine when the new rate is smaller than the bottleneck’s capacity, but if it’s not, the queue should just overflow again, I believe.
> 
> (again, these are all more suspicions than anything else - I only have indications, from our own experiments and the paper I pointed to, nothing really clear that shows that this really happens; come to think of it, it actually wouldn’t be hard for me to run a check in our testbed… perhaps that would be better than making a fuss here about something that may turn out to be nothing. Then again, maybe it’s a good conversation to have anyway, just to see what the thinking around this type of concern is.)
> 
> 
> > These situations are hard to reason about because they can not be caused by events that are modeled by simple queues.  The Flatch paper is about token bucket policers which cause huge losses when they run out of tokens -- all without significant queues.  Other potential events include bursty cross traffic from long RTT or non-responsive flows.
> 
> Ok, but that’s not the case I mean, mine is simpler (above). Even just a single packet loss would suffice for the situation I’m talking about.
> 
> 
> > With packet conservation, a single flow can not cause self inflicted losses that last more than one RTT, and thus can't cause lost retransmissions.
> 
> Hmmmmmm….. I doubt this; I think the backoff factor and queue also play a role. The point is that the new cwnd may be too much, just like the old was too much.
> 
> 
> >     If there are lost retransmissions, something happened that is outside the scope of a simple network model.
> > 
> > While I agree that 6675's half RTT of silence reduces queue pressure, the responsibility for that really belongs to congestion control, which should not be controlling against queue full.
> 
> I agree with the thought, but in IETF terms I think that a recovery mechanism in a PS RFC has to be compatible with RFC 5681 too.
> 
> 
> >   The goal of PRR is to minimize the opportunities to lose the self clock by accurately controlling flightsize to the target set by the congestion control.
> > 
> > The heuristic also helps in the situation where there was a large step change in the available bandwidth, and the CC can not possibly estimate the correct flightsize quickly enough.
> 
> I understand; generally, I don’t doubt that there are many good aspects to PRR, and that it will work better than RFC 6675 in many cases. The half-RTT-of-nothing pattern is just odd.
> So maybe what I’m worrying about is a corner case. But, as we talk about proceeding to PS, I think it deserves a thought.
> 
> Cheers,
> Michael
>