Re: [tcpm] Updating Proportional Rate Reduction RFC6937 to PS

Matt Mathis <mattmathis@google.com> Mon, 19 October 2020 17:20 UTC

MIME-Version: 1.0
References: <CAH56bmDXUrJRdnCRq1mug95B16yUQFp4mN4Hur7q9aau-DAk0Q@mail.gmail.com> <2CE9D0F2-88B6-4736-99C8-1533F625ACAA@ifi.uio.no>
In-Reply-To: <2CE9D0F2-88B6-4736-99C8-1533F625ACAA@ifi.uio.no>
From: Matt Mathis <mattmathis@google.com>
Date: Mon, 19 Oct 2020 10:20:13 -0700
Message-ID: <CAH56bmDgkVKZvXWr=dL=LptZob+N_nFUP4AkO52J8EiHKvQgvQ@mail.gmail.com>
To: Michael Welzl <michawe@ifi.uio.no>
Cc: tcpm IETF list <tcpm@ietf.org>
Content-Type: multipart/alternative; boundary="00000000000071e6e205b209546d"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/8Duobgqz2UaVhDpUsaHlHRuS-So>
Subject: Re: [tcpm] Updating Proportional Rate Reduction RFC6937 to PS

The heuristic was designed to address the "double drops" that you noted.
Both RFC 6675 and PRR-SSRB can be too aggressive in situations where there
are burst losses that cause the flight size to fall below ssthresh.
Sometimes these are OK, but they often cause lost retransmissions.  The new
heuristic is meant to prevent these situations from persisting.

These situations are hard to reason about because they cannot be caused by
events that are modeled by simple queues.  The Flach paper is about token
bucket policers, which cause huge losses when they run out of tokens -- all
without significant queues.  Other potential events include bursty cross
traffic from long-RTT or non-responsive flows.

With packet conservation, a single flow cannot cause self-inflicted losses
that last more than one RTT, and thus can't cause lost retransmissions.
If there are lost retransmissions, something happened that is outside the
scope of a simple network model.

While I agree that 6675's half RTT of silence reduces queue pressure, the
responsibility for that really belongs to congestion control, which should
not be controlling against queue full.  The goal of PRR is to minimize the
opportunities to lose the self clock by accurately controlling flightsize
to the target set by the congestion control.

The heuristic also helps in the situation where there was a large step
change in the available bandwidth, and the CC cannot possibly estimate the
correct flightsize quickly enough.
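
For illustration, here is a minimal sketch of the per-ACK send computation
from RFC 6937 with a switch between the two reduction bounds.  The
proportional part and the PRR-CRB / PRR-SSRB limits follow the RFC 6937
pseudocode.  The new_loss_seen flag is an assumption standing in for "the
presence of additional losses"; the exact trigger used by the candidate
heuristic is not spelled out in this thread, so this is not the text of the
planned .bis document.  MSS = 1000 and the function name are also
illustrative.

```python
MSS = 1000  # bytes; illustrative value, not taken from the thread


def prr_sndcnt(prr_delivered, prr_out, delivered_data, pipe,
               ssthresh, recover_fs, new_loss_seen):
    """Bytes that may be sent in response to one ACK during recovery.

    prr_delivered: bytes delivered since recovery began (this ACK included)
    prr_out:       bytes sent since recovery began
    delivered_data: bytes newly delivered by this ACK
    pipe:          RFC 6675 estimate of bytes in flight
    recover_fs:    flight size at the start of recovery
    new_loss_seen: hypothetical flag, True if this ACK revealed more losses
    """
    if pipe > ssthresh:
        # Proportional part: spread the reduction across the recovery RTT.
        # Integer ceiling of prr_delivered * ssthresh / recover_fs.
        target = (prr_delivered * ssthresh + recover_fs - 1) // recover_fs
        sndcnt = target - prr_out
    else:
        if new_loss_seen:
            # PRR-CRB: strict packet conservation, never send more than
            # has been delivered.
            limit = prr_delivered - prr_out
        else:
            # PRR-SSRB: allow one extra MSS per ACK, like slow start.
            limit = max(prr_delivered - prr_out, delivered_data) + MSS
        # Never raise the flight size above the congestion control target.
        sndcnt = min(ssthresh - pipe, limit)
    return max(sndcnt, 0)
```

With pipe = 4000 and ssthresh = 10000 after a burst loss, prr_delivered =
3000, prr_out = 2000 and delivered_data = 1000, the sketch allows 2000 bytes
when the ACK shows no further loss and 1000 bytes when it does.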

Thanks,
--MM--
The best way to predict the future is to create it.  - Alan Kay

We must not tolerate intolerance;
       however our response must be carefully measured:
            too strong would be hypocritical and risks spiraling out of
control;
            too weak risks being mistaken for tacit approval.


On Sun, Oct 18, 2020 at 2:07 PM Michael Welzl <michawe@ifi.uio.no> wrote:

> Hi,
>
> Are there any known bad experiences with PRR at all?
>
> I like RFC 6937, as I appreciate that it's trying to do the right thing,
> in the right way (and, as a side note, at least I did enjoy its “tutorial”
> tone writing style) … but I do think that there may be situations where the
> RFC 6675 style reduction has an advantage, as it allows the queue to drain
> for half an RTT before sending again.
>
> We once saw consistent double drops upon entering FR/FR from CA in local
> tests with a large queue (above a BDP), and ended up blaming PRR for it, as
> it would detect loss (little loss) and quite immediately set the rate to
> half of the previous rate, which (in case of an above-BDP bottleneck queue)
> was still faster than the capacity limit. Then, the queue never really got
> a chance to drain (it seemed), it filled up again, and there was another
> loss. Now, one could argue that such long queues are not a case to optimize
> for anyway, but e.g. Cubic’s backoff could show the same behavior with a
> shorter queue.
>
> We didn’t investigate this deeper, and now I can’t say for sure if this
> really was PRR’s fault - but shortly after this experience, I stumbled over
> this:
> https://www.ee.technion.ac.il/~isaac/p/sigcomm16_vcc_extended.pdf
> (long version of the VCC Sigcomm’16 paper)
>
> … which also finds double drops occurring as a result of PRR (see appendix
> B).  I have to admit that I don’t fully get the discussion around ECN in
> this appendix; altogether, I’m not really convinced there are hard facts
> about this being a “problem” at all, but I thought it’d be worth bringing
> to the group’s attention. Maybe someone has investigated this deeper and
> found out if this is a real issue or not.
>
> Cheers,
> Michael
>
>
> On Oct 18, 2020, at 3:56 AM, Matt Mathis <
> mattmathis=40google.com@dmarc.ietf.org> wrote:
>
> Following a discussion with the tcpm chairs, the authors of RFC6937 plan
> to introduce a .bis document to update PRR from Experimental to Proposed
> Standard.   PRR is supported in one form or another in 3 major operating
> systems and has come to be very widely deployed over the last several years.
>
> There have been no changes to the base algorithms for PRR-CRB
> (Conservative Rate Bound) and PRR-SSRB (Slow Start Rate Bound).  However
> PRR can be substantially improved by using a heuristic to dynamically
> switch between algorithms, depending on the presence of additional losses.
>   We plan to present a candidate heuristic; however, there have not been any
> deep studies of alternatives.  This approach was first described in section
> 5.2. of Flach et al "An Internet-Wide Analysis of Traffic Policing"
> <https://dl.acm.org/doi/abs/10.1145/2934872.2934873> and has already been
> upstream for several years.
>
> I could use some editorial advice:  RFC 6937 is too long and much too
> "tutorial" in tone.   Does it work for RFC 6937.bis to state the
> algorithms in normative language, but to have non-normative references
> to RFC 6937 for context and background?  Can somebody point me to a pair of
> existing RFCs that use this approach?   How much explanation should remain
> in RFC 6937.bis?
>
> We are aiming to have something ready for the next tcpm meeting.
>
> Thanks,
> --MM--