Re: [tsvwg] RTT-independence

Sebastian Moeller <> Tue, 11 February 2020 22:23 UTC

From: Sebastian Moeller <>
Date: Tue, 11 Feb 2020 23:23:13 +0100
To: Greg White <>
Cc: Jonathan Morton <>

> On Feb 11, 2020, at 21:32, Greg White <> wrote:
> On 2/11/20, 4:54 AM, "tsvwg on behalf of Jonathan Morton" < on behalf of> wrote:
>> 15ms comes from the amplitude of the sawteeth of typical traffic at typical Internet bottlenecks over typical paths. 
>    For this to be true, as a peak-to-peak measure, while maintaining 100% throughput, the baseline RTT would have to be 15ms (for basic Reno) or 35ms (for CUBIC as implemented in Linux).  This does not square with other research which has established typical Internet path RTTs in the region of 80ms.
> [GW]  It's not a peak-to-peak measure.  The PIE or PI2 controller aims to keep the "average" queue delay at the target value of 15ms.  

	[SM] The peak-to-peak interpretation is an attempt to make some sense of Bob's "15ms comes from the amplitude of the sawteeth of typical traffic at typical Internet bottlenecks over typical paths. [...] the sawtooth amplitude depends on the RTT of the path". To me this highlights that the rationale for the choice of the numerical value of 15ms is missing from the DQCAQM draft.
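For what it is worth, Jonathan's 15ms/35ms figures can be reproduced with the standard buffer-sizing arithmetic. A sketch (my own, not from any of the drafts), assuming the usual multiplicative-decrease factors of 0.5 for Reno and roughly 0.7 for Linux CUBIC:

```python
# Reproducing the 15 ms (Reno) / 35 ms (CUBIC) figures quoted above.
# Full utilization requires that after a multiplicative decrease,
# cwnd_min = beta * cwnd_max still covers the base BDP:
#   beta * rate * (rtt_base + q) >= rate * rtt_base
# which rearranges to: rtt_base <= q * beta / (1 - beta).

def max_base_rtt(peak_queue_delay_ms, beta):
    """Largest base RTT (ms) that keeps 100% utilization for a given
    peak queueing delay (ms) and multiplicative-decrease factor beta."""
    return peak_queue_delay_ms * beta / (1.0 - beta)

print(max_base_rtt(15.0, 0.5))   # Reno (beta = 0.5): 15 ms
print(max_base_rtt(15.0, 0.7))   # Linux CUBIC (beta ~= 0.7): ~35 ms
```

So reading the 15ms as a peak-to-peak sawtooth amplitude is only consistent with 100% throughput for quite short baseline RTTs, which is exactly Jonathan's point.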

The "history" of the target delay (ref_del / RTT_typ) goes something like this:

1) "The" PIE paper (Pan et al., "PIE: A Lightweight Control Scheme to Address the Bufferbloat Problem", IEEE HPSR, DOI 10.1109/HPSR.2013.6602305, 2013) introduced the parameter ref_del as follows:
"We present here a lightweight design, PIE (Proportional Integral controller Enhanced), that can effectively control the average queueing latency to a reference value"
and showed data acquired with ref_del set to 20, 15, and 5 ms. It gives no guidance on how to select the actual value and makes no judgement about which is better.

2) The PIE RFC (RFC 8033) calls the parameter QDELAY_REF, or AQM Latency Target, and recommends a default of 15 milliseconds:
"The target latency value, QDELAY_REF, SHOULD be set to 15 milliseconds"
without any further justification for that recommendation.
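To make concrete what QDELAY_REF actually controls, here is a minimal sketch of the RFC 8033 drop-probability update, using the RFC's default gains; it omits the burst allowance, auto-tuning, and dequeue-rate estimation for brevity, so it is an illustration of the setpoint's role, not a full implementation:

```python
# Sketch of the periodic PIE drop-probability update (RFC 8033 style):
# QDELAY_REF is the controller's *setpoint*; the AQM steers the
# queueing delay toward it, not toward a peak or trough value.

QDELAY_REF = 0.015   # target queue delay, 15 ms (RFC 8033 default)
ALPHA = 0.125        # proportional gain, Hz (RFC 8033 default)
BETA = 1.25          # differential gain, Hz (RFC 8033 default)

def update_drop_prob(p, qdelay, qdelay_old):
    """One update of PIE's drop probability.

    p          -- current drop probability
    qdelay     -- current estimated queueing delay (s)
    qdelay_old -- queueing delay at the previous update (s)
    """
    # First term reacts to the error from the setpoint,
    # second term to the trend of the queueing delay.
    p += ALPHA * (qdelay - QDELAY_REF) + BETA * (qdelay - qdelay_old)
    return min(max(p, 0.0), 1.0)
```

Note that nothing in this control law ties QDELAY_REF to a path RTT; it is purely the steady-state queueing delay the controller aims for.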

3) The DOCSIS PIE RFC (RFC 8034) says the following about the Latency Target:
"4.1.  Latency Target
  The latency target (a.k.a. delay reference) is a key parameter that
  affects, among other things, the trade-off in performance between
  latency-sensitive applications and bulk TCP applications.  Via
  simulation studies, a value of 10 ms was identified as providing a
  good balance of performance.  However, it is recognized that there
  may be service offerings for which this value doesn't provide the
  best performance balance.  As a result, this is provided as a
  configuration parameter that the operator can set independently on
  each upstream Service Flow.  If not explicitly set by the operator,
  the modem will use 10 ms as the default value."

4) The DQCAQM draft (draft-ietf-tsvwg-aqm-dualq-coupled) renames the parameter in its pseudo-code to RTT_typ and states:
"for the PI2 algorithm (Appendix A) the queuing delay target is set to the typical RTT;"
without expanding on what a typical RTT denotes exactly, or on how RTT and target delay depend on each other.
IMHO that does not make much sense theoretically, as that parameter describes the amount of queueing delay the PIE instance will aim for, which depends only partly on a flow's RTT.

IMHO this "history" shows that there is no clear theoretical underpinning/justification for any of the values in these four documents. I am not saying that there isn't one, but it would behoove the PIE proponents to come forward with a theoretical rationale, or to present test data demonstrating that 15ms is a magical number superior to all others, or even just a good compromise.

> This means that for a single Reno flow to fully utilize the link, its baseline RTT needs to be 30ms or less, and cubic more like 50ms or less.  Keep in mind that cubic is also a bit less sensitive to under-buffering than Reno.  You can quibble with his wording if you like, but setting the latency target in existing AQMs comes down to (as Bob indicated) a tradeoff between minimizing occasions where the link is underutilized and minimizing delay.  I'm not aware of a specific goal of ensuring 100% utilization for a single 80ms base-RTT flow.

	[SM] We all agree that utilization versus latency-under-load increase is the trade-off in selecting a target delay; the question is why the L4S team is both adamant that 15ms is the correct number and at the same time incapable of giving a reasoned justification of how that numeric value was reached. Of all the descriptions, I find the one in the DQCAQM draft the most forced: RTT_typ, the typical RTT, is simply not how ref_del works. Similar to CoDel's target, it is the amount of queue delay under load that PIE will aim for (yes, there are differences between CoDel's target and PIE's ref_del, but essentially both are setpoints that directly affect the long-term queue duration).

>    NB: Codel has an even lower queue target of 5ms by default, but this is for controlling the *standing* queue, not peak nor average queue depth.  Some reduction in link utilisation is accepted in the design, in service of keeping the queue short.

	[SM] Question: with PIE the long-term median queue length will be ref_del, and with CoDel/fq_codel it will be close to target, so do you see any reason why 5ms would not also work for PIE? Looking at the PIE paper, PIE at a ref_del of 5ms has a slightly higher utilization cost than CoDel, but we are talking PIE at 96.6% vs. CoDel at 98.4%; IMHO both are well acceptable.

> [GW] It has been argued that way, but I'm not sure that argument actually holds up.  CoDel only aims to ensure that the minimum queue delay over the last interval (100ms) is below the target value.  

	[SM] Testing under load shows that for fq_codel the median latency-under-load increase for an unrelated measurement flow is close to either one or, for bi-directionally saturating traffic, two targets' worth of milliseconds. So empirically it looks like CoDel/fq_codel under load will give you a queue size that is mostly bounded by target. Under overload conditions that does not hold anymore, but see fq_codel for a viable way to ameliorate that issue to some degree.

> Van and Kathie's definition of "standing" queue (or "bad" queue) was queue delay that lasts for more than about one RTT,

	[SM] No, it is not standing queue per se that is bad, but standing queue in excess of target that triggers CoDel's signaling; in effect, with well-behaved flows this ends up with a median queueing latency of target time units. And fq_codel then solves the well-behaved requirement by isolating well- and badly-behaved flows such that each flow mostly reaps what it sows itself; the same principle applies to fq_pie, by the way.
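The trigger I am describing can be sketched in a few lines (an RFC 8289-flavored simplification of my own, leaving out the drop-scheduling control law): CoDel only starts signaling once the sojourn time has exceeded target continuously for at least one interval, i.e. once a *standing* queue above target, rather than a transient burst, has been confirmed.

```python
# Simplified sketch of CoDel's "standing queue" detection:
# signal only when the queue has stayed above TARGET for a full
# INTERVAL; transient bursts that drain within an interval are ignored.

TARGET = 0.005      # 5 ms target queueing delay (CoDel default)
INTERVAL = 0.100    # 100 ms interval, sized for RTTs up to ~100 ms

class CodelTrigger:
    def __init__(self):
        self.first_above_time = None  # deadline set when sojourn first exceeds TARGET

    def should_signal(self, now, sojourn):
        """Return True once a standing queue above TARGET is confirmed."""
        if sojourn < TARGET:
            self.first_above_time = None          # queue drained below target
            return False
        if self.first_above_time is None:
            self.first_above_time = now + INTERVAL  # start the grace period
            return False
        return now >= self.first_above_time       # above target for a whole interval
```

With responsive flows, this is exactly why the median queueing delay settles around target rather than at zero.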

> and their description of why they landed on that definition appears to relate to handling stretch ACKs and other sources of burstiness rather than dealing with congestion control dynamics.  

	[SM] Partly. The recommendation is to set interval to the expected RTT, and from the AQM's perspective it will take at least one RTT until its signal has been ping-ponged to the receiver, back to the sender, and the sender adjusts its behavior in a way that is visible to the AQM; so unless the queue is already overloaded it makes little sense to send more signals before giving a flow time to react. But that is not explicit in the RFC.
Ideally the interval would be deduced for each flow independently, but empirically it turned out that getting interval exactly right is not really required for acceptable performance.

> The other definition of standing queue (the queue to eliminate if 100% utilization for a single flow is desired) is the remaining queue at the trough of the TCP sawtooth.  These two definitions only align when the sawtooth period is less than 100ms. As a result, CoDel only supports 100% link utilization in the single flow case when that is true.

	[SM] Exactly. The default interval is 100ms precisely to cater to RTTs of (up to) 100ms. In practice it turned out that CoDel, and especially fq_codel, are relatively forgiving about not getting interval exactly right, so an interval of 100ms will work quite well for RTTs <= 200ms; in addition, interval is a user-controllable parameter that can be adjusted to longer or shorter RTTs. The same also seems to hold for PIE: the ideal ref_del depends on the expected RTT distribution and delay tolerance of the traffic. See e.g. the recommendation in RFC 8033 to set ref_del in data centers to 0.15 ms:
"If the target latency is reduced, e.g., for data-center
   use, the values of alpha and beta should be increased by the same
   order of magnitude by which the target latency is reduced.  For
   example, if QDELAY_REF is reduced and changed from 15 milliseconds to
   150 microseconds -- a reduction of two orders of magnitude -- then
   alpha and beta values should be increased to alpha * 100 and
   beta * 100."
Side-note: CoDel does not aim for strictly 100% link utilization though, just "close to 100%", consciously trading off some utilization for lower latency under load.

>  Since the period of the sawtooth for Reno or Cubic can be measured in seconds (or minutes) for typical broadband speeds and RTTs today, this doesn't really work as described, and CoDel reverts to more-or-less acting as a short (5ms-ish) drop-tail queue.  For 100Mbps links, CoDel only supports 100% utilization for RTTs less than ~3.5ms.  For 1Gbps links, CoDel only supports 100% utilization for RTTs less than ~1ms.  Sebastian has mentioned a few times that there is a theoretical basis for the 5ms target in CoDel.  I've not seen one, and wonder if someone could point me where to find it. 

	[SM] Happy to deliver; it is leaps and bounds above and beyond what the PIE literature cited above has to say about ref_del, and I am sure I have cited it before here on the list. 

	But to repeat my position: while I welcome increasing the RTT-independence of transport protocols (just as with most of the other TCP Prague goals, I am all for it), I consider it a red herring with regard to DQCAQM's failure to do the one thing it exists for robustly and reliably (and without having to rely on well-behaved and cooperating flows). 
	My proposal was, and still is, for the L4S proponents to just change that 15 to 5 ms in the C-queue's PIE instance and see how this affects worst-case (short-RTT) sharing between the two traffic classes and link utilization; basically, to test the hypothesis that RTT-independence might ameliorate DQCAQM's systematic failure. Yes, I realize that the L4S plan is to improve the RTT-independence of the L4S traffic and improve things from that side, but until there is a demonstrably working transport protocol that achieves sufficiently high RTT-independence, it would seem worthwhile to explore other ways to fix DQCAQM, especially if one is eager to roll DQCAQM out ASAP, no?

Best Regards

>    In any case, from a scientific perspective, when testing a claim that a given system is less RTT-dependent than the status quo, it is of course necessary to choose a variety of test conditions which are capable of exposing this property.  Tests to date actually suggest *greater* RTT dependence of the L4S system compared to conventional TCPs with conventional AQM, largely because of the moderating effect (with a very short baseline RTT) of keeping a modest standing queue which L4S specifically avoids.
>    It is this evidence which the L4S team must defend against on this particular claim.  I think Sebastian is asking important questions here.
>     - Jonathan Morton