Re: [tcpPrague] Pacing IW

Bob Briscoe <> Fri, 05 February 2021 14:58 UTC

To: Ingemar Johansson S <>, "Tilmans, Olivier (Nokia - BE/Antwerp)" <>
Cc: TCP Prague List <>
From: Bob Briscoe <>
Date: Fri, 05 Feb 2021 14:58:34 +0000
List-Id: "To coordinate implementation and standardisation of TCP Prague across platforms. TCP Prague will be an evolution of DCTCP designed to live alongside other TCP variants and derivatives." <>

@Olivier (I've responded to your response in line tagged [BB])
TL;DR: Your point about already pacing over half the RTT estimate 
probably means the code is good enough already.

@Ingemar, Thx for this.
The data I mentioned that Mark Handley presented was for fixed networks. 
But yes, of course, I should have mentioned that mobile networks exhibit 
these (different) extra delays when getting a flow started.

more in response to Olivier below...

On 05/02/2021 10:48, Ingemar Johansson S wrote:
> On the pacing matters based on (initial RTT).
> It is true that an initial RTT based on e.g. the 3WHS can give values that are
> higher than the actual values when, for instance, cellular access is used.
> There are a few reasons for this:
> 1) DRX (Discontinuous Reception): A battery-saving feature that makes the cell
> phone (a.k.a. UE) sleep periodically. This feature is noticeable e.g. when ping
> measurements are done.
> 2) UL scheduling: To transmit in uplink, a UE must first send a scheduling
> request (SR) to the base station; the base station then transmits a grant a
> short while after. The grant may however be too small, in which case the UE
> must piggyback a buffer status report (BSR) onto the transmitted data.
> All this means the measured RTT may be higher than the true value.
> /Ingemar
>> -----Original Message-----
>> From: Tilmans, Olivier (Nokia - BE/Antwerp) <olivier.tilmans@nokia-bell-
>> Sent: den 2 februari 2021 14:08
>> To: Bob Briscoe <>
>> Cc: TCP Prague List <>
>> Subject: Re: [tcpPrague] Pacing IW
>> Hi Bob,
>>> I notice you've pushed code to TCP Prague, to optionally pace out IW over
>>> the estimated rtt (srtt).
>>>
>>> e5f14
>>> 342978c
>> Note that this is disabled by default: this approach is quite fragile, as you
>> rightly point out later on; it enables quick experimentation. I'd expect
>> someone wanting to use this in production to have a finer-grained approach,
>> e.g., using the destination cache to get recent srtt/cwnd values.
>>> [...]
>>> - If the initial srtt estimate is too low, there will be a queuing delay
>>> spike, but it will never be worse than just sending IW at line rate.
>> Agreed; I would expect this to happen, e.g., with PEPs terminating a
>> connection.
>>> - If srtt is too high, a short flow would take longer to complete because
>>> the first ACK will return before the IW has finished sending.
>> Agreed. More generally, that tradeoff is extremely application dependent,
>> so applications sensitive to this extra delay would either not use this or
>> control the pacing rate used themselves by overwriting the socket max
>> pacing rate.

[BB] The tradeoff is between potentially delaying others with a queue 
spike vs. pacing too slowly and delaying your own flow completion a 
little. In most cases, getting this tradeoff wrong either way should 
only lead to slight extra delay (relative to the base RTT).

So, the goal of my email was merely to try to reach a better default 
position. Then few (if any) apps would have to fiddle with socket options.

>>> [...]
>>> So, possible solutions (either or both):
>>>
>>> 1. In tcp_prague.c, multiply the rate returned from
>>> prague_update_pacing_rate() by N (where N=2, say). So IW is at least
>>> paced over the first half of the RTT on average. Then, if the initial
>>> srtt estimate is longer than reality, the error has to be 2x before IW
>>> will be delayed more than a whole round.
>> This is already the case: given the connection is seen as in slow start
>> (ssthresh = infinity), the pacing rate will be (2 * IW/srtt).

[BB] Good point. So I suspect it could be a good enough default as it is.

It might be useful to allow the app (or the user via a sysctl?) to set a 
different value of N during IW than during slow start (for people who 
want to find a good default N empirically rather than assuming N=2 is 
good enough).
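As a quick numerical sketch of that default (illustrative Python, not the kernel code; the function name, and taking N=2 as the slow-start pacing factor discussed above, are my assumptions):

```python
def iw_pacing_interval(srtt_s, iw_segments=10, n=2):
    """Inter-segment gap when IW is paced at n times the one-RTT rate.

    With n=2 (the slow-start pacing factor mentioned above), IW goes
    out over roughly the first half of the estimated RTT, so an srtt
    overestimate has to exceed 2x before sending IW takes longer than
    one true round trip.
    """
    rate = n * iw_segments / srtt_s   # pacing rate [segments/s]
    return 1.0 / rate                 # gap between segments [s]

# Example: srtt estimate of 20 ms, IW = 10 segments, n = 2:
gap = iw_pacing_interval(0.020)       # 1 ms between segments
burst = gap * 10                      # IW sent in 10 ms, half the srtt
```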

Whether N is the same as for SS or different, this is a nice simple approach.

>>> 2. Introduce a third sysctl option (tcp_pace_iw = 2) that only paces IW
>>> if srtt was initialized using the destination cache?
>> I'd happily merge any addition or better ways to do this.
>>> Altho this is in TCP Prague now, I guess eventually it could be taken up
>>> by any transport with any CC.
>> Indeed, this was the quickest way to get it to work. It can definitely be
>> altered to use the destination cache, and/or to use sk_max_pacing_rate in
>> a smarter way (e.g., != infinity indicates the application explicitly
>> requested it before queuing data for its IW, which could/should be
>> preferred to any generic in-kernel computation/assumption), and/or to be
>> CCA-independent.

[BB] I just thought of a possible third element to a default solution.
TL;DR: I don't think it's worth implementing by default - too little 
benefit unless you have cached info, in which case you wouldn't need it 
anyway.
For completeness, here's my thought process anyway.
I was trying to think what sort of app might not be happy with a default 
that paces IW over 1/2 of the estimated RTT (or whatever fraction is 
decided as default, e.g. 1/4). I could only think of a transactional app 
with fewer than 10 packets in the send buffer. The risk of building a 
queue diminishes as the number of packets buffered reduces. So it might 
want to pace faster, the fewer packets it has buffered.

Below is the algorithm for how fast to pace while keeping the queue 
below a certain (constant) delay. But first some terminology, with units 
in [square brackets]:

R [s]: base RTT estimate (from handshake or dst cache)
N []: number of full-sized segments of data in the send buffer (the 
algorithm only applies when N < IW)
W [segments]: window that would fill capacity at the bottleneck (unknown)
_Goal_: avoid qdelay exceeding R/L, where
L []: a constant
     e.g. L=40 would mean the goal is for the queue not to exceed R/40.
     That's a queue limit of 0.5ms when R=20ms or 2.5ms when R=100ms.

_Unknown variable_
T [s]: inter-packet departure time.

T/R = 1/W - 1/(L*N)
     with a floor at line rate, of course.
If the sender had an estimate of W in its dst cache, it could use that. 
Otherwise it would have to use a conservative guess as another constant.
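A minimal Python sketch of this relation (the function name and defaults are mine; the floor defaults to zero, i.e. an unconstrained line rate):

```python
def pacing_gap(R, N, W, L=40, line_rate_gap=0.0):
    """Inter-packet departure time T keeping queuing delay below R/L.

    Implements T/R = 1/W - 1/(L*N), floored at the line-rate gap.
    R: base RTT estimate [s]; N: full-size segments buffered (N < IW);
    W: window [segments] that would fill the bottleneck; L: constant
    (L=40 targets a queue of at most R/40, e.g. 0.5 ms when R=20 ms).
    """
    t = R * (1.0 / W - 1.0 / (L * N))
    return max(t, line_rate_gap)

# The fewer segments buffered, the faster the sender may pace:
# with R=20 ms, W=20, L=40: N=10 -> 0.95 ms gap; N=5 -> 0.90 ms gap.
```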

For conservative (low) values of W, T/R doesn't vary much with N. So the 
algo would only really be useful if the sender had a value of W cached. 
but then it could cache the RTT as well, so it wouldn't need this algo.


>> Best,
>> Olivier
>>> Cheers
>>>
>>> Bob
>>>
>>> PS. Defensive coding nit: In tcp_output.c, I noticed you've kept the
>>> condition
>>>     if ( ... || tp->data_segs_out >= 10 )
>>> Surely this shouldn't have been a hard-coded '10' in the first place?
>>>
>>> --
>>> ________________________________________________________________
>>> Bob Briscoe

Bob Briscoe