Re: [tcpm] initial RTO (was Re: Tuning TCP parameters for the 21st century)

Jerry Chu <> Sun, 02 August 2009 01:42 UTC


[Sorry for the delay in response. I have been traveling...]


On Thu, Jul 30, 2009 at 12:22 PM, Mark Allman<> wrote:
> Jerry-
> Let's put some (rough; based on your slides) numbers to things here ...
>  + With an initRTO of 3sec your data suggests that 98% of the
>    connections complete the 3WHS without retransmitting.  So, in 2% of
>    the connections we in fact lose a SYN.
>  + Also, you note that something like 2% of the connections have an RTT
>    longer than 1sec.  (And, I am making the assumption that is really >
>    1sec and < 3sec.)
>  + So, with an initRTO of 1sec we'd expect to see 2% of the connections
>    experience loss, 2% of the connections have a long RTT and
>    spuriously retransmit which leaves 96% of the connections Just
>    Working.  (All in rough terms.)
>  + Forget the 96%... they are good to go.  They got an RTT sample in
>    the 3WHS and so presumably are working fine and no longer have to
>    worry about the initRTO.
>  + The 2% of the connections that experienced loss will have each saved
>    2sec in the 3WHS by using an initRTO of 1sec vs. 3sec.  So, if we
>    care about X connections that's an aggregate savings of X*0.02*2sec
>    when using an initRTO of 1sec versus using an initRTO of 3sec (which
>    yields 0sec of savings).
>  + The connections that experienced loss will send data in the first
>    RTT (say) and experience another 2% loss rate.  If we have a try-2
>    approach and again use an initRTO of 1sec then this would save each
>    of these connections 2sec over my notion of reverting the initRTO to
>    3sec.  In the aggregate the savings here is X*0.02*0.02*2sec.
>  + So, now we have saved X*0.02*2sec + X*0.02*0.02*2sec with a try-2
>    approach vs. X*0.02*2sec with a try-1 approach.  For X=10K
>    connections that is a difference of 8sec in the aggregate (400sec
>    with try-1 vs. 408sec with try-2)---or, less than 1msec per
>    connection on average if you'd like to do it that way.
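
The aggregate arithmetic quoted above can be checked with a short Python
sketch. The function name and its parameters are illustrative and not from the
thread; the 2% loss rate, the 2sec saved per avoided 3sec timeout, and X=10K
are the rough figures quoted above.

```python
def aggregate_savings(connections, loss_rate, saving_per_timeout=2.0):
    """Return (try1, try2): aggregate seconds saved vs. an initRTO of 3sec."""
    # try-1: only the lost SYN benefits from the 1sec initRTO.
    try1 = connections * loss_rate * saving_per_timeout
    # try-2: connections that lost a SYN and then also lose a packet in the
    # first window of data save the same amount a second time.
    try2 = try1 + connections * loss_rate * loss_rate * saving_per_timeout
    return try1, try2


try1, try2 = aggregate_savings(10_000, 0.02)
extra_ms_per_conn = (try2 - try1) / 10_000 * 1000
print(f"try-1: {try1:.0f}sec  try-2: {try2:.0f}sec  "
      f"extra: {extra_ms_per_conn:.1f}msec per connection")
```

This assumes a uniform loss rate across all connections, which is the
simplification the quoted estimate makes.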

One thing I forgot to mention, as part of why I felt try-1 may not be strong
enough, is that although our average global retransmission rate is estimated
to be 1-2%, the rate distribution is far from even. It varies greatly between
regions, time of day, etc. In some regions and during busy hours users
can experience a retransmission rate > 5% (perhaps as high as 10%, but again
this is only my rough memory of how bad retransmissions can get). For those
connections experiencing a high loss rate, try-1 just feels a bit weak. One may
argue that for connections experiencing a high loss rate, performance sucks anyway.
But I think it doesn't have to be so for web surfing employing short
Also, in high-loss conditions pkt drop events may be more correlated, which is
where try-2 can make a difference (I can try to collect some data to verify this).

>  + Then there are the spurious RTOs caused by lowering the initRTO to
>    1sec.  We'll have 2% of those in the 3WHS.  The problem is that
>    keeping the initRTO at 1sec **ensures** a spurious retransmit in the
>    first RTT of data transfer, too.  So, the cwnd will be reduced to an
>    MSS, no RTT sample will be taken again, linear increase will be
>    forced upon the connection, etc.

As I mentioned in my previous email, try-N can diverge into different flavors.
The simplest one (also the one I had in mind) is to just limit its application to
SYN/SYN-ACK pkts. For that flavor there is no worry about it triggering
false congestion avoidance.

There is indeed another plus for the try-1 scheme over a try-N scheme where
N > 2, though: the latter in some scenarios can cause enough dupacks (i.e., 3)
to trigger unnecessary congestion avoidance (I think).

>  + (Note, I am ignoring connections that use timestamps.  Connections
>    that successfully use timestamps will have an RTT sample from the
>    3WHS and therefore we don't have to worry about the initRTO
>    further.)

So perhaps try-1 is indeed good enough - most of the server stacks have
TS enabled by default anyway (with the exception of Windows servers, which I'm
not sure about), so we may have been arguing over the corner of another corner
case :).

> To me the tradeoff is clearly in favor of try-1.  For the advantage of a
> *tiny* time savings to the 0.02*0.02 of connections that experience loss
> in both the 3WHS and the initial window of data (i.e., what try-2 would
> help) you pay by dooming 0.02 of the connections (that now work fine,
> BTW) to no exponential ramp up.  That might be a tradeoff you are
> personally willing to make---i.e., to sacrifice one type of connection
> in favor of another.  But, I don't see that as a good tradeoff for the
> standards to make.

See above. I was more trying to let only SYN/SYN-ACK have more shots.
Also, the loss distribution is far from even, so the 0.02*0.02 calculation may
not be applicable.
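
To illustrate why an uneven loss distribution matters for that product: the
chance of losing both a SYN and a packet in the first window is the average of
p*p over connections, not the square of the average p. The sketch below uses a
purely hypothetical two-class mix (the shares and loss rates are made up for
illustration and chosen only so that the mean stays at 2%); it is not measured
data.

```python
# Hypothetical mix, NOT measured data: (share of connections, loss rate).
mix = [(0.8, 0.005), (0.2, 0.08)]

mean_loss = sum(share * p for share, p in mix)

# Probability of a loss in both the 3WHS and the first window of data,
# assuming each connection sees its own class's loss rate both times and
# the two losses are otherwise independent.
double_loss = sum(share * p * p for share, p in mix)

# What the uniform-loss estimate would give for the same mean.
uniform_estimate = mean_loss ** 2

print(f"mean loss rate:    {mean_loss:.4f}")
print(f"uniform estimate:  {uniform_estimate:.6f}")
print(f"uneven-loss value: {double_loss:.6f}")
```

With these made-up numbers the double-loss fraction comes out several times
larger than 0.02*0.02, even though the mean loss rate is unchanged. Correlated
drops within a connection would push it higher still.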

> Also, note, your scheme of counting SYNs is not overly complicated and
> does not have overly onerous state requirements.  I didn't mean to
> indicate either of those.  However, it isn't terribly robust either and
> I am not ultimately sure how it'd play out.  So, say a connection has a
> 2sec RTT (works with others, too):
>  0.0 xmit SYN
>  1.0 RTO (==1sec), rexmit SYN
>  2.0 rec SYN+ACK (from original transmit) / send ACK / send DATA
>  3.0 resend DATA
>  3.0 rec SYN+ACK (from retransmit)
> Those last two events represent a race condition.  I.e., in this case,
> we hope we get the SYN+ACK before we resend the data because then we can
> use your scheme to revert to an initRTO of 3sec.  But, we might get it
> in the order given above.  And, we might not get that packet at all.
> So, it might work and it might not work.  But, the cost of not using it
> (possibly saving X*0.02*0.02*2sec) is so small that it seems like
> needless complication to me.
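
The timeline quoted above can be reproduced with a small sketch.
`handshake_timeline` and its parameters are hypothetical names, not anything
from a real stack. The model assumes no packet loss, no timer backoff, at most
one SYN retransmit, and a data RTO fixed at whatever value is passed in.

```python
def handshake_timeline(rtt, init_rto, data_rto):
    """Return time-ordered (time, event) pairs for one lossless connection."""
    events = [(0.0, "xmit SYN")]
    spurious_syn = init_rto < rtt  # SYN timer fires before the SYN+ACK arrives
    if spurious_syn:
        events.append((init_rto, "RTO, rexmit SYN"))
    events.append((rtt, "rec SYN+ACK (from original transmit) / send ACK / send DATA"))
    if data_rto < rtt:
        # The data timer fires before the ACK for the data can return.
        events.append((rtt + data_rto, "resend DATA"))
    if spurious_syn:
        events.append((init_rto + rtt, "rec SYN+ACK (from retransmit)"))
    # sorted() is stable, so events with equal timestamps keep insertion order;
    # in a real stack their relative order is not guaranteed.
    return sorted(events, key=lambda e: e[0])


for when, what in handshake_timeline(rtt=2.0, init_rto=1.0, data_rto=1.0):
    print(f"{when:.1f} {what}")
```

With a 2sec RTT and both timers at 1sec, the last two events share the same
timestamp, which is the race described above. Passing `data_rto=3.0` instead
removes the "resend DATA" event in this model.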

Again, see above. I was thinking more about the consecutive-pkt case where one
simply counts how many times a SYN/SYN-ACK has been retransmitted. (All
stack implementations must be counting this anyway in order to decide when to
give up.)

> Now, there is something you can do here... If you wanted to take the
> reception of the SYN+ACK and compare that to the *earliest* SYN
> transmission and use that as an RTT sample and then use that to seed the
> RTO estimator then fine.  I.e., in this case that'd (correctly) see an
> RTT of 2sec.  And, if the original SYN was lost then the returning
> SYN+ACK would yield an RTT sample of 3sec.  I.e., using this scheme
> might overestimate the RTT, but you won't underestimate it.  If that is
> less than 3sec you'd be better off for the first window of data and
> you'd be protected against spurious retransmits (to the best of the
> standard RTO estimator's abilities) by using a conservative RTT sample.
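
A minimal sketch of the sampling idea quoted above: measure the RTT from the
earliest SYN transmission, then seed the estimator with it. The function names
are hypothetical. The seeding step follows the first-measurement rule of
RFC 2988 (SRTT = R, RTTVAR = R/2, RTO = SRTT + max(G, 4*RTTVAR), rounded up to
1sec), and the default clock granularity of 0 is an assumption made to keep
the numbers simple.

```python
def conservative_rtt_sample(syn_xmit_times, synack_recv_time):
    """RTT sample measured from the *earliest* SYN transmission.

    This can overestimate the RTT (if the first SYN was lost) but cannot
    underestimate it.
    """
    return synack_recv_time - min(syn_xmit_times)


def seed_rto(rtt_sample, clock_granularity=0.0, min_rto=1.0):
    """Seed the RTO from a first RTT measurement, per RFC 2988's rules."""
    srtt = rtt_sample
    rttvar = rtt_sample / 2
    rto = srtt + max(clock_granularity, 4 * rttvar)
    return max(rto, min_rto)


# 2sec RTT, spurious SYN rexmit at 1.0, SYN+ACK for the original at 2.0.
print(conservative_rtt_sample([0.0, 1.0], 2.0))
# Original SYN lost, rexmit at 1.0, SYN+ACK arrives at 3.0.
print(conservative_rtt_sample([0.0, 1.0], 3.0))
```

The two calls correspond to the 2sec and 3sec samples described above. Under
the RFC 2988 first-measurement rule the seeded RTO works out to three times the
sample when the clock granularity is negligible.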

Sounds like another good idea. Will give it more thought when I get back to
the office in a week.


> allman