Re: [tcpm] initial RTO (was Re: Tuning TCP parameters for the 21st century)

Mark Allman <> Thu, 30 July 2009 19:22 UTC

To: Jerry Chu <>
From: Mark Allman <>
In-Reply-To: <>
Organization: International Computer Science Institute (ICSI)
Song-of-the-Day: Sweet Emotion
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="--------ma62193-1"; micalg="pgp-sha1"; protocol="application/pgp-signature"
Date: Thu, 30 Jul 2009 15:22:27 -0400
Message-Id: <>
Cc: "" <>
Subject: Re: [tcpm] initial RTO (was Re: Tuning TCP parameters for the 21st century)


Let's put some (rough; based on your slides) numbers to things here ... 

  + With an initRTO of 3sec your data suggests that 98% of the
    connections complete the 3WHS without retransmitting.  So, in 2% of
    the connections we in fact lose a SYN.

  + Also, you note that something like 2% of the connections have an RTT
    longer than 1sec.  (And, I am making the assumption that is really >
    1sec and < 3sec.)

  + So, with an initRTO of 1sec we'd expect to see 2% of the connections
    experience loss and 2% of the connections have a long RTT and
    spuriously retransmit, which leaves 96% of the connections Just
    Working.  (All in rough terms.)

  + Forget the 96%... they are good to go.  They got an RTT sample in
    the 3WHS and so presumably are working fine and no longer have to
    worry about the initRTO.

  + The 2% of the connections that experienced loss will have each saved
    2sec in the 3WHS by using an initRTO of 1sec vs. 3sec.  So, if we
    care about X connections that's an aggregate savings of X*0.02*2sec
    when using an initRTO of 1sec versus using an initRTO of 3sec (which
    yields 0sec of savings).

  + The connections that experienced loss will send data in the first
    RTT (say) and experience another 2% loss rate.  If we have a try-2
    approach and again use an initRTO of 1sec then this would save each
    of these connections 2sec over my notion of reverting the initRTO to
    3sec.  In the aggregate the savings here is X*0.02*0.02*2sec.

  + So, now we have saved X*0.02*2sec + X*0.02*0.02*2sec with a try-2
    approach vs. X*0.02*2sec with a try-1 approach.  For X=10K
    connections that is a difference of 8sec in the aggregate (400sec
    with try-1 vs. 408sec with try-2)---or, less than 1msec per
    connection on average if you'd like to do it that way.

  + Then there are the spurious RTOs caused by lowering the initRTO to
    1sec.  We'll have 2% of those in the 3WHS.  The problem is that
    keeping the initRTO at 1sec **ensures** a spurious retransmit in the
    first RTT of data transfer, too.  So, the cwnd will be reduced to an
    MSS, no RTT sample will be taken again, linear increase will be
    forced upon the connection, etc.

  + (Note, I am ignoring connections that use timestamps.  Connections
    that successfully use timestamps will have an RTT sample from the
    3WHS and therefore we don't have to worry about the initRTO.)
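
For anyone who wants to check the back-of-the-envelope arithmetic in the list above, here is a rough Python sketch.  The function name and its parameters are just illustrative; the 2% loss rate, the 2sec saved per avoided 3sec timeout and X=10K are the rough figures used above.

```python
def aggregate_savings(x, loss_rate=0.02, saved_per_rto=2.0):
    """Return (try1, try2): aggregate seconds saved vs. an initRTO of 3sec."""
    # try-1: connections that lose the SYN wait 1sec instead of 3sec.
    try1 = x * loss_rate * saved_per_rto
    # try-2: of those, the ones that also lose the first data flight
    # save another 2sec each.
    try2 = try1 + x * loss_rate * loss_rate * saved_per_rto
    return try1, try2


X = 10_000
try1, try2 = aggregate_savings(X)
extra = try2 - try1                   # what try-2 buys beyond try-1
extra_per_conn_ms = extra / X * 1000  # averaged over all X connections

print(f"try-1: {try1:.0f}sec  try-2: {try2:.0f}sec  extra: {extra:.0f}sec")
print(f"extra per connection: {extra_per_conn_ms:.1f}msec")
```

With these inputs the extra benefit of try-2 comes to 8sec across all 10K connections, i.e., under 1msec per connection.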

To me the tradeoff is clearly in favor of try-1.  For the advantage of a
*tiny* time savings to the 0.02*0.02 of connections that experience loss
in both the 3WHS and the initial window of data (i.e., what try-2 would
help) you pay by dooming 0.02 of the connections (that now work fine,
BTW) to no exponential ramp up.  That might be a tradeoff you are
personally willing to make---i.e., to sacrifice one type of connection
in favor of another.  But, I don't see that as a good tradeoff for the
standards to make.

Also, note, your scheme of counting SYNs is not overly complicated and
does not have overly onerous state requirements.  I didn't mean to
indicate either of those.  However, it isn't terribly robust either and
I am not ultimately sure how it'd play out.  So, say a connection has a
2sec RTT (works with others, too):
  0.0 xmit SYN
  1.0 RTO (==1sec), rexmit SYN
  2.0 rec SYN+ACK (from original transmit) / send ACK / send DATA
  3.0 resend DATA
  3.0 rec SYN+ACK (from retransmit)

Those last two events represent a race condition.  I.e., in this case,
we hope we get the SYN+ACK before we resend the data because then we can
use your scheme to revert to an initRTO of 3sec.  But, we might get it
in the order given above.  And, we might not get that packet at all.
So, it might work and it might not work.  But, the cost of not using it
(possibly saving X*0.02*0.02*2sec) is so small that it seems like
needless complication to me.
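
The timeline above can be reproduced with a small sketch, which also shows why the example "works with others, too".  This is only an illustration under stated assumptions: initRTO < RTT < 3*initRTO (so exactly one SYN retransmit happens before the first SYN+ACK arrives), no packets are lost, and the data segment is still timed with the initRTO because no RTT sample was taken.  The function name is made up.

```python
def handshake_timeline(rtt=2.0, init_rto=1.0):
    """Return (time, event) pairs for a spuriously retransmitted SYN."""
    # Assumption: one SYN retransmit only, and nothing is lost.
    assert init_rto < rtt < 3 * init_rto
    events = [
        (0.0, "xmit SYN"),
        (init_rto, "RTO, rexmit SYN"),
        (rtt, "rec SYN+ACK (from original transmit) / send ACK / send DATA"),
        (rtt + init_rto, "resend DATA"),                    # data RTO fires
        (init_rto + rtt, "rec SYN+ACK (from retransmit)"),  # second SYN+ACK
    ]
    return sorted(events, key=lambda e: e[0])  # stable: ties keep list order


timeline = handshake_timeline()
for t, what in timeline:
    print(f"{t:4.1f} {what}")

# The last two events always share a timestamp (rtt + init_rto), so which
# one the sender processes first is not determined by the timing.
race = timeline[-2][0] == timeline[-1][0]
print("race between last two events:", race)
```

Under these assumptions the data retransmit and the second SYN+ACK coincide for any RTT in the range, so in practice the order comes down to jitter.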

Now, there is something you can do here... If you wanted to take the
reception of the SYN+ACK and compare that to the *earliest* SYN
transmission and use that as an RTT sample and then use that to seed the
RTO estimator then fine.  I.e., in this case that'd (correctly) see an
RTT of 2sec.  And, if the original SYN was lost then the returning
SYN+ACK would yield an RTT sample of 3sec.  I.e., using this scheme
might overestimate the RTT, but you won't underestimate it.  If that is
less than 3sec you'd be better off for the first window of data and
you'd be protected against spurious retransmits (to the best of the
standard RTO estimator's abilities) by using a conservative RTT sample.
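
A minimal sketch of that idea, in case it helps.  The function names are hypothetical.  The seeding step assumes the first-measurement rule from RFC 2988 (SRTT = R, RTTVAR = R/2, RTO = SRTT + max(G, 4*RTTVAR), rounded up to 1sec), and the 100msec clock granularity G is an arbitrary choice for illustration.

```python
def conservative_rtt_sample(syn_tx_times, synack_rx_time):
    """RTT sample measured from the *earliest* SYN transmission.

    This can overestimate the RTT (if the SYN+ACK answers a retransmit)
    but cannot underestimate it.
    """
    return synack_rx_time - min(syn_tx_times)


def seed_rto(rtt_sample, clock_granularity=0.1, min_rto=1.0):
    """Seed the RTO from a first RTT sample (RFC 2988 rule, assumed)."""
    srtt = rtt_sample
    rttvar = rtt_sample / 2
    rto = srtt + max(clock_granularity, 4 * rttvar)
    return max(rto, min_rto)


syn_times = [0.0, 1.0]  # original SYN, then the 1sec-RTO retransmit

# Case 1: spurious retransmit; the SYN+ACK answers the original SYN.
sample_ok = conservative_rtt_sample(syn_times, synack_rx_time=2.0)

# Case 2: original SYN lost; the SYN+ACK answers the retransmit.
sample_lost = conservative_rtt_sample(syn_times, synack_rx_time=3.0)

print(sample_ok, sample_lost)                      # 2.0 and 3.0
print(seed_rto(sample_ok), seed_rto(sample_lost))  # seeded RTOs
```

Note that under the assumed rule the seeded RTO works out to three times the sample, so the benefit for the first window of data shows up when the measured sample is well under the initRTO you would otherwise fall back to.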