Re: [tcpm] initial RTO (was Re: Tuning TCP parameters for the 21st century)

Jerry Chu <hkchu@google.com> Thu, 30 July 2009 16:38 UTC

In-Reply-To: <20090729160559.122DF3884F5@lawyers.icir.org>
References: <d1c2719f0907290756h6f4990afu8fe4a573c5669d79@mail.gmail.com> <20090729160559.122DF3884F5@lawyers.icir.org>
Date: Thu, 30 Jul 2009 09:38:05 -0700
Message-ID: <d1c2719f0907300938o443ff4a2ne2627425aa661b92@mail.gmail.com>
From: Jerry Chu <hkchu@google.com>
To: mallman@icir.org
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
X-System-Of-Record: true
Cc: "tcpm@ietf.org" <tcpm@ietf.org>
Subject: Re: [tcpm] initial RTO (was Re: Tuning TCP parameters for the 21st century)

On Wed, Jul 29, 2009 at 9:05 AM, Mark Allman<mallman@icir.org> wrote:
>
>> I can think of a number of variations. The one-shot 1-sec-initRTO idea
>> you described above also came through my mind but the drawback is you
>> only get one-shot even though we know statistically > 98% of
>> connections have RTT < 1 sec so most likely the continuous use of
>> 1-sec-initRTO will turn out to be better. (A counter argument might be
>> one-shot is "good enough", benefitting > 90% of the cases
>> statistically...) The advantage of it is its simplicity, restricting
>> the max # of spurious retransmissions caused by the reduced initRTO to
>> 1, and obviously avoiding the RTO hell problem.
>
> Two responses to "one shot":
>
> (1) Yes, one-shot ought to be enough.  Considering losing the SYN,
>    retransmitting it using an initRTO of 1sec and resetting the initRTO
>    to 3sec.  Now, if there is actually loss in the first RTT of data
>    transmission talking about fine-grained performance (i.e., that we
>    can get from using 1sec again instead of 3sec) doesn't make a lot of
>    sense because 1sec vs. 3sec doesn't matter because performance is
>    going to suck no matter what.  So, why bother with anything terribly
>    "smart" here?

I guess we have some disagreement here, and that's no surprise. We seem
to agree on the general principles (keep it simple, focus on the main
benefit rather than the corner cases), but both principles involve some
degree of subjective judgment. E.g., the exit condition for the 1sec
initRTO suggested in my previous email doesn't seem all that complex to
me. It only involves a simple check for a dupack (or dup SYN/SYN-ACK).

Also, I'm not sure why performance has to suck just because a connection
experiences early (e.g., SYN/SYN-ACK) loss. I don't remember whether the
TCP spec requires a connection to go into congestion avoidance mode after
SYN/SYN-ACK loss. Even if so, for short-lived connections with a small
amount of data (e.g., HTTP/TCP), a speedy recovery from a more optimal
RTO value still seems to matter. Also, the case for a speedy recovery
from a loss event AFTER the 3WHS seems as valid as the case for one
during the 3WHS, assuming these loss events are stochastically
independent of each other.

>
> (2) Using the numbers on your slides it seems to me that the fraction of
>    hosts with an RTT of > 1sec is roughly the same as the SYN
>    retransmit rate (at an RTT of 3sec, I assume).  To me that says that
>    if you use an initRTO of 1sec and then retransmit then the reason
>    for that retransmit is just as likely to be loss as it is to be a
>    long path.  So, your approach of preferring more than one-shot
>    assumes loss.  But, I don't see the measurements you gave as
>    suggesting that is the right approach.  The notion of going back to
>    3sec just sort of punts.  I.e., the notion is that we have hit a
>    situation whereby we don't know what is going on and so let's not
>    dogmatically try to push forward, but let's throw up our hands and
>    try to do things that ultimately will figure out what is going on.
>    And, further, one mistake does not propagate.

Again, I agree with your statements in general, but I feel one-shot is
a little weak, just shy of being "good enough". How about allowing N
shots, where N is some TBD (> 1) number? I suspect N=2 may satisfy the
"good enough" criterion, taking a cue from MacOS (see my slides on
MacOS repeating its 1sec RTO 5 times...). There may be a couple of
different flavors of allowing N shots, either consecutive-only or not,
but those are details.
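
To make the N-shot idea concrete, here is a rough sketch of the
bookkeeping, not a proposal for any real stack. The class and method
names are made up for illustration, N=2 is just the guess from above,
exponential backoff is left out, and the shot counter is the "not
necessarily consecutive" flavor (a total budget per connection). The
exit condition is the dupack / dup SYN/SYN-ACK check from my previous
email.

```python
from dataclasses import dataclass

REDUCED_INIT_RTO = 1.0   # seconds: the reduced initRTO under discussion
FALLBACK_INIT_RTO = 3.0  # seconds: the conventional initRTO


@dataclass
class InitRtoState:
    """Hypothetical per-connection state for an N-shot reduced initRTO."""
    shots_left: int = 2          # N; assumed value, still TBD
    spurious_seen: bool = False  # set once a duplicate shows RTT > initRTO

    def next_timeout(self) -> float:
        """Timeout to arm while no RTT sample is available yet."""
        if self.spurious_seen or self.shots_left == 0:
            return FALLBACK_INIT_RTO
        return REDUCED_INIT_RTO

    def on_timeout(self) -> None:
        """The timer fired: consume one shot of the reduced initRTO."""
        if not self.spurious_seen and self.shots_left > 0:
            self.shots_left -= 1

    def on_duplicate(self) -> None:
        """Dupack or dup SYN/SYN-ACK seen: the earlier retransmission was
        spurious, so stop using the reduced value for this connection."""
        self.spurious_seen = True
```

With N=2 a connection gets at most two 1sec timeouts before it falls
back to 3sec, and a single detected spurious retransmission sends it
back to 3sec right away. One-shot is simply the N=1 case.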

>
> So, for me one-shot is just about the right balance here.  Any more than
> that we're getting into the corner of a corner case and further in that
> corner the empirical evidence is not suggestive of a clear path.  So,
> let's just do something that will allow the protocol to get a handle on
> things as they are in the specific situation and not try to make guesses
> that propagate further.

The N-shot solution doesn't add any complexity (just log(N) more bits to
the TCP structure) and seems a strong enough dose for those HTTP
connections suffering the dreadful 3sec pause.

Jerry

>
> allman
>
>
>
>