Re: [tcpm] Review: draft-ietf-tcpm-early-rexmt-01

Joe Touch <touch@ISI.EDU> Wed, 23 September 2009 14:28 UTC

Message-ID: <4ABA307D.5080408@isi.edu>
Date: Wed, 23 Sep 2009 07:28:13 -0700
From: Joe Touch <touch@ISI.EDU>
To: mallman@icir.org
References: <20090923130457.50D1D4AEFE3@lawyers.icir.org>
In-Reply-To: <20090923130457.50D1D4AEFE3@lawyers.icir.org>
Cc: tcpm@ietf.org
Subject: Re: [tcpm] Review: draft-ietf-tcpm-early-rexmt-01

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi, Mark,

Mark Allman wrote:
> Joe-
>
> Many thanks for the comments.
...
>> The example on page three considers a window of three segments (FWIW,
>> it should probably read "a window of three segments' worth of data",
>> since windows are in bytes, not segments). I'm wondering if ACK
>> compression (as required) affects the example. It's worth either
>> fixing the example, or addressing the effect of ACK compression (even
>> if only to clarify that there is none) somewhere in the doc.
>
> I think you're talking about delayed ACKs and not ACK compression
> (ACKs getting squished together in the time domain), right?  The
> assumption here is that stacks are following RFC 5681, which says they
> should not delay ACKs if there is a hole in the sequence space, i.e.,
> that they should immediately ACK each segment that arrives while the
> hole exists.  I'll add a quick note.

I had been thinking of delayed ACKs, i.e., compression in the sense of
sending an ACK for every other segment: with a window of three segments,
you will receive one ACK quickly, and the second ACK will just stall for
some time (200 ms?). It's like Nagle on the ACK side: would the receiver
wait for some time before sending an ACK for a single segment (i.e., in
anticipation of squishing the next ACK together with the pending one)?
At least, that's what I'm wondering about.
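Since the two readings differ on how the receiver ACKs the three-segment window, a toy model may help. The sketch below assumes the RFC 5681 (Section 4.2) receiver rules: in-order data is ACKed at least every second full-sized segment (otherwise the ACK waits on the delayed-ACK timer), while an out-of-order segment, or one that fills a hole, is ACKed immediately. The `Receiver` class and its `on_segment` method are hypothetical, segments are numbered 0, 1, 2, ... rather than byte-addressed, and the timer itself is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Toy receiver ACK policy, loosely following RFC 5681 Section 4.2.

    Hypothetical model: segments are numbered 0, 1, 2, ... and the
    delayed-ACK timer is only reported, not simulated.
    """
    rcv_nxt: int = 0                 # next in-order segment expected
    unacked_in_order: int = 0        # in-order segments not yet ACKed
    out_of_order: set = field(default_factory=set)

    def on_segment(self, seq: int) -> str:
        if seq != self.rcv_nxt:
            # Out-of-order arrival: ACK immediately (a duplicate ACK).
            self.out_of_order.add(seq)
            return f"immediate dupACK for {self.rcv_nxt}"
        # In-order arrival: if anything is buffered, this fills (part of) a hole.
        filled_hole = bool(self.out_of_order)
        self.rcv_nxt += 1
        while self.rcv_nxt in self.out_of_order:
            self.out_of_order.remove(self.rcv_nxt)
            self.rcv_nxt += 1
        self.unacked_in_order += 1
        if filled_hole or self.unacked_in_order >= 2:
            self.unacked_in_order = 0
            return f"immediate ACK for {self.rcv_nxt}"
        return "ACK delayed (timer running)"


lossy = Receiver()
print([lossy.on_segment(s) for s in (1, 2)])     # segment 0 was lost
# ['immediate dupACK for 0', 'immediate dupACK for 0']
clean = Receiver()
print([clean.on_segment(s) for s in (0, 1, 2)])  # no loss
# ['ACK delayed (timer running)', 'immediate ACK for 2', 'ACK delayed (timer running)']
```

With no loss, the third segment's ACK does stall on the timer, which is the concern raised above; once the first segment is lost, each later arrival produces an immediate duplicate ACK, which is the behavior the draft's example assumes.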

>> The data from BPS+98 implies that the bulk of RTOs can be avoided with
>> early rexmit. Is that true? Or could there be other reasons for large
>> numbers of RTOs that early rexmit won't help? If so, it'd be useful to
>> caveat the impact of the proposed mod.
>
> BPS+98 notes the problem ER addresses and solves it using Limited
> Transmit and also a scheme that sends dummy packets to induce three
> duplicate ACKs.  Without crunching that data I don't know precisely how
> ER would perform.  However, the problems they describe as the big issues
> with RTOs are the problems that ER addresses.

That's worth noting more explicitly in the draft.

> (And this question of how ER will work in the wild is the key reason
> for the Experimental status here.)
>
>> Also, this paragraph makes an error I saw at the last TCPM meeting
>> in the Google guy's talk: it equates median transfer size with
>> median TCP connection duration. HTTP still has persistent connections,
>> AFAIK, which means that these aren't correlated. The conclusion that
>> non-RTO recovery would be useful may be true for short transfers over
>> persistent connections, not just short TCP connections (which is how I
>> read "short TCP transfers", since a TCP transfer is over when the FIN
>> sings ;-)
>
> You're just reading this differently from what I meant.  I have
> re-worded:
>
>     Furthermore, [All00] shows that for one particular web server the
>     median number of bytes carried by a connection is less than four
>     segments, indicating that more than half of the connections will be
>     forced to rely on the RTO timer to recover from any losses that
>     occur.
>
> I.e., I meant "transfer size" as the amount of data carried by a
> connection, not the size of some subset of that data.  Hopefully the new
> version is clearer.

Yes.

...
>> In 2.1, do you want to define this in terms of a fixed value of 4*SMSS,
>> or as a pointer (i.e., to the initial CWND, so that if the initial CWND
>> increases, so does this)? Same for the part about packet-based (again,
>> would that be segment-based?): refer not to 4, but to the number of
>> segments in the initial CWND (e.g., as "currently 4"). PS: should
>> that be 4, or shouldn't it be "initial_CWND/SMSS", i.e., a max of 4,
>> but in most current cases it seems like this would still be 3?
>
> No, I don't.  This doesn't have anything to do with the initial
> congestion window size.  The "4"---which I thought was well motivated,
> perhaps not---comes from fast retransmit's magic constant of "3".  I.e.,
> if there are at least 4 segments outstanding and we lose one then we'll
> have a shot at getting 3 dupacks.  If there are fewer segments
> outstanding then we will have no chance at getting 3 dupacks.  So, this
> has nothing to do with the initial window.

I didn't get that clearly from the draft. It might be useful to reiterate
it when the number is introduced (maybe a few times, for people like me
who miss it).
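The argument for the constant 4 can be checked in a few lines. This sketch assumes exactly one segment is lost, no new data is sent afterwards, and the receiver ACKs every out-of-order segment immediately; the function names are hypothetical.

```python
DUPACK_THRESHOLD = 3  # standard fast retransmit threshold (RFC 5681)


def max_dupacks(outstanding_segments: int) -> int:
    """Most duplicate ACKs a single loss can produce: each of the other
    outstanding segments arrives out of order and is ACKed immediately."""
    return max(outstanding_segments - 1, 0)


def fast_retransmit_possible(outstanding_segments: int) -> bool:
    return max_dupacks(outstanding_segments) >= DUPACK_THRESHOLD


for n in range(1, 6):
    print(n, max_dupacks(n), fast_retransmit_possible(n))
# 1 0 False
# 2 1 False
# 3 2 False
# 4 3 True
# 5 4 True
```

Below four outstanding segments the standard threshold can never be reached, which is why the cutoff follows from fast retransmit's "3" rather than from the initial window.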

...
>> Also, maybe I'm missing something, but I searched for ER_thresh all over
>> the place. It isn't *used* anywhere. I.e., you define a variable but
>> never use it. Seems like you need to use it where you say "the timer
>> (ER_thresh) goes off and ..." somewhere specific. However, you say that
>> you're lowering the fast rexmit threshold. So then wouldn't you be
>> setting "FR_thresh", not "ER_thresh"? Even if so, it's useful to recap
>> how the *_thresh value is used.
>
> There is no "FR_thresh" sort of variable defined in RFC5681.  So, while
> I sort of understand what you are saying, I think this ...
>
>     When the above two conditions hold and a TCP connection does not
>     support SACK the duplicate ACK threshold used to trigger a
>     retransmission MUST be reduced to:
>
> is clear about how one goes about using the threshold.  (This is an
> example ... there are other places with the same text.)

It's simpler than that - you have a variable called ER_thresh. You don't
define it. You explain what values to set it to, but you never use it to
do anything.

I.e., you have an undefined variable that you set, but don't declare and
never use.
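To make the point concrete, here is a sketch in which ER_thresh is both set and used. The reduced threshold shown is an assumption borrowed from the form later published in RFC 5827 (one less than the number of outstanding segments, with the byte-based variant rounding outstanding bytes up to whole SMSS-sized segments); the -01 text under review may differ in detail, and the SACK case is not covered. Function and parameter names are hypothetical.

```python
import math

DUPACK_THRESHOLD = 3  # standard fast retransmit threshold (RFC 5681)


def er_thresh_bytes(outstanding_bytes: int, smss: int, can_send_new_data: bool) -> int:
    """Byte-based ER_thresh for a non-SACK sender (assumed RFC 5827 form).

    ER applies only when less than 4*SMSS bytes are outstanding and no
    new data can be sent; otherwise the standard threshold is kept.
    """
    if outstanding_bytes < 4 * smss and not can_send_new_data:
        return math.ceil(outstanding_bytes / smss) - 1
    return DUPACK_THRESHOLD


def er_thresh_segments(outstanding_segments: int, can_send_new_data: bool) -> int:
    """Segment-based variant: one less than the segments in flight."""
    if outstanding_segments < 4 and not can_send_new_data:
        return outstanding_segments - 1
    return DUPACK_THRESHOLD


def on_duplicate_ack(dupacks: int, er_thresh: int) -> bool:
    """Where the threshold is actually *used*: called on each arriving
    duplicate ACK (so dupacks >= 1); True means resend the oldest
    outstanding segment and enter loss recovery."""
    return dupacks >= er_thresh


print(er_thresh_bytes(3 * 1460, 1460, False))   # 2: three full-sized segments
print(er_thresh_bytes(10 * 400, 1460, False))   # 2: ten 400-byte segments
print(er_thresh_segments(10, False))            # 3: ER does not apply
print(on_duplicate_ack(2, er_thresh_segments(3, False)))  # True
```

With ten 400-byte segments outstanding, the byte-based rule lowers the threshold to 2 even though up to nine duplicate ACKs could arrive, while the segment-based rule leaves the standard threshold of 3 in place.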

...
>> The examples on page 5 need to include a bit about Nagle; if Nagle is
>> on, you would never have three outstanding 400-byte segments  ;-)
>
> I didn't change anything here.  You can push back some more, but these
> examples are in fact advertised as illustrations and I am not sure I
> want to get into needless discussions about Nagle here.  You are right
> that if Nagle is enabled we wouldn't have ten 400-byte segments
> outstanding.  But, if Nagle is not enabled we certainly could.  I don't
> think for the purposes of the examples this is an overly important
> point.

Nagle is on by default. Your example should be very clear about the fact
that you expect the default to be overridden, or should include Nagle IMO.

I don't think you need many different cases, but if this works much
better with Nagle off, then the doc needs a bit of explanation on this.
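The Nagle point can be illustrated by simulating a sender whose application writes ten 400-byte chunks back to back before any ACK returns. The sketch assumes a simplified Nagle rule (a full-sized segment may always be sent; a smaller one only when nothing is unacknowledged, after RFC 896 and RFC 1122) and an SMSS of 1460 bytes; it ignores cwnd and the receive window, and `segments_in_flight` is a hypothetical helper.

```python
def segments_in_flight(writes: list[int], mss: int, nagle: bool) -> list[int]:
    """Sizes of the segments sent before the first ACK comes back.

    Simplified Nagle rule: full-sized segments go out immediately; a
    sub-MSS segment is sent only if no data is unacknowledged. Since no
    ACKs arrive in this model, "unacknowledged" means "anything sent".
    """
    buffered = 0
    sent: list[int] = []
    for n in writes:
        buffered += n
        while buffered >= mss:               # full-sized segments are never held
            sent.append(mss)
            buffered -= mss
        if buffered and (not nagle or not sent):
            sent.append(buffered)            # small segment allowed
            buffered = 0
    return sent                              # any remainder waits for an ACK


print(segments_in_flight([400] * 10, 1460, nagle=False))  # ten 400-byte segments
print(segments_in_flight([400] * 10, 1460, nagle=True))   # [400, 1460, 1460]
```

With Nagle on, only the first 400-byte chunk goes out alone; the rest is coalesced into full-sized segments, and the remaining 680 bytes wait for an ACK.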

...
>> Which brings me to a wrinkle - what happens if TCP resends data with
>> different segment sizes, resulting in some segments being on different
>> boundaries than those that may already have been received (e.g., on
>> multipath when a PMTU update comes in, and data is resent and
>> resegmented differently)? Your segment alg needs to be robust to this,
>> or you need to explain why that doesn't matter.
>
> It doesn't matter because once you retransmit all is moot.  Early
> retransmit only *initiates* loss recovery.  What happens after that is
> not ER's problem.

Might (might) be worth noting, if only for corner-case people like me.

...
>> Other considerations: seems like you're making TCP send more segments
>> into the net when data is being lost, vs. the existing mechanisms. If
>> that's the case, and if loss is due to buffer overload, are you making
>> things potentially worse? If not, please explain.
>
> I don't understand this point.  Is it relative to ER or [Bal98]?

Relative to not doing ER.

Joe
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)

iEYEARECAAYFAkq6MH0ACgkQE5f5cImnZruN8ACfVaVK9dWqI7kY7Xsw8ITKK935
C1sAn2phkk5AuUaOsHGSYvpgqQtG98To
=fhDK
-----END PGP SIGNATURE-----