Re: [tcpm] Review: draft-ietf-tcpm-early-rexmt-01

Joe Touch <touch@ISI.EDU> Wed, 26 August 2009 16:00 UTC

To: Alexander Zimmermann <>
Cc:, Mark Allman <>

Hi, all,

Here is a copy of the review I sent to Mark, so others know what issues
I raised:

Overall, the doc reads well, and is well organized. I found it nicely
self-contained (an exception for most transport mod docs, IMO). The
motivation was clear, and I think the doc is in fairly good shape.

Constructive edit suggestions follow.

The intro is nicely done and self-contained. Gave me enough context to
understand the problem without needing to dig up numerous RFCs and read
portions thereof, which is an exception in TCPM AFAICT.

Regarding Jac88, it's useful to make sure you're referencing the right
paper. There are several versions of this doc, and at least two are often
confused. The first is the Sigcomm version (most often cited):
	This is 16 pages, and has only a very brief appendix section

and an extended version that has a long appendix with footnotes (this is
the one that discusses idle times, too):
	This one is 21 pages, and decidedly NOT a reprint (even with
	formatting changes) of the one published at Sigcomm 1988,
	despite the note on the web page.

The second version has most of the details that ended up in TCP
implementations, several of which have been dogging us for years (lack
of slow-start restart after idle in the Web in particular). I don't know
if this is the case for the issues you're citing it for, but it's
certainly worth a check.

The example on page three considers a window of three segments (FWIW, it
should probably read "a window of three segments' worth of data", since
windows are in bytes not segments). I'm wondering if ACK compression (as
required) affects the example. It's worth either fixing the example, or
addressing the effect of ACK compression (even if to clarify that there
is none) somewhere in the doc.

The data from BPS+98 implies that the bulk of RTOs can be avoided with
early rexmit. Is that true? Or could there be other reasons for large
numbers of RTOs that early rexmit won't help? If so, it'd be useful to
caveat the impact of the proposed mod. Also, this paragraph makes an
error I saw at the last TCPM meeting from the Google guy's talk - it
equates median transfer size with median TCP connection duration. HTTP
still has persistent connections, AFAIK, which means that these aren't
correlated. The conclusion that non-RTO recovery would be useful may be
true for short transfers over persistent connections, not just short TCP
connections (which is how I read "short TCP transfers", since a TCP
transfer is over when the FIN sings  ;-) ).

Section 2 again starts with math that appears to assume an ACK per
segment. Maybe I'm not catching this; maybe it's that ACKs aren't
compressed (or compressible) when the data comes out of order, but that's
worth noting if so. Sorry, I figure you know this better than I do, so
I'll ask you before I dig into figuring it out. Let me know if you want
me to dig as well...
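
One way to frame the question is a minimal sketch. It assumes a receiver that ACKs every second in-order segment but sends an immediate ACK for every out-of-order segment (as RFC 5681 recommends), that all segments arrive back to back before any delayed-ACK timer fires, and that no ACKs are lost. The function name and the model are illustrative, not taken from the draft.

```python
def dup_acks_seen(window_segments, lost_index, delayed_ack=True):
    """Count duplicate ACKs the sender sees after one loss in a window.

    Segments are numbered 0..window_segments-1; an ACK value n means
    "segments below n have been received in order".
    """
    if not 0 <= lost_index < window_segments:
        raise ValueError("lost_index must name a segment in the window")
    snd_una = 0            # sender: first unacknowledged segment
    rcv_nxt = 0            # receiver: next in-order segment expected
    unacked_in_order = 0   # in-order segments received but not yet ACKed
    dup_acks = 0
    for seg in range(window_segments):
        if seg == lost_index:
            continue                      # dropped in the network
        if seg == rcv_nxt:
            rcv_nxt += 1
            unacked_in_order += 1
            if delayed_ack and unacked_in_order < 2:
                continue                  # receiver holds this ACK
        # Either an ACK is due for in-order data, or the segment arrived
        # above a hole and triggers an immediate ACK.
        unacked_in_order = 0
        ack = rcv_nxt
        if ack == snd_una:
            dup_acks += 1                 # nothing new acknowledged
        else:
            snd_una = ack                 # cumulative ACK advanced
    return dup_acks
```

Under these assumptions, losing the first of three segments yields two duplicate ACKs with or without delayed ACKs. Losing the second yields one duplicate ACK with per-segment ACKs but none with delayed ACKs, because the first ACK the sender sees is the one triggered by the out-of-order arrival.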

Section 2 talks about TCP in bytes and SCTP in messages, neither one in
segments. It might be useful to put in enough context there, e.g., that
SCTP includes message boundaries, but that they don't correspond to
segment boundaries (right?).

Double-check the term 'packet' throughout; I think you mean segments
(i.e., TCP segments don't necessarily map to IP packets, e.g., given

Sec 2 talks about additional state at the sender for precision; this is
the first time you mention a side-specific cost. It might be useful to
hint earlier whether this is a sender-side mod, a receiver-side mod, or
one that requires changes on both sides to achieve a benefit. Seems like
it's all sender-side, but the benefit is receiver-side. Also worth noting
that this would then allow widescale impact by deploying at busy servers,
avoiding the need for per-client deployment.

In 2.1, do you want to define this in terms of a fixed value of 4*SMSS,
or define it as a pointer (i.e., to the initial CWND, so that if the
initial CWND increases, so does this)? Same for the part about
packet-based (again, would that be segment-based?): it shouldn't refer to
4, but to the number of segments in the initial CWND (e.g., as
"currently 4"). PS: should that be 4, or shouldn't it be
"initial_CWND/SMSS", i.e., a max of 4, though in most current cases it
seems like this would still be 3?
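
For reference, a small sketch of the "initial_CWND/SMSS" arithmetic, assuming the initial window formula from RFC 3390, IW = min(4*SMSS, max(2*SMSS, 4380 bytes)). The function names are illustrative only.

```python
def initial_window_bytes(smss):
    """Initial congestion window in bytes, per the RFC 3390 formula."""
    if smss <= 0:
        raise ValueError("smss must be positive")
    return min(4 * smss, max(2 * smss, 4380))


def initial_window_segments(smss):
    """Number of full-sized segments that fit in the initial window."""
    return initial_window_bytes(smss) // smss
```

With SMSS = 1460 this gives 4380 bytes, i.e., 3 segments; 4 segments only appear for an SMSS of 1095 bytes or less (e.g., 536).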

(2.b) should call it the 'advertised receive window' (or
receiver-advertised window, whatever is more common) for clarity.

These rules (2.a, 2.b) seem odd in the context of saying this is a MAY
for SCTP (above the list) and then having a different set of rules in
the paragraph below (end of page 4). IMO, put in two different rulesets
(ditto for section 3, FWIW):

	1. TCP without SACK or for connections not supporting SACK

	2. TCP with SACK and SCTP

This would avoid the self-reference to Early Rexmit in the last
paragraph, which is (AFAICT, again), not yet defined. Did you mean to
define it above:

    When the above two conditions hold and the connection does not
    support SACK the duplicate ACK threshold used to trigger a
    retransmission MUST be reduced to:

                  ER_thresh = ceiling (ownd/SMSS) - 1

I would add "we call this reduced ACK threshold enabling 'Early
Retransmission', and when a retransmission occurs because of ER_thresh,
we call that an Early Retransmission.", even if you split out the rules
for non-SACK and SACK. This allows you to refer to it at the top of page
5 correctly, since it would now be defined.
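
As a worked instance of the quoted formula, here is a minimal sketch. The function name and the input validation are illustrative; only the expression ceiling(ownd/SMSS) - 1 comes from the draft text above.

```python
def er_thresh(ownd, smss):
    """Byte-based threshold: ceiling(ownd / SMSS) - 1.

    ownd is the outstanding (sent but unacknowledged) data in bytes.
    Integer arithmetic avoids floating-point rounding in the ceiling.
    """
    if ownd <= 0 or smss <= 0:
        raise ValueError("ownd and smss must be positive")
    segments_rounded_up = -(-ownd // smss)   # ceiling division
    return segments_rounded_up - 1
```

For example, three full-sized segments outstanding (ownd = 4380, SMSS = 1460) give a threshold of 2, while 1200 bytes outstanding with the same SMSS give 0.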

Also, maybe I'm missing something, but I searched for ER_thresh all over
the place. It isn't *used* anywhere. I.e., you define a variable but
never use it. Seems like you need to use it where you say "the timer
(ER_thresh) goes off and ..." somewhere specific. However, you say that
you're lowering the fast rexmit threshold. So then wouldn't you be
setting "FR_thresh", not "ER_thresh"? Even if so, it's useful to recap
how the *_thresh value is used.
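
A sketch of what such a recap might look like follows. It rests on stated assumptions: the standard fast retransmit threshold of 3 duplicate ACKs (RFC 5681) applies unless both conditions hold, (2.a) is read as "ownd < 4*SMSS", and (2.b) is reduced to a single boolean. All names, and the guard against a zero threshold, are hypothetical and not from the draft.

```python
DUP_THRESH = 3  # standard fast retransmit threshold (RFC 5681)


def effective_dup_thresh(ownd, smss, can_send_new_data):
    """Duplicate-ACK threshold the sender would apply right now.

    can_send_new_data stands in for condition (2.b): True when unsent
    data is ready and the advertised receive window permits sending it.
    """
    if ownd < 4 * smss and not can_send_new_data:
        return -(-ownd // smss) - 1      # ceiling(ownd/SMSS) - 1
    return DUP_THRESH


def should_retransmit(dup_acks, ownd, smss, can_send_new_data):
    """True once the duplicate-ACK count reaches the threshold."""
    thresh = effective_dup_thresh(ownd, smss, can_send_new_data)
    # Assumption: a threshold of 0 never triggers, since a lone
    # outstanding segment cannot generate duplicate ACKs anyway.
    return thresh > 0 and dup_acks >= thresh
```

In this reading the computed value simply replaces the fast retransmit threshold for the current window, which is why a single name for the threshold might be less confusing.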

The examples on page 5 need to include a bit about Nagle; if Nagle is
on, you would never have three outstanding 400-byte segments  ;-)
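
To illustrate the Nagle point, here is a minimal simulation. It assumes the usual Nagle rule (a segment smaller than SMSS is sent only when no data is outstanding), that no ACKs arrive during the writes, and that with Nagle off each application write goes out immediately. The function name is illustrative.

```python
def simulate_writes(write_sizes, smss, nagle=True):
    """Return (segments in flight, bytes still buffered) after the writes.

    No ACKs arrive during the simulation, so every sent segment stays
    outstanding.
    """
    in_flight = []   # sizes of segments sent and not yet acknowledged
    buffered = 0     # bytes written by the application but not yet sent
    for size in write_sizes:
        buffered += size
        while buffered > 0:
            seg = min(buffered, smss)
            if nagle and seg < smss and in_flight:
                break            # Nagle: hold small data while unACKed
            in_flight.append(seg)
            buffered -= seg
    return in_flight, buffered
```

With three 400-byte writes and SMSS = 1460, Nagle on leaves one 400-byte segment in flight and 800 bytes buffered; Nagle off leaves three 400-byte segments in flight.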

When you talk about packet-based rexmit, did you want to say as well
that "a TCP or SCTP that implements packet-based rexmit MUST NOT also
implement byte-based rexmit", i.e., that packet-based rexmit supersedes
byte-based rexmit? I could see the two MAYs being considered
simultaneously, and that would be bad, no?

When you say MUST NOT use ER, do you mean to use FR (fast) and LR
(Limited), or is LR a superset of FR?

Not sure about the explanation of the circular list for keeping track of
segment boundaries. You say "fall within this region", but with SACK the
region can be disconnected, so I'm not sure how to interpret "region".
Also, seems like you need to know the length of the segments too, no?
You can't assume they're all the same size...

Which brings me to a wrinkle - what happens if TCP resends data with
different segment sizes, resulting in some segments being on different
boundaries than those that may already be received (e.g., on multipath
when a PMTU update comes in, and data is resent and resegmented
differently). Your segment alg needs to be robust to this, or you need
to explain why that doesn't matter.
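
To make the concern concrete, here is a hypothetical sketch of a small circular log that stores both the start and the length of each sent segment and counts how many are still outstanding given a cumulative ACK and a set of SACK blocks. The class, its methods, and the choice of `collections.deque` with `maxlen` as the circular list are all assumptions for illustration, not the draft's data structure.

```python
from collections import deque


class SegmentLog:
    """Remembers the byte ranges of the most recently sent segments."""

    def __init__(self, max_segments=4):
        # deque with maxlen drops the oldest entry when full, which
        # gives circular-list behaviour.
        self.segs = deque(maxlen=max_segments)

    def record_send(self, start, length):
        """Log a transmitted segment as a half-open range [start, end)."""
        self.segs.append((start, start + length))

    def outstanding(self, snd_una, sack_blocks=()):
        """Count logged segments neither cumulatively ACKed nor SACKed."""
        count = 0
        for start, end in self.segs:
            if end <= snd_una:
                continue          # fully covered by the cumulative ACK
            if any(lo <= start and end <= hi for lo, hi in sack_blocks):
                continue          # fully covered by one SACK block
            count += 1
        return count
```

Note that this naive version over-counts if data is resent with different boundaries, because the old and new ranges overlap in the log; a robust version would have to replace or merge overlapping entries.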

Sec 3 (Discussion) should start with an overview of what you intend to
discuss. Are these all issues? Do you want to say the benefits too?
Impact on legacy receivers (if any)? Deployment motivation (who does it
benefit, and who does the work?) Deployment asymmetry? etc. I'd then
break the section down to these sorts of topics (e.g., benefits, impact
of failure, deployment). (Right now you jump into the details of the
preferred variant, which IMO belongs in the packet-based section, not
here, then jump to the impact of failure without giving examples of
benefit first.)

The SACK preference discussion needs more lead-in. The first paragraph
could easily be broken into three paragraphs with better context, and
would make the argument more clear.

Related work needs an intro. I.e., that ECN avoids dropping the segments
in the first place (which benefits TCP in many ways, but notably avoids
cases that would currently take an RTO to recover), and other ways to
cope with not having data to send (which implies loss, but different
ways to recover). Also, doesn't the second one (Bal98) also result in a
self-attack, i.e., it opens the CWND because an additional ACK will be
received, even though no data is sent? I.e., there's a side-effect to
this that is probably worth avoiding... (separate question - has anyone
noted that 0-window segment ACKs shouldn't open the window, or is that
already the case?)

Other considerations: seems like you're making TCP send more segments
into the net when data is being lost, vs. the existing mechanisms. If
that's the case, and if loss is due to buffer overload, are you making
things potentially worse? If not, please explain.

I don't see why you have a normative reference to Eiffel; you don't
depend on it that way. Seems like that's an informational reference in
this case - esp. since you refer to it in the appendix discussing
research options, not the body.
