[tcpm] revising 2581: setting ssthresh on RTOs

Folks-

We have talked about setting ssthresh after RTOs on this list a number
of times and we talked about it in Montreal.  I'd like to verify what
seemed the general mood in Montreal here on the list.  I think this is
the last bit we need to work out on 2581bis.  If you have something else
that you think needs to be done to the document, please yell.

RFC 2581 says that on each RTO a TCP reduces ssthresh to FlightSize /
2.  Consider a lost retransmit.  Say we RTO on segment X, cut ssthresh
to Y segments (Y > 2 - just for the example), cwnd becomes 1 segment and
the RTO is backed off.  Now, say we RTO on segment X again.  The cwnd
will stay at one, but the FlightSize is now 1 and so ssthresh takes its
minimum value of 2 segments.
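
To make the scenario concrete, here is a minimal Python sketch of the
rule as RFC 2581 words it.  It counts in whole segments rather than
bytes, and the function name and the 20-segment starting FlightSize are
assumptions picked only for illustration.

```python
MIN_SSTHRESH = 2  # segments; the floor in RFC 2581's max(FlightSize / 2, 2*SMSS)


def rfc2581_on_rto(flight_size):
    """Return (ssthresh, cwnd) in segments after one RTO, per RFC 2581 as written.

    Working in whole segments with integer division is a simplification;
    a real stack does this arithmetic in bytes.
    """
    ssthresh = max(flight_size // 2, MIN_SSTHRESH)
    cwnd = 1  # the loss window: one full-sized segment
    return ssthresh, cwnd


# First RTO on segment X, with 20 segments in flight (illustrative value).
ssthresh, cwnd = rfc2581_on_rto(20)
print(ssthresh, cwnd)  # 10 1

# The retransmit of X is lost too.  Only that one segment was in flight,
# so FlightSize is 1 and ssthresh drops to its floor.
ssthresh, cwnd = rfc2581_on_rto(cwnd)
print(ssthresh, cwnd)  # 2 1
```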

The observation is that this forces linear growth for a potentially long
time if this loss hiccup was caused by some small network issue like a
handoff.  In that case, it'd be nice to be able to keep ssthresh higher
and use slow start when packets started flowing again.
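
A rough way to see the cost of the linear growth is to count RTTs in an
idealized model: cwnd doubles each RTT while below ssthresh and grows by
one segment per RTT after that, with no further loss and no delayed
ACKs.  The function name and the 64-segment target are assumptions for
illustration, not measurements.

```python
def rtts_to_reach(target, ssthresh, cwnd=1):
    """Count RTTs for cwnd (in segments) to grow from `cwnd` to `target`.

    Idealized model: slow start doubles cwnd each RTT (capped at ssthresh),
    congestion avoidance adds one segment per RTT, and nothing is lost.
    """
    rtts = 0
    while cwnd < target:
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start
        else:
            cwnd += 1  # congestion avoidance (linear growth)
        rtts += 1
    return rtts


# Getting back to a 64-segment window after the loss hiccup:
print(rtts_to_reach(64, ssthresh=2))   # 63, ssthresh slammed to its minimum
print(rtts_to_reach(64, ssthresh=32))  # 37, ssthresh kept at half the old window
```

In this toy model the gap is largest on the way to the old ssthresh: 5
RTTs to reach 32 segments when ssthresh is kept at 32, versus 31 RTTs
when it was cut to 2.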

Also, as a practical matter I don't think the above scenario is the way
we intended things to work.  Rather, I think we envisioned (but, alas,
did not write) suggestion #1:

Suggestion #1: On the first RTO for some segment, set ssthresh to
FlightSize/2.  On each subsequent RTO for the given segment halve
ssthresh (ssthresh /= 2).

Basically, this slowly degrades ssthresh as the RTO gets backed off,
such that the longer TCP has been transmitting into a lousy network the
less the TCP gets to use exponential increase when packets start flowing
again.
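
Suggestion #1 might be sketched in Python as follows.  The sketch counts
in whole segments, and keeping the 2-segment floor on the repeated
halving is an assumption (the suggestion text does not say either way).
The function names and the 32-segment FlightSize are illustrative only.

```python
MIN_SSTHRESH = 2  # segments; assumed to still apply as a floor when halving


def suggestion1_on_rto(ssthresh, flight_size, first_rto_for_segment):
    """Return the new ssthresh (in segments) after an RTO under suggestion #1."""
    if first_rto_for_segment:
        # First RTO for this segment: the usual FlightSize / 2.
        return max(flight_size // 2, MIN_SSTHRESH)
    # Backed-off RTO for the same segment: halve ssthresh itself.
    return max(ssthresh // 2, MIN_SSTHRESH)


def suggestion1_trace(flight_size, num_rtos):
    """ssthresh after each of `num_rtos` consecutive RTOs on one segment."""
    trace = []
    ssthresh = None
    for i in range(num_rtos):
        ssthresh = suggestion1_on_rto(ssthresh, flight_size, i == 0)
        flight_size = 1  # only the retransmit is outstanding from here on
        trace.append(ssthresh)
    return trace


print(suggestion1_trace(32, 5))  # [16, 8, 4, 2, 2]
```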

In addition, another variant has been suggested in the meantime, ...

Suggestion #2: On the first RTO for some segment, set ssthresh to
FlightSize/2.  On each subsequent RTO for the given segment do not
adjust ssthresh at all.

This variant means that a TCP always gets to re-probe with slow start
based on the pre-loss conditions no matter how long it took to fix the
loss. 
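
For comparison, a similar Python sketch of suggestion #2, again in whole
segments and with illustrative names and a 32-segment FlightSize that
are assumptions rather than anything from 2581bis:

```python
MIN_SSTHRESH = 2  # segments; the floor in RFC 2581's max(FlightSize / 2, 2*SMSS)


def suggestion2_on_rto(ssthresh, flight_size, first_rto_for_segment):
    """Return the new ssthresh (in segments) after an RTO under suggestion #2."""
    if first_rto_for_segment:
        # First RTO for this segment: the usual FlightSize / 2.
        return max(flight_size // 2, MIN_SSTHRESH)
    # Backed-off RTO for the same segment: leave ssthresh alone.
    return ssthresh


def suggestion2_trace(flight_size, num_rtos):
    """ssthresh after each of `num_rtos` consecutive RTOs on one segment."""
    trace = []
    ssthresh = None
    for i in range(num_rtos):
        ssthresh = suggestion2_on_rto(ssthresh, flight_size, i == 0)
        flight_size = 1  # only the retransmit is outstanding from here on
        trace.append(ssthresh)
    return trace


print(suggestion2_trace(32, 5))  # [16, 16, 16, 16, 16]
```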

Both #1 and #2 are quite safe.  If the network is in a really lousy
state then the TCP is going to continue to get losses even after getting
out of RTO backoff without increasing the congestion window all that
much.  And, if that happens then ssthresh will get further reduced
(probably to its minimum).  Essentially, if the network is heavily
loaded all of a sudden then this additional loss isn't really going to
be exacerbated by the first couple of RTTs of slow start.  If the backoff
was caused by something other than a suddenly massively congested
network then this tweak lets TCP get back to a reasonable operating
point more rapidly.

So, a couple of questions ... and, the authors' current hits ...

(1) Is this change in-scope for 2581bis?  We said "no algorithmic
    tweaks" and so one view is that this should be cooked elsewhere and
    rolled in later.

    The authors' hit on this is that the behavior of slamming ssthresh
    down on the first backed-off RTO is not our intent and so tweaking
    this seems in-scope.  

(2) Assuming folks are fine with making a change then which change
    should we make?  Suggestion #1 or #2?

    The authors chatted and we feel like #2 is fine.  As noted above,
    neither case really aggravates the state of the network in suddenly
    heavily loaded situations.  So, #2 seems OK.

What do people think?  Is #2 OK?  Or, something else?

Thanks in advance for the feedback!

allman