[Tsvwg] Eifel Detection & Loss of all ACKs

Sally,

I have given this thread a separate subject line to make it easier to keep 
track ...

At 02:02 28.08.2002, Sally Floyd wrote:
>Chiming in about the case of an RTO due to the loss of ACKs:
>
>Given the absence of any separate form of congestion control on
>reverse-path ACK-only traffic, it seems to me that the ideal behavior
>would be to interpret the loss of all of the ACKs for a window of
>data as an instance of congestion, and as a consequence to reduce
>the sending rate.  So it seems to me that the *ideal* behavior would
>be to interpret such an event as a valid retransmit timeout.  My
>vote would be for the draft to say that: e.g., "while some
>would argue that the ideal behavior would be to interpret such an
>event as a valid retransmit timeout, the Eifel detection algorithm
>interprets it as a spurious timeout."

Fine. That sounds reasonable, and I'll include that in the next revision.

I would like to explain why we ended up with the current definition of a 
spurious timeout given in the -04 version of the draft:

I admit that it sounds odd to call a timeout a spurious timeout when it 
occurred because an entire window of ACKs got lost *while (at least) the 
oldest outstanding segment was not lost*. It sounds even odder given that 
such a timeout is unavoidable. However, the standard TCP sender 
"misbehaves" in this situation too, since it does a go-back-N in slow 
start, potentially breaking packet conservation. That is, it retransmits 
all outstanding segments at double ACK-clock speed without knowing whether 
those segments were lost or not.
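
To illustrate the "double ACK-clock speed" point, here is a minimal sketch 
of how many segments a go-back-N in slow start retransmits per round trip. 
The function name and the model are assumptions made for illustration only: 
cwnd is counted in segments, restarts at 1 after the timeout, grows by one 
segment per ACK, and every retransmission is ACKed one round trip later (no 
delayed ACKs, no further loss).

```python
def go_back_n_flights(outstanding: int) -> list[int]:
    """Segments retransmitted in each round trip after a timeout.

    Idealized model (an assumption, not taken from the draft): cwnd is in
    segments, restarts at 1, and grows by one segment per ACK received.
    """
    flights = []
    cwnd = 1
    remaining = outstanding
    while remaining > 0:
        sent = min(cwnd, remaining)
        flights.append(sent)
        remaining -= sent
        # Slow start: each ACK adds one segment to cwnd, so every ACK
        # releases two segments and the flight doubles per round trip.
        cwnd += sent
    return flights


print(go_back_n_flights(10))  # prints [1, 2, 4, 3]
```

In this model all ten outstanding segments are resent within four round 
trips, whether or not any of them was actually lost.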

It was our intention to have Eifel detection detect this case to give 
response schemes, e.g., Eifel response, the chance to avoid the go-back-N. 
Also, we didn't find it worthwhile to distinguish that case from the common 
case of a spurious timeout where the retransmission timer simply fired too 
early (usually because of a sudden delay spike on the path). This is 
because we believe that "loss of all ACKs while (at least) the oldest 
outstanding segment was not lost" must be a pretty rare corner case.

If we really wanted to make this a special case that the TCP sender could 
detect as such, I see no alternative but to use the DSACK option. But that 
would only make things more complicated, and I don't think the benefit is 
worth that effort.

>It would not seem a big danger to end-to-end congestion control if
>the overall Eifel algorithm occasionally failed to halve its
>congestion window in the rare case when all of the ACKs in a window
>of data were lost ...

One needs to add: "... while (at least) the oldest outstanding segment was 
not lost".

Only in that case does Eifel detection announce that loss recovery was 
entered unnecessarily, and the Eifel response reacts by avoiding the 
go-back-N, reversing congestion control state, and re-initializing SRTT and 
RTTVAR.

If all ACKs were lost and the oldest outstanding segment was lost as well, 
Eifel detection/response does nothing.
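
The two outcomes above can be sketched as follows. This is only an 
illustration under stated assumptions, not the algorithm text from the 
drafts: the names SenderState, on_timeout and after_first_ack are 
hypothetical, the boolean oldest_segment_lost stands in for the verdict of 
Eifel detection (how the sender obtains that verdict is not modelled here), 
cwnd is used as a stand-in for the flight size, and re-initializing SRTT 
and RTTVAR is reduced to a flag.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SenderState:
    cwnd: int                   # congestion window, in segments
    ssthresh: int               # slow start threshold, in segments
    go_back_n: bool             # True while retransmitting everything
    rtt_estimator_reset: bool = False  # stands in for SRTT/RTTVAR re-init


def on_timeout(state: SenderState) -> tuple[SenderState, SenderState]:
    """Standard reaction to a timeout; also returns the state to restore."""
    saved = state
    timed_out = replace(
        state,
        cwnd=1,
        ssthresh=max(state.cwnd // 2, 2),  # simplification: cwnd as flight size
        go_back_n=True,
    )
    return saved, timed_out


def after_first_ack(saved: SenderState, current: SenderState,
                    oldest_segment_lost: bool) -> SenderState:
    """Apply the response once the detection verdict is known."""
    if oldest_segment_lost:
        # Genuine timeout: detection/response does nothing.
        return current
    # Spurious timeout (delay spike, or all ACKs lost while the oldest
    # outstanding segment arrived): avoid the go-back-N, reverse the
    # congestion control state, and re-initialize the RTT estimator.
    return replace(saved, go_back_n=False, rtt_estimator_reset=True)
```

Note that the sketch takes no "all ACKs lost" input: the response depends 
only on whether the oldest outstanding segment was lost.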

>On the other hand, if the overall Eifel algorithm
>caused the TCP sender to repeatedly fail to halve its congestion
>window in an environment with heavy losses on the return path, when
>all of the ACKs in window of data were frequently lost, then that
>would seem like a problem to me.  I don't have any idea if such a
>scenario is realistic or even theoretically possible.

So, the scenario in which Eifel would repeatedly reverse to the previous 
congestion control state would be: the segment for which the timeout 
occurred is never lost, and yet all ACKs are lost each time. (Sounds pretty 
pathological to me.)

>However, it
>does illustrate one danger of trying to evaluate a detection algorithm
>without considering at the same time the corresponding mechanism
>for using the detection algorithm.

That's why I suggested to Scott Bradner to consider
http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-tcp-eifel-alg-04.txt
and
http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-tcp-eifel-response-00.txt
together.

>I agree with Reiner that "go-back-N loss
>recovery" is not good, but it doesn't seem helpful to me to define
>any "go-back-N loss recovery" as a "spurious timeout".  This probably
>just requires being somewhat more precise in terminology.  (It
>doesn't seem helpful to me to refer to a timeout from the loss of
>a window of acks as a "spurious timeout".)

See above. Unless we also mandate the use of DSACK, the TCP sender has no 
way to distinguish between the two cases of what we have defined as a 
spurious timeout.

Please also note that standard TCP inevitably goes into go-back-N loss 
recovery in either case. So the two terms, "go-back-N loss recovery" and 
"spurious timeout", are somewhat tied together.

///Reiner

_______________________________________________
tsvwg mailing list
tsvwg@ietf.org
https://www1.ietf.org/mailman/listinfo/tsvwg