Re: [tcpm] Should draft-ietf-tcpm-sack-recovery-entry update RFC 3517 (SACK-TCP)

"Scheffenegger, Richard" <rs@netapp.com> Mon, 09 November 2009 12:56 UTC

Return-Path: <rs@netapp.com>
X-Original-To: tcpm@core3.amsl.com
Delivered-To: tcpm@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 1390A3A6B35 for <tcpm@core3.amsl.com>; Mon, 9 Nov 2009 04:56:51 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -6.599
X-Spam-Level:
X-Spam-Status: No, score=-6.599 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, RCVD_IN_DNSWL_MED=-4]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5BHyGu0A8Nnj for <tcpm@core3.amsl.com>; Mon, 9 Nov 2009 04:56:49 -0800 (PST)
Received: from mx3.netapp.com (mx3.netapp.com [217.70.210.9]) by core3.amsl.com (Postfix) with ESMTP id E82053A6998 for <tcpm@ietf.org>; Mon, 9 Nov 2009 04:56:48 -0800 (PST)
X-IronPort-AV: E=Sophos;i="4.44,708,1249282800"; d="scan'208";a="102977650"
Received: from smtp3.europe.netapp.com ([10.64.2.67]) by mx3-out.netapp.com with ESMTP; 09 Nov 2009 04:57:11 -0800
Received: from ldcrsexc1-prd.hq.netapp.com (emeaexchrs.hq.netapp.com [10.65.251.109]) by smtp3.europe.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id nA9CvBuR029136 for <tcpm@ietf.org>; Mon, 9 Nov 2009 04:57:11 -0800 (PST)
Received: from LDCMVEXC1-PRD.hq.netapp.com ([10.65.251.108]) by ldcrsexc1-prd.hq.netapp.com with Microsoft SMTPSVC(6.0.3790.3959); Mon, 9 Nov 2009 12:57:11 +0000
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Date: Mon, 09 Nov 2009 12:57:10 -0000
Message-ID: <5FDC413D5FA246468C200652D63E627A04B7BDD2@LDCMVEXC1-PRD.hq.netapp.com>
In-Reply-To: <BDCD27E02574334EAA007E46C0E3534C010107252A@LDCMVEXC1-PRD.hq.netapp.com>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: [tcpm] Should draft-ietf-tcpm-sack-recovery-entry update RFC 3517 (SACK-TCP)
Thread-Index: AcpgvctD7c1gi066QDSC+UtyVkoUcAAeqhnQ
From: "Scheffenegger, Richard" <rs@netapp.com>
To: tcpm@ietf.org
X-OriginalArrivalTime: 09 Nov 2009 12:57:11.0228 (UTC) FILETIME=[26E9F7C0:01CA613C]
Subject: Re: [tcpm] Should draft-ietf-tcpm-sack-recovery-entry update RFC 3517 (SACK-TCP)
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/tcpm>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 09 Nov 2009 12:56:51 -0000

Hi Alexander et al.,

This is my first post to this group, so please excuse me if I don't follow the usual conventions.

I'm curious about one little issue that has been bugging me for the better part of the last two months, and which is closely related to TCP SACK operation (so it might belong in this thread).


The implicit assumption behind TCP fast recovery is that packet loss happens randomly (i.e. to different segments each time), with low correlation between drop events. Also, a drop event is used as an implicit signal of congestion. So far, so good.

It seems to me that the focus of most development has been the Internet environment, where statistical assumptions like those mentioned above arguably hold true.

However, certain high-speed LANs seem to exhibit characteristics that don't play well with these implicit assumptions (uncorrelated packet loss): the smaller the network, the more it is likely to deviate from a "well-seasoned" link exhibiting some form of congestion.

Also, as has been noted in prior research, many Internet routers use the more TCP-friendly RED or WRED queue policies rather than the simplistic tail drop most often encountered in LANs (the default policy of L2 switches and L3 routers).

In one extreme case, I have found a (misbehaving?) TCP stack/host that sends out a burst of 4-6 segments at 10GbE wire speed, which immediately causes queue buffer overflow and tail drop in the first-hop L2 switch when two such high-performance hosts try to establish a high-speed connection. In other words, the hosts themselves seem to make sure that there is a high correlation between TCP (fast) recovery and further packet loss.


But what puzzles me the most: even with SACK-enabled TCP stacks, virtually no implementation can detect, or act upon, the loss of a retransmitted segment during fast recovery. This is despite the fact that the SACK specification (RFC 2018, on which RFC 3517 builds) requires the receiver to make the information needed to detect such an event implicitly available to the sender: the first SACK block has to cover the segment that triggered the ACK.

Together with the scoreboard held at the sender, it should be rather easy to find out whether the left edge of the lowest hole (in terms of stream octets) is closing.

If that left edge stays constant for "DupThresh" ACKs that each reduce the overall number of octets in holes (any one hole might still close due to retransmitted packets arriving), AND the sender retransmits beginning with the lowest hole first, this would be a clear indication that the retransmitted segment has been lost again...
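Something along these lines is what I have in mind, as a rough C sketch only; the scoreboard fields and the function name are made up for illustration and don't come from any existing stack:

#include <stdint.h>

/* Hypothetical sketch of the "stalled left edge" heuristic described
 * above.  All names are illustrative, not taken from a real TCP stack. */
struct sack_scoreboard {
    uint32_t lowest_hole_left;   /* left edge of the lowest hole (seq)      */
    uint32_t holed_octets;       /* total octets still missing in holes     */
    int      stall_count;        /* SACKs seen with an unchanged left edge  */
};

#define DUPTHRESH 3

/* Called for every incoming SACK during fast recovery, after the
 * scoreboard has been updated from the new SACK blocks. */
static int
rexmt_probably_lost(struct sack_scoreboard *sb,
                    uint32_t prev_lowest_left, uint32_t prev_holed_octets)
{
    /* The SACK shrank the total amount of missing data ...               */
    int holes_shrank    = sb->holed_octets < prev_holed_octets;
    /* ... yet the lowest hole, which was retransmitted first, did not
     * move: the retransmission itself was most likely dropped.           */
    int left_edge_stuck = sb->lowest_hole_left == prev_lowest_left;

    if (holes_shrank && left_edge_stuck)
        sb->stall_count++;
    else if (!left_edge_stuck)
        sb->stall_count = 0;     /* the lowest hole closed; reset counter  */

    return sb->stall_count >= DUPTHRESH;
}

With retransmissions always starting at the lowest hole, DupThresh such "stalled" SACKs would then serve as the loss-of-retransmission signal described above.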

Even a less speedy detection logic would work for SACK-enabled sessions: once fast recovery is finished from the sender's point of view, if the receiver still complains about missing segments (indicated by the right edge of the first SACK block lying above the point where fast recovery started), another round of fast recovery could be invoked rather than waiting for the RTO.

Of course, the first approach would be better for low-cwnd sessions with only very few segments in flight, and both could be combined with the proposed SACK recovery speed-ups (reducing DupThresh for low-cwnd sessions / when little data is being sent).
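A similar rough sketch of the coarser, post-recovery check from two paragraphs up (again, names and layout are mine, purely for illustration):

#include <stdint.h>

/* Standard wrap-around-safe TCP sequence-number comparisons. */
#define SEQ_LT(a, b)  ((int32_t)((a) - (b)) < 0)
#define SEQ_GT(a, b)  ((int32_t)((a) - (b)) > 0)

struct recovery_state {
    uint32_t recovery_point;     /* highest data sent when recovery began */
    int      in_fast_recovery;
};

static int
should_reenter_recovery(const struct recovery_state *rs,
                        uint32_t snd_una,          /* cumulative ACK        */
                        int has_sack_block,        /* ACK carried SACK info */
                        uint32_t first_sack_right) /* right edge, 1st block */
{
    /* Fast recovery is over from the sender's point of view ...          */
    if (rs->in_fast_recovery || SEQ_LT(snd_una, rs->recovery_point))
        return 0;

    /* ... yet the receiver still reports out-of-order data beyond the
     * old recovery point: at least one retransmission never arrived, so
     * start another round of loss recovery instead of waiting for RTO.   */
    return has_sack_block && SEQ_GT(first_sack_right, rs->recovery_point);
}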



Congestion control should react to this event (it does today, but only one RTO later...), and the SACK retransmission pointer (HighRxt) should be reset, with Limited Transmit used to send out the retransmission segments once cwnd and pipe allow; any retransmitted segments still in the network will close their respective SACK holes before the new HighRxt advances to them.

And RTOs should then be reduced (I would guess to nearly zero) between SACK-enabled hosts.
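For completeness, here is a rough C sketch of the reaction I have in mind, loosely in RFC 3517 terms (Pipe, HighRxt); the struct layout and names are made up for illustration only:

#include <stdint.h>

struct sack_sender {
    uint32_t snd_una;    /* oldest unacknowledged octet                     */
    uint32_t high_rxt;   /* RFC 3517 HighRxt: highest octet retransmitted   */
    uint32_t cwnd;       /* congestion window, octets                       */
    uint32_t ssthresh;
    uint32_t pipe;       /* RFC 3517 Pipe estimate, octets                  */
    uint32_t smss;       /* sender maximum segment size                     */
};

static void
react_to_lost_retransmission(struct sack_sender *s)
{
    /* Treat the lost retransmission as a fresh congestion signal.          */
    uint32_t half = s->cwnd / 2;
    s->ssthresh = half > 2 * s->smss ? half : 2 * s->smss;
    s->cwnd     = s->ssthresh;

    /* Rewind HighRxt so retransmission starts again from the lowest hole;
     * retransmissions still in flight will close their holes before
     * HighRxt advances back to them.                                       */
    s->high_rxt = s->snd_una;

    /* Subsequent (re)transmissions are then gated the usual way, in the
     * spirit of Limited Transmit: send the next eligible segment only
     * while cwnd - pipe >= SMSS.                                            */
}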


I have run numerous tests to check the behavior of different TCP stacks (FreeBSD 4.2 - 8.0; Windows XP, Vista, 7, 2003; Linux 2.6.16; and others).


All these stacks seem to exhibit this issue. What I don't know yet is the percentage of multi-loss segment events that trigger an RTO, but I assume that the majority of RTOs happen because of this.

In LAN environments (e.g. 10 GbE over 1 km, with about 2 ms latency due to the L2 hops in between) featuring relatively few streams, the effect of any single RTO can be tremendous, taking considerable theoretical bandwidth away from the session (a 1 sec minimum RTO equals roughly 1.2 GB; even with more recent RTO values around 0.2 - 0.4 sec, each RTO still costs a few hundred MB of "lost" capacity under optimal circumstances).


Nevertheless, I can't imagine that I am the first one to bring up this issue (despite having failed to find any study of this effect). :)


One more question, which came up after I looked at the FreeBSD implementation of Limited Transmit; this might be a nit-pick, but when RFC 3042 is active, shouldn't ABC (RFC 3465) also be used during Limited Transmit / fast recovery? (FreeBSD MAIN is increasing cwnd by 1 MSS for each new ACK, instead of by the amount of data acknowledged by that ACK...)
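To illustrate the difference I mean (a sketch only; the variable names are mine, not taken from the FreeBSD sources):

#include <stdint.h>

/* Classic behaviour: every new ACK grows cwnd by one full MSS,
 * regardless of how much data it actually acknowledged.                */
static uint32_t
cwnd_increase_per_ack(uint32_t smss)
{
    return smss;
}

/* ABC (RFC 3465): grow cwnd by the number of octets newly acknowledged,
 * capped at L * SMSS (RFC 3465 suggests L = 2 during slow start).      */
static uint32_t
cwnd_increase_abc(uint32_t bytes_acked, uint32_t smss, uint32_t abc_limit)
{
    uint32_t cap = abc_limit * smss;
    return bytes_acked < cap ? bytes_acked : cap;
}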

Thanks a lot!


Best regards,  


Richard Scheffenegger
Field Escalation Engineer
NetApp Global Support 
NetApp
+43 1 3676811 3146 Office (2143 3146 - internal)
+43 676 654 3146 Mobile
www.netapp.com
Franz-Klein-Gasse 5
1190 Wien 

*	To: "tcpm@ietf.org WG Extensions" <tcpm@ietf.org>
*	Subject: [tcpm] Should draft-ietf-tcpm-sack-recovery-entry update RFC 3517 (SACK-TCP)
*	From: Alexander Zimmermann <alexander.zimmermann@nets.rwth-aachen.de>
*	Date: Wed, 21 Oct 2009 12:22:50 +0200

  _____  

Hi folks,

given that the draft "draft-ietf-tcpm-sack-recovery-entry" has now been adopted as a WG item and is intended to be a standards-track document, I would like to start a poll/discussion on whether the draft should update RFC 3517 or not. Moreover, should we produce a separate document or an update of RFC 3517?

a) separate document, do not update RFC 3517
b) separate document, update RFC 3517
c) RFC3517bis, obsolete RFC 3517

//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22220
// email: zimmermann at cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//