Re: [tcpm] Detect Lost Retransmit with SACK
"Scheffenegger, Richard" <rs@netapp.com> Mon, 09 November 2009 18:06 UTC
Return-Path: <rs@netapp.com>
X-Original-To: tcpm@core3.amsl.com
Delivered-To: tcpm@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id AF6B23A6916 for <tcpm@core3.amsl.com>; Mon, 9 Nov 2009 10:06:02 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.232
X-Spam-Level:
X-Spam-Status: No, score=-4.232 tagged_above=-999 required=5 tests=[AWL=-1.033, BAYES_00=-2.599, J_CHICKENPOX_33=0.6, RCVD_IN_DNSWL_MED=-4, SARE_BAYES_7x5=0.8, SARE_BAYES_8x5=0.8, SARE_BAYES_9x5=1.2]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id tCssxJKtCU+m for <tcpm@core3.amsl.com>; Mon, 9 Nov 2009 10:06:00 -0800 (PST)
Received: from mx3.netapp.com (mx3.netapp.com [217.70.210.9]) by core3.amsl.com (Postfix) with ESMTP id 13D3C3A67B2 for <tcpm@ietf.org>; Mon, 9 Nov 2009 10:05:59 -0800 (PST)
X-IronPort-AV: E=Sophos;i="4.44,710,1249282800"; d="scan'208";a="103035994"
Received: from smtp3.europe.netapp.com ([10.64.2.67]) by mx3-out.netapp.com with ESMTP; 09 Nov 2009 10:06:24 -0800
Received: from amsrsexc1-prd.hq.netapp.com (emeaexchrs.hq.netapp.com [10.64.251.107]) by smtp3.europe.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id nA9I6OBw005579; Mon, 9 Nov 2009 10:06:24 -0800 (PST)
Received: from LDCMVEXC1-PRD.hq.netapp.com ([10.65.251.108]) by amsrsexc1-prd.hq.netapp.com with Microsoft SMTPSVC(6.0.3790.3959); Mon, 9 Nov 2009 19:06:25 +0100
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Date: Mon, 09 Nov 2009 18:05:53 -0000
Message-ID: <5FDC413D5FA246468C200652D63E627A0649169F@LDCMVEXC1-PRD.hq.netapp.com>
In-Reply-To: <5FDC413D5FA246468C200652D63E627A06491679@LDCMVEXC1-PRD.hq.netapp.com>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: [tcpm] Detect Lost Retransmit with SACK
Thread-Index: AcphUPVmAHvrINqeQEyz/nm2u8XmcwACuYPgAAHdeFA=
From: "Scheffenegger, Richard" <rs@netapp.com>
To: Alexander Zimmermann <zimmermann@nets.rwth-aachen.de>
X-OriginalArrivalTime: 09 Nov 2009 18:06:25.0018 (UTC) FILETIME=[59D5A5A0:01CA6167]
Cc: tcpm@ietf.org
Subject: Re: [tcpm] Detect Lost Retransmit with SACK
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/tcpm>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 09 Nov 2009 18:06:02 -0000
Hi group,

I forgot to mention the actual testing scenario I was using to profile all these TCP stacks. Basically, I used a userland TCP "forging" tool in which each frame can be individually crafted (content, timing, loss). My test opens a TCP session (an HTTP GET request, for simplicity's sake) with SACK negotiated and then counts the segments being received, behaving (mostly) like a well-behaved TCP client. However, the segments with the following numbers are dropped the given number of times they are seen in the stream:

  segment: 200  250  253  255  257  258  259  260  265  267
  drops:     1    1    1    1    1    2    1    1    1    1

The grace period of 200 packets is there to get a decently wide-open cwnd; the drop at segment 200 also serves to check that the cwnd is larger than 50 segments when the burst drop (250-267) occurs, and to "prime" the SACK scoreboard (preventing the sender from fast-pathing). The burst in this case is on the time axis and, with segment number 258, on the sequence-space axis...

None of the TCP stacks I have investigated so far was able to recover without an RTO (firing between 0.2 and 1 sec later). Windows 7 was particularly peculiar, as it starts shifting the original segments after the 2nd or 3rd dropped segment; it seems to retransmit 1/2, 1, 1, 1, 1/2 segments when a contiguous hole > 1 segment is announced by SACK... :) But my code still drops the segment containing the 258th sequence number again, leading even Win7 to an RTO...

And, on another front, I have checked a few systems in the field (our gear typically runs in high-speed (1/10 Gbps) LANs); I found one example where nearly 50% of the retransmissions were followed by an RTO, and even the less loaded systems showed a quite high number of RTOs (15-35%) after retransmissions.
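The drop schedule above can be expressed as a small filter. This is only a hypothetical sketch of the idea - the names and the Counter-based bookkeeping are mine, not taken from the actual forging tool:

```python
from collections import Counter

# Segment number -> how many times to drop that segment when it appears
# in the stream (segment 258 is dropped twice: original + retransmission).
DROP_PLAN = {200: 1, 250: 1, 253: 1, 255: 1, 257: 1,
             258: 2, 259: 1, 260: 1, 265: 1, 267: 1}

def make_dropper(plan):
    """Return a predicate deciding whether this occurrence of segment n
    should be dropped, counting how often n has been seen so far."""
    seen = Counter()
    def should_drop(n):
        seen[n] += 1
        return seen[n] <= plan.get(n, 0)
    return should_drop
```

With this plan, the first two occurrences of segment 258 are dropped (the original and the fast retransmit), so only a third, RTO-driven transmission gets through - which is exactly the situation the test is meant to provoke.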
I assume at this point that only a minority of the RTOs is "legitimate", in the sense that

*) the TCP session is not running with SACK, or
*) the client was forcefully removed from the network (loss of connectivity),

which leaves probably between 70 and 95% of the RTO events as "burst loss" candidates, where keeping the DUPACK detection armed during FastRetransmit would help.

I will see to it that I get statistically more relevant data, and also put this into context (i.e. total segments transmitted per week vs. total retransmitted segments per week vs. retransmit timeout events per week). (Actually, I got scared at first when I saw that high-load system reporting 50% of all retransmissions being followed by RTOs... :) )

Richard Scheffenegger
Field Escalation Engineer
NetApp Global Support
NetApp
+43 1 3676811 3146 Office (2143 3146 - internal)
+43 676 654 3146 Mobile
www.netapp.com
Franz-Klein-Gasse 5
1190 Wien

-----Original Message-----
From: Scheffenegger, Richard
Sent: Montag, 9. November 2009 18:27
To: Alexander Zimmermann
Cc: tcpm@ietf.org
Subject: Re: [tcpm] Detect Lost Retransmit with SACK

Hi Alexander,

Thanks for the welcome :) I'll fork another thread for the LimitedTransport||FastRecovery / ABC interaction...

I will try to sketch up an example to demonstrate the problem I'm trying to address:

Let's assume the cwnd is already open to at least 7 segments before the segment with sequence number 10000 becomes the first one to be dropped by the network. Also, let's assume that FastRetransmit runs from the left edge of the leftmost hole (SND.UNA) upwards, and that per ACK only a single segment is sent.
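For reference while reading the tables below, the sender-side bookkeeping assumed here (cumulative ACK = SND.UNA, SACK blocks, holes walked lowest-first) can be modeled with a toy helper. Names and layout are my own, loosely following RFC 3517's terminology, not any stack's real code:

```python
MSS = 1000  # the example below uses 1000-byte segments

def hole_edges(snd_una, snd_nxt, sack_blocks, mss=MSS):
    """Left edges of the un-SACKed holes between SND.UNA and SND.NXT,
    lowest sequence number first - the order in which a fast retransmit
    run starting at the leftmost hole would walk them."""
    edges = []
    seq = snd_una
    for left, right in sorted(sack_blocks):
        while seq < left:             # everything below this SACK block
            edges.append(seq)         # is a missing segment
            seq += mss
        seq = max(seq, right)
    while seq < snd_nxt:              # tail above the highest SACK block
        edges.append(seq)
        seq += mss
    return edges
```

With ACK 10000, SND.NXT 18000 and one SACK block [15000,18000), the holes are segments 10000-14000, so the first retransmission goes out at SND.UNA - matching the first fast-retransmit round in the table.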
Triggering      ACK      Left     Right    Left     Right
Segment                  Edge 1   Edge 1   Edge 2   Edge 2
 9000           9000
10000 (lost)      *
11000 (lost)
12000 (lost)
13000 (lost)
14000 (lost)
15000          10000    15000    16000
16000          10000    15000    17000
17000          10000    15000    18000

                3 ACKs trigger fast retransmit
10000 (lost again)
11000          10000    11000    12000    15000    18000
12000          10000    11000    13000    15000    18000
13000          10000    11000    14000    15000    18000

-> Here we again have 3 ACKs indicating another loss, this time of one of the retransmitted packets: the leftmost hole did not change, while the overall number of un-SACKed octets in the holes decreased for 3 consecutive ACKs (4, 3 and 2 segments still missing).

Current behaviour of the investigated TCP stacks:

14000          10000    11000    18000    (normal transmit resumes)
18000          10000    11000    19000
19000          10000    11000    20000
20000          10000    11000    21000
21000          10000    11000    22000
22000          10000    11000    23000
  ::             ::       ::       ::

Eventually, the RTO trips, retransmitting the lost segment; this happens one RTO later, followed by slow-start...

50000          10000    11000    50000
  ::             ::
10000          50000

However, this can be somewhere between 0.2 and 1.0 sec later with a "fresh" TCP session (no prior connection properties known (cached) by the sender). Most likely the cwnd has filled up much sooner (as demonstrated, the problem seems to be most prominent in high-speed LANs), so that for nearly as long, no data is actually transmitted.
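The condition the arrow note points at - the cumulative ACK stuck at 10000 for three ACKs while the un-SACKed octets keep shrinking - could be checked roughly like this. A sketch with invented names, not RFC 3517 pseudocode:

```python
DUP_THRESH = 3

def hole_octets(cum_ack, sack_blocks):
    """Un-SACKed octets between the cumulative ACK and the highest
    SACKed sequence number."""
    sacked = sum(right - left for left, right in sack_blocks)
    highest = max((right for _, right in sack_blocks), default=cum_ack)
    return (highest - cum_ack) - sacked

def lost_retransmit(acks, dup_thresh=DUP_THRESH):
    """acks: sequence of (cum_ack, sack_blocks) pairs. True once
    dup_thresh consecutive ACKs keep the same cumulative ACK (the
    leftmost hole is not closing) while the holes keep draining
    (retransmits / new data behind it are still arriving)."""
    streak = 0
    prev_ack = prev_holes = None
    for cum_ack, blocks in acks:
        holes = hole_octets(cum_ack, blocks)
        if cum_ack == prev_ack and prev_holes is not None and holes < prev_holes:
            streak += 1
            if streak >= dup_thresh:
                return True
        else:
            streak = 0
        prev_ack, prev_holes = cum_ack, holes
    return False
```

Feeding in the six SACK-carrying ACKs from the table, the three ACKs following the (again lost) retransmission of 10000 shrink the holes from 4 to 3 to 2 segments while the ACK stays at 10000, tripping the detector exactly where the arrow note says another loss is indicated.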
Proposed behaviour:

Triggering      ACK      Left     Right    Left     Right
Segment                  Edge 1   Edge 1   Edge 2   Edge 2
 9000           9000
10000 (lost)      *
11000 (lost)
12000 (lost)
13000 (lost)
14000 (lost)
15000          10000    15000    16000
16000          10000    15000    17000
17000          10000    15000    18000

                3 ACKs trigger fast retransmit
10000 (lost again) *
11000          10000    11000    12000    15000    18000
12000          10000    11000    13000    15000    18000
13000          10000    11000    14000    15000    18000

Once the ACK + SACK options indicate that the leftmost hole is not shrinking while the SACKed octets are increasing (to deal with clients which send one retransmission segment and one new segment interleaved, with multiple holes being filled, with network reordering, or with some more segments getting lost again): reset the Rexmit vector to the beginning of the hole list (SND.UNA), clear the duplicate counter (just in case one segment gets lost yet again during retransmit), and keep the DUPACK detection logic armed...

Also, this reaction should not occur before 1 RTT has passed - so the ACKs subsequent to the three which indicated the "lost again" segment will ensure (in the typical case) that no segments are retransmitted needlessly. ACK processing has to occur before deciding which segment (retransmit / new) to send next; holes will then be marked fully retransmitted before the 2nd retransmission round advances to them.

10000          14000    15000    18000
14000          18000             (normal transmit resumes, but with cwnd
                                 shrunk by 2 congestion events)
18000          19000
19000          20000

And yes, I was unclear in my use of the terminology; I should probably have said "pipe" instead of "cwnd" below, as cwnd is not touched during LimitedTransmit / FastRetransmit...

Richard Scheffenegger
Field Escalation Engineer
NetApp Global Support
NetApp
+43 1 3676811 3146 Office (2143 3146 - internal)
+43 676 654 3146 Mobile
www.netapp.com
Franz-Klein-Gasse 5
1190 Wien

-----Original Message-----
From: Alexander Zimmermann [mailto:zimmermann@nets.rwth-aachen.de]
Sent: Montag, 9. November 2009 16:26
To: Scheffenegger, Richard
Cc: tcpm@ietf.org Extensions WG
Subject: Detect Lost Retransmit with SACK

Hi Richard,

firstly, welcome on the list :-)

Since your question is not really related to the poll, I changed the title... Comments inline.

Am 09.11.2009 um 13:57 schrieb Scheffenegger, Richard:

> Hi Alexander et al.,
>
> This will be the first post to this group, so excuse me if I act
> inappropriately.
>
> I'm curious about one little tidbit which has been bugging me for the
> better part of the last two months, and which is closely related to
> TCP SACK operations (thus it might belong in this thread?)
>
> The implicit assumption for TCP fast recovery is that packet loss
> happens randomly (i.e. to different segments each time), with low
> correlation between the drop events. Also, a drop event is used as an
> implicit signal to indicate congestion. So far, so good.
>
> It seems to me that the focus of most developments has been the
> internet environment - where statistical assumptions like those
> mentioned above arguably hold true.
>
> However, certain high-speed LANs seem to exhibit characteristics
> which don't play well with these implicit assumptions (uncorrelated
> packet loss) - the smaller the network, the more deviation from a
> "good, seasoned" link (exhibiting some form of congestion) is likely
> to occur.
>
> Also, as has been noted in prior research, many internet routers
> use more "TCP-friendly" RED or WRED queue policies over the
> simplistic TailDrop most often encountered in LANs (the default
> policy of L2 switches and L3 routers).
>
> In one extreme, I have found a (misbehaving?) TCP stack/host which
> sends out a burst of segments (4-6) at 10GbE wirespeed, which
> immediately causes queue buffer overload and TailDrop in the
> first-hop L2 switch when two such high-performance hosts try to
> establish a high-speed communication.
> In other words, the hosts themselves seem to make sure that there is
> a high correlation between TCP (fast) recovery and further packet
> loss.
>
> But what puzzles me the most: even with SACK-enabled TCP stacks,
> virtually no implementation can detect / act upon detection of the
> loss of a retransmitted segment during fast recovery. This despite
> the fact that the stipulations in RFC 3517 require the receiver to
> make the information needed to detect such an event implicitly
> available to the sender: the first SACK option has to reflect the
> last segment which triggered this SACK.
>
> Together with the scoreboard held at the sender, it should be rather
> easy to find out if the left edge of the lowest hole (relative to
> stream octets) closes.

What do you mean by "left edge of the lowest hole"? Do you mean SND.UNA? If the ACK covers SND.UNA, then it is a cumulative ACK.

> If that left edge stays constant for "DupThresh" number of ACKs
> which reduce the overall number of octets in holes (any one hole
> might close due to the retransmitted packets still being received),
> AND the sender retransmits beginning with the lowest hole first,
> this would be a clear indication of another segment retransmit
> loss...

Sorry, I don't understand. If we have 20 segments in flight and one segment gets lost, you will retransmit the oldest outstanding segment after 3 DUPACKs. Then, assuming no reordering and no further loss, you will get 17 DUPACKs (without Limited Transmit) before your hole is closed. What do I miss here? Can you give me an example?

> Even a less speedy detection logic would work for SACK-enabled
> sessions: once fast recovery is finished from the sender's point of
> view, if the receiver still complains about missing segments
> (indicated by the rightmost edge of the first-slot SACK option being
> at a segment higher than when fast recovery started), another round
> of fast recovery could be invoked, rather than waiting for the RTO.
> Of course, the first approach would be better for low-cwnd sessions
> with only very few segments in transit - and both could be combined
> with the proposed SACK recovery speed-ups... (reducing DupThresh for
> low-cwnd sessions / when little data is being sent).
>
> Congestion control should react to this event (it will now, but only
> one RTO later...), and the SACK retransmit vector (HighRxt) should
> be reset, using LimitedTransmit for sending out the retransmission
> segments once cwnd + pipe allows; any retransmitted segments still
> in the network will close their respective SACK holes before the new
> HighRxt advances to them.
>
> And, the RTO should be reduced (I guess to nearly zero, between
> SACK-enabled hosts).
>
> I have run numerous tests to check the behaviour of different TCP
> stacks (FreeBSD 4.2 - 8.0; Windows XP, Vista, 7, 2003; Linux 2.6.16
> and others).
>
> All these stacks seem to exhibit this issue. What I don't know yet
> is the percentage of multi-loss segment events triggering an RTO -
> but I assume that the majority of RTOs happen because of this.
>
> In LAN environments (i.e. 10 GbE over 1 km @ 2 ms latency due to the
> L2 hops in between) featuring relatively few streams, the effect of
> any single RTO can be quite tremendous - taking considerable
> theoretical bandwidth away from the session (i.e. a 1 sec minimum
> RTO equals 1.2 GB; even with more recent RTO values around 0.2 - 0.4
> sec, each RTO is still a few hundred MB of "lost" capacity under
> optimal circumstances).
>
> Nevertheless, I can't imagine that I am the first one to bring up
> this issue (despite having failed to find any study of this
> effect). :)
>
> One more clarification, which came up after I looked at the FreeBSD
> implementation of Limited Transmit; this might be a nit-pick, but
> when RFC 3042 is active, shouldn't ABC also be used during
> LimitedTransmit / FastRecovery?

Why? One reason for ABC is lying receivers (ACK division).
So the worst case is slow-start...

> (FreeBSD MAIN is increasing cwnd by 1 MSS for each new ACK, instead
> of by the amount of data in that ACK...)

What do you describe here? Slow-start? RFC 3042 says: "The congestion window (cwnd) MUST NOT be changed when these new segments are transmitted."

> Thanks a lot!

Best regards,
Alex

> Richard Scheffenegger
> Field Escalation Engineer
> NetApp Global Support
> NetApp
> +43 1 3676811 3146 Office (2143 3146 - internal)
> +43 676 654 3146 Mobile
> www.netapp.com <BLOCKED::http://www.netapp.com/>
> Franz-Klein-Gasse 5
> 1190 Wien
>
> * To: "tcpm at ietf.org <mailto:tcpm@DOMAIN.HIDDEN> WG Extensions"
>   <tcpm at ietf.org <mailto:tcpm@DOMAIN.HIDDEN>>
> * Subject: [tcpm] Should draft-ietf-tcpm-sack-recovery-entry update
>   RFC 3717 (SACK-TCP)
> * From: Alexander Zimmermann <alexander.zimmermann at nets.rwth-aachen.de
>   <mailto:alexander.zimmermann@DOMAIN.HIDDEN>>
> * Date: Wed, 21 Oct 2009 12:22:50 +0200
>
> _____
>
> Hi folks,
>
> based on the fact that the draft "draft-ietf-tcpm-sack-recovery-entry"
> is now adopted as a WG item and intended to be a "standards track"
> document, I would like to start a poll/discussion on whether the
> draft should update RFC 3517 or not. Moreover, should we produce a
> separate document or an update of RFC 3517?
>
> a) separate document, do not update RFC 3517
> b) separate document, update RFC 3517
> c) RFC3517bis, obsolete RFC 3517
>
> //
> // Dipl.-Inform. Alexander Zimmermann
> // Department of Computer Science, Informatik 4
> // RWTH Aachen University
> // Ahornstr. 55, 52056 Aachen, Germany
> // phone: (49-241) 80-21422, fax: (49-241) 80-22220
> // email: zimmermann at cs.rwth-aachen.de
> // web: http://www.umic-mesh.net
> //
>
> _______________________________________________
> tcpm mailing list
> tcpm@ietf.org
> https://www.ietf.org/mailman/listinfo/tcpm

//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22220
// email: zimmermann@cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//

_______________________________________________
tcpm mailing list
tcpm@ietf.org
https://www.ietf.org/mailman/listinfo/tcpm
- [tcpm] Detect Lost Retransmit with SACK Alexander Zimmermann
- Re: [tcpm] Detect Lost Retransmit with SACK Scheffenegger, Richard
- [tcpm] LimitedTransport||FastRecovery / ABC inter… Scheffenegger, Richard
- Re: [tcpm] Detect Lost Retransmit with SACK Scheffenegger, Richard
- Re: [tcpm] Detect Lost Retransmit with SACK Alexander Zimmermann
- Re: [tcpm] Detect Lost Retransmit with SACK Scheffenegger, Richard
- Re: [tcpm] Detect Lost Retransmit with SACK Alexander Zimmermann
- Re: [tcpm] Detect Lost Retransmit with SACK Ilpo Järvinen