[tcpm] Detect Lost Retransmit with SACK
Alexander Zimmermann <alexander.zimmermann@nets.rwth-aachen.de> Mon, 09 November 2009 15:27 UTC
Return-Path: <alexander.zimmermann@nets.rwth-aachen.de>
X-Original-To: tcpm@core3.amsl.com
Delivered-To: tcpm@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 8DD013A69A1 for <tcpm@core3.amsl.com>; Mon, 9 Nov 2009 07:27:23 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -3.248
X-Spam-Level:
X-Spam-Status: No, score=-3.248 tagged_above=-999 required=5 tests=[AWL=0.953, BAYES_00=-2.599, HELO_EQ_DE=0.35, HELO_MISMATCH_DE=1.448, J_CHICKENPOX_33=0.6, RCVD_IN_DNSWL_MED=-4]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id MJW-xsUPCKyY for <tcpm@core3.amsl.com>; Mon, 9 Nov 2009 07:27:22 -0800 (PST)
Received: from mta-2.ms.rz.rwth-aachen.de (mta-2.ms.rz.RWTH-Aachen.DE [134.130.7.73]) by core3.amsl.com (Postfix) with ESMTP id D98483A6B61 for <tcpm@ietf.org>; Mon, 9 Nov 2009 07:27:15 -0800 (PST)
MIME-version: 1.0
Content-type: text/plain; charset="utf-8"; format="flowed"; delsp="yes"
Received: from ironport-out-1.rz.rwth-aachen.de ([134.130.5.40]) by mta-2.ms.rz.RWTH-Aachen.de (Sun Java(tm) System Messaging Server 6.3-7.04 (built Sep 26 2008)) with ESMTP id <0KSU00EP0LM5LEF0@mta-2.ms.rz.RWTH-Aachen.de> for tcpm@ietf.org; Mon, 09 Nov 2009 16:27:41 +0100 (CET)
X-IronPort-AV: E=Sophos;i="4.44,708,1249250400"; d="scan'208";a="33057856"
Received: from relay-auth-2.ms.rz.rwth-aachen.de (HELO relay-auth-2) ([134.130.7.79]) by ironport-in-1.rz.rwth-aachen.de with ESMTP; Mon, 09 Nov 2009 16:27:41 +0100
Received: from miami.nets.rwth-aachen.de ([unknown] [137.226.12.180]) by relay-auth-2.ms.rz.rwth-aachen.de (Sun Java(tm) System Messaging Server 7.0-3.01 64bit (built Dec 9 2008)) with ESMTPA id <0KSU00JNWLM5HI60@relay-auth-2.ms.rz.rwth-aachen.de> for tcpm@ietf.org; Mon, 09 Nov 2009 16:27:41 +0100 (CET)
From: Alexander Zimmermann <alexander.zimmermann@nets.rwth-aachen.de>
Content-transfer-encoding: quoted-printable
Date: Mon, 09 Nov 2009 16:27:42 +0100
Message-id: <2D6C2761-5FC9-46E3-95D4-FDE632D4469C@nets.rwth-aachen.de>
To: Richard Scheffenegger <rs@netapp.com>
X-Mailer: Apple Mail (2.1076)
Cc: "tcpm@ietf.org Extensions WG" <tcpm@ietf.org>
Subject: [tcpm] Detect Lost Retransmit with SACK
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/tcpm>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 09 Nov 2009 15:27:23 -0000
Hi Richard,

first of all, welcome to the list :-) Since your question is not really
related to the poll, I have changed the title... Comments inline.

On 09.11.2009, at 13:57, Scheffenegger, Richard wrote:

> Hi Alexander et al.,
>
> This will be my first post to this group, so excuse me if I act
> inappropriately.
>
> I'm curious about one little tidbit which has been bugging me for the
> better part of the last two months, and which is closely related to
> TCP SACK operations (thus it might belong to this thread?).
>
> The implicit assumption behind TCP fast recovery is that packet loss
> happens randomly (i.e. to different segments each time), with low
> correlation between the drop events. Also, a drop event is used as an
> implicit congestion signal. So far, so good.
>
> It seems to me that the focus of most development has been the
> Internet environment - where statistical assumptions like the above
> arguably hold true.
>
> However, certain high-speed LANs seem to exhibit characteristics
> which don't play well with these implicit assumptions (uncorrelated
> packet loss) - the smaller the network, the more deviation from a
> "well-seasoned" link (exhibiting some form of congestion) is likely
> to occur.
>
> Also, as has been noted in prior research, many Internet routers use
> more "TCP-friendly" RED or WRED queue policies rather than the
> simplistic tail drop most often encountered in LANs (the default
> policy of L2 switches and L3 routers).
>
> In one extreme case, I have found a (misbehaving?) TCP stack/host
> which sends out a burst of segments (4-6) at 10GbE wire speed,
> immediately causing queue buffer overload and tail drop in the
> first-hop L2 switch when two such high-performance hosts try to
> establish a high-speed connection. In other words, the hosts
> themselves seem to make sure that there is a high correlation between
> TCP (fast) recovery and further packet loss.
> But what puzzles me the most: even with SACK-enabled TCP stacks,
> virtually no implementation can detect, or act upon detecting, the
> loss of a retransmitted segment during fast recovery. This despite
> the fact that RFC 3517 requires the receiver to make the information
> needed to detect such an event implicitly available to the sender:
> the first SACK block has to reflect the last segment that triggered
> this SACK.
>
> Together with the scoreboard held at the sender, it should be rather
> easy to find out whether the left edge of the lowest hole (relative
> to stream octets) closes.

What do you mean by "left edge of the lowest hole"? Do you mean
SND.UNA? If the ACK covers SND.UNA, then it is a cumulative ACK.

> If that left edge stays constant for "DupThresh" number of ACKs which
> reduce the overall number of octets in holes (any other hole might
> close as retransmitted packets are still received), AND the sender
> retransmits beginning with the lowest hole first, this would be a
> clear indication of another lost retransmission...

Sorry, I don't understand. If we have 20 segments in flight and one
segment gets lost, you will retransmit the oldest outstanding segment
after 3 DUPACKs. Then, assuming no reordering and no further loss, you
will get 17 DUPACKs (without Limited Transmit) before your hole is
closed. What am I missing here? Can you give me an example?

> Even a less speedy detection logic would work for SACK-enabled
> sessions: once fast recovery is finished from the sender's point of
> view, if the receiver still complains about missing segments
> (indicated by the rightmost edge of the first SACK block being at a
> segment higher than when fast recovery started), another round of
> fast recovery could be invoked, rather than waiting for the RTO.
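Richard's first heuristic could be sketched roughly like this (a
deliberately simplified, hypothetical scoreboard check; the class and
names are illustrative, not taken from any real stack):

```python
DUP_THRESH = 3  # standard duplicate-ACK threshold


class LostRetransmitDetector:
    """Sketch of the proposed heuristic: if the left edge of the lowest
    SACK hole stays constant over DUP_THRESH consecutive ACKs that each
    still reduce the total number of holed octets, suspect that the
    retransmission filling the lowest hole was itself lost."""

    def __init__(self):
        self.stuck_acks = 0
        self.prev_left_edge = None
        self.prev_holed_octets = None

    def on_sack_ack(self, lowest_hole_left_edge, total_holed_octets):
        """Feed one incoming SACK-bearing ACK; returns True when a lost
        retransmit is suspected."""
        if (self.prev_left_edge == lowest_hole_left_edge
                and self.prev_holed_octets is not None
                and total_holed_octets < self.prev_holed_octets):
            # Other holes keep closing (data is still arriving), but the
            # oldest gap does not move: the retransmit may be lost.
            self.stuck_acks += 1
        else:
            # Left edge advanced (or total holed octets did not shrink):
            # start counting again.
            self.stuck_acks = 0
        self.prev_left_edge = lowest_hole_left_edge
        self.prev_holed_octets = total_holed_octets
        return self.stuck_acks >= DUP_THRESH
```

In a real stack this check would run against the RFC 3517 scoreboard,
and a positive result would presumably trigger another retransmission of
the lowest hole instead of waiting for the RTO.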
> Of course, the first approach would be better for low-cwnd sessions
> with only very few segments in transit - and both could be combined
> with the proposed SACK recovery speed-ups... (reducing DupThresh for
> low-cwnd sessions / when little data is being sent).
>
> Congestion control should react to this event (it does now, but only
> one RTO later...), and the SACK retransmission pointer (HighRxt)
> should be reset, using Limited Transmit to send out the retransmitted
> segments once cwnd + pipe allows; any retransmitted segments still in
> the network will close their respective SACK holes before the new
> HighRxt advances to them.
>
> And the RTO should be reduced (I guess to nearly zero between
> SACK-enabled hosts).
>
> I have run numerous tests to check the behavior of different TCP
> stacks (FreeBSD 4.2 - 8.0; Windows XP, Vista, 7, 2003; Linux 2.6.16
> and others).
>
> All these stacks seem to exhibit this issue. What I don't know yet is
> the percentage of multi-segment-loss events triggering an RTO - but I
> assume that the majority of RTOs happen because of this.
>
> In LAN environments (i.e. 10GbE over 1 km at 2 ms latency due to the
> L2 hops in between) featuring relatively few streams, the effect of
> any single RTO can be quite tremendous - taking considerable
> theoretical bandwidth away from the session (i.e. a 1 sec minimum RTO
> equals 1.2 GB; even with more recent RTO values around 0.2 - 0.4 sec,
> each RTO still costs a few hundred MB of "lost" capacity under
> optimal circumstances).
>
> Nevertheless, I can't imagine that I am the first one to bring up
> this issue (despite having failed to find any study of this
> effect). :)
>
> One more clarification, which came up after I looked at the FreeBSD
> implementation of Limited Transmit; this might be a nit-pick, but
> when RFC 3042 is active, shouldn't ABC also be used during Limited
> Transmit / fast recovery?

Why? One reason for ABC is lying receivers (ACK division).
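To illustrate the ACK-division concern: a toy comparison (purely
illustrative code; the MSS value and function are assumptions, not
FreeBSD internals) of per-ACK cwnd growth versus RFC 3465-style byte
counting when a receiver splits one segment's worth of data into many
tiny ACKs:

```python
MSS = 1460  # assumed Ethernet-typical maximum segment size


def cwnd_after_acks(cwnd, acked_bytes_per_ack, use_abc):
    """Slow-start cwnd growth over a series of ACKs, with and without
    Appropriate Byte Counting (RFC 3465, limit L = 1*MSS for simplicity)."""
    for acked in acked_bytes_per_ack:
        if use_abc:
            cwnd += min(acked, MSS)  # credit only the bytes actually ACKed
        else:
            cwnd += MSS              # one full MSS per ACK, regardless
    return cwnd


# ACK division: a lying receiver ACKs one 1460-byte segment in ten
# 146-byte slices instead of one cumulative ACK.
tiny_acks = [MSS // 10] * 10
print(cwnd_after_acks(10 * MSS, tiny_acks, use_abc=False))  # 29200: inflated by 10 MSS
print(cwnd_after_acks(10 * MSS, tiny_acks, use_abc=True))   # 16060: grows by only 1 MSS
```

With byte counting, the worst a misbehaving receiver can achieve is
cwnd growth at the normal slow-start rate, which is the point of the
"worst case is slow start" remark.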
So, the worst case is slow start...

> (FreeBSD MAIN is increasing cwnd by 1 MSS for each new ACK, instead
> of by the amount of data in that ACK...)

What do you describe here? Slow start? RFC 3042 says: "The congestion
window (cwnd) MUST NOT be changed when these new segments are
transmitted."

> Thanks a lot!
>
> Best regards, Alex
>
> Richard Scheffenegger
> Field Escalation Engineer
> NetApp Global Support
> NetApp
> +43 1 3676811 3146 Office (2143 3146 - internal)
> +43 676 654 3146 Mobile
> www.netapp.com
> Franz-Klein-Gasse 5
> 1190 Wien
>
> * To: "tcpm at ietf.org <mailto:tcpm@DOMAIN.HIDDEN> WG Extensions"
>   <tcpm at ietf.org <mailto:tcpm@DOMAIN.HIDDEN>>
> * Subject: [tcpm] Should draft-ietf-tcpm-sack-recovery-entry update
>   RFC 3717 (SACK-TCP)
> * From: Alexander Zimmermann <alexander.zimmermann at nets.rwth-aachen.de
>   <mailto:alexander.zimmermann@DOMAIN.HIDDEN>>
> * Date: Wed, 21 Oct 2009 12:22:50 +0200
>
> _____
>
> Hi folks,
>
> based on the fact that the draft "draft-ietf-tcpm-sack-recovery-entry"
> is now adopted as a WG item and intended to be a "standards track"
> document, I would like to start a poll/discussion on whether the draft
> should update RFC 3517 or not. Moreover, should we produce a separate
> document or an update of RFC 3517?
>
> a) separate document, do not update RFC 3517
> b) separate document, update RFC 3517
> c) RFC3517bis, obsolete RFC 3517
>
> //
> // Dipl.-Inform. Alexander Zimmermann
> // Department of Computer Science, Informatik 4
> // RWTH Aachen University
> // Ahornstr. 55, 52056 Aachen, Germany
> // phone: (49-241) 80-21422, fax: (49-241) 80-22220
> // email: zimmermann at cs.rwth-aachen.de
> // web: http://www.umic-mesh.net
> //
>
> _______________________________________________
> tcpm mailing list
> tcpm@ietf.org
> https://www.ietf.org/mailman/listinfo/tcpm

//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22220
// email: zimmermann@cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//
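Richard's back-of-the-envelope capacity figures (1.2 GB per 1 s RTO at
10GbE) are simple rate-times-time arithmetic; a quick check, assuming
the link would otherwise be fully utilized:

```python
def rto_capacity_cost_bytes(link_bps, rto_sec):
    """Theoretical link capacity (in bytes) left idle while one RTO expires."""
    return link_bps / 8 * rto_sec


TEN_GBE = 10e9  # 10 GbE, in bits per second

print(rto_capacity_cost_bytes(TEN_GBE, 1.0))  # 1 s minimum RTO: 1.25e9 bytes (~1.2 GB)
print(rto_capacity_cost_bytes(TEN_GBE, 0.3))  # 0.2-0.4 s RTOs: a few hundred MB each
```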
- [tcpm] Detect Lost Retransmit with SACK Alexander Zimmermann
- Re: [tcpm] Detect Lost Retransmit with SACK Scheffenegger, Richard
- [tcpm] LimitedTransport||FastRecovery / ABC inter… Scheffenegger, Richard
- Re: [tcpm] Detect Lost Retransmit with SACK Scheffenegger, Richard
- Re: [tcpm] Detect Lost Retransmit with SACK Alexander Zimmermann
- Re: [tcpm] Detect Lost Retransmit with SACK Scheffenegger, Richard
- Re: [tcpm] Detect Lost Retransmit with SACK Alexander Zimmermann
- Re: [tcpm] Detect Lost Retransmit with SACK Ilpo Järvinen