Re: [tcpm] An issue in RFC3517? (Re: end-of-stream loss recovery (TCP SACK) )

"Scheffenegger, Richard" <rs@netapp.com> Sat, 30 April 2011 09:11 UTC

From: "Scheffenegger, Richard" <rs@netapp.com>
To: mallman@icir.org, Yoshifumi Nishida <nishida@sfc.wide.ad.jp>
Cc: tcpm@ietf.org
Subject: Re: [tcpm] An issue in RFC3517? (Re: end-of-stream loss recovery (TCP SACK) )

Hi,

I have been thinking about this a little bit more.

There may be one peculiar corner case with the new Rule 4. I believe it is probably out of scope for 3517bis, but I would like to point it out nevertheless.

>   (4) If the conditions for rules (1), (2) and (3) fail, but there
>       exists outstanding and unSACKed data we provide the opportunity
>       for a single "rescue" retransmission.  If HighACK is greater than
>       RescueRxt then one segment of up to SMSS octets that includes the
>       highest outstanding unSACKed sequence number SHOULD be returned.
>       Further, RescueRxt MUST be set to RecoveryPoint.  Finally,
>       HighRxt MUST NOT be updated.

The issue revolves around the observation that with tail-drop there are often periods of burst loss (a number of consecutive packets of a session get lost). During a good fraction of these events, some of the retransmitted packets still encounter the network condition that led to the original loss, and are lost too. Unfortunately, the very first retransmitted packet (the one that would advance HighACK) is particularly vulnerable to this: when the sender(s) with the lowest RTT start retransmitting, the other senders sharing the bottleneck may not yet have reduced their sending rate enough.

Given the following events (XX marks a lost segment; the second row shows the retransmissions):

first transmission:  S0 XX S2 S3 S4 XX S6 XX XX | (end of stream)
retransmission:         XX          S5

Rule 4 would not trigger, even though another retransmitted segment (S5) was successfully received: the retransmission of S1 was lost, so HighACK never advances beyond RescueRxt. S1, S7 and S8 would therefore require an RTO to recover.
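
The arithmetic behind this can be checked with a minimal Python sketch. It assumes sequence space counted in whole segments, takes HighACK to be the cumulative ACK value (the next segment the receiver expects), and sets RescueRxt to the first retransmitted segment as in Mark's proposed fix quoted below. The received set simply encodes the scenario above, and the helper name is illustrative only.

```python
from typing import Set


def cumulative_ack(received: Set[int]) -> int:
    """Next segment the receiver expects, i.e. the lowest one not yet received."""
    n = 0
    while n in received:
        n += 1
    return n


# Segments S0..S8, end of stream after S8.  Original transmissions that arrived:
received = {0, 2, 3, 4, 6}          # S1, S5, S7 and S8 were lost
rescue_rxt = 1                      # first retransmission is S1, so RescueRxt = 1
received |= {5}                     # retransmitted S1 is lost again; S5 arrives

high_ack = cumulative_ack(received)              # still 1: S1 is missing
sacked = sorted(s for s in received if s > high_ack)
rule4_fires = high_ack > rescue_rxt
print(high_ack, sacked, rule4_fires)             # 1 [2, 3, 4, 5, 6] False
```

Even though the scoreboard has grown by one segment (S5), the guard compares only cumulative progress, so the rescue never goes out.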

Anyway, lost retransmissions themselves are out of scope for 3517 (and a lost retransmission is at the root of this corner case), so I don't think this particular instance has to be addressed there. However, if someone implements Rule 4 above in a stack that does lost-retransmission recovery (e.g., Linux), the condition "HighACK is greater than RescueRxt" may need to be expanded to take SACKed octets into account.

For example, if a newly arrived ACK SACKs octets below FACK (the highest SACKed sequence number) without advancing FACK, i.e., some hole in the sender's SACK data structure closes at least partially, that could also be a valid trigger for Rule 4. Relying on SACK information alone, such a heuristic could still be triggered by delayed original segments as well as by retransmissions that were actually delivered. But the case has been made that a single, possibly unnecessary, retransmission (of S8 in the above example) would be acceptable.
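
One possible reading of this heuristic, sketched in Python: SACK blocks are treated as half-open octet ranges, FACK is the end of the highest block, and the trigger fires when an ACK covers previously unSACKed octets below FACK without moving FACK forward. The function names and the 1000-octet segment size are assumptions for illustration, and, as noted above, the check cannot tell a delayed original segment from a delivered retransmission.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # half-open range [start, end) of SACKed octets


def covered(blocks: List[Block]) -> int:
    """Number of distinct octets covered by possibly overlapping blocks."""
    total, cur_end = 0, None
    for start, end in sorted(blocks):
        if cur_end is None or start >= cur_end:
            total += end - start          # disjoint from what we have seen
            cur_end = end
        elif end > cur_end:
            total += end - cur_end        # overlaps, but extends further
            cur_end = end
    return total


def fack(blocks: List[Block]) -> int:
    """Forward-most SACKed octet (end of the highest block), 0 if none."""
    return max((end for _, end in blocks), default=0)


def hole_filled_below_fack(scoreboard: List[Block], ack_blocks: List[Block]) -> bool:
    """Hypothetical extra Rule 4 trigger: the new ACK SACKs previously
    unSACKed octets below FACK without moving FACK forward."""
    merged = scoreboard + ack_blocks
    return fack(merged) == fack(scoreboard) and covered(merged) > covered(scoreboard)


# The example above with 1000-octet segments: S2-S4 and S6 already SACKed.
board = [(2000, 5000), (6000, 7000)]
print(hole_filled_below_fack(board, [(2000, 7000)]))  # True: S5 filled a hole
print(hole_filled_below_fack(board, [(6000, 7000)]))  # False: nothing new
```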




Richard Scheffenegger


> -----Original Message-----
> From: Mark Allman [mailto:mallman@icir.org]
> Sent: Monday, 18 April 2011 19:51
> To: Yoshifumi Nishida
> Cc: tcpm@ietf.org
> Subject: Re: [tcpm] An issue in RFC3517? (Re: end-of-stream loss
> recovery (TCP SACK) )
> 
> 
> > Filename:        draft-nishida-tcpm-rescue-retransmission
> 
> First, I must say it's a nicely clever trick, Yoshifumi.  The neat
> observation is that by sending one segment per loss event you can
> elicit the information needed to let the standard 3517 algorithm fix
> end-of-window loss.
> 
> The basic idea here is that when you're stuck and cannot send anything
> according to RFC3517 (and the bis) the algorithm is extended to allow
> sending the last unSACKed packet again.  This, in turn, triggers an ACK
> with new SACK information that can further prod the loss recovery
> process.  (Details are in Yoshifumi's draft).
> 
> Second, the proposed fix isn't quite right, IMO.  Consider the case of
> sending 6 segments (cwnd=6), losing only segment #1 and having no
> further data to send.  So, this is what arrives at and is sent by the
> data sender:
> 
>     1.  ACK 1, SACK 2:2
>         [do nothing]
>     2.  ACK 1, SACK 2:3
>         [do nothing]
>     3.  ACK 1, SACK 2:4
>         --> cwnd = FlightSize / 2 = 3, pipe = 2 (segs 5 & 6)
>         --> retransmit segment 1, pipe = 3 (segs 1(rxt), 5 & 6)
>     4.  ACK 1, SACK 2:5
>         --> pipe = 2 (segs 1(rxt) & 6)
>         --> we have no data judged as having been lost ((1) in
>             NextSeg())
>         --> we have no new data to send ((2) in NextSeg())
>         --> we have no "last resort" data to send ((3) in NextSeg(),
>             i.e., unSACKed data not judged as lost, but with SACKed
>             data above it)
>         --> so, we invoke the proposed (4) in NextSeg() and transmit
>             the segment at the end of the window, or segment 6
>     5.  ACK 1, SACK 2:6
>     6.  ACK 7
>     7.  ACK 7, (DSACK 6, if supported)
> 
> In this case we simply don't wait long enough before sending this rescue
> packet (step 4 above) and so we really have no idea if the end of the
> window has been lost or not (and, in the above example it has not and
> so the retransmit is spurious).  In step 4 above we cannot possibly
> have gained an understanding about the end of the window.
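
The trace can be replayed with a small Python model of SetPipe() and NextSeg(), to show that an unguarded rule (4) fires at step 4. This is a sketch, not RFC 3517 code: it counts in whole segments, assumes DupThresh = 3, simplifies IsLost() to "at least DupThresh SACKed segments above", and omits rule (2) because there is no new data to send. All function names are hypothetical.

```python
from typing import Optional, Set

DUP_THRESH = 3  # assumed DupThresh, counted in segments


def is_lost(seg: int, sacked: Set[int]) -> bool:
    """Simplified IsLost(): at least DupThresh SACKed segments lie above seg."""
    return sum(1 for s in sacked if s > seg) >= DUP_THRESH


def set_pipe(high_ack: int, high_data: int, high_rxt: int, sacked: Set[int]) -> int:
    """SetPipe() in whole segments; high_ack is the next segment expected."""
    pipe = 0
    for seg in range(high_ack, high_data + 1):
        if seg in sacked:
            continue
        if not is_lost(seg, sacked):
            pipe += 1  # original transmission presumed still in flight
        if seg <= high_rxt:
            pipe += 1  # retransmission presumed still in flight
    return pipe


def next_seg_unguarded(high_ack: int, high_data: int, high_rxt: int,
                       sacked: Set[int]) -> Optional[int]:
    """NextSeg() rules (1) and (3), plus rule (4) WITHOUT the RescueRxt guard.
    Rule (2) is skipped because the sender has no new data in this example."""
    unsacked = [s for s in range(high_ack, high_data + 1) if s not in sacked]
    highest_sacked = max(sacked, default=0)
    for s in unsacked:  # rule (1): deemed lost and not yet retransmitted
        if s > high_rxt and is_lost(s, sacked):
            return s
    for s in unsacked:  # rule (3): "last resort", SACKed data above it
        if s > high_rxt and s < highest_sacked:
            return s
    return max(unsacked) if unsacked else None  # unguarded rule (4)


HIGH_DATA = 6
# Step 3: ACK 1, SACK 2:4 -> segment 1 is deemed lost and retransmitted.
sacked = {2, 3, 4}
print(set_pipe(1, HIGH_DATA, 0, sacked))             # 2 (segs 5 & 6)
print(next_seg_unguarded(1, HIGH_DATA, 0, sacked))   # 1
high_rxt = 1
print(set_pipe(1, HIGH_DATA, high_rxt, sacked))      # 3 (1 rxt, 5 & 6)
# Step 4: ACK 1, SACK 2:5 -> the unguarded rule (4) sends segment 6 too early.
sacked = {2, 3, 4, 5}
print(set_pipe(1, HIGH_DATA, high_rxt, sacked))      # 2 (1 rxt & 6)
print(next_seg_unguarded(1, HIGH_DATA, high_rxt, sacked))  # 6
```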
> 
> Third, this failure mode is readily fixed by forcing the rescue
> retransmit to happen an RTT after loss recovery has started (i.e., to
> ensure all ACKs for the window of data that experienced loss roll in
> first).  This is easy enough.  When loss recovery is initiated we set
> RescueRxt to the highest octet of the first retransmission.  Now, rule
> (4) in NextSeg() becomes:
> 
>   (4) If the conditions for rules (1), (2) and (3) fail, but there
>       exists outstanding and unSACKed data we provide the opportunity
>       for a single "rescue" retransmission.  If HighACK is greater than
>       RescueRxt then one segment of up to SMSS octets that includes the
>       highest outstanding unSACKed sequence number SHOULD be returned.
>       Further, RescueRxt MUST be set to RecoveryPoint.  Finally,
>       HighRxt MUST NOT be updated.
> 
> I.e., before we can send a rescue segment the first segment
> retransmitted must be ACKed (ensuring an RTT has passed).
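
A minimal Python sketch of the guarded rule (4), under similar simplifications: sequence space in whole segments, HighACK taken as the cumulative ACK value (as in "ACK 1" in the trace), and rules (1)-(3) assumed to have found nothing to send. The Recovery record and rescue_rule4() are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Recovery:
    """Hypothetical per-loss-event state, in whole segments."""
    recovery_point: int   # HighData when loss recovery started
    rescue_rxt: int       # initialized to the first retransmitted segment


def rescue_rule4(rec: Recovery, high_ack: int, high_data: int,
                 sacked: Set[int]) -> Optional[int]:
    """Rule (4) of NextSeg() with the HighACK > RescueRxt guard.
    high_ack is the next segment the receiver expects."""
    unsacked = [s for s in range(high_ack, high_data + 1) if s not in sacked]
    if unsacked and high_ack > rec.rescue_rxt:
        rec.rescue_rxt = rec.recovery_point  # allow one rescue per loss event
        return max(unsacked)                 # HighRxt is deliberately not updated
    return None


# Mark's example: recovery starts at step 3; the first retransmission is segment 1.
rec = Recovery(recovery_point=6, rescue_rxt=1)
# Step 4 (ACK 1, SACK 2:5): the retransmission of 1 is not yet ACKed, so wait.
print(rescue_rule4(rec, 1, 6, {2, 3, 4, 5}))   # None
# Variant where segment 6 really was lost: once the retransmission of 1 lands,
# ACK 6 arrives and the rescue fires, but only once.
print(rescue_rule4(rec, 6, 6, set()))          # 6
print(rescue_rule4(rec, 6, 6, set()))          # None
```

With the guard, step 4 of the trace no longer sends segment 6; a rescue goes out only after the first retransmission has been cumulatively ACKed.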
> 
> Finally, contrary to my usual mantra of keeping 3517bis clean and only
> including well-understood things and not gooping it up, I think this
> change [including the fix above] may well be OK.  I think that for
> several reasons:
> 
>   (1) The change does not alter any of the *actions* or algorithms of
>       RFC3517.  Rather, the change turns a current inaction into an
>       action.  Therefore, the implications are easier to understand.
> 
>   (2) RFC3517 strives to fix all loss within an RTT of loss detection.
>       Therefore, by pushing this rescue transmission to at least an RTT
>       after the guts of the algorithm have been completed we are
>       reducing the possible interactions and badness.
> 
>   (3) The scope of the proposed change is quite small.  The change
>       allows for **one** additional packet transmission **per loss
>       event**.  This packet is meant to then coax us into a regime where
>       the already understood parts of RFC3517 take over and fix things
>       up.  So, even if we needlessly retransmit this rescue packet it's
>       just one packet per event and at that rate cannot likely cause any
>       damage to the network, competing traffic or the connection
>       itself.  And, if it is spurious it isn't going to add any
>       information to change the behavior of the algorithm that won't
>       come along anyway.
> 
> So, I'm all for rolling this nifty little tweak in.
> 
> allman
> 
>