Re: [tcpm] F-RTO and RFC 3517 interaction issues
Murari Sridharan <muraris@microsoft.com> Thu, 13 March 2008 20:32 UTC
From: Murari Sridharan <muraris@microsoft.com>
To: Murari Sridharan <muraris@microsoft.com>, Pasi Sarolahti <pasi.sarolahti@nokia.com>
Date: Thu, 13 Mar 2008 13:30:02 -0700
Cc: "tcpm@ietf.org" <tcpm@ietf.org>, "mallman@icir.org" <mallman@icir.org>
Subject: Re: [tcpm] F-RTO and RFC 3517 interaction issues

Pasi, any updates on this? I'd like to try out a fix to address this issue.

-----Original Message-----
From: tcpm-bounces@ietf.org [mailto:tcpm-bounces@ietf.org] On Behalf Of Murari Sridharan
Sent: Tuesday, March 11, 2008 9:48 AM
To: Pasi Sarolahti
Cc: tcpm@ietf.org; mallman@icir.org
Subject: Re: [tcpm] F-RTO and RFC 3517 interaction issues

The main issue I see is this: without maintaining extra state, how do
you know which situation you are in? I think the draft should clarify
exactly how this should be done, to avoid incorrect implementations.

Here are the cases as I see them. Broadly there are two cases, but there
are really three sub-cases. These are the cases as F-RTO sees the TCP
state when it is about to classify a timeout.

a) This is the simplest case: the timeout has happened without any prior
   recovery attempt. Setting Recover = SndUna works fine here.

b) Recovery in progress. Fast retransmit is triggered, 3517 may be
   active, and now the timeout happens. If SPUR_TO, Recover = SndUna is
   OK for the reasons you outline below.

c) Recovery in progress as in (b); the timeout happens, but this is a
   real timeout. The next phase now starts with a valid Recover value
   based on 3517. Another timeout happens before Recover is crossed.
   Now, *without any additional state*, there is no way to differentiate
   this from case (b).

There should be some additional state associated with the value stored
in Recover, namely whether F-RTO declared a real timeout. Then you can
say: reset Recover = SndUna only if Recover is not associated with a
real timeout.

If we want to avoid implementation bugs we need to be prescriptive here,
so it may not be enough to say "On the other hand, the draft says that
F-RTO SHOULD NOT be applied when an earlier SACK recovery is in
progress", because that doesn't clarify how case (b) is different from
(c). In both cases an earlier SACK recovery is in progress.

Hope this helps.
Murari

-----Original Message-----
From: Pasi Sarolahti [mailto:pasi.sarolahti@nokia.com]
Sent: Monday, March 10, 2008 5:08 PM
To: Murari Sridharan
Cc: tcpm@ietf.org; mallman@icir.org
Subject: Re: F-RTO and RFC 3517 interaction issues

Hi Murari,

Thanks for the careful reading! Clarification might indeed be in place.

From quite early on when specifying the F-RTO algorithm we have thought
it would be OK to allow fast recovery/SACK recovery immediately after a
detected spurious timeout, because then the TCP sender does not send the
RTO retransmissions, and therefore the potential for RTO retransmissions
triggering multiple fast retransmits should not exist (so the recover
variable can be reset). For a case where a spurious retransmission
timeout follows the SACK recovery or fast recovery directly, this
reasoning should be valid. Do we agree?

In the case where a spurious timeout happens during an earlier RTO
recovery, there could be potential for false fast retransmits to happen,
as described in the original NewReno draft. (RFC 3517 does not describe
it in such detail, but I assume the motivation for the text you quote is
the same there.) On the other hand, the draft says that F-RTO SHOULD NOT
be applied when an earlier SACK recovery is in progress, as I think it
was in the example you presented. Do you think this note is sufficient,
or should it be clarified somehow?

- Pasi

On Mar 10, 2008, at 13:05, ext Murari Sridharan wrote:

> I am seeing an inconsistency between F-RTO and RFC 3517. Maybe the
> authors could clarify.
>
> F-RTO defines recovery as follows:
>
>     Set variable "recover" to indicate the highest segment
>     transmitted so far.
>
> RFC 3517 defines:
>
>     "HighData" is the highest sequence number transmitted at a
>     given point.
>
> RFC 3517 clearly mandates that if an RTO occurs during loss recovery,
> a new recovery phase MUST NOT be initiated until the RecoveryPoint is
> crossed.
>
>     "If an RTO occurs during loss recovery as specified in this
>     document, RecoveryPoint MUST be set to HighData. Further, the
>     new value of RecoveryPoint MUST be preserved and the loss
>     recovery algorithm outlined in this document MUST be terminated.
>     In addition, a new recovery phase (as described in section 5)
>     MUST NOT be initiated until HighACK is greater than or equal to
>     the new value of RecoveryPoint."
>
> Now the F-RTO spec seems to violate the above rule with the following
> statement:
>
>     If the algorithm exits with SpuriousRecovery set to SPUR_TO,
>     "recover" is set to SND.UNA, thus allowing fast recovery on
>     incoming duplicate acknowledgments.
>
> This means that if we are in the middle of loss recovery and a real
> timeout occurs, we save the recovery point per RFC 3517. At this point
> we continue with slow start and congestion avoidance. Now say we are
> still below the earlier recovery point and a new timeout occurs. This
> time, if the timeout is classified as SPUR_TO, then RecoveryPoint is
> set to SndUNA, overwriting the older value, and a new recovery phase
> can begin, clearly violating RFC 3517.
>
> Thanks
> Murari

_______________________________________________
tcpm mailing list
tcpm@ietf.org
https://www.ietf.org/mailman/listinfo/tcpm
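
The "additional state" suggested in this thread for telling case (b) from case (c) might be sketched roughly as below. This is only an illustration under stated assumptions, not text from the F-RTO draft or RFC 3517: the flag `recover_from_real_to`, the class `SenderState`, and the function names are all hypothetical, `snd_una` is used as a stand-in for HighACK, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass
from enum import Enum


class TimeoutKind(Enum):
    SPUR_TO = "spurious"  # F-RTO classified the timeout as spurious
    REAL_TO = "real"      # F-RTO classified the timeout as a real loss


@dataclass
class SenderState:
    snd_una: int                        # oldest unacknowledged sequence number
    high_data: int                      # highest sequence number sent so far
    recover: int = 0                    # "recover" / RecoveryPoint
    recover_from_real_to: bool = False  # hypothetical extra state bit


def on_rto_classified(s: SenderState, kind: TimeoutKind) -> None:
    """Update recover once F-RTO has classified a timeout."""
    if kind is TimeoutKind.REAL_TO:
        # Per the RFC 3517 text quoted above: pin RecoveryPoint to HighData.
        s.recover = s.high_data
        s.recover_from_real_to = True
    elif not s.recover_from_real_to:
        # Cases (a) and (b): spurious timeout, nothing pinned, safe to reset.
        s.recover = s.snd_una
    # Case (c): spurious timeout, but recover is still pinned by an earlier
    # real timeout, so the preserved value is left untouched.


def on_ack(s: SenderState, ack: int) -> None:
    """Advance snd_una; release the pin once recover has been crossed."""
    s.snd_una = max(s.snd_una, ack)
    if s.recover_from_real_to and s.snd_una >= s.recover:
        s.recover_from_real_to = False


def may_start_recovery(s: SenderState) -> bool:
    """A new recovery phase is allowed only at or beyond recover."""
    return s.snd_una >= s.recover
```

In this sketch, a real timeout followed by a spurious one below the saved recovery point leaves `recover` unchanged, so duplicate ACKs cannot start a new recovery phase until the cumulative ACK reaches it. A single boolean is only one possible encoding of the extra state.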