Re: [tcpm] F-RTO and RFC 3517 interaction issues

Pasi Sarolahti <pasi.sarolahti@nokia.com> Tue, 11 March 2008 00:11 UTC

Hi Murari,

Thanks for the careful reading! A clarification might indeed be in order.

From quite early on when specifying the F-RTO algorithm, we have
thought it would be OK to allow fast recovery/SACK recovery
immediately after a spurious timeout is detected, because the TCP
sender then does not send the RTO retransmissions, and therefore the
potential for RTO retransmissions to trigger multiple fast
retransmits should not exist (so the "recover" variable can be reset).
For the case where a spurious retransmission timeout directly follows
SACK recovery or fast recovery, this reasoning should be valid.
Do we agree?

In the case where a spurious timeout happens during an earlier RTO
recovery, there could be potential for false fast retransmits,
as described in the original NewReno draft. (RFC 3517 does
not describe this in such detail, but I assume the motivation for the
text you quote is the same there.) On the other hand, the draft says
that F-RTO SHOULD NOT be applied when an earlier SACK recovery is in
progress, as I think it was in the example you presented. Do you
think this note is sufficient, or should it be clarified somehow?
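
To make the scenario under discussion concrete, here is a minimal
sketch of the state involved. It is only a toy model built from the
two passages quoted below, not an implementation of either
specification. The class and method names are made up for
illustration, "recover" and RecoveryPoint are treated as a single
variable (as the quoted message does), HighACK and SND.UNA are treated
as the same value, and the sequence numbers are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    """Toy sender state (hypothetical names, no sequence wraparound)."""
    high_ack: int = 0        # HighACK, treated here as equal to SND.UNA
    high_data: int = 0       # HighData: highest sequence number sent so far
    recovery_point: int = 0  # RecoveryPoint, i.e. "recover" in the F-RTO text

    def on_rto_during_recovery(self) -> None:
        # Quoted RFC 3517 rule: RecoveryPoint MUST be set to HighData.
        self.recovery_point = self.high_data

    def on_spurious_timeout(self) -> None:
        # Quoted F-RTO rule: on SPUR_TO, "recover" is set to SND.UNA.
        self.recovery_point = self.high_ack

    def may_start_recovery(self) -> bool:
        # Quoted RFC 3517 rule: no new recovery phase until
        # HighACK is greater than or equal to RecoveryPoint.
        return self.high_ack >= self.recovery_point


def scenario() -> list:
    """Replay the sequence of events from the quoted message."""
    s = Sender(high_ack=1000, high_data=10000)  # arbitrary example values
    allowed = []

    s.on_rto_during_recovery()   # real RTO during loss recovery
    s.high_ack = 4000            # slow start progresses, still below 10000
    allowed.append(s.may_start_recovery())

    s.on_spurious_timeout()      # second timeout, classified as SPUR_TO
    allowed.append(s.may_start_recovery())
    return allowed


if __name__ == "__main__":
    print(scenario())
```

In this model the first check refuses a new recovery phase and the
second one permits it, which is the overwrite the quoted message
points out. The model deliberately ignores the SHOULD NOT condition
mentioned above, so it says nothing about whether F-RTO would have
been applied in that situation at all.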

- Pasi


On Mar 10, 2008, at 13:05, ext Murari Sridharan wrote:

> I am seeing an inconsistency between F-RTO and RFC 3517. Maybe the
> authors could clarify.
>
> F-RTO defines recovery as follows:
>
>   Set variable "recover" to indicate the highest segment
>   transmitted so far.
>
> RFC 3517 defines:
>
>   "HighData" is the highest sequence number transmitted at a
>   given point.
>
> RFC 3517 clearly mandates that if an RTO occurs during loss recovery,
> a new recovery phase MUST NOT be initiated until the RecoveryPoint is
> crossed:
>
>   "If an RTO occurs during loss recovery as specified in this
>   document, RecoveryPoint MUST be set to HighData.  Further, the
>   new value of RecoveryPoint MUST be preserved and the loss
>   recovery algorithm outlined in this document MUST be terminated.
>   In addition, a new recovery phase (as described in section 5)
>   MUST NOT be initiated until HighACK is greater than or equal to
>   the new value of RecoveryPoint."
>
> Now the F-RTO spec seems to violate the above rule with the following
> statement:
>
>   If the algorithm exits with SpuriousRecovery set to SPUR_TO,
>   "recover" is set to SND.UNA, thus allowing fast recovery on
>   incoming duplicate acknowledgments.
>
> This means that if we are in the middle of loss recovery and a real
> timeout occurs, we save the recovery point per RFC 3517. At this
> point we continue with slow start and congestion avoidance. Now say
> we are still below the earlier recovery point and a new timeout
> occurs. This time, if the timeout is classified as SPUR_TO, then
> RecoveryPoint is set to SND.UNA, overwriting the older value, and a
> new recovery phase can begin, clearly violating RFC 3517.
>
> Thanks
> Murari
