Re: [tcpm] F-RTO and RFC 3517 interaction issues

Pasi Sarolahti <pasi.sarolahti@nokia.com> Fri, 14 March 2008 01:38 UTC

From: Pasi Sarolahti <pasi.sarolahti@nokia.com>
Date: Thu, 13 Mar 2008 21:35:34 -0400
To: ext Murari Sridharan <muraris@microsoft.com>
Cc: "tcpm@ietf.org" <tcpm@ietf.org>, "mallman@icir.org" <mallman@icir.org>
Subject: Re: [tcpm] F-RTO and RFC 3517 interaction issues

Hi Murari,

Yes, I think your reasoning below is basically right, except that I
think case (b) should not happen, because F-RTO is not allowed during
an earlier recovery (I made this mistake in my earlier mail, too). We'll
revise the draft ASAP, but before that I'd like to think a bit more
about what minimal state is needed at the sender, and whether it is
possible to do something without adding extra state or too much extra
complexity.

I wonder if rephrasing the "This algorithm SHOULD NOT be applied if  
the TCP sender is already in SACK loss recovery when retransmission  
timeout occurs." condition in the following way would help:

when timeout occurs:
    if (RecoveryPoint <= snd.una) then
        execute F-RTO
    else
        follow normal RTO recovery procedure directly
        (e.g., set RecoveryPoint = HighData)

In addition, setting RecoveryPoint = HighData in the F-RTO algorithm
should be postponed to step 2, to allow F-RTO in cases where delay
causes multiple consecutive timeouts. I need to think about this a
bit more, but wouldn't this solve case (c) below? It would also
prevent F-RTO during recovery from an earlier genuine timeout, but I
think this should be ok.
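
A minimal Python sketch of the check above, for illustration only. The
Sender class, on_rto, and frto_step2 are hypothetical names, not taken
from the draft. Sequence numbers are plain integers here, so wraparound
is ignored, and frto_step2 only stands in for the point in step 2 where
RecoveryPoint would be set; the rest of F-RTO is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    snd_una: int         # oldest unacknowledged sequence number
    high_data: int       # highest sequence number sent so far (HighData)
    recovery_point: int  # RecoveryPoint, in the RFC 3517 sense


def on_rto(s: Sender) -> str:
    """Choose the recovery path when the retransmission timer expires."""
    if s.recovery_point <= s.snd_una:
        # No earlier recovery is outstanding, so F-RTO may run.
        # RecoveryPoint is deliberately left untouched here.
        return "f-rto"
    # An earlier recovery has not completed: normal RTO recovery.
    s.recovery_point = s.high_data
    return "rto"


def frto_step2(s: Sender) -> None:
    """Assumed placement of the postponed RecoveryPoint update."""
    s.recovery_point = s.high_data


# Two consecutive timeouts before step 2 is reached: both may use F-RTO.
s = Sender(snd_una=1000, high_data=5000, recovery_point=1000)
assert on_rto(s) == "f-rto"
assert on_rto(s) == "f-rto"

# Once step 2 has set RecoveryPoint, a further timeout before snd.una
# passes it falls back to normal RTO recovery.
frto_step2(s)
assert on_rto(s) == "rto"
```

The only state consulted is RecoveryPoint and snd.una, which is the
point of the proposal: no extra flag is carried in this sketch.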

What do you think? Does this seem like a way forward?

- Pasi


On Mar 13, 2008, at 16:30, ext Murari Sridharan wrote:

> Pasi any updates on this? I'd like to try out a fix to address this  
> issue.
>
> -----Original Message-----
> From: tcpm-bounces@ietf.org [mailto:tcpm-bounces@ietf.org] On  
> Behalf Of Murari Sridharan
> Sent: Tuesday, March 11, 2008 9:48 AM
> To: Pasi Sarolahti
> Cc: tcpm@ietf.org; mallman@icir.org
> Subject: Re: [tcpm] F-RTO and RFC 3517 interaction issues
>
> The main issue I see is: without maintaining extra state, how do
> you know which situation you are in? I think the draft should
> clarify exactly how this should be done, to avoid incorrect
> implementations. Here are the cases as I see them. Broadly there are
> two cases, but really three sub-cases. The cases are as F-RTO sees
> the TCP state when it is about to classify a timeout.
>
> a) This is the simplest case when timeout has happened without any  
> prior recovery attempt. Setting Recover = SndUna works fine here.
> b) Recovery in progress. Fast retransmit is triggered, 3517 may be  
> active, now the timeout happens. If SPUR_TO, Recover = SndUna is ok  
> for the reasons you outline below.
> c) Recovery in progress like (b), timeout happens but this is a  
> real timeout. Now next phase starts with valid Recover value based  
> on 3517. Another timeout happens before Recover is crossed, now  
> *without any additional state* there is no way to differentiate  
> this from case (b).
>
> There should be some additional state associated with the value
> stored in Recover, namely whether F-RTO declared a real timeout. Then
> you can say: reset Recover = SndUna only if Recover is not
> associated with a real timeout. If we want to avoid implementation
> bugs we need to be prescriptive here, so it may not be enough to say
> "On the other hand, the draft says
> that F-RTO SHOULD NOT be applied when an earlier SACK recovery is in
> progress", because it doesn't clarify how case (b) is different from
> (c). In both cases an earlier SACK recovery is in progress.
>