Re: [tcpm] F-RTO and RFC 3517 interaction issues

Murari Sridharan <muraris@microsoft.com> Fri, 14 March 2008 18:26 UTC

From: Murari Sridharan <muraris@microsoft.com>
To: Pasi Sarolahti <pasi.sarolahti@nokia.com>
Date: Fri, 14 Mar 2008 11:24:09 -0700
Cc: "tcpm@ietf.org" <tcpm@ietf.org>, "mallman@icir.org" <mallman@icir.org>
Subject: Re: [tcpm] F-RTO and RFC 3517 interaction issues

I agree that this approach would solve the problem I mentioned below, but it would prevent F-RTO recovery in case (b). Isn't that too restrictive? Would it negate some of the benefits of F-RTO? The reason I ask is that packet reordering may be very common, so we might very well be in recovery when a timeout happens. For example, we see some dupacks indicating out-of-order packets, perhaps during a mobility handoff; we enter recovery, but then a delay spike causes a timeout. This timeout is possibly spurious.

I think the real question is how often recovery precedes a spurious timeout in the real world. Do you have any data from your testing? I know you discuss some of this in the appendix of the draft as possible extensions; however, it may be simple to fix if we maintain an extra bit recording that a real timeout occurred during the earlier loss recovery, so that F-RTO cannot begin until the recovery point is crossed.
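A minimal sketch of the trade-off discussed above, assuming 32-bit TCP sequence numbers compared with wraparound-safe arithmetic. `frto_allowed_strict` models the rule proposed in the quoted message below (run F-RTO only if RecoveryPoint <= snd.una); `frto_allowed_with_bit` models the suggested extra bit. The function names, the `real_timeout` flag and the sequence values are hypothetical and illustrative, not taken from the draft.

```python
SEQ_MASK = 0xFFFFFFFF  # TCP sequence numbers are 32-bit and wrap around


def seq_le(a: int, b: int) -> bool:
    """True if sequence number a is at or before b (mod 2**32)."""
    return ((b - a) & SEQ_MASK) < 0x80000000


def frto_allowed_strict(recovery_point: int, snd_una: int) -> bool:
    """Quoted proposal: no F-RTO while an earlier recovery is still pending."""
    return seq_le(recovery_point, snd_una)


def frto_allowed_with_bit(recovery_point: int, snd_una: int, real_timeout: bool) -> bool:
    """Hypothetical extra bit: block F-RTO only while recovering from a real timeout."""
    in_recovery = not seq_le(recovery_point, snd_una)
    return not (in_recovery and real_timeout)


# (recovery_point, snd_una, real_timeout) for cases (a), (b) and (c); illustrative values.
CASES = {
    "a": (1000, 1000, False),  # no earlier recovery
    "b": (4000, 2000, False),  # fast-retransmit recovery in progress
    "c": (6000, 3000, True),   # recovery started by a real timeout
}

for name, (rp, una, bit) in CASES.items():
    print(name, frto_allowed_strict(rp, una), frto_allowed_with_bit(rp, una, bit))
```

This prints `a True True`, `b False True` and `c False False`: the two rules differ only in case (b), which is exactly the case whose restriction is being questioned.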

-----Original Message-----
From: Pasi Sarolahti [mailto:pasi.sarolahti@nokia.com]
Sent: Thursday, March 13, 2008 6:36 PM
To: Murari Sridharan
Cc: tcpm@ietf.org; mallman@icir.org
Subject: Re: F-RTO and RFC 3517 interaction issues

Hi Murari,

Yes, I think your reasoning below is basically right, except that I
think case (b) should not happen, because F-RTO is not allowed during
earlier recovery (I made this mistake in my earlier mail as well). We'll
revise the draft ASAP, but before that I'd like to think a bit more
about the minimal state needed at the sender, and whether it is
possible to do this without adding extra state or too much extra
complexity.

I wonder if rephrasing the "This algorithm SHOULD NOT be applied if
the TCP sender is already in SACK loss recovery when retransmission
timeout occurs." condition in the following way would help:

when timeout occurs:
    if (RecoveryPoint <= snd.una) then
        execute F-RTO
    else
        follow the normal RTO recovery procedure directly
        (e.g., set RecoveryPoint = HighData)
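The pseudocode above can be sketched in Python as follows. The `Sender` class, its field names and `on_timeout` are hypothetical; `recovery_point`, `high_data` and `snd_una` stand for RecoveryPoint, HighData and snd.una, and implementing `<=` as a wraparound-safe 32-bit comparison is an assumption about how a real stack would do it.

```python
from dataclasses import dataclass

SEQ_MASK = 0xFFFFFFFF  # 32-bit TCP sequence space


def seq_le(a: int, b: int) -> bool:
    """True if sequence number a is at or before b (mod 2**32)."""
    return ((b - a) & SEQ_MASK) < 0x80000000


@dataclass
class Sender:
    snd_una: int         # oldest unacknowledged sequence number (snd.una)
    high_data: int       # highest sequence number sent so far (HighData)
    recovery_point: int  # RecoveryPoint from RFC 3517


def on_timeout(s: Sender) -> str:
    """Decide how to handle an RTO under the rephrased condition."""
    if seq_le(s.recovery_point, s.snd_una):
        # No earlier recovery is outstanding: F-RTO may classify this timeout.
        return "frto"
    # An earlier recovery is not yet complete: use conventional RTO recovery.
    s.recovery_point = s.high_data
    return "rto"


idle = Sender(snd_una=1000, high_data=5000, recovery_point=1000)
busy = Sender(snd_una=2000, high_data=6000, recovery_point=4000)
print(on_timeout(idle), on_timeout(busy), busy.recovery_point)  # frto rto 6000
```

Under this rule, cases (b) and (c) both take the `"rto"` branch, since in each an earlier recovery has not yet reached RecoveryPoint.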

In addition, setting RecoveryPoint = HighData in the F-RTO algorithm
should be postponed to step 2, to allow F-RTO in cases where a delay
causes multiple consecutive timeouts. I need to think about this a
bit more, but wouldn't this solve case (c) below? It would also
prevent F-RTO during recovery from an earlier genuine timeout, but I
think this should be OK.
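A small simulation of the consecutive-timeout argument, assuming no ACK arrives between the timeouts (so snd.una and HighData stay fixed). With `defer=False`, RecoveryPoint is set to HighData as soon as F-RTO starts; with `defer=True`, the update is left to step 2, which is not modeled here. The function name, parameters and sequence values are hypothetical.

```python
SEQ_MASK = 0xFFFFFFFF  # 32-bit TCP sequence space


def seq_le(a: int, b: int) -> bool:
    """True if sequence number a is at or before b (mod 2**32)."""
    return ((b - a) & SEQ_MASK) < 0x80000000


def frto_eligibility(defer: bool, timeouts: int = 2) -> list[bool]:
    """Return, for each consecutive RTO, whether F-RTO may run on it."""
    snd_una, high_data = 1000, 5000  # illustrative values; no ACKs arrive
    recovery_point = snd_una         # no earlier recovery in progress
    result = []
    for _ in range(timeouts):
        eligible = seq_le(recovery_point, snd_una)
        result.append(eligible)
        if not eligible or not defer:
            # Conventional RTO recovery, or F-RTO updating RecoveryPoint in step 1.
            recovery_point = high_data
        # With defer=True, a running F-RTO leaves RecoveryPoint for step 2.
    return result


print(frto_eligibility(defer=False))  # [True, False]
print(frto_eligibility(defer=True))   # [True, True]
```

In the eager variant, the second timeout finds RecoveryPoint beyond snd.una and falls back to conventional recovery, even though it may belong to the same delay spike as the first.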

What do you think? Does this seem like a way forward?

- Pasi


On Mar 13, 2008, at 16:30, ext Murari Sridharan wrote:

> Pasi any updates on this? I'd like to try out a fix to address this
> issue.
>
> -----Original Message-----
> From: tcpm-bounces@ietf.org [mailto:tcpm-bounces@ietf.org] On
> Behalf Of Murari Sridharan
> Sent: Tuesday, March 11, 2008 9:48 AM
> To: Pasi Sarolahti
> Cc: tcpm@ietf.org; mallman@icir.org
> Subject: Re: [tcpm] F-RTO and RFC 3517 interaction issues
>
> The main issue I see is that, without maintaining extra state, how do
> you know which situation you are in? I think the draft should
> clarify exactly how this should be done, to avoid incorrect
> implementations. Here are the cases as I see them: broadly there are
> two cases, but really there are three sub-cases. Each describes the
> TCP state as F-RTO sees it when it is about to classify a timeout.
>
> a) The simplest case: a timeout happens without any prior recovery
> attempt. Setting Recover = SndUna works fine here.
> b) Recovery in progress: fast retransmit is triggered, RFC 3517 may be
> active, and then the timeout happens. If SPUR_TO, Recover = SndUna is
> OK for the reasons you outline below.
> c) Recovery in progress as in (b), and a timeout happens, but this is a
> real timeout. The next phase now starts with a valid Recover value based
> on RFC 3517. Another timeout happens before Recover is crossed; now,
> *without any additional state*, there is no way to distinguish
> this from case (b).
>
> There should be some additional state associated with the value
> stored in Recover, indicating whether F-RTO declared a real timeout.
> Then you can reset Recover = SndUna only if Recover is not
> associated with a real timeout. If we want to avoid implementation
> bugs we need to be prescriptive here, so it may not be enough to say
> "On the other hand, the draft says that F-RTO SHOULD NOT be applied
> when an earlier SACK recovery is in progress", because that doesn't
> clarify how case (b) differs from (c): in both cases an earlier SACK
> recovery is in progress.
>
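A minimal sketch of the additional state described in the quoted message, assuming a single boolean stored alongside Recover. `Recovery`, `on_frto_verdict`, `on_cumulative_ack` and `real_timeout` are hypothetical names, the sequence values are illustrative, and clearing the bit once Recover is crossed is an assumption rather than something the draft specifies.

```python
from dataclasses import dataclass

SEQ_MASK = 0xFFFFFFFF  # 32-bit TCP sequence space


def seq_le(a: int, b: int) -> bool:
    """True if sequence number a is at or before b (mod 2**32)."""
    return ((b - a) & SEQ_MASK) < 0x80000000


@dataclass
class Recovery:
    recover: int                # Recover (RecoveryPoint) value
    real_timeout: bool = False  # hypothetical bit: Recover was set after a real timeout


def on_frto_verdict(r: Recovery, snd_una: int, high_data: int, spurious: bool) -> None:
    """Update recovery state after F-RTO classifies a timeout."""
    if spurious:
        # SPUR_TO: reset Recover only if it did not come from a real timeout,
        # i.e. case (b); in case (c) the earlier value is preserved.
        if not r.real_timeout:
            r.recover = snd_una
    else:
        # Real timeout: a new recovery phase starts and the bit records why.
        r.recover = high_data
        r.real_timeout = True


def on_cumulative_ack(r: Recovery, snd_una: int) -> None:
    """Assumed: once Recover is crossed, the earlier recovery is over."""
    if seq_le(r.recover, snd_una):
        r.real_timeout = False


case_b = Recovery(recover=4000)                      # Recover set by fast retransmit
on_frto_verdict(case_b, 2000, 6000, spurious=True)
print(case_b)  # Recovery(recover=2000, real_timeout=False)

case_c = Recovery(recover=4000)
on_frto_verdict(case_c, 2000, 6000, spurious=False)  # first timeout is real
on_frto_verdict(case_c, 3000, 7000, spurious=True)   # second timeout before Recover is crossed
print(case_c)  # Recovery(recover=6000, real_timeout=True)
```

The bit is what separates cases (b) and (c): both start with Recover ahead of SndUna, but only in (c) does a spurious verdict leave Recover untouched.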