Re: [tcpm] A question about PTO/RTO rescheduling in RACK draft

hiren panchasara <hiren@strugglingcoder.info> Wed, 17 April 2019 19:22 UTC

Date: Wed, 17 Apr 2019 12:22:02 -0700
From: hiren panchasara <hiren@strugglingcoder.info>
To: Matt Olson <maolson@microsoft.com>, draft-cheng-tcpm-rack@ietf.org
Cc: Yi Huang <huanyi=40microsoft.com@dmarc.ietf.org>, "tcpm@ietf.org" <tcpm@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/YhyDoNrt0-psKW2mEXTvySW3rfc>

On 04/17/19 at 06:52P, Matt Olson wrote:
> Previously, you'd start the RTO timer on the first packet, and the timer would run until an ACK was received. Now, the timer is reset on each new packet. So the concern is that there exist cases where previously TCP would reduce the CWnd and retransmit, but not anymore.

I see your point. I guess in this particular case RACK (which is the
shortest timer, and so the first to fire) is not useful either, as there
are no ACKs coming back.
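
To make the difference concrete, here is a minimal sketch of the two timer
policies. It assumes no ACK ever arrives and a fixed timeout value, and it
ignores backoff and the PTO/RTO distinction; the function name and
parameters are made up for illustration and come from neither the draft nor
any stack.

```python
def first_timeout(send_times_ms, timeout_ms, restart_on_send, horizon_ms):
    """Return when the timer first fires, or None if it never fires
    within horizon_ms. Hypothetical helper: no ACKs, fixed timeout."""
    deadline = None
    for t in sorted(send_times_ms):
        if t > horizon_ms:
            break
        if deadline is not None and deadline <= t:
            return deadline  # timer expired before this send
        if deadline is None or restart_on_send:
            # RFC 6298 (5.1) style: arm only when no timer is running.
            # Draft 6.5.1 style: cancel and re-arm on every send.
            deadline = t + timeout_ms
    if deadline is not None and deadline <= horizon_ms:
        return deadline
    return None


# App writes one small segment every 500 ms; the timeout is 1000 ms.
sends = [i * 500 for i in range(20)]  # 0, 500, ..., 9500
arm_once = first_timeout(sends, 1000, restart_on_send=False, horizon_ms=9500)
re_arm = first_timeout(sends, 1000, restart_on_send=True, horizon_ms=9500)
print(arm_once, re_arm)
```

With the arm-once policy the timer fires one timeout after the first send;
with re-arm on every send it does not fire for as long as the app keeps
writing faster than the timeout.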

> 
> If all such cases are app-limited cases, then I like Hiren's point about this being a "keep CWnd in check while app-limited" problem. 

IIRC, there was some guidance around this in previous version(s) of the
draft, but I don't find it in -04. Maybe the authors (CC'd) can chime in.
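
For what "keeping CWnd in check while app-limited" could look like, here is
a rough sketch loosely modeled on congestion window validation. It is not
the exact algorithm of RFC 2861 or RFC 7661 (newcwv), and not something the
RACK draft specifies; the function name, the units (segments) and the floor
at the initial window are assumptions for illustration.

```python
def validate_cwnd(cwnd, ssthresh, w_used, initial_window):
    """Decay cwnd toward what the app actually used (hypothetical helper).

    All values are in segments. Called once per validation interval,
    e.g. once per RTO, which is an assumption of this sketch.
    """
    if w_used < cwnd:  # app-limited: the window was not fully used
        # Remember most of the old window so growth can resume quickly.
        ssthresh = max(ssthresh, (3 * cwnd) // 4)
        # Move cwnd halfway toward the used amount, with a floor.
        cwnd = max((cwnd + w_used) // 2, initial_window)
    return cwnd, ssthresh


# A large window, but the app only ever has one segment in flight.
cwnd, ssthresh = 100, 64
for _ in range(4):
    cwnd, ssthresh = validate_cwnd(cwnd, ssthresh, w_used=1, initial_window=10)
print(cwnd, ssthresh)
```

The point is only that an app trickling out one segment at a time would see
its window shrink toward what it actually uses, instead of keeping a large
CWnd that was validated long ago.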

Cheers,
Hiren
> 
> -----Original Message-----
> From: hiren panchasara <hiren@strugglingcoder.info> 
> Sent: Wednesday, April 17, 2019 11:14 AM
> To: Yi Huang <huanyi=40microsoft.com@dmarc.ietf.org>
> Cc: tcpm@ietf.org; Matt Olson <maolson@microsoft.com>
> Subject: Re: [tcpm] A question about PTO/RTO rescheduling in RACK draft
> 
> On 04/17/19 at 01:48P, Yi Huang wrote:
> > Hi,
> 
> Hi!
> Either I don't see anything inherently wrong in how this works, or I am missing something.
> > 
> > I have a question about the PTO/RTO rescheduling logic in the RACK draft. The draft states, in Section 6.5.1, "If there was a previously scheduled PTO or RTO pending, then that pending PTO or RTO should first be cancelled, and then the new PTO should be scheduled". So if an app keeps posting data at intervals shorter than the PTO and RTO (say one byte each time, with Cwnd large enough) and no ACKs are coming back, due to network failures or a receiver that just does not send anything back, the app will never fire the RTO, since the timer is pushed out each time we send this one byte of data. Is this a problem?
> By growing CWND, TCP has decided that it is okay to send that much without needing an ACK.
> I think something along the lines of detecting that the connection is app-limited, or doing newcwv-like validation, may help to keep CWND in check?
> 
> > What is the intention of pushing out the timer for each send?
> 6.5.1 (Step 1) lists a few conditions for scheduling a PTO (i.e.
> pushing out the timer), so it's not blindly doing that for each send. But it would do that in the case you described, as the app *is* handing new data to TCP to be sent.
> 
> > I think in this case the sender should disconnect the connection because, without TLP, the sender would have had several RTOs and closed the connection.
> I am probably not understanding something. I think you are saying TLP is causing RTO to be pushed out unnecessarily.
> 
> One cannot schedule a PTO while a TLP is outstanding. So on the first timeout (a PTO, since we prefer that over an RTO) you send a TLP. If you don't get an ACK back for it, the next timeout would be an RTO, as 6.5.1 Phase 1 condition 1 suggests.
> 
> But if there is new data to be sent, the RTO would also get pushed out, no?
> > 
> > Also, RFC 6298 Section 5 (5.1) states "Every time a packet containing data is sent (including a retransmission), if the timer is not running, start it running so that it will expire after RTO seconds (for the current value of RTO).", which says TCP should schedule the timer only when it is not scheduled yet, and this kind of conflicts with (or is overridden by) what is in the RACK/TLP draft.
> 
> I am not sure I follow. For the connection, you need a timeout mechanism. And the RACK draft suggests keeping the PTO aligned with the RTO, as the former is less expensive. Try a PTO and, if that doesn't work, do an RTO.
> 
> I believe the wording around "first cancel the pending one and only then schedule the new one" is there for clarity around implementation details.
> The gist is to not keep 2 sets of timers running for the same timeout mechanism.
> 
> Someone with more clue on the list would probably correct me. :-)
> 
> Cheers,
> Hiren