Re: [tcpm] RFC5681: why halving FlightSize not cwnd?

Gorry Fairhurst <gorry@erg.abdn.ac.uk> Fri, 07 September 2012 08:32 UTC

Message-ID: <5049B126.90800@erg.abdn.ac.uk>
Date: Fri, 07 Sep 2012 09:32:38 +0100
From: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
To: Michael Welzl <michawe@ifi.uio.no>
References: <CAK6E8=f9iE11EOgJWar25qa0nFLjUuDrc34p2xP1MPRrQewhig@mail.gmail.com> <20120906195403.3F3452B075E6@lawyers.icir.org> <20120906212530.GB8018@mail.kb8ojh.net> <CAH56bmDP5YwGqpfDQ5=OGceohx+xrsoOSDkd7RZHg7O=V=y+aQ@mail.gmail.com> <d58c929a176447e351d0c579334d2d50.squirrel@www.erg.abdn.ac.uk> <41A94FC2-A138-4485-B7B9-21F3E9D34237@ifi.uio.no>
In-Reply-To: <41A94FC2-A138-4485-B7B9-21F3E9D34237@ifi.uio.no>
Cc: "tcpm@ietf.org Extensions" <tcpm@ietf.org>, "iccrg@cs.ucl.ac.uk list" <iccrg@cs.ucl.ac.uk>, Matt Mathis <mattmathis@google.com>, Mark Allman <mallman@icir.org>
Subject: Re: [tcpm] RFC5681: why halving FlightSize not cwnd?

It's not easy to just say FlightSize decreases, since this can happen in 
various cases. I'm wondering how useful it is to measure FlightSize 
during a transfer. It's a reasonable measure when there is reported 
congestion to figure out what was sent in the last "pipe", but it can 
result in an unexpected value (e.g. at the end of a burst).
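To put numbers on the end-of-burst case, here is a minimal sketch using the figures from the quoted thread below (a cwnd of 20 segments, 4 segments still in flight when the loss is detected). The first function follows equation (4) of RFC 5681, ssthresh = max(FlightSize / 2, 2*SMSS); the second is only the cwnd-based alternative being debated, not anything the RFC specifies. Working in whole segments with integer division is my simplification, and the function names are made up for illustration.

```python
SMSS = 1  # assumption: count in whole segments rather than bytes


def ssthresh_rfc5681(flight_size, smss=SMSS):
    """RFC 5681 equation (4): halve FlightSize, floor of 2*SMSS."""
    return max(flight_size // 2, 2 * smss)


def ssthresh_from_cwnd(cwnd, smss=SMSS):
    """Hypothetical alternative from the thread: halve cwnd instead."""
    return max(cwnd // 2, 2 * smss)


# Figures from the thread: cwnd of 20, only 4 segments in flight at loss.
cwnd = 20
flight_size = 4

print(ssthresh_rfc5681(flight_size))  # 2
print(ssthresh_from_cwnd(cwnd))       # 10
```

The gap between 2 and 10 is the "too conservative" versus "too aggressive" choice discussed in the quoted messages.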

One of the changes we'd like to introduce in the next revision of 
new-cwv is to avoid this reliance on FlightSize to determine the window 
state of the sender. Instead we will suggest that the sender determines 
the state of the "pipe" by comparing the window that was ACK'ed with 
cwnd - not the volume of data that was sent (FlightSize). In some cases, the 
two seem to be quite different.
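As a rough illustration of how the two measures can diverge, the sketch below applies the FlightSize < (2/3)*cwnd test from Section 4.2 of newcwv-03 (quoted below) and then the same threshold to the data ACKed over the last RTT. The second test is only my guess at the shape of such a check, not the text of any draft; the 2/3 threshold for it, the function names and the per-RTT ACK counter are all assumptions.

```python
def nonvalidated_by_flightsize(flight_size, cwnd):
    """newcwv-03 Section 4.2 test: FlightSize < (2/3)*cwnd."""
    return 3 * flight_size < 2 * cwnd


def nonvalidated_by_acked(acked_last_rtt, cwnd):
    """Hypothetical variant: same threshold on data ACKed in the last RTT."""
    return 3 * acked_last_rtt < 2 * cwnd


# Tail of a 20-segment transfer sent as one full window (units: segments).
cwnd = 20
acked_last_rtt = 16                  # 16 of the 20 segments already ACKed
flight_size = 20 - acked_last_rtt    # 4 still outstanding, nothing left to send

print(nonvalidated_by_flightsize(flight_size, cwnd))  # True
print(nonvalidated_by_acked(acked_last_rtt, cwnd))    # False
```

With these numbers the FlightSize test reports the window as not validated at the end of the transfer, while the ACK-based test still sees most of cwnd as having been used.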

Gorry

On 07/09/2012 08:42, Michael Welzl wrote:
> Hi all,
>
> Not too long ago, Yuchung pointed out a very similar problem to the one here with our own work, draft-hurtig-tcpm-rtorestart-02.txt, which intends to make the RTO timer more aggressive if the really used window is very small. We used FlightSize in our algorithm to represent this "really used window". As with RFC 5681, our rationale is that FlightSize captures the amount of data that is actually transmitted onto the network, which may be significantly smaller than cwnd if the sender is application-limited.
>
> Yuchung has now shown us that this choice is bad, twice. FlightSize captures what we want, but additionally, a small FlightSize is also always reached towards the end of an arbitrarily large window (yes, this is also an application-limited case - it's about the difference between an application with a low constant sending rate and the end of a transfer (or, similarly, an application with a stop-and-go behavior)). Neither our draft nor RFC 5681 really captures this situation. Indeed, it seems to me the same problem appears in draft-fairhurst-tcpm-newcwv-03.txt, which also relies on FlightSize in a similar way:
>
> Section 4.2:
> ***
>        Non-validated phase: FlightSize <(2/3)*cwnd.  This is the phase
>        where the cwnd has a value based on a previous measurement of the
>        available capacity, and the usage of this capacity has not been
>        validated in the previous RTT.
> ***
>
> If FlightSize reaches this value at the end of a larger window, I think this statement is wrong. Indeed, in accordance with Yuchung's examples, this would make the mechanisms in this draft kick in every time at the end of e.g. a 20-packet-long web flow (or indeed at the end of any transfer, with arbitrary length). I don't think that's the intention?
>
> I suspect that we can find more examples of RFCs or drafts with this issue. The problem is indeed more fundamental - as Matt said, we're using the wrong state variables. I can well imagine that the problem automatically disappears with TCP Laminar, but as a smaller update to existing / ongoing work, what we really need is probably a means to differentiate Yuchung's case from the case of concern in at least RFC 5681, draft-hurtig-tcpm-rtorestart and draft-fairhurst-tcpm-newcwv-03.txt.
>
> Here's a simple suggestion: can we identify Yuchung's case by saying that FlightSize has been continuously decreasing?
>
> Cheers,
> Michael
>
>
> On 7. sep. 2012, at 08:53, gorry@erg.abdn.ac.uk wrote:
>
>> If FlightSize is not ~cwnd, then at the time, the flow is not fully
>> utilising cwnd, but there are a wide range of (app) behaviours that result
>> in this condition. My thoughts are that we should update the behaviour
>> both in standards-track TCP and also in Laminar.
>>
>> This has been the topic of the new-cwv draft that we have been proposing
>> as an update to TCP. It's been discussed on the ICCRG list, and we're
>> re-structuring our draft and expect to publish this revision in a week or
>> so.
>>
>> Gorry
>>
>>
>>> On Thu, Sep 6, 2012 at 2:25 PM, Ethan Blanton <eblanton@cs.ohiou.edu>
>>> wrote:
>>>> Mark Allman spake unto us the following wisdom:
>>>>> A few things ...
>>>>>
>>>>>   - I don't buy Ethan's argument that the burden on the network is 4
>>>>>     packets if you lose the 17th.  It seems to me the burden is measured
>>>>>     from the front of the window not the back.  So, in this case it was
>>>>>     a burden of 17 packets that caused the loss.
>>>>
>>>> Note that this was not intended to be my argument; my argument is that
>>>> a TCP that doesn't "remember" such things (and 5681 does not) only
>>>> knows about the 4 packets at the time of the loss, so it *thinks* the
>>>> burden is 4.  This is clearly not optimal, and I would not argue that
>>>> it is.  Perhaps I stated this poorly.
>>>
>>> I would say that you are using the wrong state variables:  at this
>>> point cwnd is simultaneously being used to suppress bursts and
>>> remember the congestion state from before the application pause.   If
>>> you parameterize it differently (e.g. Laminar) this becomes a
>>> non-problem.
>>>
>>>>>   - So, without additional schemes we're left with being too aggressive
>>>>>     (using cwnd) or too conservative (using FlightSize).  But, if we're
>>>>>     going to err that is probably the right direction.
>>>>
>>>> Agreed.
>>>
>>> Yes exactly.   One solution would be to pace from FS up to cwnd....
>>> This case is described in the Laminar draft.
>>>
>>>>>   - I probably would not have all that much heartburn making the
>>>>>     ssthresh 10 in the case you describe as long as there was some
>>>>>     knowledge that a cwnd of 20 was used recently.  I.e., it isn't the
>>>>>     result of some large storage of permission to send that was built up
>>>>>     over time, but was in fact the result of the application's sending
>>>>>     pattern.  I think one could design some rules around that notion
>>>>>     that would be OK.
>>>>
>>>> Also agreed.  I think it's reasonable to assume that the FlightSize in
>>>> effect at the time a lost packet was *sent* is safe, for sure.
>>>
>>> I point out this logic assumes that it was the instantaneous queue
>>> length that triggered the losses (as it is for drop tail).  With RED
>>> or CoDel, the losses are normally triggered by a persistent queue,
>>> which may or may not depend on the instantaneous window at the time
>>> the packet was either sent or received.   With CoDel, drops are
>>> triggered when the minimum window size is still large enough to
>>> sustain a queue....  The peak window has no effect on the drops
>>> (unless the queue overflows).
>>>
>>> Thanks,
>>> --MM--
>>> The best way to predict the future is to create it.  - Alan Kay
>>> _______________________________________________
>>> tcpm mailing list
>>> tcpm@ietf.org
>>> https://www.ietf.org/mailman/listinfo/tcpm
>>>
>>
>