Re: [tcpm] [Tmrg] Increasing the Initial Window - Notes

Andrew Yourtchenko <ayourtch@cisco.com> Fri, 19 November 2010 00:50 UTC


On Thu, 18 Nov 2010, Joe Touch wrote:

> That's basically what I'm suggesting; it can easily be per subnet, per 
> interface, or per the entire machine as desired.
>
> The ultimate point is:
>
> 	- put something in the end node that notices if/when
> 	it fails objectively, and fixes it
>
> 	- that same thing can allow the IW to increase
> 	over time if there are no problems

I think this is a very viable idea.

However, I think there is a catch that calls for some tweaking: there should 
be a signaling mechanism whereby the node on the "conservative" side of the 
connection can tell the node on the "eager" side not to be too aggressive, in 
case the data flows mostly from the "eager" to the "conservative" side.

One might consider the following model for a path:

[peer A] --- [access A] ----- [ backbone ] ---- [access B] --- [peer B]

Here "access A" and "access B" are the access networks of the peers, and the 
"backbone" is the "big Internet" in between.

I am speculating based on experience (please correct me if I am wrong) that the 
congestion would typically be either in [access A] or [access B] cloud.

So if peer B is a server, storing the loss history of its connections on a 
per-something basis would not be productive: a connection from peer A may 
experience congestion at access A, whereas a connection from peer C would not 
traverse the same access network, so it would see no congestion there.

OTOH if the congestion is in [access B] for both cases then storing the data in 
peer B would indeed make sense.
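
To make the path model concrete, here is a minimal sketch. It is not the 
attached program; it assumes each segment can be described by the largest 
initial burst it carries without loss, and the names path_capacity and 
burst_lost are made up for illustration.

```python
def path_capacity(access_a: int, backbone: int, access_b: int) -> int:
    """Largest initial burst the whole path carries: the tightest segment wins."""
    return min(access_a, backbone, access_b)


def burst_lost(iw: int, access_a: int, backbone: int, access_b: int) -> bool:
    """Assumption: a burst is lossy only if it exceeds the path capacity."""
    return iw > path_capacity(access_a, backbone, access_b)


# Server B sits behind a roomy access network; clients A and C differ.
ACCESS_B, BACKBONE = 12, 15

# Client A: its own access network (capacity 5) is the bottleneck.
print(burst_lost(10, access_a=5, backbone=BACKBONE, access_b=ACCESS_B))
# Client C: nothing on the path is tighter than the burst of 10.
print(burst_lost(10, access_a=12, backbone=BACKBONE, access_b=ACCESS_B))
```

With these made-up numbers the loss seen by server B depends entirely on 
which client it talks to, which is why remembering it only at B says little.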

If the above logic is right, then during the 3whs the peers should communicate 
their memorized values, and each should pick the minimum of its own value and 
the one communicated by the peer.

Therefore both parties know by the end of 3whs the IW value they are going to 
use - and can monitor the initial loss within the connection.
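
A minimal sketch of that exchange, assuming each peer simply remembers one IW 
value; how the value is actually carried in the 3whs is left open, and the 
Peer class and negotiate_iw name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    remembered_iw: int  # value learned from earlier connections (assumed state)


def negotiate_iw(a: Peer, b: Peer) -> int:
    """Both sides advertise their remembered IW and both adopt the smaller one."""
    return min(a.remembered_iw, b.remembered_iw)


eager = Peer("eager", remembered_iw=10)
conservative = Peer("conservative", remembered_iw=4)

# Both ends of this connection would start with the conservative value.
print(negotiate_iw(eager, conservative))
```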

If packet loss happens during the IW burst, both parties will be aware of it.

The result (whether or not there was a loss) will affect the party that had 
the smaller value to begin with. If there was no loss, that party can use a 
larger value for subsequent connections *if* its previous LOSS_THRESH 
connections completed without loss. If there was a loss, that party will use a 
smaller value for subsequent connections.
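
The update rule can be sketched as follows. This is an illustration, not the 
attached program: the step size of one segment in each direction, the floor of 
1, resetting the streak after an increase, and updating both peers on a tie 
are all assumptions, and LOSS_THRESH is set to 3 only to keep the demo short.

```python
from dataclasses import dataclass

LOSS_THRESH = 3  # demo value; the runs in this mail use 1000


@dataclass
class Peer:
    iw: int
    clean_streak: int = 0  # consecutive loss-free connections seen so far


def update(peer: Peer, loss: bool) -> None:
    """Shrink on loss; grow only after LOSS_THRESH clean connections."""
    if loss:
        peer.iw = max(1, peer.iw - 1)  # assumed step and floor
        peer.clean_streak = 0
    else:
        peer.clean_streak += 1
        if peer.clean_streak >= LOSS_THRESH:
            peer.iw += 1               # assumed step
            peer.clean_streak = 0


def run_connection(a: Peer, b: Peer, capacity: int) -> bool:
    """Use min(IW), then update only the party that supplied that value."""
    iw = min(a.iw, b.iw)
    loss = iw > capacity  # assumption: loss iff the burst exceeds the path
    for peer in (a, b):
        if peer.iw == iw:
            update(peer, loss)
    return loss


eager, conservative = Peer(iw=10), Peer(iw=4)
losses = sum(run_connection(eager, conservative, capacity=6) for _ in range(12))
print(losses, eager.iw, conservative.iw)
```

In this toy run the conservative peer creeps up to the path capacity, 
overshoots once, and backs off, while the eager peer's value is never touched 
because it never supplies the minimum.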

I've written a poor man's simulation program (attached) to experiment with 
this idea - it seems more viable in terms of controlling the loss. It assumes a 
full mesh between 10 nodes, where each nodeX-nodeY backbone capacity is 
potentially unique.

What's more entertaining with this approach is that, with a sufficiently large 
value of LOSS_THRESH, it more or less converges *regardless* of the initial 
setting for IW.

You can run this code in the following way:

1) Assume access and backbone links can uniformly support IW=10, nonadaptive:

./a.out 10 0 1000 10 11 10 11 | grep round

...
Results from the round: loss: 0, headroom: 0, avg IW size: 10.000000
Results from the round: loss: 0, headroom: 0, avg IW size: 10.000000
Results from the round: loss: 0, headroom: 0, avg IW size: 10.000000
....

2) Assume access and backbone links can uniformly support only IW=5, nonadaptive:

./a.out 10 0 1000 5 6 5 6 | grep round

Results from the round: loss: 500, headroom: 0, avg IW size: 10.000000
Results from the round: loss: 500, headroom: 0, avg IW size: 10.000000
Results from the round: loss: 500, headroom: 0, avg IW size: 10.000000

3) same as (1), adaptive with LOSS_THRESH=1000 and an initial IW of 4:

./a.out 4 1 1000 10 11 10 11 | grep round

Results from the round: loss: 2, headroom: 0, avg IW size: 10.020000
Results from the round: loss: 1, headroom: 0, avg IW size: 10.010000
Results from the round: loss: 0, headroom: 0, avg IW size: 10.000000
Results from the round: loss: 0, headroom: 0, avg IW size: 10.000000

(loss fluctuates between 0 and 2)

4) same as (2), same adaptive parameters as (3):

./a.out 4 1 1000 5 6 5 6 | grep round

Results from the round: loss: 0, headroom: 0, avg IW size: 5.000000
Results from the round: loss: 1, headroom: 0, avg IW size: 5.010000
Results from the round: loss: 0, headroom: 0, avg IW size: 5.000000
Results from the round: loss: 0, headroom: 0, avg IW size: 5.000000

On to fun stuff with nonuniform distributions:

Assume access links can support 5..8, and backbone 8..15

5) non-adaptive with IW=10:

./a.out 10 0 1000 5 8 8 15 | grep round

Results from the round: loss: 434, headroom: 0, avg IW size: 10.000000
Results from the round: loss: 423, headroom: 0, avg IW size: 10.000000
Results from the round: loss: 415, headroom: 0, avg IW size: 10.000000

6) non-adaptive with IW = 5:

./a.out 5 0 1000 5 8 8 15 | grep round

Results from the round: loss: 0, headroom: 71, avg IW size: 5.000000
Results from the round: loss: 0, headroom: 84, avg IW size: 5.000000
Results from the round: loss: 0, headroom: 66, avg IW size: 5.000000

7) adaptive with IW=10:

./a.out 10 1 1000 5 8 8 15 | grep round

Results from the round: loss: 0, headroom: 12, avg IW size: 5.660000
Results from the round: loss: 0, headroom: 11, avg IW size: 5.670000
Results from the round: loss: 2, headroom: 13, avg IW size: 5.610000
Results from the round: loss: 0, headroom: 26, avg IW size: 5.470000

8) adaptive with IW=5:

./a.out 5 1 1000 5 8 8 15 | grep round

Results from the round: loss: 0, headroom: 11, avg IW size: 5.550000
Results from the round: loss: 0, headroom: 10, avg IW size: 5.670000
Results from the round: loss: 1, headroom: 10, avg IW size: 5.750000
Results from the round: loss: 0, headroom: 3, avg IW size: 5.810000
Results from the round: loss: 2, headroom: 6, avg IW size: 5.800000
Results from the round: loss: 2, headroom: 0, avg IW size: 5.670000

9) (extreme:) - adaptive with IW=100:

./a.out 100 1 1000 5 8 8 15 | grep round

Results from the round: loss: 0, headroom: 8, avg IW size: 5.550000
Results from the round: loss: 0, headroom: 3, avg IW size: 5.600000
Results from the round: loss: 0, headroom: 5, avg IW size: 5.680000


10) "conservative futuristic" - adaptive with IW=5, and access cap between 10 
and 20 and backbone cap between 15 and 35:

./a.out 5 1 1000 10 20 15 35 | grep round

Results from the round: loss: 0, headroom: 54, avg IW size: 14.200000
Results from the round: loss: 0, headroom: 58, avg IW size: 13.750000
Results from the round: loss: 0, headroom: 31, avg IW size: 13.470000
Results from the round: loss: 0, headroom: 53, avg IW size: 13.860000
Results from the round: loss: 0, headroom: 32, avg IW size: 13.590000
Results from the round: loss: 0, headroom: 41, avg IW size: 13.790000
Results from the round: loss: 0, headroom: 62, avg IW size: 14.290000

11) same as (10), "badly converging" - with LOSS_THRESH=5:

./a.out 5 1 5 10 20 15 35 | grep round

Results from the round: loss: 20, headroom: 13, avg IW size: 14.140000
Results from the round: loss: 18, headroom: 9, avg IW size: 14.760000
Results from the round: loss: 21, headroom: 4, avg IW size: 14.470000
Results from the round: loss: 18, headroom: 6, avg IW size: 14.420000
Results from the round: loss: 19, headroom: 11, avg IW size: 14.370000
Results from the round: loss: 22, headroom: 11, avg IW size: 14.420000

12) same as (10) with the fixed IW of 10, non-adaptive:

./a.out 10 0 5 10 20 15 35 | grep round

Results from the round: loss: 0, headroom: 398, avg IW size: 10.000000
Results from the round: loss: 0, headroom: 435, avg IW size: 10.000000
Results from the round: loss: 0, headroom: 418, avg IW size: 10.000000
Results from the round: loss: 0, headroom: 422, avg IW size: 10.000000

How to do the signaling during the 3whs is another story: it might be either 
an option or some tricks with the TCP window size in the 3whs. I am not 
addressing it here.

Critiques very welcome - as I said, it is a rather non-scientific simulation 
done in half an hour just to test the concept - you'll see some "non-working" 
logic commented out in the code, maybe there is a better approach.

cheers,
andrew

(Also, the security side of this approach would need some thought. There are 
some interesting attacks lurking with all of the "adaptive" logic.)

>
> I'll be glad to write this up if people need a more concrete proposal.
>
> Joe
>
> On 11/18/2010 1:05 PM, Anantha Ramaiah (ananth) wrote:
>> Why can't you do something like this :
>> 
>> - If any one of the TCP connections egressing out of that interface
>> (this can determined in TCP layer) is in the TCP retransmit state (or
>> has experienced congestion in the past xxx secs/mins), then go back to a
>> lower IW for new TCP connections which are using the same output
>> interface..
>> 
>> - When you want to start a connection, use the connection history (Joe's
>> RFC)
>> 
>> Well, there may be some gotchas of this scheme, but you can become
>> conservative when there is some concrete information.
>> 
>> Thanks,
>> -Anantha
>>> 
>>> To be more clear, here's the case:
>>>
>>> 	start 1000 connections in a row.
>>>
>>> 	during the first connection, lose some packets and do
>>> 	normal TCP backoff
>>>
>>> 	so what do the other 999 connections start with?
>>> 	ans: 10 packets
>>> 
>>> The point is that subsequent connections don't do anything different.
>>> If
>>> you have 1000 connections, you're sending a certain amount of data
>> into
>>> the network without reacting. We're tripling that.
>>> 
>>> That can easily cause congestion. At which point the *existing*
>>> connections will backoff, but new connections keep making problems.
>>> 
>>> Joe
>>> _______________________________________________
>>> tcpm mailing list
>>> tcpm@ietf.org
>>> https://www.ietf.org/mailman/listinfo/tcpm