Re: [tcpm] Problem with Low SSThresh (was I-D Action: draft-ietf-tcpm-newcwv-03.txt)

Ingemar Johansson S <ingemar.s.johansson@ericsson.com> Tue, 20 May 2014 07:02 UTC

From: Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
To: "gorry@erg.abdn.ac.uk" <gorry@erg.abdn.ac.uk>, "McAlpine, Gary" <gary.mcalpine@bluecoat.com>, "tcpm@ietf.org" <tcpm@ietf.org>
Date: Tue, 20 May 2014 07:02:08 +0000
Archived-At: http://mailarchive.ietf.org/arch/msg/tcpm/NNZRloBYu1-BbW4lR_ZmPV3IFOc

Thanks for the responses so far.

I run a TCP stack that is part of a larger LTE system simulator. I know that it is not a full match for, e.g., a Linux TCP stack. I am digging into the Linux TCP code as well as the ns2 code to see whether something important is missing that makes my results unreliable, or whether this is in fact a flaw in how

The setup:
1Mbps bottleneck, RTT ~35ms
Large FTP transfer (100MB)
No AQM, infinite buffer = bufferbloated network
Fake path change after 10s: packets are dropped for 100ms
	- Note: the dropper sits after the bottleneck
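
For scale, the bandwidth-delay product of this path is only about three segments, so almost everything in flight is sitting in the bottleneck queue. A rough calculation, assuming a 1460-byte MSS (the MSS is an assumption for illustration, not a value from the simulation):

```python
# Rough bandwidth-delay product for the setup above.
# MSS is an assumed value; the simulation's actual MSS is not stated.
MSS = 1460  # bytes


def bdp_bytes(rate_bps, rtt_ms):
    """Bytes that fit 'on the wire' for a given rate and round-trip time."""
    return rate_bps * rtt_ms / 8000  # bits -> bytes, ms -> s


bdp = bdp_bytes(1_000_000, 35)   # 1 Mbps bottleneck, ~35 ms RTT
bdp_segments = bdp / MSS         # just under 3 full-sized segments

print(f"BDP: {bdp:.0f} bytes, about {bdp_segments:.1f} segments")
```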

The chain of events that I see is:
1) DUPACKs are received from T~10.15s to T~14.2s
2) Fast retransmit at T=10.17s re-arms the RTO timer (3.60s)
   2a) CWND restriction: only one segment is transmitted at T=10.17s; the remaining segments are transmitted at T=12.18s, but the RTO timer is not restarted at T=12.18s because the sender is in loss recovery state
3) Retransmission timeout at T=13.78s; the DUPACK counter and SACK scoreboard are reset
4) DUPACK counter=3 at T=13.81s: fast retransmit, SSThresh set to 2 MSS!
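
The drop to 2 MSS in the last step is consistent with the ssthresh rule in RFC 5681, equation (4): ssthresh = max(FlightSize / 2, 2*SMSS). After the timeout the congestion window restarts from one segment, so very little data is in flight when the third DUPACK arrives and the 2*SMSS floor wins. A minimal sketch follows; the 1460-byte MSS and both flight sizes are assumed values for illustration, not numbers from the simulation:

```python
# Sketch of the RFC 5681 equation (4) ssthresh update.
# MSS and the flight sizes below are assumptions, not simulation data.
MSS = 1460  # bytes


def ssthresh_on_loss(flight_size, smss=MSS):
    """ssthresh = max(FlightSize / 2, 2 * SMSS), in bytes."""
    return max(flight_size // 2, 2 * smss)


# Loss detected while a bloated queue holds a large flight (assumed 40 segments).
first_loss = ssthresh_on_loss(40 * MSS)

# Loss detected again right after the RTO, when cwnd has restarted from
# one segment and only a few segments are in flight (assumed 3 segments).
after_rto = ssthresh_on_loss(3 * MSS)

print(first_loss // MSS, "MSS, then", after_rto // MSS, "MSS")
```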

I have a few slides that show more detail; I can email them to anyone interested.
A question:
In my code the DUPACK counter and the SACK scoreboard are reset at a retransmission timeout, and this is what causes SSThresh to drop to 2 MSS in this case. Is this a bug in my code, is the code too simplistic, or is this just the way it should be?
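
For comparison, RFC 6582 (NewReno without SACK) describes a "recover" variable that is set to the highest sequence number sent when a retransmission timeout fires; a third DUPACK then triggers fast retransmit, and an ssthresh update, only if the cumulative ACK covers more than "recover". The sketch below illustrates that check only. The class and method names are hypothetical, sequence-number wraparound is ignored, and it may not map directly onto a SACK-based stack:

```python
# Hypothetical sketch of the RFC 6582 "recover" check; not taken from any
# real stack. Sequence-number wraparound is ignored for simplicity.
class NewRenoSender:
    def __init__(self, mss=1460, initial_seq=0):
        self.mss = mss
        self.ssthresh = 64 * mss      # arbitrary initial value (assumption)
        self.recover = initial_seq    # highest sequence sent at the last RTO

    def on_rto(self, highest_seq_sent, flight_size):
        """Timeout: reduce ssthresh and remember how far we had sent."""
        self.ssthresh = max(flight_size // 2, 2 * self.mss)
        self.recover = highest_seq_sent

    def on_third_dupack(self, cum_ack, highest_seq_sent, flight_size):
        """Return True if fast retransmit is entered, False if suppressed."""
        # The ACK field names the next expected byte, so it "covers"
        # everything up to cum_ack - 1.
        if cum_ack - 1 > self.recover:
            self.recover = highest_seq_sent
            self.ssthresh = max(flight_size // 2, 2 * self.mss)
            return True
        return False  # DUPACKs for data sent before the RTO: leave ssthresh


s = NewRenoSender()
s.on_rto(highest_seq_sent=100_000, flight_size=40 * 1460)
suppressed = s.on_third_dupack(cum_ack=50_000, highest_seq_sent=101_460,
                               flight_size=3 * 1460)
print(suppressed, s.ssthresh // s.mss)
```

With this check, DUPACKs that are still arriving for data sent before the timeout do not cut ssthresh a second time.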

I am thankful for any pointers to more comprehensible documentation on how, e.g., a Linux TCP stack behaves.

Now I believe that bufferbloat is at least partly to blame for this, but I am not yet convinced that it is the full story.

Regards
/Ingemar



> -----Original Message-----
> From: Gorry Fairhurst [mailto:gorry@erg.abdn.ac.uk]
> Sent: den 16 maj 2014 15:49
> To: McAlpine, Gary; Ingemar Johansson S; tcpm@ietf.org
> Cc: Karen E. E. Nielsen
> Subject: Re: Problem with Low SSThresh (was I-D Action: draft-ietf-tcpm-
> newcwv-03.txt)
> 
> Yes - this seems like an oversight in the way ssthresh was conceived.
> The value is intended to capture congestion history, and hence it's nice to
> cache this for new flows to  prevent them overshooting the bottleneck
> capacity. I can think of 3 things where current ssthresh methods give
> unwarranted poor performance:
> 
>   - path changes, presumably uncommon.
>   - historic information (an event hours ago has little bearing on a long-lived
> connection and is a real impediment when the RTT is large).
>   - non-congestion loss (especially when FS was small)
> 
> Gorry
> 
> On 15/05/2014 18:21, McAlpine, Gary wrote:
> > Hi Ingemar,
> >
> > I'm not sure what the rationale was for dropping ssthresh to 2*MSS, but it
> seems to me that there are too many non-congestion-related events that
> can cause this extreme setting of ssthresh. Once ssthresh gets set so low,
> the real problem we have seen is recovery on long-lived connections or
> where an ssthresh-host-cache is in use. In these cases, the current
> congestion RFCs don't provide a specific mechanism for ssthresh to recover
> to a level that represents the actual congestion level. So what we have seen
> are connections between a particular client and server go to very low
> throughput and not recover for a very long time. In fact, the traffic between
> the client and server may be such that it can never recover until the
> connection is dropped (in the case of long-lived connections) or the cached
> ssthresh is reset.
> >
> > To allow our customers to avoid this problem, we have provided two
> mechanisms in our software:
> >
> >
> > 1.       They can disable ssthresh-host-cache so that new connections will
> always restart ssthresh.
> >
> > 2.       Since RFC 5681 paragraph 4.1 is silent on what to do with ssthresh
> when restarting idle connections, we assume ssthresh is no longer valid after
> a sufficiently long idle period. Given the next transfer is going to perform a
> slow-start that will (essentially) search for the appropriate cwnd level and
> restart the ack clock,  we also restart ssthresh so that the appropriate cwnd
> level can be found.
> >
> > These mechanisms seem to work quite well and we haven't seen any cases
> where they have caused other problems, but I would be much happier if the
> congestion RFCs were not so silent on what to do with ssthresh to recover
> from cases where ssthresh gets set to an inappropriately low level.
> >
> > Thanks,
> > Gary
> >
> >
> >
> > From: Ingemar Johansson S [mailto:ingemar.s.johansson@ericsson.com]
> > Sent: Tuesday, May 13, 2014 1:15 AM
> > To: tcpm@ietf.org
> > Cc: Karen E. E. Nielsen; gorry@erg.abdn.ac.uk; McAlpine, Gary
> > Subject: Problem with Low SSThresh (was I-D Action:
> > draft-ietf-tcpm-newcwv-03.txt)
> >
> > Hi
> >
> > Karen pointed out this thread to me
> > http://www.ietf.org/mail-archive/web/tcpm/current/msg08315.html
> > I started to look closer at this issue quite recently, the problem I have is
> that SSthresh can drop to very low values after only a few lost packets. I have
> seen this odd effect earlier but have not bothered with it.
> >
> >
> > The experiment is to run a large FTP transfer over a 1Mbps bottleneck with
> min RTT = 40ms. No AQM or tail drop queue enabled i.e a buffer bloated
> scenario.
> >
> > TCP NewReno. After 10s I "pull the plug" for 100ms (100% packet drop), this
> leads to 5 lost segments. at T =10.1s packets are forwarded as usual.
> >
> > What I have seen is that a retransmission timeout is immediately followed
> by a loss event, the effect of which is that SSThresh goes down to 2 MSS.  It
> seems to me that the RTO timer value is too low, I have not understood the
> effect completely though. Could the RTO timer be the culprit or is there
> some other effect ?.
> >
> > I am running these experiments in a proprietary LTE system simulator, we
> try to keep it up to date to match the Linux TCP stack reasonably well, it
> cannot be ruled out however that our implementation miss some important
> feature.
> >
> > /Ingemar
> >
> > =================================
> > Ingemar Johansson  M.Sc.
> > Senior Researcher
> >
> > Ericsson AB
> > Wireless Access Networks
> > Labratoriegränd 11
> > 971 28, Luleå, Sweden
> > Phone +46-1071 43042
> > SMS/MMS +46-73 078 3289
> >
> > ingemar.s.johansson@ericsson.com
> > www.ericsson.com
> >
> > "Those are my principles, and if you don't like them...
> > well, I have others."  Groucho Marx <http://www.brainyquote.com/quotes/authors/g/groucho_marx.html>
> > =================================
> >
> >
> >